from:"Joe Hourcle"

Re: [CODE4LIB] Survey

2012-11-27 Thread Joe Hourcle

On Nov 27, 2012, at 12:20 PM, Karen Coyle wrote:

 Peter,
 
 again I worry about this being self-selecting. People who report on surveys 
 are  the people who report on surveys. A code4lib survey would be nice, 
 but I'm really interested in on the ground troops. And I think the 
 questions would have to be specific to what one does:
 
 - installs and fixes equipment
 - runs updates/backups on ILS
 - writes scripts
 - writes code
 - manages local network
 - modifies ILS tables for local customization
 - creates web pages
 - makes decisions on tech purchasing
 - supervises staff that runs ILS/local network
 
 Well, that's probably a stupid list, but a smarter list could be made. In 
 other words, I would want what you actually do to define whether you are a 
 techie -- not whether you consider yourself a techie (many women demean their 
 own skills -- Oh, I just push a few buttons). [1] I'd like to see it be 
 very broad, and later we can decide if we think modifying ILS tables counts 
 as being a real techie.

I admit, I'm no expert on surveys (I tried doing one once for a class ... got 
shut down for an IRB violation as I said I'd share the results  back with the 
organization we were surveying ... which is pretty sad, as the organization I 
was surveying was the library school itself)

... but you could do a much larger survey, trying to get all people who work in 
libraries, and ask questions about specific IT-related tasks that they might be 
doing, even if they don't self-identify as IT.

Of course, then you might miss those of us who don't work in libraries, but who 
may identify with this group.

... and make sure that whoever does it isn't at an academic institution, to 
avoid that IRB crap.

-Joe

Re: [CODE4LIB] anti-harassment policy for code4lib?

2012-11-26 Thread Joe Hourcle

On Nov 26, 2012, at 5:16 PM, Bess Sadler wrote:

 Why have an official anti-harassment policy for your conference? First, it 
 is necessary (unfortunately). Harassment at conferences is incredibly common 
 - for example, see this timeline 
 (http://geekfeminism.wikia.com/index.php?title=Timeline_of_incidents) of 
 sexist incidents in geek communities. Second, it sets expectations for 
 behavior at the conference. Simply having an anti-harassment policy can 
 prevent harassment all by itself. Third, it encourages people to attend who 
 have had bad experiences at other conferences. Finally, it gives conference 
 staff instructions on how to handle harassment quickly, with the minimum 
 amount of disruption or bad press for your conference.
 
 If the conference already has something like this in place, and I'm just 
 uninformed, please educate me and let's do a better job publicizing it. 
 
 Thanks for considering this suggestion. If the answer is the usual code4lib 
 answer (some variation on Great idea! How are you going to make that 
 happen?) then I hereby nominate myself as a member of the Anti-Harrassment 
 Policy Adoption committee for the code4lib conference. Would anyone else like 
 to join me? 

We had no Anti-Harassment Policy for the DC-Baltimore Perl Workshop as it was 
all covered under our general Code of Conduct:

Don't be an asshole.

I think there was a second line of it, about how we had the right to remove 
people who refused to follow that advice and no refunds would be given.

I might be wrong on the exact language.  The e-mail I found referenced 'Don't 
be a dick', in an attempt to paraphrase the legalese of the Code of Conduct for 
our venue ... but the reference to gender-specific anatomy would be kinda 
sexist in itself.

-Joe

Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?

2012-11-02 Thread Joe Hourcle

On Nov 2, 2012, at 2:09 PM, Mita Williams wrote:

 +1 to web-hosting as it gives the ability install one's own software on
 one's domain (which feels great) *and* easy access to shell.
 
 And when web-hosting feels like too much of a barrier to access, sites like
 jsfiddle where you can immediately start adding *and* sharing code is key.
 IMHO the initial appeal of Code Academy was that it removed all barriers to
 getting started.  Getting a laptop's localhost set up is too daunting for a
 first step, I think.

If that's a problem for people, it might be worth looking at the various
*AMP (LAMP, WAMP, MAMP) stacks for an easy install of Apache, mySQL + perl
/ python / php.

We're probably moving away from locally hosted services towards 'the cloud'
for the most part (remember when they used to be called 'service providers'?)
but it's still useful to learn a little something about configuring a
webserver / database / etc.

And it's generally more locked down in the various *AMP stacks than if
you went and installed them individually, so there aren't quite the
same level of problems w/ security.

-Joe

Re: [CODE4LIB] Just Solve the File Format Problem month: can you help?

2012-11-02 Thread Joe Hourcle

On Nov 2, 2012, at 3:48 PM, Roy Tennant wrote:

Um...how is this better/different from already existing sites/efforts
around this?

http://en.wikipedia.org/wiki/List_of_file_formats
http://www.wotsit.org/
http://www.ace.net.nz/tech/TechFileFormat.html
http://www.fileformat.info/

At the very least, this new effort shouldn't start from scratch...

They could also extract a lot of information / links from:

http://www.digitalpreservation.gov/formats/index.shtml

Although, admittedly, it's more intended for creators rather than
those trying to figure out what it is they have. (archaeology?
forensics?)

-Joe

On Fri, Nov 2, 2012 at 2:36 AM, Ed Summers e...@pobox.com wrote:
I imagine you've heard about the Just Solve the Problem month already,
but if not, I thought Chris Rusbridge's email to the
digital-preservation list was a good call for participation in the
project ...

//Ed

-- Forwarded message --
From: Chris Rusbridge c.rusbri...@googlemail.com
Date: Thu, Nov 1, 2012 at 4:00 PM
Subject: Just Solve the File Format Problem month: can you help?
To: digital-preservat...@jiscmail.ac.uk

Some of you will know that Jason Scott, Rogue Archivist, is raising a
citizen's army to attempt to solve the file format problem* in the
month of November, 2012. The work is taking place via a wiki at
http://justsolve.archiveteam.org/index.php/Main_Page, with a band of
volunteers (you need to register to make changes to the wiki, by
sending a username and email address to justso...@textfiles.com). I've
added a few formats and groups of formats myself (at least as
skeletons or empty placeholders).

The best form of help is for some of you who know more about rarer
data formats to register and help by editing the wiki yourself. It's
pretty easy; I've never used MediaWiki before, and everything I've
done so far has been by finding something like it and adapting the
wiki source. Other people can make it beautiful and standardised later
on!

If you can't do that, you could email me information about missing
data formats. This should include as much as possible of:

- name, and what it's for (ie brief description)
- web site with some authoritative information
- web site with some examples, etc.

Let's try and capture ALL these formats. As Jason says in his own
inimitable way Let's make that goddam army!.

* Note, the problem is only vaguely defined, and after some angst
(eg see
http://unsustainableideas.wordpress.com/2012/07/04/the-solution-is-42-what-was-the-problem/),
I think that's OK. Gathering a huge amount of information about file
formats in one place will be a BIG HELP.

--
Chris Rusbridge
Mobile: +44 791 7423828
Email: c.rusbri...@gmail.com
Adopt the email charter! http://emailcharter.org/

Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?

2012-11-01 Thread Joe Hourcle

On Nov 1, 2012, at 5:02 PM, Ethan Gruber wrote:

 Google is more useful than any reference book to find answers to
 programming problems.

Too bad they got rid of codesearch.



On Nov 1, 2012, at 5:06 PM, Nate Hill wrote:

 Huh.  Michael, I'd love to know more about why I should care about SASS.
 I kinda like writing CSS.
 I see why LESS http://lesscss.org/ makes sense, but help me under stand why
 SASS does?

For the most part, using *any* CSS pre-processor is better than not
using one. 

LESS's problem was that it's javascript based ... so if they have
JS off ... you've got nothing.  And it's got to be done for each user,
rather than re-generate the files after you've made a modification.
You can get around this with the 'lessc' compiler, and serve valid
css files rather than having each client have to do the processing.

They've also got different syntaxes, so it's really up to which one
makes sense to you.  

Functionality wise ... I think they're about equal these days.  I suspect
that if one comes up with a useful new feature, the other group will copy
it.



On Nov 1, 2012, at 5:21 PM, Suchy, Daniel wrote:

 I can already feel the collective rolling of eyes for this, but what about
 Twitter? It's not a guide or manual, but start following and engaging
 talented developers and library geeks on Twitter and you'll soon have more
 help than you know what to do with.  Plus, no Zoia ;)


Too much misinformation:

http://twitter.com/danhooker/status/5630099300



On Nov 1, 2012, at 5:06 PM, Kam Woods wrote:

 foss4lib is a good resource that I'm sure many use, but isn't (as far as I
 can tell) linked anywhere on the current code4lib site. How would this
 differentiate itself from that?

The best tool isn't necessarily free or open source.  (and it isn't necessarily
software).

So that being said ...

my whiteboard.  And a digital camera ... none of that 'smartboard' crap.


-Joe

Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?

2012-11-01 Thread Joe Hourcle

On Nov 1, 2012, at 6:56 PM, Kam Woods wrote:

 Apologies, everyone (and especially Bohyun). You may still want to consider
 pointing people to foss4lib as a useful resource, but amend it with the
 following statement:
 
 Free and open source tools may not be the best tools. You might not even
 NEED software to handle whatever problem you have. Please consider
 contacting onei...@grace.nascom.nasa.gov for further insight.

Oh ... sure... just get me in trouble ...

We're supposed to use our 'OneNASA' e-mail address, so you'd have to change
it to

joseph.a.hour...@nasa.gov

... and I said that in part as I've been in the past a beta tester for 
BareBones's BBEdit.  If you're not doing HTML work, TextWrangler will
probably do what you need (which is ... whatever the 'free' is that
isn't 'libre')

And there's plenty of other good software out there that isn't free,
and there's lots of free software out there that's crap (some of which
I might've been involved with).


 Personally, I was unaware of either of these issues. It's a good thing I
 came here today for some edification.

Yes.  'smart' whiteboards are over priced crap.  I hope I've educated
everyone today.

-Joe

[CODE4LIB] Crappy AJAX (was: [CODE4LIB] Q: Discovery products and authentication (esp Summon))

2012-10-25 Thread Joe Hourcle

On Oct 25, 2012, at 6:46 AM, Gary McGath wrote:
 On 10/24/12 8:58 PM, Ross Singer wrote:
 On Oct 24, 2012, at 6:06 PM, Gary McGath develo...@mcgath.com wrote:
 On 10/24/12 4:00 PM, Ross Singer wrote:
 On Oct 24, 2012, at 3:48 PM, Gary McGath develo...@mcgath.com wrote:


 Also, why wouldn't your AJAX-enabled app be prepared for such an event?
 
 Are you asking how an AJAX-enabled application can handle such cases?
 
 No, I know how an AJAX-enabled application should handle such cases, 
 I'm saying why, if you're implementing an AJAX-enabled application,
 why you think this would be an issue.  Because I just don't see this
 being an issue.
 
 This has always been a tricky thing to explain; it's not just you, if
 that's any consolation. Someday I'll figure out how to make it clear on
 the first try. The point is that if a service redirects to a login page,
 it assumes the browser can display the login page. Normally this is
 true, but only if the resource would be delivered as a web page. AJAX
 components are received as elements, not pages.
 
 If you like, I can go into more detail off-list. This is really too much
 of a side technical issue to be worth taking up a lot of space on the list.

You didn't answer the question -- why would you not have some sort of
check on the AJAX application (or any application, web or otherwise)
to do at least minimal sanity checking on the result of an external
call?

In the case of something requiring authentication, if it's a well
designed back-end, it should return some HTTP status other than 200;
401 or 403 would be most appropriate.  I've unfortunately worked with
ColdFusion in the early days before they added cfheader to allow you
to change the status code so that it was something other than 200.

I've also seen websites that cheat to install a 'handler' for all
requests by linking to a PHP script using Apache's 404 ErrorHandler
directive.  This also has the side effect that search engines won't
index your site at all (as they assume it's all errors)

In both of these cases, I'd say the service is poorly designed if you
can't easily identify a failure.  You can send a login page along with
your 401 status, but you *should* *not* send a 30x redirect to a login
page, as then the actual status message is lost.  (the content hasn't
been moved ... you just want someone to go to the login page ... the
HTTP specs don't forbid a Location field w/ a 40x status, although I
admit I've never verified that major browsers support it)

If you have something pulling in content using something AJAX-like,
and it *doesn't* check the result, then the client's poorly designed
as well.  It might be something as simple as checking to ensure that
expected elements are included in the response.

The only valid example that I can think of where you may have blind
inclusion (ie, you don't have a chance to verify what the results are
before displaying) are frames (including iframes) and image links.
I'm assuming we've all see those horrible websites that have a 
'authentication required' message for every frame, but images
are a little more subtle.

The best thing to do for images is to serve an image back in
response, rather than HTML.  It's not a new thing; I remember
doing it back when I worked for a university in the mid 1990s.
We had your standard 'image-counter' CGI ... but when we realized
that the majority of HTTP-referers were from outside the university,
it was changed to instead return an image that said 'access denied'
or something similar.

-Joe

Re: [CODE4LIB] Crappy AJAX

2012-10-25 Thread Joe Hourcle

On Oct 25, 2012, at 9:20 AM, Gary McGath wrote:

 On 10/25/12 7:37 AM, Joe Hourcle wrote:
 
 You didn't answer the question -- why would you not have some sort of
 check on the AJAX application (or any application, web or otherwise)
 to do at least minimal sanity checking on the result of an external
 call?
 
 Because putting the onus of sanity checking on the web page isn't the
 best solution in this case. Of course, it should be set up to handle
 unexpected results sensibly in any case.

I view it like using JavaScript for form validation -- don't trust it,
and still re-do the validation in the backend.

If the costs to check tainted inputs are minimal, *do* *it*.  Even
when the back-end is well designed, there are enough other things
out there that are outside your control.

... like when IE decided to start re-writing 404 and other status
pages unless they happened to be at least 1k ... so even when we *were*
giving informative messages about what was going on, links to report
the problem, etc ... it never made it back to the user.

(and yes, I know, I've officially hit old fogey status by complaining
about changes that IE made more than 10 years ago ... I'm also not a
fan of the br tag ... one of the worst mistakes of HTML+)

But for more recent situations ... mobile browsers w/ spotty reception.
Man-in-the-middle attacks ... deep-packet filtering (the firewall
doesn't like some phrase used in the response, so replaces the content
with a 'blocked' message ... they may not be common, but they *do*
happen.

-Joe

Re: [CODE4LIB] Crappy AJAX

2012-10-25 Thread Joe Hourcle


On Thu, 25 Oct 2012, Chris Fitzpatrick wrote:


http://en.m.wikipedia.org/wiki/Sayre's_law



I'm guessing the other people participating in this thread have never had 
men with guns show up to take your server because of a 'security 
incident'.



Or block your server's IP address, and then make you jump through hoops 
for two weeks because they were unhappy with someone uploading an image to 
your trouble ticket system that accepted anonymous submissions ... with 
the explaination that if they managed to get a file on there, the whole 
system was compromised, and had to be blanked and the OS reinstalled.
... it didn't help that the image was text saying something to the effect 
of 'I've hacked your computer'.  And they didn't realize at the time it 
actually had a JPEG exploit in it, so it was the people who downloaded it 
could've been compromised, but it wasn't even a valid exploit against the 
OS we were running.



Or have all of the sysadmins in your group stop work for a day while we 
have a comprehensive scan of all of our machines by the security group 
because someone on the security auditing group noticed that a machine on 
our network sent out a request to some random webserver in the middle of 
the night, and then there was a connection attempted back to that machine 
and another one on our network. ... but they failed to mention was that 
the connection back was from a completely different IP range, and they had 
selectively filtered what they were looking for, so the incoming 
connections were attempted against *all* machines on our network and not a 
sign that someone was being selective in their attempts and cause for 
concern ... and the 'middle of the night' just meant 'before we got in 
this morning', but we have folks who have to work earlier shifts depending 
on when we get assigned antenna time to talk to the spacecraft.



... it makes the people who e-mail convinced that NASA's hiding evidence 
of the existance of alien life seem reasonable by comparison.*


So I actually *do* have a stake in validating what we use as inputs. 
Other people might not, but I do my best to avoid a DOS from our security 
group.**


-Joe


* They don't like that we get highly compressed data for 'space weather'
  purposes, and we replace them with a higher-quality image once it's been
  downloaded through a higher bandwidth link.  They also seem convinced
  that a compression artifact must be at the same distance from us as the
  sun for their size and speed calculations, rather than highly energetic
  particles right at the telescope.

** I've got other stories, too ... but I thought I'd keep it to only the
   ones that actually affected me.

Re: [CODE4LIB] Q: Discovery products and authentication (esp Summon)

2012-10-24 Thread Joe Hourcle

On Oct 24, 2012, at 2:40 PM, Jonathan Rochkind wrote:

 On 10/24/2012 2:04 PM, Ben Florin wrote:
 We use Primo, but we've never bothered with their restricted search scopes.
 
 Apparently the answer to my question is that nobody has thought about this 
 before, heh.
 
 Primo, by default, will suppress some content from end-users unless they are 
 authenticated, no?  Maybe that's what restricted search scopes are? I'm not 
 talking about your locally indexed content, but about the PrimoCentral 
 index of scholarly articles.
 
 At least I know the Primo API requires you to tell it if end-users are 
 authenticated or not, and suppresses some results if they are not. I assume 
 Primo 'default' interface must have the same restrictions?
 
 Perhaps the answer to my question is that at most discovery customers, 
 off-campus users always get the 'restricted' search results, have no real way 
 to authenticate, and nobody's noticed yet!



Do they even get a message that they've been restricted?

I would think that having a message such as :

74 records not shown because you weren't authenticated

would be enough to spur most folks to log in.

What I hate is when you do a search for something that you *know* should be 
there, and it's not ... then you find out that they're using IP range or DNS 
matching, and not telling the user that they've intentionally hid stuff.  I 
think I've gotten most of the stuff straightened out with our local library, 
but I have no way of knowing for sure.

(my desktop machine's doesn't resolve in the 'gsfc.nasa.gov' domain, and not on 
the most common network here ... so most systems' test for 'is this a local 
person' fail, and I get treated as an outsider ... I actually get better 
service using my personal laptop on the wireless network for visitors)

-Joe

Re: [CODE4LIB] Event Registration System Question

2012-10-19 Thread Joe Hourcle

On Oct 18, 2012, at 6:08 PM, Brian McBride wrote:

 Greeting!
 
 I was wondering if anyone out there has found or knows of a good open source 
 solution for event scheduling? We would need users to be able to register, 
 allow instructors to set enrollment caps, and basic email reminder functions. 
 Any information would be great!

I know there have been a lot of suggestions already, but I'd have to ask what 
the scope of the system is.

Most of the ones that I know if are for conferences; you effectively need to 
stand up a new instance of the software for each event.  Some are designed for 
hosting purposes (eg, the perl ACT software), and may have features like using 
a single registrant table, so that you don't have to set up a new login for 
each conference.  (in the case of ACT, it also allows you to see what other 
conferences someone's attended ... but there's a separate 'user page' for each 
conference which shows which sessions they're planning to attend, so it's most 
likely not 100% what you'd want)


For a library system, particularly an academic one, I'd assume that this isn't 
for a single event, but for lots of events (eg, there's a 'intro to (excel) 
class on the (first tuesday) of the month', but the instructors may change).

If this is what you're looking for, it might be easier to look into class or 
room scheduling software for schools, and add whatever additional functionality 
you might need.

... and conveniently, when searching for 'open source room scheduling 
software', a code4lib journal article popped up:

http://journal.code4lib.org/articles/2941


You also get quite a few hits for 'open source classroom scheduling software', 
which may have more of the features you're looking for (eg, managing the 
individual class registrations vs. just managing the room allocation) 

... but of course, search engine hits don't actually mean they're necessarily 
good, just that the exist, so it's probably worth explaining what you're 
looking for, so that the other folks on the mailing list can give 
recommendations.

-Joe

Re: [CODE4LIB] email to FTP or something?

2012-10-17 Thread Joe Hourcle

On Oct 17, 2012, at 11:46 AM, Nate Hill wrote:

 Maybe someone can offer me a suggestion here...
 I bought a nifty new gadget that records data and spits out csv files as
 email attachments.
 I want to go from csv  MySQL and build a web application to do cool stuff
 with the data.
 The thing is, the device can only email the files as attachments, it
 doesn't give me the ability to upload them to a server.
 Can anyone suggest how I can securely email a file directly to a folder on
 a server?
 
 The scenario is nearly identical to what is described here:
 http://www.quora.com/How-can-I-upload-to-an-FTP-site-via-email


It depends if you're hosting the mail server or not.  If you are,
and it's a unix box, you change your .forward file to pipe into
a program to do the processing, eg:

|/path/to/program


If you're already using procmail for local mail delivery, you
can do more complex things with a .procmailrc file.  (eg, only
pass along to the processing program messages that match
certain characteristics):

http://www.procmail.org/


If you're not hosting your own mail server, you might be able
to cobble something together with fetchmail, which retrieves
mail from IMAP or *POP* services and then processes it for
local delivery:

http://www.fetchmail.info/

-Joe

Re: [CODE4LIB] email to FTP or something?

2012-10-17 Thread Joe Hourcle

On Oct 17, 2012, at 12:15 PM, Cary Gordon wrote:

 The securely part is a gotcha. I would venture a guess that whatever
 the gadget does to produce emails doesn't include encryption or key
 verification.

What do you qualify as 'securely'?

You scan the message  attachment to make sure it's valid, process it, and then 
either put it in place (if local) or scp over to the server that's doing the 
hosting.

If you're concerned about the e-mail itself being unsecure, then you have to 
look into what protocols the appliance supports.  If it does ASMTP 
(Authenticated SMTP) over TLS, then you're fine:

http://www.ietf.org/rfc/rfc2554.txt

If it doesn't, well, then you set up a local mail relay that's firewalled off 
so that only the appliance can talk to it, and have that one do the processing 
/ transfer.

...

We used to use these sorts of things at the university where I used to work.

One would process the class schedules (generated as a nightly report from the 
registration system), and make a series of pages for gopher (later modified to 
generate HTML).  Another was used so that authorized users could modify the 
'university status' message (eg, closed due to snow) years before there were 
protocols such as webdav.

It's also quite useful for generating status pages based on cronjob messages.

-Joe




 On Wed, Oct 17, 2012 at 9:05 AM, Joe Hourcle
 onei...@grace.nascom.nasa.gov wrote:
 On Oct 17, 2012, at 11:46 AM, Nate Hill wrote:
 
 Maybe someone can offer me a suggestion here...
 I bought a nifty new gadget that records data and spits out csv files as
 email attachments.
 I want to go from csv  MySQL and build a web application to do cool stuff
 with the data.
 The thing is, the device can only email the files as attachments, it
 doesn't give me the ability to upload them to a server.
 Can anyone suggest how I can securely email a file directly to a folder on
 a server?
 
 The scenario is nearly identical to what is described here:
 http://www.quora.com/How-can-I-upload-to-an-FTP-site-via-email
 
 
 It depends if you're hosting the mail server or not.  If you are,
 and it's a unix box, you change your .forward file to pipe into
 a program to do the processing, eg:
 
|/path/to/program
 
 
 If you're already using procmail for local mail delivery, you
 can do more complex things with a .procmailrc file.  (eg, only
 pass along to the processing program messages that match
 certain characteristics):
 
http://www.procmail.org/
 
 
 If you're not hosting your own mail server, you might be able
 to cobble something together with fetchmail, which retrieves
 mail from IMAP or *POP* services and then processes it for
 local delivery:
 
http://www.fetchmail.info/
 
 -Joe
 
 
 
 
 
 -- 
 Cary Gordon
 The Cherry Hill Company
 http://chillco.com

[CODE4LIB] Job : Senior Software Engineer (mostly Perl / SOAP work)

2012-10-10 Thread Joe Hourcle

For those of you who saw Mitzi's job announcement, but are more of
a backend person rather than a web developer*, my group has a job
opening on the other side of the building**, writing connectors for
the Virtual Solar Observatory, a distributed federated search
system for solar physics data:

http://www.sesda3.com/careers/ss062-senior-software-engineer/

The quick summary of the main task:

Most of the existing system's in Perl, using SOAP::Lite.

Most of the catalogs are in MySQL or PostgreSQL.

Much of the issues are reconciling data models, so
having a physics or other science background is
useful.


Pros:

Pretty laid back environment.

Working for NASA.

Learn about the sun.

Working with interesting people.


Cons:

Can be aggressively laid back if you don't conform (I was threatened
with bodily harm in my first week if I continued to wear ties, even
though they featured cartoon characters ... I still don't understand
how someone couldn't appreciate a Dogbert tie)

And it's only laid back in some regards; anything that might affect
a spacecraft or human spaceflight is taken *really* seriously; men
with guns have been known to show up and seize machines when we have
security breaches.

Trying to explain to your grandmother the difference between working
for a contractor at a NASA center, and actually directly working as
a civil servant.

Dealing with bureaucratic rules that make no sense (which our boss
does his best to shield us from) and having to do tons of extra work
when Congress threatens to shut down the government (see con #2).

Hour long phone calls with your grandmother explaining that no, the
sun is not going to blow up this year, and how unrealistic it is
that the Mayans were able to pinpoint to a specific day more than
a millennia ago when we can't be sure if it's going to rain next
Tuesday.

Interesting people occasionally involves scientists who are
convinced their PhD makes them an expert in *everything* including
your job (see http://xkcd.com/793/ )... and some of them write
code that you have to interface with.

You'd have to work with me.

I can answer questions about the work that needs to be done, the
group you'd work with, stuff like that.

Everything else has to go through ADNET HR.  (I couldn't even tell you
about the benefits, as I work for one of the sub-contractors)

-Joe

* Although, I wouldn't mind a web developer; our site's been in
  need of some work for years, but that's another long story.  Those
  skills were in the 'preferred' list that I was told that I should
  not have titled 'minion wishlist'

** ie, Goddard Space Flight Center, Greenbelt, MD.  But we're a little
  more relaxed in that we'll accept U.S. citizens *or* permanent
  residents.


-
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?

2012-09-04 Thread Joe Hourcle

On Sep 4, 2012, at 10:48 AM, Matthew LeVan wrote:

 It's like a google search challenge!  Looks like they changed their student
 home link patterns...
 
 http://home.ubalt.edu/nicole.kerber/idia642/Final_Usability_Report.pdf


That's a challenge?

http://www.google.com/search?q=Final_Usability_report.pdf+site:ubalt.edu

(although my normal first step would've been archive.org, but they didn't
have it in their cache)

-Joe


 On Tue, Sep 4, 2012 at 10:44 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
 
 Hi helpful code4lib community, at one point there was a report online at:
 
 http://student-iat.ubalt.edu/**students/kerber_n/idia642/**
 Final_Usability_Report.pdfhttp://student-iat.ubalt.edu/students/kerber_n/idia642/Final_Usability_Report.pdf
 
 David Walker tells me the report at that location included findings about
 SFX and/or other link resolvers.
 
 I'm really interested in reading it. But it's gone from that location, and
 I'm not sure if it's somewhere else (I don't have a title/author to search
 for other than that URL, which is not in google cache or internet archive).
 
 Is anyone reading this familiar with the report? Perhaps one of the
 authors is reading this, or someone reading it knows one of the authors and
 can be put me in touch?  Or knows someone likely in the relevant dept at
 ubalt and can be put me in touch? Or has any other information about this
 report or ways to get it?
 
 Thanks!
 
 Jonathan

Re: [CODE4LIB] looking for an application to handle a large amount of redirects

2012-08-30 Thread Joe Hourcle

On Aug 30, 2012, at 2:17 PM, John A. Kunze wrote:

 If you run Apache server at the old location, and the original links and
 new links obey one or a few regular patterns, you could use one or a few
 RedirectMatch directives.
 
 If there are few patterns, you could use a big enumerated list of simple
 Redirect directives.  An ordinary Apache server can easily be loaded with
 a million directives; a little slow on 'restart', but very short redirect
 times when it comes up.  If you need more than that, you could look into
 installing a noid resolver. (google CPAN noid)

Yet another alternative in Apache when you're dealing with a larger number of 
items to redirect (and it's not simple directories moving about) is 
mod_rewrite's RewriteMap:

http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritemap

-Joe


 --- On Thu, 30 Aug 2012, Pottinger, Hardy J. wrote:
 Hi, we're in the process of migrating an existing digital library to a new
 platform, and we want to ensure that old URLs continue to resolve to the
 items in the new location. The new digital library will be built on
 Islandora, and I am pretty sure we can just map old URLs to new ones
 within Fedora Commons. But, in case we run into trouble, I was wondering
 if anyone might have some experience with an application that's more
 specific to our use case? Or, heck, if you have already migrated from a
 DLXS-based digital library to an Islandora-based digital library, and have
 already sorted out how to handle redirects, I'd love to hear from you.
 Thanks!
 
 --
 HARDY POTTINGER pottinge...@umsystem.edu
 University of Missouri Library Systems
 http://lso.umsystem.edu/~pottingerhj/
 https://MOspace.umsystem.edu/
 Debug only code. Comments lie.

Re: [CODE4LIB] Maker Spaces and Academic Libraries

2012-08-28 Thread Joe Hourcle

On Aug 28, 2012, at 9:07 AM, Emily Lynema wrote:

 I find this conversation interesting, mostly because the why do it
 reasons given parallel so closely what we are working on at NC State in our
 new library building. Except it doesn't have anything to do with
 makerspaces!
 
 Our emphasis is on taking expensive visualization and high performance
 computing capacity and making it available to students all across our
 campus. Some would ask why we are building massive visualization walls and
 working on creating a cloud computing environment where anyone can request
 temporary access to high performance computing in order to build stuff to
 render on the visualization walls. And it's just the same as the reason
 given for doing makerspaces in academic libraries: while faculty on fancy
 grant projects have access to high performance computing nodes, nowhere on
 campus is this kind of computing and visualization openly available for
 undergraduate students to creatively use.
 
 It's neat to see the different directions we go with the same underlying
 reason.

And in that regard, (high performance computing), I heard an interesting
story from someone who I think was from JHU Physics dept. a year or so
ago --

Basically, all of the professors were building their own personal beowulf
clusters (getting the money as either part of their condition on hire, or
using grant money to buy them) which caused a number of problems:

1. They weren't experts, so it'd take them a while to set up.
2. They typically didn't secure them properly, so they'd get hacked,
  and they had to take them down, and often didn't get them back up
  for many months, up to a year from original purchase 'til it was
  finally running at full tilt.
  (ie, it had already depreciated by a year)
3. So many clusters were built, that it overloaded the electrical
  in the building, and the whole building lost power.

...

So there really are some benefits to having a centralized cluster that
the faculty can submit jobs to, rather than all of the little ones.

The visualization stuff may be even more useful, as they're quite 
uncommon.  Besides some of the 'hiperwall' and 'cave' systems, there
was a project from one of the Harvard libraries on using a Microsoft
Surface (the table, not the yet-to-be-released table) for working with
huge images (telescope data, hi-res scans, etc.)

http://projects.iq.harvard.edu/harvardux/

-Joe

Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google

2012-08-28 Thread Joe Hourcle

On Aug 28, 2012, at 12:05 PM, Galen Charlton wrote:

 Hi,
 
 On 08/27/2012 04:36 PM, Karen Coyle wrote:
 I also assumed that Ed wasn't suggesting that we literally use github as
 our platform, but I do want to remind folks how far we are from having
 people friendly versioning software -- at least, none that I have seen
 has felt intuitive. The features of git are great, and people have
 built interfaces to it, but as Galen's question brings forth, the very
 *idea* of versioning doesn't exist in library data processing, even
 though having central-system based versions of MARC records (with a
 single time line) is at least conceptually simple.
 
 What's interesting, however, is that at least a couple parts of the concept 
 of distributed version control, viewed broadly, have been used in traditional 
 library cataloging.
 
 For example, RLIN had a concept of a cluster of MARC records for the same 
 title, with each library having their own record in the cluster.  I don't 
 know if RLIN kept track of previous versions of a library's record in a 
 cluster as it got edited, but it means that there was the concept of a 
 spatial distribution of record versions if not a temporal one. I've never 
 used RLIN myself, but I'd be curious to know if it provided any tools to 
 readily compare records in the same cluster and if there were any mechanisms 
 (formal or informal) for a library to grab improvements from another 
 library's record and apply it to their own.
 
 As another example, the MARC cataloging source field has long been used, 
 particularly in central utilities, to record institution-level attribution 
 for changes to a MARC record.  I think that's mostly been used by catalogers 
 to help decide which version of a record to start from when copy cataloging, 
 but I suppose it's possible that some catalogers were also looking at the 
 list of modifying agencies (library A touched this record and is 
 particularly good at subject analysis, so I'll grab their 650s).

I seem to recall seeing a presentation a couple of years ago from someone in 
the intelligence community, where they'd keep all of their intelligence, but 
they stored RDF quads so they could track the source.

They'd then assign a confidence level to each source, so they could get an 
overall level of confidence on their inferences.

... it'd get a bit messier if you have to do some sort of analysis of which 
sources are good for what type of information, but it might be a start.

Unfortunately, I'm not having luck finding the reference again.

It's possible that it was in the context of provenance, but I'm getting bogged 
down in too many articles about people storing provenance information using 
RDF-triples (without actually tracking the provenance of the triple itself)

-Joe

ps.  I just realized this discussion's been on CODE4LIB, and not NGC4LIB ... 
would it make sense to move it over there?

Re: [CODE4LIB] Maker Spaces and Academic Libraries

2012-08-27 Thread Joe Hourcle

On Aug 27, 2012, at 9:44 AM, BWS Johnson wrote:

 Salvete!
 
 Can't. Resist. Bait. Batman.
 
 
 Can anyone on the list help clarify for me why, in an academic setting,
 this kind of equipment and facility isn't part of a laboratory in an
 academic department?
 
 
 I'd say that I hate to play devil's advocate, but that would be a patent 
 misrepresentation of material fact.
 
 Conversely, could you please tell us why you think it *shouldn't* be at 
 the Library?


I can think of one reason they shouldn't be *anywhere*:  liability.

When I was working on my undergrad, in civil engineering, the university's 
science and engineering school had their own machine shop.

Officially, you were only supposed to use it if you were a grad student, or 
supervised by a grad student.

Yet, there were a number of us (the undergrad population) who had more 
experience than the grad students.  (I had done a couple years of shop class 
during high school, one of the other students had learned from his father who 
worked in the trade, another was going back to school after having been a 
professional machinist for years,  etc.).

So well, I know at least two of us would go down and use the shop without 
supervision.  (and in a few cases, all alone, which is another violation when 
you're working at 1am and there's no one to call for medical assistance should 
something go really, really wrong).

And in some cases, we'd teach the grad students who were doing stuff wrong 
(trying to take off too much material in a pass, using the incorrect tools, 
etc.  But I made just as many mistakes.  (when you're in a true machine shop, 
and there's two different blades for the bandsaw with different TPI, it's not 
that one's for metal and one's for wood ... as they don't do wood cutting there 
... but I must've broken and re-welded the blade a half dozen times and gone 
through a quart of cutting fluid to make only a few cuts, as I didn't realize 
that I should've been using the lower TPI blade for cutting aluminum)


I admit I don't know enough about these 'maker spaces' ... I assume there'd 
have to be some training / certification before using the equipment.  The other 
option would be to treat it more like a print shop, where someone drops off 
their item to be printed, and then comes back to pick it up after the job's 
been run.

And it's possible that you're using less dangerous equipment.  (eg, when in 
high school, my senior year we got a new principal who required that all 
teachers wear ties ... including the shop teachers.  Have you ever seen what 
happens when a tie gets caught in a lathe or a printing press?  He's lucky the 
teachers were experienced, as a simple mistake could've killed them)

But even something as simple as a polishing/grinding wheel could be a hazard to 
both the person using it and anyone around them.  (I remember one of my high 
school shop teachers not happy that I was so aggressive when grinding down some 
steel, as I was spraying sparks near his desk ... which could've started a fire)

... so the whole issue of making sure that no one gets injured / killed / 
damages others is one of the liability issues, but I also remember when I 
worked for the university computer lab, we had a scanner that you could sign up 
to use.  One day, one of the university police saw what one of the students was 
doing, and insisted that we were allowing students to make fake IDs.  (the 
student in question had scanned in a CD cover, which was a distorted drivers 
license looking thing ... if he was trying to make a fake ID, you'd think he'd 
have started from a genuine ID card) 

As we've now got people who are printing gun receivers, there's a real 
possibility that people could be printing stuff that might be in violation of 
the law.  (I won't get into the issue of if it's a stupid law or not ... this 
is something the legal department needs to weigh in on).  And conversely, if 
you're a public institution and you censor what people are allowed to make, 
then you get into first amendment issues.

...

On a completely unrelated note, when I first saw the question about libraries  
maker spaces, I was thinking in the context of public libraries, and thought 
the idea was pretty strange.  I see a much better fit for academic libraries, 
but I'm still not 100% sold on it.  In part, I know that it's already possible 
to get a lot of stuff 'made' at most universities, but you risk treading on 
certain trade's toes, which could piss off the unions.  Eg, we had a sign shop 
who had some CNC cutters for sheet goods (this was the mid 1990s), carpenters 
and such under the building maintenance, large scale printing and book binding 
through the university graphics department (they later outsourced the larger 
jobs, got rid of the binding equipment).

I could see the equipment being of use to these groups, but I don't know that 
they'd be happy if their lack of control over being able to make money by 
charging for

[CODE4LIB] Software/service to deal with matching up incomplete DVD/CD sets.

2012-08-23 Thread Joe Hourcle

So yesterday, I noticced a question on the libraries  info science stack 
exchange site on dealing with TV seres ... which led me to post a question 
about dealing with trying to match up libraries with incomplete sets of 
multi-disk packages:


http://libraries.stackexchange.com/q/1051/62

So far, the only response has been from someone who said that they use a 
a shared Google docs file for this.


I'm thinking that some software to better manage this could be useful to 
library consortia, multi-branch systems, etc.


So, a few questions for this group:

1. Does anyone know of software specifically designed for doing this?
   (if so, you can probably just answer it on the site)

2. Can anyone suggest existing software that might be able to be
   repurposed to handle this?
   (I've never used the various commercial DVD/book swapping sites, but
   I'm guessing it'd be a similar approach ... although maybe make it
   specifically track by ISBN so we don't get a 'special edition' mixed in
   with a 'regular' edition or widescreen vs. full screen)

3. Would anyone be interested in helping to build it?  (my time's rather
   scarse at this time ... if I manage to loose the election for AGU ESSI
   secretary, I might get a little time back, but once the new year rolls
   around, I'm going to barely be keeping my head above water ... I *am*
   willing/able to fund the hosting service and such, though)
   (I guess just reply directly to me for this one)

4. And to judge demand -- would people be interested in using it if it did
   exist? ... if so, let me know, as I'd need to spec out what the
   requirements are.  (eg, if it should be individual instances for
   different library systems, one big system open to all (with some
   confirmation the registered users work for libraries), or some larger
   system w/ rules set by the offerer on who they'll share with (only in
   this state, only in my consortia, etc.)


-Joe


(I'd attach my .sig, but this really has nothing to do with my day job ... 
although, there was that proposal for a 'tool exchange' at NASA that won 
the whitehouse SAVE award last year, and it could be construed as a 
similar concept ... is just won't help the local library, as they're 
dropping all of their physical items)

Re: [CODE4LIB] Intuitive Dual Boot on Mac (Mountain Lion)

2012-08-17 Thread Joe Hourcle

On Aug 17, 2012, at 2:50 PM, Ingersoll, Ryan wrote:

 Hi everyone,
 
 I am imaging and configuring 20 MacBook Pros for student check out. They will 
 have the option to dual boot (Mountain Lion or Windows 7). I am looking for 
 an intuitive way to inform how to boot to Windows. I was thinking of a 
 desktop background once they log in to the Mac side, but that doesn't seem 
 the friendliest or quickest (though logging in to the Mac is significantly 
 faster compared to Windows). I really don't want to tape instructions to the 
 physical computer either.
 
 Is is possible to tweak the login screen?


You can add a message, if that's what you're looking for.  It looks like it's 
gotten easier in more recent versions of the OS:


http://www.macobserver.com/tmo/article/os_x_lion_adding_custom_messages_to_the_login_window/

Previously, you'd have to go and edit the loginwindow.plist file:

http://hints.macworld.com/article.php?story=20020921074429845



-
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

Re: [CODE4LIB] Browser Wars

2012-07-12 Thread Joe Hourcle

On Jul 12, 2012, at 3:39 PM, Cary Gordon wrote:

 It is almost worth getting an iPad just to see all the clueless
 messages. Borrow one and try some restaurant sites. The restaurant
 business seems to have the absolute worst relationship between what
 they spend and the usefulness of what they get.
 
 I understand and respect your view, but still contend that regardless
 of the reason that someone is using IE 6, they have certainly had
 enough time to figure it out by now. The only way that IE6 users will
 have a good experience is if you build a site for them.

iPads are especially annoying because there are so many websites
that must've been re-written to support iPhones, but haven't
had any updates, so they insist on redirecting you to the 'mobile'
version of the website.

Which often, goes something like:

http://xkcd.com/869/

I actually have fewer problems on my WebOS phone, as no one bothered
to write specifically for it.  (or be smart, and ask for the resolution
or window size, and deal with things that way ... or even use CSS sheet
with '@media handheld')

...

As for IE6, one of the many arguments against supporting is is that by
catering to people who are still using 12 year old web browsers, you're
keeping them from upgrading to a more secure browser.

Now, ideally, you don't make pages that are completely useless without
plugins and javascript and whatever turned on ... but we shouldn't be
forced to make it pretty for 'em.

...

And more scripting languages (javascript/ecmascript/whatever they want
to call it) that are intended to be used across platforms without knowing
what version it's going to be run from ... need to have some way of asking
'hey, do you support (x)', rather than all of the assumptions based on
the browser string (which in my case, is often a lie, specifically because
of those sites that make bad assumptions), and they may have no idea
what stuff I've specifically limited in my security preferences.

-Joe

(who complains every year when I have to re-take the annual security
training that won't work unless I (1) allow pop-ups, (2) allow plug-ins
and (3) allow java)

Re: [CODE4LIB] LoC job opening ???

2012-07-09 Thread Joe Hourcle

On Jul 9, 2012, at 2:04 PM, Chris Fitzpatrick wrote:

 This just seems like some sort of trap. The fact that it's a craigslist ad
 in all caps makes me pretty sure this person is working on a librarian
 centpede in their basement.

If that were the case, I think they'd also accept applicants from the
Folger Shakespeare Library, which may actually be closer.

So, the real question is why it must specifically be federal employee
librarians.  (and I don't know of any librarians with TS/SCI/Poly ...
but I *have* heard that some of the archivists at the National Archives
do, but that was a 'my son is fed up with his job' story from a librarian
at my local public branch)

-Joe


 On Jul 9, 2012 7:56 PM, Simon Spero sesunc...@gmail.com wrote:
 
 On Jul 9, 2012 1:27 PM, Joshua Gomez jngo...@gwu.edu wrote:
 
 WE NEED A CAT LOVER WHO IS ALSO A FEDERAL EMPLOYEE TO DO THIS JOB!
 
 Must have active TS/SCI clearance with FS Poly.
 
 All applicants must complete the attached 20 page KSA.

Re: [CODE4LIB] LoC job opening ???

2012-07-09 Thread Joe Hourcle

On Jul 9, 2012, at 3:00 PM, Joseph Montibello wrote:

 Um, did LC just stop referring to Library of Congress?

http://www.acronymfinder.com/LC.html

The closest that I can come to having the paragraph all make
sense is 'low carb', but the 'pay is lousy' doesn't work for
it.

-Joe


 But doing the LC thing isn't as bad as it soundsI did it for a few
 months when I first got out of school. The pay is lousy, but you do
 get pretty nice benefits (although it's hard to find a dentist that
 will actually see you when you're in that condition).

Re: [CODE4LIB] Storing lat / long

2012-06-28 Thread Joe Hourcle

On Jun 28, 2012, at 3:46 PM, Matthew LeVan wrote:

 I'd think it would depend on what you plan to do with the coordinates once
 you have them stored.  If you intend to do anything at all complicated
 (spatial queries, KML generation, your own custom maps, area/volume
 calculations), you might want to consider a spatial database extension (
 http://en.wikipedia.org/wiki/Spatial_database).
 
 I've used the SQLite SpatiaLite and Postgres PostGIS extensions, and
 they're fairly straightforward to setup.


Agreed.  If you're going to be searching on them (places w/in 50 miles of (x), 
closest to (y)) ... spatial database extensions are the way to go.

If you're just going to be returning them for display, it probably doesn't 
matter so much, but odds are someone in the future is going to ask about it.

(and that being said; I store two copies of most anything coordinate or unit 
related ... one for searching that's well normalized, and one for display 
purposes ... database normalization be damned)

-Joe

Re: [CODE4LIB] Best way to process large XML files

2012-06-08 Thread Joe Hourcle

On Jun 8, 2012, at 2:36 PM, Kyle Banerjee wrote:

 I'm working on a script that needs to be able to crosswalk at least a
 couple hundred XML files regularly, some of which are quite large.

[trimmed]

 How do you guys deal with large XML files? Thanks,

um ... I return ASCII tab-delim records, because IDL's XML processing
routines have some massive issue with garbage collection if you walk
down the DOM tree.  However, no one in their right mind uses IDL
for XML, as it's basically Fortran w/ multi-dimensional arrays.

...

Everyone else is going to tell you to use SAX, and they're probably
right, but as you sound to be as reluctant as I am on using SAX,
another alternative may be Perl's XML::Twig:

http://search.cpan.org/perldoc?XML::Twig

-Joe

Re: [CODE4LIB] viewer for TIFFs on iPad

2012-05-10 Thread Joe Hourcle

On May 10, 2012, at 11:16 AM, Edward Iglesias wrote:

 Hello All,
 
 I was wondering if any of you had experience viewing large ~300MB and
 up TIFF files on an iPad.  I can get them to the iPad but the photo
 viewer is less than optimal.  It stops enlarging after a while and I'm
 looking at Medieval manuscripts so...


Are there any other requirements?

If it doesn't have to be actually on that machine, and you can
interact with a webserver, you might want to consider converting it
to JPEG2000, and then using a JPIP server to serve them.

The group here that's using it is only serving 16 megapixel images,
but the advantage is that you can selectively send only the regions
and detail as needed ... but you don't have to generate lots of tiles
at different scaling:

http://wiki.helioviewer.org/wiki/ESA_JPIP_Server

-Joe

Re: [CODE4LIB] Anyone using node.js?

2012-05-08 Thread Joe Hourcle

On May 8, 2012, at 2:18 PM, Ross Singer wrote:

 On May 8, 2012, at 2:01 PM, Ethan Gruber wrote:

[trimmed]

 Thanks for the info.  To clarify, I don't develop in java, but deploy
 well-established java-based apps in Tomcat, like Solr and eXist (and am
 looking into a java triplestore to run in Tomcat) and write scripts to make
 these web services interact in whichever language seems to be the most
 appropriate.  Node looks like it may be interesting to play around with,
 but I'm wary of having to learn something completely new, jettisoning every
 application and language I am experienced with, to put a new project into
 production in the next 4-8 weeks.
 
 Eh, if your window is 4-8 weeks, then I wouldn't be considering node for this 
 project.  It does, however, sound like you could really use a new project 
 manager, because the one you have sounds terrible.

But project managers don't 'add value' unless they actually do something.  If 
they just let you do things the way that you've done in the past, even if they 
worked, they could be replaced by any other project manager who knew enough not 
to micro-manage things.

And, if you actually managed to do the project on time, with them staying 
mostly hands-off, what does that tell people?  That they're not needed ... they 
need a project that's going to hell, so they can step in and 'fix' stuff.


-Joe

ps. and besides the obvious 'this is not the opinion of my employer, and may or 
may not be sarcasm' disclaimer, I've had a few instances where there was a 
non-quite-as-tight deadline and I had to learn something new ... but they 
footed the bill for sending me to a week of training

pps.  in all seriousness -- I know of someone who pulled crap like this, and 
then used it as a reason to fire the developer and replace them with one of the 
PM's friends who had the 'needed' skills ... then another instance where an 
outside consultant did a 'peer review' of our system 2 weeks before we were 
supposed to go live and then somehow got a contract to design  build a 
different system which took a year and cost the university $250k? $500k?, but 
he never delivered (hardware was shipped w/ empty drive arrays) ... so I might 
be a little more jaded than most in this scenario.  (but neither of those two 
anecdotes were at my current employeer)

Re: [CODE4LIB] possible new stackexchange site for Digital Preservation

2012-04-26 Thread Joe Hourcle

On Apr 26, 2012, at 12:26 PM, Nada O'Neal wrote:

 I haven't seen the proposed new Stackexchange digital preservation site:
 http://area51.stackexchange.com/proposals/39787
 mentioned on code4lib yet. I'm sure most of you have turned to Stack Overflow 
 in your darkest hours of need, so if you think you might like such a site 
 specifically geared towards Digital Preservation, please take a look. 
 
 The proposal is currently in the commitment stage and needs about 900 more 
 committers to make it to the next stage. 

It was mentioned yesterday, but it doesn't need 900 more 'committers'.

If you click on the 'more info' near the 11% commitment score:

The commitment score is the minimum of three scores:

56%  112/200 committers in total
11%  11/100 committers with 200+ rep on any other site
40%  commitment score, based on committers' activity on all other sites 
and how old the commitment is

So ... yes, we need another 88 people to commit ... but what's going to be 
harder to get (as evidenced by the 'Libraries' proposal, which has dragged on 
for so long that the folks at Stack Exchange renamed it to 'Library and 
Information Science' incorrectly thinking that it'd be broadening the category 
: http://area51.stackexchange.com/proposals/12432/)

Now, the important thing is that the 'any other site' is specifically 'Stack 
Exchange 2.0' sites, which means that Unshelved Answers, even though it was a 
'Stack Exchange' site *does* *not* count.  It must be one of the sites listed 
at:

http://stackexchange.com/sites

And it's really not that hard ... ask a few good questions (make sure they're 
not a duplicate, or they'll mark you down), or answer some questions, and 
you'll get voted up.  Now, the thing is, some of the larger sites get so many 
questions that fewer people are going to look at them unless you make it really 
intriguing (which could get it marked down and closed as subjective).

So, I'd recommend sticking with some of the smaller sites, including these that 
haven't yet graduated out of 'beta'.  For example, likely relevant for those on 
here, being an intersection of MLS folks and programmers:

Databases : http://dba.stackexchange.com/
Drupal : http://drupal.stackexchange.com/
Wordpress : http://wordpress.stackexchange.com/
User Experience : http://ux.stackexchange.com/
Graphic Design : http://graphicdesign.stackexchange.com/
Unix / Linux : http://unix.stackexchange.com/
Apple : http://apple.stackexchange.com/
Ubuntu : http://askubuntu.com/

English Language : http://english.stackexchange.com/
Linguistics : http://linguistics.stackexchange.com/

Project Management : http://pm.stackexchange.com/
Academia: http://academia.stackexchange.com/
eg, Is there any world-wide ranking of conferences/journals? 
: http://academia.stackexchange.com/questions/1199/
or Preprint services other than arXiv (for other fields) : 
http://academia.stackexchange.com/questions/84/ 

(don't bother with Literature -- it's going to be culled)

And of course, the original three:

programmer questions :  http://stackoverflow.com/
sysadmin questions : http://serverfault.com/
other computer users : http://superuser.com/



So, and for advice on getting reputation ... writing good answers tends to be 
the best way to go, but you want to :

Format it clearly.  (bulleted lists are your friend;  they use 
MarkDown, but there's an editor to make it easy)
Use good grammar / punctuation (minor ones, not so bad ... if it looks 
like you're being sloppy and didn't even try ... not so good)
Cite authoritative sources when appropriate
Give an answer, not just a link (eg, summarize, then cite the authority)
Speak from a position of authority and you're more likely to get voted 
up even when you're wrong... a 'it might be (x)' or 'have you tried (x)?' isn't 
going to go was well as 'As you said (y), based on previous experience, there's 
a good probability of it being (x)'
Don't be repetitive; if there's already a similar answer, you're better 
off commented on that answer to improve it ...
Answer quickly; most people look to see what they can answer when they 
first see a new question, and so if there's already a good answer there will 
vote it up ... two weeks later, not so much.  (although, I find that I'll get 
sudden bursts of lots of old answers being voted up ... and I know that if 
someone gives an interesting answer, I'll look at what else they've posted, 
which often leads me to vote their stuff up)


If you're going to ask questions:

Make sure it's not something that can be answered easily with a search 
on the internet.
Select good 'tags' for it.  (although, others may change the tags, but 
having good ones up front helps)


... and, I should add

Re: [CODE4LIB] crowdsourced book scanning

2012-04-25 Thread Joe Hourcle

On Apr 25, 2012, at 1:36 PM, Michael Lindsey wrote:

 A colleague posed an interesting idea: patrons scan book pages to deliver to 
 themselves by email, flash drive, etc.
 What if the scans didn't disappear from memory, but went into a repository so 
 the next patron looking for that passage didn't have to jockey the flatbed 
 scanner?
 
 * Patron scans library barcode at the scanner
 * The system says, I have these pages available in cache.
 o Patron's project overlaps with the cache and saves time in the
   scanning, or
 o Patron needs different pages, scans them and contributes to the
   cache
 
 Now imagine a consortium of some sort where when the patron scans the 
 barcode, the system takes a hop via the ISBN number in the record to reach 
 out to a cache developed between a number of libraries.
 I know there are a number of cases where this may not apply, like loose-leaf 
 publications in binders that get updated, etc.  And I'm sure there are 
 discussions around how to handle copyright, fair use, etc.
 Do we as a community already have a similar endeavor in place?

It sounds like a great idea ... but I'm guessing that this is the sort of thing 
that Google got in trouble for, as they were storing copies of books.  It might 
be that as libraries, we have different exemptions from copyright law than I'm 
aware of, but I'm looking in Section 108 of Title 17 and I don't think it'd be 
allowed, or at the very least would increase the library's liability.

Per 108(g)

(g) The rights of reproduction and distribution under this section 
extend to the isolated and unrelated reproduction or distribution of a single 
copy or phonorecord of the same material on separate occasions, but do not 
extend to cases where the library or archives, or its employee —
(1) is aware or has substantial reason to believe that it is engaging 
in the related or concerted reproduction or distribution of multiple copies or 
phonorecords of the same material, whether made on one occasion or over a 
period of time, and whether intended for aggregate use by one or more 
individuals or for separate use by the individual members of a group; or
...

-Joe

Re: [CODE4LIB] Help Start a Digital Preservation Stack Exchange QA Site

2012-04-25 Thread Joe Hourcle

On Apr 25, 2012, at 3:36 PM, Owens, Trevor wrote:

 I and some other folks working in digital preservation are trying to get a 
 Stack Exchange site focused on digital preservation launched. Here is the 
 blurb defining the proposed site: 
 
 Proposed QA site for librarians, archivists, curators, data managers, 
 information specialists, computer scientists and engineers and other 
 professionals working to ensure long term access to digital objects.
 
 It you would like to help get it launched just click the link and hit the 
 commit button. At this point the biggest hurdle is getting people who have 
 already have at least 200 rep on other stack exchange sites to commit. So, if 
 you have participated in any of the stack exchange sites it would be 
 particularly awesome if you could commit. Also, if you know other folks that 
 you think would be interested please consider sending the link along to them 
 too. 
 
 http://area51.stackexchange.com/proposals/39787/digital-preservation?referrer=anTT6XLk2hYl8-Pye4BdZw2


And because you need 200 rep on one of the other sites, you can commit to the 
proposal, and then find other stack exchange sites that you'd be interested in 
to try to get the 200 reputation necessary:

http://stackexchange.com/sites

(although, as a former moderator of the cooking site, I know that if they see 
people working together to bump up each other's reputation abnormally, they'll 
at the very least erase it all)

...

and hopefully this won't turn into the 'Libraries' proposal that languished as 
they had 500+ committed, but only 80 w/ the necessary rep and then was renamed 
to 'Libraries and Information Science' :


http://area51.stackexchange.com/proposals/12432/libraries-information-science?referrer=xHuHFdj5_FDG1iedac--IA2

-Joe

Re: [CODE4LIB] monitoring wireless networks

2012-04-12 Thread Joe Hourcle

On Apr 12, 2012, at 12:14 PM, Tara Robertson wrote:

 Hi,
 
 Is there an automated way of monitoring (and notifying) when a wireless 
 network goes down? I'm looking for something like Nagios, but for wireless 
 (or can Nagios do this too?)
 
 I don't manage our network--our ITS department does. They seem to think it's 
 adequate that I'm the monitoring system but I'm finding this extremely 
 frustrating.

Nagios can monitor *anything* so long as you can write a script that'll get you 
some status back.

If you have a command line way of getting signal strength for the network, 
that'd likely be best, but you could also just test to see if you can ping out 
on the right interface.

-Joe

[CODE4LIB] DC / Baltimore Perl Workshop

2012-03-01 Thread Joe Hourcle

Apologies in advance if you've already seen this from other mailing lists;  I 
know we have a few Perl folks on here, but I don't know how many in the DC area.

The DC  Baltimore Perl Mongers groups are organizing a Perl workshop on Sat, 
April 14th in Catonsville, MD.

We're still filling out the program schedule, but I thought I'd mention it as 
today's the last day for early registration ($25 vs. $50, although free for 
students  the unemployed)

http://dcbpw.org/dcbpw2012/

-Joe

Re: [CODE4LIB] Repositories, OAI-PMH and web crawling

2012-02-27 Thread Joe Hourcle

On Feb 27, 2012, at 10:51 AM, Godmar Back wrote:
 On Mon, Feb 27, 2012 at 8:31 AM, Diane Hillmann 
 metadata.ma...@gmail.comwrote:
 On Mon, Feb 27, 2012 at 5:25 AM, Owen Stephens o...@ostephens.com wrote:

 This issue is certainly not unique to VT - we've come across this as part
 of our project. While the OAI-PMH record may point at the PDF, it can
 also
 point to a intermediary page. This seems to be standard practice in some
 instances - I think because there is a desire, or even requirement, that
 a
 user should see the intermediary page (which may contain rights
 information
 etc.) before viewing the full-text item. There may also be an issue where
 multiple files exist for the same item - maybe several data files and a
 pdf
 of the thesis attached to the same metadata record - as the metadata via
 OAI-PMH may not describe each asset.
 
 
 This has been an issue since the early days of OAI-PMH, and many large
 providers provide such intermediate pages (arxiv.org, for instance). The
 other issue driving providers towards intermediate pages is that it allows
 them to continue to derive statistics from usage of their materials, which
 direct access URIs and multiple web caches don't.  For providers dependent
 on external funding, this is a biggie.
 
 Why do you place direct access URI and multiple web caches into the same
 category? I follow your argument re: usage statistics for web caches, but
 as long as the item remains hosted in the repository direct access URIs
 should still be counted (provided proper cache-control headers are sent.)
 Perhaps it would require server-side statistics rather than client-based GA.

I'd agree -- if you can't get good statistics from direct linking, something's 
wrong with the methods you're using to collect usage information.  Google 
Analytics and similar tools might produce pretty reports, but they're really 
meant for tracking web sites and won't work when someone has javascript turned 
off, has specifically blacklisted the analytics server, or on anything that's 
not HTML.

You *really* need to analyze the server logs directly, as you can't be sure 
that all access is going to go through the intermediate 'landing pages' or that 
it'd be tracked even if they did.

...

I admit, the stuff I'm serving is a little different than most people on this 
list, but we also have the issue that the collections are so large that we 
don't want people retrieving the files unless they really need them.  We serve 
multiple TB per day -- I'd rather a person figure out if they want a file 
*before* they retrieve it, rather than download a few GB of data and find out 
it won't serve their purposes.

It might not help our 'look how much we serve!' metrics to justify our funding, 
but it helps keep our costs down, and I personally believe it helps with good 
will in our designated community as they don't spend a day (or more) 
downloading only to find it's not what they thought.  (and it fits in with 
Ranganathan's 4th law better than saving them from an extra click)

-Joe

Re: [CODE4LIB] Transcription/dictation software?

2012-02-27 Thread Joe Hourcle

On Feb 27, 2012, at 1:52 PM, Suchy, Daniel wrote:

 Hello all,
 
 At my campus we offer podcasts of course lectures, recorded in class and then 
 delivered via iTunes and as a plain Mp3 download (http://podcast.ucsd.edu).  
 I have the new responsibility of figuring out how to transcribe text versions 
 of these audio podcasts for folks with hearing issues.
 
 I was wondering if any of you are using or have played with 
 dictation/transcription software and can recommend or de-recommend any?   My 
 first inclination is to go with open-source, but I'm open to anything that 
 works well and can scale to handle hundreds of courses.

I remember seeing a poster on a wall at the University of Maryland presenting 
work on a grant on doing this sort of work ... but I think it was for 
intelligence intercepts, as it was DoD funded and being used for Arabic.

This might've been the project:

Global Autonomous Language Exploration
http://projects.ldc.upenn.edu/gale/index.html

I have no idea why it's on a UPenn website, but it's listed at:

http://ischool.umd.edu/content/research-and-projects

And one of the researchers is Doug Oard, which matches what I remembered.

It might've also been Supporting Information Access Using Computational 
Linguistics, which was also DoD funded, but doesn't have a website link in 
that list.  And they didn't verify the links to faculty pages, so try one of 
the links to 'Douglas Oard' rather than 'Douglas Ward' if you want to contact 
him.

I also don't know if they were doing full transcription / translation, or if 
they were just looking for specific words to alert a human translator to review 
it.

...

Also, in the earlier list that Todd linked to, Zooniverse was mentioned.  They 
have a framework for mechanical turk-type stuff, but they tend to be science 
oriented, and I don't know if they've ever done audio transcription.  It's not 
exactly what they deal with, but they might be interested in helping, as at the 
2010 DCC, someone said they had the problem of not enough work for their 
volunteers to do.  (although, that might've changed since then).

https://www.zooniverse.org/researchers

-Joe

Re: [CODE4LIB] Repositories, OAI-PMH and web crawling

2012-02-26 Thread Joe Hourcle

On Feb 26, 2012, at 9:42 AM, Godmar Back wrote:

 May I ask a side question and make a side observation regarding the
 harvesting of full text of the object to which a OAI-PMH record refers?
 
 In general, is the idea to use the dc:source/text() element, treat it as
 a URL, and then expect to find the object there (provided that there was a
 suitable dc:type and dc:format element)?
 
 Example: http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl allows the
 harvesting of ETD metadata.  Yet, its metadata reads:
 
 ListRecords
   
   metadata
 dc
typetext/type
formatapplication/pdf/format
source
 http://scholar.lib.vt.edu/theses/available/etd-3345131939761081//source

 
 
 When one visits
 http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/ however
 there is no 'text' document of type 'application/pdf' - rather, it's an
 HTML title page that embeds links to one or more PDF documents, such as
 http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/unrestricted/Walker_1.pdfto
 Walker_5.pdf.
 
 Is VT's ETD OAI implementation deficient, or is OAI-PMH simply not set up
 to allow the harvesting of full-text without what would basically amount to
 crawling the ETD title page, or other repository-specific mechanisms?


I don't know if it's the official method, and I've never actually
implemented OAI-PMH myself, but I'd be inclined to have source
point to an OAI-ORE document, which can then point to the PDF,
full text, or whatever else.

If it's not currently an ORE document, you might still be able to
do some creative redirection on the webserver if you see the
appropriate Accept header and handling it as you would normal
content negotiation

You could also add a 'resourcemap' link element in the HTML
page to point to the ORE document.  If it's XHTML, you could
add the appropriate ORE elements; I think the microformat
style HTML was deprecated, as it's not mentioned in the 1.0 spec:

http://www.openarchives.org/ore/1.0/

-Joe

Re: [CODE4LIB] Repositories, OAI-PMH and web crawling

2012-02-24 Thread Joe Hourcle

On Feb 24, 2012, at 9:25 AM, Kyle Banerjee wrote:

 
 One of the questions this raises is what we are/aren't allowed to do in
 terms of harvesting full-text. While I realise we could get into legal
 stuff here, at the moment we want to put that question to one side. Instead
 we want to consider what Google, and other search engines, do, the
 mechanisms available to control this, and what we do, and the equivalent
 mechanisms - our starting point is that we don't feel we should be at a
 disadvantage to a web search engine in our harvesting and use of repository
 records.
 
 Of course, Google and other crawlers can crawl the bits of the repository
 that are on the open web, and 'good' crawlers will obey the contents of
 robots.txt
 We use OAI-PMH, and while we often see (usually general and sometimes
 contradictory) statements about what we can/can't do with the contents of a
 repository (or a specific record), it feels like there isn't a nice simple
 mechanism for a repository to say don't harvest this bit.
 
 
 I would argue there is -- the whole point of OAI-PMH is to make stuff
 available for harvesting. If someone goes to the trouble of making things
 available via a protocol that exists only to make things harvestable and
 then doesn't want it harvested, you can dismiss them as being totally
 mental.

I see it like the people who request that their pages not be cached elsewhere 
-- they want to make their object 'discoverable', but they want to control the 
access to those objects -- so it's one thing for a search engine to get a copy, 
but they don't want that search engine being an agent to distribute copies to 
others.

Eg, all of the journal publishers who charge access fees -- they want people to 
find that they have a copy of that article that you're interested in ... but 
they want to collect their $35 for you to read it.

In the case of scientific data, the problem is that to make stuff discoverable, 
we often have to perform some lossy transformation to fit some metadata 
standard, and those standards rarely have mechanisms for describing error 
(accuracy, precision, etc.).  You can do some science with the catalog records, 
but it's going to introduce some bias into your results, so you're typically 
better of getting the data from the archive.  (and sometimes, they have nice 
clean catalogs in FITS, VOTable, CDF, NetCDF, HDF or whatever their 
discipline's preferred data format is)

...

Also, I don't know if things have changed in the last year, but I seem to 
remember someone mentioning at last year's RDAP (Research Data Access  
Preservation) summit that Google had coordinated with some libraries for feeds 
from their catalogs, but was only interested in books, not other objects.

I don't know how other search engines might use data from OAI-PMH, or if they'd 
filter it because they didn't consider it to be information they cared about.

-Joe

Re: [CODE4LIB] Any libraries have their sites hosted on Amazon EC2?

2012-02-23 Thread Joe Hourcle

On Feb 22, 2012, at 11:52 PM, Cary Gordon wrote:

 EC2 works for a lot of models, but one that it does not work for is
 small traffic apps that need to be available 24/7. If you have a small
 instance (AWS term) running full time with a fixed IP, it costs about
 $75 a month. If you turn it on for 2 hours a day, it costs about
 $15/month. A large instance is about $325.
 
 Now where it gets interesting is if your app needs a large instance,
 but only run a few hours a month, you might be able to run a micro
 instance that is set to start a large (or ???) instance on demand, and
 run the whole thing for peanuts.


We've looked at something similar (not Amazon, NASA is working on its
own cloud service) where we'd locally run a server, but at times of
high demand, pass off to the cloud service.

If you have applications that are cyclic, I could see it being an
advantage to have something take over in the peak times.  Eg,
when I worked for a university, the system we used for class
registration was okay ... not great, but okay ... but the incoming
freshmen were brought in in 3 or 4 'orientation' periods over the
summer, and they'd all hit the system on the same day, at the same
hour (well, 1/3? 1/4 of the incoming class)

The system performance went to complete crap.  We're talking about
throughputs worse than if we had metered the access.  (and the
DBAs refused to look at database tuning, insisting that it was a
webserver problem ... it was of course, a database issue, but it
was months before we got it straightened out)

I could see conferences using something like this -- where almost
all of their traffic is on the days of deadlines, or during the
conference itself.

If the load's pretty uniform, I don't think their pricing model
is all that advantageous.  (and I have no idea how they handle
the loads over christmas, as the reason for the cloud is to make
money back on their excess capacity they need for the christmas
sales period.)

-Joe

Re: [CODE4LIB] Issue Tracker Recommendations

2012-02-22 Thread Joe Hourcle

On Feb 22, 2012, at 12:36 PM, Cynthia Ng wrote:

 Hi All,
 
 We're looking at implementing an issue tracker for internal use, so
 I'm looking for recommendations.
 
 What's key:
 1) minimal effort in install/setup i.e. ready to use out of the box
 2) small scale is okay, we have a very small team
 3) ideally, have an area for documentation and issue creation via email
 
 What does your institution use?
 What do you like and dislike most about it?
 Would you recommend it to others?
 
 Responses (short or detailed) would be greatly appreciated.

I've only managed Bugzilla and Trac.

They both were a little annoying to set up (define all of your
software components and versions, and who's responsible for each
one, so they'll get notified if bugs are filed).

Trac has good reporting  wiki for documentation, and their
markup syntax makes it easy to link trouble tickets within the
documentation (and it'll scratch them out as they're marked as
resolved).

I did get into some problems, as we had it open to the public,
and someone posted an attachment*, which triggered a 'security
incident' (which didn't seem to reach the 'men with guns show
up and seize your machines' like it had in the past ... instead,
it was 'we're going to make you rebuild your machine over and over
against until we say it's okay' so I wasted 2 weeks on it)

It's also a bit of a pain to strip all occurrences of the term
'wiki' and 'trac' from the software, so that I didn't show up as
7 of the top 10 results in google for 'site:nasa.gov wiki'.  If
you're keeping it private, it might not be so bad.

I also have no idea how useful the interaction with change control
is ... we were using CVS, and it was still subversion specific
back then.


I've also helped to configure Remedy before, it was more
than a decade ago, but it left a bad taste it my mouth (and it
wasn't cheap)

...

As others have mentioned github, I know there's other services
out there ... one project here uses launchpad.net (which is
tied to Bazaar), and they seem happy with it, but I've never
administered it myself.

-Joe

* The attachment was an image which said 'I've hacked your machine'.
Years later, when we switched virus scanning software, it found a
backup that had that file in it, and it turns out there was a
JPEG exploit in it ... but the security gestapo had thought that
*my* server had been hacked, which is what triggered it all.

Re: [CODE4LIB] Touch Screens in the Library

2012-02-13 Thread Joe Hourcle

On Feb 13, 2012, at 10:50 AM, Cynthia Ng wrote:

 Hi All,
 
 I was wondering if anyone has implemented (or plan to implement) touch
 screens in their library? We're looking mostly at doing it for
 wayfinding (finding items, rooms, etc.) but I'd definitely be
 interested in  hearing about any other uses.
 
 What kind of hardware did you choose?
 What software are you using?
 If you did it in-house, what language(s) did you use?
 
 Any ideas/help would be great.

I saw an article a couple of months back about one of the
harvard libraries using a Microsoft Surface:

http://osc.hul.harvard.edu/liblab/proj/wolbach-user-experience-lab

(I took note, as the pictures of the sun are from the Solar Dynamics
Observatory's AIA telescopes)

I'm guessing it's out of the price range for most of us, though.


-Joe

-
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

Re: [CODE4LIB] barcode scanner with memory

2012-01-30 Thread Joe Hourcle

On Jan 30, 2012, at 1:37 PM, Adam Wead wrote:

 Hi all,
 
 Can anyone recommend a barcode scanner wireless or otherwise that saves 
 barcodes to internal memory, to be downloaded to a computer later?  We have 
 patrons scan their ids as they enter to keep track of statistics.  I've 
 created some software that does this, with a regular barcode reader, but the 
 problem is the window has to be in focus the whole time and the terminal is 
 used by a security guard who has to do other things at the same time.  So, I 
 need some kind of hands-off solution and preferably something involving the 
 least amount of work from me...
 
 any ideas?

I have an older Intelliscanner Mini that would fit the bill ... The earlier 
model holds about 300 barcodes before you have to dump it; I don't know what 
the memory limits of the current one are.

http://www.intelliscanner.com/products/mini/

You're supposed to use their software, but from what I remember, I was able to 
get it to dump to other programs (it acted as a keyboard, typing in the values, 
with line returns between each value).

...

Looking at their website, it looks like if you want to export to other 
software, they recommend the 'Intelliscanner SOHO' (more expensive) model:

http://www.intelliscanner.com/products/soho/

-Joe

Re: [CODE4LIB] My crazed idea about dealing with registration limitations

2011-12-23 Thread Joe Hourcle

On Dec 23, 2011, at 12:15 PM, Susan Kane wrote:

[trimmed]

 You could repeat the conference at a totally different time of year ...
 everyone who didn't get in is automatically registered for the second
 conference later that year ... kinda wacky but ...
 
 You could plan for a second conference of the same size in the same city
 (different hotel).  After presentations for C4L1 are finalized, presenters
 are sought on similar topics for C4L2.  Overflow registrations for C4L1
 automatically go to C4L2.  Similar content means that institutions who paid
 for you to come to learn about X will hopefully not be upset if you learn
 about X from a different person across the street.  Everyone hangs out
 informally during off-presentation times.
 
 One could call that tracks but I'm trying for more of a mirror download
 site concept.

[trimmed]

For some reason, this jogged my memory --

The DC-IA (Information Architecture) group used to hold an meeting
after the IA Summit to basically recap what was discussed at the IA
Summit.  (I think they called it the 'IA Redux')

As there was more than one track, it allowed people who did go to
the summit to hear more about the other presentations they missed,
and for those who didn't go at all, it gave them a chance to at least
hear second-hand what was discussed.

Obviously, it wasn't nearly as complete as the original, and lost some
in translation, but I found it to be informative.

Particularly when you consider the proposal to limit the number of
attendees from one organization, this means that you spread the
number of attendees out, who can then spread the gospel to the others
that weren't able to attend.

Now, I'm not saying that people have to go out and take copious notes
and then try to get them into some format for dissemination (I did that
for the last RDAP meeting ... it's a lot of work trying to get 'em into a
format that others might understand), but if you get a few people
together who were at the meeting, and they can talk about what they
thought was interesting (possibly referring to notes they might've
jotted down), and that often spurs interesting discussions in itself.

-Joe

ps.  as an example of understandability, compare:
http://vso1.nascom.nasa.gov/joe/notes/rdap/RDAP_2011_notes.txt
http://vso1.nascom.nasa.gov/joe/notes/rdap/RDAP_2011_report.html
(and I took the original notes by hand, not typed, so I was spending
my nights at the meeting typing, then making 'em understandable for
the next week or so)

Re: [CODE4LIB] automatic greeking of sample files

2011-12-12 Thread Joe Hourcle

On Dec 12, 2011, at 3:06 PM, Brian Tingle wrote:

 On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein mbkl...@gmail.comwrote:
 
 Here's a snippet that will completely randomize the contents of an
 arbitrary string while replacing the general flow (vowels replaced with
 vowels, consonants replaced with consonants (with case retained in both
 instances), digits replaced with digits, and everything else is left alone.
 
 https://gist.github.com/1468557  https://gist.github.com/1468557
 
 
 I like the way the output looks; but one problem with the random output is
 that the same word might come out to different values.  The distribution of
 unique words would also be affected, not sure if that would
 impact relevance/searching/index size.  Also, I was sort of hoping to be
 able to have some sort of browsing, so I'm looking for something that is
 like a pronounceable hash one way hash.  Maybe if I take the md5 of the
 word; and then use that as the seed for random, and then run
 your algorithm then NASA would always hash to the same thing?

If the list of missions / agencies / etc is rather small, it'd be possible to
just come up with a random list of nouns, and make a sort of secret
decoder ring, assigning each mission name that needs to be replaced
with a random (but consistent) word.

I just tend to replace all of my mission / spacecraft / instrument acronyms
with 'BOGUS' when I have to do similar stuff to generate records when
we're testing data systems, but I tend to just have the acronyms, not
the full spelled out names (which are looked up from the acronyms),
and I don't have large amounts of free text to worry about.

-Joe

Re: [CODE4LIB] Pandering for votes for code4lib sessions

2011-12-01 Thread Joe Hourcle

On Dec 1, 2011, at 8:47 AM, Ross Singer wrote:

 As unwilling commissioner of elections, I'm shocked, SHOCKED, I say,
 to hear of improprieties with the voting process.

It could be worse ... I'm an unwilling elected official.  (and the re-election
for my third term is next month ... anyone want to move to Upper Marlboro,
MD, so they can run against me?  I think you still have about a week to
make the 30 day residency deadline)

(maybe 'unwilling' is the wrong word, before this shows up in the
local newspaper ... I'll do it, but I think someone with more free time
to commit might be able to do a better job)


 That said, I'm not shocked (and we've seen it before).
 
 I am absolutely opposed to:
 
 1) Setting weights on voting.  0 is just as valid a vote as 3.
 2) Publicly shaming the offenders in Code4Lib.  If you run across
 impropriety in a forum, make a friendly, yet firm, reminder that
 ballot stuffing is unethical, undemocratic and tears at the fabric
 that is Code4Lib.  Sometimes it just takes a simple reminder for
 people to realize what they're doing is wrong (it certainly works for
 me).
 3) Selection committees.  We are, as Dre points out,
 anarcho-democratic as our core.  anarcho-bureaucratic just sounds
 silly.

It'd be (anarcho-)?republican, as you'd have a smaller body that's
appointed or elected to make the decisions.


 This current situation is largely our doing.  We even publicly said
 that getting your proposal voted in is the backdoor into the
 conference.  The first allotment of spaces sold out in an hour.  This
 is, literally, the only way that a person that was not able to
 register and is buried on the wait list is going to get in.  And we've
 basically told them that.

Perhaps if registration were done after the talk selection, this wouldn't
be a problem?   Or some sort of lottery, rather than first-come-first served?

... and the real way to ensure a slot is to help with the conference
planning ... if you've agreed to man the table where people get their
badges, they normally let you come.


 One thing I would be open to is to put a disclaimer splash page before
 any ballot (only to be seen the first time a person votes) briefly
 explaining how the ballot works and to mention that ballot stuffing is
 unethical, undemocratic and tears at the fabric that is Code4Lib or
 some such.  I would welcome contributions to the wording.
 
 What would people think about that?

I'd like to know if this is even a problem -- is there some way to
tell if we have people who only voted for one paper?

(although, just putting that as a restriction just makes 'em 
likely to vote for a few random ones, which really does taint
the whole process)

-Joe

Re: [CODE4LIB] Pandering for votes for code4lib sessions

2011-12-01 Thread Joe Hourcle

On Dec 1, 2011, at 10:29 AM, Ross Singer wrote:

 On Thu, Dec 1, 2011 at 10:09 AM, Richard, Joel M richar...@si.edu wrote:
 I feel this whole situation has tainted things somewhat. :(
 
 
 Let's not blow things out of proportion.  The aforementioned
 wrong-doing actually seems pretty innocent (there is backstory in the
 IRC channel, I'm not going to bring it up here).  There is a valid
 case for advertising interest in your talks (or location, or t-shirt
 design, etc.), especially in an extremely crowded field, and we've
 never explicitly set a policy around what is appropriate and what
 isn't.  I think a simple edit on the part of the accused would clear
 up any ambiguity of intention.
 
 Our one known incident was handled privately, but didn't really
 cause us to address the potential for impropriety.
 
 We seem to have quite a bit of support for the splash page.  If people
 will help me draft up the wording -- ideally something we can point to
 when we want to guide people in the right direction in other forums --
 I think we can put this issue to bed.

It depends on how harsh you want be ... I mean, if you're on the
fence about ballot stuffing, you could go with something like:

When voting, we expect you to actually read through the list,
and pick the best ones.  So yes, go ahead and vote for your
friends and colleagues, but also read through the others
to find other equally good proposals.

-Joe

Re: [CODE4LIB] server side vs client side

2011-12-01 Thread Joe Hourcle

On Dec 1, 2011, at 12:49 PM, Nate Hill wrote:

 As I was struggling with the syntax trying to figure out how to use
 javascript to load a .txt file, process it and then spit out some html on a
 web page, I suddenly found myself asking why I was trying to do it with
 javascript rather than PHP.
 
 Is there a right/wrong or better/worse approach for doing something like
 that? Why would I want to choose one approach rather then the other?
 
 As always, apologies if I'm asking a terribly basic question.


There's different advantages to each side:

JavaScript / JScript / ECMAScript / client side:
Scales better (as the clients do their own work)
More obnoxious to maintain (as different browsers may have slightly 
different implementations)
Less reliable (I keep mine turned off on my main browser)
Better detection of client features (you can always lie in a browser 
string, or just not send it)
May require extra layers of abstraction (APIs that then require extra 
taint checking)
More responsive for simple operations (if doesn't need remote calls)
Easier to do some tasks

PHP / ColdFusion / CGI / ASP / server side :
You can be assured that you know it's working, and error reports when 
it's not (assuming you log  check your logs)
the inverse of all of the ones in the 'client side' section
(but the inverse of 'Easier to do some things'  is till 'Easier to do 
some things')



I'm not going to make any claims about speed, as it's frequently dependent
on bandwidth/latency.  (if I can send data to the client on a slow link, and
have them build the structures around it, it might be faster than my doing
it server side, and more so if my server gets bogged down)

For some tasks, I'll do it both ways.  Eg, form input validation -- 

Once in javascript, so they get the warning *before* the submit the form,
and again on the server side, in case they have javascript off or are being
malicious.

-Joe

Re: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community

2011-11-23 Thread Joe Hourcle

On Nov 23, 2011, at 12:17 PM, Robert Sanderson wrote:

 LibLime
 A Division of PTFS, Inc.
 Main Office
 
 11501 Huff Court
 North Bethesda, Maryland 20895
 
 tel: (301) 654-8088 Ext. 127
 fax: (301) 654-5789
 email: kohai...@liblime.com
 
 Twitter: @liblime
 
 How about we all contact them? ;)


Our contacting them isn't as effective as their customers contacting them.

You can get a list of known Koha installations from lib-web-cats:

http://www.librarytechnology.org/map.pl?ILS=Koha

Which lists over 1200 sites ... the Library Journal, when they covered the
purchase of LibLime last year, only mentioned that they had about 1/2 of those
(140 libraries thought PTFS, 500 from LibLime):

http://www.libraryjournal.com/article/CA6714841.html

Although, I don't know if the lib-web-cats is libraries, or whole library
systems.

You could get specific names of LibLime customers by looking through
their website for testimonials scattered on the site, or get their more recent
clients through the press releases in their 'news' feed:

http://www.liblime.com/news-

-Joe

Re: [CODE4LIB] Citation Analysis - like projects for print resources

2011-11-17 Thread Joe Hourcle

On Nov 17, 2011, at 12:09 PM, Miles Fidelman wrote:

 Matt Amory wrote:
 Is anyone involved with, or does anyone know of any project to extract and
 aggregate bibliography data from individual works to produce some kind of
 most-cited authors list across a collection?  Local/Network/Digital/OCLC
 or historic?
 
 Sorry to be vague, but I'm trying to get my head around whether this is a
 tired old idea or worth pursuing...
 
 
 Sounds like you're describing citeseer - http://citeseerx.ist.psu.edu/ - it's 
 a combination bibliographic and citation index for computer science 
 literature.  It includes a good degree of citation analysis.  Incredibly 
 useful tool.


Another recent project (that I haven't had a chance to play with yet) is Total 
Impact : 

http://total-impact.org/about.php

It's from some of the folks in altmetrics, who are trying to find better 
bibliometrics for measuring value: 

http://altmetrics.org/manifesto/

I don't see a list of what they're scraping I think they're using the 
publisher's indexes, PubMed and other databases rather than parsing the text 
themselves ... but the software's available, if you wanted to take a look.  Or 
you could just ask Heather or Jason, they're both approachable and always eager 
to talk, when I've run into them at meetings.

I also seem to remember someone at the DataCite meeting this summer who was 
involved in a project to parse references in papers ... unfortunately, I don't 
have that notebook here to check ... but I *think* it was John Kunze.  (and I 
don't think it was part of the person's presentation, but something that I had 
picked up in the Q/A part)

-Joe

Re: [CODE4LIB] Hotel registration - This was a test, right?

2011-11-16 Thread Joe Hourcle

On Nov 16, 2011, at 5:02 PM, Cary Gordon wrote:

 I just registered for an overflow block room at
 https://resweb.passkey.com/Resweb.do?mode=welcome_gi_newgroupID=7466136
 
 I noticed when I got to the Guest Details page that there was a
 checkbox in the Contact Information block -- Yes, I'd like to be
 notified… -- which was checked (not surprising) and not changeable
 (that was surprising). Peeking at the code, I noticed that the form
 tag had the words, checked and disabled.
 
 Now, since nobody could be so slimy as to do this intentionally
 (right?). I helped them out by using my in-browser editor to correct
 this oversight, because I wouldn't want them to waste electrons
 sending me email that I don't want.

Unless they're doing something to un-disable the form when you submit,
there shouldn't be an issue in most browsers, as the 'disable' also implies
'don't bother sending when submitting'.

It's implied in the HTML4 spec, but I don't know if it's required behavior:

http://www.w3.org/TR/html4/interact/forms.html#h-17.12


Now, if they had set it 'readonly', then yes, you should worry.  Or that
whoever made the form doesn't know what they're doing, and as I've
often found out, those people seem to get paid way more money than
I do even though they're clueless.


-Joe

Re: [CODE4LIB] Looking for products/price ranges for a database of performers

2011-09-06 Thread Joe Hourcle

On Sep 6, 2011, at 7:20 PM, Heather Rayl wrote:

 ** apologies for cross-posting **
 
 Hi there,
 
 We have a database of performers that we use in our libraries. Currently,
 the data is stored on one person's computer in a file maker pro db that only
 this one person has access to (Hooray for legacy systems!). In order for the
 rest of the staff to have access to the performer listings, this one person
 runs yearly reports and they are posted on the staff intranet in a
 rather unwieldy series of pdf  documents for staff to browse. For a sense of
 scale, we have over 80 libraries, about probably around 300-400 staff people
 accessing these documents, and there are probably around 400 or so
 performers in the database. Clearly, we need a new system of managing these
 performers!!

[trimmed]

 So here's what we're grappling with:
 
 1. We can purchase a product that would give us the framework to do this. I
 realize that something like a wiki would let us do some of these things, but
 really we are rather freaky about our content control, and a wiki is just
 too free-wheeling!
 2. We can hire a developer/programmer to design a custom solution for us.
 
 So my questions for the list are:
 
 1. do you know of any products that do what we want?

FileMaker.  The more recent versions have a 'instant webpage' option:

http://www.filemaker.com/products/filemaker-pro/web-publishing.html

If you're expecting a lot of traffic, you'll want to go to FileMaker Server, or
Server Advanced:

http://www.filemaker.com/products/filemaker-server-advanced/

I admit, I haven't used any of the versions since they've added this
feature ... my FileMaker experience is 10+ years old at this point, so 
I don't know how much work 'instant' is.  I believe they offer 30 day
free trials on all of their software these days, so you might be able
to download it and see what it can do.


 2. if we were to hire someone, how much is a reasonable fee - we have some
 money in our budget, but we don't really know what a real person would
 charge for this, and if the money in our budget would cover it. And I don't
 want to go through writing an RFP for it if in the end we won't be able to
 afford it anyway.


As for cost, it varies widely.  Part of the issue is how the data's currently
structured, and if you're going to keep the same structure, or change
it as part of the re-design.  FileMaker had some fields that were basically
enums, and so the database handled what you'd have to do in most
RDBMSes as a lookup table.

As strange as it sounds, someone who is more skilled at this might
actually do it more cheaply than someone of moderate skill, because
they can get it done quickly, even at a higher per-hour rate, it's going
to be cheaper ... but I'd still try to get them to bid for the project, not
per hour, as you don't want someone who's going to vastly under-
estimate the hours, then end up billing you 2-3x their estimate ... of
course, bids for the whole project means they have to pad it out some,
so it'll seem higher up front, but it'll likely be lower in the end.

Of course, you also risk someone who underestimates the work,
bids it out, but then gets in so far over their head that they give up,
and you never see anything ... so I'd recommend checking references,
to try to mitigate this problem.  Working with a company rather than
an independent person usually helps with this case, as they don't
want the bad reputation from  something like this happening ... 
and they can throw extra people at it to get it done and out of their hair

-Joe

Re: [CODE4LIB] memory management for grownups

2011-08-30 Thread Joe Hourcle

On Aug 30, 2011, at 3:55 PM, Simon Spero wrote:

 On Tue, Aug 30, 2011 at 12:56 PM, Ken Irwin kir...@wittenberg.edu wrote:
 
 I have a feeling it may be time for me to learn some grown-up programming
 skills, and I hope someone here might be able to help.

[trimmed]


 Sometimes it can make sense to use the database to do the aggregation; e.g.
 
 CREATE TABLE Summary AS
  SELECT inst,patron_type,item_barcode,min(date)  first,
 min(call_no),min(renewals)  min_renewals, max(renewals)  max_renewals
FROM Renewals
GROUP BY inst,patron_type,item_barcode;


Wow ... I didn't realize I was that asleep.  No wonder I had such an
unproductive day.

I'll second Simon's recommendation.  There's no reason to pull this
into PHP if you can do it all in the database, which is quite likely based
on what was described.

-Joe

Re: [CODE4LIB] internet explorer and pdf files

2011-08-29 Thread Joe Hourcle

On Aug 29, 2011, at 3:30 PM, Eric Lease Morgan wrote:

I need some technical support when it comes to Internet Explorer (IE) and PDF
files.

Here at Notre Dame we have deposited a number of PDF files in a Fedora
repository. Some of these PDF files are available at the following URLs:

*
http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1
*
http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832898/PDF1
*
http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:999332/PDF1
*
http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832657/PDF1
*
http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1001919/PDF1
*
http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832818/PDF1
*
http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:834207/PDF1

Retrieving the URLs with any browser other than IE works just fine.

Unfortunately IE's behavior is weird. The first time someone tries to load
one of these URL nothing happens. When someone tries to load another one, it
loads just fine. When they re-try the first one, it loads. We are banging our
heads against the wall here at Catholic Pamphlet Central. Networking issue?
Port issue? IE PDF plug-in? Invalid HTTP headers? On-campus versus off-campus
issue?

Could some of y'all try to load some of the URLs with IE and tell me your
experience? Other suggestions would be greatly appreciated as well.

I don't have IE to test from, but it's been my experience that in past versions
of IE, it would use the file's extension no matter what the mime-type sent was.

I'd first see if you can trick IE ... it looks like Fedora doesn't like you
sending extra stuff in PATH_INFO, so you might have to abuse QUERY_STRING for
this:

http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1/?filename.pdf

http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1/?file=filename.pdf

If either of those work fine in IE, but the first one doesn't, that's the
problem.

I don't know what's possible in Fedora, so I don't know if it's possible to do
some URL re-writing so it'd always serve something that IE accepts as a PDF.
If you could insert an extra HTTP header, you might be able to trick it with
Content-Disposition, but that'll also tell some browsers to download the file
rather than display it themselves:

http://www.ietf.org/rfc/rfc2183.txt

-Joe

Re: [CODE4LIB] internet explorer and pdf files

2011-08-29 Thread Joe Hourcle

On Aug 29, 2011, at 3:52 PM, Godmar Back wrote:

 Earlier versions of IE were known to sometimes disregard the Content-Type
 (which you set correctly to application/pdf) and look at the suffix of the
 URL instead. For instance, they would render HTML if you served a .html as
 text/plain, etc.
 
 You may try creating URLs that end with .pdf
 
 Separately, you're not sending a Content-Length header:
 
 HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Server: Apache-Coyote/1.1
  Pragma: No-cache
  Cache-Control: no-cache
  Expires: Wed, 31 Dec 1969 19:00:00 EST
  Content-Type: application/pdf
  Date: Mon, 29 Aug 2011 19:47:27 GMT
  Connection: close
 Length: unspecified [application/pdf]
 
 which disregards RFC 2616,
 http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13


RFC2616 says 'SHOULD' for that section.

HTTP/1.1 clients *must* support chunked encoding:

http://en.wikipedia.org/wiki/Chunked_transfer_encoding

(which is why any time I write an HTTP client, I always claim to be
HTTP/1.0, so I don't have to support it)

If the data's stored on disk compressed, and being decompressed
on the fly, it's pretty typical to not send Content-Length.  (although,
you could argue that they should save it when storing the value,
so it's available when serving without needing to decompress
first).

-Joe

Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested

2011-08-03 Thread Joe Hourcle

On Aug 3, 2011, at 7:36 PM, Ranti Junus wrote:

 Dear All,
 
 My colleague came with this query and I hope some of you could give us some
 ideas or suggestion:
 
 Our Digital Multimedia Center (DMC) scanning project can produce very large
 PDF files. They will have PDFs that are about 25Mb and some may move into
 the 100Mb range. If we provide a link to a PDF of that large, a user may not
 want to try to download it even though she really needs to see the
 information. In the past, DMC has created a lower quality, smaller versions
 to the original file to reduce the size. Some thoughts have been tossed
 around to reduce the duplication or the work (e.g. no more creating the
 lower quality PDF manually.)
 
 They are wondering if there is an application that we could point to the end
 user, who might need it due to poor internet access, that if used will
 simplify the very large file transfer for the end user. Basically:
 - a client software that tells the server to manipulate and reduce the file
 on the fly
 - a server app that would to the actual manipulation of the file and then
 deliver it to the end user.
 
 Personally, I'm not really sure about the client software part. It makes
 more sense to me (from the user's perspective) that we provide a download
 the smaller size of this large file link that would trigger the server-side
 apps to manipulate the big file. However, we're all ears for any suggestions
 you might have.


I've been dealing with related issues for a few years, and if you have
the file locally, it's generally not too difficult to have a CGI or similar
that you can call that will do some sort of transformation on the fly.

Unfortunately, what we've run into is that in some cases, in part because
it tends to be used by people with slow connections, and for very large
files, they'll keep restarting to the process, and because it's a generated
on-the-fly, the webserver can't just pick up where it left off, so has to
re-start the process.

The alternative is to write it out to disk, and then let the webserver
handle it as a normal file.  Depending on how many of these you're
dealing with, you may have to have something manage the scratch
space and remove the generated files that haven't been viewed in
some time.

What I've been hoping to do is:

1. Assign URLs to all of the processed forms, of the format:
http://server/processing/ID
(where 'ID' includes some hashing in it, so it's not 10mil 
files in a directory)

2. Write a 404 handler for each processing type, so that
should a file not exist in that directory, it will:
(a) verify that the ID is valid, otherwise, return a 404.
(b) check to see if the ID's being processed, otherwise, kick
off a process for the file to be generated
(c) return a 503 status.

Unfortunately, my initial testing (years ago) suggested that no
clients at the time properly handled 503 requests (effectively,
try back in (x) minutes, and you give 'em a time)

The alternative is to just basically sleep for a period of time, and
then return the file once it's been generated ... but for ones
that take some time (some of my processing might take hours,
as the files that it needs as input are stored near-line, and we're
at the mercy of a tape robot)

...

You might also be able to sleep and then use one of the various
30x status codes, but I don't know what a client might do if you
returned the same URL.  (they might abort, to prevent looping)

-Joe

Re: [CODE4LIB] Advice on a class

2011-07-26 Thread Joe Hourcle

On Jul 26, 2011, at 3:31 PM, Lepczyk, Timothy wrote:

 Thanks everyone.  The reasons I thought of taking the C course is a) it's 
 free, b) concepts might be transferrable to other languages.  I may continue 
 to focus on Ruby on Rails.


Before everyone manages to scare you away from learning C,
if you're going to be doing a lot of programming, it's useful to
learn other languages so you can see how they handle
different tasks.  C is particularly useful as a lot of other language's
implementations were primarily written in C.

In college, I took a 68k assembly course ... I've never done
*any* assembly since then, but it makes you appreciate the
issues in optimization, and just how low-level you need to get
when talking to processors.

With C, pointers and pointer arithmetic are a bit of a pain,
and strongly-typed languages aren't the greatest for all
tasks ... and don't get me started on C-strings ... but you'll
learn a lot ... even just where to look for people screwing
up their assumptions  creating security problems because
of off-by-one issues or screwing up the length of their strings
or neglecting their garbage collection.

... and, understanding C will also help you when it comes
time to install stuff, especially if you're trying to port someone's
linux-centric code to Solaris or MacOS.

As for the stuff that translates:

searching for the missing semi-colon
error messages that make no sense
finding the 'smart quote' that your lab partner pasted 
in because they do their editing in MS Word.

um ... I'm not selling this very well, am I?

Anyway ... C is a useful language ... almost all higher languages
have some way of binding to C code, and if nothing else,
learning it means you'll be able to port over someone's 
1k line C program into 20 to 40 lines of whatever other modern
language you prefer.

-Joe

Re: [CODE4LIB] REST interface types

2011-07-20 Thread Joe Hourcle

On Jul 19, 2011, at 11:33 AM, Ralph LeVan wrote:

 Where at all possible, I want a true REST interface.  I recognize that
 sometimes you need to use POST to send data, but I've found it very helpful
 to be able to craft URLs that can be shared that contain a complete request.

But there's more than just the URL that's used in REST.  As it uses
HTTP, you could vary the response based on the Accept-Language or
Accept headers.

Some implementations use file extensions in place of Accept, but
then you're assigning URIs to the container and not the contents.
Am I trying to identify the data, or the data formatted as XML?

Language is a bit messier, as it's the content, but when we're
looking up something like in Dewey ... are we trying to identify the
DDC 600s, or specifically the German labels for the 600s?  Dewey.info
packs it in the URL:

http://dewey.info/class/6/2009/03/about.de

But am I supposed to know that the english and french don't
share the same root as the german?

http://dewey.info/class/6/2009/08/about.en
http://dewey.info/class/6/2009/08/about.fr


Some groups will then put this in either as part of the QUERY_STRING
or pass it in via PATH_INFO. 

-Joe

Re: [CODE4LIB] TIFF Metadata to XML?

2011-07-19 Thread Joe Hourcle

On Jul 19, 2011, at 10:34 AM, Stern, Randall wrote:

 Also, see FITS (http://code.google.com/p/fits/)
 
 FITS is an open source java toolset we wrote that wraps JHOVE, ExifTool,
 and several other format analysis tools and produces a single XML output
 stream. It also includes a crosswalk to MIX XML as an optional output.


Really?  You named a tool that deals with image data 'FITS' ?

You do realize there's actually a 30+ year old image standard called FITS:

http://fits.gsfc.nasa.gov/

(which has its own metadata standard, just to make things even more 
interesting)

-Joe

Re: [CODE4LIB] TIFF Metadata to XML?

2011-07-18 Thread Joe Hourcle

On Jul 18, 2011, at 9:18 AM, Edward M. Corrado wrote:

 Hello All,
 
 Before I re-invent the wheel or try many different programs, does
 anyone have a suggestion on a good way to extract embedded Metadata
 added by cameras and (more importantly) photo-editing programs such as
 Photoshop from TIFF files and save it as as XML? I have  60k photos
 that have metadata including keywords, descriptions, creator, and
 other fields embedded in them and I need to extract the metadata so I
 can load them into our digital archive.
 
 Right now, after looking at a few tools and having done a number of
 Google searches and haven't found anything that seems to do what I
 want. As of now I am leaning towards extracting the metadata using
 exiv2 and creating a script (shell, perl, whatever) to put the fields
 I need into a pseudo-Dublin Core XML format. I say pseudo because I
 have a few fields that are not Dublin Core. I am assuming there is a
 better way. (Although part of me thinks it might be easier to do that
 then exporting to XML and using XSLT to transform the file since I
 might need to do a lot of cleanup of the data regardless.)
 
 Anyway, before I go any further, does anyone have any
 thoughts/ideas/suggestions?

I haven't (yet) used it myself, but Exiv2 ( http://www.exiv2.org )
supports reading and writing XMP, EXIF and IPTC metadata from
a large number of file formats.

-Joe

Re: [CODE4LIB] Trends with virtualization

2011-07-11 Thread Joe Hourcle

On Jul 11, 2011, at 11:21 AM, Madrigal, Juan A wrote:

 Its true what they say, history does repeat itself! I don't see how
 virtualization is much different from
 a dummy terminal connected to a mainframe. I'd hate to see an entire
 computer lab go down should the network fail.
 
 The only real promise is for making web development and server management
 easier.

re: web development

I assume by that you're talking about cases like Citrix, where they
force you to come in from the same OS  web browser version, so
they don't have to worry about Firefox rendering differently from
Safari, or the IE6 vs. 7, etc.

It's okay for an intranet, but I don't know that it's a good idea for
general web usage, as they normally force people to use some
outdated browser, as the web applications always seem to be
designed for IE6, and never tested on anything else.

(if they were, they then try to serve down alternative versions
using browser detection, which in my experience is more likely
to make things worse)


...

The only reason I've heard to virtualize desktops wasn't for
monetary considerations, and wasn't for general word processing
and such ...

it was for workstations for scientific processing.  By using virtualized
servers, you can more easily take snapshots of the machine's state
to archive it, and later restore it to re-run the software.  This gives you
two advantages:

(1) reduced down-time for patching / upgrading software -- you
patch the image, then push the image into the processing
pipeline.

(2) Because you've archived the OS, libraries and all software,
you have something you can analyze should someone
identify problems with the data processing such as
discontinuities after an update.

I could see the first one being useful for most groups, but with
tools like puppet and chef, it might not be a big deal. 

I can't remember what the software was that the university I
formerly worked for used in their computer labs -- it basically reset
the machine on each login, in hopes to prevent someone from
installing malware (intentionally or accidentally) that would then
affect later users.   And then once a week each lab was closed
down so they could do a complete re-format and re-image of
each machine ... you might be able to do something similar
with virtual desktops.

-Joe

Re: [CODE4LIB] exposing website visitor IP addresses to webcrawlers

2011-05-23 Thread Joe Hourcle

 On May 20, 2011, at 10:35 AM, Keith Jenkins wrote:
 
 Just out of curiosity, does anyone on this list have any opinions
 about whether website owners should publicly post lists of their
 visitors' IP addresses (or hostnames) and to also allow such lists to
 be indexable by search engines?
 
 For example:
  https://www3.ietf.org/usagedata/site_201104.html
 
 Keith


Somehow I missed this when it went by originally ...

For websites being hosted by the federal government, although it's
not considered PII (Personally Identifiable Information), most privacy
policies state that we won't share information with third parties, and that
we only use server logs for diagnostics and tuning.

We're actually required to destroy our webserver logs within 30 days
of rolling them, or at the very least, anonymize them.  We specifically
do *not* allow access logs or reports to be accessed from outside our
local network.  If nothing else, posting logs and/or reports invites
'referrer spam' :

http://en.wikipedia.org/wiki/Referrer_spam

And even if you're not posting referrer information, they'll embed
it in the QUERY_STRING to connections to your site, so you'll have
requests for:

http://yoursite.example.edu/?http://spammer.example.com

Which show up in most logs as:

/?http://spammer.example.com

...

I'd say there is *no* reason to make any of your logs, raw or processed,
visible to search engines.  If your administration insists on being
able to see reports remotely, put them behind some sort of
authentication.  (although, in our case, authentication means more
paperwork we have to fill out)

-Joe

Re: [CODE4LIB] Jpeg2000 and XMP metadata

2011-03-23 Thread Joe Hourcle

On Mar 23, 2011, at 9:45 AM, Richard, Joel M wrote:

 Morning, all! 
 
 I thought I'd crowdsource this question. 8+ hours of beating up on this and I 
 haven't found a good solution.
 
 We have some software that processes the scanned pages of a book. They come 
 to me as TIFF and I am converting to JP2 in order to upload to the Internet 
 Archive. The trouble is that I can't find a reliable piece of code or a 
 process to add XMP metadata to the JP2. (FWIW, we're using the Jasper library)
 
 - ImageMagick (PHP+Imagick) doesn't seem to support XMP in JP2 (or adding 
 profiles to JP2 at all)
 
 - GraphicsMagick crashes with malloc errors on images that are too big, and I 
 am unwilling to recompile to 64-bit and simply hope for the best. Our images 
 are large, though, and something is dying between GM and Jasper.
 
 - exiftool doesn't seem to be working either.
 
 I'm working in PHP, so that would be a preferred language. If necessary I can 
 always drop back to the command line to run a script or whatever. 
 
 Is anyone else doing this type of thing? Any help or advice would be most 
 welcome.

I've never used it, but exiv2 claims to support JP2  XMP writing:

http://www.exiv2.org/

(not PHP directly, but could be called via the shell)

-Joe

Re: [CODE4LIB] Need Apache log file analyzer for Mac OSX

2011-03-17 Thread Joe Hourcle

On Mar 17, 2011, at 1:16 PM, Tim McGeary wrote:

 Does anyone know of a good (and free) Apache log file analyzer for Mac OSX?  
 I have sets of Apache web logs that I need to analyze off server.

I've been using analog for years:

http://www.analog.cx/

The config syntax takes a little getting used to, but it generates HTML reports 
for just about anything.

I also know people who are fans of webalizer, but I don't like how it only
gives a month's report at a time:

http://www.mrunix.net/webalizer/

-Joe

Re: [CODE4LIB] online course on the semantic web?

2011-03-05 Thread Joe Hourcle

On Mar 5, 2011, at 3:01 PM, Cindy Harper wrote:

 Well, I just walked my 80-year-old mother through setting up her wireless
 router and wireless on her desktop and laptop via telephone NY-to-VA, and
 now I feel like I can think about another challenge for the coming
 season(s). Does anyone know of a good online course that's an introduction
 to semantic web technology that they could recommend? My goals are simply to
 understand more and be able to code a little, and afterward applying it to
 linked data?  I know of one course this summer at Johns Hopkins Engineering
 for Professionals program
 http://ep.jhu.edu/course-homepages/viewpage.php?homepage_id=2993, but it's
 rather pricey. Anyone know of cheaper options or creative ideas for funding?


I don't know how introductory it'd be, but ASIST has been doing a lot of
'webinars' this year, and there are ones coming up on the 9th and 13th on
linked data, and the first one sounds like it'll cover some semantic web
issues::

http://asis.org/Conferences/webinars/2011/linked-data.html

(I can't compare prices to the JHU one, as I didn't see any pricing on the
JHU site; this round of ASIST webinars are $25 for members, $59 for
non-members; some in the past have been free for ASIST members)

Also, looking at MIT's Open Courseware catalog, I see a few individual
lessons that might be applicable:

http://ocw.mit.edu/index.htm

In the past, I've looked at some of the courses from W3schools (not affiliated
with W3C, but has some tutorials on various things related to the web).  They
tend to be fairly introductory, but they have two that might be of interest:

http://www.w3schools.com/rdf/default.asp
http://www.w3schools.com/semweb/default.asp

-Joe

-
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

Re: [CODE4LIB] online course on the semantic web?

2011-03-05 Thread Joe Hourcle

On Mar 5, 2011, at 3:40 PM, Cindy Harper wrote:

 Now that I think about it, this may be an opportunity to apply another idea
 that I was exploring in another context:  I had written to syslib-l looking
 for anyone interested in collaborating on a staff technology training wiki
 that would link staff to free and authoritative web-based resources on a
 range of technology training subjects.  Would anyone be interested in
 applying that idea to code4lib technology learning?  How much effort would
 be required for someone who's well acquainted with the Semantic Web to
 contribute to a site that lists texts or curriculum for those who are
 interested in learning? I don't know if this is doable. Anyone interested?
 Or should I just find myself a text and wade through it?

I want to say that I remember someone presenting on some sort of
modular courses to either be used as part of a library, museum or 
comp sci curriculum to deal with digital archives.  I want to say it was
IMLS funded.

Basically, it was so that faculty could pick  choose different courses
to use as a basic course on the topic.

I think I found the correct panel, but I'm not sure who it was who
presented on that particular topic.  (I was sick  kept myself drugged
up on dayquil for that whole meeting)

http://www.ils.unc.edu/digccurr/asist2009_panel_paper.pdf

Um ... I think this is the project, Digital Library Curriculum Project!
(NSF funded, not IMLS, though):

http://curric.dlib.vt.edu/

Unfortunately, it doesn't look like they (yet) have anything on the
Semantic Web, but I think there's a lot of overlap with what you're
proposing.

-Joe

-
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

Re: [CODE4LIB] Apache URL redirect

2011-02-03 Thread Joe Hourcle

On Feb 3, 2011, at 4:42 PM, Nate Hill wrote:

 Hi - I'm new to Apache and hope that someone out there might be able to help
 me with a configuration issue over at San Jose Public Library.
 
 I need to have the URL www.partnersinreading.org redirect to
 http://www.sjpl.org/par
 Right now if you go to www.partnersinreading.org it takes you to the root,
 sjpl.org and then if you navigate through the site all the urls are
 rewritten with partnersinreading as the root.
 That's no good.
 
 I went into Apache's httpd.conf file and added in the mod_alias area:
 Redirect permanent http://www.partnersinreading.org/ http://www.sjpl.org/par

But the argument to match for redirecting is the local path, not the URL, so 
you'll
have to either do some environmental matching, or put it in a virtual host block

I'm used to mod_rewrite, so I'd probably do something like:

RewriteCond %{HTTP_HOST} ^www\.partnersinreading\.org$
RewriteRule ^/(.*)  http://www.sjpl.org/par/$1 [L,R=301]

(that assumes that you've replicated the directory structure on the new site)

-
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

Re: [CODE4LIB] Apache URL redirect

2011-02-03 Thread Joe Hourcle

On Feb 3, 2011, at 5:21 PM, Nate Hill wrote:

 Thank you for your responses...
 Virtual host setup was also on the agenda, guess both things have to happen
 at the same time.

You don't have to set up virtual hosts with the method that both
Brian and I mentioned, although the syntax is a little more 
confusing for people who might not be used to mod_rewrite.

You only need virtual hosts if you want your method (Redirect) to work:

NameVirtualHost *

VirtualHost *
ServerName www.partnersinreading.org
DocumentRoot /set/to/point/somewhere/

Redirect permanent / http://www.sjpl.org/par
/VirtualHost

-Joe

[CODE4LIB] Job Opportunity for Web Developer Focusing on Linked Data (fwd)

2010-12-23 Thread Joe Hourcle


I'm just passing this along ... I know nothing about the actual job.
The bad formating of the message is probably my fault -- I prefer 
plain text email, and that can sometimes do interesting things to 
messages.  (random unknown characters, etc.)


If you have questions, I'd suggest contacting Gail Hodge (address below)

-Joe


-- Forwarded message --
Date: Wed, 22 Dec 2010 22:01:39 -0500
From: Gail Hodge gho...@iiaweb.com
To: onei...@grace.nascom.nasa.gov
Subject: Job Opportunity for Web Developer Focusing on Linked Data

Dear Joe,

We're looking for a Web Developer to focus on Linked Data and related
applications. Would you or anyone you know be interested? Could you pass
this around to some relevant lists? Rob Raskin has already sent it to the
ESIP SW list and I'll send it to SIG STI.

Thanks,

Gail




*Web Developer ? Linked Data *





Information International Associates, Inc. (IIa), an award-winning
information and knowledge management company, is seeking a Web Developer.
This position involves the ongoing website and application development,
maintenance, database development, and implementation for various website
applications for a variety of US Federal government agencies. The
applications will specifically focus on new and emerging technologies such
as linked open data, semantic technologies, ontologies, RDF and RDFa as
applied to both text and data, and Web 2.0 and social media applications
such as RSS and Twitter. Depending on the location of the successful
candidate, the home office may be located in Hyattsville, MD or Falls
Church, VA.



Responsibilities:



?  Designing/developing websites based on needs analysis and scope of
work.

?  Designing/developing the necessary back-end database and necessary
SQL calls and   web services for the applications.

?  Deploying applications.

? Designing, creating and deploying linked open data applications,
including mash-ups.

? Creating and managing triple stores based on existing relational
databases.

? Designing, developing and deploying Web 2.0 and social media
applications.

?  Continued maintenance, development, and troubleshooting for
applications.

 ?  Documenting the website code and applications.



Requirements:



?  Bachelor of Science Degree in Computer Science or other related
field(s).

?  Extensive knowledge of HTML, JavaScript, DHTML, PHP.

? Knowledge of linked open data and various semantic web
technologies and standards, including RDF, RDFa, etc. Knowledge of URIs.

?  Extensive knowledge of MySQL, SQL Server, ODBC.

? Knowledge of web services (SOAP and REST).

?  Dreamweaver/similar development environments.

 ?  Excellent verbal and written communications skills.



Desired (not required) experience:



?  Tomcat, Java, C#, .NET

?  Oracle, Excel and MS Access



  To apply online please access IIa Careers at:
https://www7.ultirecruit.com/INF1002/JobBoard/JobDetails.aspx?__ID=*3BAC1347B2106567

Re: [CODE4LIB] LDAP Issues

2010-10-06 Thread Joe Hourcle


On Wed, 6 Oct 2010, Amy wrote:


We are having a problem with a single student whose account was deleted from
LDAP by Technology, and then had her account re-established.   She has the
same username and status as she used to have.

She is now unable to login to any of the library resources that use LDAP to
authenticate patrons.  This includes our catalog  e-resources (through III)
and a Ruby on Rails group study room web application that uses LDAP
authentication.

Has anyone had any experiences like this before or any thoughts/speculation
on how to fix?


.. this is why it's a good idea to lock accounts for a period before 
they're deleted fully.


But anyway ...

LDAP's used for authentication, but what's used for authorization?
(ie, we use a login  password to confirm they're who they say they are, 
but what says that person's allowed to use the system?)


Sometimes it's stored in a field withing LDAP, sometimes it's stored in a 
separate system with a foreign key into LDAP.  (which *might* be the 
login / uid / cn (common name) / dn (distinguished name), etc.)


I've seen a few systems that use an assigned ID as the user component of 
the DN, rather than the UID / login, so should the user ever need to 
change the name of the account (eg, they get a name change, and want to 
change their login), they don't have to re-authorize them in all of the 
systems.  (of course, this means that a delete  recreate, even with the 
same name has issues).


If I were trying to debug it, I'd try to get an ldif dump of their entry, 
and compare that to someone created through 'normal' means, and see if 
there's anything that looks strange (missing fields, random serial 
numbers, something incremented (eg. John-Smith-2).


-Joe

Re: [CODE4LIB] Workflow analysis of archival arrangement and description

2010-08-31 Thread Joe Hourcle


On Fri, 27 Aug 2010, Mark A. Matienzo wrote:


I'm currently looking for any workflow and business process analysis
of the processes involved in processing archival collections. At this
point, I'm hoping to find fairly high-level information, ideally in
the form of or easily translatable into workflow diagrams to serve as
a strawman. Processing manuals may help, but likely are too detailed
for my current purposes. Please let me know if you have anything that
might help.


Have you already looked at OAIS?  (Open Archival Information System)

It's a reference model, so goes over the sorts of things that digital 
archives should do as they're ingesting / storing / disseminating things, 
but isn't a specific implementation.


Current version (2002):

http://public.ccsds.org/publications/archive/650x0b1.pdf

There's also drafts as they're working towards making it an ISO standard. 
I'm not sure if this is the most recent version or not, but it matches the 
last draft ID (p-1-1) up for review on the CCSDS site:


http://ddp.nist.gov/refs/650x0p11_OAIS_pink_book.pdf


-Joe

Re: [CODE4LIB] Cookout in McLean on Saturday? (was Re: [CODE4LIB] Get together in DC during ALA?)

2010-06-24 Thread Joe Hourcle


On Thu, 24 Jun 2010, Simon Spero wrote:


Haven't seen any concrete plans.


I'm fine with the same plan as last year -- meet in front of RFD on 
Monday, but from your comments you'll be gone by then.



[trimmed]


If folks can make it out to McLean,VA on Saturday the 26th (Falls Church
East/West are closest Metro), we have propane, with a chance of dead things,
animal or vegetable. Minerals are available from the Department of the
Interior through the usual procedures.

Any interest?


Interest, yes, but also a conflict due to family stuff so won't be able to 
make it.  (and as I live in PG, and have to be in Howard county, it's 
basically on the other side of the world)



-Joe

Re: [CODE4LIB] Cookout in McLean on Saturday? (was Re: [CODE4LIB] Get together in DC during ALA?)

2010-06-24 Thread Joe Hourcle


On Thu, 24 Jun 2010, Schwartz, Raymond wrote:


What is RFD?


A restaurant in DC.  Here's what was sent out the last time:


 -Original Message-
 From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
 Sent: Monday, June 18, 2007 11:33 AM
 To: CODE4LIB@listserv.nd.edu
 Subject: [CODE4LIB] Informal get together Monday of ALA

 Some of us have spontaneously decided to have an informal Code4Lib get
 together the Monday of ALA in DC.

 We will meet on Monday the 25th of June at 8pm, at RFD, which
 reccomended by anarchivist, which appears to be a pub and 
 Washington's Largest Multi-Tap.  It's located just a couple blocks 
 from the convention center.


 http://www.lovethebeer.com/rfd.html

 Some of the Talis crew have said they will be there. I will be there.
 Anarchivist and edsu have said they'll be there. (I forget if I just 
 made up edsu).


 Please join us! Any and everyone interested in meeting code4lib folks 
 or other assorted library technologists and library geeks and hangers 
 on are welcome.


 No, I wasn't planning on making a reservation or anything. No, I have 
 no idea how we'll all find each other. I think it'll work out.


 Jonathan



I assumed we'd go with the same ast last time -- Monday, 8pm, just show 
up and we'll figure something out.


(it worked out okay last time).

-Joe

Oh ... and for those from Tucson, it might be the largest multi-tap in DC, 
but it's only like 1/2 of what 1702 has)

Re: [CODE4LIB] Cookout in McLean on Saturday? (was Re: [CODE4LIB] Get together in DC during ALA?)

2010-06-24 Thread Joe Hourcle


On Thu, 24 Jun 2010, KREYCHE, MICHAEL wrote:


Since no one else has asked, does Monday the 25th mean Monday (the 28th) or 
Friday (the 25th).


Monday, the 25th, 2007.

('the last time' being the last time this was done, in 2007).

So, the proposal is:

Monday, June 25th, 2010
Meet in front of RFD at 8pm
801 7th St, NW., Washington, DC

http://www.lovethebeer.com/rfd.html

If it's 6 people and intimate, or closer to the 2 dozen we had last time, 
we'll make it work.


-Joe



-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On
Behalf Of Joe Hourcle
Sent: Thursday, June 24, 2010 3:32 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Cookout in McLean on Saturday? (was
Re: [CODE4LIB] Get together in DC during ALA?)

On Thu, 24 Jun 2010, Schwartz, Raymond wrote:


What is RFD?


A restaurant in DC.  Here's what was sent out the last time:


-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Monday, June 18, 2007 11:33 AM
To: CODE4LIB@listserv.nd.edu
Subject: [CODE4LIB] Informal get together Monday of ALA

Some of us have spontaneously decided to have an informal

Code4Lib get

together the Monday of ALA in DC.

We will meet on Monday the 25th of June at 8pm, at RFD, which
reccomended by anarchivist, which appears to be a pub and
Washington's Largest Multi-Tap.  It's located just a

couple blocks

from the convention center.

http://www.lovethebeer.com/rfd.html

Some of the Talis crew have said they will be there. I

will be there.

Anarchivist and edsu have said they'll be there. (I

forget if I just

made up edsu).

Please join us! Any and everyone interested in meeting

code4lib folks

or other assorted library technologists and library geeks

and hangers

on are welcome.

No, I wasn't planning on making a reservation or

anything. No, I have

no idea how we'll all find each other. I think it'll work out.

Jonathan



I assumed we'd go with the same ast last time -- Monday, 8pm,
just show
up and we'll figure something out.

(it worked out okay last time).

-Joe

Oh ... and for those from Tucson, it might be the largest
multi-tap in DC,
but it's only like 1/2 of what 1702 has)

Re: [CODE4LIB] SMS headers in email-sms

2010-06-09 Thread Joe Hourcle


On Wed, 9 Jun 2010, Ken Irwin wrote:

We originally tried changing the From and Reply-To mail headers, but the 
phones we tested on didn't honor the email headers. Instead they show an 
address @www6.wittenberg.edu (ie, our web server). That's why I was 
thinking there would be some sort of SMS-equivalent-header that it cared 
about more.


Are you changing the line 'From:' (the From header) or 'From ' (the 
envelope from, which is part of the SMTP protocol's routing, and *not* 
part of the e-mail message)


I don't know if this will help or not, but it sounds like the -f flag is 
the way to go:



http://stackoverflow.com/questions/179014/how-to-change-envelope-from-address-using-php-mail

-Joe





-Original Message-
From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Thomas 
Bennett
Sent: Wednesday, June 09, 2010 9:36 AM


I don't know if this will be any help but you would need to replace the reply-
to header I expect.

Thomas

Re: [CODE4LIB] drupal question

2010-06-04 Thread Joe Hourcle




On Fri, 4 Jun 2010, Nate Vack wrote:


On Fri, Jun 4, 2010 at 2:02 PM, Jill Ellern ell...@email.wcu.edu wrote:


I know we can put this open source software on a PC...and we've done that but 
this isn't a solution for a production level web service

What is the average cost of hosting a drupal server out there in the cloud?  
Are there things we should know?  Would you recommend anyone that does this for 
libraries?


It all depends on what production level web service means to you --
do you get lots of traffic? A little? Do you want to call someone on
the phone when it goes pear-shaped? Even if the problem is with a
customization you're making? How much downtime is OK? How snappy does
it need to be?

It sounds a bit like your IT department is trying to give you a
brush-off (This sounds like a pain. Let's make them use a dedicated
server, and say it'll cost INFINITY DOLLARS.) Sitting down with
someone, being very clear with your expectations for support, and
finding out what their major concerns are might help.


As someone who's worked as a sysadmin for an ISP, a university IT 
department and a government agency that is the target of a lot of 
intrusion attempts, let me tell you that it *is* a pain.


For the first one.

The incremental cost is insignificant, but each new piece of software that 
you have to support is yet another round of finding out how the server 
needs to be tuned ; if it plays well with other software you're running ; 
more websites to watch for security updates ; more patches to apply ; more 
log files to watch to suspicious activity.


Once you're hosting 10+ of the same piece of software, the incremental 
cost is relatively insignificant -- but that first instance sure as hell 
is not cheap in terms of man-hours (if you don't want your machine getting 
hacked, then get your domain black listed when someone starts pumping mail 
through it, etc, etc.)




Generally, hosting will run something like $5-10/month for cheap
shared hosting, and maybe $30-40/month for a small VPS.


Yes, for a site that already has lots of Drupal instances they're already 
maintaining.


-Joe

[CODE4LIB] SRU 2.0 / Accept-Ranges (was: Inlining HTTP Headers in URLs )

2010-06-02 Thread Joe Hourcle


On Wed, 2 Jun 2010, Jonathan Rochkind wrote:


Joe Hourcle wrote:


  Accept-Ranges is a response header, not something that the client's 
supposed to be sending.


Weird. Then can anyone explain why it's included as a request parameter in 
the SRU 2.0 draft?   Section 4.9.2.


They're not the only ones who think it's a client header:

http://en.wikipedia.org/wiki/List_of_HTTP_headers

(which of course shows up #1 on google for 'http headers')

It looks like someone decided to split it into two tables:


http://en.wikipedia.org/w/index.php?title=List_of_HTTP_headersoldid=183353617

And within a week, someone decided to add Accept-Ranges where it didn't 
belong:



http://en.wikipedia.org/w/index.php?title=List_of_HTTP_headersoldid=184742665

...

I'm guessing it's a mistake -- either the SRU authors looked at the 
Wikipedia entry, or they also misread the intent of the HTTP header in the 
RFC.


Do we have anyone affiliated with the project on this list who can make a 
correction before it leaves draft?


-Joe

Re: [CODE4LIB] Inlining HTTP Headers in URLs

2010-06-01 Thread Joe Hourcle


On Tue, 1 Jun 2010, Jonathan Rochkind wrote:

Accept-Ranges, I have no idea, I don't understand that header's purpose 
well enough. But SRU also provides a query param for that, it seems less 
clear to me if that's ever useful or justifiable.


Accept-Ranges is a response header, not something that the client's 
supposed to be sending.


The client sends a 'Range' header (with an optional 'If-Range' if you're 
concerned with the resource having changed), and in response, the server 
sends a 206 status with a 'Content-Range' header.


See
http://labs.apache.org/webarch/http/draft-fielding-http/p5-range.html

...

I only know of two values for 'Accept-Ranges' -- none (ie, I don't accept 
partial downloads) and bytes, so for incomplete downloads you can start 
where you left off.  If you know the file's excessively large, I guess you 
could use it to transfer it in parallel to abuse the TCP congestion rules. 
(or if you have a way of knowing that there are multiple mirrors, to 
spread the load across servers).


-Joe

Re: [CODE4LIB] It's cool to love milk and cookies

2010-05-03 Thread Joe Hourcle

You know, there are some of us who are milk intolerant on this mailing 
list.


And emacs intolerant, too.  (although, I did use 'ee' as my editor in elm, 
but elm took too long to support MIME, so I switched to pine, with their 
pico default editor, but I don't use any of those I mentioned for coding, 
even though I am in pico/pine right now, as I still haven't switched to 
alpine or mutt)


-Joe

Re: [CODE4LIB] code4lib server downtime needed

2010-04-28 Thread Joe Hourcle


On Wed, 28 Apr 2010, Ryan Ordway wrote:


I need to move the server that hosts the code4lib.org website into another rack 
to make room for some other equipment, when is a good time to do this?


You power down machines when moving them?

Oh, sure, do it the easy way.

(After waiting 2 months for the university I was working for to approve a 
maintenance window, as we had a machine lift and the machine had two power 
taps, I ran an extension cord and a long ethernet cable, swapped them in, 
pulled the machine as far out of the rack as it'd extend on its rails, 
brought the lift up from under it, ejected it from the rails, rolled it 
out the way, moved the rails to the new rack, rolled the lift over to the 
new place, cranked the lift to the new height, re-ingaged the rails, and 
then swapped back to the new rack's power and patch panel.  Only ~ 2 min 
of downtime, and that was because the switch took 60 sec to test to make 
sure there wasn't a loop when you changed connections.)



-Joe

Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Joe Hourcle


On Mon, 12 Apr 2010, Jonathan Rochkind wrote:

So, as usual, the right tool for the job. If all you really need is a 
key-value store on ID, then a NoSQL solution may be the right thing.  But 
if you need actual querrying and joining, then personally I'd stick with 
rdbms unless I had some concrete reason to think a more complicated 
nosql+solr solution was required.  Certainly if you are planning on using 
Solr _anyway_ because your application is a search engine of some type, that 
would lessen the incremental 'cost' of a nosql+solr solution.


I'm surprised that I keep hearing so much about NoSQL for key-value 
stores, and everyone seems to forget the *old* key-value stores, such as 
directory services (X.500 and LDAP, although that's actually the protocol 
used to query them, not the storage implementation).


Yes, there are things that LDAP doesn't do so well (relationships being 
one of them), but it supports querying, you can adjust the matching by 
attribute (ie, this one's matched as a number, this one's matched as a 
string, this one's a case insensitive string ... I think some 
implementations have functionality to run the search term through a 
functions for things like soundex, so it might be possible add hooks for 
stemming and query expansion, etc.)



I think that NoSQL got a lot of press because of Google having used it 
(and their having a *VERY* large data system -- but not everyone has that 
large of a system; also, Google did it 10+ years ago -- you can now 
through a lot more CPU and RAM at an RDBMS, so the point at which the 
database becomes a problem isn't the same as it was when Google first came 
out.)


...

So, I think that there are cases where NoSQL is the right solution for the 
job, and I think there are times when an DRBMS is the right solution ... 
there are also plenty of times for flat file databases, XML, LDAP, and a 
slew of other storage standards.


-Joe


hmm ... now I'm going to have to try to bring back my attempt to put my 
catalogs into a directory service ... I have a feeling I'm going to run 
into issues with unit conversions when searching.

Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?

2010-04-12 Thread Joe Hourcle


On Mon, 12 Apr 2010, Ryan Eby wrote:

[trimmed]


But I'm
guessing they've thought about the data and what benefits they would
get out of the backend.



Wow.  You obviously don't work with the same folks that I do.

I've been attached to one project for about 16 months now, while the rest 
of the team's been together for 4 years ... I've been trying to get a few 
changes made to better support my user community (basically, all of the 
people who don't have access to their system, or don't want to spend the 6 
months using the system 'to be able to do something almost useful'.


About 2-3 months ago, the main project team finally realized that they 
have *no*idea* what the user community wants or needs.


Oh, and they have to go live on April 21st.  I'm expecting a major 'wtf?' 
reaction from the majority of the community.


-Joe

Re: [CODE4LIB] Works API

2010-03-31 Thread Joe Hourcle


On Wed, 31 Mar 2010, stuart yeates wrote:


Jonathan Rochkind wrote:

Karen Coyle wrote:
The OL only has full text links, but the link goes to a page at the 
Internet Archive that lists all of the available formats. I would  prefer 
that the link go directly to a display of the book, and offer  other 
formats from there (having to click twice really turns people  off, 
especially when they are browsing). So unfortunately, other than  full 
text there won't be more to say.


In an API, it would be _optimal_ if you'd reveal all these links, tagged 
with a controlled vocabulary of some kind letting us know what they are, so 
the client can decide for itself what to do with them (which may not even 
be immediately showing them to any user at all, but may be analyzing them 
for some other purpose). 


Even better, for those of us who have multiple formats of full text (TEI XML, 
HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs to the full 
text, differentiated using the mime-type.


Would different forms of processing have different mime-types?  (ie, we 
can tell it's a PDF, but can we tell what's actually in it?)


Personally, for the different packaging formats, if you're going to be 
selecting using mime-type, I'd be inclined to hide it all behind a single 
URL -- the user agent could set the appropriate Accept header, so long as 
it's being served by HTTP.


...

I admit, it's possible that this works better for APIs than user browsing; 
they might prefer a PDF for digital library objects, but prefer HTML for 
other purposes.  We were hoping to allow users to set cookies to set their 
preferences on processing  packaging for our system, but I'm still 
waiting for a response to the paperwork that I filed to be allowed to use 
them.


(little known fact -- OMB M-00-13 outlaws cookies on all government 
websites; OMB M-03-22 spells out some of the procedures for being allowed 
around it, but I've given up trying to let them know, when they're set up 
so bad you can't even report themm [3])


-Joe


[OMB M-00-13] http://www.whitehouse.gov/omb/memoranda_m00-13/
[OMB M-03-22] http://www.whitehouse.gov/omb/memoranda_m03-22/
[3] http://politics.slashdot.org/comments.pl?sid=1021887cid=25678129

Re: [CODE4LIB] PHP bashing (was: newbie)

2010-03-26 Thread Joe Hourcle


On Fri, 26 Mar 2010, Doran, Michael D wrote:


As a first language, you want something that let's you Get Stuff Done
with a minimum of fuss...


If you are getting started and if you are not planning on being a 
full-time programmer, then you want to be looking at the high-level 
languages as Mike suggests: the strong candidates include Perl, Python, 
arguably PHP and my own favourite, Ruby...


Even *if* you are looking to be a full-time programmer, I'd recommend most 
people do stuff in higher-level languages.


That earlier development effort that I mentioned, the majority of their 
work is being done in C -- the system needs to go live in ~30 days, and 
they're *still* finding memory leaks (signs of poor garbage collection), 
string and integer overflows (one of the joys of strict typing), etc.


Yes, I've done a fair bit of C, and even a little assembler -- and it's 
fine, if you really, really, need the speed boost.  (and some people would 
argue that this might be a case where they *do* need it, but it'd have 
been more cost-effective to throw hardware at it, rather than a 10 person 
team for 2-3 years, even with their 100-node cluster; or better yet, wait 
to see what happens under real load, and optimize then, rather than 
building a system with no requirements, and no testing of simulated data 
flows until 2 months before launch)


I've had my share of problems in Perl, where it'd attempt to assume what I 
mean has led to problems.  (specifically, SOAP::Lite's attempt at guessing 
that a string full of numbers was an integer, not a string, and that a URL 
should be marked as such, and not a string ... once in a while, I'll hit 
one of the edge cases with braces where you have to force it as a block or 
a hash)


... but those are few and far between compared to the segfaults that I've 
gotten in trying to port their code over to a non-linux system.  (spent 
months on it, as each new version would either not fix the problem, or 
break new things ... it doesn't help they decided to write their own 
configuration and build tools because someone must've read 'recursive make 
considered harmful') ... we finally gave up and just bought new hardware 
for our caching sites, as we had a feeling that we'd have to keep going 
through these headaches every new update through the life of the mission.


... sorry, went off on a tangent again.

Anyway, the point is -- even us full-time programmers would rather be 
making new and interesting things, rather than trying to work around 
problems with our tools.  If I'm painting a room, the roller gets the job 
done fast, and I can deal with a brush for the corners and edges -- 
there's no reason to do the whole thing with a brush, and it'd just look 
like crap if I tried doing the whole thing with a roller.


Take the same approach in programming -- if you can do 90% of the work 
really fast and really well in one language, and have to do the other 10% 
in another language, it doesn't mean you need to do the *whole*thing* the 
slow way.


I believe that all of the 'higher level' languages support some form of 
linking to C code, should you need it.  (although, you don't always need 
it ... after dealing with scientists insisting that I use their libraries, 
and trying to get it compiled as an object so I could call it as a 
postgres function, I finally just gave up and hard-coded the table in 
PGPLSQL ... I'll just have to update it every few years as leap-seconds 
are added to UTC)


...crap, tangent again.

okay, back to the hell of debugging crappy code with off-by-one errors and 
race conditions due to lack of locking, and no error checking to see if 
processes completed.


-Joe

[CODE4LIB] PHP bashing (was: newbie)

2010-03-25 Thread Joe Hourcle


On Thu, 25 Mar 2010, Brian Stamper wrote:

On Wed, 24 Mar 2010 17:51:38 -0400, Mark Tomko mark.to...@simmons.edu 
wrote:


I wouldn't recommend PHP to learn as a programming language, if your goal 
is to have a general purpose programming language at your disposal.  PHP is 
a fine language for building dynamic web pages, but it won't help you to 
slice and dice a big text file or process a bunch of XML or do some other 
odd job that you don't want to do by hand.


To be precise, PHP can indeed do these kind of things, particularly in 
command line mode. I certainly don't recommend it, but if you're used to PHP 
for other reasons, and you already have it available to you, you can do 'odd 
jobs' with PHP. You can also use your teeth to open a tight bottle cap, the 
edge of a knife as a screwdriver, and duct tape to perform auto repairs.


You say that as if duct tape is a bad thing for auto repairs.  Not all 
duct tape repairs are candidates for There, I fixed it![1].  It works 
just fine for the occassional hose repair.


-Joe

[1] http://thereifixedit.com/

Re: [CODE4LIB] newbie

2010-03-25 Thread Joe Hourcle


On Thu, 25 Mar 2010, Yitzchak Schaffer wrote:


On 3/24/2010 17:43, Joe Hourcle wrote:

I know there's a lot of stuff written in it, but *please* don't
recommend PHP to beginners.

Yes, you can get a lot of stuff done with it, but I've had way too many
incidents where newbie coders didn't check their inputs, and we've had
to clean up after them.


Another way of looking at this: part of learning a language is learning its 
vulnerabilities and how to deal with them.  And how to avoid security holes 
in web code in general.


Unfortunately, it's not all web code.  Part of the issue is in selecting 
the correct tool for the job.


Case in point --

I've been working for the last year to integrate a new data system into 
our federation.  The system officially hasn't gone live yet, so as the 
institution building the system had replaced their full time DBA with a 
contractor, the contractor decided he was going to replace all of the work 
that the DBA had already done to enable external sites to subscribe to 
collections within the system.


Unfortunately, he did the entire thing in shell, and he's passing around
SQL scripts, applying them to the database without any validation, and 
he's hard-coded assumptions about how directories are laid out and where 
the script has permissions to write.


Needless to say, when you get someone reading stuff from config files with 
*no* taint checking and *no* escaping or even quoting of arguments passed 
to other commands, I have to clean it up.  I even try passing my changes 
back upstream, but I'm told that the contractor has to make the changes 
(and he then picks and chooses which security changes he's going to make 
... then decides to wrap each 'rm' and dozen other commands in functions 
(so I can override what command's being called?), and I now have a shell 
script that's over 1000 lines.  (okay, that's not fair ... his version is 
only 968 lines, it only gets over 1000 when I try to add my corrections to 
it, and it's only 702 lines when you strip out comments and blank lines)


Now, much of it's just plain bad programming -- I mean, would you test to 
see if variables were set BEFORE loading the config file?   Would you run 
through a series of functions where each one required the other one to 
complete without actually testing to see if any of them actually worked?


(and well, one of those functions was the one that removed a tarball that 
took an hour to generate at the server, and the next one report back the 
'success' to the server, so I couldn't get the server to run it again 
without getting someone to correct things manually)


... I probably wouldn't be so hot on the topic, if it hadn't occupied the 
better part of the last month of my life, and all of this last week. 
(well, it seems that scp'ing a file for the subscription manager to 
service to process, and create a tarball response with the contents for 
your database doesn't work too well when the service isn't actually 
running ... but the way it's written you have *no* idea what the status of 
the server is).


...

sorry, I just needed to vent.

Anyway, part of what makes a good programmer is knowing the correct tools 
to use.  (and unfortunately, by definition, any newbie isn't going to have 
enough languages in their toolbox to be able to make a good selection). 
Yes, we always have to deal with determining the 'best' language based on 
what we know, who's going to maintain it, etc, so we sometimes have to go 
with sub-optimal choices.


But much of it's trying to identify what's going to go wrong with what we 
build, and trying to make sure that it doesn't break in spectacularly bad 
ways.[1]  I guess most people don't have the men with guns show up and 
take your servers for forensic analysis when some types of things go 
wrong, which makes me a little more paranoid in my error handling.


But if you put it out there on the internet, someone, sooner or later will 
attempt to abuse it.  It could be link spam on blogs, or usurping a guest 
book program to send spam, or even people claiming that compression 
artifacts in your data are UFOs[2], resulting in DDoS of your servers.
The bad ones are where they find a way to modify your database, add 
something to your filesystem, or give them a shell on your system.



-Joe

[1] http://xkcd.com/327/
[2] http://www.google.com/search?q=disclosure+nasa+sun+2010

Re: [CODE4LIB] newbie

2010-03-24 Thread Joe Hourcle


On Wed, 24 Mar 2010, Eric Lease Morgan wrote:


On Mar 24, 2010, at 3:24 PM, jenny wrote:


My question is, where would you recommend I would begin? What's hot
right now in the library world? Python, PERL, Ruby? Any advice you'd
have for a beginner like me or even recommendations for online courses
would be extremely appreciated



If you are approaching the problem for the point of view of learning a 
programming language, then then you have outlined pretty good choices. 
At the risk of starting a religious war, I like Perl, but PHP is more 
popular. Java is pretty good too, but IMHO it doesn't really matter. In 
the end you will need to use the best tool for the job.


I know there's a lot of stuff written in it, but *please* don't recommend 
PHP to beginners.


Yes, you can get a lot of stuff done with it, but I've had way too many 
incidents where newbie coders didn't check their inputs, and we've had to 
clean up after them.  Just yesterday, I was helping someone at another 
federal agency clean up after someone got in through a PHP script and 
had turned their site into an ad for cialis.  (but cleverly disguised, 
using their header / footer, and it only showed up when you passed the 
correct query_string to it)


The problem's gotten so bad here, that we've been asked to send our entire 
web directory on each server to our security office, so that they can run 
it through some security scanner that looks for problems in PHP code. 
(they relented to my running 'find' on the system for PHP scripts, as we 
serve a few dozen TB of data over HTTP)


We're also running intrusion detection software that managed to catch 
someone attempting to exploit refbase (and that was strike #2 against it 
... I've never gotten a response to my e-mails to the maintainer, so we've 
since had to scrap the installs of it that we had).


So, anyway ... don't do PHP.  Even Tim Bray recommended that at ASIST's 
2009 annual meeting, where he gave the plenary.  (He recommended people 
learn Ruby, instead)


Personally, I do most of my work in Perl, where I can, but I'd recommend 
Ruby or Python over someone learning PHP (unless it was to learn enough to 
migrate code off of PHP).



...

and yes, I know I've stirred this pot before:

http://www.mail-archive.com/code4lib@listserv.nd.edu/msg06630.html
http://www.mail-archive.com/code4lib@listserv.nd.edu/msg06648.html
...

And if you're using PHP, and can't get away from it, consider using 
something like mod_security to watch for signs of malicious behavior:


http://www.modsecurity.org/

(note -- not an endorsement, I don't use it myself, as they've got 
something installed on the upstream firewall that does it ... which means 
that someone else sees it happen, and then we have to clean it up, fill 
out paperwork that we've cleaned it up, have meetings about how we're 
going to clean it up (when we already did), etc.)



-Joe

Re: [CODE4LIB] Variations/FRBR project relases FRBR XML Schemas

2010-03-22 Thread Joe Hourcle


On Sun, 21 Mar 2010, Karen Coyle wrote:

One thing I am finding about FRBR (and want to think about more) is that one 
seems to come up with different conclusions depending on whether one works 
down from Work or works up from Item. The assumption that an aggregate in a 
bound volume is an Expression seems to make sense if you are working up from 
the Manifestation, but it makes less sense if you are working down from the 
Work. If decisions change based on the direction, then I think we have a real 
problem!


It's a *reference model*.  People are going to apply it differently, for 
what works in their situation.


It is pointless to assume that we will ever get everyone to agree on a 
single implementation -- it's either too complex and waste's people time 
for stuff they don't care about, or it's not complex enough and doesn't 
handle your special situations and strange edge cases.


Build the system that makese sense for your needs, and use FRBR as 
guidelines on issues to consider, basic requirements, etc.  It is not an 
API spec.  It is not an interchange format.


RDA, on the other hand, is more concrete -- it has specific cataloging 
instructions on how to deal with specific situations.  (and well, in the 
case of aggregates as new expressions without a resultant new work, as 
I've come to understand from this discussion, rules that might not comply 
with FRBR)  With the RDA toolkit, you even have a specific implementation.


...

Maybe my take on the situation is different because I don't deal with 
bibliographic objects.  Technically, by FRBR, I don't even deal with 
Items, as it's all digital.  (and I don't want to try to answer if little 
bits of magnetic film spread across my disk arrays make up an 'Item', as 
then I have to consider things being new Items when my disk array decides 
to move data around because a drive starts to fail)


... as such, there's no way in hell I'm going to be able to mesh my 
resultant catalogs with most other people's catalogs (and to do so, 
wouldn't make sense for the users).  I also have to try to mesh other 
catalogs with our federation, where we just don't have the funding to 
re-catalog every object, so I'm just trying to see how each catalog fits 
within a common model, so I know how to talk to each system and how the 
granularity of their results compares to the results from other systems.


I specifically have to plan for everyone coming up with their own systems; 
some are spectacularly bad.  (A new database table every year or month, so 
we don't hit limits within our database.  Multiple related tables, but not 
actually assigning foreign keys between them.  Over 10k tables, with each 
catalog table storing both current and deprecated data and no way to easy 
way to select just the deprecated data without going through an overly 
cumbersome abstraction interface (which merges in constants as stored in 
other yet other tables) ... and each of the catalog tables has no fixed 
specification.)


...

I'm with Jenn on this -- different groups can set set up their little 
idealized implementations of FRBR, as is being done with RDA, and the 
different groups working on their implementation can ignore them when it 
doesn't fit with their needs.


More concrete systems *are* needed, or we're going to end up with a 
near-infinate number of variations, but some people are going to find it 
easier to deal with a more restrictive model, where they don't have to 
deal with complexity; and others are going to have strange edge cases that 
don't fit within the restrictions that require that same complexity.


The final vote on if people accept the restrictions of RDA will be if they 
decide to adopt it, or if they have to go with some other implementation.


-Joe

Re: [CODE4LIB] Variations/FRBR project relases FRBR XML Schemas

2010-03-18 Thread Joe Hourcle


On Thu, 18 Mar 2010, Jonathan Rochkind wrote:


Karen Coyle wrote:


naturally favors the package over the contents. So we'll have some  works 
that are what users think of as works, and other works that  represent the 
publisher's package -- which sometimes will be something  that makes sense 
to the user, but at other times, as in many music  CDs, is bordering on the 
arbitrary. If we present these all as works  to the user, confusion will 
ensue.


So it's up to our systems to NOT present things that way, right?  If a 
particular Work is just an aggregate which is not that meaningful to the 
user, it shouldn't be presented (at least in initial result sets), the 
meaningful expressions/manifestations should be presented, right?  I'm not 
entirely clear on your example demonstrating that, but I believe you that it 
exists.


I would personally assume so -- you don't want someone searching to see if 
you have a copy of 'Hamlet', and all you have is 'The Collected Works of 
William Shakespeare' and so your system reports that you don't.


Of course, depending on what the user asks for affects what we respond 
back with -- even if we have 27 copies of 'Hamlet', we wouldn't respond 
with 27 records back in response to their request.  It's entirely possible 
(and probable) that systems track objects at a granularity other than 
what's presented back to the user.


If someone's searching for a specific song, so we expect them to know the 
names of every album it's been on?  Yes, our local catalog might only 
track the albums, but if there's some sort of indication that they're 
aggregations, we know that we might need to expand them to be able to 
answer the question.



The way I see it, our architectural job is _first_ to create a data model 
that allows all the neccesary things to be expressed, THEN create systems 
that use those necessary expressed things to create reasonable displays.


I'm still thinking my interpretation (which is not JUST mine, I don't think I 
even invented it) of aggregate modelling is the only sane one I've seen that 
allows us to model what in many use cases we'd be allowed to model, without 
forcing us to model what in many use cases cost-benefit would not justify 
modelling.


It's a *reference* *model* ... it is *not* an implementation.  Everyone's
allowed to model anything you want.


In the RDA relationships (which I've summarized here 
http://kcoyle.net/rda/group1relsby.html) there seem to be two kinds: 
intellectual relationships, and bibliographic relationships. Is  adapted 
from is an intellectual relationship; Contains is a  bibliographic 
relationship. They're all mixed together as if they are  the same thing.


I think you may very well be right that there some be more clarification in 
the model here. I haven't thought about it enough.


There definitely needs to be more clarification in the model as to how to 
handle aggregates. At one point there was a working group on that, I'm not 
sure what happened to it. Of course, if the working group came up with 
something OTHER than my preferred interpretation, I'd be very unhappy. :)


The group's two proposals were to model aggregates as works, or as 
manifestatons, so RDA seems to be on their own modeling them as 
expressions:


http://www.ifla.org/en/events/frbr-working-group-on-aggregates

I don't know what happened at the August 2009 meeting, though.  William 
Denton had a breakdown of the August 2008 meeting, which explained 
some of the issues that they were considering:


http://www.frbr.org/2008/08/18/working-group-on-aggregates


-Joe

Re: [CODE4LIB] Variations/FRBR project relases FRBR XML Schemas

2010-03-18 Thread Joe Hourcle


On Thu, 18 Mar 2010, Jonathan Rochkind wrote:


Joe Hourcle wrote:


The group's two proposals were to model aggregates as works, or as 
manifestatons, so RDA seems to be on their own modeling them as 
expressions:


See, this is what I don't understa.d As works, or as manifestations??  In 
the FRBR model, every single manifestation belongs to _some_ Work, does it 
not?  So I don't understand how those can be alternatives. Or was the 
proposal to change this? So some manifestations exist free floating 
belonging to no work at all? (By belonging to in FRBR terms of art, I mean 
in the FRBR model, every manifestation is the embodiment of SOME 
expression, which is the realization of SOME Work. Whether that expression 
or work are yet described or not, they're there in the model.  Was the 
proposal really to change this, so some manifestations are  by definition 
the embodiment of no expression at all, not even an expression that has yet 
to have an identifier assigned to it? That seems horribly mistaken to me).


There's a many-to-many relationship between Expressions and 
Manifestations in FRBR, so a single Manifestation can encompass multiple 
Expressions (and therefore, multiple Works).


In the Aggregates-as-Manifestations model, something like the 'Complete 
Works of ...' would exist as a new manifestation, but *not* as a new work. 
(and those individual works might never exist as individual 
manifestations)


It's of course much more simple to express some items (such as the 
Canterbury Tales) as a single work (Aggregations-as-Works), and then just 
make an expressions of them, and the corresponding dozens of possible 
manifestations.  I guess it'd be the FRBR equivalent of data 
normalization.  And aggregating at the work levels makes it easier to 
reconcile the cases where different catalogers can't agree if it's a 
single object or multiple objects.


I'm torn -- I think both are valid ways of describing the relationships, 
and different domains are going to try to go the route that makes the most 
sense for them.  (which is likely, which one's the least cost to implement 
while giving them the functionality they want)


-Joe

[CODE4LIB] Any examples of using OAI-ORE for aggregation?

2010-03-10 Thread Joe Hourcle

Most of the examples I've seen of OAI-ORE seem to assume that you're 
ultimately interested in only one object within the resource map -- 
effectively, it's content negotiation.


Has anyone ever played with using ORE to point at an aggregation, with the 
expectation that the user will be interested in all parts, and 
automatically download them?


...

Let me give a concrete example:

A user searches for some data ... we find (x) number of records
that match their criteria, and they then weed the list down to 10
files of interest.

We then save this request as a Resource Map, as part of an OAIS
order.  I then want to be able to hand this off to a browser /
downloader / whatever to try to obtain the individual files.

Currently, I have something that can take the request, and create a 
tarball on the fly, but we have the unfortunate situation when some of the 
data is near-line and/or has to be regenerated -- I'm trying to find a 
good way to effectively fork the request into multiple smaller request, 
some of which I can service now, and some for which I can return an HTTP 
503 status (service unavailable) w/ a retry-after header.


...

Has anyone ever tried doing something like this?  Should I even be looking 
at ORE, or is there something that better fits with what I'm trying to do?


Thanks for any advice / insight you can give

-Joe

-
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center

Re: [CODE4LIB] Q: XML2JSON converter

2010-03-05 Thread Joe Hourcle


On Fri, 5 Mar 2010, Godmar Back wrote:


On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.dewrote:


Hi,
try this: http://code.google.com/p/xml2json-xslt/



I should have mentioned that I already tried everything I could find after
googling - this stylesheet doesn't meet the requirements, not by far. It
drops attributes just like simplexml_json does.

The one thing I didn't try is a program called 'BadgerFish.php' which I
couldn't locate - Google once indexed it at badgerfish.ning.com


http://web.archive.org/web/20080216200903/http://badgerfish.ning.com/

http://web.archive.org/web/20071013052842/badgerfish.ning.com/file.php?format=srcpath=lib/BadgerFish.php

-Joe

Re: [CODE4LIB] Location of the first Code4Lib North meeting?

2010-01-25 Thread Joe Hourcle


On Mon, 25 Jan 2010, Edward M. Corrado wrote:

I never had a problem in the couple of times I crossed a border into Canada 
for a library conference, but I tend to make sure I have the program and 
hotel information readily available to show them in case they ask (yes, the 
Canadian border people have looked at it). My guess is that some of the 
border guards think only old ladies with hair buns can be librarians so they 
might be a bit confused when someone that doesn't meet that description tries 
to cross the boarder.


I've only been to Canada once (for last year's ASIST), and the only 
question I had difficulty in answering was who do you work for, for 
which I probably confused the guy with an explanation of US government 
contracting.


It's not as bad as Israel, where I've heard some people have been asked to 
give their presentation for the conference, when they said that's why they 
were visiting.


-Joe

Re: [CODE4LIB] Online PHP course?

2010-01-06 Thread Joe Hourcle


On Wed, 6 Jan 2010, MJ Ray wrote:


Thomas Krichel kric...@openlib.org wrote:

  Joe Hourcle writes

ps.  yes, I could've used this response as an opportunity to bash
PHP ...  and I didn't, because they might be learning PHP to
migrate it to something else.


  controversial ;-)

  what's the problem(s) with PHP?


Oh please don't nuke the list from orbit like that!  I hope that
this is a balanced enough reply to keep everyone happy:

Our experience is that PHP hosting environments vary much more, most
PHP code is a mess (PHP-based software was part of 35% of the
U.S. government's National Vulnerability Database in 2008 -
http://www.coelho.net/php_cve.html) and few things (code and hosting)
move between the different major versions smoothly.  It's a personal
home page tool which has grown massively, for better or worse.

BUT! Even after all that, software.coop still supports some PHP
applications because they can work well and be very useful, though
we're under no illusions about PHP's warts.


I can sum it up in one sentance:

PHP makes it *very* easy to write insecure programs.

Of the security incidents in our department (the ones where men with guns 
come and take your hard drive and/or whole server away for an 
'investigaton'), PHP has been responsible for the majority of the 
incidents.


Part of it is the perceived simplicity -- look at how easy it is to add 
some extra functionality to your website!  You don't even need to 
understand good programming practices!  Anyone can do it!


(to be fair -- Perl used to be the software that fell into this niche 10 
years ago, but I blame Matt's Script Archive more than the language 
itself, as Perl isn't specifically for web site automation)


... and they never get their code reviewed by one of the professional 
programmers in our department, it goes live, and then, a year or so later, 
someone shows up to take our server because the security monitoring showed 
that it looks like someone managed to pull our password file off the 
system.  (never mind that (1) there's a shadow file, so /etc/passwd has no 
passwords in it, and (2) even if they got the password file, it only has 
the application users (none of whom have login privs) because it's macosx)


Then you waste a week of your time trying to convince the security gestapo 
that yes, there was a security vulnerability, and there was an incident, 
but nothing confidential was actually lost ... and then we get everyone 
who had stuff on the server bitching us out because they can't get to 
their stuff, and they had some time-sensitive information to get out, or 
whatever, and we're trying to jump through security's hoops for a week or 
two while our other projects get further and further behind.


...

Now, if they actually manage to *upload* a file to your system ... then 
expect to rebuild your whole machine from the ground up.


so um ... if you're going to use PHP ... if you're on apache, look into 
suPHP.  Consider making your website served from a read-only file system, 
and look online for other tips on hardening your server.



-Joe


oh, and I also really disike having to tye all of my stuff to one 
database.  I know mysqli makes it better, but the original mysql stuff 
still taints my perception of PHP.


I also have a dislike of ColdFusion servers, but that stems from the 'unix 
registry' crap they used (still use?) back when they were still Allaire, 
and I had a few times when the system choked and I had to rebuild all 
settings from memory the first time, and from printouts of the server 
configuration the next few times.  And then there was the time at a 
previous job when we upgraded the server and they pushed in changes that 
made the service crash every night at about 2am ... so I'd get a call 
every night to restart the thing ... until I finally wrote a watchdog 
script which by the time I got fired, was restarting the service 5-8 times 
per night ... but I actually *liked* coldfusion as a developer.


...

and so long as we're mentioning PHP, and this is code4lib -- anyone 
personally know the developer of refbase?  I tried emailing him a few 
months back offering patches to get rid of all of the 'deprecated' 
warnings when running under php5.

Re: [CODE4LIB] Online PHP course?

2010-01-05 Thread Joe Hourcle


On Tue, 5 Jan 2010, Tod Olson wrote:

One of our staff needs to learn PHP, and an online course is preferred. 
Is there an online PHP course that any of you would recommend?


If they already understand basic programming, and just need to pick up the 
syntactic issues, some of the documentation from w3schools is good -- I 
haven't looked over their PHP stuff specifically, though:


http://www.w3schools.com/php/default.asp

-Joe

ps.  yes, I could've used this response as an opportunity to bash PHP ...
 and I didn't, because they might be learning PHP to migrate it to
 something else.

Re: [CODE4LIB] good and best open source software

2009-12-29 Thread Joe Hourcle


On Tue, 29 Dec 2009, Thomas Krichel wrote:


 Requiring an upfront healthy community is particurly problematic is
 a small community such as digital library work.

 On the other kind, there is widely adopted software that I got
 cajoled into maintaining, that consider bad. Apache is one of
 them. I run maybe 50 virtual servers an a bunch of boxes, I am still
 puzzled how it works and it's trial and error with each software
 upgrade, where goes that NameVirtualServer thing into, the constant
 croaks server foo has no virtualserver. I'm not a dunce, but
 Apache makes me feel I am one. When I look at these config files
 that are half-baked XML, I wonder what weed the guy smoked who
 invented this.

 If I could do it allover again, I would do it in lighttpd. Oh well
 it was not there in 1995 where I started running web servers.

 Other problematic case: Mailman. I run about 130 mailing lists, over
 80 have a non-standard config, I am running every few months into
 problems with onne of them, despite the fact that I wrote a script
 to configure all the non-standard lists the same way.



Even if they don't have specific forums, if they're more widely adopted 
software, you might have luck with well populated, but more generic 
forums:


programming related:
http://stackoverflow.com/

server administration:
http://serverfault.com/

other IT stuff:
http://superuser.com/

I admit that I haven't specifically asked any questions about Apache or 
Mailman, though.


-Joe

Re: [CODE4LIB] good and best open source software

2009-12-29 Thread Joe Hourcle


On Tue, 29 Dec 2009, Jonathan Rochkind wrote:

I think you may find yourself somewhat in the minority in thinking Apache is 
bad software. (I certainly have my complaints about it, but in general I find 
it more robust, flexible, and bug-free than just about any other software I 
work with).


But aside from getting into a war about some particular package:  It may be 
true that in general popular software does not necessarily equal good 
software -- even popular open source software.  And doesn't neccesarily equal 
the right software solution for your problem. (I could mention some 
library-sector-origin open source software I think proves that, but I won't, 
and it would just be my opinion anyways, like yours of Apache).


But popular software _does_ mean software that has a much higher chance of 
continuing to evolve with the times instead of stagnating, getting it's bugs 
and security flaws fixed in a timely manner, and having a much larger base of 
question-answering and support available for it (both free and paid).


Which is one important criteria for evaluating open source software. But 
nobody was suggesting it should be the _only_ criteria used for evaluating 
open source software, or even neccesarily the most important. It depends on 
your situation.


I think that part of the problem here is that software tends to fill a 
niche, and some of these larger software projects tend to fill the 
'enterprise' niche.


Now, Apache 2 in many ways *is* easier to configure than Apache 1.3, but 
the sheer number of configuration options from all of the different 
modules makes it more difficult to configure than the Netscape/iPlanet/ 
SunOne product line.  (at least to me, other people might not be making 
the sorts of changes that I deal with).


However, there's a lot of power in Apache's configuration ability ... I 
just wish I didn't have to deal with all of it.*


... but it's like anything -- if I switch to a different server, it might 
be easier to configure, but then I lose mod_perl support, so it's a 
trade-off.


-Joe

* I think I lost a week trying to get some software virtual hosts working
  correctly, where there'd be a 'default' host, and one that only
  responded to specific names and had some alternate security options.

Re: [CODE4LIB] good and best open source software

2009-12-28 Thread Joe Hourcle


On Mon, 28 Dec 2009, Eric Lease Morgan wrote:

For my own education and cogitation, I have begun to list questions to 
help me address what I think is the best library-related open source 
software. [1] Your comments would be greatly appreciated. I have listed 
the questions here in (more or less) personal priority order:


 * Does the software work as advertised?
 * To what degree is the software supported?
 * Is the documentation thorough?
 * What are the licence terms?
 * To what degree is the software easy to install?
 * To what degree is the software implemented
   using the standard LAMP stack?
 * Is the distribution in question an
   application/system or a library/module?
 * To what degree does the software satisfy some
   sort of real library need?


What sorts of things have I left out? Is there anything here that can be 
measurable or is everything left to subjective judgement? Just as 
importantly, can we as a community answer these questions in light of 
distributions to come up with the best of class?


+ How often do I have to update it to keep ahead of security exploits?

+ Does it play well with other software?  (eg, does it break under updated
  libraries and/or does the installer try to force me to update every
  library on my system to bleeding edge for no good reason?)
 (aspect #2 might fall under the 'easy to install' item)

...

You could also end up with some outdated software that meets all of the 
requirements, but is based on older standards that might not be relevant 
today.


-Joe

Re: [CODE4LIB] calling another webpage within CGI script

2009-11-23 Thread Joe Hourcle


On Mon, 23 Nov 2009, Ken Irwin wrote:


Hi all,

I'm moving to a new web server and struggling to get it configured properly. The problem of the 
moment: having a Perl CGI script call another web page in the background and make decisions 
based on its content. On the old server I used an antique Perl script called hcat 
(from the Pelican bookhttp://oreilly.com/openbook/webclient/ch04.html); I've also tried 
curl and LWP::Simple.

In all three cases, I get the same behavior: it works just fine on the command 
line, but when called by the web server through a CGI script, the LWP (or other 
socket connection) gets no results. It sounds like a permissions thing, but I 
don't know what kind of permissions setting to tinker with. In the test script 
below, my command line outputs:

Content-type: text/plain
Getting URL: http://www.npr.org
885 lines

Whereas the web output just says Getting URL: http://www.npr.org; - and doesn't even get 
to the Couldn't get error message.

Any clue how I can make use of a web page's contents from w/in a CGI script? 
(The actual application has to do with exporting data from our catalog, but I 
need to work out the basic mechanism first.)

Here's the script I'm using.

#!/bin/perl
use LWP::Simple;
print Content-type: text/plain\n\n;
my $url = http://www.npr.org;;
print Getting URL: $url\n;
my $content = get $url;
die Couldn't get $url unless defined $content;
@lines = split (/\n/, $content);
foreach (@lines) { $i++; }
print \n\n$i lines\n\n;

Any ideas?


I'd suggest testing the results of the call, rather than just looking for 
content, as an empty response could be a result of the server you're 
connecting to.  (unlikely in this case, but it happens once in a while, 
particularly if you turn off redirection, or support caching). 
Unfortunately, you might have to use LWP::UserAgent, rather than 
LWP::Simple:


#!/bin/perl --

use strict; use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent-new( timeout = 60 );

my $response = $ua-get('http://www.npr.org/');
if ( $response-is_success() ) {
my $content = $response-decoded_content();
...
} else {
print HTTP Error : ,$response-status_line(),\n;
}

__END__

(and changing the shebang line for my location of perl, your version 
worked via both CGI and command line)



oh ... and you don't need the foreach loop:

my $i = @lines;

-Joe

Re: [CODE4LIB] Library Linked Data

2009-10-29 Thread Joe Hourcle


On Wed, 28 Oct 2009, Roy Tennant wrote:


David,
Could you elaborate a bit? In my mind, the only semantic web technology of
any note is linked data. How that fits into library search is anyone's
guess, and I'm wondering what, specifically, you're referring to when you
say that Talis is active in this area.

If you are asking about library linked data, then there are several
examples, most notably the Library of Congress[1], the Swedish Union
Catalogue [2], and OCLC[3][4]. I believe that a minimum both the Library of
Congress and OCLC plan on releasing more linked data sets.

So can you elaborate a bit more on what, exactly, you're seeking? Thanks,
Roy

[1] http://id.loc.gov/authorities/
[2] http://article.gmane.org/gmane.culture.libraries.ngc4lib/4617
[3] http://dewey.info/
[4] http://outgoing.typepad.com/outgoing/2009/09/viaf-as-linked-data.html



For some other information on what other groups are doing in this regard, 
the DCMI (Dublin Core) just had a meeting in Korea two weeks ago, with the 
theme Semantic Interoperability of Linked Data


http://www.dc2009.kr/

And there was a CENDI/NKOS workshop that I attended last week, that 
featured many of the same speakers.


http://nkos.slis.kent.edu/2009workshop/NKOS-CENDI2009.htm

Both sites have presentations linked from their sites.  I can forward on 
my notes from the CENDI/NKOS workshop, but I'll warn you in advance that I 
wrote them for a different intended audience (folks on an interoperability 
project that I'm attached to), so I might've trimmed some stuff that's of 
general interest to folks in libraries, while bringing out stuff that 
isn't.


The CENDI folks are all US Government, but there seems to be a wider range 
of people in NKOS.  I don't know how much of it fits into the typical 
'library' definition, other than the Library of Congress stuff that was 
already mentioned.


-Joe

Re: [CODE4LIB] Bookmarking web links - authoritativeness or focused searching

2009-09-29 Thread Joe Hourcle


On Tue, 29 Sep 2009, Cindy Harper wrote:


I've been thinking about the role of libraries as promoter of authoritative
works - helping to select and sort the plethora of information out there.
And I heard another presentation about social media this morning.  So I
though I'd bring up for discussion here some of the ideas I've been mulling
over.


[trimmed]


Is anyone else thinking about these ideas?  or do you know of projects that
approach this goal of leveraging librarian's vetting of authoritative
sources?


I don't know of any projects that specifically do what you've mentioned, 
but for the last few years, we've been mulling over how to store various 
lists and catalogs so that we could present interesting intersections of 
them.


In my case, I deal with scientific catalogs, so it's stuff like when was 
RHESSI observing the same area as TRACE? or When was there an X-class 
flare within 2 hours of a CME? or even lack of intersections When were 
there type-II radio bursts without a CME or flare within 6 hours?


For the science catalogs, we specifically don't want to just make some 
sort of single ranking from each list, and it's not really easy to merge 
the catalogs into some form of union catalog as they're cataloging 
different concepts.


... and I think that there's use in library searches to keep the catalogs 
different, particularly when you're bringing up authority (which then gets 
to reputation, etc.).


I'm not sure how many other people out there would try to search for Hugo 
award winning novels that weren't on the New York Times best seller list, 
so it might not be as useful for general patron use ... unless you could 
give it your *own* catalog (AFI top 100 movies ... that I don't already 
own)



-
Joe Hourcle
Solar Data Analysis Center
Goddard Space Flight Center

Re: [CODE4LIB] Implementing OpenURL for simple web resources

2009-09-14 Thread Joe Hourcle


On Mon, 14 Sep 2009, Mike Taylor wrote:


2009/9/14 Jonathan Rochkind rochk...@jhu.edu:

Seriously, don't use OpenURL unless you really can't find anything else that
will do, or you actually want your OpenURLs to be used by the existing 'in
the wild' OpenURL resolvers. In the latter case, don't count on them doing
anything in particular or consistent with 'novel' OpenURLs, like ones that
put an end-user access URL in rft_id... don't expect actually existing in
the wild OpenURLs to do anything in particular with that.


Jonathan, I am getting seriously mixed messages from you on this
thread.  In one message, you'll strongly insist that some facility in
OpenURL is or isn't useful; in the next, you'll be saying that the
whole standard is dead.  The last time I was paying serious attention
to OpenURL, that certainly wasn't true -- has something happened in
the last few months to make it so?


My interpretation of the part of Jonathan's response that you quoted was 
basically, don't use OpenURL when you're just looking for persistant URLs.


The whole point of OpenURL was that the local resolver could determine 
what the best way to get you the resource was (eg, digital library vs. ILL 
vs. giving you a specific room  shelf).


If you're using OpenURLs for the reason of having it work with the 
established network of resolvers, don't get cute w/ encoding the 
information, as you can't rely on it to work.


...

From what I've seen of the thread (and I admit, I didn't read every 

message), what's needed here is PURL, not OpenURL.

-Joe

< 1 2 3 >

101 - 200 of 253 matches

Mail list logo