Re: [CODE4LIB] Survey
On Nov 27, 2012, at 12:20 PM, Karen Coyle wrote:

Peter, again I worry about this being self-selecting. People who report on surveys are the people who report on surveys. A code4lib survey would be nice, but I'm really interested in on the ground troops. And I think the questions would have to be specific to what one does:

- installs and fixes equipment
- runs updates/backups on ILS
- writes scripts
- writes code
- manages local network
- modifies ILS tables for local customization
- creates web pages
- makes decisions on tech purchasing
- supervises staff that runs ILS/local network

Well, that's probably a stupid list, but a smarter list could be made. In other words, I would want what you actually do to define whether you are a techie -- not whether you consider yourself a techie (many women demean their own skills -- Oh, I just push a few buttons). [1] I'd like to see it be very broad, and later we can decide if we think modifying ILS tables counts as being a real techie.

I admit, I'm no expert on surveys (I tried doing one once for a class ... got shut down for an IRB violation, as I said I'd share the results back with the organization we were surveying ... which is pretty sad, as the organization I was surveying was the library school itself) ... but you could do a much larger survey, trying to get all people who work in libraries, and ask questions about specific IT-related tasks that they might be doing, even if they don't self-identify as IT. Of course, then you might miss those of us who don't work in libraries, but who may identify with this group.

... and make sure that whoever does it isn't at an academic institution, to avoid that IRB crap.

-Joe
Re: [CODE4LIB] anti-harassment policy for code4lib?
On Nov 26, 2012, at 5:16 PM, Bess Sadler wrote:

Why have an official anti-harassment policy for your conference? First, it is necessary (unfortunately). Harassment at conferences is incredibly common - for example, see this timeline (http://geekfeminism.wikia.com/index.php?title=Timeline_of_incidents) of sexist incidents in geek communities. Second, it sets expectations for behavior at the conference. Simply having an anti-harassment policy can prevent harassment all by itself. Third, it encourages people to attend who have had bad experiences at other conferences. Finally, it gives conference staff instructions on how to handle harassment quickly, with the minimum amount of disruption or bad press for your conference.

If the conference already has something like this in place, and I'm just uninformed, please educate me and let's do a better job publicizing it. Thanks for considering this suggestion. If the answer is the usual code4lib answer (some variation on Great idea! How are you going to make that happen?) then I hereby nominate myself as a member of the Anti-Harassment Policy Adoption committee for the code4lib conference. Would anyone else like to join me?

We had no Anti-Harassment Policy for the DC-Baltimore Perl Workshop, as it was all covered under our general Code of Conduct: Don't be an asshole.

I think there was a second line of it, about how we had the right to remove people who refused to follow that advice, and no refunds would be given. I might be wrong on the exact language. The e-mail I found referenced 'Don't be a dick', in an attempt to paraphrase the legalese of the Code of Conduct for our venue ... but the reference to gender-specific anatomy would be kinda sexist in itself.

-Joe
Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?
On Nov 2, 2012, at 2:09 PM, Mita Williams wrote:

+1 to web-hosting, as it gives the ability to install one's own software on one's domain (which feels great) *and* easy access to shell. And when web-hosting feels like too much of a barrier to access, sites like jsfiddle where you can immediately start adding *and* sharing code is key. IMHO the initial appeal of Code Academy was that it removed all barriers to getting started. Getting a laptop's localhost set up is too daunting for a first step, I think.

If that's a problem for people, it might be worth looking at the various *AMP (LAMP, WAMP, MAMP) stacks for an easy install of Apache, mySQL + perl / python / php. We're probably moving away from locally hosted services towards 'the cloud' for the most part (remember when they used to be called 'service providers'?), but it's still useful to learn a little something about configuring a webserver / database / etc.

And it's generally more locked down in the various *AMP stacks than if you went and installed them individually, so there isn't quite the same level of problems w/ security.

-Joe
Re: [CODE4LIB] Just Solve the File Format Problem month: can you help?
On Nov 2, 2012, at 3:48 PM, Roy Tennant wrote:

Um...how is this better/different from already existing sites/efforts around this?

http://en.wikipedia.org/wiki/List_of_file_formats
http://www.wotsit.org/
http://www.ace.net.nz/tech/TechFileFormat.html
http://www.fileformat.info/

At the very least, this new effort shouldn't start from scratch...

They could also extract a lot of information / links from: http://www.digitalpreservation.gov/formats/index.shtml

Although, admittedly, it's more intended for creators rather than those trying to figure out what it is they have. (archaeology? forensics?)

-Joe

On Fri, Nov 2, 2012 at 2:36 AM, Ed Summers e...@pobox.com wrote:

I imagine you've heard about the Just Solve the Problem month already, but if not, I thought Chris Rusbridge's email to the digital-preservation list was a good call for participation in the project ...

//Ed

-- Forwarded message --
From: Chris Rusbridge c.rusbri...@googlemail.com
Date: Thu, Nov 1, 2012 at 4:00 PM
Subject: Just Solve the File Format Problem month: can you help?
To: digital-preservat...@jiscmail.ac.uk

Some of you will know that Jason Scott, Rogue Archivist, is raising a citizen's army to attempt to solve the file format problem* in the month of November, 2012. The work is taking place via a wiki at http://justsolve.archiveteam.org/index.php/Main_Page, with a band of volunteers (you need to register to make changes to the wiki, by sending a username and email address to justso...@textfiles.com). I've added a few formats and groups of formats myself (at least as skeletons or empty placeholders).

The best form of help is for some of you who know more about rarer data formats to register and help by editing the wiki yourself. It's pretty easy; I've never used MediaWiki before, and everything I've done so far has been by finding something like it and adapting the wiki source. Other people can make it beautiful and standardised later on!

If you can't do that, you could email me information about missing data formats. This should include as much as possible of:

- name, and what it's for (ie brief description)
- web site with some authoritative information
- web site with some examples, etc.

Let's try and capture ALL these formats. As Jason says in his own inimitable way, "Let's make that goddam army!"

* Note, the problem is only vaguely defined, and after some angst (eg see http://unsustainableideas.wordpress.com/2012/07/04/the-solution-is-42-what-was-the-problem/), I think that's OK. Gathering a huge amount of information about file formats in one place will be a BIG HELP.

--
Chris Rusbridge
Mobile: +44 791 7423828
Email: c.rusbri...@gmail.com

Adopt the email charter! http://emailcharter.org/
Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?
On Nov 1, 2012, at 5:02 PM, Ethan Gruber wrote:

Google is more useful than any reference book to find answers to programming problems.

Too bad they got rid of codesearch.

On Nov 1, 2012, at 5:06 PM, Nate Hill wrote:

Huh. Michael, I'd love to know more about why I should care about SASS. I kinda like writing CSS. I see why LESS http://lesscss.org/ makes sense, but help me understand why SASS does?

For the most part, using *any* CSS pre-processor is better than not using one. LESS's problem was that it's javascript based ... so if they have JS off ... you've got nothing. And it's got to be done for each user, rather than re-generating the files after you've made a modification. You can get around this with the 'lessc' compiler, and serve valid css files rather than having each client have to do the processing -- see the sketch below.

They've also got different syntaxes, so it's really up to which one makes sense to you. Functionality-wise ... I think they're about equal these days. I suspect that if one comes up with a useful new feature, the other group will copy it.

On Nov 1, 2012, at 5:21 PM, Suchy, Daniel wrote:

I can already feel the collective rolling of eyes for this, but what about Twitter? It's not a guide or manual, but start following and engaging talented developers and library geeks on Twitter and you'll soon have more help than you know what to do with. Plus, no Zoia ;)

Too much misinformation: http://twitter.com/danhooker/status/5630099300

On Nov 1, 2012, at 5:06 PM, Kam Woods wrote:

foss4lib is a good resource that I'm sure many use, but isn't (as far as I can tell) linked anywhere on the current code4lib site. How would this differentiate itself from that?

The best tool isn't necessarily free or open source. (and it isn't necessarily software)

So that being said ... my whiteboard. And a digital camera ... none of that 'smartboard' crap.

-Joe
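To illustrate the 'lessc' approach mentioned above -- compile once on the server after each change, and serve the resulting plain CSS, so clients never run any JS (a minimal sketch; the filenames are made up):

    lessc styles.less styles.css

Then link to styles.css from your pages as usual; nothing about the deployed site depends on LESS at all.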
Re: [CODE4LIB] one tool and/or resource that you recommend to newbie coders in a library?
On Nov 1, 2012, at 6:56 PM, Kam Woods wrote:

Apologies, everyone (and especially Bohyun). You may still want to consider pointing people to foss4lib as a useful resource, but amend it with the following statement: Free and open source tools may not be the best tools. You might not even NEED software to handle whatever problem you have. Please consider contacting onei...@grace.nascom.nasa.gov for further insight.

Oh ... sure ... just get me in trouble ...

We're supposed to use our 'OneNASA' e-mail address, so you'd have to change it to joseph.a.hour...@nasa.gov ... and I said that in part as I've been in the past a beta tester for BareBones's BBEdit. If you're not doing HTML work, TextWrangler will probably do what you need (which is ... whatever the 'free' is that isn't 'libre').

And there's plenty of other good software out there that isn't free, and there's lots of free software out there that's crap (some of which I might've been involved with).

Personally, I was unaware of either of these issues. It's a good thing I came here today for some edification.

Yes. 'smart' whiteboards are overpriced crap. I hope I've educated everyone today.

-Joe
[CODE4LIB] Crappy AJAX (was: [CODE4LIB] Q: Discovery products and authentication (esp Summon))
On Oct 25, 2012, at 6:46 AM, Gary McGath wrote:

On 10/24/12 8:58 PM, Ross Singer wrote:

On Oct 24, 2012, at 6:06 PM, Gary McGath develo...@mcgath.com wrote:

On 10/24/12 4:00 PM, Ross Singer wrote:

On Oct 24, 2012, at 3:48 PM, Gary McGath develo...@mcgath.com wrote:

Also, why wouldn't your AJAX-enabled app be prepared for such an event?

Are you asking how an AJAX-enabled application can handle such cases?

No, I know how an AJAX-enabled application should handle such cases, I'm saying why, if you're implementing an AJAX-enabled application, why you think this would be an issue. Because I just don't see this being an issue.

This has always been a tricky thing to explain; it's not just you, if that's any consolation. Someday I'll figure out how to make it clear on the first try. The point is that if a service redirects to a login page, it assumes the browser can display the login page. Normally this is true, but only if the resource would be delivered as a web page. AJAX components are received as elements, not pages. If you like, I can go into more detail off-list. This is really too much of a side technical issue to be worth taking up a lot of space on the list.

You didn't answer the question -- why would you not have some sort of check on the AJAX application (or any application, web or otherwise) to do at least minimal sanity checking on the result of an external call?

In the case of something requiring authentication, if it's a well designed back-end, it should return some HTTP status other than 200; 401 or 403 would be most appropriate. I've unfortunately worked with ColdFusion in the early days, before they added cfheader to allow you to change the status code so that it was something other than 200. I've also seen websites that cheat to install a 'handler' for all requests by linking to a PHP script using Apache's ErrorDocument directive for 404s. This also has the side effect that search engines won't index your site at all (as they assume it's all errors).

In both of these cases, I'd say the service is poorly designed if you can't easily identify a failure. You can send a login page along with your 401 status, but you *should* *not* send a 30x redirect to a login page, as then the actual status message is lost. (the content hasn't been moved ... you just want someone to go to the login page ... the HTTP specs don't forbid a Location field w/ a 40x status, although I admit I've never verified that major browsers support it)

If you have something pulling in content using something AJAX-like, and it *doesn't* check the result, then the client's poorly designed as well. It might be something as simple as checking to ensure that expected elements are included in the response.

The only valid examples that I can think of where you may have blind inclusion (ie, you don't have a chance to verify what the results are before displaying) are frames (including iframes) and image links. I'm assuming we've all seen those horrible websites that have a 'authentication required' message for every frame, but images are a little more subtle. The best thing to do for images is to serve an image back in response, rather than HTML.

It's not a new thing; I remember doing it back when I worked for a university in the mid 1990s. We had your standard 'image-counter' CGI ... but when we realized that the majority of HTTP-referers were from outside the university, it was changed to instead return an image that said 'access denied' or something similar.

-Joe
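A sketch of the 'send a login page along with your 401 status' behavior, as a minimal Perl PSGI app (the session check and the markup are invented for illustration; the point is that the status code survives even though the body is a login form):

    # run with: plackup app.psgi
    use strict;
    use warnings;

    my $login_html = '<html><body><form action="/login" method="post">'
                   . 'Please log in ...</form></body></html>';

    my $app = sub {
        my $env = shift;

        # hypothetical session check -- swap in whatever auth you use
        my $authed = $env->{'psgix.session'}
                  && $env->{'psgix.session'}{user};

        # no 302 redirect: the login page rides along with the 401,
        # so an AJAX caller can see the real status
        return [ 401, [ 'Content-Type' => 'text/html' ], [ $login_html ] ]
            unless $authed;

        return [ 200, [ 'Content-Type' => 'application/json' ],
                 [ '{"ok":true}' ] ];
    };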
Re: [CODE4LIB] Crappy AJAX
On Oct 25, 2012, at 9:20 AM, Gary McGath wrote:

On 10/25/12 7:37 AM, Joe Hourcle wrote: You didn't answer the question -- why would you not have some sort of check on the AJAX application (or any application, web or otherwise) to do at least minimal sanity checking on the result of an external call?

Because putting the onus of sanity checking on the web page isn't the best solution in this case. Of course, it should be set up to handle unexpected results sensibly in any case.

I view it like using JavaScript for form validation -- don't trust it, and still re-do the validation in the backend. If the costs to check tainted inputs are minimal, *do* *it*.

Even when the back-end is well designed, there are enough other things out there that are outside your control.

... like when IE decided to start re-writing 404 and other status pages unless they happened to be at least 1k ... so even when we *were* giving informative messages about what was going on, links to report the problem, etc ... it never made it back to the user. (and yes, I know, I've officially hit old fogey status by complaining about changes that IE made more than 10 years ago ... I'm also not a fan of the br tag ... one of the worst mistakes of HTML+)

But for more recent situations ... mobile browsers w/ spotty reception. Man-in-the-middle attacks ... deep-packet filtering (the firewall doesn't like some phrase used in the response, so it replaces the content with a 'blocked' message) ... they may not be common, but they *do* happen.

-Joe
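In that 'minimal sanity checking' spirit, a cheap set of checks on the result of an external call, in Perl (the URL and the expected field are invented; the point is that each test costs one line):

    use strict;
    use warnings;
    use LWP::UserAgent;
    use JSON::PP qw( decode_json );

    my $res = LWP::UserAgent->new( timeout => 10 )
                            ->get('https://example.org/api/records');

    # 1. did the HTTP layer succeed? (catches the 401/302-to-login cases)
    die 'HTTP error: ', $res->status_line unless $res->is_success;

    # 2. is it even the content type we asked for? (catches firewall
    #    'blocked' pages and IE-style substituted error pages)
    my $ct = $res->content_type || q{};
    die "Unexpected content type: $ct" unless $ct =~ m{^application/json}i;

    # 3. does it parse, and does it contain what we expect?
    my $data = eval { decode_json( $res->decoded_content ) }
        or die "Response was not valid JSON: $@";
    die 'Missing expected field' unless exists $data->{records};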
Re: [CODE4LIB] Crappy AJAX
On Thu, 25 Oct 2012, Chris Fitzpatrick wrote:

http://en.m.wikipedia.org/wiki/Sayre's_law

I'm guessing the other people participating in this thread have never had men with guns show up to take your server because of a 'security incident'.

Or block your server's IP address, and then make you jump through hoops for two weeks because they were unhappy with someone uploading an image to your trouble ticket system that accepted anonymous submissions ... with the explanation that if they managed to get a file on there, the whole system was compromised, and had to be blanked and the OS reinstalled. ... it didn't help that the image was text saying something to the effect of 'I've hacked your computer'. And they didn't realize at the time that it actually had a JPEG exploit in it -- so it was the people who downloaded it who could've been compromised -- but it wasn't even a valid exploit against the OS we were running.

Or have all of the sysadmins in your group stop work for a day while we have a comprehensive scan of all of our machines by the security group, because someone on the security auditing group noticed that a machine on our network sent out a request to some random webserver in the middle of the night, and then there was a connection attempted back to that machine and another one on our network. ... but what they failed to mention was that the connection back was from a completely different IP range, and they had selectively filtered what they were looking for -- so the incoming connections were attempted against *all* machines on our network, and not a sign that someone was being selective in their attempts and cause for concern ... and the 'middle of the night' just meant 'before we got in this morning', but we have folks who have to work earlier shifts depending on when we get assigned antenna time to talk to the spacecraft.

... it makes the people who e-mail convinced that NASA's hiding evidence of the existence of alien life seem reasonable by comparison.*

So I actually *do* have a stake in validating what we use as inputs. Other people might not, but I do my best to avoid a DOS from our security group.**

-Joe

* They don't like that we get highly compressed data for 'space weather' purposes, and we replace them with a higher-quality image once it's been downloaded through a higher bandwidth link. They also seem convinced that a compression artifact must be at the same distance from us as the sun for their size and speed calculations, rather than highly energetic particles right at the telescope.

** I've got other stories, too ... but I thought I'd keep it to only the ones that actually affected me.
Re: [CODE4LIB] Q: Discovery products and authentication (esp Summon)
On Oct 24, 2012, at 2:40 PM, Jonathan Rochkind wrote:

On 10/24/2012 2:04 PM, Ben Florin wrote: We use Primo, but we've never bothered with their restricted search scopes.

Apparently the answer to my question is that nobody has thought about this before, heh. Primo, by default, will suppress some content from end-users unless they are authenticated, no? Maybe that's what restricted search scopes are? I'm not talking about your locally indexed content, but about the PrimoCentral index of scholarly articles. At least I know the Primo API requires you to tell it if end-users are authenticated or not, and suppresses some results if they are not. I assume Primo's 'default' interface must have the same restrictions? Perhaps the answer to my question is that at most discovery customers, off-campus users always get the 'restricted' search results, have no real way to authenticate, and nobody's noticed yet!

Do they even get a message that they've been restricted? I would think that having a message such as:

    74 records not shown because you weren't authenticated

would be enough to spur most folks to log in.

What I hate is when you do a search for something that you *know* should be there, and it's not ... then you find out that they're using IP range or DNS matching, and not telling the user that they've intentionally hid stuff.

I think I've gotten most of the stuff straightened out with our local library, but I have no way of knowing for sure. (my desktop machine's hostname doesn't resolve in the 'gsfc.nasa.gov' domain, and it's not on the most common network here ... so most systems' test for 'is this a local person' fail, and I get treated as an outsider ... I actually get better service using my personal laptop on the wireless network for visitors)

-Joe
Re: [CODE4LIB] Event Registration System Question
On Oct 18, 2012, at 6:08 PM, Brian McBride wrote:

Greetings! I was wondering if anyone out there has found or knows of a good open source solution for event scheduling? We would need users to be able to register, allow instructors to set enrollment caps, and basic email reminder functions. Any information would be great!

I know there have been a lot of suggestions already, but I'd have to ask what the scope of the system is. Most of the ones that I know of are for conferences; you effectively need to stand up a new instance of the software for each event. Some are designed for hosting purposes (eg, the perl ACT software), and may have features like using a single registrant table, so that you don't have to set up a new login for each conference. (in the case of ACT, it also allows you to see what other conferences someone's attended ... but there's a separate 'user page' for each conference which shows which sessions they're planning to attend, so it's most likely not 100% what you'd want)

For a library system, particularly an academic one, I'd assume that this isn't for a single event, but for lots of events (eg, there's an 'intro to (excel)' class on the (first tuesday) of the month, but the instructors may change). If this is what you're looking for, it might be easier to look into class or room scheduling software for schools, and add whatever additional functionality you might need.

... and conveniently, when searching for 'open source room scheduling software', a code4lib journal article popped up: http://journal.code4lib.org/articles/2941

You also get quite a few hits for 'open source classroom scheduling software', which may have more of the features you're looking for (eg, managing the individual class registrations vs. just managing the room allocation) ... but of course, search engine hits don't actually mean they're necessarily good, just that they exist, so it's probably worth explaining what you're looking for, so that the other folks on the mailing list can give recommendations.

-Joe
Re: [CODE4LIB] email to FTP or something?
On Oct 17, 2012, at 11:46 AM, Nate Hill wrote:

Maybe someone can offer me a suggestion here... I bought a nifty new gadget that records data and spits out csv files as email attachments. I want to go from csv to MySQL and build a web application to do cool stuff with the data. The thing is, the device can only email the files as attachments, it doesn't give me the ability to upload them to a server. Can anyone suggest how I can securely email a file directly to a folder on a server? The scenario is nearly identical to what is described here: http://www.quora.com/How-can-I-upload-to-an-FTP-site-via-email

It depends on whether you're hosting the mail server or not. If you are, and it's a unix box, you change your .forward file to pipe into a program to do the processing, eg:

    |/path/to/program

If you're already using procmail for local mail delivery, you can do more complex things with a .procmailrc file (eg, only pass along to the processing program messages that match certain characteristics): http://www.procmail.org/

If you're not hosting your own mail server, you might be able to cobble something together with fetchmail, which retrieves mail from IMAP or *POP* services and then processes it for local delivery: http://www.fetchmail.info/

-Joe
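To make the .procmailrc route concrete, a minimal sketch (the gadget's From address and the script path are invented; the Content-Type test is a crude stand-in for real attachment handling):

    PATH=/usr/bin:/usr/local/bin
    MAILDIR=$HOME/Mail
    LOGFILE=$MAILDIR/procmail.log

    # only messages from the gadget that look like they carry an
    # attachment get piped to the processing script; everything
    # else falls through to normal delivery
    :0
    * ^From:.*gadget@example\.com
    * ^Content-Type:.*(csv|mixed)
    | /usr/local/bin/process-csv-attachment

The script on the receiving end of the pipe gets the whole message on stdin, so it still has to pull the attachment out (MIME::Parser would be one way) before loading it into MySQL.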
Re: [CODE4LIB] email to FTP or something?
On Oct 17, 2012, at 12:15 PM, Cary Gordon wrote:

The securely part is a gotcha. I would venture a guess that whatever the gadget does to produce emails doesn't include encryption or key verification.

What do you qualify as 'securely'? You scan the message attachment to make sure it's valid, process it, and then either put it in place (if local) or scp it over to the server that's doing the hosting.

If you're concerned about the e-mail itself being insecure, then you have to look into what protocols the appliance supports. If it does ASMTP (Authenticated SMTP) over TLS, then you're fine: http://www.ietf.org/rfc/rfc2554.txt

If it doesn't, well, then you set up a local mail relay that's firewalled off so that only the appliance can talk to it, and have that one do the processing / transfer.

... We used to use these sorts of things at the university where I used to work. One would process the class schedules (generated as a nightly report from the registration system), and make a series of pages for gopher (later modified to generate HTML). Another was used so that authorized users could modify the 'university status' message (eg, closed due to snow) years before there were protocols such as webdav. It's also quite useful for generating status pages based on cronjob messages.

-Joe

On Wed, Oct 17, 2012 at 9:05 AM, Joe Hourcle onei...@grace.nascom.nasa.gov wrote:

On Oct 17, 2012, at 11:46 AM, Nate Hill wrote: Maybe someone can offer me a suggestion here... I bought a nifty new gadget that records data and spits out csv files as email attachments. I want to go from csv to MySQL and build a web application to do cool stuff with the data. The thing is, the device can only email the files as attachments, it doesn't give me the ability to upload them to a server. Can anyone suggest how I can securely email a file directly to a folder on a server? The scenario is nearly identical to what is described here: http://www.quora.com/How-can-I-upload-to-an-FTP-site-via-email

It depends on whether you're hosting the mail server or not. If you are, and it's a unix box, you change your .forward file to pipe into a program to do the processing, eg:

    |/path/to/program

If you're already using procmail for local mail delivery, you can do more complex things with a .procmailrc file (eg, only pass along to the processing program messages that match certain characteristics): http://www.procmail.org/

If you're not hosting your own mail server, you might be able to cobble something together with fetchmail, which retrieves mail from IMAP or *POP* services and then processes it for local delivery: http://www.fetchmail.info/

-Joe

--
Cary Gordon
The Cherry Hill Company
http://chillco.com
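And a sketch of that 'scan it, then push it' step as a shell handler, assuming it's given the saved attachment as its argument (the hostname, destination path, and expected CSV header row are all invented):

    #!/bin/sh
    f="$1"

    # cheap validation before the file goes anywhere: does the first
    # line look like the header row the gadget is supposed to emit?
    head -1 "$f" | grep -q '^timestamp,reading' || exit 1

    # push the vetted file to the web host over ssh
    scp "$f" webhost.example.org:/var/www/data/incoming/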
[CODE4LIB] Job : Senior Software Engineer (mostly Perl / SOAP work)
For those of you who saw Mitzi's job announcement, but are more of a backend person rather than a web developer*, my group has a job opening on the other side of the building**, writing connectors for the Virtual Solar Observatory, a distributed federated search system for solar physics data: http://www.sesda3.com/careers/ss062-senior-software-engineer/

The quick summary of the main task: Most of the existing system's in Perl, using SOAP::Lite. Most of the catalogs are in MySQL or PostgreSQL. Much of the issues are reconciling data models, so having a physics or other science background is useful.

Pros:

- Pretty laid back environment.
- Working for NASA.
- Learn about the sun.
- Working with interesting people.

Cons:

1. Can be aggressively laid back if you don't conform (I was threatened with bodily harm in my first week if I continued to wear ties, even though they featured cartoon characters ... I still don't understand how someone couldn't appreciate a Dogbert tie). And it's only laid back in some regards; anything that might affect a spacecraft or human spaceflight is taken *really* seriously; men with guns have been known to show up and seize machines when we have security breaches.

2. Trying to explain to your grandmother the difference between working for a contractor at a NASA center, and actually directly working as a civil servant.

3. Dealing with bureaucratic rules that make no sense (which our boss does his best to shield us from) and having to do tons of extra work when Congress threatens to shut down the government (see con #2).

4. Hour-long phone calls with your grandmother explaining that no, the sun is not going to blow up this year, and how unrealistic it is that the Mayans were able to pinpoint to a specific day more than a millennium ago when we can't be sure if it's going to rain next Tuesday.

5. Interesting people occasionally involves scientists who are convinced their PhD makes them an expert in *everything* including your job (see http://xkcd.com/793/) ... and some of them write code that you have to interface with.

6. You'd have to work with me.

I can answer questions about the work that needs to be done, the group you'd work with, stuff like that. Everything else has to go through ADNET HR. (I couldn't even tell you about the benefits, as I work for one of the sub-contractors)

-Joe

* Although, I wouldn't mind a web developer; our site's been in need of some work for years, but that's another long story. Those skills were in the 'preferred' list that I was told that I should not have titled 'minion wishlist'.

** ie, Goddard Space Flight Center, Greenbelt, MD. But we're a little more relaxed in that we'll accept U.S. citizens *or* permanent residents.

- Joe Hourcle
  Programmer/Analyst
  Solar Data Analysis Center
  Goddard Space Flight Center
Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
On Sep 4, 2012, at 10:48 AM, Matthew LeVan wrote:

It's like a google search challenge! Looks like they changed their student home link patterns... http://home.ubalt.edu/nicole.kerber/idia642/Final_Usability_Report.pdf

That's a challenge? http://www.google.com/search?q=Final_Usability_report.pdf+site:ubalt.edu

(although my normal first step would've been archive.org, but they didn't have it in their cache)

-Joe

On Tue, Sep 4, 2012 at 10:44 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

Hi helpful code4lib community, at one point there was a report online at: http://student-iat.ubalt.edu/students/kerber_n/idia642/Final_Usability_Report.pdf

David Walker tells me the report at that location included findings about SFX and/or other link resolvers. I'm really interested in reading it. But it's gone from that location, and I'm not sure if it's somewhere else (I don't have a title/author to search for other than that URL, which is not in google cache or internet archive). Is anyone reading this familiar with the report? Perhaps one of the authors is reading this, or someone reading it knows one of the authors and can put me in touch? Or knows someone likely in the relevant dept at ubalt and can put me in touch? Or has any other information about this report or ways to get it? Thanks!

Jonathan
Re: [CODE4LIB] looking for an application to handle a large amount of redirects
On Aug 30, 2012, at 2:17 PM, John A. Kunze wrote:

If you run Apache server at the old location, and the original links and new links obey one or a few regular patterns, you could use one or a few RedirectMatch directives. If there are few patterns, you could use a big enumerated list of simple Redirect directives. An ordinary Apache server can easily be loaded with a million directives; a little slow on 'restart', but very short redirect times when it comes up. If you need more than that, you could look into installing a noid resolver. (google CPAN noid)

Yet another alternative in Apache, when you're dealing with a larger number of items to redirect (and it's not simple directories moving about), is mod_rewrite's RewriteMap: http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewritemap

-Joe

---

On Thu, 30 Aug 2012, Pottinger, Hardy J. wrote:

Hi, we're in the process of migrating an existing digital library to a new platform, and we want to ensure that old URLs continue to resolve to the items in the new location. The new digital library will be built on Islandora, and I am pretty sure we can just map old URLs to new ones within Fedora Commons. But, in case we run into trouble, I was wondering if anyone might have some experience with an application that's more specific to our use case? Or, heck, if you have already migrated from a DLXS-based digital library to an Islandora-based digital library, and have already sorted out how to handle redirects, I'd love to hear from you. Thanks!

--
HARDY POTTINGER pottinge...@umsystem.edu
University of Missouri Library Systems
http://lso.umsystem.edu/~pottingerhj/
https://MOspace.umsystem.edu/
Debug only code. Comments lie.
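For illustration, a minimal sketch of the RewriteMap approach (Apache 2.2 syntax, matching the docs linked above; the map location and URL shapes are invented, and RewriteMap has to live in the server/vhost config, not .htaccess):

    RewriteEngine on

    # key/value file: one "old-id new-path" pair per line
    RewriteMap oldlinks txt:/etc/apache2/oldlinks.map

    # look the old id up in the map; fall back to /gone if it's missing
    RewriteRule ^/dl/item/(.+)$ ${oldlinks:$1|/gone} [R=301,L]

For very large maps, the dbm: map type avoids re-scanning a big text file on every request.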
Re: [CODE4LIB] Maker Spaces and Academic Libraries
On Aug 28, 2012, at 9:07 AM, Emily Lynema wrote:

I find this conversation interesting, mostly because the why do it reasons given parallel so closely what we are working on at NC State in our new library building. Except it doesn't have anything to do with makerspaces! Our emphasis is on taking expensive visualization and high performance computing capacity and making it available to students all across our campus. Some would ask why we are building massive visualization walls and working on creating a cloud computing environment where anyone can request temporary access to high performance computing in order to build stuff to render on the visualization walls. And it's just the same as the reason given for doing makerspaces in academic libraries: while faculty on fancy grant projects have access to high performance computing nodes, nowhere on campus is this kind of computing and visualization openly available for undergraduate students to creatively use. It's neat to see the different directions we go with the same underlying reason.

And in that regard (high performance computing), I heard an interesting story from someone who I think was from the JHU Physics dept. a year or so ago -- basically, all of the professors were building their own personal beowulf clusters (getting the money as either part of their condition on hire, or using grant money to buy them), which caused a number of problems:

1. They weren't experts, so it'd take them a while to set up.

2. They typically didn't secure them properly, so they'd get hacked, and they had to take them down, and often didn't get them back up for many months -- up to a year from original purchase 'til it was finally running at full tilt. (ie, it had already depreciated by a year)

3. So many clusters were built that it overloaded the electrical in the building, and the whole building lost power.

... So there really are some benefits to having a centralized cluster that the faculty can submit jobs to, rather than all of the little ones.

The visualization stuff may be even more useful, as they're quite uncommon. Besides some of the 'hiperwall' and 'cave' systems, there was a project from one of the Harvard libraries on using a Microsoft Surface (the table, not the yet-to-be-released tablet) for working with huge images (telescope data, hi-res scans, etc.): http://projects.iq.harvard.edu/harvardux/

-Joe
Re: [CODE4LIB] Corrections to Worldcat/Hathi/Google
On Aug 28, 2012, at 12:05 PM, Galen Charlton wrote:

Hi, On 08/27/2012 04:36 PM, Karen Coyle wrote: I also assumed that Ed wasn't suggesting that we literally use github as our platform, but I do want to remind folks how far we are from having people friendly versioning software -- at least, none that I have seen has felt intuitive. The features of git are great, and people have built interfaces to it, but as Galen's question brings forth, the very *idea* of versioning doesn't exist in library data processing, even though having central-system based versions of MARC records (with a single time line) is at least conceptually simple.

What's interesting, however, is that at least a couple parts of the concept of distributed version control, viewed broadly, have been used in traditional library cataloging. For example, RLIN had a concept of a cluster of MARC records for the same title, with each library having their own record in the cluster. I don't know if RLIN kept track of previous versions of a library's record in a cluster as it got edited, but it means that there was the concept of a spatial distribution of record versions, if not a temporal one. I've never used RLIN myself, but I'd be curious to know if it provided any tools to readily compare records in the same cluster, and if there were any mechanisms (formal or informal) for a library to grab improvements from another library's record and apply it to their own.

As another example, the MARC cataloging source field has long been used, particularly in central utilities, to record institution-level attribution for changes to a MARC record. I think that's mostly been used by catalogers to help decide which version of a record to start from when copy cataloging, but I suppose it's possible that some catalogers were also looking at the list of modifying agencies (library A touched this record and is particularly good at subject analysis, so I'll grab their 650s).

I seem to recall seeing a presentation a couple of years ago from someone in the intelligence community, where they'd keep all of their intelligence, but they stored RDF quads so they could track the source. They'd then assign a confidence level to each source, so they could get an overall level of confidence on their inferences.

... it'd get a bit messier if you have to do some sort of analysis of which sources are good for what type of information, but it might be a start.

Unfortunately, I'm not having luck finding the reference again. It's possible that it was in the context of provenance, but I'm getting bogged down in too many articles about people storing provenance information using RDF-triples (without actually tracking the provenance of the triple itself).

-Joe

ps. I just realized this discussion's been on CODE4LIB, and not NGC4LIB ... would it make sense to move it over there?
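To make the quad idea concrete: in N-Quads, each statement carries a fourth term naming the graph it belongs to (here, the source), so two sources can assert conflicting values for the same field and both are kept, each one attributable (the URIs below are invented for illustration):

    # same record, same predicate, two sources -- both survive
    <http://example.org/rec/123> <http://purl.org/dc/terms/title> "A History of Libraries" <http://example.org/src/libraryA> .
    <http://example.org/rec/123> <http://purl.org/dc/terms/title> "The History of Libraries" <http://example.org/src/libraryB> .

A per-source confidence score can then be attached to each graph, and rolled up into a confidence on anything inferred from statements in that graph.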
Re: [CODE4LIB] Maker Spaces and Academic Libraries
On Aug 27, 2012, at 9:44 AM, BWS Johnson wrote:

Salvete! Can't. Resist. Bait. Batman.

Can anyone on the list help clarify for me why, in an academic setting, this kind of equipment and facility isn't part of a laboratory in an academic department?

I'd say that I hate to play devil's advocate, but that would be a patent misrepresentation of material fact. Conversely, could you please tell us why you think it *shouldn't* be at the Library?

I can think of one reason they shouldn't be *anywhere*: liability.

When I was working on my undergrad, in civil engineering, the university's science and engineering school had their own machine shop. Officially, you were only supposed to use it if you were a grad student, or supervised by a grad student. Yet, there were a number of us (the undergrad population) who had more experience than the grad students. (I had done a couple years of shop class during high school, one of the other students had learned from his father who worked in the trade, another was going back to school after having been a professional machinist for years, etc.). So well, I know at least two of us would go down and use the shop without supervision. (and in a few cases, all alone, which is another violation when you're working at 1am and there's no one to call for medical assistance should something go really, really wrong).

And in some cases, we'd teach the grad students who were doing stuff wrong (trying to take off too much material in a pass, using the incorrect tools, etc.). But I made just as many mistakes. (when you're in a true machine shop, and there's two different blades for the bandsaw with different TPI, it's not that one's for metal and one's for wood ... as they don't do wood cutting there ... but I must've broken and re-welded the blade a half dozen times and gone through a quart of cutting fluid to make only a few cuts, as I didn't realize that I should've been using the lower TPI blade for cutting aluminum)

I admit I don't know enough about these 'maker spaces' ... I assume there'd have to be some training / certification before using the equipment. The other option would be to treat it more like a print shop, where someone drops off their item to be printed, and then comes back to pick it up after the job's been run.

And it's possible that you're using less dangerous equipment. (eg, when in high school, my senior year we got a new principal who required that all teachers wear ties ... including the shop teachers. Have you ever seen what happens when a tie gets caught in a lathe or a printing press? He's lucky the teachers were experienced, as a simple mistake could've killed them) But even something as simple as a polishing/grinding wheel could be a hazard to both the person using it and anyone around them. (I remember one of my high school shop teachers not happy that I was so aggressive when grinding down some steel, as I was spraying sparks near his desk ... which could've started a fire)

... so the whole issue of making sure that no one gets injured / killed / damages others is one of the liability issues, but I also remember when I worked for the university computer lab, we had a scanner that you could sign up to use. One day, one of the university police saw what one of the students was doing, and insisted that we were allowing students to make fake IDs. (the student in question had scanned in a CD cover, which was a distorted driver's-license-looking thing ... if he was trying to make a fake ID, you'd think he'd have started from a genuine ID card)

As we've now got people who are printing gun receivers, there's a real possibility that people could be printing stuff that might be in violation of the law. (I won't get into the issue of if it's a stupid law or not ... this is something the legal department needs to weigh in on). And conversely, if you're a public institution and you censor what people are allowed to make, then you get into first amendment issues.

...

On a completely unrelated note, when I first saw the question about libraries maker spaces, I was thinking in the context of public libraries, and thought the idea was pretty strange. I see a much better fit for academic libraries, but I'm still not 100% sold on it. In part, I know that it's already possible to get a lot of stuff 'made' at most universities, but you risk treading on certain trades' toes, which could piss off the unions. Eg, we had a sign shop who had some CNC cutters for sheet goods (this was the mid 1990s), carpenters and such under the building maintenance, large scale printing and book binding through the university graphics department (they later outsourced the larger jobs, got rid of the binding equipment). I could see the equipment being of use to these groups, but I don't know that they'd be happy if their lack of control over being able to make money by charging for
[CODE4LIB] Software/service to deal with matching up incomplete DVD/CD sets.
So yesterday, I noticed a question on the libraries & information science stack exchange site on dealing with TV series ... which led me to post a question about dealing with trying to match up libraries with incomplete sets of multi-disk packages: http://libraries.stackexchange.com/q/1051/62

So far, the only response has been from someone who said that they use a shared Google docs file for this. I'm thinking that some software to better manage this could be useful to library consortia, multi-branch systems, etc. So, a few questions for this group:

1. Does anyone know of software specifically designed for doing this? (if so, you can probably just answer it on the site)

2. Can anyone suggest existing software that might be able to be repurposed to handle this? (I've never used the various commercial DVD/book swapping sites, but I'm guessing it'd be a similar approach ... although maybe make it specifically track by ISBN, so we don't get a 'special edition' mixed in with a 'regular' edition, or widescreen vs. full screen)

3. Would anyone be interested in helping to build it? (my time's rather scarce at this time ... if I manage to lose the election for AGU ESSI secretary, I might get a little time back, but once the new year rolls around, I'm going to barely be keeping my head above water ... I *am* willing/able to fund the hosting service and such, though) (I guess just reply directly to me for this one)

4. And to judge demand -- would people be interested in using it if it did exist? ... if so, let me know, as I'd need to spec out what the requirements are. (eg, if it should be individual instances for different library systems, one big system open to all (with some confirmation the registered users work for libraries), or some larger system w/ rules set by the offerer on who they'll share with (only in this state, only in my consortia, etc.)

-Joe

(I'd attach my .sig, but this really has nothing to do with my day job ... although, there was that proposal for a 'tool exchange' at NASA that won the whitehouse SAVE award last year, and it could be construed as a similar concept ... it just won't help the local library, as they're dropping all of their physical items)
Re: [CODE4LIB] Intuitive Dual Boot on Mac (Mountain Lion)
On Aug 17, 2012, at 2:50 PM, Ingersoll, Ryan wrote:

Hi everyone, I am imaging and configuring 20 MacBook Pros for student check out. They will have the option to dual boot (Mountain Lion or Windows 7). I am looking for an intuitive way to inform how to boot to Windows. I was thinking of a desktop background once they log in to the Mac side, but that doesn't seem the friendliest or quickest (though logging in to the Mac is significantly faster compared to Windows). I really don't want to tape instructions to the physical computer either. Is it possible to tweak the login screen?

You can add a message, if that's what you're looking for. It looks like it's gotten easier in more recent versions of the OS: http://www.macobserver.com/tmo/article/os_x_lion_adding_custom_messages_to_the_login_window/

Previously, you'd have to go and edit the loginwindow.plist file: http://hints.macworld.com/article.php?story=20020921074429845

- Joe Hourcle
  Programmer/Analyst
  Solar Data Analysis Center
  Goddard Space Flight Center
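The Lion-era mechanism the first link describes boils down to one command (run as an admin; the message text here is just an example for the dual-boot case):

    sudo defaults write /Library/Preferences/com.apple.loginwindow LoginwindowText "To boot into Windows 7, restart and hold down the Option key."

Deleting the LoginwindowText key with 'defaults delete' removes the message again, which makes it easy to script into the imaging workflow.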
Re: [CODE4LIB] Browser Wars
On Jul 12, 2012, at 3:39 PM, Cary Gordon wrote:

It is almost worth getting an iPad just to see all the clueless messages. Borrow one and try some restaurant sites. The restaurant business seems to have the absolute worst relationship between what they spend and the usefulness of what they get. I understand and respect your view, but still contend that regardless of the reason that someone is using IE 6, they have certainly had enough time to figure it out by now. The only way that IE6 users will have a good experience is if you build a site for them.

iPads are especially annoying because there are so many websites that must've been re-written to support iPhones, but haven't had any updates, so they insist on redirecting you to the 'mobile' version of the website. Which often goes something like: http://xkcd.com/869/

I actually have fewer problems on my WebOS phone, as no one bothered to write specifically for it. (or be smart, and ask for the resolution or window size, and deal with things that way ... or even use a CSS sheet with '@media handheld')

...

As for IE6, one of the many arguments against supporting it is that by catering to people who are still using 12-year-old web browsers, you're keeping them from upgrading to a more secure browser. Now, ideally, you don't make pages that are completely useless without plugins and javascript and whatever turned on ... but we shouldn't be forced to make it pretty for 'em.

...

And scripting languages (javascript/ecmascript/whatever they want to call it) that are intended to be used across platforms, without knowing what version it's going to be run from, need to have some way of asking 'hey, do you support (x)', rather than all of the assumptions based on the browser string (which in my case, is often a lie, specifically because of those sites that make bad assumptions), and they may have no idea what stuff I've specifically limited in my security preferences.

-Joe (who complains every year when I have to re-take the annual security training that won't work unless I (1) allow pop-ups, (2) allow plug-ins and (3) allow java)
Re: [CODE4LIB] LoC job opening ???
On Jul 9, 2012, at 2:04 PM, Chris Fitzpatrick wrote:

This just seems like some sort of trap. The fact that it's a craigslist ad in all caps makes me pretty sure this person is working on a librarian centipede in their basement.

If that were the case, I think they'd also accept applicants from the Folger Shakespeare Library, which may actually be closer. So, the real question is why it must specifically be federal employee librarians. (and I don't know of any librarians with TS/SCI/Poly ... but I *have* heard that some of the archivists at the National Archives do, but that was a 'my son is fed up with his job' story from a librarian at my local public branch)

-Joe

On Jul 9, 2012 7:56 PM, Simon Spero sesunc...@gmail.com wrote: On Jul 9, 2012 1:27 PM, Joshua Gomez jngo...@gwu.edu wrote: WE NEED A CAT LOVER WHO IS ALSO A FEDERAL EMPLOYEE TO DO THIS JOB! Must have active TS/SCI clearance with FS Poly. All applicants must complete the attached 20 page KSA.
Re: [CODE4LIB] LoC job opening ???
On Jul 9, 2012, at 3:00 PM, Joseph Montibello wrote:

Um, did LC just stop referring to Library of Congress?

http://www.acronymfinder.com/LC.html

The closest that I can come to having the paragraph all make sense is 'low carb', but the 'pay is lousy' doesn't work for it.

-Joe

But doing the LC thing isn't as bad as it sounds ... I did it for a few months when I first got out of school. The pay is lousy, but you do get pretty nice benefits (although it's hard to find a dentist that will actually see you when you're in that condition).
Re: [CODE4LIB] Storing lat / long
On Jun 28, 2012, at 3:46 PM, Matthew LeVan wrote:

I'd think it would depend on what you plan to do with the coordinates once you have them stored. If you intend to do anything at all complicated (spatial queries, KML generation, your own custom maps, area/volume calculations), you might want to consider a spatial database extension (http://en.wikipedia.org/wiki/Spatial_database). I've used the SQLite SpatiaLite and Postgres PostGIS extensions, and they're fairly straightforward to set up.

Agreed. If you're going to be searching on them (places w/in 50 miles of (x), closest to (y)) ... spatial database extensions are the way to go.

If you're just going to be returning them for display, it probably doesn't matter so much, but odds are someone in the future is going to ask about it. (and that being said; I store two copies of most anything coordinate or unit related ... one for searching that's well normalized, and one for display purposes ... database normalization be damned)

-Joe
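As a sketch of the 'places w/in 50 miles of (x)' case with PostGIS, from Perl DBI (the table, columns, and credentials are all invented; 50 miles is roughly 80,467 meters):

    use strict;
    use warnings;
    use DBI;

    my ( $lon, $lat ) = ( -76.85, 38.99 );    # hypothetical search point

    my $dbh = DBI->connect( 'dbi:Pg:dbname=catalog', 'dbuser', 'secret',
                            { RaiseError => 1 } );

    # ST_DWithin on geography types measures distance in meters
    my $sth = $dbh->prepare(q{
        SELECT id, name
          FROM places
         WHERE ST_DWithin( geom::geography,
                           ST_SetSRID( ST_MakePoint(?, ?), 4326 )::geography,
                           80467 )
    });
    $sth->execute( $lon, $lat );    # note: PostGIS points are (lon, lat)

    while ( my ( $id, $name ) = $sth->fetchrow_array ) {
        print "$id\t$name\n";
    }

(this is also where the 'normalized copy for searching' pays off: the geom column can be indexed with GiST, while whatever display form you keep stays untouched)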
Re: [CODE4LIB] Best way to process large XML files
On Jun 8, 2012, at 2:36 PM, Kyle Banerjee wrote:

I'm working on a script that needs to be able to crosswalk at least a couple hundred XML files regularly, some of which are quite large. [trimmed] How do you guys deal with large XML files? Thanks,

um ... I return ASCII tab-delim records, because IDL's XML processing routines have some massive issue with garbage collection if you walk down the DOM tree. However, no one in their right mind uses IDL for XML, as it's basically Fortran w/ multi-dimensional arrays.

...

Everyone else is going to tell you to use SAX, and they're probably right, but as you sound as reluctant as I am about using SAX, another alternative may be Perl's XML::Twig: http://search.cpan.org/perldoc?XML::Twig

-Joe
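A minimal XML::Twig sketch of that middle ground -- stream the file, handle each record as it completes, then throw it away so memory stays flat (the element and field names here are invented; it also happens to emit tab-delim records):

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # called each time a complete <record> has been parsed
            record => sub {
                my ( $t, $elt ) = @_;
                print join( "\t", $elt->field('id'),
                                  $elt->field('title') ), "\n";
                $t->purge;    # free everything parsed so far
            },
        },
    );

    $twig->parsefile('big.xml');

You get DOM-style access to one record at a time, without ever holding the whole document, which is most of what people actually want from SAX anyway.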
Re: [CODE4LIB] viewer for TIFFs on iPad
On May 10, 2012, at 11:16 AM, Edward Iglesias wrote:

Hello All, I was wondering if any of you had experience viewing large ~300MB and up TIFF files on an iPad. I can get them to the iPad but the photo viewer is less than optimal. It stops enlarging after a while and I'm looking at Medieval manuscripts so...

Are there any other requirements? If it doesn't have to be actually on that machine, and you can interact with a webserver, you might want to consider converting it to JPEG2000, and then using a JPIP server to serve them.

The group here that's using it is only serving 16 megapixel images, but the advantage is that you can selectively send only the regions and detail as needed ... but you don't have to generate lots of tiles at different scaling: http://wiki.helioviewer.org/wiki/ESA_JPIP_Server

-Joe
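If you go the JPEG2000 route, the TIFF-to-JP2 conversion itself can be as simple as a one-liner; a sketch assuming OpenJPEG's encoder (check your build for the rate/tiling options you'd want for very large scans):

    opj_compress -i manuscript.tif -o manuscript.jp2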
Re: [CODE4LIB] Anyone using node.js?
On May 8, 2012, at 2:18 PM, Ross Singer wrote:

On May 8, 2012, at 2:01 PM, Ethan Gruber wrote: [trimmed] Thanks for the info. To clarify, I don't develop in java, but deploy well-established java-based apps in Tomcat, like Solr and eXist (and am looking into a java triplestore to run in Tomcat), and write scripts to make these web services interact in whichever language seems to be the most appropriate. Node looks like it may be interesting to play around with, but I'm wary of having to learn something completely new, jettisoning every application and language I am experienced with, to put a new project into production in the next 4-8 weeks.

Eh, if your window is 4-8 weeks, then I wouldn't be considering node for this project. It does, however, sound like you could really use a new project manager, because the one you have sounds terrible.

But project managers don't 'add value' unless they actually do something. If they just let you do things the way that you've done in the past, even if they worked, they could be replaced by any other project manager who knew enough not to micro-manage things.

And, if you actually managed to do the project on time, with them staying mostly hands-off, what does that tell people? That they're not needed ... they need a project that's going to hell, so they can step in and 'fix' stuff.

-Joe

ps. and besides the obvious 'this is not the opinion of my employer, and may or may not be sarcasm' disclaimer, I've had a few instances where there was a not-quite-as-tight deadline and I had to learn something new ... but they footed the bill for sending me to a week of training

pps. in all seriousness -- I know of someone who pulled crap like this, and then used it as a reason to fire the developer and replace them with one of the PM's friends who had the 'needed' skills ... then another instance where an outside consultant did a 'peer review' of our system 2 weeks before we were supposed to go live, and then somehow got a contract to design and build a different system which took a year and cost the university $250k? $500k?, but he never delivered (hardware was shipped w/ empty drive arrays) ... so I might be a little more jaded than most in this scenario. (but neither of those two anecdotes were at my current employer)
Re: [CODE4LIB] possible new stackexchange site for Digital Preservation
On Apr 26, 2012, at 12:26 PM, Nada O'Neal wrote:

I haven't seen the proposed new Stackexchange digital preservation site: http://area51.stackexchange.com/proposals/39787 mentioned on code4lib yet. I'm sure most of you have turned to Stack Overflow in your darkest hours of need, so if you think you might like such a site specifically geared towards Digital Preservation, please take a look. The proposal is currently in the commitment stage and needs about 900 more committers to make it to the next stage.

It was mentioned yesterday, but it doesn't need 900 more 'committers'. If you click on the 'more info' near the 11% commitment score:

The commitment score is the minimum of three scores:
  56% -- 112/200 committers in total
  11% -- 11/100 committers with 200+ rep on any other site
  40% -- commitment score, based on committers' activity on all other sites and how old the commitment is

So ... yes, we need another 88 people to commit ... but what's going to be harder to get is the 100 committers with 200+ rep (as evidenced by the 'Libraries' proposal, which has dragged on for so long that the folks at Stack Exchange renamed it to 'Library and Information Science', incorrectly thinking that it'd be broadening the category: http://area51.stackexchange.com/proposals/12432/)

Now, the important thing is that the 'any other site' is specifically 'Stack Exchange 2.0' sites, which means that Unshelved Answers, even though it was a 'Stack Exchange' site, *does* *not* count. It must be one of the sites listed at: http://stackexchange.com/sites

And it's really not that hard ... ask a few good questions (make sure they're not a duplicate, or they'll mark you down), or answer some questions, and you'll get voted up. Now, the thing is, some of the larger sites get so many questions that fewer people are going to look at them unless you make it really intriguing (which could get it marked down and closed as subjective). So, I'd recommend sticking with some of the smaller sites, including these that haven't yet graduated out of 'beta'. For example, likely relevant for those on here, being an intersection of MLS folks and programmers:

Databases : http://dba.stackexchange.com/
Drupal : http://drupal.stackexchange.com/
Wordpress : http://wordpress.stackexchange.com/
User Experience : http://ux.stackexchange.com/
Graphic Design : http://graphicdesign.stackexchange.com/
Unix / Linux : http://unix.stackexchange.com/
Apple : http://apple.stackexchange.com/
Ubuntu : http://askubuntu.com/
English Language : http://english.stackexchange.com/
Linguistics : http://linguistics.stackexchange.com/
Project Management : http://pm.stackexchange.com/
Academia : http://academia.stackexchange.com/

eg, Is there any world-wide ranking of conferences/journals? : http://academia.stackexchange.com/questions/1199/ or Preprint services other than arXiv (for other fields) : http://academia.stackexchange.com/questions/84/

(don't bother with Literature -- it's going to be culled)

And of course, the original three:

programmer questions : http://stackoverflow.com/
sysadmin questions : http://serverfault.com/
other computer users : http://superuser.com/

So, and for advice on getting reputation ... writing good answers tends to be the best way to go, but you want to:

- Format it clearly. (bulleted lists are your friend; they use MarkDown, but there's an editor to make it easy)
- Use good grammar / punctuation (minor ones, not so bad ... if it looks like you're being sloppy and didn't even try ... not so good)
- Cite authoritative sources when appropriate
- Give an answer, not just a link (eg, summarize, then cite the authority)
- Speak from a position of authority, and you're more likely to get voted up even when you're wrong ... a 'it might be (x)' or 'have you tried (x)?' isn't going to go as well as 'As you said (y), based on previous experience, there's a good probability of it being (x)'
- Don't be repetitive; if there's already a similar answer, you're better off commenting on that answer to improve it
- Answer quickly; most people look to see what they can answer when they first see a new question, and if there's already a good answer there, will vote it up ... two weeks later, not so much. (although, I find that I'll get sudden bursts of lots of old answers being voted up ... and I know that if someone gives an interesting answer, I'll look at what else they've posted, which often leads me to vote their stuff up)

If you're going to ask questions:

- Make sure it's not something that can be answered easily with a search on the internet.
- Select good 'tags' for it. (although, others may change the tags, but having good ones up front helps)

... and, I should add
Re: [CODE4LIB] crowdsourced book scanning
On Apr 25, 2012, at 1:36 PM, Michael Lindsey wrote: A colleague posed an interesting idea: patrons scan book pages to deliver to themselves by email, flash drive, etc. What if the scans didn't disappear from memory, but went into a repository so the next patron looking for that passage didn't have to jockey the flatbed scanner? * Patron scans library barcode at the scanner * The system says, I have these pages available in cache. o Patron's project overlaps with the cache and saves time in the scanning, or o Patron needs different pages, scans them and contributes to the cache Now imagine a consortium of some sort where when the patron scans the barcode, the system takes a hop via the ISBN in the record to reach out to a cache developed between a number of libraries. I know there are a number of cases where this may not apply, like loose-leaf publications in binders that get updated, etc. And I'm sure there are discussions around how to handle copyright, fair use, etc. Do we as a community already have a similar endeavor in place? It sounds like a great idea ... but I'm guessing that this is the sort of thing that Google got in trouble for, as they were storing copies of books. It might be that, as libraries, we have exemptions from copyright law that I'm not aware of, but I'm looking at Section 108 of Title 17 and I don't think it'd be allowed, or at the very least it would increase the library's liability. Per 108(g): (g) The rights of reproduction and distribution under this section extend to the isolated and unrelated reproduction or distribution of a single copy or phonorecord of the same material on separate occasions, but do not extend to cases where the library or archives, or its employee — (1) is aware or has substantial reason to believe that it is engaging in the related or concerted reproduction or distribution of multiple copies or phonorecords of the same material, whether made on one occasion or over a period of time, and whether intended for aggregate use by one or more individuals or for separate use by the individual members of a group; or ... -Joe
Re: [CODE4LIB] Help Start a Digital Preservation Stack Exchange QA Site
On Apr 25, 2012, at 3:36 PM, Owens, Trevor wrote: I and some other folks working in digital preservation are trying to get a Stack Exchange site focused on digital preservation launched. Here is the blurb defining the proposed site: Proposed QA site for librarians, archivists, curators, data managers, information specialists, computer scientists and engineers and other professionals working to ensure long term access to digital objects. If you would like to help get it launched just click the link and hit the commit button. At this point the biggest hurdle is getting people who already have at least 200 rep on other stack exchange sites to commit. So, if you have participated in any of the stack exchange sites it would be particularly awesome if you could commit. Also, if you know other folks that you think would be interested please consider sending the link along to them too. http://area51.stackexchange.com/proposals/39787/digital-preservation?referrer=anTT6XLk2hYl8-Pye4BdZw2 And because you need 200 rep on one of the other sites, you can commit to the proposal, and then find other stack exchange sites that you'd be interested in to try to get the 200 reputation necessary: http://stackexchange.com/sites (although, as a former moderator of the cooking site, I know that if they see people working together to bump up each other's reputation abnormally, they'll at the very least erase it all) ... and hopefully this won't turn into the 'Libraries' proposal that languished as they had 500+ committed, but only 80 w/ the necessary rep, and then was renamed to 'Libraries and Information Science' : http://area51.stackexchange.com/proposals/12432/libraries-information-science?referrer=xHuHFdj5_FDG1iedac--IA2 -Joe
Re: [CODE4LIB] monitoring wireless networks
On Apr 12, 2012, at 12:14 PM, Tara Robertson wrote: Hi, Is there an automated way of monitoring (and notifying) when a wireless network goes down? I'm looking for something like Nagios, but for wireless (or can Nagios do this too?) I don't manage our network--our ITS department does. They seem to think it's adequate that I'm the monitoring system but I'm finding this extremely frustrating. Nagios can monitor *anything* so long as you can write a script that'll get you some status back. If you have a command line way of getting signal strength for the network, that'd likely be best, but you could also just test to see if you can ping out on the right interface. -Joe
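ps. if you end up writing the check yourself, Nagios plugins are just 'print one status line, exit 0/1/2 for OK/WARNING/CRITICAL'. A minimal sketch, assuming there's some host that's only reachable over the wireless side (the hostname here is made up):

    #!/usr/bin/perl
    # check_wireless -- minimal Nagios plugin sketch: ping a host that
    # should only be reachable via the wireless network.
    use strict;
    use warnings;

    my $host = shift @ARGV || 'wireless-gw.example.edu';   # hypothetical

    # ping(1) exits non-zero if it got no replies back
    if ( system( 'ping', '-c', '3', '-q', $host ) == 0 ) {
        print "WIRELESS OK - $host answers pings\n";
        exit 0;    # Nagios: OK
    }
    print "WIRELESS CRITICAL - no response from $host\n";
    exit 2;        # Nagios: CRITICAL

Checking actual signal strength would be better, but that depends on what command-line tools your access points or controller give you.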
[CODE4LIB] DC / Baltimore Perl Workshop
Apologies in advance if you've already seen this from other mailing lists; I know we have a few Perl folks on here, but I don't know how many in the DC area. The DC Baltimore Perl Mongers groups are organizing a Perl workshop on Sat, April 14th in Catonsville, MD. We're still filling out the program schedule, but I thought I'd mention it as today's the last day for early registration ($25 vs. $50, although free for students and the unemployed) http://dcbpw.org/dcbpw2012/ -Joe
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
On Feb 27, 2012, at 10:51 AM, Godmar Back wrote: On Mon, Feb 27, 2012 at 8:31 AM, Diane Hillmann metadata.ma...@gmail.comwrote: On Mon, Feb 27, 2012 at 5:25 AM, Owen Stephens o...@ostephens.com wrote: This issue is certainly not unique to VT - we've come across this as part of our project. While the OAI-PMH record may point at the PDF, it can also point to an intermediary page. This seems to be standard practice in some instances - I think because there is a desire, or even requirement, that a user should see the intermediary page (which may contain rights information etc.) before viewing the full-text item. There may also be an issue where multiple files exist for the same item - maybe several data files and a pdf of the thesis attached to the same metadata record - as the metadata via OAI-PMH may not describe each asset. This has been an issue since the early days of OAI-PMH, and many large providers provide such intermediate pages (arxiv.org, for instance). The other issue driving providers towards intermediate pages is that it allows them to continue to derive statistics from usage of their materials, which direct access URIs and multiple web caches don't. For providers dependent on external funding, this is a biggie. Why do you place direct access URIs and multiple web caches into the same category? I follow your argument re: usage statistics for web caches, but as long as the item remains hosted in the repository direct access URIs should still be counted (provided proper cache-control headers are sent.) Perhaps it would require server-side statistics rather than client-based GA. I'd agree -- if you can't get good statistics from direct linking, something's wrong with the methods you're using to collect usage information. Google Analytics and similar tools might produce pretty reports, but they're really meant for tracking web sites, and they won't work when someone has javascript turned off or has specifically blacklisted the analytics server, or when you're serving anything that's not HTML. You *really* need to analyze the server logs directly, as you can't be sure that all access goes through the intermediate 'landing pages', or that it'd be tracked even if it did. ... I admit, the stuff I'm serving is a little different than what most people on this list serve, but we also have the issue that the collections are so large that we don't want people retrieving the files unless they really need them. We serve multiple TB per day -- I'd rather a person figure out if they want a file *before* they retrieve it, rather than download a few GB of data and find out it won't serve their purposes. It might not help our 'look how much we serve!' metrics to justify our funding, but it helps keep our costs down, and I personally believe it helps with good will in our designated community, as they don't spend a day (or more) downloading only to find it's not what they thought. (and it fits in with Ranganathan's 4th law better than saving them from an extra click does) -Joe
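ps. by 'analyze the server logs directly' I don't mean anything fancy -- a sketch of the idea (assumes Apache common/combined log format; the log path is a placeholder):

    #!/usr/bin/perl
    # Tally successful (2xx) GET requests per path from an Apache log --
    # the log already has every request, javascript or not.
    use strict;
    use warnings;

    my %count;
    open( my $log, '<', '/var/log/httpd/access_log' ) or die "open: $!";
    while ( my $line = <$log> ) {
        # common log format: host ident user [date] "request" status bytes
        next unless $line =~ m{"GET (\S+) HTTP/[\d.]+" (\d{3}) };
        $count{$1}++ if $2 =~ /^2/;
    }
    close $log;

    # most-requested items first
    for my $path ( sort { $count{$b} <=> $count{$a} } keys %count ) {
        printf "%6d %s\n", $count{$path}, $path;
    }

Real reporting tools will do a better job (sessions, byte counts, partial requests), but that's the gist.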
Re: [CODE4LIB] Transcription/dictation software?
On Feb 27, 2012, at 1:52 PM, Suchy, Daniel wrote: Hello all, At my campus we offer podcasts of course lectures, recorded in class and then delivered via iTunes and as a plain Mp3 download (http://podcast.ucsd.edu). I have the new responsibility of figuring out how to transcribe text versions of these audio podcasts for folks with hearing issues. I was wondering if any of you are using or have played with dictation/transcription software and can recommend or de-recommend any? My first inclination is to go with open-source, but I'm open to anything that works well and can scale to handle hundreds of courses. I remember seeing a poster on a wall at the University of Maryland presenting grant-funded work on this sort of thing ... but I think it was for intelligence intercepts, as it was DoD funded and being used for Arabic. This might've been the project: Global Autonomous Language Exploitation http://projects.ldc.upenn.edu/gale/index.html I have no idea why it's on a UPenn website, but it's listed at: http://ischool.umd.edu/content/research-and-projects And one of the researchers is Doug Oard, which matches what I remembered. It might've also been Supporting Information Access Using Computational Linguistics, which was also DoD funded, but doesn't have a website link in that list. And they didn't verify the links to faculty pages, so try one of the links to 'Douglas Oard' rather than 'Douglas Ward' if you want to contact him. I also don't know if they were doing full transcription / translation, or if they were just looking for specific words to alert a human translator to review it. ... Also, in the earlier list that Todd linked to, Zooniverse was mentioned. They have a framework for Mechanical Turk-type stuff, but they tend to be science oriented, and I don't know if they've ever done audio transcription. It's not exactly what they deal with, but they might be interested in helping, as at the 2010 DCC, someone said they had the problem of not enough work for their volunteers to do. (although, that might've changed since then) https://www.zooniverse.org/researchers -Joe
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
On Feb 26, 2012, at 9:42 AM, Godmar Back wrote: May I ask a side question and make a side observation regarding the harvesting of full text of the object to which an OAI-PMH record refers? In general, is the idea to use the dc:source/text() element, treat it as a URL, and then expect to find the object there (provided that there was a suitable dc:type and dc:format element)? Example: http://scholar.lib.vt.edu/theses/OAI/cgi-bin/index.pl allows the harvesting of ETD metadata. Yet, its metadata reads:
    <ListRecords>
      <metadata>
        <dc>
          <type>text</type>
          <format>application/pdf</format>
          <source>http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/</source>
When one visits http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/ however there is no 'text' document of type 'application/pdf' - rather, it's an HTML title page that embeds links to one or more PDF documents, such as http://scholar.lib.vt.edu/theses/available/etd-3345131939761081/unrestricted/Walker_1.pdf to Walker_5.pdf. Is VT's ETD OAI implementation deficient, or is OAI-PMH simply not set up to allow the harvesting of full-text without what would basically amount to crawling the ETD title page, or other repository-specific mechanisms? I don't know if it's the official method, and I've never actually implemented OAI-PMH myself, but I'd be inclined to have source point to an OAI-ORE document, which can then point to the PDF, full text, or whatever else. If it's not currently an ORE document, you might still be able to do some creative redirection on the webserver if you see the appropriate Accept header, handling it as you would normal content negotiation. You could also add a 'resourcemap' link element in the HTML page to point to the ORE document. If it's XHTML, you could add the appropriate ORE elements; I think the microformat-style HTML was deprecated, as it's not mentioned in the 1.0 spec: http://www.openarchives.org/ore/1.0/ -Joe
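ps. the 'creative redirection' could be as small as a CGI in front of the landing page that sniffs the Accept header. A sketch, assuming an Atom serialization of the resource map (both URLs here are made up):

    #!/usr/bin/perl
    # Sketch: send harvesters that ask for the ORE resource map (Atom)
    # to it; send everyone else to the normal HTML title page.
    use strict;
    use warnings;

    my $accept = $ENV{'HTTP_ACCEPT'} || '';

    my $target = ( $accept =~ m{application/atom\+xml} )
        ? 'http://scholar.example.edu/theses/etd-12345/ore.atom'   # hypothetical
        : 'http://scholar.example.edu/theses/etd-12345/';          # hypothetical

    print "Status: 303 See Other\nLocation: $target\n\n";

Real content negotiation also involves q-values and a Vary: Accept header, but that's the gist of it.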
Re: [CODE4LIB] Repositories, OAI-PMH and web crawling
On Feb 24, 2012, at 9:25 AM, Kyle Banerjee wrote: One of the questions this raises is what we are/aren't allowed to do in terms of harvesting full-text. While I realise we could get into legal stuff here, at the moment we want to put that question to one side. Instead we want to consider what Google, and other search engines, do, the mechanisms available to control this, and what we do, and the equivalent mechanisms - our starting point is that we don't feel we should be at a disadvantage to a web search engine in our harvesting and use of repository records. Of course, Google and other crawlers can crawl the bits of the repository that are on the open web, and 'good' crawlers will obey the contents of robots.txt We use OAI-PMH, and while we often see (usually general and sometimes contradictory) statements about what we can/can't do with the contents of a repository (or a specific record), it feels like there isn't a nice simple mechanism for a repository to say don't harvest this bit. I would argue there is -- the whole point of OAI-PMH is to make stuff available for harvesting. If someone goes to the trouble of making things available via a protocol that exists only to make things harvestable and then doesn't want it harvested, you can dismiss them as being totally mental. I see it like the people who request that their pages not be cached elsewhere -- they want to make their object 'discoverable', but they want to control the access to those objects -- so it's one thing for a search engine to get a copy, but they don't want that search engine being an agent to distribute copies to others. Eg, all of the journal publishers who charge access fees -- they want people to find that they have a copy of that article that you're interested in ... but they want to collect their $35 for you to read it. In the case of scientific data, the problem is that to make stuff discoverable, we often have to perform some lossy transformation to fit some metadata standard, and those standards rarely have mechanisms for describing error (accuracy, precision, etc.). You can do some science with the catalog records, but it's going to introduce some bias into your results, so you're typically better off getting the data from the archive. (and sometimes, they have nice clean catalogs in FITS, VOTable, CDF, NetCDF, HDF or whatever their discipline's preferred data format is) ... Also, I don't know if things have changed in the last year, but I seem to remember someone mentioning at last year's RDAP (Research Data Access Preservation) summit that Google had coordinated with some libraries for feeds from their catalogs, but was only interested in books, not other objects. I don't know how other search engines might use data from OAI-PMH, or if they'd filter it because they didn't consider it to be information they cared about. -Joe
Re: [CODE4LIB] Any libraries have their sites hosted on Amazon EC2?
On Feb 22, 2012, at 11:52 PM, Cary Gordon wrote: EC2 works for a lot of models, but one that it does not work for is small traffic apps that need to be available 24/7. If you have a small instance (AWS term) running full time with a fixed IP, it costs about $75 a month. If you turn it on for 2 hours a day, it costs about $15/month. A large instance is about $325. Now where it gets interesting is if your app needs a large instance, but only runs a few hours a month, you might be able to run a micro instance that is set to start a large (or ???) instance on demand, and run the whole thing for peanuts. We've looked at something similar (not Amazon, NASA is working on its own cloud service) where we'd locally run a server, but at times of high demand, pass off to the cloud service. If you have applications that are cyclic, I could see it being an advantage to have something take over in the peak times. Eg, when I worked for a university, the system we used for class registration was okay ... not great, but okay ... but the incoming freshmen were brought in in 3 or 4 'orientation' periods over the summer, and they'd all hit the system on the same day, at the same hour (well, 1/3 or 1/4 of the incoming class would). The system performance went to complete crap. We're talking about throughputs worse than if we had metered the access. (and the DBAs refused to look at database tuning, insisting that it was a webserver problem ... it was, of course, a database issue, but it was months before we got it straightened out) I could see conferences using something like this -- where almost all of their traffic is on the days of deadlines, or during the conference itself. If the load's pretty uniform, I don't think their pricing model is all that advantageous. (and I have no idea how they handle the loads over Christmas, as the reason for the cloud is to make money back on the excess capacity they need for the Christmas sales period.) -Joe
Re: [CODE4LIB] Issue Tracker Recommendations
On Feb 22, 2012, at 12:36 PM, Cynthia Ng wrote: Hi All, We're looking at implementing an issue tracker for internal use, so I'm looking for recommendations. What's key: 1) minimal effort in install/setup i.e. ready to use out of the box 2) small scale is okay, we have a very small team 3) ideally, have an area for documentation and issue creation via email What does your institution use? What do you like and dislike most about it? Would you recommend it to others? Responses (short or detailed) would be greatly appreciated. I've only managed Bugzilla and Trac. They both were a little annoying to set up (define all of your software components and versions, and who's responsible for each one, so they'll get notified if bugs are filed). Trac has good reporting, and a wiki for documentation, and their markup syntax makes it easy to link trouble tickets within the documentation (and it'll scratch them out as they're marked as resolved). I did get into some problems, as we had it open to the public, and someone posted an attachment*, which triggered a 'security incident' (which didn't seem to reach the 'men with guns show up and seize your machines' level like it had in the past ... instead, it was 'we're going to make you rebuild your machine over and over again until we say it's okay', so I wasted 2 weeks on it) It's also a bit of a pain to strip all occurrences of the terms 'wiki' and 'trac' from the software, so that I wouldn't show up as 7 of the top 10 results in Google for 'site:nasa.gov wiki'. If you're keeping it private, it might not be so bad. I also have no idea how useful the interaction with change control is ... we were using CVS, and Trac was still Subversion-specific back then. I've also helped to configure Remedy before -- it was more than a decade ago, but it left a bad taste in my mouth (and it wasn't cheap) ... As others have mentioned github, I know there's other services out there ... one project here uses launchpad.net (which is tied to Bazaar), and they seem happy with it, but I've never administered it myself. -Joe * The attachment was an image which said 'I've hacked your machine'. Years later, when we switched virus scanning software, it found a backup that had that file in it, and it turns out there was a JPEG exploit in it ... but the security gestapo had thought that *my* server had been hacked, which is what triggered it all.
Re: [CODE4LIB] Touch Screens in the Library
On Feb 13, 2012, at 10:50 AM, Cynthia Ng wrote: Hi All, I was wondering if anyone has implemented (or plans to implement) touch screens in their library? We're looking mostly at doing it for wayfinding (finding items, rooms, etc.) but I'd definitely be interested in hearing about any other uses. What kind of hardware did you choose? What software are you using? If you did it in-house, what language(s) did you use? Any ideas/help would be great. I saw an article a couple of months back about one of the Harvard libraries using a Microsoft Surface: http://osc.hul.harvard.edu/liblab/proj/wolbach-user-experience-lab (I took note, as the pictures of the sun are from the Solar Dynamics Observatory's AIA telescopes) I'm guessing it's out of the price range for most of us, though. -Joe - Joe Hourcle Programmer/Analyst Solar Data Analysis Center Goddard Space Flight Center
Re: [CODE4LIB] barcode scanner with memory
On Jan 30, 2012, at 1:37 PM, Adam Wead wrote: Hi all, Can anyone recommend a barcode scanner, wireless or otherwise, that saves barcodes to internal memory, to be downloaded to a computer later? We have patrons scan their ids as they enter to keep track of statistics. I've created some software that does this, with a regular barcode reader, but the problem is the window has to be in focus the whole time and the terminal is used by a security guard who has to do other things at the same time. So, I need some kind of hands-off solution and preferably something involving the least amount of work from me... any ideas? I have an older Intelliscanner Mini that would fit the bill ... The earlier model holds about 300 barcodes before you have to dump it; I don't know what the memory limits of the current one are. http://www.intelliscanner.com/products/mini/ You're supposed to use their software, but from what I remember, I was able to get it to dump to other programs (it acted as a keyboard, typing in the values, with line returns between each value). ... Looking at their website, it looks like if you want to export to other software, they recommend the 'Intelliscanner SOHO' (more expensive) model: http://www.intelliscanner.com/products/soho/ -Joe
Re: [CODE4LIB] My crazed idea about dealing with registration limitations
On Dec 23, 2011, at 12:15 PM, Susan Kane wrote: [trimmed] You could repeat the conference at a totally different time of year ... everyone who didn't get in is automatically registered for the second conference later that year ... kinda wacky but ... You could plan for a second conference of the same size in the same city (different hotel). After presentations for C4L1 are finalized, presenters are sought on similar topics for C4L2. Overflow registrations for C4L1 automatically go to C4L2. Similar content means that institutions who paid for you to come to learn about X will hopefully not be upset if you learn about X from a different person across the street. Everyone hangs out informally during off-presentation times. One could call that tracks but I'm trying for more of a mirror download site concept. [trimmed] For some reason, this jogged my memory -- The DC-IA (Information Architecture) group used to hold a meeting after the IA Summit to basically recap what was discussed at the IA Summit. (I think they called it the 'IA Redux') As there was more than one track, it allowed people who did go to the summit to hear more about the other presentations they missed, and for those who didn't go at all, it gave them a chance to at least hear second-hand what was discussed. Obviously, it wasn't nearly as complete as the original, and lost some in translation, but I found it to be informative. Particularly when you consider the proposal to limit the number of attendees from any one organization, this would spread attendance across more organizations, and those who went can then spread the gospel to the others who weren't able to attend. Now, I'm not saying that people have to go out and take copious notes and then try to get them into some format for dissemination (I did that for the last RDAP meeting ... it's a lot of work trying to get 'em into a format that others might understand), but if you get a few people together who were at the meeting, they can talk about what they thought was interesting (possibly referring to notes they might've jotted down), and that often spurs interesting discussions in itself. -Joe ps. as an example of understandability, compare: http://vso1.nascom.nasa.gov/joe/notes/rdap/RDAP_2011_notes.txt http://vso1.nascom.nasa.gov/joe/notes/rdap/RDAP_2011_report.html (and I took the original notes by hand, not typed, so I was spending my nights at the meeting typing, then making 'em understandable for the next week or so)
Re: [CODE4LIB] automatic greeking of sample files
On Dec 12, 2011, at 3:06 PM, Brian Tingle wrote: On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein mbkl...@gmail.comwrote: Here's a snippet that will completely randomize the contents of an arbitrary string while preserving the general flow (vowels replaced with vowels, consonants replaced with consonants (with case retained in both instances), digits replaced with digits, and everything else is left alone. https://gist.github.com/1468557 I like the way the output looks; but one problem with the random output is that the same word might come out to different values. The distribution of unique words would also be affected; not sure if that would impact relevance/searching/index size. Also, I was sort of hoping to be able to have some sort of browsing, so I'm looking for something that is like a pronounceable one-way hash. Maybe if I take the md5 of the word, and then use that as the seed for random, and then run your algorithm, then NASA would always hash to the same thing? If the list of missions / agencies / etc. is rather small, it'd be possible to just come up with a random list of nouns, and make a sort of secret decoder ring, mapping each mission name that needs to be replaced to a random (but consistent) word. I tend to just replace all of my mission / spacecraft / instrument acronyms with 'BOGUS' when I have to do similar stuff to generate records when we're testing data systems, but I only have the acronyms (the full spelled-out names are looked up from the acronyms), and I don't have large amounts of free text to worry about. -Joe
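ps. if you wanted Brian's 'seed random from the md5' idea without keeping a decoder-ring file around, the sketch below does it -- the same term in always gives the same noun out. (the noun list is obviously just for illustration, and with a list this short you'll get collisions)

    #!/usr/bin/perl
    # Sketch: random-but-consistent replacement, by seeding the RNG
    # from a digest of the term being replaced.
    use strict;
    use warnings;
    use Digest::MD5 qw(md5);

    my @nouns = qw( WOMBAT TEAPOT PICKLE BANJO TURNIP GAZEBO );
    my %seen;    # memoize, so each term is only computed once

    sub greek {
        my $term = shift;
        return $seen{$term} //= do {
            srand( unpack( 'N', md5($term) ) );   # deterministic seed
            $nouns[ rand @nouns ];
        };
    }

    print greek('NASA'), "\n";   # always maps to the same noun
    print greek('SDO'),  "\n";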
Re: [CODE4LIB] Pandering for votes for code4lib sessions
On Dec 1, 2011, at 8:47 AM, Ross Singer wrote: As unwilling commissioner of elections, I'm shocked, SHOCKED, I say, to hear of improprieties with the voting process. It could be worse ... I'm an unwilling elected official. (and the re-election for my third term is next month ... anyone want to move to Upper Marlboro, MD, so they can run against me? I think you still have about a week to make the 30-day residency deadline) (maybe 'unwilling' is the wrong word, before this shows up in the local newspaper ... I'll do it, but I think someone with more free time to commit might be able to do a better job) That said, I'm not shocked (and we've seen it before). I am absolutely opposed to: 1) Setting weights on voting. 0 is just as valid a vote as 3. 2) Publicly shaming the offenders in Code4Lib. If you run across impropriety in a forum, make a friendly, yet firm, reminder that ballot stuffing is unethical, undemocratic and tears at the fabric that is Code4Lib. Sometimes it just takes a simple reminder for people to realize what they're doing is wrong (it certainly works for me). 3) Selection committees. We are, as Dre points out, anarcho-democratic at our core. anarcho-bureaucratic just sounds silly. It'd be (anarcho-)?republican, as you'd have a smaller body that's appointed or elected to make the decisions. This current situation is largely our doing. We even publicly said that getting your proposal voted in is the backdoor into the conference. The first allotment of spaces sold out in an hour. This is, literally, the only way that a person who was not able to register and is buried on the wait list is going to get in. And we've basically told them that. Perhaps if registration were done after the talk selection, this wouldn't be a problem? Or some sort of lottery, rather than first-come-first-served? ... and the real way to ensure a slot is to help with the conference planning ... if you've agreed to man the table where people get their badges, they normally let you come. One thing I would be open to is to put a disclaimer splash page before any ballot (only to be seen the first time a person votes) briefly explaining how the ballot works and to mention that ballot stuffing is unethical, undemocratic and tears at the fabric that is Code4Lib or some such. I would welcome contributions to the wording. What would people think about that? I'd like to know if this is even a problem -- is there some way to tell if we have people who only voted for one paper? (although putting that in as a restriction just makes 'em likely to vote for a few random ones, which really does taint the whole process) -Joe
Re: [CODE4LIB] Pandering for votes for code4lib sessions
On Dec 1, 2011, at 10:29 AM, Ross Singer wrote: On Thu, Dec 1, 2011 at 10:09 AM, Richard, Joel M richar...@si.edu wrote: I feel this whole situation has tainted things somewhat. :( Let's not blow things out of proportion. The aforementioned wrong-doing actually seems pretty innocent (there is backstory in the IRC channel, I'm not going to bring it up here). There is a valid case for advertising interest in your talks (or location, or t-shirt design, etc.), especially in an extremely crowded field, and we've never explicitly set a policy around what is appropriate and what isn't. I think a simple edit on the part of the accused would clear up any ambiguity of intention. Our one known incident was handled privately, but didn't really cause us to address the potential for impropriety. We seem to have quite a bit of support for the splash page. If people will help me draft up the wording -- ideally something we can point to when we want to guide people in the right direction in other forums -- I think we can put this issue to bed. It depends on how harsh you want to be ... I mean, if you're on the fence about ballot stuffing, you could go with something like: When voting, we expect you to actually read through the list, and pick the best ones. So yes, go ahead and vote for your friends and colleagues, but also read through the others to find other equally good proposals. -Joe
Re: [CODE4LIB] server side vs client side
On Dec 1, 2011, at 12:49 PM, Nate Hill wrote: As I was struggling with the syntax trying to figure out how to use javascript to load a .txt file, process it and then spit out some html on a web page, I suddenly found myself asking why I was trying to do it with javascript rather than PHP. Is there a right/wrong or better/worse approach for doing something like that? Why would I want to choose one approach rather than the other? As always, apologies if I'm asking a terribly basic question. There are different advantages to each side:
JavaScript / JScript / ECMAScript / client side:
- Scales better (as the clients do their own work)
- More obnoxious to maintain (as different browsers may have slightly different implementations)
- Less reliable (I keep mine turned off on my main browser)
- Better detection of client features (you can always lie in a browser string, or just not send it)
- May require extra layers of abstraction (APIs that then require extra taint checking)
- More responsive for simple operations (if it doesn't need remote calls)
- Easier to do some tasks
PHP / ColdFusion / CGI / ASP / server side:
- You can be assured that you know it's working, and you get error reports when it's not (assuming you check your logs)
- The inverse of all of the ones in the 'client side' section (but the inverse of 'Easier to do some tasks' is still 'Easier to do some tasks')
I'm not going to make any claims about speed, as it's frequently dependent on bandwidth/latency. (if I can send data to the client on a slow link, and have them build the structures around it, it might be faster than my doing it server side, and more so if my server gets bogged down) For some tasks, I'll do it both ways. Eg, form input validation -- once in javascript, so they get the warning *before* they submit the form, and again on the server side, in case they have javascript off or are being malicious. -Joe
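ps. to make the 'do it both ways' concrete: the javascript check is just for fast feedback; the server-side one is the check that counts, because you can't trust anything the client sent. A sketch of the server half (in Perl since that's what I write; the field name is made up):

    #!/usr/bin/perl
    # Sketch: server-side re-validation of a form field.  The same rule
    # should also live in the page's javascript, for the polite clients.
    use strict;
    use warnings;
    use CGI;

    my $q    = CGI->new;
    my $year = $q->param('pub_year') // '';    # hypothetical field

    unless ( $year =~ /^\d{4}$/ && $year >= 1450 && $year <= 2100 ) {
        print $q->header( -status => '400 Bad Request' ),
              "pub_year must be a 4-digit year\n";
        exit;
    }
    # ... only now is $year safe to use ...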
Re: [CODE4LIB] Plea for help from Horowhenua Library Trust to Koha Community
On Nov 23, 2011, at 12:17 PM, Robert Sanderson wrote: LibLime A Division of PTFS, Inc. Main Office 11501 Huff Court North Bethesda, Maryland 20895 tel: (301) 654-8088 Ext. 127 fax: (301) 654-5789 email: kohai...@liblime.com Twitter: @liblime How about we all contact them? ;) Our contacting them isn't as effective as their customers contacting them. You can get a list of known Koha installations from lib-web-cats: http://www.librarytechnology.org/map.pl?ILS=Koha Which lists over 1200 sites ... the Library Journal, when they covered the purchase of LibLime last year, only mentioned that they had about 1/2 of those (140 libraries through PTFS, 500 from LibLime): http://www.libraryjournal.com/article/CA6714841.html Although, I don't know if the lib-web-cats count is of libraries, or of whole library systems. You could get specific names of LibLime customers by looking through their website for testimonials scattered on the site, or get their more recent clients through the press releases in their 'news' feed: http://www.liblime.com/news -Joe
Re: [CODE4LIB] Citation Analysis - like projects for print resources
On Nov 17, 2011, at 12:09 PM, Miles Fidelman wrote: Matt Amory wrote: Is anyone involved with, or does anyone know of any project to extract and aggregate bibliography data from individual works to produce some kind of most-cited authors list across a collection? Local/Network/Digital/OCLC or historic? Sorry to be vague, but I'm trying to get my head around whether this is a tired old idea or worth pursuing... Sounds like you're describing citeseer - http://citeseerx.ist.psu.edu/ - it's a combination bibliographic and citation index for computer science literature. It includes a good degree of citation analysis. Incredibly useful tool. Another recent project (that I haven't had a chance to play with yet) is Total Impact : http://total-impact.org/about.php It's from some of the folks in altmetrics, who are trying to find better bibliometrics for measuring value: http://altmetrics.org/manifesto/ I don't see a list of what they're scraping -- I think they're using the publishers' indexes, PubMed and other databases, rather than parsing the text themselves ... but the software's available, if you wanted to take a look. Or you could just ask Heather or Jason; they're both approachable and always eager to talk when I've run into them at meetings. I also seem to remember someone at the DataCite meeting this summer who was involved in a project to parse references in papers ... unfortunately, I don't have that notebook here to check ... but I *think* it was John Kunze. (and I don't think it was part of the person's presentation, but something that I had picked up in the Q/A part) -Joe
Re: [CODE4LIB] Hotel registration - This was a test, right?
On Nov 16, 2011, at 5:02 PM, Cary Gordon wrote: I just registered for an overflow block room at https://resweb.passkey.com/Resweb.do?mode=welcome_gi_newgroupID=7466136 I noticed when I got to the Guest Details page that there was a checkbox in the Contact Information block -- Yes, I'd like to be notified… -- which was checked (not surprising) and not changeable (that was surprising). Peeking at the code, I noticed that the form tag had the words 'checked' and 'disabled'. Now, since nobody could be so slimy as to do this intentionally (right?), I helped them out by using my in-browser editor to correct this oversight, because I wouldn't want them to waste electrons sending me email that I don't want. Unless they're doing something to un-disable the field when you submit, there shouldn't be an issue in most browsers, as 'disabled' also implies 'don't bother sending when submitting'. It's implied in the HTML4 spec, but I don't know if it's required behavior: http://www.w3.org/TR/html4/interact/forms.html#h-17.12 Now, if they had set it 'readonly', then yes, you should worry. Or worry that whoever made the form doesn't know what they're doing ... and as I've often found out, those people seem to get paid way more money than I do even though they're clueless. -Joe
Re: [CODE4LIB] Looking for products/price ranges for a database of performers
On Sep 6, 2011, at 7:20 PM, Heather Rayl wrote: ** apologies for cross-posting ** Hi there, We have a database of performers that we use in our libraries. Currently, the data is stored on one person's computer in a file maker pro db that only this one person has access to (Hooray for legacy systems!). In order for the rest of the staff to have access to the performer listings, this one person runs yearly reports and they are posted on the staff intranet in a rather unwieldy series of pdf documents for staff to browse. For a sense of scale, we have over 80 libraries, probably around 300-400 staff people accessing these documents, and there are probably around 400 or so performers in the database. Clearly, we need a new system of managing these performers!! [trimmed] So here's what we're grappling with: 1. We can purchase a product that would give us the framework to do this. I realize that something like a wiki would let us do some of these things, but really we are rather freaky about our content control, and a wiki is just too free-wheeling! 2. We can hire a developer/programmer to design a custom solution for us. So my questions for the list are: 1. do you know of any products that do what we want? FileMaker. The more recent versions have an 'instant webpage' option: http://www.filemaker.com/products/filemaker-pro/web-publishing.html If you're expecting a lot of traffic, you'll want to go to FileMaker Server, or Server Advanced: http://www.filemaker.com/products/filemaker-server-advanced/ I admit, I haven't used any of the versions since they've added this feature ... my FileMaker experience is 10+ years old at this point, so I don't know how much work 'instant' is. I believe they offer 30-day free trials on all of their software these days, so you might be able to download it and see what it can do. 2. if we were to hire someone, how much is a reasonable fee - we have some money in our budget, but we don't really know what a real person would charge for this, and if the money in our budget would cover it. And I don't want to go through writing an RFP for it if in the end we won't be able to afford it anyway. As for cost, it varies widely. Part of the issue is how the data's currently structured, and whether you're going to keep the same structure or change it as part of the re-design. FileMaker had some fields that were basically enums, where the database handled what you'd have to do in most RDBMSes with a lookup table. As strange as it sounds, someone who is more skilled at this might actually do it more cheaply than someone of moderate skill, because they can get it done quickly; even at a higher per-hour rate, it's going to be cheaper ... but I'd still try to get them to bid for the project, not per hour, as you don't want someone who's going to vastly under-estimate the hours, then end up billing you 2-3x their estimate ... of course, bids for the whole project mean they have to pad it out some, so it'll seem higher up front, but it'll likely be lower in the end. Of course, you also risk someone who underestimates the work, bids it out, but then gets in so far over their head that they give up, and you never see anything ... so I'd recommend checking references, to try to mitigate this problem. Working with a company rather than an independent person usually helps with this case, as they don't want the bad reputation from something like this happening ... and they can throw extra people at it to get it done and out of their hair -Joe
Re: [CODE4LIB] memory management for grownups
On Aug 30, 2011, at 3:55 PM, Simon Spero wrote: On Tue, Aug 30, 2011 at 12:56 PM, Ken Irwin kir...@wittenberg.edu wrote: I have a feeling it may be time for me to learn some grown-up programming skills, and I hope someone here might be able to help. [trimmed] Sometimes it can make sense to use the database to do the aggregation; e.g.

    CREATE TABLE Summary AS
    SELECT inst, patron_type, item_barcode,
           min(date) first, min(call_no),
           min(renewals) min_renewals, max(renewals) max_renewals
      FROM Renewals
     GROUP BY inst, patron_type, item_barcode;

Wow ... I didn't realize I was that asleep. No wonder I had such an unproductive day. I'll second Simon's recommendation. There's no reason to pull this into PHP if you can do it all in the database, which is quite likely based on what was described. -Joe
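ps. and once the database has done the aggregation, the script side (PHP, Perl, whatever) is just a fetch loop. A sketch in Perl/DBI -- the DSN, credentials, and exact columns are placeholders:

    #!/usr/bin/perl
    # Sketch: let the database group/aggregate; just read the summary.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:database=circ',   # placeholder DSN
                            'user', 'pass', { RaiseError => 1 } );

    my $sth = $dbh->prepare(q{
        SELECT inst, patron_type, COUNT(*) AS items, MAX(renewals) AS max_renewals
          FROM Renewals
         GROUP BY inst, patron_type
    });
    $sth->execute;

    while ( my $row = $sth->fetchrow_hashref ) {
        printf "%s / %s : %d items, max %d renewals\n",
            @{$row}{qw( inst patron_type items max_renewals )};
    }
    $dbh->disconnect;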
Re: [CODE4LIB] internet explorer and pdf files
On Aug 29, 2011, at 3:30 PM, Eric Lease Morgan wrote: I need some technical support when it comes to Internet Explorer (IE) and PDF files. Here at Notre Dame we have deposited a number of PDF files in a Fedora repository. Some of these PDF files are available at the following URLs: * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832898/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:999332/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832657/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1001919/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:832818/PDF1 * http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:834207/PDF1 Retrieving the URLs with any browser other than IE works just fine. Unfortunately IE's behavior is weird. The first time someone tries to load one of these URLs nothing happens. When someone tries to load another one, it loads just fine. When they re-try the first one, it loads. We are banging our heads against the wall here at Catholic Pamphlet Central. Networking issue? Port issue? IE PDF plug-in? Invalid HTTP headers? On-campus versus off-campus issue? Could some of y'all try to load some of the URLs with IE and tell me your experience? Other suggestions would be greatly appreciated as well. I don't have IE to test from, but it's been my experience that past versions of IE would use the file's extension no matter what MIME type was sent. I'd first see if you can trick IE ... it looks like Fedora doesn't like you sending extra stuff in PATH_INFO, so you might have to abuse QUERY_STRING for this: http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1/?filename.pdf http://fedoraprod.library.nd.edu:8080/fedora/get/CATHOLLIC-PAMPHLET:1000793/PDF1/?file=filename.pdf If either of those works in IE when the original URL doesn't, that's the problem. I don't know what's possible in Fedora, so I don't know if it's possible to do some URL re-writing so it'd always serve something that IE accepts as a PDF. If you could insert an extra HTTP header, you might be able to trick it with Content-Disposition, but that'll also tell some browsers to download the file rather than display it themselves: http://www.ietf.org/rfc/rfc2183.txt -Joe
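ps. if you do find a way to wedge in extra headers (say, a small CGI wrapper in front of Fedora), the Content-Disposition trick would look something like this sketch -- the filename is made up, and as I said, some browsers may take it as a cue to download rather than display:

    # Sketch of a CGI wrapper's response headers:
    print "Content-Type: application/pdf\n";
    print "Content-Disposition: inline; filename=\"pamphlet-1000793.pdf\"\n";
    print "\n";
    # ... then stream the PDF bytes from Fedora ...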
Re: [CODE4LIB] internet explorer and pdf files
On Aug 29, 2011, at 3:52 PM, Godmar Back wrote: Earlier versions of IE were known to sometimes disregard the Content-Type (which you set correctly to application/pdf) and look at the suffix of the URL instead. For instance, they would render HTML if you served a .html as text/plain, etc. You may try creating URLs that end with .pdf Separately, you're not sending a Content-Length header:

    HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    Server: Apache-Coyote/1.1
    Pragma: No-cache
    Cache-Control: no-cache
    Expires: Wed, 31 Dec 1969 19:00:00 EST
    Content-Type: application/pdf
    Date: Mon, 29 Aug 2011 19:47:27 GMT
    Connection: close
    Length: unspecified [application/pdf]

which disregards RFC 2616, http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13 RFC 2616 says 'SHOULD' for that section. HTTP/1.1 clients *must* support chunked encoding: http://en.wikipedia.org/wiki/Chunked_transfer_encoding (which is why any time I write an HTTP client, I always claim to be HTTP/1.0, so I don't have to support it) If the data's stored on disk compressed, and being decompressed on the fly, it's pretty typical to not send Content-Length. (although, you could argue that they should save the uncompressed length when storing the file, so it's available at serving time without needing to decompress first) -Joe
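ps. by 'claim to be HTTP/1.0', I mean making the request like this sketch -- a server isn't supposed to send a chunked reply to a 1.0 client, so the body is just 'read until the connection closes' (host and path are placeholders):

    #!/usr/bin/perl
    # Sketch: a raw HTTP/1.0 GET, no chunked-encoding support needed.
    use strict;
    use warnings;
    use IO::Socket::INET;

    my $sock = IO::Socket::INET->new(
        PeerAddr => 'www.example.edu',
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "connect: $!";

    print $sock "GET /some.pdf HTTP/1.0\r\n",
                "Host: www.example.edu\r\n",
                "\r\n";

    my $response = do { local $/; <$sock> };   # headers + body, until close
    close $sock;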
Re: [CODE4LIB] Apps to reduce large file on the fly when it's requested
On Aug 3, 2011, at 7:36 PM, Ranti Junus wrote: Dear All, My colleague came with this query and I hope some of you could give us some ideas or suggestions: Our Digital Multimedia Center (DMC) scanning project can produce very large PDF files. They will have PDFs that are about 25MB and some may move into the 100MB range. If we provide a link to a PDF that large, a user may not want to try to download it even though she really needs to see the information. In the past, DMC has created lower quality, smaller versions of the original file to reduce the size. Some thoughts have been tossed around to reduce the duplication of the work (e.g. no more creating the lower quality PDF manually.) They are wondering if there is an application that we could point to the end user, who might need it due to poor internet access, that if used will simplify the very large file transfer for the end user. Basically: - a client software that tells the server to manipulate and reduce the file on the fly - a server app that would do the actual manipulation of the file and then deliver it to the end user. Personally, I'm not really sure about the client software part. It makes more sense to me (from the user's perspective) that we provide a download the smaller size of this large file link that would trigger the server-side apps to manipulate the big file. However, we're all ears for any suggestions you might have. I've been dealing with related issues for a few years, and if you have the file locally, it's generally not too difficult to have a CGI or similar that you can call that will do some sort of transformation on the fly. Unfortunately, what we've run into is that -- in part because it tends to be used by people with slow connections, and for very large files -- they'll keep restarting the process, and because the file is generated on the fly, the webserver can't just pick up where it left off, so it has to re-start the process. The alternative is to write it out to disk, and then let the webserver handle it as a normal file. Depending on how many of these you're dealing with, you may have to have something manage the scratch space and remove the generated files that haven't been viewed in some time. What I've been hoping to do is:
1. Assign URLs to all of the processed forms, of the format: http://server/processing/ID (where 'ID' includes some hashing in it, so it's not 10mil files in a directory)
2. Write a 404 handler for each processing type, so that should a file not exist in that directory, it will:
(a) verify that the ID is valid, otherwise, return a 404.
(b) check to see if the ID's being processed, otherwise, kick off a process for the file to be generated
(c) return a 503 status.
Unfortunately, my initial testing (years ago) suggested that no clients at the time properly handled 503 responses (effectively, try back in (x) minutes, and you give 'em a time). The alternative is to just basically sleep for a period of time, and then return the file once it's been generated ... but for ones that take some time (some of my processing might take hours, as the files that it needs as input are stored near-line, and we're at the mercy of a tape robot) ... You might also be able to sleep and then use one of the various 30x status codes, but I don't know what a client might do if you returned the same URL. (they might abort, to prevent looping) -Joe
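ps. a sketch of that 404 handler (set as an Apache ErrorDocument). The is_valid / in_progress / start_processing routines at the bottom are stand-ins for whatever your system actually does:

    #!/usr/bin/perl
    # Sketch: ErrorDocument 404 handler for /processing/ID --
    # 404 for bad IDs, otherwise kick off generation and answer 503.
    use strict;
    use warnings;

    # Apache puts the originally-requested URL in REDIRECT_URL
    my ($id) = ( $ENV{'REDIRECT_URL'} || '' ) =~ m{/processing/(\w+)$};

    if ( !defined $id or !is_valid($id) ) {
        print "Status: 404 Not Found\nContent-Type: text/plain\n\n",
              "no such item\n";
        exit;
    }

    start_processing($id) unless in_progress($id);

    print "Status: 503 Service Unavailable\n",
          "Retry-After: 300\n",              # 'try back in 5 minutes'
          "Content-Type: text/plain\n\n",
          "still generating; please retry shortly\n";

    # --- placeholders; substitute your own logic ---
    sub is_valid         { $_[0] =~ /^[0-9a-f]{8,}$/ }              # e.g. hashed IDs
    sub in_progress      { -e "/tmp/processing/$_[0].lock" }
    sub start_processing { system( 'start-generation', $_[0] ) }    # hypothetical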
Re: [CODE4LIB] Advice on a class
On Jul 26, 2011, at 3:31 PM, Lepczyk, Timothy wrote: Thanks everyone. The reasons I thought of taking the C course are a) it's free, b) concepts might be transferrable to other languages. I may continue to focus on Ruby on Rails. Before everyone manages to scare you away from learning C: if you're going to be doing a lot of programming, it's useful to learn other languages so you can see how they handle different tasks. C is particularly useful as a lot of other languages' implementations were primarily written in C. In college, I took a 68k assembly course ... I've never done *any* assembly since then, but it makes you appreciate the issues in optimization, and just how low-level you need to get when talking to processors. With C, pointers and pointer arithmetic are a bit of a pain, and strongly-typed languages aren't the greatest for all tasks ... and don't get me started on C-strings ... but you'll learn a lot ... even just where to look for people screwing up their assumptions and creating security problems because of off-by-one issues, screwing up the length of their strings, or neglecting their memory management. ... and, understanding C will also help you when it comes time to install stuff, especially if you're trying to port someone's linux-centric code to Solaris or MacOS. As for the stuff that translates:
- searching for the missing semi-colon
- error messages that make no sense
- finding the 'smart quote' that your lab partner pasted in because they do their editing in MS Word
um ... I'm not selling this very well, am I? Anyway ... C is a useful language ... almost all higher languages have some way of binding to C code, and if nothing else, learning it means you'll be able to port someone's 1k-line C program over into 20 to 40 lines of whatever other modern language you prefer. -Joe
Re: [CODE4LIB] REST interface types
On Jul 19, 2011, at 11:33 AM, Ralph LeVan wrote: Where at all possible, I want a true REST interface. I recognize that sometimes you need to use POST to send data, but I've found it very helpful to be able to craft URLs that can be shared that contain a complete request. But there's more to REST than just the URL. As it uses HTTP, you could vary the response based on the Accept-Language or Accept headers. Some implementations use file extensions in place of Accept, but then you're assigning URIs to the container and not the contents. Am I trying to identify the data, or the data formatted as XML? Language is a bit messier, as it's part of the content, but when we're looking up something like Dewey ... are we trying to identify the DDC 600s, or specifically the German labels for the 600s? Dewey.info packs it in the URL: http://dewey.info/class/6/2009/03/about.de But am I supposed to know that the English and French don't share the same root as the German? http://dewey.info/class/6/2009/08/about.en http://dewey.info/class/6/2009/08/about.fr Some groups will pass this in either as part of the QUERY_STRING or via PATH_INFO. -Joe
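ps. from the client end, the header-based version of that request is straightforward -- a sketch with LWP (the URI here is just illustrative):

    #!/usr/bin/perl
    # Sketch: ask for the same resource in a different representation /
    # language via headers, rather than packing it into the URL.
    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get(
        'http://dewey.info/class/6/',          # illustrative URI
        'Accept'          => 'application/rdf+xml',
        'Accept-Language' => 'de',
    );
    print $res->status_line, "\n";
    print $res->decoded_content if $res->is_success;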
Re: [CODE4LIB] TIFF Metadata to XML?
On Jul 19, 2011, at 10:34 AM, Stern, Randall wrote: Also, see FITS (http://code.google.com/p/fits/) FITS is an open source Java toolset we wrote that wraps JHOVE, ExifTool, and several other format analysis tools and produces a single XML output stream. It also includes a crosswalk to MIX XML as an optional output. Really? You named a tool that deals with image data 'FITS'? You do realize there's actually a 30+ year-old image standard called FITS: http://fits.gsfc.nasa.gov/ (which has its own metadata standard, just to make things even more interesting) -Joe
Re: [CODE4LIB] TIFF Metadata to XML?
On Jul 18, 2011, at 9:18 AM, Edward M. Corrado wrote: Hello All, Before I re-invent the wheel or try many different programs, does anyone have a suggestion on a good way to extract embedded metadata added by cameras and (more importantly) photo-editing programs such as Photoshop from TIFF files and save it as XML? I have 60k photos that have metadata including keywords, descriptions, creator, and other fields embedded in them, and I need to extract the metadata so I can load them into our digital archive. Right now, after looking at a few tools and having done a number of Google searches, I haven't found anything that seems to do what I want. As of now I am leaning towards extracting the metadata using exiv2 and creating a script (shell, perl, whatever) to put the fields I need into a pseudo-Dublin Core XML format. I say pseudo because I have a few fields that are not Dublin Core. I am assuming there is a better way. (Although part of me thinks it might be easier to do that than exporting to XML and using XSLT to transform the file, since I might need to do a lot of cleanup of the data regardless.) Anyway, before I go any further, does anyone have any thoughts/ideas/suggestions? I haven't (yet) used it myself, but Exiv2 ( http://www.exiv2.org ) supports reading and writing XMP, EXIF and IPTC metadata from a large number of file formats. -Joe
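ps. the 'script (shell, perl, whatever)' part could start as small as the sketch below -- it shells out to exiv2's print mode and wraps a couple of fields. The column parsing is my guess at exiv2's output format, and the key-to-element mapping is obviously incomplete; you'd extend it for your own fields (and escape the values properly for XML):

    #!/usr/bin/perl
    # Sketch: map a couple of Exif keys to Dublin-Core-ish XML.
    use strict;
    use warnings;

    my $file = shift @ARGV or die "usage: $0 image.tif\n";

    my %map = (    # illustrative; extend for your fields
        'Exif.Image.Artist'           => 'dc:creator',
        'Exif.Image.ImageDescription' => 'dc:description',
    );

    my %meta;
    open( my $exiv, '-|', 'exiv2', '-pa', $file ) or die "exiv2: $!";
    while (<$exiv>) {
        # assumed columns: key, type, count, value
        my ( $key, $value ) = /^(\S+)\s+\S+\s+\d+\s+(.+)$/ or next;
        $meta{ $map{$key} } = $value if $map{$key};
    }
    close $exiv;

    print qq{<record xmlns:dc="http://purl.org/dc/elements/1.1/">\n};
    print "  <$_>$meta{$_}</$_>\n" for sort keys %meta;
    print "</record>\n";

(exiftool's -X switch will give you RDF/XML directly, if that turns out to be closer to what you want)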
Re: [CODE4LIB] Trends with virtualization
On Jul 11, 2011, at 11:21 AM, Madrigal, Juan A wrote: It's true what they say, history does repeat itself! I don't see how virtualization is much different from a dumb terminal connected to a mainframe. I'd hate to see an entire computer lab go down should the network fail. The only real promise is for making web development and server management easier. re: web development I assume by that you're talking about cases like Citrix, where they force you to come in from the same OS / web browser version, so they don't have to worry about Firefox rendering differently from Safari, or IE6 vs. 7, etc. It's okay for an intranet, but I don't know that it's a good idea for general web usage, as they normally force people to use some outdated browser, as the web applications always seem to be designed for IE6, and never tested on anything else. (if they were, they then try to serve down alternative versions using browser detection, which in my experience is more likely to make things worse) ... The only compelling reason I've heard to virtualize desktops wasn't for monetary considerations, and wasn't for general word processing and such ... it was for workstations doing scientific processing. By using virtualized servers, you can more easily take snapshots of the machine's state to archive it, and later restore it to re-run the software. This gives you two advantages: (1) reduced down-time for patching / upgrading software -- you patch the image, then push the image into the processing pipeline. (2) Because you've archived the OS, libraries and all software, you have something you can analyze should someone identify problems with the data processing, such as discontinuities after an update. I could see the first one being useful for most groups, but with tools like puppet and chef, it might not be a big deal. I can't remember what software the university I formerly worked for used in their computer labs -- it basically reset the machine on each login, in hopes of preventing someone from installing malware (intentionally or accidentally) that would then affect later users. And then once a week each lab was closed down so they could do a complete re-format and re-image of each machine ... you might be able to do something similar with virtual desktops. -Joe
Re: [CODE4LIB] exposing website visitor IP addresses to webcrawlers
On May 20, 2011, at 10:35 AM, Keith Jenkins wrote: Just out of curiosity, does anyone on this list have any opinions about whether website owners should publicly post lists of their visitors' IP addresses (or hostnames) and to also allow such lists to be indexable by search engines? For example: https://www3.ietf.org/usagedata/site_201104.html Keith Somehow I missed this when it went by originally ... For websites being hosted by the federal government, although IP addresses aren't considered PII (Personally Identifiable Information), most privacy policies state that we won't share information with third parties, and that we only use server logs for diagnostics and tuning. We're actually required to destroy our webserver logs within 30 days of rolling them, or at the very least, anonymize them. We specifically do *not* allow access logs or reports to be accessed from outside our local network. If nothing else, posting logs and/or reports invites 'referrer spam' : http://en.wikipedia.org/wiki/Referrer_spam And even if you're not posting referrer information, they'll embed it in the QUERY_STRING of requests to your site, so you'll have requests for: http://yoursite.example.edu/?http://spammer.example.com Which show up in most logs as: /?http://spammer.example.com ... I'd say there is *no* reason to make any of your logs, raw or processed, visible to search engines. If your administration insists on being able to see reports remotely, put them behind some sort of authentication. (although, in our case, authentication means more paperwork we have to fill out) -Joe
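ps. and 'anonymize' can be as cheap as zeroing out the last octet before the logs go anywhere -- a one-liner sketch (IPv4 only; resolved hostnames and IPv6 would need more care):

    # mask the last octet of the leading IPv4 address on each log line
    perl -pe 's/^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}/$1.0/' access_log > anon_log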
Re: [CODE4LIB] Jpeg2000 and XMP metadata
On Mar 23, 2011, at 9:45 AM, Richard, Joel M wrote: Morning, all! I thought I'd crowdsource this question. 8+ hours of beating up on this and I haven't found a good solution. We have some software that processes the scanned pages of a book. They come to me as TIFF and I am converting to JP2 in order to upload to the Internet Archive. The trouble is that I can't find a reliable piece of code or a process to add XMP metadata to the JP2. (FWIW, we're using the Jasper library) - ImageMagick (PHP+Imagick) doesn't seem to support XMP in JP2 (or adding profiles to JP2 at all) - GraphicsMagick crashes with malloc errors on images that are too big, and I am unwilling to recompile to 64-bit and simply hope for the best. Our images are large, though, and something is dying between GM and Jasper. - exiftool doesn't seem to be working either. I'm working in PHP, so that would be a preferred language. If necessary I can always drop back to the command line to run a script or whatever. Is anyone else doing this type of thing? Any help or advice would be most welcome. I've never used it, but exiv2 claims to support JP2 XMP writing: http://www.exiv2.org/ (not PHP directly, but could be called via the shell) -Joe
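ps. since you said you can drop to the command line anyway: exiv2's -M flag takes 'modify' commands, so from a script it'd be something like the sketch below. Fair warning: I haven't tested this against Jasper-built JP2s, and Xmp.dc.source is just an example property -- substitute whatever you're actually embedding. (From PHP, shell_exec() or proc_open() with the same arguments would be the equivalent.)

#!/usr/bin/perl
use strict;
use warnings;

my $file = shift or die "usage: $0 file.jp2\n";

# set one XMP property in-place; repeat -M commands for more properties
system( 'exiv2', '-M', 'set Xmp.dc.source scanned-page-0001.tif', $file ) == 0
    or die "exiv2 failed: $?\n";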
Re: [CODE4LIB] Need Apache log file analyzer for Mac OSX
On Mar 17, 2011, at 1:16 PM, Tim McGeary wrote: Does anyone know of a good (and free) Apache log file analyzer for Mac OSX? I have sets of Apache web logs that I need to analyze off server. I've been using analog for years: http://www.analog.cx/ The config syntax takes a little getting used to, but it generates HTML reports for just about anything. I also know people who are fans of webalizer, but I don't like how it only gives a month's report at a time: http://www.mrunix.net/webalizer/ -Joe
Re: [CODE4LIB] online course on the semantic web?
On Mar 5, 2011, at 3:01 PM, Cindy Harper wrote: Well, I just walked my 80-year-old mother through setting up her wireless router and wireless on her desktop and laptop via telephone NY-to-VA, and now I feel like I can think about another challenge for the coming season(s). Does anyone know of a good online course that's an introduction to semantic web technology that they could recommend? My goals are simply to understand more and be able to code a little, and afterward apply it to linked data. I know of one course this summer at Johns Hopkins Engineering for Professionals program http://ep.jhu.edu/course-homepages/viewpage.php?homepage_id=2993, but it's rather pricey. Anyone know of cheaper options or creative ideas for funding? I don't know how introductory it'd be, but ASIST has been doing a lot of 'webinars' this year, and there are ones coming up on the 9th and 13th on linked data, and the first one sounds like it'll cover some semantic web issues: http://asis.org/Conferences/webinars/2011/linked-data.html (I can't compare prices to the JHU one, as I didn't see any pricing on the JHU site; this round of ASIST webinars is $25 for members, $59 for non-members; some in the past have been free for ASIST members) Also, looking at MIT's Open Courseware catalog, I see a few individual lessons that might be applicable: http://ocw.mit.edu/index.htm In the past, I've looked at some of the courses from W3Schools (not affiliated with W3C, but it has some tutorials on various things related to the web). They tend to be fairly introductory, but they have two that might be of interest: http://www.w3schools.com/rdf/default.asp http://www.w3schools.com/semweb/default.asp -Joe - Joe Hourcle Programmer/Analyst Solar Data Analysis Center Goddard Space Flight Center
Re: [CODE4LIB] online course on the semantic web?
On Mar 5, 2011, at 3:40 PM, Cindy Harper wrote: Now that I think about it, this may be an opportunity to apply another idea that I was exploring in another context: I had written to syslib-l looking for anyone interested in collaborating on a staff technology training wiki that would link staff to free and authoritative web-based resources on a range of technology training subjects. Would anyone be interested in applying that idea to code4lib technology learning? How much effort would be required for someone who's well acquainted with the Semantic Web to contribute to a site that lists texts or curricula for those who are interested in learning? I don't know if this is doable. Anyone interested? Or should I just find myself a text and wade through it? I want to say that I remember someone presenting on some sort of modular courses to either be used as part of a library, museum or comp sci curriculum to deal with digital archives. I want to say it was IMLS funded. Basically, it was so that faculty could pick and choose different courses to use as a basic course on the topic. I think I found the correct panel, but I'm not sure who it was who presented on that particular topic. (I was sick and kept myself drugged up on DayQuil for that whole meeting) http://www.ils.unc.edu/digccurr/asist2009_panel_paper.pdf Um ... I think this is the project, the Digital Library Curriculum Project (NSF funded, not IMLS, though): http://curric.dlib.vt.edu/ Unfortunately, it doesn't look like they (yet) have anything on the Semantic Web, but I think there's a lot of overlap with what you're proposing. -Joe - Joe Hourcle Programmer/Analyst Solar Data Analysis Center Goddard Space Flight Center
Re: [CODE4LIB] Apache URL redirect
On Feb 3, 2011, at 4:42 PM, Nate Hill wrote: Hi - I'm new to Apache and hope that someone out there might be able to help me with a configuration issue over at San Jose Public Library. I need to have the URL www.partnersinreading.org redirect to http://www.sjpl.org/par Right now if you go to www.partnersinreading.org it takes you to the root, sjpl.org, and then if you navigate through the site all the urls are rewritten with partnersinreading as the root. That's no good. I went into Apache's httpd.conf file and added in the mod_alias area: Redirect permanent http://www.partnersinreading.org/ http://www.sjpl.org/par But the argument to match for redirecting is the local path, not the URL, so you'll have to either do some environmental matching, or put it in a virtual host block. I'm used to mod_rewrite, so I'd probably do something like:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.partnersinreading\.org$
RewriteRule ^/(.*) http://www.sjpl.org/par/$1 [L,R=301]

(that assumes that you've replicated the directory structure on the new site; the RewriteEngine line is only needed if you haven't already turned rewriting on) - Joe Hourcle Programmer/Analyst Solar Data Analysis Center
Re: [CODE4LIB] Apache URL redirect
On Feb 3, 2011, at 5:21 PM, Nate Hill wrote: Thank you for your responses... Virtual host setup was also on the agenda, guess both things have to happen at the same time. You don't have to set up virtual hosts with the method that both Brian and I mentioned, although the syntax is a little more confusing for people who might not be used to mod_rewrite. You only need virtual hosts if you want your method (Redirect) to work:

NameVirtualHost *
<VirtualHost *>
    ServerName www.partnersinreading.org
    DocumentRoot /set/to/point/somewhere/
    Redirect permanent / http://www.sjpl.org/par
</VirtualHost>

-Joe
[CODE4LIB] Job Opportunity for Web Developer Focusing on Linked Data (fwd)
I'm just passing this along ... I know nothing about the actual job. The bad formatting of the message is probably my fault -- I prefer plain text email, and that can sometimes do interesting things to messages. (random unknown characters, etc.) If you have questions, I'd suggest contacting Gail Hodge (address below) -Joe

-- Forwarded message --
Date: Wed, 22 Dec 2010 22:01:39 -0500
From: Gail Hodge gho...@iiaweb.com
To: onei...@grace.nascom.nasa.gov
Subject: Job Opportunity for Web Developer Focusing on Linked Data

Dear Joe, We're looking for a Web Developer to focus on Linked Data and related applications. Would you or anyone you know be interested? Could you pass this around to some relevant lists? Rob Raskin has already sent it to the ESIP SW list and I'll send it to SIG STI. Thanks, Gail

Web Developer - Linked Data

Information International Associates, Inc. (IIa), an award-winning information and knowledge management company, is seeking a Web Developer. This position involves the ongoing website and application development, maintenance, database development, and implementation for various website applications for a variety of US Federal government agencies. The applications will specifically focus on new and emerging technologies such as linked open data, semantic technologies, ontologies, RDF and RDFa as applied to both text and data, and Web 2.0 and social media applications such as RSS and Twitter. Depending on the location of the successful candidate, the home office may be located in Hyattsville, MD or Falls Church, VA.

Responsibilities:
- Designing/developing websites based on needs analysis and scope of work.
- Designing/developing the necessary back-end database and necessary SQL calls and web services for the applications.
- Deploying applications.
- Designing, creating and deploying linked open data applications, including mash-ups.
- Creating and managing triple stores based on existing relational databases.
- Designing, developing and deploying Web 2.0 and social media applications.
- Continued maintenance, development, and troubleshooting for applications.
- Documenting the website code and applications.

Requirements:
- Bachelor of Science Degree in Computer Science or other related field(s).
- Extensive knowledge of HTML, JavaScript, DHTML, PHP.
- Knowledge of linked open data and various semantic web technologies and standards, including RDF, RDFa, etc. Knowledge of URIs.
- Extensive knowledge of MySQL, SQL Server, ODBC.
- Knowledge of web services (SOAP and REST).
- Dreamweaver/similar development environments.
- Excellent verbal and written communications skills.

Desired (not required) experience:
- Tomcat, Java, C#, .NET
- Oracle, Excel and MS Access

To apply online please access IIa Careers at: https://www7.ultirecruit.com/INF1002/JobBoard/JobDetails.aspx?__ID=*3BAC1347B2106567
Re: [CODE4LIB] LDAP Issues
On Wed, 6 Oct 2010, Amy wrote: We are having a problem with a single student whose account was deleted from LDAP by Technology, and then had her account re-established. She has the same username and status as she used to have. She is now unable to login to any of the library resources that use LDAP to authenticate patrons. This includes our catalog e-resources (through III) and a Ruby on Rails group study room web application that uses LDAP authentication. Has anyone had any experiences like this before or any thoughts/speculation on how to fix? ... this is why it's a good idea to lock accounts for a period before they're deleted fully. But anyway ... LDAP's used for authentication, but what's used for authorization? (ie, we use a login password to confirm they're who they say they are, but what says that person's allowed to use the system?) Sometimes it's stored in a field within LDAP, sometimes it's stored in a separate system with a foreign key into LDAP. (which *might* be the login / uid / cn (common name) / dn (distinguished name), etc.) I've seen a few systems that use an assigned ID as the user component of the DN, rather than the UID / login, so should the user ever need to change the name of the account (eg, they get a name change, and want to change their login), they don't have to be re-authorized in all of the systems. (of course, this means that a delete + re-create, even with the same name, has issues). If I were trying to debug it, I'd try to get an LDIF dump of their entry, and compare that to someone created through 'normal' means, and see if there's anything that looks strange (missing fields, random serial numbers, something incremented (eg. John-Smith-2)). -Joe
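ps. something like this is what I'd use for the dump -- a minimal Net::LDAP sketch (the host, base DN and uid are made up; ldapsearch from the command line would work just as well):

#!/usr/bin/perl
use strict;
use warnings;
use Net::LDAP;
use Net::LDAP::LDIF;

my $ldap = Net::LDAP->new('ldap.example.edu') or die "$@";
$ldap->bind;    # anonymous bind; your directory may require credentials

my $result = $ldap->search(
    base   => 'ou=people,dc=example,dc=edu',
    filter => '(uid=jsmith)',
);
die $result->error if $result->code;

# write each matching entry out as LDIF, so you can diff the broken
# account against one that was created the 'normal' way
my $writer = Net::LDAP::LDIF->new( \*STDOUT, 'w' );
$writer->write_entry($_) for $result->entries;

Run it once for the broken account and once for a known-good one, and diff the two dumps.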
Re: [CODE4LIB] Workflow analysis of archival arrangement and description
On Fri, 27 Aug 2010, Mark A. Matienzo wrote: I'm currently looking for any workflow and business process analysis of the processes involved in processing archival collections. At this point, I'm hoping to find fairly high-level information, ideally in the form of or easily translatable into workflow diagrams to serve as a strawman. Processing manuals may help, but likely are too detailed for my current purposes. Please let me know if you have anything that might help. Have you already looked at OAIS? (Open Archival Information System) It's a reference model, so it goes over the sorts of things that digital archives should do as they're ingesting / storing / disseminating things, but isn't a specific implementation. Current version (2002): http://public.ccsds.org/publications/archive/650x0b1.pdf There are also drafts as they're working towards making it an ISO standard. I'm not sure if this is the most recent version or not, but it matches the last draft ID (p-1-1) up for review on the CCSDS site: http://ddp.nist.gov/refs/650x0p11_OAIS_pink_book.pdf -Joe
Re: [CODE4LIB] Cookout in McLean on Saturday? (was Re: [CODE4LIB] Get together in DC during ALA?)
On Thu, 24 Jun 2010, Simon Spero wrote: Haven't seen any concrete plans. I'm fine with the same plan as last year -- meet in front of RFD on Monday, but from your comments you'll be gone by then. [trimmed] If folks can make it out to McLean,VA on Saturday the 26th (Falls Church East/West are closest Metro), we have propane, with a chance of dead things, animal or vegetable. Minerals are available from the Department of the Interior through the usual procedures. Any interest? Interest, yes, but also a conflict due to family stuff so won't be able to make it. (and as I live in PG, and have to be in Howard county, it's basically on the other side of the world) -Joe
Re: [CODE4LIB] Cookout in McLean on Saturday? (was Re: [CODE4LIB] Get together in DC during ALA?)
On Thu, 24 Jun 2010, Schwartz, Raymond wrote: What is RFD? A restaurant in DC. Here's what was sent out the last time: -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Monday, June 18, 2007 11:33 AM To: CODE4LIB@listserv.nd.edu Subject: [CODE4LIB] Informal get together Monday of ALA Some of us have spontaneously decided to have an informal Code4Lib get together the Monday of ALA in DC. We will meet on Monday the 25th of June at 8pm, at RFD, which was recommended by anarchivist, and which appears to be a pub and Washington's Largest Multi-Tap. It's located just a couple blocks from the convention center. http://www.lovethebeer.com/rfd.html Some of the Talis crew have said they will be there. I will be there. Anarchivist and edsu have said they'll be there. (I forget if I just made up edsu). Please join us! Any and everyone interested in meeting code4lib folks or other assorted library technologists and library geeks and hangers on are welcome. No, I wasn't planning on making a reservation or anything. No, I have no idea how we'll all find each other. I think it'll work out. Jonathan I assumed we'd go with the same as last time -- Monday, 8pm, just show up and we'll figure something out. (it worked out okay last time). -Joe (Oh ... and for those from Tucson: it might be the largest multi-tap in DC, but it's only like 1/2 of what 1702 has.)
Re: [CODE4LIB] Cookout in McLean on Saturday? (was Re: [CODE4LIB] Get together in DC during ALA?)
On Thu, 24 Jun 2010, KREYCHE, MICHAEL wrote: Since no one else has asked, does Monday the 25th mean Monday (the 28th) or Friday (the 25th)? Monday, the 25th, 2007. ('the last time' being the last time this was done, in 2007). So, the proposal is:

Monday, June 28th, 2010
Meet in front of RFD at 8pm
801 7th St NW, Washington, DC
http://www.lovethebeer.com/rfd.html

If it's 6 people and intimate, or closer to the 2 dozen we had last time, we'll make it work. -Joe
Re: [CODE4LIB] SMS headers in email-sms
On Wed, 9 Jun 2010, Ken Irwin wrote: We originally tried changing the From and Reply-To mail headers, but the phones we tested on didn't honor the email headers. Instead they show an address @www6.wittenberg.edu (ie, our web server). That's why I was thinking there would be some sort of SMS-equivalent-header that it cared about more. Are you changing the line 'From:' (the From header) or 'From ' (the envelope from, which is part of the SMTP protocol's routing, and *not* part of the e-mail message) I don't know if this will help or not, but it sounds like the -f flag is the way to go: http://stackoverflow.com/questions/179014/how-to-change-envelope-from-address-using-php-mail -Joe -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Thomas Bennett Sent: Wednesday, June 09, 2010 9:36 AM I don't know if this will be any help but you would need to replace the reply-to header I expect. Thomas
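ps. If you're calling PHP's mail(), the fifth argument gets passed through to sendmail, so putting '-fbounces@example.edu' there sets the envelope sender. In Perl I'd just open a pipe to sendmail myself -- a minimal sketch (all of the addresses are made up):

#!/usr/bin/perl
use strict;
use warnings;

# -f sets the envelope sender (the SMTP 'MAIL FROM'), which is what a
# lot of gateways key on; the From: header is a separate thing entirely
open( my $mail, '|-', '/usr/sbin/sendmail',
      '-f', 'sms-replies@example.edu', '5555551234@txt.example.net' )
    or die "can't fork sendmail: $!";
print $mail "From: Library SMS <sms-replies\@example.edu>\n";
print $mail "Subject: test\n";
print $mail "\n";
print $mail "testing envelope-from vs. header From:\n";
close($mail) or die "sendmail exited non-zero\n";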
Re: [CODE4LIB] drupal question
On Fri, 4 Jun 2010, Nate Vack wrote: On Fri, Jun 4, 2010 at 2:02 PM, Jill Ellern ell...@email.wcu.edu wrote: I know we can put this open source software on a PC...and we've done that but this isn't a solution for a production level web service What is the average cost of hosting a drupal server out there in the cloud? Are there things we should know? Would you recommend anyone that does this for libraries? It all depends on what production level web service means to you -- do you get lots of traffic? A little? Do you want to call someone on the phone when it goes pear-shaped? Even if the problem is with a customization you're making? How much downtime is OK? How snappy does it need to be? It sounds a bit like your IT department is trying to give you a brush-off (This sounds like a pain. Let's make them use a dedicated server, and say it'll cost INFINITY DOLLARS.) Sitting down with someone, being very clear with your expectations for support, and finding out what their major concerns are might help. As someone who's worked as a sysadmin for an ISP, a university IT department and a government agency that is the target of a lot of intrusion attempts, let me tell you that it *is* a pain. For the first one. The incremental cost is insignificant, but each new piece of software that you have to support is yet another round of finding out how the server needs to be tuned; if it plays well with other software you're running; more websites to watch for security updates; more patches to apply; more log files to watch for suspicious activity. Once you're hosting 10+ of the same piece of software, the incremental cost is relatively insignificant -- but that first instance sure as hell is not cheap in terms of man-hours (if you don't want your machine getting hacked, then your domain blacklisted when someone starts pumping mail through it, etc, etc.) Generally, hosting will run something like $5-10/month for cheap shared hosting, and maybe $30-40/month for a small VPS. Yes, for a site that already has lots of Drupal instances they're already maintaining. -Joe
[CODE4LIB] SRU 2.0 / Accept-Ranges (was: Inlining HTTP Headers in URLs )
On Wed, 2 Jun 2010, Jonathan Rochkind wrote: Joe Hourcle wrote: Accept-Ranges is a response header, not something that the client's supposed to be sending. Weird. Then can anyone explain why it's included as a request parameter in the SRU 2.0 draft? Section 4.9.2. They're not the only ones who think it's a client header: http://en.wikipedia.org/wiki/List_of_HTTP_headers (which of course shows up #1 on google for 'http headers') It looks like someone decided to split it into two tables: http://en.wikipedia.org/w/index.php?title=List_of_HTTP_headers&oldid=183353617 And within a week, someone decided to add Accept-Ranges where it didn't belong: http://en.wikipedia.org/w/index.php?title=List_of_HTTP_headers&oldid=184742665 ... I'm guessing it's a mistake -- either the SRU authors looked at the Wikipedia entry, or they also misread the intent of the HTTP header in the RFC. Do we have anyone affiliated with the project on this list who can make a correction before it leaves draft? -Joe
Re: [CODE4LIB] Inlining HTTP Headers in URLs
On Tue, 1 Jun 2010, Jonathan Rochkind wrote: Accept-Ranges, I have no idea, I don't understand that header's purpose well enough. But SRU also provides a query param for that, it seems less clear to me if that's ever useful or justifiable. Accept-Ranges is a response header, not something that the client's supposed to be sending. The client sends a 'Range' header (with an optional 'If-Range' if you're concerned with the resource having changed), and in response, the server sends a 206 status with a 'Content-Range' header. See http://labs.apache.org/webarch/http/draft-fielding-http/p5-range.html ... I only know of two values for 'Accept-Ranges' -- none (ie, I don't accept partial downloads) and bytes, so for incomplete downloads you can start where you left off. If you know the file's excessively large, I guess you could use it to transfer it in parallel to abuse the TCP congestion rules. (or if you have a way of knowing that there are multiple mirrors, to spread the load across servers). -Joe
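ps. for the archives, a minimal range exchange looks something like this (the host and sizes are made up) -- the client asks for a byte range, and the server answers with a 206 saying which bytes it's actually sending:

GET /data/big_file.fits HTTP/1.1
Host: server.example.edu
Range: bytes=500000-

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Range: bytes 500000-999999/1000000
Content-Length: 500000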
Re: [CODE4LIB] It's cool to love milk and cookies
You know, there are some of us who are milk intolerant on this mailing list. And emacs intolerant, too. (although, I did use 'ee' as my editor in elm, but elm took too long to support MIME, so I switched to pine, with their pico default editor, but I don't use any of those I mentioned for coding, even though I am in pico/pine right now, as I still haven't switched to alpine or mutt) -Joe
Re: [CODE4LIB] code4lib server downtime needed
On Wed, 28 Apr 2010, Ryan Ordway wrote: I need to move the server that hosts the code4lib.org website into another rack to make room for some other equipment, when is a good time to do this? You power down machines when moving them? Oh, sure, do it the easy way. (After waiting 2 months for the university I was working for to approve a maintenance window: since the machine had two power taps and we had a machine lift, I ran an extension cord and a long ethernet cable, swapped them in, pulled the machine as far out of the rack as it'd extend on its rails, brought the lift up from under it, ejected it from the rails, rolled it out of the way, moved the rails to the new rack, rolled the lift over to the new place, cranked the lift to the new height, re-engaged the rails, and then swapped back to the new rack's power and patch panel. Only ~ 2 min of downtime, and that was because the switch took 60 sec of testing to make sure there wasn't a loop when you changed connections.) -Joe
Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?
On Mon, 12 Apr 2010, Jonathan Rochkind wrote: So, as usual, the right tool for the job. If all you really need is a key-value store on ID, then a NoSQL solution may be the right thing. But if you need actual querying and joining, then personally I'd stick with rdbms unless I had some concrete reason to think a more complicated nosql+solr solution was required. Certainly if you are planning on using Solr _anyway_ because your application is a search engine of some type, that would lessen the incremental 'cost' of a nosql+solr solution. I'm surprised that I keep hearing so much about NoSQL for key-value stores, and everyone seems to forget the *old* key-value stores, such as directory services (X.500 and LDAP, although that's actually the protocol used to query them, not the storage implementation). Yes, there are things that LDAP doesn't do so well (relationships being one of them), but it supports querying, and you can adjust the matching by attribute (ie, this one's matched as a number, this one's matched as a string, this one's a case-insensitive string ... I think some implementations have functionality to run the search term through a function for things like soundex, so it might be possible to add hooks for stemming and query expansion, etc.) I think that NoSQL got a lot of press because of Google having used it (and their having a *VERY* large data system -- but not everyone has that large of a system; also, Google did it 10+ years ago -- you can now throw a lot more CPU and RAM at an RDBMS, so the point at which the database becomes a problem isn't the same as it was when Google first came out.) ... So, I think that there are cases where NoSQL is the right solution for the job, and I think there are times when an RDBMS is the right solution ... there are also plenty of times for flat file databases, XML, LDAP, and a slew of other storage standards. -Joe hmm ... now I'm going to have to try to bring back my attempt to put my catalogs into a directory service ... I have a feeling I'm going to run into issues with unit conversions when searching.
Re: [CODE4LIB] NoSQL - is this a real thing or a flash in the pan?
On Mon, 12 Apr 2010, Ryan Eby wrote: [trimmed] But I'm guessing they've thought about the data and what benefits they would get out of the backend. Wow. You obviously don't work with the same folks that I do. I've been attached to one project for about 16 months now, while the rest of the team's been together for 4 years ... I've been trying to get a few changes made to better support my user community (basically, all of the people who don't have access to their system, or don't want to spend the 6 months using the system 'to be able to do something almost useful'). About 2-3 months ago, the main project team finally realized that they have *no* idea what the user community wants or needs. Oh, and they have to go live on April 21st. I'm expecting a major 'wtf?' reaction from the majority of the community. -Joe
Re: [CODE4LIB] Works API
On Wed, 31 Mar 2010, stuart yeates wrote: Jonathan Rochkind wrote: Karen Coyle wrote: The OL only has full text links, but the link goes to a page at the Internet Archive that lists all of the available formats. I would prefer that the link go directly to a display of the book, and offer other formats from there (having to click twice really turns people off, especially when they are browsing). So unfortunately, other than full text there won't be more to say. In an API, it would be _optimal_ if you'd reveal all these links, tagged with a controlled vocabulary of some kind letting us know what they are, so the client can decide for itself what to do with them (which may not even be immediately showing them to any user at all, but may be analyzing them for some other purpose). Even better, for those of us who have multiple formats of full text (TEI XML, HTML, ePub, original PDF, reflowed PDF, etc) expose multiple URLs to the full text, differentiated using the mime-type. Would different forms of processing have different mime-types? (ie, we can tell it's a PDF, but can we tell what's actually in it?) Personally, for the different packaging formats, if you're going to be selecting using mime-type, I'd be inclined to hide it all behind a single URL -- the user agent could set the appropriate Accept header, so long as it's being served by HTTP. ... I admit, it's possible that this works better for APIs than user browsing; they might prefer a PDF for digital library objects, but prefer HTML for other purposes. We were hoping to allow users to set cookies to set their preferences on processing / packaging for our system, but I'm still waiting for a response to the paperwork that I filed to be allowed to use them. (little known fact -- OMB M-00-13 outlaws cookies on all government websites; OMB M-03-22 spells out some of the procedures for being allowed around it, but I've given up trying to let them know, when they're set up so badly you can't even report them [3]) -Joe
[OMB M-00-13] http://www.whitehouse.gov/omb/memoranda_m00-13/
[OMB M-03-22] http://www.whitehouse.gov/omb/memoranda_m03-22/
[3] http://politics.slashdot.org/comments.pl?sid=1021887&cid=25678129
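ps. to make the 'hide it all behind a single URL' bit concrete, the exchange would look something like this (hypothetical host and paths) -- the client says what packaging it wants, and the server picks the matching representation:

GET /works/hamlet HTTP/1.1
Host: library.example.edu
Accept: application/epub+zip

HTTP/1.1 200 OK
Content-Type: application/epub+zip
Content-Location: /works/hamlet.epub
Vary: Accept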
Re: [CODE4LIB] PHP bashing (was: newbie)
On Fri, 26 Mar 2010, Doran, Michael D wrote: As a first language, you want something that lets you Get Stuff Done with a minimum of fuss... If you are getting started and if you are not planning on being a full-time programmer, then you want to be looking at the high-level languages as Mike suggests: the strong candidates include Perl, Python, arguably PHP and my own favourite, Ruby... Even *if* you are looking to be a full-time programmer, I'd recommend most people do stuff in higher-level languages. That earlier development effort that I mentioned, the majority of their work is being done in C -- the system needs to go live in ~30 days, and they're *still* finding memory leaks (signs of poor memory management), string and integer overflows (one of the joys of strict typing), etc. Yes, I've done a fair bit of C, and even a little assembler -- and it's fine, if you really, really, need the speed boost. (and some people would argue that this might be a case where they *do* need it, but it'd have been more cost-effective to throw hardware at it, rather than a 10 person team for 2-3 years, even with their 100-node cluster; or better yet, wait to see what happens under real load, and optimize then, rather than building a system with no requirements, and no testing of simulated data flows until 2 months before launch) I've had my share of problems in Perl, where its attempts to assume what I mean have led to problems. (specifically, SOAP::Lite's attempt at guessing that a string full of numbers was an integer, not a string, and that a URL should be marked as such, and not a string ... once in a while, I'll hit one of the edge cases with braces where you have to force it as a block or a hash) ... but those are few and far between compared to the segfaults that I've gotten in trying to port their code over to a non-linux system. (spent months on it, as each new version would either not fix the problem, or break new things ... it doesn't help they decided to write their own configuration and build tools because someone must've read 'recursive make considered harmful') ... we finally gave up and just bought new hardware for our caching sites, as we had a feeling that we'd have to keep going through these headaches with every new update through the life of the mission. ... sorry, went off on a tangent again. Anyway, the point is -- even us full-time programmers would rather be making new and interesting things, rather than trying to work around problems with our tools. If I'm painting a room, the roller gets the job done fast, and I can deal with a brush for the corners and edges -- there's no reason to do the whole thing with a brush, and it'd just look like crap if I tried doing the whole thing with a roller. Take the same approach in programming -- if you can do 90% of the work really fast and really well in one language, and have to do the other 10% in another language, it doesn't mean you need to do the *whole* thing the slow way. I believe that all of the 'higher level' languages support some form of linking to C code, should you need it. (although, you don't always need it ... after dealing with scientists insisting that I use their libraries, and trying to get one compiled as an object so I could call it as a postgres function, I finally just gave up and hard-coded the table in PL/pgSQL ... I'll just have to update it every few years as leap-seconds are added to UTC) ...crap, tangent again.
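Since I mentioned linking to C: here's a minimal sketch of what that looks like from Perl with Inline::C (a toy function, obviously -- not anything from the project above). Inline::C compiles the C the first time the script runs and caches the object, so afterward the function is callable like any Perl sub:

#!/usr/bin/perl
use strict;
use warnings;

# everything in the heredoc is C; Inline::C compiles and caches it,
# and add_ints() becomes callable from Perl
use Inline C => <<'END_C';
int add_ints(int a, int b) {
    return a + b;
}
END_C

print add_ints( 40, 2 ), "\n";    # prints 42

XS or SWIG buy you more control, but for calling the one hot function from a script, that's about all there is to it.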
okay, back to the hell of debugging crappy code with off-by-one errors and race conditions due to lack of locking, and no error checking to see if processes completed. -Joe
[CODE4LIB] PHP bashing (was: newbie)
On Thu, 25 Mar 2010, Brian Stamper wrote: On Wed, 24 Mar 2010 17:51:38 -0400, Mark Tomko mark.to...@simmons.edu wrote: I wouldn't recommend PHP to learn as a programming language, if your goal is to have a general purpose programming language at your disposal. PHP is a fine language for building dynamic web pages, but it won't help you to slice and dice a big text file or process a bunch of XML or do some other odd job that you don't want to do by hand. To be precise, PHP can indeed do these kind of things, particularly in command line mode. I certainly don't recommend it, but if you're used to PHP for other reasons, and you already have it available to you, you can do 'odd jobs' with PHP. You can also use your teeth to open a tight bottle cap, the edge of a knife as a screwdriver, and duct tape to perform auto repairs. You say that as if duct tape is a bad thing for auto repairs. Not all duct tape repairs are candidates for 'There, I fixed it!'[1]. It works just fine for the occasional hose repair. -Joe [1] http://thereifixedit.com/
Re: [CODE4LIB] newbie
On Thu, 25 Mar 2010, Yitzchak Schaffer wrote: On 3/24/2010 17:43, Joe Hourcle wrote: I know there's a lot of stuff written in it, but *please* don't recommend PHP to beginners. Yes, you can get a lot of stuff done with it, but I've had way too many incidents where newbie coders didn't check their inputs, and we've had to clean up after them. Another way of looking at this: part of learning a language is learning its vulnerabilities and how to deal with them. And how to avoid security holes in web code in general. Unfortunately, it's not all web code. Part of the issue is in selecting the correct tool for the job. Case in point -- I've been working for the last year to integrate a new data system into our federation. The system officially hasn't gone live yet, so as the institution building the system had replaced their full-time DBA with a contractor, the contractor decided he was going to replace all of the work that the DBA had already done to enable external sites to subscribe to collections within the system. Unfortunately, he did the entire thing in shell, and he's passing around SQL scripts, applying them to the database without any validation, and he's hard-coded assumptions about how directories are laid out and where the script has permissions to write. Needless to say, when you get someone reading stuff from config files with *no* taint checking and *no* escaping or even quoting of arguments passed to other commands, I have to clean it up. I even try passing my changes back upstream, but I'm told that the contractor has to make the changes (and he then picks and chooses which security changes he's going to make ... then decides to wrap each 'rm' and a dozen other commands in functions (so I can override what command's being called?), and I now have a shell script that's over 1000 lines. (okay, that's not fair ... his version is only 968 lines, it only gets over 1000 when I try to add my corrections to it, and it's only 702 lines when you strip out comments and blank lines) Now, much of it's just plain bad programming -- I mean, would you test to see if variables were set BEFORE loading the config file? Would you run through a series of functions where each one required the other one to complete, without actually testing to see if any of them actually worked? (and well, one of those functions was the one that removed a tarball that took an hour to generate at the server, and the next one reported back the 'success' to the server, so I couldn't get the server to run it again without getting someone to correct things manually) ... I probably wouldn't be so hot on the topic if it hadn't occupied the better part of the last month of my life, and all of this last week. (well, it seems that scp'ing a file for the subscription manager service to process, and create a tarball response with the contents for your database, doesn't work too well when the service isn't actually running ... but the way it's written you have *no* idea what the status of the server is). ... sorry, I just needed to vent. Anyway, part of what makes a good programmer is knowing the correct tools to use. (and unfortunately, by definition, any newbie isn't going to have enough languages in their toolbox to be able to make a good selection). Yes, we always have to deal with determining the 'best' language based on what we know, who's going to maintain it, etc, so we sometimes have to go with sub-optimal choices.
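To be concrete about the taint checking I keep ranting about -- a minimal sketch of what that looks like on the Perl side (the paths are made up):

#!/usr/bin/perl -T
use strict;
use warnings;

# taint mode (-T) refuses to pass outside data to the shell until
# you've explicitly validated it
$ENV{PATH} = '/bin:/usr/bin';
delete @ENV{qw( IFS CDPATH ENV BASH_ENV )};

my $dir = shift or die "usage: $0 directory\n";

# untaint by matching against a whitelist pattern; die loudly instead
# of handing garbage to rm
my ($safe_dir) = ( $dir =~ m{^(/var/spool/subscriptions/[\w.-]+)\z} )
    or die "suspicious directory name: $dir\n";

# the list form of system() never touches the shell, so there's
# nothing to quote or escape
system( '/bin/rm', '-rf', $safe_dir ) == 0
    or die "rm failed: $?\n";

It's not much code, and it's the difference between a bad config value being an error message and being a disaster.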
But much of it's trying to identify what's going to go wrong with what we build, and trying to make sure that it doesn't break in spectacularly bad ways.[1] I guess most people don't have the men with guns show up and take your servers for forensic analysis when some types of things go wrong, which makes me a little more paranoid in my error handling. But if you put it out there on the internet, someone, sooner or later, will attempt to abuse it. It could be link spam on blogs, or usurping a guest book program to send spam, or even people claiming that compression artifacts in your data are UFOs[2], resulting in a DDoS of your servers. The bad ones are where they find a way to modify your database, add something to your filesystem, or get a shell on your system. -Joe
[1] http://xkcd.com/327/
[2] http://www.google.com/search?q=disclosure+nasa+sun+2010
Re: [CODE4LIB] newbie
On Wed, 24 Mar 2010, Eric Lease Morgan wrote: On Mar 24, 2010, at 3:24 PM, jenny wrote: My question is, where would you recommend I would begin? What's hot right now in the library world? Python, PERL, Ruby? Any advice you'd have for a beginner like me or even recommendations for online courses would be extremely appreciated If you are approaching the problem for the point of view of learning a programming language, then then you have outlined pretty good choices. At the risk of starting a religious war, I like Perl, but PHP is more popular. Java is pretty good too, but IMHO it doesn't really matter. In the end you will need to use the best tool for the job. I know there's a lot of stuff written in it, but *please* don't recommend PHP to beginners. Yes, you can get a lot of stuff done with it, but I've had way too many incidents where newbie coders didn't check their inputs, and we've had to clean up after them. Just yesterday, I was helping someone at another federal agency clean up after someone got in through a PHP script and had turned their site into an ad for cialis. (but cleverly disguised, using their header / footer, and it only showed up when you passed the correct query_string to it) The problem's gotten so bad here, that we've been asked to send our entire web directory on each server to our security office, so that they can run it through some security scanner that looks for problems in PHP code. (they relented to my running 'find' on the system for PHP scripts, as we serve a few dozen TB of data over HTTP) We're also running intrusion detection software that managed to catch someone attempting to exploit refbase (and that was strike #2 against it ... I've never gotten a response to my e-mails to the maintainer, so we've since had to scrap the installs of it that we had). So, anyway ... don't do PHP. Even Tim Bray recommended that at ASIST's 2009 annual meeting, where he gave the plenary. (He recommended people learn Ruby, instead) Personally, I do most of my work in Perl, where I can, but I'd recommend Ruby or Python over someone learning PHP (unless it was to learn enough to migrate code off of PHP). ... and yes, I know I've stirred this pot before: http://www.mail-archive.com/code4lib@listserv.nd.edu/msg06630.html http://www.mail-archive.com/code4lib@listserv.nd.edu/msg06648.html ... And if you're using PHP, and can't get away from it, consider using something like mod_security to watch for signs of malicious behavior: http://www.modsecurity.org/ (note -- not an endorsement, I don't use it myself, as they've got something installed on the upstream firewall that does it ... which means that someone else sees it happen, and then we have to clean it up, fill out paperwork that we've cleaned it up, have meetings about how we're going to clean it up (when we already did), etc.) -Joe
Re: [CODE4LIB] Variations/FRBR project relases FRBR XML Schemas
On Sun, 21 Mar 2010, Karen Coyle wrote: One thing I am finding about FRBR (and want to think about more) is that one seems to come up with different conclusions depending on whether one works down from Work or works up from Item. The assumption that an aggregate in a bound volume is an Expression seems to make sense if you are working up from the Manifestation, but it makes less sense if you are working down from the Work. If decisions change based on the direction, then I think we have a real problem! It's a *reference model*. People are going to apply it differently, for what works in their situation. It is pointless to assume that we will ever get everyone to agree on a single implementation -- it's either too complex and wastes people's time on stuff they don't care about, or it's not complex enough and doesn't handle your special situations and strange edge cases. Build the system that makes sense for your needs, and use FRBR as guidelines on issues to consider, basic requirements, etc. It is not an API spec. It is not an interchange format. RDA, on the other hand, is more concrete -- it has specific cataloging instructions on how to deal with specific situations. (and well, in the case of aggregates as new expressions without a resultant new work, as I've come to understand from this discussion, rules that might not comply with FRBR) With the RDA toolkit, you even have a specific implementation. ... Maybe my take on the situation is different because I don't deal with bibliographic objects. Technically, by FRBR, I don't even deal with Items, as it's all digital. (and I don't want to try to answer if little bits of magnetic film spread across my disk arrays make up an 'Item', as then I have to consider things being new Items when my disk array decides to move data around because a drive starts to fail) ... as such, there's no way in hell I'm going to be able to mesh my resultant catalogs with most other people's catalogs (and to do so wouldn't make sense for the users). I also have to try to mesh other catalogs with our federation, where we just don't have the funding to re-catalog every object, so I'm just trying to see how each catalog fits within a common model, so I know how to talk to each system and how the granularity of their results compares to the results from other systems. I specifically have to plan for everyone coming up with their own systems; some are spectacularly bad. (A new database table every year or month, so we don't hit limits within our database. Multiple related tables, but not actually assigning foreign keys between them. Over 10k tables, with each catalog table storing both current and deprecated data and no easy way to select just the deprecated data without going through an overly cumbersome abstraction interface (which merges in constants as stored in yet other tables) ... and each of the catalog tables has no fixed specification.) ... I'm with Jenn on this -- different groups can set up their little idealized implementations of FRBR, as is being done with RDA, and the different groups working on their implementations can ignore them when they don't fit with their needs. More concrete systems *are* needed, or we're going to end up with a near-infinite number of variations, but some people are going to find it easier to deal with a more restrictive model, where they don't have to deal with complexity; and others are going to have strange edge cases that don't fit within the restrictions and that require that same complexity.
The final vote on whether people accept the restrictions of RDA will be whether they decide to adopt it, or have to go with some other implementation. -Joe
Re: [CODE4LIB] Variations/FRBR project relases FRBR XML Schemas
On Thu, 18 Mar 2010, Jonathan Rochkind wrote: Karen Coyle wrote: naturally favors the package over the contents. So we'll have some works that are what users think of as works, and other works that represent the publisher's package -- which sometimes will be something that makes sense to the user, but at other times, as in many music CDs, is bordering on the arbitrary. If we present these all as works to the user, confusion will ensue. So it's up to our systems to NOT present things that way, right? If a particular Work is just an aggregate which is not that meaningful to the user, it shouldn't be presented (at least in initial result sets), the meaningful expressions/manifestations should be presented, right? I'm not entirely clear on your example demonstrating that, but I believe you that it exists. I would personally assume so -- you don't want someone searching to see if you have a copy of 'Hamlet', and all you have is 'The Collected Works of William Shakespeare', and so your system reports that you don't. Of course, what the user asks for affects what we respond back with -- even if we have 27 copies of 'Hamlet', we wouldn't respond with 27 records back in response to their request. It's entirely possible (and probable) that systems track objects at a granularity other than what's presented back to the user. If someone's searching for a specific song, do we expect them to know the names of every album it's been on? Yes, our local catalog might only track the albums, but if there's some sort of indication that they're aggregations, we know that we might need to expand them to be able to answer the question. The way I see it, our architectural job is _first_ to create a data model that allows all the necessary things to be expressed, THEN create systems that use those necessary expressed things to create reasonable displays. I'm still thinking my interpretation (which is not JUST mine, I don't think I even invented it) of aggregate modelling is the only sane one I've seen that allows us to model what in many use cases we'd be allowed to model, without forcing us to model what in many use cases cost-benefit would not justify modelling. It's a *reference* *model* ... it is *not* an implementation. Everyone's allowed to model anything they want. In the RDA relationships (which I've summarized here http://kcoyle.net/rda/group1relsby.html) there seem to be two kinds: intellectual relationships, and bibliographic relationships. Is adapted from is an intellectual relationship; Contains is a bibliographic relationship. They're all mixed together as if they are the same thing. I think you may very well be right that there should be more clarification in the model here. I haven't thought about it enough. There definitely needs to be more clarification in the model as to how to handle aggregates. At one point there was a working group on that, I'm not sure what happened to it. Of course, if the working group came up with something OTHER than my preferred interpretation, I'd be very unhappy. :) The group's two proposals were to model aggregates as works, or as manifestations, so RDA seems to be on their own modeling them as expressions: http://www.ifla.org/en/events/frbr-working-group-on-aggregates I don't know what happened at the August 2009 meeting, though. William Denton had a breakdown of the August 2008 meeting, which explained some of the issues that they were considering: http://www.frbr.org/2008/08/18/working-group-on-aggregates -Joe
Re: [CODE4LIB] Variations/FRBR project relases FRBR XML Schemas
On Thu, 18 Mar 2010, Jonathan Rochkind wrote: Joe Hourcle wrote: The group's two proposals were to model aggregates as works, or as manifestations, so RDA seems to be on their own modeling them as expressions: See, this is what I don't understand. As works, or as manifestations?? In the FRBR model, every single manifestation belongs to _some_ Work, does it not? So I don't understand how those can be alternatives. Or was the proposal to change this? So some manifestations exist free floating belonging to no work at all? (By belonging to in FRBR terms of art, I mean in the FRBR model, every manifestation is the embodiment of SOME expression, which is the realization of SOME Work. Whether that expression or work are yet described or not, they're there in the model. Was the proposal really to change this, so some manifestations are by definition the embodiment of no expression at all, not even an expression that has yet to have an identifier assigned to it? That seems horribly mistaken to me). There's a many-to-many relationship between Expressions and Manifestations in FRBR, so a single Manifestation can encompass multiple Expressions (and therefore, multiple Works). In the Aggregates-as-Manifestations model, something like the 'Complete Works of ...' would exist as a new manifestation, but *not* as a new work. (and those individual works might never exist as individual manifestations) It's of course much simpler to express some items (such as the Canterbury Tales) as a single work (Aggregations-as-Works), and then just make expressions of them, and the corresponding dozens of possible manifestations. I guess it'd be the FRBR equivalent of data normalization. And aggregating at the work level makes it easier to reconcile the cases where different catalogers can't agree if it's a single object or multiple objects. I'm torn -- I think both are valid ways of describing the relationships, and different domains are going to try to go the route that makes the most sense for them. (which is likely whichever one's the least cost to implement while giving them the functionality they want) -Joe
[CODE4LIB] Any examples of using OAI-ORE for aggregation?
Most of the examples I've seen of OAI-ORE seem to assume that you're ultimately interested in only one object within the resource map -- effectively, it's content negotiation. Has anyone ever played with using ORE to point at an aggregation, with the expectation that the user will be interested in all parts, and automatically download them? ... Let me give a concrete example: A user searches for some data ... we find (x) number of records that match their criteria, and they then weed the list down to 10 files of interest. We then save this request as a Resource Map, as part of an OAIS order. I then want to be able to hand this off to a browser / downloader / whatever to try to obtain the individual files. Currently, I have something that can take the request, and create a tarball on the fly, but we have the unfortunate situation when some of the data is near-line and/or has to be regenerated -- I'm trying to find a good way to effectively fork the request into multiple smaller requests, some of which I can service now, and some for which I can return an HTTP 503 status (service unavailable) w/ a retry-after header. ... Has anyone ever tried doing something like this? Should I even be looking at ORE, or is there something that better fits with what I'm trying to do? Thanks for any advice / insight you can give -Joe - Joe Hourcle Programmer/Analyst Solar Data Analysis Center
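ps. the 503 half of this is the easy part; under CGI it's just a Status pseudo-header -- a minimal sketch (the retry time would really come from the staging queue):

#!/usr/bin/perl
use strict;
use warnings;

# tell the client the file isn't staged yet, and when to come back;
# under CGI, 'Status:' sets the HTTP status line
print "Status: 503 Service Unavailable\r\n";
print "Retry-After: 3600\r\n";
print "Content-Type: text/plain\r\n";
print "\r\n";
print "File is being restored from near-line storage; retry in an hour.\n";

The part I'm still looking for is the client side -- something that understands a resource map well enough to walk the parts, honor the Retry-After, and come back for the stragglers.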
Re: [CODE4LIB] Q: XML2JSON converter
On Fri, 5 Mar 2010, Godmar Back wrote: On Fri, Mar 5, 2010 at 3:59 AM, Ulrich Schaefer ulrich.schae...@dfki.de wrote: Hi, try this: http://code.google.com/p/xml2json-xslt/ I should have mentioned that I already tried everything I could find after googling - this stylesheet doesn't meet the requirements, not by far. It drops attributes just like simplexml_json does. The one thing I didn't try is a program called 'BadgerFish.php' which I couldn't locate - Google once indexed it at badgerfish.ning.com http://web.archive.org/web/20080216200903/http://badgerfish.ning.com/ http://web.archive.org/web/20071013052842/badgerfish.ning.com/file.php?format=srcpath=lib/BadgerFish.php -Joe
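ps. for anyone who hasn't run into it, the BadgerFish convention is what keeps the attributes: element text maps to a '$' key, and attributes map to '@'-prefixed keys, so

<alice charlie="david">bob</alice>

becomes

{ "alice": { "$": "bob", "@charlie": "david" } }

which is exactly the stuff the simpler stylesheets throw away.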
Re: [CODE4LIB] Location of the first Code4Lib North meeting?
On Mon, 25 Jan 2010, Edward M. Corrado wrote: I never had a problem in the couple of times I crossed a border into Canada for a library conference, but I tend to make sure I have the program and hotel information readily available to show them in case they ask (yes, the Canadian border people have looked at it). My guess is that some of the border guards think only old ladies with hair buns can be librarians, so they might be a bit confused when someone that doesn't meet that description tries to cross the border. I've only been to Canada once (for last year's ASIST), and the only question I had difficulty in answering was 'who do you work for', for which I probably confused the guy with an explanation of US government contracting. It's not as bad as Israel, where I've heard some people have been asked to give their presentation for the conference, when they said that's why they were visiting. -Joe
Re: [CODE4LIB] Online PHP course?
On Wed, 6 Jan 2010, MJ Ray wrote: Thomas Krichel kric...@openlib.org wrote: Joe Hourcle writes ps. yes, I could've used this response as an opportunity to bash PHP ... and I didn't, because they might be learning PHP to migrate it to something else. controversial ;-) what's the problem(s) with PHP? Oh please don't nuke the list from orbit like that! I hope that this is a balanced enough reply to keep everyone happy: Our experience is that PHP hosting environments vary much more, most PHP code is a mess (PHP-based software was part of 35% of the U.S. government's National Vulnerability Database in 2008 - http://www.coelho.net/php_cve.html) and few things (code and hosting) move between the different major versions smoothly. It's a personal home page tool which has grown massively, for better or worse. BUT! Even after all that, software.coop still supports some PHP applications because they can work well and be very useful, though we're under no illusions about PHP's warts. I can sum it up in one sentence: PHP makes it *very* easy to write insecure programs. Of the security incidents in our department (the ones where men with guns come and take your hard drive and/or whole server away for an 'investigation'), PHP has been responsible for the majority of the incidents. Part of it is the perceived simplicity -- look at how easy it is to add some extra functionality to your website! You don't even need to understand good programming practices! Anyone can do it! (to be fair -- Perl used to be the software that fell into this niche 10 years ago, but I blame Matt's Script Archive more than the language itself, as Perl isn't specifically for web site automation) ... and they never get their code reviewed by one of the professional programmers in our department, it goes live, and then, a year or so later, someone shows up to take our server because the security monitoring showed that it looks like someone managed to pull our password file off the system. (never mind that (1) there's a shadow file, so /etc/passwd has no passwords in it, and (2) even if they got the password file, it only has the application users (none of whom have login privs) because it's Mac OS X) Then you waste a week of your time trying to convince the security gestapo that yes, there was a security vulnerability, and there was an incident, but nothing confidential was actually lost ... and then we get everyone who had stuff on the server bitching us out because they can't get to their stuff, and they had some time-sensitive information to get out, or whatever, and we're trying to jump through security's hoops for a week or two while our other projects get further and further behind. ... Now, if they actually manage to *upload* a file to your system ... then expect to rebuild your whole machine from the ground up. so um ... if you're going to use PHP ... if you're on Apache, look into suPHP. Consider making your website served from a read-only file system, and look online for other tips on hardening your server. -Joe oh, and I also really dislike having to tie all of my stuff to one database. I know mysqli makes it better, but the original mysql stuff still taints my perception of PHP. I also have a dislike of ColdFusion servers, but that stems from the 'unix registry' crap they used (still use?) back when they were still Allaire, and I had a few times when the system choked and I had to rebuild all settings from memory the first time, and from printouts of the server configuration the next few times.
And then there was the time at a previous job when we upgraded the server and they pushed in changes that made the service crash every night at about 2am ... so I'd get a call every night to restart the thing ... until I finally wrote a watchdog script which, by the time I got fired, was restarting the service 5-8 times per night ... but I actually *liked* ColdFusion as a developer. ... and so long as we're mentioning PHP, and this is code4lib -- anyone personally know the developer of refbase? I tried emailing him a few months back offering patches to get rid of all of the 'deprecated' warnings when running under PHP 5.
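To make the "easy to write insecure programs" point above concrete, here is a minimal sketch of the classic hole and its fix -- the database, table, and column names are all invented for illustration. The first query interpolates user input straight into the SQL; the mysqli prepared statement (the improvement alluded to above) keeps the input as data:

<?php
// Minimal sketch, invented schema: the same lookup done the
// dangerous way and the safer way.
$mysqli = new mysqli('localhost', 'user', 'pass', 'library');

// INSECURE: id=1 OR 1=1 (or far worse) rewrites the query itself.
$result = $mysqli->query(
    "SELECT title FROM items WHERE id = " . $_GET['id']);

// SAFER: a prepared statement treats the input as data, not SQL.
$id = $_GET['id'];
$stmt = $mysqli->prepare('SELECT title FROM items WHERE id = ?');
$stmt->bind_param('i', $id);
$stmt->execute();
$stmt->bind_result($title);
while ($stmt->fetch()) {
    // ... and escape on the way out, to avoid reflected XSS.
    echo htmlspecialchars($title, ENT_QUOTES, 'UTF-8'), "\n";
}
?>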
Re: [CODE4LIB] Online PHP course?
On Tue, 5 Jan 2010, Tod Olson wrote: One of our staff needs to learn PHP, and an online course is preferred. Is there an online PHP course that any of you would recommend? If they already understand basic programming, and just need to pick up the syntactic issues, some of the documentation from w3schools is good -- I haven't looked over their PHP stuff specifically, though: http://www.w3schools.com/php/default.asp -Joe ps. yes, I could've used this response as an opportunity to bash PHP ... and I didn't, because they might be learning PHP to migrate it to something else.
Re: [CODE4LIB] good and best open source software
On Tue, 29 Dec 2009, Thomas Krichel wrote: Requiring an upfront healthy community is particularly problematic in a small community such as digital library work. On the other hand, there is widely adopted software that I got cajoled into maintaining that I consider bad. Apache is one of them. I run maybe 50 virtual servers on a bunch of boxes, I am still puzzled how it works, and it's trial and error with each software upgrade: where does that NameVirtualHost thing go, the constant croaks of "server foo has no VirtualHost". I'm not a dunce, but Apache makes me feel I am one. When I look at these config files that are half-baked XML, I wonder what weed the guy smoked who invented this. If I could do it all over again, I would do it in lighttpd. Oh well, it was not there in 1995 when I started running web servers. Other problematic case: Mailman. I run about 130 mailing lists, over 80 have a non-standard config, and I run into problems with one of them every few months, despite the fact that I wrote a script to configure all the non-standard lists the same way. Even if they don't have specific forums, for more widely adopted software you might have luck with well-populated but more generic forums: programming related: http://stackoverflow.com/ server administration: http://serverfault.com/ other IT stuff: http://superuser.com/ I admit that I haven't specifically asked any questions about Apache or Mailman, though. -Joe
Re: [CODE4LIB] good and best open source software
On Tue, 29 Dec 2009, Jonathan Rochkind wrote: I think you may find yourself somewhat in the minority in thinking Apache is bad software. (I certainly have my complaints about it, but in general I find it more robust, flexible, and bug-free than just about any other software I work with.) But aside from getting into a war about some particular package: It may be true that in general popular software does not necessarily equal good software -- even popular open source software. And it doesn't necessarily equal the right software solution for your problem. (I could mention some library-sector-origin open source software I think proves that, but I won't, and it would just be my opinion anyway, like yours of Apache.) But popular software _does_ mean software that has a much higher chance of continuing to evolve with the times instead of stagnating, getting its bugs and security flaws fixed in a timely manner, and having a much larger base of question-answering and support available for it (both free and paid). That is one important criterion for evaluating open source software. But nobody was suggesting it should be the _only_ criterion used for evaluating open source software, or even necessarily the most important. It depends on your situation. I think that part of the problem here is that software tends to fill a niche, and some of these larger software projects tend to fill the 'enterprise' niche. Now, Apache 2 in many ways *is* easier to configure than Apache 1.3, but the sheer number of configuration options from all of the different modules makes it more difficult to configure than the Netscape/iPlanet/SunONE product line (at least to me; other people might not be making the sorts of changes that I deal with). However, there's a lot of power in Apache's configuration ability ... I just wish I didn't have to deal with all of it.* ... but it's like anything -- if I switch to a different server, it might be easier to configure, but then I lose mod_perl support, so it's a trade-off. -Joe * I think I lost a week trying to get some software's virtual hosts working correctly, where there'd be a 'default' host and one that only responded to specific names and had some alternate security options.
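For what it's worth, the "default host plus name-specific host" setup from the footnote above is only a few lines once you know where they go. A minimal Apache 2.2-style sketch, with invented hostnames and paths: the first VirtualHost block for an address:port pair acts as the catch-all default, and later blocks are selected by ServerName.

# Name-based virtual hosting, Apache 2.2 style (invented names/paths).
NameVirtualHost *:80

# The first matching <VirtualHost> for *:80 is the default, catch-all
# host -- any request whose Host: header matches nothing else lands here.
<VirtualHost *:80>
    ServerName default.example.org
    DocumentRoot /var/www/default
</VirtualHost>

# Served only for requests that name this host, with its own
# (stricter) access rules.
<VirtualHost *:80>
    ServerName restricted.example.org
    DocumentRoot /var/www/restricted
    <Directory /var/www/restricted>
        Order allow,deny
        Allow from 192.0.2.0/24
    </Directory>
</VirtualHost>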
Re: [CODE4LIB] good and best open source software
On Mon, 28 Dec 2009, Eric Lease Morgan wrote: For my own education and cogitation, I have begun to list questions to help me address what I think is the best library-related open source software. [1] Your comments would be greatly appreciated. I have listed the questions here in (more or less) personal priority order: * Does the software work as advertised? * To what degree is the software supported? * Is the documentation thorough? * What are the license terms? * To what degree is the software easy to install? * To what degree is the software implemented using the standard LAMP stack? * Is the distribution in question an application/system or a library/module? * To what degree does the software satisfy some sort of real library need? What sorts of things have I left out? Is there anything here that can be measured, or is everything left to subjective judgement? Just as importantly, can we as a community answer these questions in light of existing distributions to come up with the best of class? + How often do I have to update it to keep ahead of security exploits? + Does it play well with other software? (e.g., does it break under updated libraries, and/or does the installer try to force me to update every library on my system to bleeding edge for no good reason?) (Aspect #2 might fall under the 'easy to install' item.) ... You could also end up with some outdated software that meets all of the requirements but is based on older standards that might not be relevant today. -Joe
Re: [CODE4LIB] calling another webpage within CGI script
On Mon, 23 Nov 2009, Ken Irwin wrote: Hi all, I'm moving to a new web server and struggling to get it configured properly. The problem of the moment: having a Perl CGI script call another web page in the background and make decisions based on its content. On the old server I used an antique Perl script called hcat (from the Pelican book, http://oreilly.com/openbook/webclient/ch04.html); I've also tried curl and LWP::Simple. In all three cases, I get the same behavior: it works just fine on the command line, but when called by the web server through a CGI script, the LWP (or other socket connection) gets no results. It sounds like a permissions thing, but I don't know what kind of permissions setting to tinker with. In the test script below, my command line outputs: Content-type: text/plain Getting URL: http://www.npr.org 885 lines Whereas the web output just says "Getting URL: http://www.npr.org" -- and doesn't even get to the "Couldn't get" error message. Any clue how I can make use of a web page's contents from w/in a CGI script? (The actual application has to do with exporting data from our catalog, but I need to work out the basic mechanism first.) Here's the script I'm using:

#!/bin/perl
use LWP::Simple;

print "Content-type: text/plain\n\n";

my $url = "http://www.npr.org";
print "Getting URL: $url\n";

my $content = get $url;
die "Couldn't get $url" unless defined $content;

@lines = split(/\n/, $content);
foreach (@lines) { $i++; }
print "\n\n$i lines\n\n";

Any ideas? I'd suggest testing the results of the call, rather than just looking for content, as an empty response could be a result of the server you're connecting to. (Unlikely in this case, but it happens once in a while, particularly if you turn off redirection or support caching.) Unfortunately, you might have to use LWP::UserAgent, rather than LWP::Simple:

#!/bin/perl --
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 60 );
my $response = $ua->get('http://www.npr.org/');

if ( $response->is_success() ) {
    # decoded_content() handles Content-Encoding and charset for us
    my $content = $response->decoded_content();
    ...
} else {
    print "HTTP Error : ", $response->status_line(), "\n";
}
__END__

(and with the shebang line changed for my location of perl, your version worked via both CGI and command line) oh ... and you don't need the foreach loop; an array evaluated in scalar context gives you its length:

my $i = @lines;

-Joe
Re: [CODE4LIB] Library Linked Data
On Wed, 28 Oct 2009, Roy Tennant wrote: David, Could you elaborate a bit? In my mind, the only semantic web technology of any note is linked data. How that fits into library search is anyone's guess, and I'm wondering what, specifically, you're referring to when you say that Talis is active in this area. If you are asking about library linked data, then there are several examples, most notably the Library of Congress[1], the Swedish Union Catalogue[2], and OCLC[3][4]. I believe that at a minimum both the Library of Congress and OCLC plan on releasing more linked data sets. So can you elaborate a bit more on what, exactly, you're seeking? Thanks, Roy [1] http://id.loc.gov/authorities/ [2] http://article.gmane.org/gmane.culture.libraries.ngc4lib/4617 [3] http://dewey.info/ [4] http://outgoing.typepad.com/outgoing/2009/09/viaf-as-linked-data.html For some other information on what other groups are doing in this regard, the DCMI (Dublin Core) just had a meeting in Korea two weeks ago, with the theme Semantic Interoperability of Linked Data: http://www.dc2009.kr/ And there was a CENDI/NKOS workshop that I attended last week that featured many of the same speakers: http://nkos.slis.kent.edu/2009workshop/NKOS-CENDI2009.htm Both have presentations linked from their sites. I can forward on my notes from the CENDI/NKOS workshop, but I'll warn you in advance that I wrote them for a different intended audience (folks on an interoperability project that I'm attached to), so I might've trimmed some stuff that's of general interest to folks in libraries, while bringing out stuff that isn't. The CENDI folks are all US Government, but there seems to be a wider range of people in NKOS. I don't know how much of it fits into the typical 'library' definition, other than the Library of Congress stuff that was already mentioned. -Joe
Re: [CODE4LIB] Bookmarking web links - authoritativeness or focused searching
On Tue, 29 Sep 2009, Cindy Harper wrote: I've been thinking about the role of libraries as promoters of authoritative works - helping to select and sort the plethora of information out there. And I heard another presentation about social media this morning. So I thought I'd bring up for discussion here some of the ideas I've been mulling over. [trimmed] Is anyone else thinking about these ideas? Or do you know of projects that approach this goal of leveraging librarians' vetting of authoritative sources? I don't know of any projects that specifically do what you've mentioned, but for the last few years we've been mulling over how to store various lists and catalogs so that we could present interesting intersections of them. In my case, I deal with scientific catalogs, so it's stuff like "when was RHESSI observing the same area as TRACE?" or "when was there an X-class flare within 2 hours of a CME?" or even lack of intersections: "when were there type-II radio bursts without a CME or flare within 6 hours?" For the science catalogs, we specifically don't want to just make some sort of single ranking from each list, and it's not really easy to merge the catalogs into some form of union catalog, as they're cataloging different concepts. ... and I think that there's use in library searches in keeping the catalogs separate, particularly when you're bringing up authority (which then gets to reputation, etc.). I'm not sure how many other people out there would try to search for Hugo-award-winning novels that weren't on the New York Times best-seller list, so it might not be as useful for general patron use ... unless you could give it your *own* catalog (AFI top 100 movies ... that I don't already own). - Joe Hourcle Solar Data Analysis Center Goddard Space Flight Center
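Those "within N hours" questions reduce to a simple proximity test over event times. A toy sketch, assuming each catalog has already been boiled down to a flat list of epoch-second timestamps (real catalogs obviously carry far more structure, and the timestamps below are made up):

<?php
// Toy intersection of two event catalogs: report every flare/CME
// pair whose times fall within $window seconds of each other.
function intersect_catalogs(array $flares, array $cmes, $window) {
    $pairs = array();
    foreach ($flares as $f) {
        foreach ($cmes as $c) {
            if (abs($f - $c) <= $window) {
                $pairs[] = array($f, $c);
            }
        }
    }
    return $pairs;
}

$flares = array(1254200400, 1254294000);   // epoch seconds, invented
$cmes   = array(1254205000, 1254380000);
foreach (intersect_catalogs($flares, $cmes, 2 * 3600) as $p) {
    printf("flare at %d <-> CME at %d\n", $p[0], $p[1]);
}
?>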
Re: [CODE4LIB] Implementing OpenURL for simple web resources
On Mon, 14 Sep 2009, Mike Taylor wrote: 2009/9/14 Jonathan Rochkind rochk...@jhu.edu: Seriously, don't use OpenURL unless you really can't find anything else that will do, or you actually want your OpenURLs to be used by the existing 'in the wild' OpenURL resolvers. In the latter case, don't count on them doing anything in particular or consistent with 'novel' OpenURLs, like ones that put an end-user access URL in rft_id ... don't expect actually-existing, in-the-wild OpenURL resolvers to do anything in particular with that. Jonathan, I am getting seriously mixed messages from you on this thread. In one message, you'll strongly insist that some facility in OpenURL is or isn't useful; in the next, you'll be saying that the whole standard is dead. The last time I was paying serious attention to OpenURL, that certainly wasn't true -- has something happened in the last few months to make it so? My interpretation of the part of Jonathan's response that you quoted was basically: don't use OpenURL when you're just looking for persistent URLs. The whole point of OpenURL was that the local resolver could determine the best way to get you the resource (e.g., digital library vs. ILL vs. pointing you at a specific room and shelf). If you're using OpenURLs so that they'll work with the established network of resolvers, don't get cute with encoding the information, as you can't rely on it to work. ... From what I've seen of the thread (and I admit, I didn't read every message), what's needed here is PURL, not OpenURL. -Joe
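The difference is easy to see side by side. Below, a hypothetical OpenURL 1.0 (KEV) request -- wrapped across lines here for readability -- hands a bundle of metadata to whatever resolver the user's institution runs, and each resolver decides for itself what to do with it (including with rft_id); a PURL is just one stable address that redirects to wherever the resource currently lives. All hosts and identifiers here are invented:

# OpenURL: metadata for a local resolver to interpret.
http://resolver.example.edu/openurl?url_ver=Z39.88-2004
    &rft_val_fmt=info:ofi/fmt:kev:mtx:journal
    &rft.jtitle=Hypothetical+Journal&rft.volume=12&rft.spage=34
    &rft_id=http%3A%2F%2Fjournal.example.org%2Farticles%2F42

# PURL: one stable URL that simply redirects to the resource's
# current location.
http://purl.example.org/example/article42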