Re: [CODE4LIB] ISBN Regular Expression

2011-10-24 Thread Bill Dueber
So much duplication. If only there were some sort of organization that might serve as a clearinghouse for this sort of code that's useful to libraries... [Yes, I know the only appropriate response is, "Well, Dueber, step up and do something about it." ] On Mon, Oct 24, 2011 at 4:59 PM, Jon Gorman

Re: [CODE4LIB] ISBN Regular Expression

2011-10-24 Thread Jon Gorman
Also, I don't know OpenBook well enough to know your source data, but don't forget a lot of publishers have printed ISBNs in different ways over the past few years. The regex would choke on any hyphens. If users are copying from printed material, they could type them in. For example, one of the books near my

Re: [CODE4LIB] ISBN Regular Expression

2011-10-24 Thread Tom Pasley
If you're looking for PHP code, then I've done some work for a long-dormant project: http://code.google.com/p/txtckr/source/browse/trunk/mvc/components/identifiers HTH, Tom On Sat, Oct 22, 2011 at 6:44 AM, Kozlowski,Brendon wrote: > Hi all. > > > > I'm somewhat surprised that I've never had to

Re: [CODE4LIB] marc-8

2011-10-24 Thread Doran, Michael D
> But I had no idea Marc8 allowed escape sequences to temporarily switch > to a different encoding. Really? Oh my god. For you young'uns that were "born Unicode" and are a bit foggy on the MARC-8 environment (and all its... intricacies), I did a short write-up a few years ago: Coded Cha

Re: [CODE4LIB] marc-8

2011-10-24 Thread Eric Lease Morgan
On Oct 24, 2011, at 3:03 PM, Jon Gorman wrote: > yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 marc21.raw >marc21.utf8.raw This worked great! My version of yaz-marcdump was older and was not doing the trick. code4lib++ -- Eric

Re: [CODE4LIB] marc-8

2011-10-24 Thread Jonathan Rochkind
Yeah, but if there's Perl code and Java code to do it, can't be _that_ hard to port to ruby if I could figure out what you need to do to get first-class char encoding support in ruby 1.9 anyway. I mean, you could do it just as a library without that... but it's enough trouble that, yeah, I

Re: [CODE4LIB] marc-8

2011-10-24 Thread Doran, Michael D
Hi Jonathan, > I tried to figure out how to custom add a new encoding to ruby 1.9 with > the idea of adding Marc8 as an actual ruby 1.9 character encoding > supported same as any other built in char encoding Not a trivial undertaking. Remember that the MARC-8 environment allows alternate char

Re: [CODE4LIB] marc-8

2011-10-24 Thread Jon Gorman
>>> In Perl, how do I specify MARC-8 when reading (decoding) and writing >>> (encoding) data? >> >> You can't. MARC-8 is a character set that is unknown to the operating >> system. Your best bet is to convert MARC-8-encoded records into UTF-8. > > /me throws his hands up in the air and screams!

Re: [CODE4LIB] marc-8

2011-10-24 Thread Jonathan Rochkind
What _ought_ to be easiest of all is getting our ILS's to NEVER export Marc8 _ever_ again. UTF8 only. Sadly, that only ought to be easiest. But IMO there's no reason any of us should be dealing with Marc8 ever again. The only thing that should deal in Marc8 is an ILS, and should only input

Re: [CODE4LIB] marc-8

2011-10-24 Thread Jonathan Rochkind
On 10/24/2011 2:52 PM, Ross Singer wrote: On Mon, Oct 24, 2011 at 7:39 PM, Eric Lease Morgan wrote: Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert MARC-8 characters to UTF-8? (I guess I

Re: [CODE4LIB] marc-8

2011-10-24 Thread Michael J. Giarlo
If I understand correctly, there's some support for this in pymarc as well: https://github.com/edsu/pymarc/blob/master/pymarc/marc8.py#L22 -Mike On Mon, Oct 24, 2011 at 14:52, Jonathan Rochkind wrote: > Woah, there is a library in Perl to do that? Sweet!  Okay, now I know two > languages with

Re: [CODE4LIB] marc-8

2011-10-24 Thread Doran, Michael D
Eric, Sometimes for grandpa Perl stuff -- especially as concerns charsets and/or internationalization -- it's worth pinging these lists: perl4...@perl.org (yes, still alive and kicking) perl-i...@perl.org (very low traffic list, but some knowledgeable subscribers) -- Michael

Re: [CODE4LIB] marc-8

2011-10-24 Thread Ross Singer
On Mon, Oct 24, 2011 at 7:39 PM, Eric Lease Morgan wrote: > Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know > yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert > MARC-8 characters to UTF-8? (I guess I could simply try it and see what > hap

Re: [CODE4LIB] marc-8

2011-10-24 Thread Jonathan Rochkind
Woah, there is a library in Perl to do that? Sweet! Okay, now I know two languages with such a library, Perl and Java. Anyone want to write one for ruby? :) On 10/24/2011 2:47 PM, Doran, Michael D wrote: Okay. How do I go about converting MARC-8 encoded records into UTF-8? In Perl... using t

Re: [CODE4LIB] marc-8

2011-10-24 Thread Jonathan Rochkind
The only language that I know of with a library for reading Marc8 and converting to another encoding (such as UTF-8) is Java. The Marc4J package will do it. I suppose there may be C libraries too; is yaz written in C? As Michael suggests the easiest thing to do (if you're not in Java) is prob

Re: [CODE4LIB] marc-8

2011-10-24 Thread Doran, Michael D
> Okay. How do I go about converting MARC-8 encoded records into UTF-8? In Perl... using the handy MARC::Charset module (tip o' the hat to Ed Summers, and now maintained by Galen Charlton). -- Michael > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On B

Re: [CODE4LIB] marc-8

2011-10-24 Thread Walker, David
> I know yaz-marcdump changes the encoding bit in MARC > leaders. Does it also convert MARC-8 characters to UTF-8? Yes. We use it for that purpose all the time. --Dave - David Walker Library Web Services Manager California State University -Original Message- From: Code

Re: [CODE4LIB] marc-8

2011-10-24 Thread Eric Lease Morgan
On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote: >> In Perl, how do I specify MARC-8 when reading (decoding) and writing >> (encoding) data? > > You can't. MARC-8 is a character set that is unknown to the operating > system. Your best bet is to convert MARC-8-encoded records into UTF-8.

Re: [CODE4LIB] marc-8

2011-10-24 Thread Doran, Michael D
> As an FTY, Oops, in a hurry. s/FTY/FYI/ > -Original Message- > From: Doran, Michael D > Sent: Monday, October 24, 2011 1:35 PM > To: 'Code for Libraries' > Subject: RE: marc-8 > > Hi Eric, > > > In Perl, how do I specify MARC-8 when reading (decoding) and writing > > (encoding) data?

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Thomas Dowling
On 10/24/2011 01:48 PM, Jonathan Rochkind wrote: > > Or perhaps the fact that my web form has a 'name' and 'email' form makes > the spambots decide it just _must_ be a blog comment form. I suppose > taking away the 'name' and 'email' labels might help, although it might > mess up our workflow too

Re: [CODE4LIB] marc-8

2011-10-24 Thread Doran, Michael D
Hi Eric, > In Perl, how do I specify MARC-8 when reading (decoding) and writing > (encoding) data? You can't. MARC-8 is a character set that is unknown to the operating system. Your best bet is to convert MARC-8-encoded records into UTF-8. > ...it is converted it Perl's > internal encoding (

[CODE4LIB] marc-8

2011-10-24 Thread Eric Lease Morgan
In Perl, how do I specify MARC-8 when reading (decoding) and writing (encoding) data? Character encoding is the bane of my existence. I have learned that when reading from a file I ought to specify the type of encoding the file is in and decode accordingly, or else. Once read, it is converted i

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Andreas Orphanides
Exactly. The thing is that the incremental cost to the spambot operator for hitting any form is essentially zero. It's the same model as traditional spam: hit everything, and hope that a fraction of a fraction of a fraction of a percent produce a return. It's easier and cheaper to program the

Re: [CODE4LIB] Digital archiving and preservation

2011-10-24 Thread Stephen Marks
Hi Mike-- As far as predicting the cost of digital archives, you might check out the LIFE project [1], which has been through three cycles now. The first couple focused on analysis of costing structures for various use cases (including electronic journal content), and the third one focused on cr

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread David Mayo
I can say from experience that that won't help - spambots even hit lone forms with nondescript names. - Dave Mayo On Mon, Oct 24, 2011 at 1:48 PM, Jonathan Rochkind wrote: > On 10/24/2011 1:15 PM, MJ Ray wrote: > >> trying to design things so that the return on investment >> for spammers is fai

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Jonathan Rochkind
On 10/24/2011 1:15 PM, MJ Ray wrote: trying to design things so that the return on investment for spammers is fairly low, In my experience, this is irrelevant. I have spammers spamming my "ask a librarian a question" link, which _only_ results in email to a librarian's inbox (several of them

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread MJ Ray
Ken Irwin wrote: > Some of our online forms (contact, archives request, etc.) have been > getting a bunch of spam lately. I have heretofore avoided using any > of those obnoxious Captcha things and would rather not start now. (I > personally loathe them and they keep getting harder, which tells me

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Kyle Banerjee
In addition to the methods mentioned here which I've had good experiences with, another thing that can be effective is simply checking the fields for inappropriate content. For example, bots love to try to insert links, raw HTML, things that belong in mail headers, etc. On the off chance that a hu
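
A minimal sketch of the content check described above, assuming a Python back end. The function name and the three patterns are illustrative assumptions, not Kyle's actual rules; they only cover the examples he names (links, raw HTML, and text that looks like injected mail headers).

```python
import re

# Hypothetical patterns for illustration; tune them to the form in question.
_LINK = re.compile(r"https?://|www\.", re.IGNORECASE)
_HTML_TAG = re.compile(r"<\s*/?\s*[a-z][^>]*>", re.IGNORECASE)
# A line break followed by a header name is a common mail-header injection shape.
_MAIL_HEADER = re.compile(
    r"[\r\n]\s*(to|cc|bcc|from|subject|content-type)\s*:", re.IGNORECASE
)


def suspicious_reasons(value):
    """Return a list of reasons why a submitted field value looks bot-like."""
    reasons = []
    if _LINK.search(value):
        reasons.append("link")
    if _HTML_TAG.search(value):
        reasons.append("html")
    if _MAIL_HEADER.search(value):
        reasons.append("mail header")
    return reasons
```

An empty list means the value passed all three checks. Returning the reasons rather than a bare boolean makes it easier to log why a submission was held back. A free-text message field may legitimately contain a URL, so this is probably best applied to fields such as a name or phone number where links never belong.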

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Ken Irwin
Using Dre and Erin's method, I've got a fix in place. It caught two spams in the first 10 minutes! I named the field to see if I can trick it into knowing exactly what to fill in. I just realized that I didn't echo the results of that field in my notification of the possible spam; I'm going to

Re: [CODE4LIB] Digital archiving and preservation

2011-10-24 Thread Keith Jenkins
On Mon, Oct 24, 2011 at 5:36 AM, Mike Taylor wrote: > Are there any published studies that predict and compare the long-term > preservation ability and cost efficiency of physical and digital > archives?  I would like to either back up or refute my intuition! Regarding the cost, this paper from P

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Erin R White/FS/VCU
Yep, we're assuming spambots fill out any/all text fields they can find, so any submission that has a value in that field is discarded. Ideally this method comes with some kind of graceful failure message, so that if you're dealing with a human, that person won't be tearing their hair out wond

Re: [CODE4LIB] ISBN Regular Expression

2011-10-24 Thread Demian Katz
Perhaps this code would be of some use: https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/web/sys/ISBN.php - Demian > -Original Message- > From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of > Jonathan Rochkind > Sent: Monday, October 24, 2011 10:45 AM > To: COD

Re: [CODE4LIB] What lists of listserv, forums, websites, et al are around the web asking for your reference desk stumpers!?...

2011-10-24 Thread Black, Elizabeth
Check out Project Wombat (http://project-wombat.org/), which succeeded the stumpers-l list. Beth Black Associate Professor and Systems Librarian Head, Web Implementation Team Ohio State University Libraries Science and Engineering Library, Room 002A 175 West 18th Avenue Columbus, Ohio 43210 614-

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Jonathan Rochkind
Is there a particular label you give it that causes spambots to fill it out, or do you find that spambots stick some text in any <input type="text"> you include? On 10/24/2011 10:46 AM, Erin R White/FS/VCU wrote: I'll second Dre's method here. We've used it with great success on our mobile website - it a

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Erin R White/FS/VCU
I'll second Dre's method here. We've used it with great success on our mobile website - it adds zero effort for users and we've had maybe one false positive since March 2010. The field is input type="text" with CSS hiding it and its label from display. From my extensive googling it like as of J

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Mike Taylor
I blog extensively at WordPress, which uses akismet for spam filtering. I am frequently astonished at how well it works. I average maybe five to ten real comments per day and perhaps twice as many spams. Of those, maybe one or two real comments PER YEAR are falsely flagged as spam, and perhaps f

Re: [CODE4LIB] ISBN Regular Expression

2011-10-24 Thread Jonathan Rochkind
John: That's not going to work, an ISBN can end in "X" as a check digit, which is not [0-9]. You are going to be rejecting valid ISBN's, you have a bug. On 10/24/2011 10:40 AM, John Miedema wrote: Here's a php function I use in OpenBook to test if a user has entered a 10 or 13 digit ISBN. //
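
A minimal sketch of a shape test that addresses the point above, assuming Python rather than the PHP used in OpenBook. The helper names are hypothetical. It accepts a trailing "X" on a 10-character ISBN and strips hyphens and spaces first, since printed ISBNs often include them. It does not verify the check digit.

```python
import re

# Hypothetical helpers for illustration; not part of OpenBook.
_ISBN10 = re.compile(r"[0-9]{9}[0-9X]")
_ISBN13 = re.compile(r"[0-9]{13}")


def normalize_isbn(raw):
    """Remove hyphens and whitespace and uppercase a trailing 'x'."""
    return re.sub(r"[\s-]", "", raw).upper()


def looks_like_isbn(raw):
    """True if the input has the shape of an ISBN-10 or ISBN-13."""
    candidate = normalize_isbn(raw)
    # fullmatch anchors both ends, so longer digit runs are rejected.
    return bool(_ISBN10.fullmatch(candidate) or _ISBN13.fullmatch(candidate))
```

Using a full match rather than an unanchored search also avoids accepting an 11- or 12-digit string merely because it contains 10 digits in a row.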

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Jonathan Rochkind
Using the free Akismet API used to work fairly well for me, and is completely invisible to the user; it basically uses a behind-the-scenes maintained blacklist of IP addresses/email addresses, and other heuristics to identify spam. Haven't used it in a while though, don't know how well it's do

Re: [CODE4LIB] ISBN Regular Expression

2011-10-24 Thread John Miedema
Here's a php function I use in OpenBook to test if a user has entered a 10 or 13 digit ISBN. //test if 10 or 13 digits ISBN function openbook_utilities_validISBN($testisbn) { return (ereg ("([0-9]{10})", $testisbn, $regs) || ereg ("([0-9]{13})", $testisbn, $regs)); } On Fri, Oct 21, 2011 at 1:4

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Ken Irwin
This is an intriguing approach, Dre. I wonder how to render this non-problematic for folks with screen-readers too. You could just say "leave this field blank" but that's sort of weird too. Is there a WAI-ARIA approach that would get screen readers to hide this field too? I'm looking into Moll

Re: [CODE4LIB] ISBN Regular Expression

2011-10-24 Thread Jacobs, Jane W
> if Wikipedia is to be believed, there are books out there issued with ISBNs that aren't valid, so any strong enforcement using the check digit may lead to the occasional issue. I can personally vouch for the veracity of Wikipedia on this point. In a library of our size, I've seen it at least a f
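
For reference, a sketch of the standard check-digit arithmetic, assuming Python; the function name is hypothetical. Given the point above that some published books carry ISBNs with bad check digits, a caller might treat a False result as a warning rather than a hard rejection.

```python
_DIGITS = "0123456789"


def isbn_check_digit_ok(raw):
    """True if the ISBN-10 or ISBN-13 check digit is consistent."""
    # Drop hyphens and spaces so printed forms can be passed in directly.
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if len(isbn) == 10:
        total = 0
        for position, char in enumerate(isbn):
            if char in _DIGITS:
                value = int(char)
            elif char == "X" and position == 9:
                value = 10  # 'X' stands for 10, only in the last position
            else:
                return False
            # Weights run 10, 9, ..., 1; the sum must be divisible by 11.
            total += (10 - position) * value
        return total % 11 == 0
    if len(isbn) == 13 and all(char in _DIGITS for char in isbn):
        # Weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
        total = sum(
            int(char) * (1 if index % 2 == 0 else 3)
            for index, char in enumerate(isbn)
        )
        return total % 10 == 0
    return False
```

For example, `isbn_check_digit_ok("0-306-40615-2")` passes, while changing the last digit to 3 fails the modulo-11 test.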

Re: [CODE4LIB] Digital archiving and preservation

2011-10-24 Thread Mike Taylor
I gave the example of journal articles. -- Mike. On 24 October 2011 15:15, Mark A. Matienzo wrote: > What do you mean by "documents"? > > Mark A. Matienzo > Digital Archivist, Manuscripts and Archives > Yale University Library > > > > On Mon, Oct 24, 2011 at 5:36 AM, Mike Taylor wrote: >> Does

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Andrew Darby
I've found that a simple skill testing question does wonders for form spam reduction, e.g., What is five times five? [input box 2 chars wide] Maybe I've dealt with a dumber class of bots, though . . . . Andrew On Mon, Oct 24, 2011 at 10:12 AM, Andreas Orphanides wrote: > > Here's a method tha

Re: [CODE4LIB] Digital archiving and preservation

2011-10-24 Thread Mark A. Matienzo
What do you mean by "documents"? Mark A. Matienzo Digital Archivist, Manuscripts and Archives Yale University Library On Mon, Oct 24, 2011 at 5:36 AM, Mike Taylor wrote: > Does anyone out there know what recent studies predict for the > lifetime of digitally archived and preserved documents? >

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Andreas Orphanides
Here's a method that's by no means foolproof but is practically zero cost (you may be using a version already). Disclaimer -- I have not actually tested this to any extent: Include a text input field in your form that needs to be blank for the form to validate in the back end. Keep the field
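
A minimal sketch of the back-end half of this method, assuming Python and a form that arrives as a dictionary of field names to values. The field name "website" and the function name are assumptions for illustration; hiding the field from human users happens in the page's CSS and is not shown here.

```python
def honeypot_tripped(form_data, honeypot_field="website"):
    """True when the hidden field, which humans never see, was filled in."""
    value = form_data.get(honeypot_field, "")
    # Whitespace alone is treated as blank; anything else suggests a bot.
    return bool(value.strip())
```

A handler might call `honeypot_tripped(form)` before sending mail and, when it returns True, show a polite failure message so that a human who somehow filled the field knows the form was not submitted.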

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Nate Vack
On Mon, Oct 24, 2011 at 8:26 AM, Ken Irwin wrote: > One that occurs to me to try, and I have no idea if this would match well > with actual bot behavior: at the time the form loads, include a hidden field > with id=[unixtimestamp]. When the form is submitted, ignore any forms that > took less
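
A rough sketch of the timing idea quoted above, assuming Python. The function name and the five-second threshold are assumptions; the quoted message is cut off before it names a threshold. The timestamp is taken from the submitted form, so a bot could forge it; this only filters the naive ones.

```python
import time

MIN_SECONDS = 5  # assumed threshold, not taken from the thread


def submitted_too_fast(rendered_at, now=None, min_seconds=MIN_SECONDS):
    """True if the form came back sooner than a human could plausibly type."""
    if now is None:
        now = time.time()
    try:
        elapsed = now - float(rendered_at)
    except (TypeError, ValueError):
        # A missing or mangled timestamp is treated as suspicious.
        return True
    # Written as a negation so a NaN timestamp also counts as suspicious.
    return not (elapsed >= min_seconds)
```

The `now` parameter exists only to make the function easy to test; in normal use it is left out and the current time is used.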

[CODE4LIB] What lists of listserv, forums, websites, et al are around the web asking for your reference desk stumpers!?...

2011-10-24 Thread don warner saklad
What lists of listserv, forums, websites, et al are around the web asking for your reference desk stumpers!?...

Re: [CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Parker, Anson (adp6j)
Mollom is pretty decent... http://mollom.com works with a lot of CMSs. It is commercial with 100 free positives per day, and can require captcha, but it tries to avoid it with a crowd-sourced algorithm approach. On 10/24/11 9:26 AM, "Ken Irwin" wrote: >Hi folks, > >Some of our online forms (con

[CODE4LIB] web spam block less awful than Captcha?

2011-10-24 Thread Ken Irwin
Hi folks, Some of our online forms (contact, archives request, etc.) have been getting a bunch of spam lately. I have heretofore avoided using any of those obnoxious Captcha things and would rather not start now. (I personally loathe them and they keep getting harder, which tells me that the sp

[CODE4LIB] Web Application Developer Position at Ohio State University

2011-10-24 Thread Black, Elizabeth
Founded in 1870, The Ohio State University is a comprehensive, state-assisted university offering a complete environment for learning for its 3,000 faculty and 56,000 students. The Ohio State University Libraries is seeking a motivated, creative person to join our team charged with designing, de

[CODE4LIB] Front End Web Developer Position at Ohio State University

2011-10-24 Thread Black, Elizabeth
Founded in 1870, The Ohio State University is a comprehensive, state-assisted university offering a complete environment for learning for its 3,000 faculty and 56,000 students. The Ohio State University Libraries is seeking a motivated, creative person to join our team charged with designing, de

[CODE4LIB] Digital archiving and preservation

2011-10-24 Thread Mike Taylor
Does anyone out there know what recent studies predict for the lifetime of digitally archived and preserved documents? For example, there is a long and mostly pretty successful tradition of preserving journal articles as paper in bound volumes, and experiment shows that this approach is usually go