So much duplication. If only there were some sort of organization that might
serve as a clearinghouse for this sort of code that's useful to libraries...
[Yes, I know the only appropriate response is, "Well, Dueber, step up and do
something about it." ]
On Mon, Oct 24, 2011 at 4:59 PM, Jon Gorman
Also, I don't know OpenBook to know your source data, but don't forget
a lot of publishers have printed ISBNs in different ways over the past
few years. The regex would choke on any hyphens. If users are
copying from printed material, they could type them in. For example,
one of the books near my
If you're looking for PHP code, then I've done some work for a long-dormant
project:
http://code.google.com/p/txtckr/source/browse/trunk/mvc/components/identifiers
HTH,
Tom
On Sat, Oct 22, 2011 at 6:44 AM, Kozlowski,Brendon wrote:
> Hi all.
>
>
>
> I'm somewhat surprised that I've never had to
> But I had no idea Marc8 allowed escape sequences to temporarily switch
> to a different encoding. Really? Oh my god.
For you young'uns that were "born Unicode" and are a bit foggy on the MARC-8
environment (and all its... intricacies), I did a short write-up a few years
ago:
Coded Cha
On Oct 24, 2011, at 3:03 PM, Jon Gorman wrote:
> yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 marc21.raw >marc21.utf8.raw
This worked great! My version of yaz-marcdump was older and was not doing the
trick. code4lib++
--
Eric
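For anyone wanting to double-check a converted file, here is a minimal Python sketch (not part of yaz) that counts records by the character coding scheme byte in the leader. It assumes MARC 21 transmission format, where each record ends with byte 0x1D and leader position 9 is 'a' for Unicode and blank for MARC-8. The function name and the labels are made up for illustration, and the flag only tells you what a record claims, not what its bytes really are.

```python
from collections import Counter

RECORD_TERMINATOR = b"\x1d"  # MARC 21 records end with this byte
LEADER_LENGTH = 24           # the leader is the first 24 bytes of a record


def leader_encodings(raw: bytes) -> Counter:
    """Count records by the encoding declared in leader position 9."""
    counts = Counter()
    for chunk in raw.split(RECORD_TERMINATOR):
        if len(chunk) < LEADER_LENGTH:
            continue  # skip the empty tail after the last terminator
        flag = chunk[9:10]
        if flag == b"a":
            counts["utf8"] += 1    # record claims UCS/Unicode
        elif flag == b" ":
            counts["marc8"] += 1   # record claims MARC-8
        else:
            counts["other"] += 1   # unexpected value
    return counts
```

Something like leader_encodings(open("marc21.utf8.raw", "rb").read()) should then report only "utf8" records if every leader was updated.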
Yeah, but if there's Perl code and Java code to do it, can't be _that_
hard to port to ruby if I could figure out what you need to do to
get first-class char encoding support in ruby 1.9 anyway.
I mean, you could do it just as a library without that... but it's
enough trouble that, yeah, I
Hi Jonathan,
> I tried to figure out how to custom add a new encoding to ruby 1.9 with
> the idea of adding Marc8 as an actual ruby 1.9 character encoding
> supported same as any other built in char encoding
Not a trivial undertaking. Remember that the MARC-8 environment allows
alternate char
>>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>>> (encoding) data?
>>
>> You can't. MARC-8 is a character set that is unknown to the operating
>> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
>
> /me throws his hands up in the air and screams!
What _ought_ to be easiest of all is getting our ILS's to NEVER export
Marc8 _ever_ again. UTF8 only.
Sadly, that only ought to be easiest.
But IMO there's no reason any of us should be dealing with Marc8 ever
again. The only thing that should deal in Marc8 is an ILS, and should
only input
On 10/24/2011 2:52 PM, Ross Singer wrote:
On Mon, Oct 24, 2011 at 7:39 PM, Eric Lease Morgan wrote:
Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know
yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert
MARC-8 characters to UTF-8? (I guess I
If I understand correctly, there's some support for this in pymarc as well:
https://github.com/edsu/pymarc/blob/master/pymarc/marc8.py#L22
-Mike
On Mon, Oct 24, 2011 at 14:52, Jonathan Rochkind wrote:
> Woah, there is a library in Perl to do that? Sweet! Okay, now I know two
> languages with
Eric,
Sometimes for grandpa Perl stuff -- especially as concerns charsets and/or
internationalization -- it's worth pinging these lists:
perl4...@perl.org (yes, still alive and kicking)
perl-i...@perl.org (very low traffic list, but some knowledgeable
subscribers)
-- Michael
On Mon, Oct 24, 2011 at 7:39 PM, Eric Lease Morgan wrote:
> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know
> yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert
> MARC-8 characters to UTF-8? (I guess I could simply try it and see what
> hap
Woah, there is a library in Perl to do that? Sweet! Okay, now I know
two languages with such a library, Perl and Java.
Anyone want to write one for ruby? :)
On 10/24/2011 2:47 PM, Doran, Michael D wrote:
Okay. How do I go about converting MARC-8 encoded records into UTF-8?
In Perl... using t
The only language that I know of with a library for reading Marc8 and
converting to another encoding (such as UTF-8) is Java. The Marc4J
package will do it.
I suppose there may be C libraries too; is yaz written in C?
As Michael suggests the easiest thing to do (if you're not in Java) is
prob
> Okay. How do I go about converting MARC-8 encoded records into UTF-8?
In Perl... using the handy MARC::Charset module (tip 'o the hat to Ed Summers,
and now maintained by Galen Charlton).
-- Michael
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On B
> I know yaz-marcdump changes the encoding bit in MARC
> leaders. Does it also convert MARC-8 characters to UTF-8?
Yes. We use it for that purpose all the time.
--Dave
-
David Walker
Library Web Services Manager
California State University
-Original Message-
From: Code
On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:
>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>> (encoding) data?
>
> You can't. MARC-8 is a character set that is unknown to the operating
> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
> As an FTY,
Oops, in a hurry. s/FTY/FYI/
> -Original Message-
> From: Doran, Michael D
> Sent: Monday, October 24, 2011 1:35 PM
> To: 'Code for Libraries'
> Subject: RE: marc-8
>
> Hi Eric,
>
> > In Perl, how do I specify MARC-8 when reading (decoding) and writing
> > (encoding) data?
On 10/24/2011 01:48 PM, Jonathan Rochkind wrote:
>
> Or perhaps the fact that my web form has a 'name' and 'email' form makes
> the spambots decide it just _must_ be a blog comment form. I suppose
> taking away the 'name' and 'email' labels might help, although it might
> mess up our workflow too
Hi Eric,
> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> (encoding) data?
You can't. MARC-8 is a character set that is unknown to the operating system.
Your best bet is to convert MARC-8-encoded records into UTF-8.
> ...it is converted into Perl's
> internal encoding (
In Perl, how do I specify MARC-8 when reading (decoding) and writing (encoding)
data?
Character encoding is the bane of my existence. I have learned that when
reading from a file I ought to specify the type of encoding the file is in and
decode accordingly, or else. Once read, it is converted i
Exactly. The thing is that the incremental cost to the spambot operator for
hitting any form is essentially zero. It's the same model as traditional spam:
hit everything, and hope that a fraction of a fraction of a fraction of a
percent produce a return. It's easier and cheaper to program the
Hi Mike--
As far as predicting the cost of digital archives, you might check out the
LIFE project [1], which has been through three cycles now. The first
couple focused on analysis of costing structures for various use cases
(including electronic journal content), and the third one focused on
cr
I can say from experience that that won't help - spambots even hit lone
forms with nondescript names.
- Dave Mayo
On Mon, Oct 24, 2011 at 1:48 PM, Jonathan Rochkind wrote:
> On 10/24/2011 1:15 PM, MJ Ray wrote:
>
>> trying to design things so that the return on investment
>> for spammers is fai
On 10/24/2011 1:15 PM, MJ Ray wrote:
trying to design things so that the return on investment
for spammers is fairly low,
In my experience, this is irrelevant. I have spammers spamming my "ask a
librarian a question" link, which _only_ results in email to a
librarian's inbox (several of them
Ken Irwin wrote:
> Some of our online forms (contact, archives request, etc.) have been
> getting a bunch of spam lately. I have heretofore avoided using any
> of those obnoxious Captcha things and would rather not start now. (I
> personally loathe them and they keep getting harder, which tells me
In addition to the methods mentioned here which I've had good experiences
with, another thing that can be effective is simply checking the fields for
inappropriate content.
For example, bots love to try to insert links, raw HTML, things that belong
in mail headers, etc. On the off chance that a hu
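A rough Python sketch of that kind of content check, for what it's worth. The three patterns (links, HTML tags, header-like lines after a line break) are illustrative guesses at what "inappropriate content" means, not a vetted rule set, and the function name is hypothetical. It probably fits short fields such as name or email better than a free-text message box, where a human might legitimately paste a URL.

```python
import re

# Illustrative patterns only; tune them for your own forms.
SUSPICIOUS_PATTERNS = {
    "link": re.compile(r"https?://|www\.", re.IGNORECASE),
    "html": re.compile(r"<\s*/?\s*[a-z][^>]*>", re.IGNORECASE),
    # A line break followed by a header name suggests mail header injection.
    "mail header": re.compile(
        r"[\r\n]\s*(?:to|cc|bcc|from|subject|content-type)\s*:",
        re.IGNORECASE,
    ),
}


def suspicious_content(value: str) -> list:
    """Return the sorted names of the patterns found in a field value."""
    return sorted(
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(value)
    )
```

An empty list means nothing matched; anything else could be routed to a "possible spam" notification rather than silently dropped.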
Using Dre and Erin's method, I've got a fix in place. It caught two spams in
the first 10 minutes!
I named the field to see if I can trick it into
knowing exactly what to fill in. I just realized that I didn't echo the results
of that field in my notification of the possible spam; I'm going to
On Mon, Oct 24, 2011 at 5:36 AM, Mike Taylor wrote:
> Are there any published studies that predict and compare the long-term
> preservation ability and cost efficiency of physical and digital
> archives? I would like to either back up or refute my intuition!
Regarding the cost, this paper from P
Yep, we're assuming spambots fill out any/all text fields they can find,
so any submission that has a value in that field is discarded.
Ideally this method comes with some kind of graceful failure message, so
that if you're dealing with a human, that person won't be tearing their
hair out wond
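The back-end half of this can be sketched in a few lines of Python. This assumes the form data arrives as a dict of strings; the field name "website" and the wording of the message are made up for illustration, so substitute whatever your form actually uses.

```python
from typing import Mapping, Tuple

HONEYPOT_FIELD = "website"  # hypothetical name of the CSS-hidden field

FAILURE_MESSAGE = (
    "Your message could not be sent. If you filled in a field that "
    "asked to be left blank, please clear it and try again."
)


def screen_submission(form: Mapping[str, str]) -> Tuple[bool, str]:
    """Return (accepted, message); reject when the hidden field has a value."""
    if form.get(HONEYPOT_FIELD, "").strip():
        # Probably a bot, but tell a possible human what went wrong.
        return False, FAILURE_MESSAGE
    return True, ""
```

Returning a message instead of failing silently is the "graceful failure" part: a person whose browser autofilled the hidden field at least gets a hint.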
Perhaps this code would be of some use:
https://vufind.svn.sourceforge.net/svnroot/vufind/trunk/web/sys/ISBN.php
- Demian
> -Original Message-
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Monday, October 24, 2011 10:45 AM
> To: COD
Check out Project Wombat (http://project-wombat.org/), which succeeded the
stumpers-l list.
Beth Black
Associate Professor and Systems Librarian
Head, Web Implementation Team
Ohio State University Libraries
Science and Engineering Library, Room 002A
175 West 18th Avenue
Columbus, Ohio 43210
614-
Is there a particular label you give it that causes spambots to fill it
out, or do you find that spambots stick some text in any <input type="text"> you include?
On 10/24/2011 10:46 AM, Erin R White/FS/VCU wrote:
I'll second Dre's method here. We've used it with great success on our
mobile website - it a
I'll second Dre's method here. We've used it with great success on our
mobile website - it adds zero effort for users and we've had maybe one
false positive since March 2010.
The field is input type="text" with CSS hiding it and its label from
display. From my extensive googling it like as of J
I blog extensively at WordPress, which uses akismet for spam
filtering. I am frequently astonished at how well it works. I
average maybe five to ten real comments per day and perhaps twice as
many spams. Of those, maybe one or two real comments PER YEAR are
falsely flagged as spam, and perhaps f
John: That's not going to work. An ISBN can end in "X" as a check digit,
which is not [0-9]. You are going to be rejecting valid ISBNs; you
have a bug.
On 10/24/2011 10:40 AM, John Miedema wrote:
Here's a php function I use in OpenBook to test if a user has entered a 10
or 13 digit ISBN.
//
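To make the two objections in this thread concrete (hyphens and a trailing "X"), here is a hedged Python sketch of a shape-only test. It is not OpenBook's code; the names are hypothetical. It strips hyphens and whitespace first, then requires the whole string to be either nine digits plus a digit or X, or thirteen digits. Anchoring the whole string also avoids accepting any longer input that merely contains ten digits in a row.

```python
import re

# Nine digits plus a digit or X (ISBN-10), or thirteen digits (ISBN-13).
ISBN_SHAPE = re.compile(r"(?:[0-9]{9}[0-9X]|[0-9]{13})")


def normalize_isbn(text: str) -> str:
    """Drop hyphens and whitespace and uppercase a trailing 'x'."""
    return re.sub(r"[\s-]", "", text).upper()


def looks_like_isbn(text: str) -> bool:
    """Shape test only; the check digit is not verified here."""
    return ISBN_SHAPE.fullmatch(normalize_isbn(text)) is not None
```

This deliberately stops short of verifying the check digit, so it will accept strings that are merely ISBN-shaped.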
The free Akismet API used to work fairly well for me and is
completely invisible to the user. It basically uses a behind-the-scenes
maintained blacklist of IP addresses and email addresses, plus other
heuristics, to identify spam.
Haven't used it in a while though, don't know how well it's do
Here's a php function I use in OpenBook to test if a user has entered a 10
or 13 digit ISBN.
//test if 10 or 13 digits ISBN
function openbook_utilities_validISBN($testisbn) {
    return (ereg ("([0-9]{10})", $testisbn, $regs) || ereg ("([0-9]{13})", $testisbn, $regs));
}
On Fri, Oct 21, 2011 at 1:4
This is an intriguing approach, Dre. I wonder how to render this
non-problematic for folks with screen-readers too. You could just say "leave
this field blank" but that's sort of weird too. Is there a WAI-ARIA approach
that would get screen readers to hide this field too?
I'm looking into Moll
> if Wikipedia is to be believed, there are books out there issued with
ISBNs that aren't valid, so any strong enforcement using the check
digit may lead to the occasional issue.
I can personally vouch for the veracity of Wikipedia on this point. In
a library of our size, I've seen it at least a f
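Given that, one option is to compute the check digit but report the result instead of rejecting outright. A minimal Python sketch, assuming the input has already had hyphens and spaces removed and any "x" uppercased; the function name and the three status strings are invented for illustration. It uses the standard weights: 10 down to 1 modulo 11 for ISBN-10, alternating 1 and 3 modulo 10 for ISBN-13.

```python
DIGITS = "0123456789"


def isbn_status(isbn: str) -> str:
    """Return 'ok', 'bad-check-digit', or 'malformed' for a bare ISBN string."""
    if (
        len(isbn) == 10
        and all(c in DIGITS for c in isbn[:9])
        and isbn[9] in DIGITS + "X"
    ):
        # ISBN-10: weights 10..1, with X standing for 10 in the last position.
        values = [10 if c == "X" else int(c) for c in isbn]
        total = sum((10 - i) * v for i, v in enumerate(values))
        return "ok" if total % 11 == 0 else "bad-check-digit"
    if len(isbn) == 13 and all(c in DIGITS for c in isbn):
        # ISBN-13: weights alternate 1, 3, 1, 3, ...
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn))
        return "ok" if total % 10 == 0 else "bad-check-digit"
    return "malformed"
```

A caller could reject "malformed" input but only warn on "bad-check-digit", since a printed ISBN with a wrong check digit may still be what is in the catalog.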
I gave the example of journal articles.
-- Mike.
On 24 October 2011 15:15, Mark A. Matienzo wrote:
> What do you mean by "documents"?
>
> Mark A. Matienzo
> Digital Archivist, Manuscripts and Archives
> Yale University Library
>
>
>
> On Mon, Oct 24, 2011 at 5:36 AM, Mike Taylor wrote:
>> Does
I've found that a simple skill testing question does wonders for form
spam reduction, e.g.,
What is five times five? [input box 2 chars wide]
Maybe I've dealt with a dumber class of bots, though . . . .
Andrew
On Mon, Oct 24, 2011 at 10:12 AM, Andreas Orphanides
wrote:
>
> Here's a method tha
What do you mean by "documents"?
Mark A. Matienzo
Digital Archivist, Manuscripts and Archives
Yale University Library
On Mon, Oct 24, 2011 at 5:36 AM, Mike Taylor wrote:
> Does anyone out there know what recent studies predict for the
> lifetime of digitally archived and preserved documents?
>
Here's a method that's by no means foolproof but is practically zero cost (you
may be using a version already). Disclaimer -- I have not actually tested this
to any extent:
Include a text input field in your form that needs to be blank for the form to
validate in the back end. Keep the field
On Mon, Oct 24, 2011 at 8:26 AM, Ken Irwin wrote:
> One that occurs to me to try, and I have no idea if this would match well
> with actual bot behavior: at the time the form loads, include a hidden field
> with id=[unixtimestamp]. When the form is submitted, ignore any forms that
> took less
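A hedged Python sketch of the timestamp idea. Two things here are my assumptions rather than anything from the message above: the 5-second threshold (the cutoff is not given), and the HMAC signature, which is added so a bot cannot simply submit an old timestamp of its own choosing. The key, the function names, and the "timestamp:signature" token layout are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical; keep the real key out of the page source
MIN_SECONDS = 5            # assumed threshold, tune against real traffic


def _sign(issued: str) -> str:
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now: Optional[float] = None) -> str:
    """Value for the hidden field: '<unix timestamp>:<signature>'."""
    issued = str(int(time.time() if now is None else now))
    return issued + ":" + _sign(issued)


def should_reject(token: str, now: Optional[float] = None) -> bool:
    """True when the token is missing, forged, or came back too quickly."""
    issued, _, signature = token.partition(":")
    if not hmac.compare_digest(signature.encode(), _sign(issued).encode()):
        return True
    current = time.time() if now is None else now
    return current - int(issued) < MIN_SECONDS
```

The form handler would call issue_token() when rendering the page and should_reject() on the posted value; whether real bots actually submit that fast is the open question.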
What listservs, forums, websites, et al. around the web ask
for your reference desk stumpers?
Mollom is pretty decent...
http://mollom.com works with a lot of cms's
It is commercial with 100 free positives per day, and can require captcha,
but it tries to avoid it with a crowd sourced algorithm approach
On 10/24/11 9:26 AM, "Ken Irwin" wrote:
>Hi folks,
>
>Some of our online forms (con
Hi folks,
Some of our online forms (contact, archives request, etc.) have been getting a
bunch of spam lately. I have heretofore avoided using any of those obnoxious
Captcha things and would rather not start now. (I personally loathe them and
they keep getting harder, which tells me that the sp
Founded in 1870, The Ohio State University is a comprehensive, state-assisted
university offering a complete environment for learning for its 3,000 faculty
and 56,000 students. The Ohio State University Libraries is seeking a
motivated, creative person to join our team charged with designing, de
Does anyone out there know what recent studies predict for the
lifetime of digitally archived and preserved documents?
For example, there is a long and mostly pretty successful tradition of
preserving journal articles as paper in bound volumes, and experiment
shows that this approach is usually go