Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Jon Stroop
Coming from nowhere on this...is there a place where it would be convenient to flag which behavior the user (of the library) wants? I think you're correct that most of the time you'd just want to blow through it (or replace it), but for the situation where this isn't the case, I think the

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Scott Prater
We run into this problem fairly regularly, and in fact, ran into it on Monday with ruby-marc. The way we've traditionally handled it is to put our marc stream through a cleanup preprocessor before passing it off to a marc parser (ruby marc or marc4j). The preprocessor can do one of two

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Jonathan Rochkind
I am not sure how you ran into this problem on Monday with ruby-marc, since ruby-marc doesn't currently handle Marc8 conversion to UTF-8 at all -- how could you have run into a problem with Marc8 to UTF8 conversion? But that is what I am adding. But yeah, using a preprocessor is certainly

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Scott Prater
Not sure what the details of our issue were on Monday -- but we do have records that are supposedly encoded in UTF-8, but nonetheless contain invalid characters. I think raising an exception is fine, as long as we can still continue to walk the records with the reader. The right thing for

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Jonathan Rochkind
Yeah, the default in ruby-marc for encodings that _aren't_ MARC8 is to ignore bad bytes entirely -- leave them in the MARC::Record as bad bytes. This is likely to end up raising an exception later when you try to DO something with those Strings, but was left this way for backwards compatibility
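
A minimal sketch of the "bad bytes raise later" behavior described in this message, using plain Ruby strings only. It does not call ruby-marc. The sample string and variable names are made up for illustration, and String#scrub assumes Ruby 2.1 or later.

```ruby
# Illustration only: plain Ruby, no ruby-marc calls.
# A string tagged as UTF-8 that contains a byte (0xE9) which is not valid UTF-8.
bad = "Caf\xE9 records".dup.force_encoding(Encoding::UTF_8)

# Ruby lets the string exist; it only reports the problem when asked.
still_valid = bad.valid_encoding?

# Doing something with the string later, such as transcoding it,
# is where the exception finally surfaces.
raised = begin
  bad.encode(Encoding::UTF_16LE)
  false
rescue Encoding::InvalidByteSequenceError
  true
end

# Replacing the bad bytes up front avoids the late failure.
cleaned = bad.scrub("?")

puts "valid_encoding? #{still_valid}"
puts "raised on encode? #{raised}"
puts "scrubbed: #{cleaned}"
```

The design question in the thread is which of these outcomes a reader should produce by default: leave the bytes alone, raise, or replace.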

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Jonathan Rochkind
On 11/20/13 11:40 AM, Scott Prater wrote: Not sure what the details of our issue were on Monday -- but we do have records that are supposedly encoded in UTF-8, but nonetheless contain invalid characters. Oh, and I'd clarify, if you haven't figured it out already, if those are ISO 2709 binary

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Scott Prater
On 11/20/2013 11:18 AM, Jonathan Rochkind wrote: On 11/20/13 11:40 AM, Scott Prater wrote: I would suggest one or the other -- the default of leaving bad bytes in your ruby strings is asking for trouble, and you probably don't want to do it, but was made the default for backwards compat

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Robert Haschart
When I first started working on marc4j, it behaved as suggested here, i.e. it expected the records to be correctly formed in almost every respect and threw an exception when an error was encountered. It was done in a way that didn't even allow the processing to continue with the

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Jonathan Rochkind
On 11/20/13 12:51 PM, Scott Prater wrote: I think the issue comes down to a distinction between a stream and a record. Ideally, the ruby-marc library would keep pointers to which record it is in, where the record begins, and where the record ends in the stream. If a valid header and
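
A rough sketch of the stream-versus-record idea quoted in this message, under stated assumptions. It is not ruby-marc code: `scan_records` is a hypothetical helper, and the sample stream is fabricated filler rather than real MARC. It relies only on two ISO 2709 conventions: each record ends with the record terminator byte 0x1D, and the first five bytes of the leader give the record length.

```ruby
require "stringio"

# ISO 2709 record terminator byte.
RECORD_TERMINATOR = "\x1D".b

# Hypothetical helper (not part of ruby-marc): walk a binary stream one
# record at a time, remembering where each record begins and whether its
# declared length matches the bytes actually found before the terminator.
def scan_records(io)
  results = []
  offset = 0
  io.each_line(RECORD_TERMINATOR) do |raw|
    declared = raw[0, 5]
    ok = declared.match?(/\A\d{5}\z/) && declared.to_i == raw.bytesize
    results << { offset: offset, bytesize: raw.bytesize, ok: ok }
    offset += raw.bytesize
  end
  results
end

# Fabricated stream: the second "record" declares a length of 99 bytes
# but is only 9 bytes long, so it is flagged and the walk continues.
stream = "00012abcdef\x1D" + "00099xyz\x1D" + "00008ab\x1D"
report = scan_records(StringIO.new(stream.b))

report.each do |r|
  puts "offset=#{r[:offset]} bytes=#{r[:bytesize]} ok=#{r[:ok]}"
end
```

Splitting on the terminator means one bad record does not stop the reader from reaching the next one. The sketch would still be confused by a stray 0x1D byte inside corrupt data, so treat it as a starting point only.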

Re: [CODE4LIB] ruby-marc api design feedback wanted

2013-11-20 Thread Scott Prater
Thanks, Jonathan. We'll definitely check it out. -- Scott On 11/20/2013 12:13 PM, Jonathan Rochkind wrote: On 11/20/13 12:51 PM, Scott Prater wrote: I think the issue comes down to a distinction between a stream and a record. Ideally, the ruby-marc library would keep pointers to which

[CODE4LIB] ruby-marc api design feedback wanted

2013-11-19 Thread Jonathan Rochkind
ruby-marc users, a question. I am working on some Marc8 to UTF-8 conversion for ruby-marc. Sometimes, what appears to be an illegal byte will appear in the Marc8 input, and it can not be converted to UTF8. The software will support two alternatives when this happens: 1) Raising an