On 6 April 2011, Eric Lease Morgan wrote:
http://zoia.library.nd.edu/tmp/tor.marc
Happily, Kevin's magic formula recognizes this as MARC!
Bill
--
William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
I'm not quite convinced that it's marc-8 just because there's \xC2 ;).
If you look at a hex dump I'm seeing a lot of what might be combining
characters. The leader appears to have 'a' in the field to indicate
unicode. In the raw hex I'm seeing a lot of two character sequences
like: 756c 69c3 83
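The clue in that hex dump can be tested directly: a lone 0xC2 is an invalid UTF-8 fragment, while a C3 83 pair decodes cleanly (to "Ã", which is itself a classic sign of double-encoded UTF-8). A minimal Python sketch, using the bytes quoted from the dump; the ten-byte "record" passed to the leader check is hypothetical:

```python
# Check what the leader claims, then see whether suspect byte runs
# actually decode as UTF-8.
def claims_unicode(record: bytes) -> bool:
    # Leader/09 is "a" for Unicode, blank for MARC-8 (per MARC 21).
    return record[9:10] == b"a"

def decodes_as_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

sample = bytes.fromhex("756c69c383")   # "uli" followed by C3 83, as in the dump
print(decodes_as_utf8(sample))         # True: C3 83 is valid UTF-8
print(decodes_as_utf8(b"\xc2"))        # False: a lone C2 needs a continuation byte
print(claims_unicode(b"012345678a"))   # True: hypothetical leader with "a" at /09
```

If leader/09 says "a" but byte runs fail the UTF-8 check, the record is mislabeled; if everything decodes but to mojibake like "Ã", double encoding is the likelier story.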
My apologies for any cross-posting. I'm happy to announce the posting of a
1-year fixed term Digital Humanities Developer position in Academic Computing
Services here at Stanford University. The job description is provided below.
Interested candidates can apply for the position at jobs.stanfo
That's hilarious, that Terry has had to do enough ugliness with Marc
encodings that he indeed can recognize 0xC2 off the bat as the Marc8
encoding it represents! I am in awe, as well as sympathy.
If the record is in Marc8, then you need to know if Perl MARC::Batch can
handle Marc8. If it's s
Lol!
So right off the bat I see that the leader says the record is 1091 bytes
long, but it is actually 1089 bytes long and I end up missing the leader
for the next record. Maybe a CR/LF problem? I see that frequently as a
way to mangle MARC records when moving them around.
Is your problem in th
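The stated-vs-actual length check described above is easy to automate: leader/00-04 carries the record length as five ASCII digits, so compare that against the bytes actually read. A sketch with a hypothetical toy record:

```python
RT = b"\x1d"  # MARC record terminator

def stated_vs_actual(record: bytes) -> tuple[int, int]:
    stated = int(record[:5])   # five ASCII digits at leader/00-04
    actual = len(record)       # byte count through the record terminator
    return stated, actual

# Toy 30-byte "record": 5-digit length, filler, record terminator.
rec = b"00030" + b" " * 24 + RT
print(stated_vs_actual(rec))   # (30, 30) when the leader is honest
```

A mismatch of exactly the number of line breaks in the record (as in the 1091-vs-1089 case above) is a strong hint of CR/LF mangling in transit.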
I'd echo Jonathan's question -- the 0xC2 code is the sound recording marker in
MARC-8. I'd guess the file isn't in UTF8.
--TR
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, April 06, 2011 1:28 PM
> To
On Apr 6, 2011, at 4:46 PM, LeVan,Ralph wrote:
>> Ack! While using the venerable Perl MARC::Batch module I get the
>> following error while trying to read a MARC record:
>>
>> utf8 "\xC2" does not map to Unicode
>
> Can you share the record somewhere? I suspect many of us have tools we
> can
On 6 April 2011 19:53, Jonathan Rochkind wrote:
> On 4/6/2011 2:43 PM, William Denton wrote:
>>
>> "Validity" does mean something definite ... but Postel's Law is a good
>> guideline, especially with the swamp of bad MARC, old MARC, alternate
>> MARC, that's out there. Valid MARC is valid MARC, b
Forwarding because I think this will be of interest to some folks on the
list...
-- Forwarded message --
***SKOS-2-HIVE: CREATING SKOS VOCABULARIES TO HELP INTERDISCIPLINARY
VOCABULARY ENGINEERING***
We are pleased to announce the addition of more HIVE workshops!
*DATES AND
Can you share the record somewhere? I suspect many of us have tools we
can turn loose on it.
Ralph
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, April 06, 2011 4:28 PM
> To: CODE4LIB@LISTSERV.ND.EDU
>
I am not familiar with that Perl module. But I'm more familiar than I'd
want to be with char encoding in Marc.
I don't recognize the byte 0xC2 (there are some bytes I became
pathetically familiar with in past debugging, but I've forgotten 'em),
but the first things to look at:
1. Is your Marc file
Ack! While using the venerable Perl MARC::Batch module I get the following
error while trying to read a MARC record:
utf8 "\xC2" does not map to Unicode
This is a real pain, and I'm hoping someone here can help me either: 1) trap
this error allowing me to move on, or 2) figure out how to open
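For option 1, trapping the error and moving on: MARC::Batch has a strict_off() method for exactly this, if memory serves. As a language-neutral illustration, a Python sketch that splits a file on the record terminator and skips records that fail to decode; the record bytes are hypothetical:

```python
# Skip records whose bytes fail to decode as UTF-8, instead of dying.
def usable_records(data: bytes):
    for chunk in data.split(b"\x1d"):          # 0x1D = record terminator
        if not chunk:
            continue
        record = chunk + b"\x1d"
        try:
            record.decode("utf-8")
        except UnicodeDecodeError:
            continue                           # trap the bad record, move on
        yield record

good = b"00026" + b" " * 20 + b"\x1d"          # decodes fine
bad = b"00007" + b"\xc2" + b"\x1d"             # the lone C2 from the error
print(len(list(usable_records(good + bad))))   # 1: only the clean record survives
```

Logging the skipped records' offsets, rather than silently dropping them, makes it possible to go back and repair the bad ones later.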
> Well, the problem is when the original Marc4J author took the spec at its
> word, and actually _acted upon_ the '4' and the '5', changing file semantics
> if they were different, and throwing an exception if either was a non-digit.
>
At least the author actually used the values rather than checking
On 4/6/2011 2:43 PM, William Denton wrote:
"Validity" does mean something definite ... but Postel's Law is a good
guideline, especially with the swamp of bad MARC, old MARC, alternate
MARC, that's out there. Valid MARC is valid MARC, but if---for the sake
of file and its magic---we can identify
On 6 April 2011, Jonathan Rochkind wrote:
I think we computer programmers are really better-served by reserving the
notion of "validity" for things specified by formal specifications -- as we
normally do, talking about any other data format. And the only formal
specifications I can find for
On 4/6/2011 2:02 PM, Kyle Banerjee wrote:
I'd go so far as to question the value of validating redundant data that
theoretically has meaning but is never supposed to vary. The 4 and the
5 simply repeat what is already known about the structure of the MARC
record. Choking on stuff like this
... Maybe we have different understandings of "valid".
>
> If leader bytes 20-23 are not "4500", I suggest that is _by definition_ not
> a "valid" Marc21 file. It violates the Marc21 specification.
>
> Now, they may still be _usable_, by software that ignores these bytes
> anyway or works aroun
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fields, which in my opinion,
is more open to
Forwarded:
The Open Annotation Collaboration (OAC) project is pleased to announce
a Request For Proposal to collaborate with OAC researchers for
building implementations of the OAC data model and ontology. The OAC
is seeking to collaborate with scholars and/or librarians currently
using and/or cur
I'm honestly not familiar with magic. I can tell you that in MarcEdit, the way
the process works is that there is a very generic function that reads the
structure of the data without trusting the information in the leader (since I
find this data very unreliable). Then, if users want to apply a set of ru
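That "read the structure, don't trust the leader" approach can be sketched by walking the directory itself: 12-byte entries begin right after the fixed 24-byte leader and end at a 0x1E field terminator, so field boundaries come from the entries, not the leader. The toy record below is hypothetical:

```python
FT = b"\x1e"  # field terminator closes the directory

def directory_entries(record: bytes):
    # Each directory entry: 3-byte tag, 4-byte field length, 5-byte start.
    i = 24  # the directory begins right after the 24-byte leader
    while i + 12 <= len(record) and record[i:i + 1] != FT:
        tag = record[i:i + 3].decode("ascii")
        length = int(record[i + 3:i + 7])
        start = int(record[i + 7:i + 12])
        yield tag, length, start
        i += 12

leader = b"0" * 24                      # contents deliberately ignored
rec = leader + b"008" + b"0005" + b"00000" + FT + b"12345" + FT + b"\x1d"
print(list(directory_entries(rec)))     # [('008', 5, 0)]
```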
On 6 April 2011, Reese, Terry wrote:
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fie
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fields, which in my opinion, is more
open
Just as a historical note, this non-standard use of LDR/22 is likely due to
OCLC's use of the character as a hexadecimal flag from back in the days when
MARC records were mostly schlepped around on tapes. They referred to it as the
"Transaction type code". When records were sent to OCLC for pr
I'm not sure what you mean Terry. Maybe we have different
understandings of "valid".
If leader bytes 20-23 are not "4500", I suggest that is _by definition_
not a "valid" Marc21 file. It violates the Marc21 specification.
Now, they may still be _usable_, by software that ignores these bytes
Actually, you can have records that are MARC21 coming out of vendor databases
(who sometimes embed control characters into the leader) and still be valid.
Once you stop looking at just your ILS or OCLC, you probably wouldn't be
surprised to know that records start looking very different.
--TR
Can't you have a legal "MARC" file that does NOT have 4500 in those
leader positions? It's just not legal "Marc21", right? Other marc
formats may specify or even allow flexibility in the things these bytes
specify:
* Length of the length-of-field portion
* Number of characters in the starti
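Those leader positions form the entry map (leader/20-23), and they can be read rather than assumed. A sketch, with a hypothetical but well-formed MARC 21 leader:

```python
# Read the entry map (leader/20-23) instead of hard-coding "4500".
def entry_map(leader: bytes):
    length_of_field = int(leader[20:21])  # digits in each field-length part
    start_char_pos = int(leader[21:22])   # digits in each starting-position part
    impl_defined = int(leader[22:23])     # implementation-defined part
    # leader/23 is undefined and should be "0" in MARC 21.
    return length_of_field, start_char_pos, impl_defined

leader = b"01091nam a2200277 a 4500"     # hypothetical 24-byte leader
print(entry_map(leader))                 # (4, 5, 0) for a conforming record
```

Software that sizes its directory parsing from these values handles both strict MARC 21 and the looser "other marc formats" mentioned above, instead of throwing an exception on anything that isn't "4500".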
Well, this brings us right up against the issue of files that adhere to their
specifications versus forgiving applications. Think of browsers and HTML.
Suffice it to say, MARC applications are quite likely to be forgiving of leader
positions 20-23. In my non-conforming MARC file and in Bill's