On 6 April 2011, Eric Lease Morgan wrote:
http://zoia.library.nd.edu/tmp/tor.marc
Happily, Kevin's magic formula recognizes this as MARC!
Bill
--
William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org
I'm not quite convinced that it's marc-8 just because there's \xC2 ;).
If you look at a hex dump I'm seeing a lot of what might be combining
characters. The leader appears to have 'a' in the field to indicate
unicode. In the raw hex I'm seeing a lot of two character sequences
like: 756c 69c3 83
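The clue in that hex dump can be tested directly: a lone 0xC2 is an invalid UTF-8 fragment, while a C3 83 pair decodes cleanly (to "Ã", which is itself a classic sign of double-encoded UTF-8). A minimal Python sketch, using the bytes quoted from the dump; the ten-byte "record" passed to the leader check is hypothetical:

```python
# Check what the leader claims, then see whether suspect byte runs
# actually decode as UTF-8.
def claims_unicode(record: bytes) -> bool:
    # Leader/09 is "a" for Unicode, blank for MARC-8 (per MARC 21).
    return record[9:10] == b"a"

def decodes_as_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

sample = bytes.fromhex("756c69c383")   # "uli" followed by C3 83, as in the dump
print(decodes_as_utf8(sample))         # True: C3 83 is valid UTF-8
print(decodes_as_utf8(b"\xc2"))        # False: a lone C2 needs a continuation byte
print(claims_unicode(b"012345678a"))   # True: hypothetical leader with "a" at /09
```

If leader/09 says "a" but byte runs fail the UTF-8 check, the record is mislabeled; if everything decodes but to mojibake like "Ã", double encoding is the likelier story.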
My apologies for any cross-posting. I'm happy to announce the posting of a
1-year fixed term Digital Humanities Developer position in Academic Computing
Services here at Stanford University. The job description is provided below.
Interested candidates can apply for the position at jobs.stanfo
That's hilarious, that Terry has had to do enough ugliness with Marc
encodings that he indeed can recognize 0xC2 off the bat as the Marc8
encoding it represents! I am in awe, as well as sympathy.
If the record is in Marc8, then you need to know if Perl MARC::Batch can
handle Marc8. If it's s
Lol!
So right off the bat I see that the leader says the record is 1091 bytes
long, but it is actually 1089 bytes long and I end up missing the leader
for the next record. Maybe a CR/LF problem? I see that frequently as a
way to mangle MARC records when moving them around.
Is your problem in th
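The stated-vs-actual length check described above is easy to automate: leader/00-04 carries the record length as five ASCII digits, so compare that against the bytes actually read. A sketch with a hypothetical toy record:

```python
RT = b"\x1d"  # MARC record terminator

def stated_vs_actual(record: bytes) -> tuple[int, int]:
    stated = int(record[:5])   # five ASCII digits at leader/00-04
    actual = len(record)       # byte count through the record terminator
    return stated, actual

# Toy 30-byte "record": 5-digit length, filler, record terminator.
rec = b"00030" + b" " * 24 + RT
print(stated_vs_actual(rec))   # (30, 30) when the leader is honest
```

A mismatch of exactly the number of line breaks in the record (as in the 1091-vs-1089 case above) is a strong hint of CR/LF mangling in transit.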
I'd echo Jonathan's question -- the 0xC2 code is the sound recording marker in
MARC-8. I'd guess the file isn't in UTF8.
--TR
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, April 06, 2011 1:28 PM
> To
On Apr 6, 2011, at 4:46 PM, LeVan,Ralph wrote:
>> Ack! While using the venerable Perl MARC::Batch module I get the
>> following error while trying to read a MARC record:
>>
>> utf8 "\xC2" does not map to Unicode
>
> Can you share the record somewhere? I suspect many of us have tools we
> can
On 6 April 2011 19:53, Jonathan Rochkind wrote:
> On 4/6/2011 2:43 PM, William Denton wrote:
>>
>> "Validity" does mean something definite ... but Postel's Law is a good
>> guideline, especially with the swamp of bad MARC, old MARC, alternate
>> MARC, that's out there. Valid MARC is valid MARC, b
Forwarding because I think this will be of interest to some folks on the
list...
-- Forwarded message --
***SKOS-2-HIVE: CREATING SKOS VOCABULARIES TO HELP INTERDISCIPLINARY
VOCABULARY ENGINEERING***
We are pleased to announce the addition of more HIVE workshops!
*DATES AND
Can you share the record somewhere? I suspect many of us have tools we
can turn loose on it.
Ralph
> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, April 06, 2011 4:28 PM
> To: CODE4LIB@LISTSERV.ND.EDU
>
I am not familiar with that Perl module. But I'm more familiar than I'd
want to be with char encoding in Marc.
I don't recognize the byte 0xC2 (there are some bytes I became
pathetically familiar with in past debugging, but I've forgotten 'em),
but the first things to look at:
1. Is your Marc file
Ack! While using the venerable Perl MARC::Batch module I get the following
error while trying to read a MARC record:
utf8 "\xC2" does not map to Unicode
This is a real pain, and I'm hoping someone here can help me either: 1) trap
this error allowing me to move on, or 2) figure out how to open
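For option 1, trapping the error and moving on: MARC::Batch has a strict_off() method for exactly this, if memory serves. As a language-neutral illustration, a Python sketch that splits a file on the record terminator and skips records that fail to decode; the record bytes are hypothetical:

```python
# Skip records whose bytes fail to decode as UTF-8, instead of dying.
def usable_records(data: bytes):
    for chunk in data.split(b"\x1d"):          # 0x1D = record terminator
        if not chunk:
            continue
        record = chunk + b"\x1d"
        try:
            record.decode("utf-8")
        except UnicodeDecodeError:
            continue                           # trap the bad record, move on
        yield record

good = b"00026" + b" " * 20 + b"\x1d"          # decodes fine
bad = b"00007" + b"\xc2" + b"\x1d"             # the lone C2 from the error
print(len(list(usable_records(good + bad))))   # 1: only the clean record survives
```

Logging the skipped records' offsets, rather than silently dropping them, makes it possible to go back and repair the bad ones later.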
> Well, the problem is when the original Marc4J author took the spec at its
> word, and actually _acted upon_ the '4' and the '5', changing file semantics
> if they were different, and throwing an exception if either was a non-digit.
>
At least the author actually used the values rather than checking
On 4/6/2011 2:43 PM, William Denton wrote:
"Validity" does mean something definite ... but Postel's Law is a good
guideline, especially with the swamp of bad MARC, old MARC, alternate
MARC, that's out there. Valid MARC is valid MARC, but if---for the sake
of file and its magic---we can identify
On 6 April 2011, Jonathan Rochkind wrote:
I think we computer programmers are really better-served by reserving the
notion of "validity" for things specified by formal specifications -- as we
normally do, talking about any other data format. And the only formal
specifications I can find for
On 4/6/2011 2:02 PM, Kyle Banerjee wrote:
I'd go so far as to question the value of validating redundant data that
theoretically has meaning but is never supposed to vary. The 4 and the
5 simply repeat what is already known about the structure of the MARC
record. Choking on stuff like this
... Maybe we have different understandings of "valid".
>
> If leader bytes 20-23 are not "4500", I suggest that is _by definition_ not
> a "valid" Marc21 file. It violates the Marc21 specification.
>
> Now, they may still be _usable_, by software that ignores these bytes
> anyway or works aroun
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fields, which in my opinion,
is more open to
Forwarded:
The Open Annotation Collaboration (OAC) project is pleased to announce
a Request For Proposal to collaborate with OAC researchers for
building implementations of the OAC data model and ontology. The OAC
is seeking to collaborate with scholars and/or librarians currently
using and/or cur
I'm honestly not familiar with magic. I can tell you that in MarcEdit, the way
the process works is that there is a very generic function that reads the
structure of the data without trusting the information in the leader (since I
find this data very unreliable). Then, if users want to apply a set of ru
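That "read the structure, don't trust the leader" approach can be sketched by walking the directory itself: 12-byte entries begin right after the fixed 24-byte leader and end at a 0x1E field terminator, so field boundaries come from the entries, not the leader. The toy record below is hypothetical:

```python
FT = b"\x1e"  # field terminator closes the directory

def directory_entries(record: bytes):
    # Each directory entry: 3-byte tag, 4-byte field length, 5-byte start.
    i = 24  # the directory begins right after the 24-byte leader
    while i + 12 <= len(record) and record[i:i + 1] != FT:
        tag = record[i:i + 3].decode("ascii")
        length = int(record[i + 3:i + 7])
        start = int(record[i + 7:i + 12])
        yield tag, length, start
        i += 12

leader = b"0" * 24                      # contents deliberately ignored
rec = leader + b"008" + b"0005" + b"00000" + FT + b"12345" + FT + b"\x1d"
print(list(directory_entries(rec)))     # [('008', 5, 0)]
```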
On 6 April 2011, Reese, Terry wrote:
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fie
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fields, which in my opinion, is more
open
Just as a historical note, this non-standard use of LDR/22 is likely due to
OCLC's use of the character as a hexadecimal flag from back in the days when
MARC records were mostly schlepped around on tapes. They referred to it as the
"Transaction type code". When records were sent to OCLC for pr
I'm not sure what you mean Terry. Maybe we have different
understandings of "valid".
If leader bytes 20-23 are not "4500", I suggest that is _by definition_
not a "valid" Marc21 file. It violates the Marc21 specification.
Now, they may still be _usable_, by software that ignores these bytes
Actually, you can have records that are MARC21 coming out of vendor databases
(who sometimes embed control characters into the leader) and still be valid.
Once you stop looking at just your ILS or OCLC, you probably wouldn't be
surprised to know that records start looking very different.
--TR
Can't you have a legal "MARC" file that does NOT have 4500 in those
leader positions? It's just not legal "Marc21", right? Other marc
formats may specify or even allow flexibility in the things these bytes
specify:
* Length of the length-of-field portion
* Number of characters in the starti
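Those leader positions form the entry map (leader/20-23), and they can be read rather than assumed. A sketch, with a hypothetical but well-formed MARC 21 leader:

```python
# Read the entry map (leader/20-23) instead of hard-coding "4500".
def entry_map(leader: bytes):
    length_of_field = int(leader[20:21])  # digits in each field-length part
    start_char_pos = int(leader[21:22])   # digits in each starting-position part
    impl_defined = int(leader[22:23])     # implementation-defined part
    # leader/23 is undefined and should be "0" in MARC 21.
    return length_of_field, start_char_pos, impl_defined

leader = b"01091nam a2200277 a 4500"     # hypothetical 24-byte leader
print(entry_map(leader))                 # (4, 5, 0) for a conforming record
```

Software that sizes its directory parsing from these values handles both strict MARC 21 and the looser "other marc formats" mentioned above, instead of throwing an exception on anything that isn't "4500".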
Well, this brings us right up against the issue of files that adhere to their
specifications versus forgiving applications. Think of browsers and HTML.
Suffice it to say, MARC applications are quite likely to be forgiving of leader
positions 20-23. In my non-conforming MARC file and in Bill's