Actually, you can have records that are MARC21 coming out of vendor databases
(who sometimes embed control characters into the leader) and still be valid.
Once you stop looking at just your ILS or OCLC, you probably wouldn't be
surprised to know that records start looking very different.
--TR
I'm not sure what you mean, Terry. Maybe we have different
understandings of valid.
If leader bytes 20-23 are not 4500, I suggest that is _by definition_
not a valid Marc21 file. It violates the Marc21 specification.
Now, they may still be _usable_, by software that ignores these bytes
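For anyone who wants to see the strict reading in code, here is a minimal Python sketch of the check being argued over. It assumes the fixed MARC 21 value "4500" in leader positions 20-23; the function name and the sample leaders are made up for illustration and do not come from any of the tools discussed in this thread.

```python
def leader_entry_map_ok(leader: bytes) -> bool:
    """Return True if leader positions 20-23 hold the fixed value b"4500"."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 bytes")
    return leader[20:24] == b"4500"


# Hypothetical sample leaders, for illustration only.
standard = b"01089nam a2200301 a 4500"
# Same leader with a control character dropped into position 22.
nonstandard = standard[:22] + b"\x00" + standard[23:]

print(leader_entry_map_ok(standard))     # strict check passes
print(leader_entry_map_ok(nonstandard))  # strict check fails
```

Whether a failed check should reject the record or just log a warning is exactly the policy question under debate here; the sketch only reports the result.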
Just as a historical note, this non-standard use of LDR/22 is likely due to
OCLC's use of the character as a hexadecimal flag from back in the days when
MARC records were mostly schlepped around on tapes. They referred to it as the
Transaction type code. When records were sent to OCLC for
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fields, which in my opinion, is more
On 6 April 2011, Reese, Terry wrote:
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of
I'm honestly not familiar with magic. I can tell you in MarcEdit, the way that
the process works is there is a very generic function that reads the structure
of the data, not trusting the information in the leader (since I find this data
very unreliable). Then, if users want to apply a set of
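To make the "read the structure, don't trust the leader" idea concrete, here is a rough Python sketch. It is not MarcEdit's code. It assumes only the standard delimiters (0x1D ends a record, 0x1E ends a field) and 12-byte directory entries, and the function names are hypothetical.

```python
RECORD_TERMINATOR = b"\x1d"
FIELD_TERMINATOR = b"\x1e"


def split_records(data: bytes) -> list[bytes]:
    """Split a byte stream on the record terminator, ignoring leader lengths."""
    chunks = data.split(RECORD_TERMINATOR)
    # Drop empty or whitespace-only leftovers (e.g. a trailing newline).
    return [chunk + RECORD_TERMINATOR for chunk in chunks if chunk.strip()]


def read_fields(record: bytes) -> list[tuple[str, bytes]]:
    """Pair directory tags with field data by order, not by stored offsets."""
    body = record[24:].rstrip(RECORD_TERMINATOR)   # skip the 24-byte leader
    parts = body.split(FIELD_TERMINATOR)
    directory, fields = parts[0], parts[1:]
    # Each directory entry is 12 bytes; the first 3 are the tag.
    tags = [
        directory[i:i + 3].decode("ascii", "replace")
        for i in range(0, len(directory), 12)
    ]
    return list(zip(tags, fields))
```

Because the fields are matched to tags by position, a wrong length or offset in the leader or directory does not stop the read. Content rules could then be applied as a separate pass over the `(tag, data)` pairs.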
Forwarded:
The Open Annotation Collaboration (OAC) project is pleased to announce
a Request For Proposal to collaborate with OAC researchers for
building implementations of the OAC data model and ontology. The OAC
is seeking to collaborate with scholars and/or librarians currently
using and/or
.. Maybe we have different understandings of valid.
If leader bytes 20-23 are not 4500, I suggest that is _by definition_ not
a valid Marc21 file. It violates the Marc21 specification.
Now, they may still be _usable_, by software that ignores these bytes
anyway or works around them. We
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fields, which in my opinion,
is more open
On 4/6/2011 2:02 PM, Kyle Banerjee wrote:
I'd go so far as to question the value of validating redundant data that
theoretically has meaning but which are never supposed to vary. The 4 and
the 5 simply repeat what is already known about the structure of the MARC
record. Choking on stuff like
On 6 April 2011, Jonathan Rochkind wrote:
I think we computer programmers are really better-served by reserving the
notion of validity for things specified by formal specifications -- as we
normally do, talking about any other data format. And the only formal
specifications I can find for
On 4/6/2011 2:43 PM, William Denton wrote:
Validity does mean something definite ... but Postel's Law is a good
guideline, especially with the swamp of bad MARC, old MARC, alternate
MARC, that's out there. Valid MARC is valid MARC, but if---for the sake
of file and its magic---we can identify
Well, the problem is when the original Marc4J author took the spec at its
word, and actually _acted upon_ the '4' and the '5', changing file semantics
if they were different, and throwing an exception if it was a non-digit.
At least the author actually used the values rather than checking to
Ack! While using the venerable Perl MARC::Batch module I get the following
error while trying to read a MARC record:
utf8 \xC2 does not map to Unicode
This is a real pain, and I'm hoping someone here can help me either: 1) trap
this error allowing me to move on, or 2) figure out how to open
I am not familiar with that Perl module. But I'm more familiar than I'd
want with char encoding in Marc.
I don't recognize the bytes 0xC2 (there are some bytes I became
pathetically familiar with in past debugging, but I've forgotten em),
but the first things to look at:
1. Is your Marc file
Can you share the record somewhere? I suspect many of us have tools we
can turn loose on it.
Ralph
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Jonathan Rochkind
Sent: Wednesday, April 06, 2011 4:28 PM
To: CODE4LIB@LISTSERV.ND.EDU
Forwarding because I think this will be of interest to some folks on the
list...
-- Forwarded message --
***SKOS-2-HIVE: CREATING SKOS VOCABULARIES TO HELP INTERDISCIPLINARY
VOCABULARY ENGINEERING***
We are pleased to announce the addition of more HIVE workshops!
*DATES
On Apr 6, 2011, at 4:46 PM, LeVan,Ralph wrote:
Ack! While using the venerable Perl MARC::Batch module I get the
following error while trying to read a MARC record:
utf8 \xC2 does not map to Unicode
Can you share the record somewhere? I suspect many of us have tools we
can turn loose
On 6 April 2011 19:53, Jonathan Rochkind rochk...@jhu.edu wrote:
On 4/6/2011 2:43 PM, William Denton wrote:
Validity does mean something definite ... but Postel's Law is a good
guideline, especially with the swamp of bad MARC, old MARC, alternate
MARC, that's out there. Valid MARC is valid
I'd echo Jonathan's question -- the 0xC2 code is the sound recording copyright mark in
MARC-8. I'd guess the file isn't in UTF8.
--TR
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Jonathan Rochkind
Sent: Wednesday, April 06, 2011 1:28 PM
To:
Lol!
So right off the bat I see that the leader says the record is 1091 bytes
long, but it is actually 1089 bytes long and I end up missing the leader
for the next record. Maybe a CR/LF problem? I see that frequently as a
way to mangle MARC records when moving them around.
Is your problem in
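A quick way to spot this kind of mismatch is to compare the length declared in the first five leader bytes with the distance to the next record terminator (0x1D). The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the record starts at the given offset.

```python
def length_mismatch(data: bytes, start: int = 0) -> tuple[int, int]:
    """Return (declared, actual) record length for the record at `start`."""
    # Leader positions 00-04 hold the record length as ASCII digits.
    declared = int(data[start:start + 5].decode("ascii"))
    end = data.find(b"\x1d", start)
    if end == -1:
        raise ValueError("no record terminator found")
    actual = end - start + 1  # include the terminator itself
    return declared, actual


# Hypothetical 30-byte record whose leader claims 32 bytes.
sample = b"00032" + b"x" * 24 + b"\x1d"
declared, actual = length_mismatch(sample)
print(declared, actual, declared - actual)
```

A difference of one or two bytes per record is consistent with line-ending conversion during transfer, though the sketch cannot confirm the cause.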
That's hilarious, that Terry has had to do enough ugliness with Marc
encodings that he indeed can recognize 0xC2 off the bat as the Marc8
encoding it represents! I am in awe, as well as sympathy.
If the record is in Marc8, then you need to know if Perl MARC::Batch can
handle Marc8. If it's
I'm not quite convinced that it's MARC-8 just because there's \xC2 ;).
If you look at a hex dump I'm seeing a lot of what might be combining
characters. The leader appears to have 'a' in the field to indicate
Unicode. In the raw hex I'm seeing a lot of two character sequences
like: 756c 69c3
On 6 April 2011, Eric Lease Morgan wrote:
http://zoia.library.nd.edu/tmp/tor.marc
Happily, Kevin's magic formula recognizes this as MARC!
Bill
--
William Denton, Toronto : miskatonic.org www.frbr.org openfrbr.org