On Wed, May 23, 2012 at 6:16 PM, Kyle Banerjee
wrote:
> I'm not sure whether to laugh or cry that it's a sign of progress that a 40
> year old utility designed to identify file types is now just beginning to
> be able to recognize a format that's been around for almost 50 years...
Laugh :-)
//Ed
On Wed, May 23, 2012 at 12:14 PM, Ford, Kevin wrote:
> I finally had occasion today (read: remembered) to see if the *nix "file"
> command would recognize a MARC record file. I haven't tested extensively,
> but it did identify the file as MARC21 Bibliographic record. It also
> correctly identif
The magic file format changed between versions; I think the
OS X version was not compatible with more up-to-date versions (in the
original thread, this caused me some confusion).
Simon
On Wed, May 23, 2012 at 4:34 PM, Ross Singer wrote:
> On May 23, 2012, at 4:22 PM, Kevin Ford wrot
On Wed, May 23, 2012 at 04:34:47PM -0400, Ross Singer wrote:
> On May 23, 2012, at 4:22 PM, Kevin Ford wrote:
>
> > Don't know what to say. Crawling through the source for "file" at [1], the
> > pattern matching code was in place as of Sept 2011. It could be present
> > earlier than Sept 2011,
On May 23, 2012, at 4:22 PM, Kevin Ford wrote:
> Don't know what to say. Crawling through the source for "file" at [1], the
> pattern matching code was in place as of Sept 2011. It could be present
> earlier than Sept 2011, but I stopped hunting for it. The earliest it would
> have made its w
tification. If
requested, the mimetype - application/marc - should also be output.
Rgds,
Kevin
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Ross Singer
Sent: Wednesday, May 23, 2012 3:29 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC Ma
Don't know what to say. Crawling through the source for "file" at [1],
the pattern matching code was in place as of Sept 2011. It could be
present earlier than Sept 2011, but I stopped hunting for it. The
earliest it would have made its way into the magic db would have been
April 2011.
Perh
On 24/05/12 07:14, Ford, Kevin wrote:
I finally had occasion today (read: remembered) to see if the *nix "file"
command would recognize a MARC record file. I haven't tested extensively, but it did
identify the file as MARC21 Bibliographic record. It also correctly identified a MARC21
Authori
are not "4500" then "(non-conforming)" should be appended to the identification. If
requested, the mimetype - application/marc - should also be output.
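
The rule described above (append "(non-conforming)" when leader bytes 20-23 are not "4500") can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual magic entry: the function name describe_leader is hypothetical, the record-type check covers only the bibliographic and authority codes in leader position 6, and everything else falls back to "data".

```python
# Hypothetical sketch: classify a MARC record from its 24-byte leader,
# roughly in the style of file(1) output. Not the real magic rule.

# Leader/06 type-of-record codes used by MARC 21 bibliographic records.
BIB_TYPES = set("acdefgijkmoprt")


def describe_leader(leader: bytes) -> str:
    """Return a short description for a MARC leader, or 'data' if unsure."""
    # A leader is 24 bytes and starts with a 5-digit record length.
    if len(leader) < 24 or not leader[:5].isdigit():
        return "data"

    record_type = chr(leader[6])
    if record_type in BIB_TYPES:
        label = "MARC21 Bibliographic"
    elif record_type == "z":
        label = "MARC21 Authority"
    else:
        return "data"

    # Leader bytes 20-23 (the entry map) are expected to be "4500".
    if leader[20:24] != b"4500":
        label += " (non-conforming)"
    return label


print(describe_leader(b"00714cam a2200205 a 4500"))
print(describe_leader(b"00714cam a2200205 a 450 "))
```

The first call prints "MARC21 Bibliographic" and the second prints "MARC21 Bibliographic (non-conforming)". Reporting the mimetype would be a separate lookup and is left out here.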
Rgds,
Kevin
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Ross Singer
S
ODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Ross Singer
> Sent: Wednesday, May 23, 2012 3:29 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] MARC Magic for file
>
> Wow, this is pretty cool.
>
> Kevin, do you have examples of the output?
>
> Does it work
On Wed, May 23, 2012 at 03:28:56PM -0400, Ross Singer wrote:
> Wow, this is pretty cool.
>
> Kevin, do you have examples of the output?
>
> Does it work for bulk files?
>
> I mean, I could just try this on my Ubuntu machine, but it's all the way
> downstairs...
My OS lists it as `data`
$ cd
$
Wow, this is pretty cool.
Kevin, do you have examples of the output?
Does it work for bulk files?
I mean, I could just try this on my Ubuntu machine, but it's all the way
downstairs...
-Ross.
On May 23, 2012, at 3:14 PM, Ford, Kevin wrote:
> I finally had occasion today (read: remembered) to
I finally had occasion today (read: remembered) to see if the *nix "file"
command would recognize a MARC record file. I haven't tested extensively, but
it did identify the file as MARC21 Bibliographic record. It also correctly
identified a MARC21 Authority Record. I'm running the most recent
http://i.imgur.com/6WtA0.png
(Sorry, it's Friday. Also, blame dchud for the idea.)
-Sean
On 4/6/11 4:53 PM, "Mike Taylor" wrote:
> On 6 April 2011 19:53, Jonathan Rochkind wrote:
>> On 4/6/2011 2:43 PM, William Denton wrote:
>>>
>>> "Validity" does mean something definite ... but Postel's L
On 6 April 2011 19:53, Jonathan Rochkind wrote:
> On 4/6/2011 2:43 PM, William Denton wrote:
>>
>> "Validity" does mean something definite ... but Postel's Law is a good
>> guideline, especially with the swamp of bad MARC, old MARC, alternate
>> MARC, that's out there. Valid MARC is valid MARC, b
> Well, the problem is when the original Marc4J author took the spec at its
> word, and actually _acted upon_ the '4' and the '5', changing file semantics
> if they were different, and throwing an exception if it was a non-digit.
>
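
To make the quoted point concrete: leader position 20 gives the width of the length-of-field part of each directory entry and position 21 gives the width of the starting-position part, which is where the '4' and the '5' come from. The sketch below is not Marc4J's code. It is a hypothetical Python function (parse_directory and its trust_entry_map flag are made-up names) showing the difference between acting on those two bytes and simply assuming 4 and 5.

```python
# Hypothetical sketch: read the directory of one MARC record, either
# trusting leader bytes 20-21 or assuming the usual widths of 4 and 5.

def parse_directory(record: bytes, trust_entry_map: bool = True):
    """Return a list of (tag, field_length, start_offset) tuples."""
    leader = record[:24]
    base_address = int(leader[12:17])  # where the field data begins

    if trust_entry_map:
        # Acting on the leader: a non-digit here raises ValueError,
        # and other digits change how the directory is sliced.
        length_width = int(leader[20:21])
        start_width = int(leader[21:22])
    else:
        # Forgiving reader: ignore the leader and assume "45".
        length_width, start_width = 4, 5

    entry_size = 3 + length_width + start_width
    # The directory sits between the leader and the field terminator
    # that immediately precedes the base address.
    directory = record[24:base_address - 1]

    entries = []
    for i in range(0, len(directory) - entry_size + 1, entry_size):
        chunk = directory[i:i + entry_size]
        tag = chunk[:3].decode("ascii")
        field_length = int(chunk[3:3 + length_width])
        start_offset = int(chunk[3 + length_width:])
        entries.append((tag, field_length, start_offset))
    return entries
```

With a leader ending in "4500" both modes give the same entries. With a mangled entry map, the strict mode raises an exception or mis-slices the directory, while the forgiving mode still reads the record.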
At least the author actually used the values rather than checking
On 4/6/2011 2:43 PM, William Denton wrote:
"Validity" does mean something definite ... but Postel's Law is a good
guideline, especially with the swamp of bad MARC, old MARC, alternate
MARC, that's out there. Valid MARC is valid MARC, but if---for the sake
of file and its magic---we can identify
On 6 April 2011, Jonathan Rochkind wrote:
I think we computer programmers are really better-served by reserving the
notion of "validity" for things specified by formal specifications -- as we
normally do, talking about any other data format. And the only formal
specifications I can find for
On 4/6/2011 2:02 PM, Kyle Banerjee wrote:
I'd go so far as to question the value of validating redundant data that
theoretically has meaning but which are never supposed to vary. The 4 and
the 5 simply repeat what is already known about the structure of the MARC
record. Choking on stuff like this
.. Maybe we have different understandings of "valid".
>
> If leader bytes 20-23 are not "4500", I suggest that is _by definition_ not
> a "valid" Marc21 file. It violates the Marc21 specification.
>
> Now, they may still be _usable_, by software that ignores these bytes
> anyway or works aroun
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fields, which in my opinion,
is more open to
ess.
--tr
From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of William Denton
[w...@pobox.com]
Sent: Wednesday, April 06, 2011 10:29 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC magic for file
On 6 April 2011, Reese, Terry wrote:
> Actually --
On 6 April 2011, Reese, Terry wrote:
Actually -- I'd disagree because that is a very narrow view of the
specification. When validating MARC, I'd take the approach to validate
structure (which allows you to then read any MARC format) -- then use a
separate process for validating content of fie
use the two are not exclusive.
--TR
> -Original Message-
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Wednesday, April 06, 2011 9:59 AM
> To: Code for Libraries
> Cc: Reese, Terry
> Subject: Re: [CODE4LIB] MARC magic for file
>
> I'm not sure w
Kevin
>
> --
> Library of Congress
> Network Development and MARC Standards Office
>
> From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero
> [s...@unc.edu]
> Sent: Sunday, Apri
***
-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Jonathan Rochkind
Sent: Wednesday, April 06, 2011 9:44 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC magic for file
Can't you have a legal "MARC"
to:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, April 06, 2011 9:44 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] MARC magic for file
>
> Can't you have a legal "MARC" file that does NOT have 4500 in those
> leader positions? It
] On Behalf Of Simon Spero
[s...@unc.edu]
Sent: Sunday, April 03, 2011 14:01
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC magic for file
I am pretty sure that the marc4j standard reader ignores them; the tolerant
reader definitely does. Otherwise JHU might have about two parseable record
t and MARC Standards Office
From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero
[s...@unc.edu]
Sent: Sunday, April 03, 2011 14:01
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC magic for file
I am pretty sure that the
I am pretty sure that the marc4j standard reader ignores them; the tolerant
reader definitely does. Otherwise JHU might have about two parseable records
based on the mangled leaders that J-Rock gets stuck with :-)
An analysis of the ~7M LC bib records from the scriblio.net data files (~
Dec 2006)
"I'm sure any decent MARC tool can deal with them, since decent MARC tools
are certainly going to be forgiving enough to deal with four characters that
apparently don't even really matter."
You say that, but I'm pretty sure Marc4J throws errors on MARC records where
these characters are incorrect
Ow
On 28 March 2011, Ford, Kevin wrote:
I couldn't get Simon's MARC 21 Magic file to work. Among other issues,
I received "line too long" errors. But, since I've been curious about
this for some time, I figured I'd take a whack at it myself. Try this:
This is very nice! Thanks. I tried it on
formal inclusion in the magic file.
Warmly,
Kevin
--
Library of Congress
Network Development and MARC Standards Office
From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero
[s...@unc.edu]
Sent: Thursday, March 24, 2011 12:28
To: C
Some of the problems in your first cut are:
1. Offsets for regex are given in terms of lines. MARC files don't have
newlines in them, unless you're Millennium, in which case they can be
inserted every 200,000 bytes to keep things interesting.
2. Byte matches match byte values, so "20 byte 4" i
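
A small Python illustration of the byte-value point in item 2, using a made-up sample leader: a numeric byte test for 4 at offset 20 compares against the byte value 4, whereas the character "4" in a MARC leader is stored as byte value 52 (0x34). The helper names below are hypothetical and only mimic the idea of a numeric test versus a string test.

```python
# Hypothetical helpers contrasting a numeric byte test with a string test.

sample_leader = b"00714cam a2200205 a 4500"  # made-up example leader


def byte_test(data: bytes, offset: int, value: int) -> bool:
    """True if the single byte at offset has the given numeric value."""
    return data[offset] == value


def string_test(data: bytes, offset: int, text: bytes) -> bool:
    """True if the bytes starting at offset spell out the given text."""
    return data[offset:offset + len(text)] == text


print(byte_test(sample_leader, 20, 4))          # False: the byte is 0x34, not 0x04
print(byte_test(sample_leader, 20, ord("4")))   # True: ord("4") is 52
print(string_test(sample_leader, 20, b"4500"))  # True
```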
Has anyone figured out the magic necessary for file to recognize MARC
files?
If you don't know it, file is a Unix command that tells you what kind of
file a file is. For example:
$ file 101015_001.mp3
101015_001.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer
III, v1, 192