Re: [CODE4LIB] MARC Magic for file

2012-05-24 Thread Ed Summers
On Wed, May 23, 2012 at 6:16 PM, Kyle Banerjee wrote: > I'm not sure whether to laugh or cry that it's a sign of progress that a 40 > year old utility designed to identify file types is now just beginning to > be able to recognize a format that's been around for almost 50 years... Laugh :-) //Ed

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Kyle Banerjee
On Wed, May 23, 2012 at 12:14 PM, Ford, Kevin wrote: > I finally had occasion today (read: remembered) to see if the *nix "file" > command would recognize a MARC record file. I haven't tested extensively, > but it did identify the file as MARC21 Bibliographic record. It also > correctly identif

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Simon Spero
The magic file format changed between versions; I think the OSX version was not compatible with more up-to-date versions (in the original thread, this caused me some confusion). Simon On Wed, May 23, 2012 at 4:34 PM, Ross Singer wrote: > On May 23, 2012, at 4:22 PM, Kevin Ford wrot

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Francis Kayiwa
On Wed, May 23, 2012 at 04:34:47PM -0400, Ross Singer wrote: > On May 23, 2012, at 4:22 PM, Kevin Ford wrote: > > > Don't know what to say. Crawling through the source for "file" at [1], the > > pattern matching code was in place as of Sept 2011. It could be present > > earlier than Sept 2011,

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Ross Singer
On May 23, 2012, at 4:22 PM, Kevin Ford wrote: > Don't know what to say. Crawling through the source for "file" at [1], the > pattern matching code was in place as of Sept 2011. It could be present > earlier than Sept 2011, but I stopped hunting for it. The earliest it would > have made its w

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Kevin Ford
tification. If requested, the mimetype - application/marc - should also be outputted. Rgds, Kevin -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ross Singer Sent: Wednesday, May 23, 2012 3:29 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC Ma

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Kevin Ford
Don't know what to say. Crawling through the source for "file" at [1], the pattern matching code was in place as of Sept 2011. It could be present earlier than Sept 2011, but I stopped hunting for it. The earliest it would have made its way into the magic db would have been April 2011. Perh

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread stuart yeates
On 24/05/12 07:14, Ford, Kevin wrote: I finally had occasion today (read: remembered) to see if the *nix "file" command would recognize a MARC record file. I haven't tested extensively, but it did identify the file as MARC21 Bibliographic record. It also correctly identified a MARC21 Authori

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Jonathan Rochkind
are not "4500" then "(non-conforming)" should be appended to the identification. If requested, the mimetype - application/marc - should also be outputted. Rgds, Kevin -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ross Singer S

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Ford, Kevin
ODE4LIB@LISTSERV.ND.EDU] On Behalf Of > Ross Singer > Sent: Wednesday, May 23, 2012 3:29 PM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] MARC Magic for file > > Wow, this is pretty cool. > > Kevin, do you have examples of the output? > > Does it work

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Francis Kayiwa
On Wed, May 23, 2012 at 03:28:56PM -0400, Ross Singer wrote: > Wow, this is pretty cool. > > Kevin, do you have examples of the output? > > Does it work for bulk files? > > I mean, I could just try this on my Ubuntu machine, but it's all the way > downstairs... My OS lists it as `data` $ cd $

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Ross Singer
Wow, this is pretty cool. Kevin, do you have examples of the output? Does it work for bulk files? I mean, I could just try this on my Ubuntu machine, but it's all the way downstairs... -Ross. On May 23, 2012, at 3:14 PM, Ford, Kevin wrote: > I finally had occasion today (read: remembered) to

[CODE4LIB] MARC Magic for file

2012-05-23 Thread Ford, Kevin
I finally had occasion today (read: remembered) to see if the *nix "file" command would recognize a MARC record file. I haven't tested extensively, but it did identify the file as MARC21 Bibliographic record. It also correctly identified a MARC21 Authority Record. I'm running the most recent

Re: [CODE4LIB] MARC magic for file

2011-04-08 Thread Sean Hannan
http://i.imgur.com/6WtA0.png (Sorry, it's Friday. Also, blame dchud for the idea.) -Sean On 4/6/11 4:53 PM, "Mike Taylor" wrote: > On 6 April 2011 19:53, Jonathan Rochkind wrote: >> On 4/6/2011 2:43 PM, William Denton wrote: >>> >>> "Validity" does mean something definite ... but Postel's L

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Mike Taylor
On 6 April 2011 19:53, Jonathan Rochkind wrote: > On 4/6/2011 2:43 PM, William Denton wrote: >> >> "Validity" does mean something definite ... but Postel's Law is a good >> guideline, especially with the swamp of bad MARC, old MARC, alternate >> MARC, that's out there.  Valid MARC is valid MARC, b

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Kyle Banerjee
> Well, the problem is when the original Marc4J author took the spec at its > word, and actually _acted upon_ the '4' and the '5', changing file semantics > if they were different, and throwing an exception if it was a non-digit. > At least the author actually used the values rather than checking

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
On 4/6/2011 2:43 PM, William Denton wrote: "Validity" does mean something definite ... but Postel's Law is a good guideline, especially with the swamp of bad MARC, old MARC, alternate MARC, that's out there. Valid MARC is valid MARC, but if---for the sake of file and its magic---we can identify

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread William Denton
On 6 April 2011, Jonathan Rochkind wrote: I think we computer programmers are really better-served by reserving the notion of "validity" for things specified by formal specifications -- as we normally do, talking about any other data format. And the only formal specifications I can find for

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
On 4/6/2011 2:02 PM, Kyle Banerjee wrote: I'd go so far as to question the value of validating redundant data that theoretically has meaning but which are never supposed to vary. The 4 and the 5 simply repeat what is already known about the structure of the MARC record. Choking on stuff like this

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Kyle Banerjee
.. Maybe we have different understandings of "valid". > > If leader bytes 20-23 are not "4500", I suggest that is _by definition_ not > a "valid" Marc21 file. It violates the Marc21 specification. > > Now, they may still be _usable_, by software that ignores these bytes > anyway or works aroun
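The "4500" under discussion is the entry map at positions 20-23 of the 24-byte MARC 21 leader. A minimal Python sketch of the distinction drawn in this message between a leader that conforms to the specification and one that is merely usable. The function names and the sample leaders are illustrative assumptions, not taken from any tool mentioned in the thread:

```python
LEADER_LENGTH = 24            # a MARC 21 leader is always 24 bytes
EXPECTED_ENTRY_MAP = b"4500"  # required value of leader positions 20-23


def is_conforming(leader: bytes) -> bool:
    """True if leader positions 20-23 hold the entry map "4500"."""
    if len(leader) < LEADER_LENGTH:
        return False
    return leader[20:24] == EXPECTED_ENTRY_MAP


def is_structurally_usable(leader: bytes) -> bool:
    """Looser check (an assumption about what "usable" might mean):
    the record length (positions 0-4) and the base address of data
    (positions 12-16) are both numeric, whatever positions 20-23 say."""
    if len(leader) < LEADER_LENGTH:
        return False
    return leader[0:5].isdigit() and leader[12:17].isdigit()


# Hypothetical leaders, invented for illustration.
good_leader = b"00714cam a2200205 a 4500"
odd_leader = b"00714cam a2200205 a 0000"

print(is_conforming(good_leader), is_structurally_usable(good_leader))
print(is_conforming(odd_leader), is_structurally_usable(odd_leader))
```

Keeping the two checks separate lets a reader accept the second record while still reporting it as non-conforming, which is the behaviour the thread argues over.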

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
Actually -- I'd disagree because that is a very narrow view of the specification. When validating MARC, I'd take the approach to validate structure (which allows you to then read any MARC format) -- then use a separate process for validating content of fields, which in my opinion, is more open to

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Reese, Terry
ess. --tr From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of William Denton [w...@pobox.com] Sent: Wednesday, April 06, 2011 10:29 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file On 6 April 2011, Reese, Terry wrote: > Actually --

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread William Denton
On 6 April 2011, Reese, Terry wrote: Actually -- I'd disagree because that is a very narrow view of the specification. When validating MARC, I'd take the approach to validate structure (which allows you to then read any MARC format) -- then use a separate process for validating content of fie

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Reese, Terry
use the two are not exclusive. --TR > -Original Message- > From: Jonathan Rochkind [mailto:rochk...@jhu.edu] > Sent: Wednesday, April 06, 2011 9:59 AM > To: Code for Libraries > Cc: Reese, Terry > Subject: Re: [CODE4LIB] MARC magic for file > > I'm not sure w

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Prettyman, Timothy
Kevin > > -- > Library of Congress > Network Development and MARC Standards Office > > > > > > > From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero > [s...@unc.edu] > Sent: Sunday, Apri

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
*** -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind Sent: Wednesday, April 06, 2011 9:44 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file Can't you have a legal "MARC"

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Reese, Terry
to:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of > Jonathan Rochkind > Sent: Wednesday, April 06, 2011 9:44 AM > To: CODE4LIB@LISTSERV.ND.EDU > Subject: Re: [CODE4LIB] MARC magic for file > > Can't you have a legal "MARC" file that does NOT have 4500 in those > leader positions? It

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
] On Behalf Of Simon Spero [s...@unc.edu] Sent: Sunday, April 03, 2011 14:01 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file I am pretty sure that the marc4j standard reader ignores them; the tolerant reader definitely does. Otherwise JHU might have about two parseable record

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Ford, Kevin
t and MARC Standards Office From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero [s...@unc.edu] Sent: Sunday, April 03, 2011 14:01 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file I am pretty sure that the

Re: [CODE4LIB] MARC magic for file

2011-04-03 Thread Simon Spero
I am pretty sure that the marc4j standard reader ignores them; the tolerant reader definitely does. Otherwise JHU might have about two parseable records based on the mangled leaders that J-Rock gets stuck with :-) An analysis of the ~7M LC bib records from the scriblio.net data files (~ Dec 2006)

Re: [CODE4LIB] MARC magic for file

2011-04-01 Thread Owen Stephens
"I'm sure any decent MARC tool can deal with them, since decent MARC tools are certainly going to be forgiving enough to deal with four characters that apparently don't even really matter." You say that, but I'm pretty sure Marc4J throws errors MARC records where these characters are incorrect Ow

Re: [CODE4LIB] MARC magic for file

2011-03-31 Thread William Denton
On 28 March 2011, Ford, Kevin wrote: I couldn't get Simon's MARC 21 Magic file to work. Among other issues, I received "line too long" errors. But, since I've been curious about this for sometime, I figured I'd take a whack at it myself. Try this: This is very nice! Thanks. I tried it on

Re: [CODE4LIB] MARC magic for file

2011-03-28 Thread Ford, Kevin
formal inclusion in the magic file. Warmly, Kevin -- Library of Congress Network Development and MARC Standards Office From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero [s...@unc.edu] Sent: Thursday, March 24, 2011 12:28 To: C

Re: [CODE4LIB] MARC magic for file

2011-03-24 Thread Simon Spero
Some of the problems in your first cut are: 1. Offsets for regex are given in terms of lines. MARC files don't have newlines in them, unless you're Millennium, in which case they can be inserted every 200,000 bytes to keep things interesting. 2. Byte matches match byte values, so "20 byte 4" i
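Point 2 turns on the difference between a byte's numeric value and the ASCII digit stored in the leader. A small Python illustration, using an invented sample leader and hypothetical helper names; it makes no claim about magic file syntax itself:

```python
# Hypothetical 24-byte MARC 21 leader, invented for illustration.
SAMPLE_LEADER = b"00714cam a2200205 a 4500"


def has_byte_value(data: bytes, offset: int, value: int) -> bool:
    """Compare the raw numeric value of the byte at `offset`."""
    return data[offset] == value


def has_ascii_char(data: bytes, offset: int, char: str) -> bool:
    """Compare the byte at `offset` with the ASCII encoding of `char`."""
    return data[offset:offset + 1] == char.encode("ascii")


# The byte at offset 20 is the character "4", whose numeric value is 52 (0x34).
print(SAMPLE_LEADER[20])                       # 52
print(has_byte_value(SAMPLE_LEADER, 20, 4))    # False
print(has_byte_value(SAMPLE_LEADER, 20, 0x34)) # True
print(has_ascii_char(SAMPLE_LEADER, 20, "4"))  # True
```

A test written against the numeric value 4 therefore never matches a leader whose position 20 holds the digit character "4".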

[CODE4LIB] MARC magic for file

2011-03-23 Thread William Denton
Has anyone figured out the magic necessary for file to recognize MARC files? If you don't know it, file is a Unix command that tells you what kind of file a file is. For example: $ file 101015_001.mp3 101015_001.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 192