Re: [CODE4LIB] MARC Magic for file

2012-05-24 Thread Ed Summers
On Wed, May 23, 2012 at 6:16 PM, Kyle Banerjee baner...@orbiscascade.org wrote: I'm not sure whether to laugh or cry that it's a sign of progress that a 40-year-old utility designed to identify file types is now just beginning to be able to recognize a format that's been around for almost 50

[CODE4LIB] MARC Magic for file

2012-05-23 Thread Ford, Kevin
I finally had occasion today (read: remembered) to see if the *nix file command would recognize a MARC record file. I haven't tested extensively, but it did identify the file as MARC21 Bibliographic record. It also correctly identified a MARC21 Authority Record. I'm running the most recent

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Ross Singer
Wow, this is pretty cool. Kevin, do you have examples of the output? Does it work for bulk files? I mean, I could just try this on my Ubuntu machine, but it's all the way downstairs... -Ross. On May 23, 2012, at 3:14 PM, Ford, Kevin wrote: I finally had occasion today (read: remembered) to

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Francis Kayiwa
On Wed, May 23, 2012 at 03:28:56PM -0400, Ross Singer wrote: Wow, this is pretty cool. Kevin, do you have examples of the output? Does it work for bulk files? I mean, I could just try this on my Ubuntu machine, but it's all the way downstairs... My OS lists it as `data` $ cd $ ls dev

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Ford, Kevin
To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC Magic for file Wow, this is pretty cool. Kevin, do you have examples of the output? Does it work for bulk files? I mean, I could just try this on my Ubuntu machine, but it's all the way downstairs... -Ross. On May 23, 2012, at 3

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Jonathan Rochkind
- should also be outputted. Rgds, Kevin -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ross Singer Sent: Wednesday, May 23, 2012 3:29 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC Magic for file Wow, this is pretty cool. Kevin, do

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread stuart yeates
On 24/05/12 07:14, Ford, Kevin wrote: I finally had occasion today (read: remembered) to see if the *nix file command would recognize a MARC record file. I haven't tested extensively, but it did identify the file as MARC21 Bibliographic record. It also correctly identified a MARC21

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Kevin Ford
Don't know what to say. Crawling through the source for file at [1], the pattern matching code was in place as of Sept 2011. It could be present earlier than Sept 2011, but I stopped hunting for it. The earliest it would have made its way into the magic db would have been April 2011.

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Kevin Ford
- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Ross Singer Sent: Wednesday, May 23, 2012 3:29 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC Magic for file Wow, this is pretty cool. Kevin, do you have examples of the output? Does it work for bulk files? I

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Ross Singer
On May 23, 2012, at 4:22 PM, Kevin Ford wrote: Don't know what to say. Crawling through the source for file at [1], the pattern matching code was in place as of Sept 2011. It could be present earlier than Sept 2011, but I stopped hunting for it. The earliest it would have made its way

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Francis Kayiwa
On Wed, May 23, 2012 at 04:34:47PM -0400, Ross Singer wrote: On May 23, 2012, at 4:22 PM, Kevin Ford wrote: Don't know what to say. Crawling through the source for file at [1], the pattern matching code was in place as of Sept 2011. It could be present earlier than Sept 2011, but I

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Simon Spero
The magic file format changed between versions; I think the OS X version was not compatible with more up-to-date versions (in the original thread, this caused me some confusion). Simon On Wed, May 23, 2012 at 4:34 PM, Ross Singer rossfsin...@gmail.com wrote: On May 23, 2012, at

Re: [CODE4LIB] MARC Magic for file

2012-05-23 Thread Kyle Banerjee
On Wed, May 23, 2012 at 12:14 PM, Ford, Kevin k...@loc.gov wrote: I finally had occasion today (read: remembered) to see if the *nix file command would recognize a MARC record file. I haven't tested extensively, but it did identify the file as MARC21 Bibliographic record. It also correctly

Re: [CODE4LIB] MARC magic for file

2011-04-08 Thread Sean Hannan
http://i.imgur.com/6WtA0.png (Sorry, it's Friday. Also, blame dchud for the idea.) -Sean On 4/6/11 4:53 PM, Mike Taylor m...@indexdata.com wrote: On 6 April 2011 19:53, Jonathan Rochkind rochk...@jhu.edu wrote: On 4/6/2011 2:43 PM, William Denton wrote: Validity does mean something

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Reese, Terry
] On Behalf Of Jonathan Rochkind Sent: Wednesday, April 06, 2011 9:44 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file Can't you have a legal MARC file that does NOT have 4500 in those leader positions? It's just not legal Marc21, right? Other marc formats may

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
] On Behalf Of Jonathan Rochkind Sent: Wednesday, April 06, 2011 9:44 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file Can't you have a legal MARC file that does NOT have 4500 in those leader positions? It's just not legal Marc21, right? Other marc formats may specify

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Prettyman, Timothy
Network Development and MARC Standards Office From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero [s...@unc.edu] Sent: Sunday, April 03, 2011 14:01 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Reese, Terry
are not exclusive. --TR -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, April 06, 2011 9:59 AM To: Code for Libraries Cc: Reese, Terry Subject: Re: [CODE4LIB] MARC magic for file I'm not sure what you mean Terry. Maybe we have different

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread William Denton
On 6 April 2011, Reese, Terry wrote: Actually -- I'd disagree because that is a very narrow view of the specification. When validating MARC, I'd take the approach to validate structure (which allows you to then read any MARC format) -- then use a separate process for validating content of
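
The "validate structure first" approach described in this message can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name `check_structure` and the problem messages are made up for the example, and it assumes one record is already in memory as raw bytes. It relies only on the general record layout (24-byte leader, record length in leader 00-04, base address of data in leader 12-16, field terminator 0x1E, record terminator 0x1D) and makes no judgment about field content.

```python
from typing import List

RECORD_TERMINATOR = 0x1D  # last byte of every record
FIELD_TERMINATOR = 0x1E   # closes the directory and each field


def check_structure(record: bytes) -> List[str]:
    """Return a list of structural problems; an empty list means none found.

    Hypothetical helper: checks layout only, never field content.
    """
    if len(record) < 24:
        return ["record shorter than the 24-byte leader"]

    problems = []

    # Leader 00-04: total record length as ASCII digits.
    length_field = record[0:5]
    if not length_field.isdigit():
        problems.append("leader 00-04 (record length) is not numeric")
    elif int(length_field) != len(record):
        problems.append("leader record length does not match actual length")

    # Leader 12-16: base address of data; the directory ends just before it.
    base_field = record[12:17]
    if not base_field.isdigit():
        problems.append("leader 12-16 (base address) is not numeric")
    else:
        base = int(base_field)
        if not 25 <= base < len(record):
            problems.append("base address of data is out of range")
        elif record[base - 1] != FIELD_TERMINATOR:
            problems.append("directory is not closed by a field terminator")

    if record[-1] != RECORD_TERMINATOR:
        problems.append("record does not end with a record terminator")

    return problems
```

Content checks, such as whether particular fields are present or well formed for a given MARC flavour, would then live in a separate pass that runs only on records for which this returns an empty list.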

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Reese, Terry
From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of William Denton [w...@pobox.com] Sent: Wednesday, April 06, 2011 10:29 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file On 6 April 2011, Reese, Terry wrote: Actually -- I'd disagree

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Kyle Banerjee
.. Maybe we have different understandings of valid. If leader bytes 20-23 are not 4500, I suggest that is _by definition_ not a valid Marc21 file. It violates the Marc21 specification. Now, they may still be _usable_, by software that ignores these bytes anyway or works around them. We
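
For concreteness, the leader check being argued about here amounts to something like the sketch below. The function name `has_marc21_entry_map` is hypothetical, and the sketch assumes the record is available as raw bytes; the only fact taken from the thread is that MARC21 expects leader bytes 20-23 to be 4500.

```python
def has_marc21_entry_map(record: bytes) -> bool:
    """Return True if leader positions 20-23 hold the ASCII digits 4500."""
    leader = record[:24]  # the leader is the first 24 bytes of the record
    return len(leader) == 24 and leader[20:24] == b"4500"
```

Whether a False result should mean "reject the record" or merely "not strictly MARC21, but possibly still usable" is exactly the point under dispute in this thread; the sketch takes no position on that.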

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
Actually -- I'd disagree because that is a very narrow view of the specification. When validating MARC, I'd take the approach to validate structure (which allows you to then read any MARC format) -- then use a separate process for validating content of fields, which in my opinion, is more open

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
On 4/6/2011 2:02 PM, Kyle Banerjee wrote: I'd go so far as to question the value of validating redundant data that theoretically has meaning but which are never supposed to vary. The 4 and the 5 simply repeat what is already known about the structure of the MARC record. Choking on stuff like

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread William Denton
On 6 April 2011, Jonathan Rochkind wrote: I think we computer programmers are really better-served by reserving the notion of validity for things specified by formal specifications -- as we normally do, talking about any other data format. And the only formal specifications I can find for

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Jonathan Rochkind
On 4/6/2011 2:43 PM, William Denton wrote: Validity does mean something definite ... but Postel's Law is a good guideline, especially with the swamp of bad MARC, old MARC, alternate MARC, that's out there. Valid MARC is valid MARC, but if---for the sake of file and its magic---we can identify

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Kyle Banerjee
Well, the problem is when the original Marc4J author took the spec at its word, and actually _acted upon_ the '4' and the '5', changing file semantics if they were different, and throwing an exception if it was a non-digit. At least the author actually used the values rather than checking to

Re: [CODE4LIB] MARC magic for file

2011-04-06 Thread Mike Taylor
On 6 April 2011 19:53, Jonathan Rochkind rochk...@jhu.edu wrote: On 4/6/2011 2:43 PM, William Denton wrote: Validity does mean something definite ... but Postel's Law is a good guideline, especially with the swamp of bad MARC, old MARC, alternate MARC, that's out there. Valid MARC is valid

Re: [CODE4LIB] MARC magic for file

2011-04-03 Thread Simon Spero
I am pretty sure that the marc4j standard reader ignores them; the tolerant reader definitely does. Otherwise JHU might have about two parseable records based on the mangled leaders that J-Rock gets stuck with :-) An analysis of the ~7M LC bib records from the scriblio.net data files (~ Dec

Re: [CODE4LIB] MARC magic for file

2011-04-01 Thread Owen Stephens
I'm sure any decent MARC tool can deal with them, since decent MARC tools are certainly going to be forgiving enough to deal with four characters that apparently don't even really matter. You say that, but I'm pretty sure Marc4J throws errors on MARC records where these characters are incorrect

Re: [CODE4LIB] MARC magic for file

2011-03-31 Thread William Denton
On 28 March 2011, Ford, Kevin wrote: I couldn't get Simon's MARC 21 Magic file to work. Among other issues, I received "line too long" errors. But, since I've been curious about this for some time, I figured I'd take a whack at it myself. Try this: This is very nice! Thanks. I tried it on

Re: [CODE4LIB] MARC magic for file

2011-03-28 Thread Ford, Kevin
Development and MARC Standards Office From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Simon Spero [s...@unc.edu] Sent: Thursday, March 24, 2011 12:28 To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] MARC magic for file Some of the problems

Re: [CODE4LIB] MARC magic for file

2011-03-24 Thread Simon Spero
Some of the problems in your first cut are: 1. Offsets for regex are given in terms of lines. MARC files don't have newlines in them, unless you're Millennium, in which case they can be inserted every 200,000 bytes to keep things interesting. 2. Byte matches match byte values, so 20 byte 4 is

[CODE4LIB] MARC magic for file

2011-03-23 Thread William Denton
Has anyone figured out the magic necessary for file to recognize MARC files? If you don't know it, file is a Unix command that tells you what kind of file a file is. For example: $ file 101015_001.mp3 101015_001.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1,