Thanks, Bill. What you say about assumptions is a good part of what is
motivating me to try to instigate a discussion. As you know, both FRBR
and RDA were developed by the cataloging community with no input from
technologists. There are sweeping statements about FRBR being more
efficient than
Anybody have data for the average length of specific MARC fields in some
reasonably representative database? I mainly need 100, 245, 6xx.
Thanks,
kc
--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet
That sounds like a request for Roy to fire up the ole OCLC Hadoop.
-Sean
I'm running it against the HathiTrust catalog right now. It'll just take a
while, given that I don't have access to Roy's Hadoop cluster :-)
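Every per-tag figure in this thread comes down to the same thing: keep a running total and a count for each tag. That is true whether it comes from Roy's quarterly Hadoop job or from a single-machine pass over HathiTrust.

What follows is a minimal Python sketch. It assumes field lengths have already been extracted as (tag, length) pairs. The function names are hypothetical, and this is not how the OCLC job is actually written. The point is that you keep sums and counts rather than averages. That way, partial results from separate shards (or machines) combine exactly.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Partial result for one shard of records: tag -> (total length, field count).
Partial = Dict[str, Tuple[int, int]]


def map_shard(fields: Iterable[Tuple[str, int]]) -> Partial:
    """Sum lengths and count occurrences per tag for one shard of input."""
    acc: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for tag, length in fields:
        acc[tag][0] += length
        acc[tag][1] += 1
    return {tag: (total, count) for tag, (total, count) in acc.items()}


def merge(partials: Iterable[Partial]) -> Partial:
    """Combine shard results; sums and counts merge by plain addition."""
    acc: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for part in partials:
        for tag, (total, count) in part.items():
            acc[tag][0] += total
            acc[tag][1] += count
    return {tag: (total, count) for tag, (total, count) in acc.items()}


def averages(partial: Partial) -> Dict[str, float]:
    """Turn merged (total, count) pairs into per-tag average lengths."""
    return {tag: total / count for tag, (total, count) in partial.items() if count}


shard_a = map_shard([("100", 30), ("650", 10)])
shard_b = map_shard([("100", 32), ("650", 20), ("650", 30)])
print(averages(merge([shard_a, shard_b])))  # {'100': 31.0, '650': 20.0}
```

Averaging the two shard averages for 650 instead would give (10 + 25) / 2 = 17.5. That is wrong, because shard_b holds twice as many 650s as shard_a.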
I don't even have to fire it up. That's a statistic that we generate
quarterly (albeit via Hadoop). Here you go:
100 - 30.3
245 - 103.1
600 - 41
610 - 48.8
611 - 61.4
630 - 40.8
648 - 23.8
650 - 35.1
651 - 39.6
653 - 33.3
654 - 38.1
655 - 22.5
656 - 30.6
657 - 27.4
658 - 30.7
662 - 41.7
Roy
This squares with what I'm seeing. Data for all holdings of the Orbis
Cascade Alliance is:
100: 30.1
245: 114.1
6XX: 36.1
My values include indicators (2 characters) as well as any delimiters but
not the tag number itself. I breaking up 6XX up as Roy has as 6XX's are far
from created equal and
Argh. Must learn to write at a third-grade level.
I wanted to say I like breaking up 6XX as Roy has done because 6XX fields
vary in purpose and tag frequency varies considerably.
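What a figure includes matters when comparing numbers across sites. Kyle counts indicators and delimiters but not the tag.

Below is a sketch of one possible counting convention, with these assumptions:
- Each subfield costs one delimiter, one code character and its data, as in ISO 2709 MARC.
- The two indicators are counted.
- The tag and the field terminator are not counted.

The tuple representation of a field is hypothetical, not pymarc's API. Python's len() counts characters, so an optional UTF-8 byte count is included for data with diacritics.

```python
from typing import List, Tuple

# Hypothetical in-memory form of a MARC data field (tag kept separately):
# the two indicator characters plus ordered (subfield code, value) pairs.
Field = Tuple[str, List[Tuple[str, str]]]


def _size(text: str, in_bytes: bool) -> int:
    # len() counts characters; UTF-8 byte counts differ when diacritics occur.
    return len(text.encode("utf-8")) if in_bytes else len(text)


def field_length(field: Field, in_bytes: bool = False) -> int:
    """Length of a data field including indicators and delimiters, not the tag.

    Assumes each subfield costs one delimiter (0x1F) + one code character
    + its data; the field terminator is not counted.
    """
    indicators, subfields = field
    length = _size(indicators, in_bytes)
    for code, value in subfields:
        length += 2 + _size(value, in_bytes)  # delimiter + code + data
    return length


main_entry = ("1 ", [("a", "Smith, John,"), ("d", "1950-")])
print(field_length(main_entry))  # 2 + (2 + 12) + (2 + 5) = 23
```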
Thanks, Roy (and others!)
It looks like the 245 is including the $c - dang! I should have been
more specific. I'm mainly interested in the title, which is $a $b -- I'm
looking at the gains and losses of bytes should one implement FRBR. As a
hedge, could I ask what you've got for the 240? that
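For the bytes question, here is a back-of-envelope model. It compares a title-like element repeated in every record with the same element stored once per Work, plus the Manifestation-to-Expression and Expression-to-Work links that WEM adds.

This is a hedged sketch under assumptions that do not come from the thread:
- The link size and record counts are made up.
- Characters are treated as bytes, which is roughly true for mostly ASCII data.
- It ignores that FRBR still keeps a transcribed title at the Manifestation level.

It only shows how any savings hinge on how many manifestations share a Work. The function name is hypothetical.

```python
from typing import Dict


def title_bytes(manifestations: int, expressions: int, works: int,
                title_len: float, link_len: int) -> Dict[str, float]:
    """Rough byte count for one title element, flat records vs. WEM.

    flat: the element is repeated in every manifestation-level record.
    wem:  the element is stored once per Work, but every Manifestation
          links to an Expression and every Expression links to a Work.
    link_len is a hypothetical identifier size, not a figure from the thread.
    """
    flat = manifestations * title_len
    wem = works * title_len + (manifestations + expressions) * link_len
    return {"flat": flat, "wem": wem, "saved": flat - wem}


# Illustrative inputs only: 70.1 is the stripped-245 average reported for
# Orbis Cascade; the record counts and the 12-byte link are made up.
est = title_bytes(manifestations=1000, expressions=500, works=400,
                  title_len=70.1, link_len=12)
print({k: round(v, 1) for k, v in est.items()})
# {'flat': 70100.0, 'wem': 46040.0, 'saved': 24060.0}
```

With one manifestation per expression per work, the same function returns a negative "saved" value: the links cost more than the deduplication gains.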
The 245, not including $c, indicators, delimiters, |h (which occurs before
|b), |n, or |p, and with the trailing slash preceding |c stripped, averages
70.1 for about 9 million records in Orbis Cascade collections.
kyle
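Kyle's stripped-245 rule and the narrower "$a and $b only" count can both be expressed as a subfield filter plus a little ISBD cleanup.

This is a minimal sketch with these assumptions:
- Kept subfield values are joined with a single space.
- Only the trailing " /" before $c is removed.
- Other ISBD punctuation, such as " :", is still counted.

The function and rule names are hypothetical.

```python
import re
from typing import Callable, List, Tuple

Subfields = List[Tuple[str, str]]


def kyle_rule(code: str) -> bool:
    # Drop $c, $h, $n, $p; indicators and delimiters are never counted here.
    return code not in {"c", "h", "n", "p"}


def title_length(subfields: Subfields,
                 include: Callable[[str], bool] = kyle_rule) -> int:
    """Character length of the kept 245 subfields.

    Assumption: kept values are joined with a single space, and the ISBD
    " /" that ends the text just before $c is stripped.
    """
    kept = [value.strip() for code, value in subfields if include(code)]
    text = " ".join(kept)
    text = re.sub(r"\s*/$", "", text)  # trailing slash preceding $c
    return len(text)


title = [("a", "Moby Dick :"), ("b", "or, the whale /"), ("c", "Herman Melville.")]
print(title_length(title))  # "Moby Dick : or, the whale" -> 25
print(title_length(title, include=lambda code: code in {"a", "b"}))  # also 25 here
```

The two rules only diverge when a 245 carries subfields other than $a, $b, $c, $h, $n and $p (for example $f). A colon that sits at the end of an excluded $h also drops out of the count.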
[li...@kcoyle.net]
Sent: Wednesday, October 16, 2013 7:06 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] MARC field lengths
BTW, I don't think 240 is a good substitute, as the content is very
different from that of the regular title. That's where you'll find music, laws,
selections, and translations, and it's totally littered with subfields. The 70.1
figure from the stripped 245 is probably closer to the mark
IMO, what you stand
For the HathiTrust catalog's 6,046,746 bibs and looking at only the lengths
of the subfields $a and $b in 245s, I get an average length of 62.0
Yes, that's my take as well, but I think it's worth quantifying if
possible. There is the usual trade-off between time and space -- and I'd
be interested in hearing whether anyone here thinks that there is any
concern about traversing the WEM structure for each search and display.
Does it
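To make the traversal question concrete, here is a toy sketch. It counts the record fetches needed to assemble one display from separate Work, Expression and Manifestation records, where a flat record needs one fetch.

The store class, its field names and the sample records are all hypothetical, made up for illustration. Fetch counts stand in only loosely for real query cost.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class WEMStore:
    """Hypothetical Work/Expression/Manifestation store backed by plain dicts."""
    works: Dict[str, dict]
    expressions: Dict[str, dict]
    manifestations: Dict[str, dict]
    fetches: int = 0  # record fetches, a rough stand-in for "time"

    def _get(self, table: Dict[str, dict], key: str) -> dict:
        self.fetches += 1
        return table[key]

    def display(self, manifestation_id: str) -> dict:
        """Assemble one flat display record by walking M -> E -> W."""
        m = self._get(self.manifestations, manifestation_id)
        e = self._get(self.expressions, m["expression"])
        w = self._get(self.works, e["work"])
        return {"title": w["title"], "language": e["language"], "date": m["date"]}


# Toy records, made up for illustration.
store = WEMStore(
    works={"w1": {"title": "Example title"}},
    expressions={"e1": {"work": "w1", "language": "eng"}},
    manifestations={"m1": {"expression": "e1", "date": "2001"},
                    "m2": {"expression": "e1", "date": "2009"}},
)
for mid in ("m1", "m2"):
    print(store.display(mid))
print(store.fetches)  # 6, versus 2 if each display were a single flat record
```

Storing a precomputed, denormalized display record avoids paying these hops on every request. That buys back time at the cost of the space FRBR was meant to save, which is the same time/space trade-off raised above.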
Depends on how many requests the service has to accommodate. Up to a point,
it's no big deal. After a certain point, servicing lots of calls gets
expensive and bang for the buck is brought into question.
My bigger concern would be getting data encoded/structured consistently.
Even though FRBR has
On 10/16/13 4:22 PM, Kyle Banerjee wrote:
In some ways, FRBR strikes me as the catalogers' answer to the miserable
seven-layer OSI model, which often confuses rather than clarifies, largely
because it doesn't reflect reality very well.
Agreed. I am having trouble seeing FRBR as being