Re: [CODE4LIB] [WC-DEVNET-L] WorldCat Terminologies

2010-03-18 Thread Ya'aqov Ziso
Karen, 
At the Wednesday breakout session on API queries at CODE4LIB in Asheville
http://wiki.code4lib.org/index.php/2010_Breakout_Sessions#Wednesday
we questioned the level of maintenance OCLC Research provides for NAF and LCSH
(in Identities and Terminologies). I also added a question (below) about the
brochure you distributed. What is the status of these questions? Are we to deal
with dirtier data (compared to NAF/LCSH in CONNEXION) for now?
Note: without a WC-DEVNET tracking system, some questions get lost, by
chance or by intent.
Ya'aqov 

On 3/3/10 2:30 PM, "Ya'aqov Ziso"  wrote:

> Karen Coombs,  Hi,
> 
> "Terminologies ... all those terminologies databases that you used to have to
> buy, load, and maintain locally -- now available remotely for free ..." (from
> the blurb OCLC distributed at CODE4LIB in Asheville, February 21-25, 2010)
> 
> Could you please elaborate: how can Terminologies Services substitute for what
> libraries currently maintain and pay for, given the other statement on that
> blurb's page, "WorldCat Terminologies is still an experimental research service
> with no service assurances"? Kind thanks,
> 
> Ya'aqov 
> ---
> Posted on: WorldCat Developer Network discussion list
> To post:  email to "wc-devne...@oclc.org"
> To subscribe, go to https://www3.oclc.org/app/listserv/
> To unsubscribe, change options, change to digest mode, or view archive, go to:
> http://listserv.oclc.org/scripts/wa.exe?A0=WC-DEVNET-L
> list owners:  Roy Tennant, Don Hamparian


Re: [CODE4LIB] Variations/FRBR project releases FRBR XML Schemas

2010-03-18 Thread Joe Hourcle

On Thu, 18 Mar 2010, Jonathan Rochkind wrote:


Joe Hourcle wrote:


The group's two proposals were to model aggregates as works, or as 
manifestations, so RDA seems to be on its own modeling them as 
expressions:


See, this is what I don't understand. "As works, or as manifestations"??  In 
the FRBR model, every single manifestation belongs to _some_ Work, does it 
not?  So I don't understand how those can be alternatives. Or was the 
proposal to change this? So some manifestations exist "free floating," 
belonging to no work at all? (By "belonging to," in FRBR terms of art, I mean 
that in the FRBR model, every manifestation is "the embodiment of" SOME 
expression, which is "the realization of" SOME Work. Whether that expression 
or work is yet described or not, they're there in the model.  Was the 
proposal really to change this, so some manifestations are by definition 
"the embodiment of" no expression at all, not even an expression that has yet 
to have an identifier assigned to it? That seems horribly mistaken to me.)


There's a many-to-many relationship between Expressions and 
Manifestations in FRBR, so a single Manifestation can encompass multiple 
Expressions (and therefore, multiple Works).


In the Aggregates-as-Manifestations model, something like the 'Complete 
Works of ...' would exist as a new manifestation, but *not* as a new work. 
(and those individual works might never exist as individual 
manifestations)


It's of course much simpler to express some items (such as the 
Canterbury Tales) as a single work (Aggregates-as-Works), and then just 
make expressions of them, and the corresponding dozens of possible 
manifestations.  I guess it'd be the FRBR equivalent of data 
normalization.  And aggregating at the work level makes it easier to 
reconcile the cases where different catalogers can't agree on whether it's a 
single object or multiple objects.
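
To make the contrast concrete, here is a minimal sketch of the two proposals (Python, with invented class and attribute names; this is not from the working group's report, just one way to picture it):

# Minimal sketch of the two aggregate proposals. Class and attribute
# names are invented for illustration; this is not official FRBR.

class Work:
    def __init__(self, title):
        self.title = title

class Expression:
    def __init__(self, work):
        self.work = work  # "the realization of" some Work

class Manifestation:
    def __init__(self, title, expressions):
        self.title = title
        self.expressions = expressions  # many-to-many: may embody several

# Aggregates-as-Manifestations: the collection is only a new
# Manifestation embodying Expressions of the existing Works.
knights = Expression(Work("The Knight's Tale"))
millers = Expression(Work("The Miller's Tale"))
agg_m = Manifestation("The Canterbury Tales", [knights, millers])

# Aggregates-as-Works: the collection is itself a single Work with one
# Expression; the parts may never be modelled individually.
tales = Expression(Work("The Canterbury Tales"))
agg_w = Manifestation("The Canterbury Tales", [tales])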


I'm torn -- I think both are valid ways of describing the relationships, 
and different domains are going to try to go the route that makes the most 
sense for them (which is likely whichever one costs the least to implement 
while giving them the functionality they want).


-Joe


Re: [CODE4LIB] Variations/FRBR project releases FRBR XML Schemas

2010-03-18 Thread Jonathan Rochkind

Joe Hourcle wrote:


The group's two proposals were to model aggregates as works, or as 
manifestations, so RDA seems to be on its own modeling them as 
expressions:
  


See, this is what I don't understand. "As works, or as manifestations"??  
In the FRBR model, every single manifestation belongs to _some_ Work, 
does it not?  So I don't understand how those can be alternatives. Or 
was the proposal to change this? So some manifestations exist "free 
floating," belonging to no work at all? (By "belonging to," in FRBR terms 
of art, I mean that in the FRBR model, every manifestation is "the embodiment 
of" SOME expression, which is "the realization of" SOME Work. Whether 
that expression or work is yet described or not, they're there in the 
model.  Was the proposal really to change this, so some manifestations 
are by definition "the embodiment of" no expression at all, not even an 
expression that has yet to have an identifier assigned to it? That seems 
horribly mistaken to me.)


I guess I need to find time to read the report.

Jonathan


http://www.ifla.org/en/events/frbr-working-group-on-aggregates

I don't know what happened at the August 2009 meeting, though.  William 
Denton had a breakdown of the August 2008 meeting, which explained 
some of the issues that they were considering:


http://www.frbr.org/2008/08/18/working-group-on-aggregates


-Joe

  


Re: [CODE4LIB] Variations/FRBR project releases FRBR XML Schemas

2010-03-18 Thread Joe Hourcle

On Thu, 18 Mar 2010, Jonathan Rochkind wrote:


Karen Coyle wrote:


naturally favors the package over the contents. So we'll have some  works 
that are what users think of as works, and other works that  represent the 
publisher's package -- which sometimes will be something  that makes sense 
to the user, but at other times, as in many music  CDs, is bordering on the 
arbitrary. If we present these all as works  to the user, confusion will 
ensue.


So it's up to our systems to NOT present things that way, right?  If a 
particular Work is just an aggregate which is not that meaningful to the 
user, it shouldn't be presented (at least in initial result sets); the 
meaningful expressions/manifestations should be presented, right?  I'm not 
entirely clear on your example demonstrating that, but I believe you that it 
exists.


I would personally assume so -- you don't want someone searching to see if 
you have a copy of 'Hamlet' when all you have is 'The Collected Works of 
William Shakespeare', and so your system reports that you don't.


Of course, what the user asks for affects what we respond 
with -- even if we have 27 copies of 'Hamlet', we wouldn't respond 
with 27 records in response to their request.  It's entirely possible 
(and probable) that systems track objects at a granularity other than 
what's presented back to the user.


If someone's searching for a specific song, do we expect them to know the 
names of every album it's been on?  Yes, our local catalog might only 
track the albums, but if there's some sort of indication that they're 
aggregations, we know that we might need to expand them to be able to 
answer the question.
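
As a rough sketch of that expansion step (Python; the record shape and field names are invented for illustration, not from any actual catalog):

# Hypothetical sketch: expand aggregate records when answering a search,
# so a query for 'Hamlet' can match 'The Collected Works of William
# Shakespeare'. Record shape and field names are invented.

def matches(query, record):
    # match against the record's own title and, if the record is flagged
    # as an aggregation, against the titles it contains
    titles = [record["title"]]
    if record.get("is_aggregation"):
        titles.extend(record.get("contains", []))
    return any(query.lower() in t.lower() for t in titles)

catalog = [
    {"title": "The Collected Works of William Shakespeare",
     "is_aggregation": True,
     "contains": ["Hamlet", "Macbeth", "King Lear"]},
]

print([r["title"] for r in catalog if matches("hamlet", r)])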



The way I see it, our architectural job is _first_ to create a data model 
that allows all the necessary things to be expressed, THEN create systems 
that use those necessary expressed things to create reasonable displays.


I'm still thinking my interpretation (which is not JUST mine, I don't think I 
even invented it) of aggregate modelling is the only sane one I've seen that 
allows us to model what in many use cases we'd want to model, without 
forcing us to model what in many use cases cost-benefit would not justify 
modelling.


It's a *reference* *model* ... it is *not* an implementation.  Everyone's
allowed to model anything they want.


In the RDA relationships (which I've summarized here 
http://kcoyle.net/rda/group1relsby.html) there seem to be two kinds: 
intellectual relationships, and bibliographic relationships. "Is  adapted 
from" is an intellectual relationship; "Contains" is a  bibliographic 
relationship. They're all mixed together as if they are  the same thing.


I think you may very well be right that there needs to be more clarification in 
the model here. I haven't thought about it enough.


There definitely needs to be more clarification in the model as to how to 
handle aggregates. At one point there was a working group on that, I'm not 
sure what happened to it. Of course, if the working group came up with 
something OTHER than my preferred interpretation, I'd be very unhappy. :)


The group's two proposals were to model aggregates as works, or as 
manifestations, so RDA seems to be on its own modeling them as 
expressions:


http://www.ifla.org/en/events/frbr-working-group-on-aggregates

I don't know what happened at the August 2009 meeting, though.  William 
Denton had a breakdown of the August 2008 meeting, which explained 
some of the issues that they were considering:


http://www.frbr.org/2008/08/18/working-group-on-aggregates


-Joe


Re: [CODE4LIB] Variations/FRBR project releases FRBR XML Schemas

2010-03-18 Thread Jonathan Rochkind

Karen Coyle wrote:


naturally favors the package over the contents. So we'll have some  
works that are what users think of as works, and other works that  
represent the publisher's package -- which sometimes will be something  
that makes sense to the user, but at other times, as in many music  
CDs, is bordering on the arbitrary. If we present these all as works  
to the user, confusion will ensue.
  
So it's up to our systems to NOT present things that way, right?  If a 
particular Work is just an aggregate which is not that meaningful to the 
user, it shouldn't be presented (at least in initial result sets); the 
meaningful expressions/manifestations should be presented, right?  I'm 
not entirely clear on your example demonstrating that, but I believe you 
that it exists.


The way I see it, our architectural job is _first_ to create a data 
model that allows all the necessary things to be expressed, THEN create 
systems that use those necessary expressed things to create reasonable 
displays.


I'm still thinking my interpretation (which is not JUST mine, I don't 
think I even invented it) of aggregate modelling is the only sane one 
I've seen that allows us to model what in many use cases we'd want 
to model, without forcing us to model what in many use cases 
cost-benefit would not justify modelling.



In the RDA relationships (which I've summarized here  
http://kcoyle.net/rda/group1relsby.html) there seem to be two kinds:  
intellectual relationships, and bibliographic relationships. "Is  
adapted from" is an intellectual relationship; "Contains" is a  
bibliographic relationship. They're all mixed together as if they are  
the same thing.


I think you may very well be right that there needs to be more clarification 
in the model here. I haven't thought about it enough.


There definitely needs to be more clarification in the model as to how 
to handle aggregates. At one point there was a working group on that, 
I'm not sure what happened to it. Of course, if the working group came 
up with something OTHER than my preferred interpretation, I'd be very 
unhappy. :)


Jonathan


Re: [CODE4LIB] Q: XML2JSON converter [MARC-JSON]

2010-03-18 Thread Jonathan Rochkind
Oh, I wasn't actually suggesting limiting to UTF-8 was the right way to 
go, I was asking your opinion!  It's not at all clear to me, but if your 
opinion is that UTF-8 is indeed the right way to go, that's comforting. :)


Bandwidth _does_ matter, I think; it's primarily intended as a 
transmission format, and the reason _I_ am interested in it as a 
transmission format over MarcXML is in large part precisely that it 
will be so much smaller a package. I'm running into various performance 
problems caused by the very large package size of MarcXML. (Disk space 
might be cheap, but bandwidth, over the network or to the file system, 
is not necessarily, for me anyway.)


But I'm not sure I'm concerned about UTF-8 bloating the size of the response; I 
think it will still be manageable and worth it to avoid confusion. I 
pretty much do _everything_ in UTF-8 myself these days, because it's 
just not worth the headache to me to do anything else. But I have MUCH 
less experience dealing with international character sets than you, 
which is why I was curious about your opinion.  There's no reason the 
marc-hash-in-json proto-spec couldn't allow any valid JSON character 
encoding, if you/we/someone thinks it's necessary/more convenient.
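
For concreteness, a minimal sketch of the size trade-off being discussed (Python; the record is an invented example, not from any spec):

# Sketch: a character outside the Basic Multilingual Plane (U+1D11E,
# MUSICAL SYMBOL G CLEF) serialized as raw UTF-8 versus as an escaped
# surrogate pair. The record is an invented example.
import json

record = {"title": "G clef: \U0001D11E"}

raw = json.dumps(record, ensure_ascii=False).encode("utf-8")
escaped = json.dumps(record, ensure_ascii=True).encode("utf-8")

print(len(raw), len(escaped))  # the clef costs 4 bytes raw, 12 escaped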


Jonathan

Dan Scott wrote:

I hate Groupwise for forcing me to top-post.

Yes, you are right about everything. Limiting MARC-HASH to just UTF8, rather 
than supporting the full range of encodings allowed by JSON, probably makes it 
easier to generate and parse; it will bloat the size of the format for 
characters outside of the Basic Multilingual Plane but probably nobody cares, 
bandwidth is cheap, right? And this is primarily meant as a transmission format.

I missed the part in the blog entry about the newline-delimited JSON because I was 
specifically looking for a mention of "collections". newline-delimited JSON 
would work, yes, and probably be easier / faster / less memory-intensive to parse.

Dan

  

Jonathan Rochkind  03/18/10 10:41 AM >>>

So do you think the marc-hash-to-json "proto-spec" should suggest that 
the encoding HAS to be UTF-8, or should it leave it open to anything 
that's legal JSON?   (Is there a problem I don't know about with 
expressing "characters outside of the Basic Multilingual Plane" in 
UTF-8?  Any unicode char can be encoded in any of the unicode encodings, 
right?). 

If "collections" means what I think, Bill's blog proto-spec says they 
should be serialized as JSON-seperated-by-newlines, right?  That is, 
JSON for each record, seperated by newlines. Rather than the alternative 
approach you hypothesize there; there are various reasons to prefer 
json-seperated-by-newlines, which is an actual convention used in the 
wild, not something made up just for here.


Jonathan

Dan Scott wrote:
  

Hey Bill:

Do you have unit tests for MARC-HASH / JSON anywhere? If you do, that would 
make it easier for me to create a compliant PHP File_MARC_JSON variant, which 
I'll be happy-ish to create.

The only concerns I have with your write-up are:
  * JSON itself allows UTF8, UTF16, and UTF32 encoding - and we've seen in 
Evergreen some cases where characters outside of the Basic Multilingual Plane 
are required. We eventually wound up resorting to surrogate pairs, in that 
case; so maybe this isn't a real issue.
  * You've mentioned that you would like to see better support for collections 
in File_MARC / File_MARCXML; but I don't see any mention of how collections 
would work in MARC-HASH / JSON. Would it just be something like the following?

"collection": [
  {
"type" : "marc-hash"
"version" : [1, 0]
"leader" : "…leader string … "
"fields" : [array, of, fields]
  },
  {
"type" : "marc-hash"
"version" : [1, 0]
"leader" : "…leader string … "
"fields" : [array, of, fields]
  }
]

Dan

  


Bill Dueber  03/15/10 12:22 PM >>>

  

I'm pretty sure Andrew was (a) completely unaware of anything I'd done, and
(b) looking to match marc-xml as strictly as reasonable.

I also like the array-based rather than hash-based format, but I'm not gonna
go to the mat for it if no one else cares much.

I would like to see ind1 and ind2 get their own fields, though, for easier
use of stuff like jsonpath in json-centric nosql databases.
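
For instance (a hypothetical shape, not from either proto-spec), a field with its own indicator keys might look like this, shown here as a Python dict:

# Hypothetical field shape only: ind1 and ind2 as their own keys, so a
# JSONPath such as $.fields[0].ind1 can address them directly.
field = {
    "tag": "245",
    "ind1": "1",
    "ind2": "0",
    "subfields": [["a", "The Canterbury Tales /"], ["c", "Chaucer."]],
}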

On Mon, Mar 15, 2010 at 10:52 AM, Jonathan Rochkind wrote:

  


I would just ask why you didn't use Bill Dueber's already existing
proto-spec, instead of making up your own incompatible one.

I'd think we could somehow all do the same consistent thing here.

Since my interest in marc-json is getting as small a package as possible
for transfer across the wire, I prefer Bill's approach.

http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/


Houghton,Andrew wrote:


  

From: Houghton,Andrew
  


Sent: Saturday, March 06, 2010 06:59 PM
To: Code for Libraries
Subject: RE: [CODE4LIB] Q: XML2JSON converter

Depending on how much time I get next week I'll talk with the developer
network folks to see what I need to do to put a specification under
their infrastructure

Re: [CODE4LIB] Q: XML2JSON converter [MARC-JSON]

2010-03-18 Thread Dan Scott
I hate Groupwise for forcing me to top-post.

Yes, you are right about everything. Limiting MARC-HASH to just UTF8, rather 
than supporting the full range of encodings allowed by JSON, probably makes it 
easier to generate and parse; it will bloat the size of the format for 
characters outside of the Basic Multilingual Plane but probably nobody cares, 
bandwidth is cheap, right? And this is primarily meant as a transmission format.

I missed the part in the blog entry about the newline-delimited JSON because I 
was specifically looking for a mention of "collections". newline-delimited JSON 
would work, yes, and probably be easier / faster / less memory-intensive to 
parse.

Dan

>>> Jonathan Rochkind  03/18/10 10:41 AM >>>
So do you think the marc-hash-to-json "proto-spec" should suggest that 
the encoding HAS to be UTF-8, or should it leave it open to anything 
that's legal JSON?   (Is there a problem I don't know about with 
expressing "characters outside of the Basic Multilingual Plane" in 
UTF-8?  Any unicode char can be encoded in any of the unicode encodings, 
right?). 

If "collections" means what I think, Bill's blog proto-spec says they 
should be serialized as JSON-seperated-by-newlines, right?  That is, 
JSON for each record, seperated by newlines. Rather than the alternative 
approach you hypothesize there; there are various reasons to prefer 
json-seperated-by-newlines, which is an actual convention used in the 
wild, not something made up just for here.

Jonathan

Dan Scott wrote:
> Hey Bill:
>
> Do you have unit tests for MARC-HASH / JSON anywhere? If you do, that would 
> make it easier for me to create a compliant PHP File_MARC_JSON variant, which 
> I'll be happy-ish to create.
>
> The only concerns I have with your write-up are:
>   * JSON itself allows UTF8, UTF16, and UTF32 encoding - and we've seen in 
> Evergreen some cases where characters outside of the Basic Multilingual Plane 
> are required. We eventually wound up resorting to surrogate pairs, in that 
> case; so maybe this isn't a real issue.
>   * You've mentioned that you would like to see better support for 
> collections in File_MARC / File_MARCXML; but I don't see any mention of how 
> collections would work in MARC-HASH / JSON. Would it just be something like 
> the following?
>
> "collection": [
>   {
> "type" : "marc-hash"
> "version" : [1, 0]
> "leader" : "…leader string … "
> "fields" : [array, of, fields]
>   },
>   {
> "type" : "marc-hash"
> "version" : [1, 0]
> "leader" : "…leader string … "
> "fields" : [array, of, fields]
>   }
> ]
>
> Dan
>
>   
 Bill Dueber  03/15/10 12:22 PM >>>
 
> I'm pretty sure Andrew was (a) completely unaware of anything I'd done, and
> (b) looking to match marc-xml as strictly as reasonable.
>
> I also like the array-based rather than hash-based format, but I'm not gonna
> go to the mat for it if no one else cares much.
>
> I would like to see ind1 and ind2 get their own fields, though, for easier
> use of stuff like jsonpath in json-centric nosql databases.
>
> On Mon, Mar 15, 2010 at 10:52 AM, Jonathan Rochkind wrote:
>
>   
>> I would just ask why you didn't use Bill Dueber's already existing
>> proto-spec, instead of making up your own incompatible one.
>>
>> I'd think we could somehow all do the same consistent thing here.
>>
>> Since my interest in marc-json is getting as small a package as possible
>> for transfer across the wire, I prefer Bill's approach.
>>
>> http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/
>>
>>
>> Houghton,Andrew wrote:
>>
>> 
>>> From: Houghton,Andrew
>>>   
 Sent: Saturday, March 06, 2010 06:59 PM
 To: Code for Libraries
 Subject: RE: [CODE4LIB] Q: XML2JSON converter

 Depending on how much time I get next week I'll talk with the developer
 network folks to see what I need to do to put a specification under
 their infrastructure


 
>>> I finished documenting our existing use of MARC-JSON.  The specification
>>> can be found on the OCLC developer network wiki [1].  Since it is a wiki,
>>> registered developer network members can edit the specification and I would
>>> ask that you refrain from doing so.
>>>
>>> However, please do use the discussion tab to record issues with the
>>> specification or add additional information to existing issues.  There are
>>> already two open issues on the discussion tab and you can use them as a
>>> template for new issues.  The first issue is Bill Dueber's request for some
>>> sort of versioning and the second issue is whether the specification should
>>> specify the flavor of MARC, e.g., marc21, unicode, etc.
>>>
>>> It is recommended that you place issues on the discussion tab since that
>>> will be the official place for documenting and disposing of them.  I do
>>> monitor this listserve and the OCLC developer network listserve, but I only
>>> selectively look at messages on those listserves.  If 

Re: [CODE4LIB] Variations/FRBR project releases FRBR XML Schemas

2010-03-18 Thread Karen Coyle

Quoting Jonathan Rochkind :


Karen Coyle wrote:



I think the confusion is that I believe there is MORE THAN ONE
"wemi" element involved in an aggregate.

Collected Works of John Doe  (Work1)
   expressed by:  Collected Works of John Doe (first edition) (Expression1)
   manifested by: Collected Works of John Doe PDF version
(manifestation1)
---> CONTAINS/AGGREGATES
   Does and Deers by John Doe (Work2)
expressed by: Does and Deers by John Doe (only
edition that ever existed) (Expression2)
manifested by: Does and Deers by John Doe [as
included in the Collected Works (Work1)] (manifestation2)
   Badgers by John Doe  (Work3)
 etc


Yes, absolutely. But what I see happening here is so very like what we  
have today with a bib description of the manifestation and then "added  
entries" -- in this case added relationships -- for the individual  
works/expressions. And there's some logic to that view, although it  
naturally favors the package over the contents. So we'll have some  
works that are what users think of as works, and other works that  
represent the publisher's package -- which sometimes will be something  
that makes sense to the user, but at other times, as in many music  
CDs, is bordering on the arbitrary. If we present these all as works  
to the user, confusion will ensue.


In the RDA relationships (which I've summarized here  
http://kcoyle.net/rda/group1relsby.html) there seem to be two kinds:  
intellectual relationships, and bibliographic relationships. "Is  
adapted from" is an intellectual relationship; "Contains" is a  
bibliographic relationship. They're all mixed together as if they are  
the same thing. I think there's a big difference between describing a  
publication and describing an intellectual universe. I would prefer  
for there to be some line (perhaps not a bright line) between those  
functions. Library cataloging is mainly about bibliographic  
description. The intellectual relationships get very little attention  
in that view -- perhaps a note ("Based on") and an  
undifferentiated added entry.


It could be that cataloging *should* limit itself to that  
bibliographic description, and that some other function -- something  
akin to the creation of subject bibliographies -- should be allowed to  
create the intellectual connections between works. Where I think  
library catalogs lose their users is in trying to do a little of the  
latter, but not doing it well, and mixing the two functions in a way  
that is confusing.


kc



--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [CODE4LIB] Q: XML2JSON converter [MARC-JSON]

2010-03-18 Thread Jonathan Rochkind
So do you think the marc-hash-to-json "proto-spec" should suggest that 
the encoding HAS to be UTF-8, or should it leave it open to anything 
that's legal JSON?   (Is there a problem I don't know about with 
expressing "characters outside of the Basic Multilingual Plane" in 
UTF-8?  Any unicode char can be encoded in any of the unicode encodings, 
right?). 

If "collections" means what I think, Bill's blog proto-spec says they 
should be serialized as JSON-seperated-by-newlines, right?  That is, 
JSON for each record, seperated by newlines. Rather than the alternative 
approach you hypothesize there; there are various reasons to prefer 
json-seperated-by-newlines, which is an actual convention used in the 
wild, not something made up just for here.
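
A minimal sketch of that convention (Python; the file name is invented, and the record shape follows the marc-hash example quoted below):

# Sketch of newline-delimited JSON: one record per line rather than one
# enclosing JSON array. File name and records are invented examples.
import json

records = [
    {"type": "marc-hash", "version": [1, 0], "leader": "...", "fields": []},
    {"type": "marc-hash", "version": [1, 0], "leader": "...", "fields": []},
]

with open("records.json", "w", encoding="utf-8") as out:
    for rec in records:
        out.write(json.dumps(rec) + "\n")  # one JSON document per line

with open("records.json", encoding="utf-8") as f:
    for line in f:               # parse each record independently; no
        rec = json.loads(line)   # need to hold the whole collection in memory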


Jonathan

Dan Scott wrote:

Hey Bill:

Do you have unit tests for MARC-HASH / JSON anywhere? If you do, that would 
make it easier for me to create a compliant PHP File_MARC_JSON variant, which 
I'll be happy-ish to create.

The only concerns I have with your write-up are:
  * JSON itself allows UTF8, UTF16, and UTF32 encoding - and we've seen in 
Evergreen some cases where characters outside of the Basic Multilingual Plane 
are required. We eventually wound up resorting to surrogate pairs, in that 
case; so maybe this isn't a real issue.
  * You've mentioned that you would like to see better support for collections 
in File_MARC / File_MARCXML; but I don't see any mention of how collections 
would work in MARC-HASH / JSON. Would it just be something like the following?

"collection": [
  {
"type" : "marc-hash"
"version" : [1, 0]
"leader" : "…leader string … "
"fields" : [array, of, fields]
  },
  {
"type" : "marc-hash"
"version" : [1, 0]
"leader" : "…leader string … "
"fields" : [array, of, fields]
  }
]

Dan

  

Bill Dueber  03/15/10 12:22 PM >>>


I'm pretty sure Andrew was (a) completely unaware of anything I'd done, and
(b) looking to match marc-xml as strictly as reasonable.

I also like the array-based rather than hash-based format, but I'm not gonna
go to the mat for it if no one else cares much.

I would like to see ind1 and ind2 get their own fields, though, for easier
use of stuff like jsonpath in json-centric nosql databases.

On Mon, Mar 15, 2010 at 10:52 AM, Jonathan Rochkind wrote:

  

I would just ask why you didn't use Bill Dueber's already existing
proto-spec, instead of making up your own incompatible one.

I'd think we could somehow all do the same consistent thing here.

Since my interest in marc-json is getting as small a package as possible
for transfer across the wire, I prefer Bill's approach.

http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/


Houghton,Andrew wrote:



From: Houghton,Andrew
  

Sent: Saturday, March 06, 2010 06:59 PM
To: Code for Libraries
Subject: RE: [CODE4LIB] Q: XML2JSON converter

Depending on how much time I get next week I'll talk with the developer
network folks to see what I need to do to put a specification under
their infrastructure




I finished documenting our existing use of MARC-JSON.  The specification
can be found on the OCLC developer network wiki [1].  Since it is a wiki,
registered developer network members can edit the specification and I would
ask that you refrain from doing so.

However, please do use the discussion tab to record issues with the
specification or add additional information to existing issues.  There are
already two open issues on the discussion tab and you can use them as a
template for new issues.  The first issue is Bill Dueber's request for some
sort of versioning and the second issue is whether the specification should
specify the flavor of MARC, e.g., marc21, unicode, etc.

It is recommended that you place issues on the discussion tab since that
will be the official place for documenting and disposing of them.  I do
monitor this listserve and the OCLC developer network listserve, but I only
selectively look at messages on those listserves.  If you would like to use
this listserve or the OCLC developer network listserve to discuss the
MARC-JSON specification, make sure you place MARC-JSON in the subject line,
to give me a clue that I *should* look at that message, or directly CC my
e-mail address on your post.

This message marks the beginning of a two week comment period on the
specification, which will end at midnight on 2010-03-28.

[1] 


Thanks, Andy.


  



  


Re: [CODE4LIB] Variations/FRBR project releases FRBR XML Schemas

2010-03-18 Thread Jonathan Rochkind

Karen Coyle wrote:

Quoting Jonathan Rochkind :


  

So there's no way to "call an aggregate a Work/Expression" _instead of_
a manifestation, if that aggregate is an actual physical item in your
hand.



No, no one said "instead of". What the RDA folks (that is, the folks  
who have created RDA, the JSC members) said (some of them off-list to  
me), is that if your manifestation is an aggregate, then your  
Expression must be an equal aggregate. So the Expression is pretty  
much one-to-one with the Manifestation. (And I think we were all  
seeing a many-to-many.)
  
I think the confusion is that I believe there is MORE THAN ONE "wemi" 
element involved in an aggregate.


Collected Works of John Doe  (Work1)
   expressed by:  Collected Works of John Doe (first edition) (Expression1)
   manifested by: Collected Works of John Doe PDF version  
(manifestation1)

---> CONTAINS/AGGREGATES
   Does and Deers by John Doe (Work2)
expressed by: Does and Deers by John Doe (only 
edition that ever existed) (Expression2)
manifested by: Does and Deers by John Doe [as 
included in the Collected Works (Work1)] (manifestation2)

   Badgers by John Doe  (Work3)
 etc

So, yes, the Work that is _realized by_ Collected Works of John Doe 
(expression) is an aggregate too.   But meanwhile, the aggregate 
_includes_ other Works, that's what makes it an aggregate.
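
A sketch of that tree as linked records (Python; field names are invented, and exactly where the CONTAINS/AGGREGATES arrow attaches is the open question):

# Sketch of the tree above as linked records. Field names are invented;
# this is one possible encoding, not official FRBR.
work1 = {"type": "work", "title": "Collected Works of John Doe"}
expr1 = {"type": "expression", "realizes": work1, "label": "first edition"}
man1 = {"type": "manifestation", "embodies": expr1, "label": "PDF version"}

work2 = {"type": "work", "title": "Does and Deers"}
expr2 = {"type": "expression", "realizes": work2,
         "label": "only edition that ever existed"}
man2 = {"type": "manifestation", "embodies": expr2,
        "label": "as included in the Collected Works"}

# The CONTAINS/AGGREGATES relationship, recorded here from the aggregate
# manifestation to the manifestations of the contained Works. Nodes that
# nobody has fleshed out yet (e.g. man2) can simply be absent.
man1["contains"] = [man2]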


This is not necessarily "official" FRBR; official FRBR for dealing with 
aggregates is still somewhat in flux, and how to deal with them is 
still somewhat unstated.  This above way is, I argue, _not incompatible_ 
with official FRBR, and is, I argue, the _most sane_, _most useful_ way 
to deal with aggregates in FRBR.


Note that any individual node in that tree above _may or may not_ 
actually be modelled/fleshed out in a given system/corpus.  But the 
nodes are there waiting to be fleshed out by someone who needs the 
semantic information expressed.  For instance, a given system with 
Collected Works of John Doe in it may not bother drawing out the 
relationships to the _other_ Works that are "contained in" it. (Just as 
under current cataloging, you may not create 700s for the analytics.)   
If those aren't fleshed out, then the entity labelled manifestation2 
above -- representing Does and Deers manifested in a particular edition 
of the Collected Works -- may not have been created yet anywhere.  But 
it's waiting to be created if someone wants to analyze and record the 
relationship.  Alternatively, if Does and Deers was never printed 
anywhere else, then a Work record for it doesn't even exist yet -- and 
may never be created, unless someone needs to make assertions about 
"Does and Deers" as an individual entity. 

That is my argument. I'm not saying this is what the FRBR report tells 
you you _have_ to do with aggregates. I'm saying the FRBR report does 
not tell you what to do with aggregates, this treatment is _consistent_ 
with it however, and this treatment is what leads to the most sane, 
useful modelling, easiest to merge assertions from different systems 
into a whole, etc.


Jonathan
  




This is what I was told (off-list):

"the additional
bibliographies or other intellectual or artistic content are viewed
as parts of a new expression - not just new pieces for the
manifestation ... - it's useful to declare expression level changes to
facilitate collocation and make distinctions, but sometimes such
distinctions aren't necessary and we can collocate at the work
level.  Please don't start people getting confused with throwing in
expression level elements at the manifestation level."

So those were my marching orders! (And I don't see how anyone could be  
more confused than I am.) But a reprint of Moby Dick with a new  
preface or bibliography becomes a new expression. In crude MARC terms,  
every time the 245 $c changes, you've got a new expression, unless you  
determine that it's something really insignificant. And I would guess  
that you can link the Expression to one or more Works, as you wish,  
except that the FRBR diagram shows that expressions can only relate to  
one Work. (See, no one could be more confused than I am!)


kc



  

If people on the RDA-L list came to a "consensus" that is
otherwise... I suspect you misunderstood them, but otherwise their
consensus does not match any interpretation of FRBR I have previously
encountered, or any that makes sense to me.

You've got a manifestation whether you like it or not.   The question
is how much "authority work" are you going to do on identifying the
Expression and Work it belongs to.  If you don't do much because it
doesn't make sense for you to do so, maybe it starts out modelled as a
manifestation just belonging to a "dummy" Expression/Work that contains
only that Manifestation. Some other cataloger somewhere else does the
"authority" work to flesh out an Expression and/o

[CODE4LIB] Looking for an Elag 2010 workshop moderator

2010-03-18 Thread Boheemen, Peter van
The program for Elag 2010 is almost complete, but we are still looking
for somebody who would be willing to moderate a workshop on
'Tweaking search results relevance ranking'.
A workshop leader at Elag conferences prepares a starting document that
helps to frame discussion of the topic during the workshop breaks at the
conference.
At the end of the conference each workshop will present its results at
a plenary session. The presenter does not have to be the workshop
leader.
So if you are interested in this topic and you were already coming to
Helsinki in June, or if this would be a good reason for you to go
there, please let me know ASAP. The theme is described as follows:

"Google has proved that having the most relevant results on the first
page works. So users expect any discovery tool to perform this way.
Google's pagerank algorithm is based on references between pages. Pages
that get linked to often are supposed to be important.
Traditional relevance ranking methods, using word frequencies etc, do
help but do not work as well. These algorithms are even less effective
in resources that lack full text, but only consists of metadata. On the
other hand, this metadata could be used to improve relevance ranking.
Knowledge about year of publication, loans frequency or impact factors
and lots of other meta data could be used to improve relevance ranking
in library catalogs and other bibliographies. And what about indexing
full text to improve relevance ranking ?"
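
As one possible starting point for the discussion document, a minimal sketch of combining such metadata boosts with a text score (Python; the field names and weights are invented placeholders):

# Minimal sketch of metadata-boosted relevance ranking as described in
# the theme above. Field names and weights are invented placeholders.
from datetime import date

def score(record, text_score):
    # combine a traditional word-frequency score with metadata boosts:
    # recency of publication, loan frequency, and impact factor
    age = date.today().year - record.get("year", 1900)
    recency_boost = 1.0 / (1 + age)              # newer is better
    loan_boost = record.get("loans", 0) / 100.0  # popular is better
    impact_boost = record.get("impact_factor", 0.0)
    return text_score * (1 + recency_boost + loan_boost + impact_boost)

rec = {"year": 2008, "loans": 250, "impact_factor": 1.7}
print(score(rec, text_score=0.42))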

P.S. If you have any other interesting topic that you would like to see a
workshop on and are willing to moderate, you may still suggest it
as well.

Peter


Drs. P.J.C. van Boheemen
Hoofd Applicatieontwikkeling en beheer - Bibliotheek Wageningen UR
Head of Application Development and Management - Wageningen University
and Research Library
tel. +31 317 48 25 17
http://library.wur.nl
Please consider the environment before printing this e-mail