Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

2012-05-21 Thread Jonathan Rochkind

On 5/19/2012 10:52 AM, Karen Coyle wrote:


This is what worries me about FRBR and the assumptions that every
bibliographic record will be made up of at least four and probably more
like 6-8 table joins. If every record to be displayed requires a join of
a Manifestation, an Expression, and a Work



FRBR is an ontology, I don't think it makes any demands on how a system 
stores the data.


We frequently 'denormalize' data in indexes/runtime systems, for 
performance. This is an ordinary thing to do.


How the records are stored 'canonically' for cooperative cataloging and 
transfer does not need to be how they are stored in the system at 
'runtime' for performance -- the latter is an implementation detail.


Compare to _current_ practice -- some data is part of an 'authority' 
record, some is part of a 'bibliographic' record.  Does that mean that 
every system needs to do a 'join' on every display? Not necessarily, it 
depends on how the system prepares the data internally to support its 
use cases in a performant way. Usually there is some 'denormalization'.
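
A minimal sketch of that kind of denormalization, using Python's built-in 
sqlite3 module. The table and column names (authority, bib, display_cache) 
and the sample data are made up for illustration and do not come from any 
real ILS; the point is only that the canonical tables stay normalized while 
a derived table serves display without a join.

```python
import sqlite3

# Hypothetical, simplified schema: names and columns are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authority (auth_id TEXT PRIMARY KEY, heading TEXT);
    CREATE TABLE bib (bib_id INTEGER PRIMARY KEY, title TEXT,
                      auth_id TEXT REFERENCES authority(auth_id));
    INSERT INTO authority VALUES ('a1', 'Example, Author, 1900-1980');
    INSERT INTO bib VALUES (1, 'First title', 'a1'), (2, 'Second title', 'a1');
""")

def rebuild_display_cache(conn):
    """Denormalize: copy the heading next to each title so display needs no join."""
    conn.executescript("""
        DROP TABLE IF EXISTS display_cache;
        CREATE TABLE display_cache AS
            SELECT b.bib_id, b.title, a.heading
            FROM bib b JOIN authority a ON a.auth_id = b.auth_id;
    """)

rebuild_display_cache(conn)

# Display reads one table; the canonical data still stores the heading once.
rows = conn.execute(
    "SELECT title, heading FROM display_cache ORDER BY bib_id").fetchall()

# A heading change is made once in the canonical table, then the cache is rebuilt.
conn.execute(
    "UPDATE authority SET heading = 'Example, Author, 1900-1981' WHERE auth_id = 'a1'")
rebuild_display_cache(conn)
updated = conn.execute("SELECT DISTINCT heading FROM display_cache").fetchall()
```

Rebuilding the whole cache is the simplest possible refresh strategy; a real 
system would more likely update only the affected rows.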


Compare to RDF data model -- if RDF data were really stored internally 
_only_ in a triple store format -- this can in fact be even _more_ of 
a performance problem, effectively requiring many _more_ joins to 
display anything at all. RDF triples are kind of ultra normalized, any 
performance problems due to joins in an rdbms are _even more so_ with 
RDF.   But this isn't a fatal flaw, actual systems will de-normalize 
and cache data as required for adequate performance with their real 
use cases.
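
To make the "more joins" point concrete, here is a rough sketch with 
Python's sqlite3 module, assuming a naive single-table triple layout. The 
table name (triple), the property names and the sample values are 
hypothetical. Rebuilding one display row takes one extra self-join for every 
property beyond the first.

```python
import sqlite3

# Hypothetical naive triple table; property names and values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT);
    INSERT INTO triple VALUES
        ('bib:9', 'title', 'An example title'),
        ('bib:9', 'publisher', 'Example Press'),
        ('bib:9', 'date', '2012');
""")

def fetch_record(conn, subject, properties):
    """Rebuild one display row; every property after the first adds a self-join."""
    selects = ", ".join(f"t{i}.object" for i in range(len(properties)))
    joins = " ".join(
        f"JOIN triple t{i} ON t{i}.subject = t0.subject AND t{i}.predicate = ?"
        for i in range(1, len(properties)))
    sql = (f"SELECT {selects} FROM triple t0 {joins} "
           "WHERE t0.subject = ? AND t0.predicate = ?")
    # Placeholders appear in SQL order: join predicates first, then the WHERE clause.
    params = list(properties[1:]) + [subject, properties[0]]
    return conn.execute(sql, params).fetchone()

record = fetch_record(conn, "bib:9", ["title", "publisher", "date"])
```

With three properties the generated query contains two self-joins; a 
realistic bibliographic display with dozens of properties grows accordingly, 
which is why caching a flattened form is the usual answer.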


This is just how software is constructed.  It's not a reason to, say, 
copy all of the information from an authority record into every bib 
record in their shared/exchanged/canonical representations, causing a 
maintenance nightmare where every time a line is changed in an authority 
record there are a thousand bib records that all need to be updated in 
the central database.


Jonathan

(because you can't get to the
Work unless you go through the Expression even if you aren't using
anything in it for display), plus an author, I think we'll see some
response time problems.

I know that XC is using a FRBR-ish design. VTLS also has one. Can anyone
comment on the relative efficiency, or how one can mitigate the design
to improve response time? Also, is a triple store more efficient?

kc

On 5/18/12 6:28 PM, Simon Spero wrote:

On Fri, May 18, 2012 at 8:03 PM, Joe M Tomich jtom...@uwm.edu wrote:

Simon,

In your model, does the stored information for an individual
author or publisher constitute a record within a table (as would
likely be the case in a typical relational database), or is each
author, publisher, etc. effectively its own table?


Typically you would have a table for each type of entity; you wouldn't
have a table for each instance (that would be a lot of tables :-)

In the examples I gave I actually presented four different models,
representing different ways of using a relational model.

In the first model we had a table where the reference to the right
entry in the names table was included as a column in the table for
bibliographic records.
In this case we have 2 tables

In the second model, we created a separate join table, which had a
reference to an entry in the bibliographic records table, and a
reference to an entry in the names table (this approach can be used
with fields that could have multiple values for the same record, e.g.
added entries).
In this case we have 3 tables.

In the third model, we had a separate table for every property, each
with two columns. One column identified the thing that this was a
property of (for example bib record number 9); the other gave a value
of that property - in a performer table this might be value of
n91064231, or possibly http://lccn.loc.gov/n91064231 ).
In this case we have a separate table for every property, not for
every record. The subject, table name, and value correspond to the
three parts of an RDF triple.

In the fourth model, we store the subject, property name, and value in
a single table. This corresponds to a naive implementation of a triple
store.
In this case we only have a single table.

Does this make things clearer?

Simon



--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet



Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

2012-05-21 Thread Karen Coyle

On 5/21/12 7:28 AM, Jonathan Rochkind wrote:

On 5/19/2012 10:52 AM, Karen Coyle wrote:


This is what worries me about FRBR and the assumptions that every
bibliographic record will be made up of at least four and probably more
like 6-8 table joins. If every record to be displayed requires a join of
a Manifestation, an Expression, and a Work



FRBR is an ontology, I don't think it makes any demands on how a 
system stores the data.


First, FRBR in its IFLA document form is a mental model. FRBRer, as 
encoded in the Open Metadata Registry, is an RDF ontology that has 
*very* strict requirements on how the elements can be used in a linked 
data environment. In other applications, like XC, how you instantiate 
FRBR in your data store is presumably wide open.


But read what I said: I worry about the assumptions that are being 
made. There are actually folks creating systems in which each bib 
description has separate records for each WEMI entity because that's how 
they interpret FRBR, and the IFLA RDF definition of FRBR actually encodes 
the relationships between the entities in a way that, if followed, many of 
us think leads down the wrong path. I have a page that captures some of the 
discussion on this at:


http://futurelib.pbworks.com/w/page/48221836/FRBR%20Models%20Discussion

Obviously, you can do what you want with FRBR inside your own system, 
but we're talking about massive sharing of data. It's the sharing part 
that matters. The danger is that the library community will form 
standards that are widely followed but that are not a good idea. Or that 
deteriorate over time, like MARC, but we're so stuck to our standards 
that we can't imagine changing. If you actually look at that page and 
read the arguments there, rather than just shoot back an email telling 
me that I don't know what I'm talking about, you might see why some 
folks are concerned.


I think a good working meeting about FRBR and what it means for 
implementations is long overdue. We can prattle on about it, but I think 
it's time to get concrete. For example, I would like to see an 
implementation of the Murray/Tillett model, and compare that perhaps to 
an implementation of Rob Stiles' model (if he's still thinking that 
way). Jakob Voss also has some great ideas. It does make a big 
difference whether we are assuming RDF or some other way of expressing 
the bibliographic data. The Dublin Core community is starting to 
re-address standards for Application Profiles and will (hopefully) 
eventually get to the point of addressing FRBR as it has been modeled in 
various ways in RDF. (A list of those is on the futurelib page.) At the 
moment the AP discussion is taking on some easier issues.


http://wiki.dublincore.org/index.php/DCAM_Revision_Design_Patterns

and in particular

http://wiki.dublincore.org/index.php/DCAM_Revision_High_Level_Example_Publication_Statement

My assumption is that there will be silo'd database implementations 
that export some of the data as RDF. I also suspect that there will be 
something like WorldCat that is used for cataloging, and that the result 
of that will either stay in the library cloud (much like Ex Libris' 
Alma) or will be pulled into local databases for local uses. These are 
different applications, but they will need to play well together if we 
are to link our data to the web. I think we need to model all kinds of 
possibilities -- perhaps as part of the study for the new bibliographic 
framework.


kc


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

2012-05-21 Thread James Weinheimer
On 21/05/2012 18:06, Karen Coyle wrote:
snip
 Obviously, you can do what you want with FRBR inside your own system,
 but we're talking about massive sharing of data. It's the sharing part
 that matters. The danger is that the library community will form
 standards that are widely followed but that are not a good idea. Or
 that deteriorate over time, like MARC, but we're so stuck to our
 standards that we can't imagine changing. If you actually look at that
 page and read the arguments there, rather than just shoot back an
 email telling me that I don't know what I'm talking about, you might
 see why some folks are concerned.
/snip

Yes, sharing data, and sharing it in the ways as seen in the Linked Data
world, is entering unknown territory. The non-libraries who are already
there, and those who are trying to get there, are not waiting for
libraries to show them the right ways to do it. I don't think they
really care if library metadata is added or not. Therefore, it is up to
libraries to enter *their* world in the best ways possible and not
expect everyone to follow us.

I personally cannot believe the FRBR structures/ontology will be widely
followed, but to expect the (weird) WEMI structure to magically become
compatible with other structures that are only W or E or M or I or
strange amalgamations that change constantly, or are generated
dynamically--such as XSL Transformations and the on-the-fly
transformations such as Google Translate, or when browser plugins are
used--is taking a lot for granted. What I personally believe is that
WEMI is more of a remnant of the print/physical world and has little to
do with most digital information.

Not that most members of the public wanted WEMI anyway.
-- 
*James Weinheimer* weinheimer.ji...@gmail.com
*First Thus* http://catalogingmatters.blogspot.com/
*Cooperative Cataloging Rules*
http://sites.google.com/site/opencatalogingrules/
*Cataloging Matters Podcasts*
http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html


Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

2012-05-19 Thread Karen Coyle
The theory of database design and the practice don't always coincide, 
especially for large datasets. When I was working on large databases in 
Oracle the catchword was that joins are costly and the more of them 
that it took to respond to your query (between search and display) the 
worse your response time. Today's computers are bigger and faster so the 
constraints are probably lessened, but I suspect some constraints still 
exist.


This is what worries me about FRBR and the assumptions that every 
bibliographic record will be made up of at least four and probably more 
like 6-8 table joins. If every record to be displayed requires a join of 
a Manifestation, an Expression, and a Work (because you can't get to the 
Work unless you go through the Expression even if you aren't using 
anything in it for display), plus an author, I think we'll see some 
response time problems.
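
A small sketch of the join chain being described, written with Python's 
sqlite3 module. The schema is hypothetical (one table per entity, with 
invented column names and data) and is only one of many ways FRBR could be 
laid out relationally. It shows the Expression table being traversed solely 
to reach the Work.

```python
import sqlite3

# Hypothetical one-table-per-entity layout; all names and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE work (work_id INTEGER PRIMARY KEY, title TEXT,
                       creator_id INTEGER REFERENCES person(person_id));
    CREATE TABLE expression (expr_id INTEGER PRIMARY KEY, language TEXT,
                             work_id INTEGER REFERENCES work(work_id));
    CREATE TABLE manifestation (manif_id INTEGER PRIMARY KEY, publisher TEXT,
                                year INTEGER,
                                expr_id INTEGER REFERENCES expression(expr_id));
    INSERT INTO person VALUES (1, 'Example, Author');
    INSERT INTO work VALUES (1, 'An example work', 1);
    INSERT INTO expression VALUES (1, 'eng', 1);
    INSERT INTO manifestation VALUES (1, 'Example Press', 2012, 1);
""")

DISPLAY_SQL = """
    SELECT p.name, w.title, m.publisher, m.year
    FROM manifestation m
    JOIN expression e ON e.expr_id = m.expr_id   -- traversed only to reach the Work
    JOIN work w       ON w.work_id = e.work_id
    JOIN person p     ON p.person_id = w.creator_id
    WHERE m.manif_id = ?
"""
display_row = conn.execute(DISPLAY_SQL, (1,)).fetchone()
join_count = DISPLAY_SQL.count("JOIN")  # four tables, three join steps
```

Nothing from the expression table appears in the SELECT list, yet the query 
cannot skip it. Whether that costs anything noticeable depends on indexing 
and data volume, which this toy example does not measure.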


I know that XC is using a FRBR-ish design. VTLS also has one. Can anyone 
comment on the relative efficiency, or how one can mitigate the design 
to improve response time? Also, is a triple store more efficient?


kc

On 5/18/12 6:28 PM, Simon Spero wrote:
On Fri, May 18, 2012 at 8:03 PM, Joe M Tomich jtom...@uwm.edu wrote:


Simon,

In your model, does the stored information for an individual
author or publisher constitute a record within a table (as would
likely be the case in a typical relational database), or is each
author, publisher, etc. effectively its own table?


Typically you would have a table  for each type of entity; you 
wouldn't have a table for each instance (that would be a lot of tables :-)


In the examples I gave I actually presented four different models, 
representing different ways of using a relational model.


In the first model  we had a table where the reference to the right 
entry in the names table was included as a column in the table for 
bibliographic records.

In this case we have 2 tables

In the second model, we created a separate join table, which had a 
reference to an entry in the bibliographic records table,  and a 
reference to an entry in the names table (this approach can be used 
with fields that could have multiple values for the same record, e.g. 
added entries).

 In this case we have 3 tables.

In the third model, we had a separate table for every property, each 
with two columns.  One column identified the thing that this was a 
property of (for example  bib record number 9); the other gave a value 
of that property - in a performer table  this might be value 
of n91064231, or possibly http://lccn.loc.gov/n91064231 ).
In this case we have a separate table for every property, not for 
every record.  The subject, table name, and value correspond to the 
three parts of an RDF triple.


In the fourth model, we store the subject, property name, and value in 
a single table.  This corresponds to a naive implementation of a 
triple store.

In this case we only have a single table.

Does this make things clearer?

Simon



--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet



Re: [RDA-L] RDA, DBMS and RDF

2012-05-18 Thread Joe Tomich
I'm coming into this discussion somewhat late so apologies if the 
following has been covered. As someone who works both with MARC directly 
as a cataloger and MS Access (both in the same ILS--Voyager) I was very 
interested in this discussion and wanted to weigh in. My question:


Is it RDA that is incompatible with relational database principles, or 
does the underlying nature of information that a library must convey 
with respect to its holdings prevent that information from integrating 
fully into a relational database environment?


The building blocks of a relational database are, of course, tables 
containing attributes of a particular entity. A classic, real-world 
example would be attributes such as name, address, age, SSN, etc. of 
employees (entities) of a company. The strength of a relational database 
(“RDB” hereafter) is the ability to link those tables and pull together 
different attributes of user-sought combinations of entities, while 
eliminating data redundancy. The purpose in typical scenarios is to gain 
information about the entities themselves. In a library catalog, 
however, the entities are really means to an end, an end which is 
largely pre-defined and often inherently redundant.


Ostensibly, it's tempting to look at the relationship between a 
bibliographic and authority record and, due to the parallels with fully 
realized relational databases, see the potential to move library 
catalogs more in that direction. The problem is the libraries' need to 
store and convey what are, in effect, fixed, unique relationships 
involving redundant entities. While an author's name may be redundant in 
and of itself, its relationship to title, publisher, year, etc. in each 
of the author’s works in the catalog is unique, and each of those unique 
relationships needs to be captured and conveyed. I don’t think they can 
be generated on the fly by linking tables.


In some ways, we've incorporated RDB principles already, such as the use 
of an authority record to store earlier forms of an author's name, which 
eliminates the need to place these in the relevant bib records. While 
there is certainly room for improvement (static linking of the bib and 
authority records of controlled fields as one possible example), I think 
the scope is limited. In essence, we've already identified the minimum 
information necessary to convey, again, those unique relationships 
between entities (author, title, publisher, year, etc.) that constitute 
a work in the library's collection. If we were interested in attributes 
of the individual entities in MARC records (give me books published 
between 1990 and 2005 whose publisher is a publicly traded company) we 
could also make more use of RDB principles (linking via publisher name 
to a publisher table that states whether publishers are public or 
private), but again, in a library catalog, the entities themselves are 
of relatively little interest beyond their role in the creation or 
description of the information being sought and, for that, the pieces of 
information in each (currently MARC) record seem to me both sufficient 
and necessary.
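
The publisher query mentioned above could look roughly like this, sketched 
with Python's sqlite3 module. The publisher table, its publicly_traded flag 
and the sample rows are all hypothetical; MARC records carry no such 
attribute, which is the point of linking out to a separate table.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (name TEXT PRIMARY KEY, publicly_traded INTEGER);
    CREATE TABLE bib (bib_id INTEGER PRIMARY KEY, title TEXT,
                      publisher_name TEXT REFERENCES publisher(name),
                      year INTEGER);
    INSERT INTO publisher VALUES ('Public Press', 1), ('Private Press', 0);
    INSERT INTO bib VALUES
        (1, 'Book A', 'Public Press', 1995),
        (2, 'Book B', 'Private Press', 1999),
        (3, 'Book C', 'Public Press', 2010);
""")

# "Books published between 1990 and 2005 whose publisher is publicly traded."
titles = [row[0] for row in conn.execute("""
    SELECT b.title
    FROM bib b JOIN publisher p ON p.name = b.publisher_name
    WHERE b.year BETWEEN 1990 AND 2005 AND p.publicly_traded = 1
    ORDER BY b.bib_id
""")]
```

Linking on the publisher name, as described, only works if the name is 
recorded identically in both tables; an assigned identifier would be more 
robust.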


Thus, while I share with my colleagues many of the stated concerns with 
RDA, I think there are some limiting factors to consider when using 
relational database principles as a standard of measure.


Joe Tomich
UW-Milwaukee Libraries


Re: [RDA-L] RDA, DBMS and RDF

2012-05-18 Thread Kathleen Lamantia
Joe,

Thanks for this very pertinent comment.  This is exactly what I have been 
wondering myself.

Kathleen F. Lamantia, MLIS
Technical Services Librarian
Stark County District Library
715 Market Avenue North
Canton, OH 44702
330-458-2723
klaman...@starklibrary.org
Inspiring Ideas ∙ Enriching Lives ∙ Creating Community
 



Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

2012-05-18 Thread Joe M Tomich
Simon,

In your model, does the stored information for an individual author or 
publisher constitute a record within a table (as would likely be the case in a 
typical relational database), or is each author, publisher, etc. effectively 
its own table?

Joe Tomich
UW-Milwaukee Libraries

- Original Message -
From: Simon Spero sesunc...@gmail.com
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Sent: Friday, May 18, 2012 2:33:25 PM
Subject: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with 
Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

[I am top posting to preserve context; I'm re-adding BIBFRAME to the address 
list because much of this is relevant to Bibliographic Framework issues. Since 
issues of RDA content are directly involved and the question was posed on 
RDA-L, this message is sent to both lists] 


I am taking the central question raised in the original message to be the one 
listed above: [1] Are RDA, MARC, and Bibliographic concepts compatible 
with relational database principles or systems, and should this be the 
standard of measure for evaluating RDA qua RDA? 


There are other implied questions raised, including the nature of entities, and 
whether identifying and separating out some entities is useful for library 
purposes. [2] Publishers are used as an example. 


[3] The question is also raised as to whether and how, given that "an author's 
name may be redundant in and of itself, its relationship to title, publisher, 
year, etc. in each of the author’s works in the catalog is unique, and each of 
those unique relationships needs to be captured and conveyed", a full record 
can be generated by linking (or joining) tables. 


We can gain some insight in to the fundamental questions by looking at the 
questions in reverse order. 


[3] It is indeed possible to generate a full record by joining together things 
from different tables. We'll look at a couple of ways that this can be done 
using a simple example involving just two tables - one table containing 
authors, the other containing everything else apart from the author name. 


The first thing to keep in mind is that for something to be an entity it 
must have an identity ("No Entity without Identity!" was Quine's slogan). 


The criteria used to identify something can be intrinsic. For example, the 
entries in a table of authorized personal names can be uniquely identified by 
the name, dates, relationship to a specified work, etc. (the 100 field in the 
MARC authorities). 


Alternatively, we can identify the name entity using an assigned identifier: 
for example, the LCCN for the authority record (010); a locally assigned 
identifier (e.g. the 001); or the URL for the name entity assigned at viaf.org. 


If we want to connect the name entity from a bibliographic record, we need to 
establish some sort of connection between them. 


If each bibliographic record can have at most one main entry for personal name, 
we can include an identifier for the author entity directly in the table for 
the bibliographic record. 
If we use the first style of identifying the name record (i.e. the 100 field), 
we end up with a bibliographic record that doesn't look very different from the 
first record. However, any changes to the authorized name must also be applied 
to all of the bibliographic records that refer to that name. 
This problem does not occur if we use the other kind of identifier (for 
example, the LCCN). When we want to fetch the whole record, we link the two 
tables together using the identifier as a key to look up the right entry in the 
names table. 


For example, if the names table has fields for (name-lccn, name-heading), and 
the original bibliographic record table has fields for (bib-lccn, 
name-heading,title, publisher,date), we can create a new table for the 
bibliographic record that has fields (bib-lccn, name-lccn, 
title,publisher,date). 
We regenerate the original record by fetching entries from both tables, 
fetching the name entry whose name-lccn is equal to the name-lccn in the 
bibliographic record entry. 
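
The two-table layout just described can be sketched as follows with Python's 
sqlite3 module. The field names follow the ones given above (with 
underscores in place of hyphens); the identifier values and record content 
are invented placeholders, not real LCCNs.

```python
import sqlite3

# Field names follow the message; the values are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (name_lccn TEXT PRIMARY KEY, name_heading TEXT);
    CREATE TABLE bib (bib_lccn TEXT PRIMARY KEY,
                      name_lccn TEXT REFERENCES names(name_lccn),
                      title TEXT, publisher TEXT, date TEXT);
    INSERT INTO names VALUES ('n-0001', 'Example, Author');
    INSERT INTO bib VALUES ('b-0001', 'n-0001', 'An example title',
                            'Example Press', '2012');
""")

def original_record(conn, bib_lccn):
    """Regenerate (bib-lccn, name-heading, title, publisher, date) via the key."""
    return conn.execute("""
        SELECT b.bib_lccn, n.name_heading, b.title, b.publisher, b.date
        FROM bib b JOIN names n ON n.name_lccn = b.name_lccn
        WHERE b.bib_lccn = ?
    """, (bib_lccn,)).fetchone()

before = original_record(conn, "b-0001")
# Changing the authorized heading touches one row in names, none in bib.
conn.execute("UPDATE names SET name_heading = 'Example, Author, 1900-1980'")
after = original_record(conn, "b-0001")
```

The regenerated record picks up the new heading without any bibliographic 
row being edited, which is the maintenance advantage of keying on an 
assigned identifier rather than on the heading text.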


We don't have to store the name-lccn in the bibliographic record directly. We 
can instead create a third table to store the main entry. This third table must 
carry identifiers for both the name record and the bibliographic record - e.g. 
(bib-lccn, name-lccn). This approach may seem like extra work if the table is 
only used to hold the main entry, but is needed in standard relational 
databases for fields that are repeatable - for example, added entries. 
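
A sketch of that third (join) table for a repeatable field, again with 
Python's sqlite3 module. The table name added_entry and all identifier 
values are hypothetical; only the (bib-lccn, name-lccn) pairing comes from 
the description above.

```python
import sqlite3

# Hypothetical names and placeholder identifiers, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (name_lccn TEXT PRIMARY KEY, name_heading TEXT);
    CREATE TABLE bib (bib_lccn TEXT PRIMARY KEY, title TEXT);
    -- The join table carries both identifiers, so a record can repeat the field.
    CREATE TABLE added_entry (bib_lccn TEXT REFERENCES bib(bib_lccn),
                              name_lccn TEXT REFERENCES names(name_lccn),
                              PRIMARY KEY (bib_lccn, name_lccn));
    INSERT INTO names VALUES ('n-0001', 'Example, First'),
                             ('n-0002', 'Example, Second');
    INSERT INTO bib VALUES ('b-0001', 'An example title');
    INSERT INTO added_entry VALUES ('b-0001', 'n-0001'), ('b-0001', 'n-0002');
""")

# All added entries for one bibliographic record, resolved to headings.
headings = [row[0] for row in conn.execute("""
    SELECT n.name_heading
    FROM added_entry ae JOIN names n ON n.name_lccn = ae.name_lccn
    WHERE ae.bib_lccn = ?
    ORDER BY n.name_heading
""", ("b-0001",))]
```

A single name_lccn column on the bib table could hold only one value per 
record; the join table removes that limit at the cost of one more join.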


We should note here that we could create a separate table for every field in 
the original bibliographic record, with one field naming the bibliographic 
record that the value is a property of, and the second holding either a 
simple value directly, or an identifier for a more complex value. To recreate 
the original record, we fetch all of the values from all of the tables whose

Re: [RDA-L] Are RDA, MARC data, and Bibliographic concepts compatible with Relational database principles or systems? (Was: Re: [RDA-L] RDA, DBMS and RDF)

2012-05-18 Thread Simon Spero
On Fri, May 18, 2012 at 8:03 PM, Joe M Tomich jtom...@uwm.edu wrote:

 Simon,

 In your model, does the stored information for an individual author or
 publisher constitute a record within a table (as would likely be the case
 in a typical relational database), or is each author, publisher, etc.
 effectively its own table?


Typically you would have a table  for each type of entity; you wouldn't
have a table for each instance (that would be a lot of tables :-)

In the examples I gave I actually presented four different models,
representing different ways of using a relational model.

In the first model  we had a table where the reference to the right entry
in the names table was included as a column in the table for bibliographic
records.
In this case we have 2 tables

In the second model, we created a separate join table, which had a
reference to an entry in the bibliographic records table,  and a reference
to an entry in the names table (this approach can be used with fields that
could have multiple values for the same record, e.g. added entries).
 In this case we have 3 tables.

In the third model, we had a separate table for every property, each with
two columns.  One column identified the thing that this was a property of
(for example  bib record number 9); the other gave a value of that property
- in a performer table  this might be value of n91064231, or possibly
http://lccn.loc.gov/n91064231 ).
In this case we have a separate table for every property, not for every
record.  The subject, table name, and value correspond to the three parts
of an RDF triple.

In the fourth model, we store the subject, property name, and value in a
single table.  This corresponds to a naive implementation of a triple store.
In this case we only have a single table.
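
As a rough illustration of how the third and fourth models relate, the 
following Python sqlite3 sketch copies per-property tables into a single 
triple table. The table names and the title value are hypothetical; the 
performer identifier and "bib record number 9" are the examples used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Third model: one two-column table per property (names are illustrative).
    CREATE TABLE title (subject TEXT, value TEXT);
    CREATE TABLE performer (subject TEXT, value TEXT);
    INSERT INTO title VALUES ('bib:9', 'An example title');
    INSERT INTO performer VALUES ('bib:9', 'http://lccn.loc.gov/n91064231');
    -- Fourth model: a single table of (subject, property, value).
    CREATE TABLE triple (subject TEXT, property TEXT, value TEXT);
""")

# The table name in the third model becomes the property column in the fourth.
for table in ("title", "performer"):
    conn.execute(
        f"INSERT INTO triple SELECT subject, ?, value FROM {table}", (table,))

triples = conn.execute(
    "SELECT subject, property, value FROM triple ORDER BY property").fetchall()
```

The same information ends up in both layouts; what changes is whether the 
property is expressed by the table name or by a column value.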

Does this make things clearer?

Simon


Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)

2012-05-17 Thread Karen Coyle
Jonathan, there is nothing wrong with testing out ways to retrieve a 
record with multiple subject headings that share some keywords. It's 
probably the most common case we have. I don't know why you see 
experimentation as wrong. If the RDBMS doesn't give the desired result, 
then we should move on to other technologies.


The question at hand is: do headings give us the desired result using 
the common technology of our library systems? If not, should something 
change about how we do headings, or do we need a different technology?


The underlying question, though, is what do we want headings to 
accomplish in our systems?


I happen to think that implementation details and cataloging practice 
must inform each other.


kc

On 5/16/12 10:17 PM, Jonathan Rochkind wrote:

Certainly you can come up with an infinite number of wrong ways to do it that 
won't get the results you want. With any given technology.  I do not understand 
why you are trying to come up with wrong ways to accomplish this arbitrary goal; 
you seem to be refining your software approaches with the goal of 
finding something that won't work. Why would anyone want to do that?

In addition to a nearly infinite number of wrong ways to accomplish this 
particular goal, there are also a few right ways to do it. There are several 
other designs using a rdbms, in addition to the one Simon prototypes,  that 
could also give you the results I think you're describing. Results that it's 
not entirely clear to me any user actually wants, but if they did, we could do 
it. With an rdbms, with something else.  The technology used for your database 
or text index or search engine is an implementation detail.

Good metadata with the semantics needed to answer the questions you might want 
to put it to (without having to make the computer guess probabilistically) 
matters -- if it's there, systems can be created to do what you want. Sure, 
with a rdbms. Or with specialized inverted indexing tools. Or a combination. Or 
something else.

The best tools will depend on exactly what you're wanting to do, as well as the 
scale (in various dimensions), the current availability/cost of various 
options, etc.  These are questions for programmers and software engineers. If 
the right semantics are captured in the data, the tool can be built -- that is 
the question for metadata engineers and catalogers. (To be sure, some 
understanding of algorithms and other aspects of how computers work is 
important to be able to understand what software can get out of any given data 
modelled/represented in any given way).

I don't understand what you're driving at, what the point of this conversation 
is.


From: Resource Description and Access / Resource Description and Access 
[RDA-L@LISTSERV.LAC-BAC.GC.CA] on behalf of Karen Coyle [li...@kcoyle.net]
Sent: Wednesday, May 16, 2012 8:46 PM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)

Thanks Simon, It's much better to have an actual mock-up than just a 
description.

If I understand this correctly, to do this you do three separate queries. If you had been able to 
use a single query (e.g. if you had an overall keyword index), with UNION ALL would you have been 
able to retain instances where the same keyword appears more than once in the record? In other words, I'm 
wondering if one entry for the weasels came from the title and one from the subject heading. If 
one book had two subject headings,  could you get this result just from a subject heading search? (I'm 
thinking that using a search on different indexes that match the search key rather than a single index is an 
added factor.)
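
One way to probe the question above is a small sketch in Python using the standard-library sqlite3 module (SQLite, not the PostgreSQL used elsewhere in the thread). It assumes a hypothetical book_subject table holding one row per subject heading; the table and column names, and the second heading given to the weasel guidebook, are invented for illustration. Under those assumptions, a search on the subject index alone returns the book once per matching heading.

```python
import sqlite3

# Assumption: subject headings live in their own table, one row per heading.
# Table names, column names, and the second heading for book 4 are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE book_subject (book_id INTEGER, heading TEXT);
    INSERT INTO book VALUES (4, 'See France on twenty weasels a day');
    INSERT INTO book_subject VALUES (4, 'France--Guidebooks');
    INSERT INTO book_subject VALUES (4, 'Weasels--France');
""")

# A single subject search: the join yields one result row per matching heading,
# so the same book can appear more than once without any UNION.
rows = conn.execute("""
    SELECT b.id, b.title, s.heading AS sort_key
    FROM book AS b
    JOIN book_subject AS s ON s.book_id = b.id
    WHERE s.heading LIKE '%France%'
    ORDER BY sort_key
""").fetchall()

for row in rows:
    print(row)
```

Whether a real catalogue stores headings this way is an implementation choice; the sketch only shows that the relational model does not prevent it.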

kc


Re: [RDA-L] RDA, DBMS and RDF

2012-05-17 Thread Simon Spero

 On 5/16/12 4:38 PM, Simon Spero wrote:
 On Wed, May 16, 2012 at 5:50 AM, Karen Coyle li...@kcoyle.net wrote:
 This confirms what I was saying about retrieval. There are some on this
 list that claim that there ARE systems that could do what I asked (the
 bibliographic record will display 3 times in the list of retrievals).
 I can explain (with a bunch of drawings) why each record appears only once.
 Those who disagree with me should point to an example, and then we can
 analyze the functionality. But I want to see something real.

  You seem to be saying that you can use drawings that will show that it
 is not possible to have records show up more than once in a search using
 a DBMS.  Despite my name, I prefer to do coding. So, rather than draw this
 out, I'll ask a DBMS - in this case I'll go with PostgreSQL, a mature,
 open source relational database system.

 I'll create a simplified database table, with columns for author, title,
 and the primary subject heading. I'll also add an id column, so we can see
 which row is which.  This simplification is for exposition purposes. The
 database is real; only the data has been made up to annoy the French.

 Let's look at the content.

 # select id,title, author,subject1 from book;
  id |               title                |             author              |           subject1
 ----+------------------------------------+---------------------------------+------------------------------
   5 | I hate rich people                 | Hollande, François              | Politics--Gaffes and gaffers
   2 | A brief history of white flags     | Monkey, Cheese Eating-Surrender | France-History
   4 | See France on twenty weasels a day | Weasel, Ima                     | France--Guidebooks
   3 | We'll never surrender              | Weasel, Ima                     | France--Fiction

 If we look at the data, we see four entries.  Three of them have the word
 France in the subject field; one also has the word in the title.

 Although PostgreSQL has built-in full text indexing, I'm not going to
 use it for this example; instead I'll just use standard SQL approximate
 matching - the LIKE operator.  When we compare things using LIKE, the
 % character serves as a wild card.  OPAC users may prefer to pronounce
 it '#'.  For example,

 'I hate rich people'  LIKE   '%France%'   is false
 'See France on twenty weasels a day'  LIKE  '%France%'   is true

 Now we're going to try doing a search for 'France' anywhere in any of
 these fields. We'll also sort the results in alphabetical order, based on
 the field in which the word occurs.

 We'll  do this by creating a query that has three parts - one for each
 field we'll be  searching on.  For each part  of the query, we'll include
 the value of the matched field in a column in the result set that we'll
 call sort_key.

 Let's create the three parts of the query.

 First  title:
 select id,title,author,subject1,title as sort_key from book where title
 like '%France%'

 Then subject:
 select id,title,author,subject1,subject1 as sort_key from book where
 subject1 like '%France%'

 Finally author:
 select id,title,author,subject1,author as sort_key from book where
 author like '%France%'

 (Notice that in each of these queries, we choose a different field to be
 the value of sort_key).

 Right now, we have three different queries - we need some way to combine
 them into a single set of results. Fortunately, we can do this using
 another standard SQL operator - UNION ALL.  This command takes the
 results of two queries that return the same columns and turns them into a
 single list of results.  Using UNION ALL instead of UNION tells the
 database not to get rid of any duplicate rows.

 select id,title,author,subject1,title
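
The combined query is cut off in this copy of the message. What follows is not the original query but a minimal sketch of the approach described above, assembled from the three parts and the sample rows shown earlier. It runs in Python against an in-memory SQLite database (standard-library sqlite3) rather than PostgreSQL, and placing ORDER BY sort_key at the end of the compound query is an assumption about how the sorting was done.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER, title TEXT, author TEXT, subject1 TEXT)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?, ?)",
    [
        (5, "I hate rich people", "Hollande, François", "Politics--Gaffes and gaffers"),
        (2, "A brief history of white flags", "Monkey, Cheese Eating-Surrender", "France-History"),
        (4, "See France on twenty weasels a day", "Weasel, Ima", "France--Guidebooks"),
        (3, "We'll never surrender", "Weasel, Ima", "France--Fiction"),
    ],
)

# One SELECT per searched field; each puts the matched field into sort_key.
# UNION ALL keeps every row from every part, so a book matching in two
# fields is returned twice.
query = """
    SELECT id, title, author, subject1, title AS sort_key
      FROM book WHERE title LIKE '%France%'
    UNION ALL
    SELECT id, title, author, subject1, subject1 AS sort_key
      FROM book WHERE subject1 LIKE '%France%'
    UNION ALL
    SELECT id, title, author, subject1, author AS sort_key
      FROM book WHERE author LIKE '%France%'
    ORDER BY sort_key
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row[0], "|", row[4])
```

With this data, book 4 is returned twice, once for its title and once for its subject heading. The sort order here follows SQLite's default byte-wise comparison; a PostgreSQL database with a locale-aware collation may order the hyphenated headings differently.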

Re: [RDA-L] RDA, DBMS and RDF

2012-05-16 Thread Brenndorfer, Thomas
For reference, here is a recent authority record with 374 (occupation) using an 
LCSH term:

LDR  cz   22 n  4500
001 541951
005 20120514104731.0
008 800520n| acannaabn  |a aaa
010 ‡an  79100565
035 ‡a(OCoLC)oca00332681
035 ‡a(DLC)n  79100565
035 ‡a(DLCn)703231
035 ‡a11654658
035 ‡a2898
040 ‡aDLC‡cDLC‡dDLC‡dMoSpS-AV‡dDLC
046 ‡f19020204‡g19740826
100   1 ‡aLindbergh, Charles A.‡q(Charles Augustus),‡d1902-1974
370 ‡aDetroit, Mich.‡bHawaii
374 ‡aAir pilots‡2lcsh
400   1 ‡wnna‡aLindbergh, Charles Augustus,‡d1902-1974
670 ‡aVan Every, D. Charles Lindbergh, his life, 1927.
670 ‡aThe entrepreneurs, an American adventure. Part 3, Expanding 
America [VR] 1991, c1986:‡bcontainer (Charles Lindbergh; flew across the 
Atlantic)
670 ‡aFunk and Wagnalls WWW Home page, Dec. 11, 2000:‡bEncyclopedia 
(Charles Augustus Lindbergh; b. Feb. 4, 1902, Detroit; d. Aug. 26, 1974, Maui, 
Hawaii; American aviator, engineer, and Pulitzer Prize winner for 
autobiography, The Spirit of St. Louis; first to make nonstop solo flight 
across Atlantic; baby son kidnapped and murdered, 1932)

Thomas Brenndorfer
Guelph Public Library

From: Resource Description and Access / Resource Description and Access 
[mailto:RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Sean Chen
Sent: May 16, 2012 10:05 AM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: Re: [RDA-L] RDA, DBMS and RDF

I agree values for field of activity and occupation elements should come from a 
controlled vocabulary, if anything to make the job of the person cataloging 
easier. I think I'd follow what Richard Moore says later on in the thread: 
he emphasizes that a Linked Data approach would require this. Also I think the 
need to move away from the precoordinated Authorized Access Points and think 
about the rest of the elements that make up an authority record is really 
important. Or at least to think of them as separate beasts (which RDA does do, 
depending on your opinion).

With field of activity, it seems to me to be less troublesome since a plural 
doesn't seem to cause too much dissonance in a heading (Economics vs. Economic; 
Statistics/Statistic) and in other situations LCSH has used a singular form, 
based on other guidance (Constitutional law vs Constitutional laws).

Occupations are a bit more difficult, with LCSH using the plural a lot more, 
especially with headings in the category of classes of people, which is where 
I think occupations would draw from.  On top of that the actual term might 
often not line up with representation (Chemistry teacher vs. Professor of 
chemistry). Are there better vocabularies for occupations than LCSH?


--
Sean Chen slc.c...@gmail.com





On May 13, 2012, at 11:07 PM, Adam L. Schiff wrote:



The elements that constitute authorized access points have been separated out 
in MARC because of RDA (such as, fuller form of name-- 378; form of work-- 380; 
dates-- 046, and these are encoded in externally referenced standards -- ISO 
8601 and EDTF). Other elements, such as Field of Activity or Occupation can be 
linked to controlled vocabulary terms, such as LCSH headings.

Except that LCSH occupation/profession headings are in the plural, while RDA 
terms would be in the singular.  I'm not at all sure that you could singularize 
an LCSH heading and still code the subfield $2 of the 374 field for LCSH.  What 
do others think about this?


Some ideas for improving RDA that follow from the points raised:

- Separate out Authorized Access Points entirely from the numbered 
instructions. Treat them as a sidebar, and have side-by-side links to the 
instructions for each individual element so one can see all the relevant 
instructions as one is constructing an authorized access point. This will 
further solidify the idea that Authorized Access Points are creatures belonging 
to some catalog implementations, but may not be needed in others.

I'm also beginning to believe that we may need indicators in the MARC fields 
for the elements that would be included in an authorized access point, so that 
a machine could generate them on the fly.  If you have recorded, for example, 
multiple professions/occupations, you might want to designate which one should 
go into the authorized access point.  Or you might record one or more 
professions that would never go into the access point, and you might want to 
tell the system that too.  The same is true for many other elements (e.g. 
associated place) that are sometimes needed in an access point but which might 
be recorded even when not needed to differentiate an entity/access point from 
another.
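
The idea of letting a machine assemble the access point from separately recorded elements can be sketched as follows. This is only an illustration in Python: the in_access_point flag stands in for the indicator proposed above and does not correspond to any existing MARC indicator, and the Element class and build_access_point function are hypothetical names. The values are taken from the Lindbergh authority record quoted in this thread, with punctuation kept as it appears in the MARC subfields.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str              # descriptive label for the recorded element
    value: str             # value as recorded, including its punctuation
    in_access_point: bool  # hypothetical flag, standing in for the proposed indicator


def build_access_point(elements):
    """Join the values of the flagged elements, in the order recorded."""
    return " ".join(e.value for e in elements if e.in_access_point)


elements = [
    Element("preferred name", "Lindbergh, Charles A.", True),
    Element("fuller form of name", "(Charles Augustus),", True),
    Element("dates", "1902-1974", True),
    Element("associated place", "Detroit, Mich.", False),  # recorded, not used
    Element("occupation", "Air pilots", False),            # recorded, not used
]

access_point = build_access_point(elements)
print(access_point)
```

Flipping a flag changes the generated string without touching the recorded data, which is the separation the proposal is after. Real rules for ordering and punctuating additions are more involved than a simple join.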

**
* Adam L. Schiff * * Principal Cataloger*
* University of Washington Libraries *
* Box 352900 *
* Seattle, WA 98195-2900 *
* (206

Re: [RDA-L] RDA, DBMS and RDF

2012-05-16 Thread Karen Coyle
The question of plurals has come up in the discussions of vocabularies 
within JSC, since the vocabularies are coded in the Open Metadata 
Registry (at http://rdvocab.info). The first thing to remember is that 
the words used are merely display forms; the actual data is an 
identifier (at least for any controlled list). In many cases you need 
singular in some situations and plural in others (1 map, 3 maps). The 
identifier for your vocabulary term in this case does not change; if you 
have given "map" the identifier http://something.org/23435 in your 
vocabulary list, it is the same in both situations. How to indicate a 
plural v. singular isn't clear yet, but it's an obvious need that many 
communities will have. The thing that we have to remember is that 
different natural languages handle this differently, so there needs to 
be a solution that works for as many language groups as possible. The 
key thing to remember, though, is that we are talking about *display* 
forms, not their underlying meaning when we contemplate singular v. 
plural. In most cases (at least the ones I have so far run into) we 
wouldn't want separate lists for singular and plural, only the option to 
use different displays based on the context.
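
To make the distinction between identifier and display form concrete, here is a small sketch in Python. It is one possible arrangement, not a settled design: the VOCAB structure, the "singular"/"plural" keys, and the display function are all hypothetical, and the identifier is the placeholder used above. The count-based rule is English-only and would not carry over to languages that handle number differently.

```python
# Hypothetical vocabulary: one identifier per concept, several display forms.
VOCAB = {
    "http://something.org/23435": {"singular": "map", "plural": "maps"},
}


def display(term_id: str, count: int) -> str:
    """Render a count with the display form that fits it (English rule only)."""
    labels = VOCAB[term_id]
    form = "singular" if count == 1 else "plural"
    return f"{count} {labels[form]}"


# The stored data is the same identifier in both cases; only the label differs.
print(display("http://something.org/23435", 1))
print(display("http://something.org/23435", 3))
```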


kc


Re: [RDA-L] RDA, DBMS and RDF

2012-05-16 Thread Brenndorfer, Thomas
Just curious how these pieces can be implemented, within the current framework 
and any future ones …

There are separately mapped singular and plural vocabulary values in the Open 
Metadata Registry:

map
http://rdvocab.info/termList/extentCarto/1004

maps
http://rdvocab.info/termList/extentCarto/1013


And there is some interesting overlap, where extent terms for notated music are 
in singular and plural:
Extent of notated music
http://metadataregistry.org/concept/list/vocabulary_id/59.html


and parallel terms are in singular only:
Format of notated music
http://metadataregistry.org/concept/list/vocabulary_id/109.html


And it would be worthwhile knowing how these issues can be handled with the 
Linked Data link to the controlled vocabulary in the example authority record:

Air pilots
http://id.loc.gov/authorities/subjects/sh85002673

Thomas Brenndorfer
Guelph Public Library



Re: [RDA-L] RDA, DBMS and RDF

2012-05-16 Thread JOHN C ATTIG
The plural forms for RDA terms in the Open Metadata Registry represent an 
earlier state of our deliberations. I believe that all of them still have the 
status New-Proposed. The Joint Steering Committee is considering whether we 
need the plural forms of terms to be given explicitly (with distinct URIs) in 
the Registry. So far the discussion seems to be in the direction that Karen 
describes.

Before the vocabularies in question are published, this issue will be 
resolved, and (if that is the decision) the plural forms will be deleted and 
only the singular forms will be published.

John Attig
ALA Representative to the Joint Steering Committee
jx...@psu.edu


Re: [RDA-L] RDA, DBMS and RDF

2012-05-16 Thread Karen Coyle

On 5/16/12 10:21 AM, Brenndorfer, Thomas wrote:


And it would be worthwhile knowing how these issues can be handled 
with the Linked Data link to the controlled vocabulary in the example 
authority record:


Air pilots

http://id.loc.gov/authorities/subjects/sh85002673



I think the answer is that we don't know yet, but that this is an issue 
that libraries and the semantic web community need to work on together. 
We may be the first community that has extensive examples in this area.


Remember that the semantic web standards that exist today are kind of 
the ground floor standards. There is a lot of work going on to create 
the upper storeys.


I'll check and see if this has been brought up to the W3C yet, and if 
not explore how to get it on their radar.


kc



Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)

2012-05-16 Thread Simon Spero
On Wed, May 16, 2012 at 5:50 AM, Karen Coyle li...@kcoyle.net wrote:

 This confirms what I was saying about retrieval. There are some on this
 list that claim that there ARE systems that could do what I asked (the
 bibliographic record will display 3 times in the list of retrievals).
 I can explain (with a bunch of drawings) why each record appears only once.
 Those who disagree with me should point to an example, and then we can
 analyze the functionality. But I want to see something real.


You seem to be saying that you can use drawings that will show that it
is not possible to have records show up more than once in a search using
a DBMS.  Despite my name, I prefer to do coding. So, rather than draw this
out, I'll ask a DBMS - in this case I'll go with PostgreSQL, a mature,
open source relational database system.

I'll create a simplified database table, with columns for author, title,
and the primary subject heading. I'll also add an id column, so we can see
which row is which.  This simplification is for exposition purposes. The
database is real; only the data has been made up to annoy the French.

Let's look at the content.

# select id,title, author,subject1 from book;
 id |               title                |             author              |           subject1
----+------------------------------------+---------------------------------+------------------------------
  5 | I hate rich people                 | Hollande, François              | Politics--Gaffes and gaffers
  2 | A brief history of white flags     | Monkey, Cheese Eating-Surrender | France-History
  4 | See France on twenty weasels a day | Weasel, Ima                     | France--Guidebooks
  3 | We'll never surrender              | Weasel, Ima                     | France--Fiction

If we look at the data, we see four entries.  Three of them have the word
France in the subject field; one also has the word in the title.

Although PostgreSQL has built-in full-text indexing, I'm not going to use
it for this example; instead I'll just use standard SQL pattern
matching - the LIKE operator. When we compare things using LIKE, the
% character serves as a wild card. OPAC users may prefer to pronounce
it '#'. For example,

'I hate rich people'  LIKE   '%France%'   is false
'See France on twenty weasels a day'  LIKE  '%France%'   is true
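
The two comparisons above can be checked directly. This is a minimal sketch that assumes SQLite (through Python's standard sqlite3 module) as a stand-in for PostgreSQL; the helper name `like` is made up for illustration. One difference to be aware of: SQLite's LIKE ignores case for ASCII letters by default, while PostgreSQL's LIKE is case-sensitive. That does not affect these two examples.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")


def like(value, pattern):
    """Ask the database whether `value` LIKE `pattern` holds."""
    (result,) = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(result)


print(like("I hate rich people", "%France%"))
print(like("See France on twenty weasels a day", "%France%"))
```

The first comparison comes back false and the second true, matching the two lines above.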


Now we're going to try doing a search for 'France' anywhere in any of these
fields. We'll also sort the results in alphabetical order, based on the
field in which the word occurs.

We'll do this by creating a query that has three parts - one for each
field we'll be searching on. For each part of the query, we'll include
the value of the matched field in a column in the result set that we'll
call sort_key.

Let's create the three parts of the query.

First title:

select id,title,author,subject1,title as sort_key from book
  where title like '%France%'


Then subject:

select id,title,author,subject1,subject1 as sort_key from book
  where subject1 like '%France%'


Finally author:

select id,title,author,subject1,author as sort_key from book
  where author like '%France%'


(Notice that in each of these queries, we choose a different field to be
the value of sort_key).

Right now, we have three different queries - we need some way to combine
them into a single set of results. Fortunately, we can do this using
another standard SQL operator - UNION ALL. This command takes the
results of two queries that return the same columns and turns them into a
single list of results. Using UNION ALL instead of UNION tells the
database *not* to get rid of any duplicate rows.

select id,title,author,subject1,title as sort_key from book
  where title like '%France%'
UNION ALL
select id,title,author,subject1,subject1 as sort_key from book
  where subject1 like '%France%'
UNION ALL
select id,title,author,subject1,author as sort_key from book
  where author like '%France%'


Finally, we'll sort the results using the sort_key column we created.  It
seems appropriate.  To do this, we'll add an ORDER BY sort_key clause to
the end of the query.

Let's put it all together and see what happens when we execute the query.

# select id,title,author,subject1,title as sort_key from book
    where title like '%France%'
  UNION ALL
  select id,title,author,subject1,subject1 as sort_key from book
    where subject1 like '%France%'
  UNION ALL
  select id,title,author,subject1,author as sort_key from book
    where author like '%France%'
  ORDER BY sort_key;

 id |               title                |             author              |      subject1      |      sort_key
----+------------------------------------+---------------------------------+--------------------+--------------------
  3 | We'll never surrender              | Weasel, Ima                     | France--Fiction    | France--Fiction
  4 | See France on twenty weasels a day | Weasel, Ima                     | France--Guidebooks | France--Guidebooks
  2 | A brief history of white flags     | Monkey, Cheese Eating-Surrender |
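
For anyone who wants to try the walkthrough, here is a self-contained sketch of it. It assumes SQLite (via Python's standard sqlite3 module) in place of PostgreSQL, so collation and LIKE case rules may differ slightly from the original session; the table layout and the four rows are taken from the listing earlier in this message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT, subject1 TEXT)"
)
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?, ?)",
    [
        (5, "I hate rich people", "Hollande, François", "Politics--Gaffes and gaffers"),
        (2, "A brief history of white flags", "Monkey, Cheese Eating-Surrender", "France-History"),
        (4, "See France on twenty weasels a day", "Weasel, Ima", "France--Guidebooks"),
        (3, "We'll never surrender", "Weasel, Ima", "France--Fiction"),
    ],
)

# One SELECT per searched field; each copies the matched field into sort_key.
# UNION ALL keeps every match, so a record can come back more than once.
QUERY = """
SELECT id, title, author, subject1, title AS sort_key
  FROM book WHERE title LIKE '%France%'
UNION ALL
SELECT id, title, author, subject1, subject1 AS sort_key
  FROM book WHERE subject1 LIKE '%France%'
UNION ALL
SELECT id, title, author, subject1, author AS sort_key
  FROM book WHERE author LIKE '%France%'
ORDER BY sort_key
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row[0], "|", row[1], "|", row[4])
```

With this data, record 4 is returned twice: once filed under its subject heading and once under its title.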

Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)

2012-05-16 Thread Karen Coyle
Thanks Simon, It's much better to have an actual mock-up than just a 
description.


If I understand this correctly, to do this you run three separate 
queries. If you had been able to use a single query (e.g. if you had an 
overall keyword index), would UNION ALL still have let you retain 
instances where the same keyword appears more than once in the 
record? In other words, I'm wondering if one entry for the weasels 
came from the title and one from the subject heading. If one book had 
two subject headings, could you get this result just from a subject 
heading search? (I'm thinking that searching different indexes that 
match the search key, rather than a single index, is an added factor.)


kc


Re: [RDA-L] RDA, DBMS and RDF (fwd) (fwd)

2012-05-16 Thread Jonathan Rochkind
Certainly you can come up with an infinite number of wrong ways to do it that 
won't get the results you want, with any given technology. I do not understand 
why you are trying to come up with wrong ways to reach this arbitrary goal; you 
seem to be refining your software approaches with the aim of finding something 
that won't work. Why would anyone want to do that?

In addition to a nearly infinite number of wrong ways to accomplish this 
particular goal, there are also a few right ways to do it. There are several 
other designs using an rdbms, in addition to the one Simon prototypes, that 
could also give you the results I think you're describing. It's not entirely 
clear to me that any user actually wants those results, but if they did, we 
could do it, with an rdbms or with something else. The technology used for your 
database or text index or search engine is an implementation detail.
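
As an illustration of one such alternative design (a sketch only, not something anyone in this thread has built): keep subject headings in their own table, one row per heading, and join. A search limited to subject headings then returns a record once per matching heading, or once per record if DISTINCT is used. The sketch assumes SQLite via Python's sqlite3 module; the table name `book_subject` and the second heading for book 4 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE book_subject (book_id INTEGER REFERENCES book(id), heading TEXT)"
)
conn.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [(3, "We'll never surrender"), (4, "See France on twenty weasels a day")],
)
conn.executemany(
    "INSERT INTO book_subject VALUES (?, ?)",
    [
        (3, "France--Fiction"),
        (4, "France--Guidebooks"),
        (4, "Weasels--France"),  # hypothetical second heading for book 4
    ],
)

# One result row per matching heading: book 4 is listed twice.
per_heading = conn.execute(
    """
    SELECT b.id, b.title, s.heading
      FROM book AS b
      JOIN book_subject AS s ON s.book_id = b.id
     WHERE s.heading LIKE '%France%'
     ORDER BY s.heading
    """
).fetchall()

# One result row per record: DISTINCT collapses the repeated book.
per_record = conn.execute(
    """
    SELECT DISTINCT b.id, b.title
      FROM book AS b
      JOIN book_subject AS s ON s.book_id = b.id
     WHERE s.heading LIKE '%France%'
     ORDER BY b.id
    """
).fetchall()

print(per_heading)
print(per_record)
```

Which of the two result shapes is wanted is a display decision; the same stored data supports both.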

Good metadata with the semantics needed to answer the questions you might want 
to put to it (without having to make the computer guess probabilistically) 
matters -- if it's there, systems can be created to do what you want. Sure, 
with an rdbms. Or with specialized inverted indexing tools. Or a combination. Or 
something else.

The best tools will depend on exactly what you're wanting to do, as well as the 
scale (in various dimensions), the current availability/cost of various 
options, etc.  These are questions for programmers and software engineers. If 
the right semantics are captured in the data, the tool can be built -- that is 
the question for metadata engineers and catalogers. (To be sure, some 
understanding of algorithms and other aspects of how computers work is 
important to be able to understand what software can get out of any given data 
modelled/represented in any given way).

I don't understand what you're driving at, what the point of this conversation 
is.



Re: [RDA-L] RDA, DBMS and RDF

2012-05-14 Thread Moore, Richard
Adam

Except that LCSH occupation/profession headings are in the plural,
while RDA terms would be in the singular.  I'm not at all sure that you
could singularize an LCSH heading and still code the subfield $2 of the
374 field for LCSH.  What do others think about this?

I think that if we are to use LCSH terms for occupations in 374, we
should use them as they appear in LCSH: that is, in the plural. It's the
only approach that makes sense to me if we are thinking in terms of
linked data. 

This is the advice I've given to our group of cataloguers who are
creating RDA authorities:

LCSH terms for classes of persons are given in the plural. Use LCSH
terms concisely and only include subdivisions when necessary.
Subdivisions should be indicated with a double dash.


_
Richard Moore 
Authority Control Team Manager 
The British Library

Tel.: +44 (0)1937 546806
E-mail: richard.mo...@bl.uk
 
 



Re: [RDA-L] RDA, DBMS and RDF

2012-05-14 Thread James Weinheimer
On 13/05/2012 19:49, Karen Coyle wrote:
snip
 All,

 After struggling for a long time with my frustration with the
 difficulties of dealing with MARC, FRBR and RDA concepts in the
 context of data management, I have done a blog post that explains some
 of my thinking on the topic:

 http://kcoyle.blogspot.com/2012/05/rda-dbms-rdf.html

 The short summary is that RDA is not really suitable for storage and
 use in a relational database system, and therefore is even further
 from being suitable for RDF. I use headings (access points in RDA, I
 believe) as my example, but there are numerous other aspects of RDA
 that belie its intention to support scenario one.

 I have intended to write something much more in depth on this topic
 but as that has been in progress now for a considerable time, I felt
 that a short, albeit incomplete, explanation was needed.

 I welcome all discussion on this topic.
/snip

This is really good. I question whether libraries primarily need a new
relational database model for our catalogs, especially one based on
FRBR. I still have never seen a practical advantage over what can be
done now. The power of the Lucene-type full-text engines, the
searches they allow, and their speed are simply stunning, and nothing can
compare to them right now. There are versions such as the Zebra indexing
system in Koha, which was created for bibliographic records and is very
similar to Lucene: see http://www.indexdata.com/zebra and the guide at
http://www.indexdata.com/zebra/doc/zebra.pdf.

A relational database would be far too slow if used in conjunction with
a huge database such as Google. So, some catalogs use the DBMS only for
record maintenance, then everything is indexed in Lucene for searching,
while the displays are made from the XML versions of the records. The
DBMS is there only for storage and maintenance. This is how Koha works
and could be more or less how Worldcat works as well, but these are not
the only catalogs that work like this.
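
The division of labour described above can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) stands in for the maintenance DBMS, a plain Python dictionary stands in for a Lucene-type index, and display data is read back from the store rather than from XML versions of the records. The names `build_index` and `search`, the `record` table, and the two sample records are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# The DBMS holds the records for storage and maintenance only.
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT, subject TEXT)")
store.executemany(
    "INSERT INTO record VALUES (?, ?, ?)",
    [
        (1, "Hypothetical guide to Paris", "France--Guidebooks"),
        (2, "Hypothetical novel", "Fiction"),
    ],
)


def build_index(conn):
    """Build a keyword -> record ids index from everything in the store."""
    index = defaultdict(set)
    for rec_id, title, subject in conn.execute("SELECT id, title, subject FROM record"):
        for word in f"{title} {subject}".replace("--", " ").lower().split():
            index[word].add(rec_id)
    return index


def search(index, conn, keyword):
    """Search the separate index, then fetch display data for the hits."""
    ids = sorted(index.get(keyword.lower(), set()))
    return [
        conn.execute("SELECT id, title FROM record WHERE id = ?", (i,)).fetchone()
        for i in ids
    ]


index = build_index(store)
print(search(index, store, "france"))
```

Searching never touches the relational tables until a hit has to be displayed, which is the point of keeping the two layers apart.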

Still, I will say that much of this lies beyond the responsibility of
cataloging per se, and goes into that of systems.

But on the other hand, your point that library headings are not
relational and are actually based on browsing textual strings really
is a responsibility of cataloging. It is also absolutely true and should
be a matter of general debate. The text strings haven't worked in years
because what worked rather clearly in a card catalog did not work
online. I've written about this before, but there was a discussion on
Autocat not too long ago. Here is one of my posts where I discussed the
issue and offered an alternative to the current display of the headings
found under Edgar Allan Poe:
http://blog.jweinheimer.net/2012/04/re-acat-death-of-dictionary-catalog-was.html

I still maintain that we do not really know what the public wants yet.
Everything is in a state of change right now, so it will take a lot of
research, along with trial and error, to find out. I do think that
people would want the traditional power of the catalog, but they will
not use left-anchored text strings. The way it works now is far too
clunky and new methods for the web must be found. Paths such as you
point out would lead to genuine change and possible improvements in how
our catalogs function for the public, which is the major road we need to
take.

-- 
*James Weinheimer* weinheimer.ji...@gmail.com
*First Thus* http://catalogingmatters.blogspot.com/
*Cooperative Cataloging Rules*
http://sites.google.com/site/opencatalogingrules/
*Cataloging Matters Podcasts*
http://blog.jweinheimer.net/p/cataloging-matters-podcasts.html


Re: [RDA-L] RDA, DBMS and RDF

2012-05-14 Thread Tillett, Barbara
The authorized access point part of RDA is one of the carryovers from AACR2, 
which we hope eventually will become unnecessary in a Scenario 1 environment, 
other than as a default display form.

There are several areas of RDA that had to be carried over from AACR2 simply 
because discussions with the relevant communities had not been completed (e.g., 
with the Music community, law, religion, etc. - and those discussions are 
underway).  We also will be renewing conversations with the publishing 
community to revisit the RDA/ONIX framework.  RDA will continue to evolve and 
improve with the help of our international collaborations.

- Barbara Tillett, Chair, Joint Steering Committee for Development of RDA



Re: [RDA-L] RDA, DBMS and RDF

2012-05-14 Thread Kuhagen, Judith
Three possible scenarios are described in Tom Delsey's paper RDA Database 
Implementation Scenarios available on the JSC web site 
(http://www.rda-jsc.org/docs/5editor2rev.pdf).  

Judy Kuhagen, Secretary
Joint Steering Committee for Development of RDA




Re: [RDA-L] RDA, DBMS and RDF

2012-05-14 Thread Karen Coyle

On 5/14/12 3:43 AM, Tillett, Barbara wrote:

 The authorized access point part of RDA is one of the carryovers from AACR2, which we 
 hope eventually will become unnecessary in a Scenario 1 environment, other than as a 
 default display form.

Barbara, can you say more about this? Do you have examples? (Or could 
you make some up?) What type of retrieval would be made on RDA records 
compared to how we retrieve on records today? Has anyone mocked up data 
displays (that aren't in MARC)?


It might be that I just haven't found the right site or documentation 
that answers my questions.


kc


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


Re: [RDA-L] RDA, DBMS and RDF

2012-05-14 Thread Karen Coyle
Mac, I did a search on the subject term France, and on the 3rd page of 
hits (sorted by title) there were two titles that seemed to be for the 
same item. They turn out to be two separate records, because there are 
two volumes.


Here's the case that I'm trying to get to -- let's say you have a record 
with 3 subject headings:


Working class -- France
Working class -- Dwellings -- France
Housing -- France

In a card catalog, these would result in 3 separate cards, and therefore, 
should you look all through the subject card catalog, you would see the 
book in question 3 times.


In a keyword search limited to subject headings, most systems would 
retrieve this record once and display it once. That has to do with how 
the DBMS resolves from indexes to records. So even though a keyword may 
appear more than once in a record, the record is only retrieved once.


In your catalog, which displays the subject headings on a line with the 
author and title:

1) will each of these subject headings appear in the display?
2) does that mean that the bibliographic record (represented by the 
author and title) will display 3 times in the list of retrievals?
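
The difference between the two behaviours can be sketched with a toy keyword index. This is an illustration only, not a description of any particular catalog system: `record_index` maps a keyword to record ids (each record retrieved once), while `heading_index` maps a keyword to individual headings (one entry per heading, like the cards). The record id and both index names are hypothetical; the headings are the three listed above.

```python
from collections import defaultdict

RECORD_ID = 1  # hypothetical id for the single bibliographic record
headings = [
    "Working class -- France",
    "Working class -- Dwellings -- France",
    "Housing -- France",
]

# keyword -> set of record ids: however many headings match, the record appears once.
record_index = defaultdict(set)
# keyword -> list of (record id, heading): one entry per matching heading.
heading_index = defaultdict(list)

for heading in headings:
    for word in heading.replace("--", " ").lower().split():
        record_index[word].add(RECORD_ID)
        entry = (RECORD_ID, heading)
        if entry not in heading_index[word]:
            heading_index[word].append(entry)

print(len(record_index["france"]))   # hits when the index resolves to records
print(len(heading_index["france"]))  # hits when the index resolves to headings
for rec_id, heading in heading_index["france"]:
    print(rec_id, "|", heading)
```

A search on "france" gives one hit against the record-level index and three against the heading-level index, which is the card-catalog behaviour the two questions are asking about.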


kc

On 5/14/12 3:02 PM, J. McRee Elrod wrote:

Karen,

Because ebrary (through whom CEL and some other clients distribute
MARC records) can only accommodate one 856$u per record, those clients
must have a monograph record for each volume of a multivolume set, and
each issue of a serial (e.g., yearbooks) having its own PDF URL.

I suspect that is why you saw what appeared to be the same record more
than once.  When an individual volume has a distinctive title, that
title goes in 245$a, and the set or serial title in 490/8XX.  But if
not, we must use 245$n, with the set or serial title in 245$a.

As I keep saying over and over and over, our problems arise from
systems limitations, not ISBD/AACR2/MARC21 limitations.  The building
should have received our attention before the building blocks.

If what you saw was because of a 245 and a 246 being very similar, or
for some other reason, please cite an example and Matt can tell you
how his OPAC handles that.


__   __   J. McRee (Mac) Elrod (m...@slc.bc.ca)
   {__  |   / Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
   ___} |__ \__




---------- Forwarded message ----------
Date: Mon, 14 May 2012 10:26:19 -0700
From: Matt Elrod m...@elrod.ca
To: J. McRee Elrod m...@slc.bc.ca
Subject: Re: [RDA-L]  RDA, DBMS and RDF

Mac,

I would need to know which title seems to appear twice in a hit list to
answer this question.  Distinct records might *appear* to be duplicates
for multi-volume sets for example.  Recall that SLC sometimes creates
redundant monograph records to handle sets and serials.

Matt

On 14/05/2012 9:58 AM, J. McRee Elrod wrote:

Karen asked:


Mac, I'd love to see your file design. I did find an example of a record
that appears more than once in a single list, and I am wondering if you
had to replicate the record in the database to accomplish that, or if
you have another way to retrieve a record more than once on a single
keyword retrieval.

I'm copying your question to the designer (matt@elrod), who should be able
to answer your question.


http://www.canadianelectroniclibrary.ca/cel-arc.html

 __   __   J. McRee (Mac) Elrod (m...@slc.bc.ca)
{__  |   / Special Libraries Cataloguing   HTTP://www.slc.bc.ca/
___} |__ \__


--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


[RDA-L] RDA, DBMS and RDF

2012-05-13 Thread Karen Coyle

All,

After struggling for a long time with my frustration with the 
difficulties of dealing with MARC, FRBR and RDA concepts in the context 
of data management, I have done a blog post that explains some of my 
thinking on the topic:


http://kcoyle.blogspot.com/2012/05/rda-dbms-rdf.html

The short summary is that RDA is not really suitable for storage and use 
in a relational database system, and therefore is even further from 
being suitable for RDF. I use headings (access points in RDA, I 
believe) as my example, but there are numerous other aspects of RDA that 
belie its intention to support scenario one.


I have intended to write something much more in depth on this topic but 
as that has been in progress now for a considerable time, I felt that a 
short, albeit incomplete, explanation was needed.


I welcome all discussion on this topic.

kc

--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet


[RDA-L] RDA, DBMS and RDF

2012-05-13 Thread Brenndorfer, Thomas

From: Resource Description and Access / Resource Description and Access 
[RDA-L@LISTSERV.LAC-BAC.GC.CA] On Behalf Of Karen Coyle [li...@kcoyle.net]
Sent: May-13-12 1:49 PM
To: RDA-L@LISTSERV.LAC-BAC.GC.CA
Subject: [RDA-L] RDA, DBMS and RDF

All,

After struggling for a long time with my frustration with the
difficulties of dealing with MARC, FRBR and RDA concepts in the context
of data management, I have done a blog post that explains some of my
thinking on the topic:

http://kcoyle.blogspot.com/2012/05/rda-dbms-rdf.html


The reason that RDA continues to instruct catalogers to create pre-coordinated 
headings is that RDA supports backwards compatibility with existing catalog 
implementations (Scenarios 2 and 3).

Authorized access points are otherwise treated as sidebars in RDA-- they're 
separated in RDA from instructions for the constituent elements, and recognized 
throughout as only one method of identifying a specific entity. (There 
should be only one unique authorized access point per entity, so it's not a 
valid criticism to say they each point to only one or a very small number of 
records. There are some weaknesses, though, such as undifferentiated persons 
and expression headings for translations that do not go far enough in 
differentiating expression entities.)

The elements that constitute authorized access points have been separated out 
in MARC because of RDA (such as fuller form of name -- 378; form of work -- 380; 
dates -- 046, with the dates encoded according to externally referenced 
standards, ISO 8601 and EDTF). Other elements, such as Field of Activity or 
Occupation, can be linked to controlled vocabulary terms, such as LCSH headings.

An element like Undifferentiated Name Indicator (RDA 8.11) refers only to the 
core elements as being insufficient to differentiate between two or more 
persons with the same name. It does not refer to the concatenated authorized 
access point as being insufficient to differentiate the entity. Generally, 
there is a separation between consideration of the elements and the 
construction of an authorized access point, although there is overlap.

That being said, there is some bleeding of the concept of the authorized 
access point into decisions about individual elements. An example is Preferred 
Title of the Work (RDA 6.2.2.1): the preferred title of the work is the 
title or form of title chosen as the basis for the authorized access point 
representing the work.

There is room for improvement in supporting connections between data. But it's 
not always necessary -- one doesn't always need a separate lookup table for an 
element's value (even pre-set drop-down values can be built into a single table 
without referencing an external table, similar to using macros or keyboard 
shortcuts to create quasi-controlled free-text strings).
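
As a rough sketch of that trade-off, the two options can be put side by side 
in SQLite: preset values built into a single table through a CHECK constraint, 
versus a separate lookup table linked by key. The table names, column names 
and the short list of values are assumptions made up for illustration; they 
are not taken from RDA, MARC or any ILS.

```python
import sqlite3

# Hypothetical preset values for illustration; not an official RDA vocabulary.
FORMS_OF_WORK = ("Play", "Novel", "Motion picture")


def open_db():
    """Open an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def build_single_table(conn):
    """Option 1: allowed values live in a CHECK constraint on the one table."""
    allowed = ", ".join("'" + v.replace("'", "''") + "'" for v in FORMS_OF_WORK)
    conn.execute(
        "CREATE TABLE work_inline ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        f" form_of_work TEXT CHECK (form_of_work IN ({allowed})))"
    )


def build_lookup_tables(conn):
    """Option 2: allowed values live in a separate table, linked by key."""
    conn.execute(
        "CREATE TABLE form_of_work ("
        " id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO form_of_work (label) VALUES (?)",
        [(v,) for v in FORMS_OF_WORK],
    )
    conn.execute(
        "CREATE TABLE work_linked ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " form_id INTEGER REFERENCES form_of_work(id))"
    )


def try_insert(conn, sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

In the first option a display needs no join, but changing the list of values 
means changing the table definition. In the second, the label is fetched 
through a join, and the list can be maintained as ordinary data.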

Some ideas for improving RDA that follow from the points raised:

- Separate out Authorized Access Points entirely from the numbered 
instructions. Treat them as a sidebar, and have side-by-side links to the 
instructions for each individual element so one can see all the relevant 
instructions as one is constructing an authorized access point. This will 
further solidify the idea that Authorized Access Points are creatures belonging 
to some catalog implementations, but may not be needed in others.

- Make better use of the FRAD distinction between the Name and the Actual 
Entity (RDA treats the Name of a Person as an attribute of a Person, whereas it 
would be better to see the Name as being related to the Person entity). The 
reason why this might be useful is that the Names of entities can be linked to 
the sources found (i.e., to specific manifestations, so as to track the 
frequency of usage of a name). This is better than the "justify the added 
entry" concept, which is the old standby for determining frequency of usage. 
The preferred forms of names of entities do change, and this would continue 
to be the case in all implementations, even in those that don't use 
Authorized Access Points!

- Continue to identify elements that can be linked to lookup tables or 
controlled vocabulary. But do not sacrifice the principle of representation -- 
there is a need to identify how an entity represents itself and transcribe as 
found versus normalizing the data for better machine-processing. Both 
objectives can co-exist (as they do to some extent now, such as with MARC fixed 
fields representing controlled terms to go with variable transcribed text 
fields).

- Allow RDA content instructions to easily merge with specific encoding rules. 
The RDA Element Set has started this by combining RDA instructions with 
related MARC instructions, but there is a need for a streamlined set of 
instructions that can leap from content instructions directly into encoding 
rules for specific applications-- and ideally right down to an ILS's specific 
conventions.
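
To make the second idea above more concrete, here is a minimal sketch in which 
a name form is a separate record related to a Person and tied to the 
manifestation where it was found, so that frequency of usage can be counted. 
The class names, method names and identifiers are hypothetical and come from 
neither RDA nor FRAD, and the alphabetical tie-break is an arbitrary choice.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameUsage:
    """One name form as found on one manifestation (identifiers are made up)."""
    name_form: str
    manifestation_id: str


@dataclass
class Person:
    """The entity itself; names are related records, not a single attribute."""
    person_id: str
    usages: list = field(default_factory=list)

    def record_usage(self, name_form, manifestation_id):
        """Link a name form to the manifestation on which it appears."""
        self.usages.append(NameUsage(name_form, manifestation_id))

    def usage_counts(self):
        """Count distinct manifestations per name form."""
        seen = {(u.name_form, u.manifestation_id) for u in self.usages}
        return Counter(form for form, _ in seen)

    def most_frequent_form(self):
        """Return the most used name form, or None if nothing is recorded."""
        counts = self.usage_counts()
        if not counts:
            return None
        # Ties are broken alphabetically so the result is deterministic.
        return min(counts, key=lambda form: (-counts[form], form))
```

Because the count is derived from the links, the most frequent form can shift 
as new manifestations are added, with or without an authorized access point 
built on top of it.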

Thomas Brenndorfer
Guelph Public Library