Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-06 Thread Matthias Steffens
Bruce,

thanks for the detailed explanation and the examples!

On Thu, 6-Apr-2006 09:20 -0400, Bruce D'Arcus wrote:

> Let's imagine a document where only two article citations are added:
> article X with a DOI, and article Y without. Your citation references
> look like this internally:
> 
>   
>   http://refbase.net/msteffens/doe99";]

Yes, I agree. Also, this would store all required bits together:
database URL, username, citekey. And it would make it unnecessary to
globally specify a preferred database. Instead, this information would
be specific to every reference, which, in turn, would allow you to use
multiple databases or a distributed database system.
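
A minimal sketch of what storing "all required bits together" could
look like, assuming the per-reference pointer is a URL of the form
<database>/<username>/<citekey> (as in the refbase.net example above).
RecordLocator and parse_locator are hypothetical names used only for
illustration; nothing here is an existing RefBase or OOo API.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class RecordLocator(NamedTuple):
    database: str  # base URL of the bibliographic database
    username: str  # owner of the record within that database
    citekey: str   # user-specific cite key


def parse_locator(url: str) -> RecordLocator:
    """Split a URL of the assumed form <database>/<username>/<citekey>."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2:
        raise ValueError(f"expected <username>/<citekey>, got {parts.path!r}")
    username, citekey = segments
    return RecordLocator(f"{parts.scheme}://{parts.netloc}", username, citekey)


locator = parse_locator("http://refbase.net/msteffens/doe99")
print(locator)
```

Because each reference carries its own database, no global "preferred
database" setting is needed to resolve it.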


> Scenario 1: single author, single database (you)
> 
> 
> You have some config option that says your preferred database. If
> your embedded bibliography list needs to be regenerated, it asks
> RefBase for those records (again).
> 
> There is no difficulty at all because the database is the same.

With your above example, this is only true if the username also gets
transferred when querying the database for record X (which was keyed by
DOI). Otherwise the database won't be able to find your own record.


> Scenario 2: multiple authors, multiple databases
> 
> 
> User A adds the citations, and user B receives that document with the
> embedded data.
[...]
>   - "user:b:smith04" is added to local record as a possible id
> for item X
>   - "info:doi/12345464565" is added to local record as a
> possible id for item Y

This is a crucial bit that wasn't clear to me before. If both the
user-specific and the generic information are added to the embedded
metadata and ARE actually used when resolving references, both keying
methods can be applied depending on the situation. This is what I'm
asking for.

> By specifying your preferred database(s), and always embedding the
> bib data in the wrapper, you're assured of getting the correct
> records and of having the logic there to resolve against different
> databases.

Yes. But this will only work if the user's username for the preferred
database is stored as well - and if the user's records are updated with
the generic identifiers where missing.


So, in summary, if the user-specific access info is stored for every
reference (and user) in the metadata, such as in:

 

and if the user can ask OOo to prefer his own records whenever
possible, then I'll be happy :-). And you're right that, if these
conditions are met, I don't care how OOo handles its record keying
internally.

Matthias

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-06 Thread Bruce D'Arcus


On Apr 6, 2006, at 5:49 AM, Matthias Steffens wrote:


> It should not be an exclusive choice whether a user uses one mechanism
> or the other, BOTH mechanisms should be provided at the same time (if
> available). This will greatly increase the chance that the bibliographic
> database will find the best/desired record.


I think you're worrying too much about this.  Let me explain with 
examples.


Let's imagine a document where only two article citations are added: 
article X with a DOI, and article Y without. Your citation references
look like this internally:



http://refbase.net/msteffens/doe99";]


The associated bibliographic records are embedded in the file wrapper, 
each identified with those ids.


Two scenarios:

Scenario 1: single author, single database (you)


You have some config option that says your preferred database. If your 
embedded bibliography list needs to be regenerated, it asks RefBase for 
those records (again).


There is no difficulty at all because the database is the same.

Scenario 2: multiple authors, multiple databases


User A adds the citations, and user B receives that document with the 
embedded data.


Let's say user B's database, to make things complicated, contains 
reference X, but did not include the DOI when storing it. Conversely, 
they do have the DOI for reference Y.


In other words, if user B had added the same citations, the pointers 
would look like this:





At this point, it doesn't matter what id schema you use, because you 
have a conflict. You'd have the SAME conflict if both users used 
user-specific labels for ids.  OOoBib, in looking at user B's database, 
cannot find either of these records just based on the id.


However, because the metadata is embedded in the file wrapper, it's 
possible to resolve this fairly easily.


	* look up the corresponding references based on embedded metadata
	  (title, etc.)
	* when found, offer user choice to update local and embedded data; if
	  yes:
		- "user:b:smith04" is added to local record as a possible id for
		  item X
		- "info:doi/12345464565" is added to local record as a possible id
		  for item Y
		- citations are updated so that both use the universal id, you end
		  up with:





When user A gets the document back, the same normalization process
happens, so that both databases contain the equivalent identifiers.
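
The steps above can be sketched roughly as follows. This is only an
illustration under stated assumptions: records are plain dicts, the
match on embedded metadata is an exact match on a normalized title,
the set of "universal" schemes is limited to two prefixes, and the
step of asking the user for confirmation is left out. All names and
the id "info:doi/10.1000/x" are hypothetical.

```python
UNIVERSAL_PREFIXES = ("info:doi/", "urn:isbn:")  # assumed universal schemes


def is_universal(identifier):
    return identifier.startswith(UNIVERSAL_PREFIXES)


def normalize_title(title):
    return " ".join(title.lower().split())


def reconcile(citation_ids, embedded, local_db):
    """Rewrite citation ids to universal ids where the local db allows it.

    embedded maps each citation id to its embedded metadata (needs 'title').
    local_db is a list of dicts with 'title' and 'ids' (a set); it is
    updated in place with any ids learned from the document.
    """
    by_title = {normalize_title(r["title"]): r for r in local_db}
    resolved = []
    for cid in citation_ids:
        record = next((r for r in local_db if cid in r["ids"]), None)
        if record is None:
            # id unknown locally: fall back to the embedded metadata
            record = by_title.get(normalize_title(embedded[cid]["title"]))
        if record is None:
            resolved.append(cid)  # no match at all; keep the citation as-is
            continue
        record["ids"].add(cid)  # remember the other user's id locally
        universal = sorted(i for i in record["ids"] if is_universal(i))
        resolved.append(universal[0] if universal else cid)
    return resolved


# User B's database: X stored without a DOI, Y stored with one.
local_db = [
    {"title": "Article X", "ids": {"user:b:smith04"}},
    {"title": "Article Y", "ids": {"info:doi/12345464565"}},
]
# User A's document: X cited by DOI, Y cited by a user-specific key.
embedded = {
    "info:doi/10.1000/x": {"title": "Article X"},
    "user:a:doe99": {"title": "article  y"},
}
new_ids = reconcile(list(embedded), embedded, local_db)
print(new_ids)
```

After the call both citations point at a universal id, and user B's
local records know every id under which the two items have been cited.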


The document is now portable. If you send this to a publisher, they can 
extract the citations and make sense of them.


In any case, the business of identifying is mostly orthogonal to proper 
resolution.  Precisely because we embed the bib metadata, the only 
thing that REALLY matters is whether the citation ids match an 
equivalent bib record.


This is why I say, BTW, it makes no sense to use user-based keys by 
default. If you have a DOI or ISBN, use it; you lose nothing. By 
specifying your preferred database(s), and always embedding the bib 
data in the wrapper, you're assured of getting the correct records and
of having the logic there to resolve against different databases.


Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-06 Thread Matthias Steffens
On Wed, 5-Apr-2006 21:07 -0400, Bruce D'Arcus wrote:

> On Apr 5, 2006, at 6:06 PM, Matthias Steffens wrote:
> 
> >> I'd also perhaps default to my database and user account, with
> >> options to ping other servers if data is missing.
> >
> > Yes, exactly!
> >
> >> Wouldn't that solve the problems with the best balance of concerns?
> >
> > Yes.
> 
> So do the two "yes" responses suggest I don't need to respond to the
> previous objections?

Maybe I'm misunderstanding things here. The above comments were given
under the impression that multiple keys (database-independent as well
as database-dependent) would be possible and that *both* would be
actively used by OOo.

If only one keying method is possible as you indicated in one of your
previous emails, then my previous objections still stand.

It should not be an exclusive choice whether a user uses one mechanism
or the other, BOTH mechanisms should be provided at the same time (if
available). This will greatly increase the chance that the bibliographic
database will find the best/desired record.

Matthias




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 6:06 PM, Matthias Steffens wrote:


>> I'd also perhaps default to my database and user account, with
>> options to ping other servers if data is missing.
>
> Yes, exactly!
>
>> Wouldn't that solve the problems with the best balance of concerns?
>
> Yes.


So do the two "yes" responses suggest I don't need to respond to the 
previous objections?


Bruce




[dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matej Cepl
Bruce D'Arcus wrote:
> Matej is just asking for the ability to regenerate the complete list,
> rather than only allow appending and deleting individual items. That's
> probably sensible in general.

Certainly, all used references should be (in the original format-agnostic
shape) included in ODT document for easy transport between different
computers and users.

Matej

-- 
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
http://www.ceplovi.cz/matej/blog/, Jabber: [EMAIL PROTECTED]
23 Marion St. #3, (617) 876-1259, ICQ 132822213
 
Of course I'm respectable. I'm old. Politicians, ugly buildings,
and whores all get respectable if they last long enough.
  --John Huston in "Chinatown."





Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread David Wilson
On Thursday 06 April 2006 4:24 am, Matthias Steffens wrote:
> On Wed, 5-Apr-2006 10:27 -0400, Bruce D'Arcus wrote:

>
> Yes, that's a very nice feature. However, when I'm writing a paper, 95%
> of the cited references do already exist in my bibliographic database
> and I want to use these (and not a copy from somewhere else) since I
> know that I've verified my own entries for correctness (multiple
> times). The same cannot be said for any remotely fetched data and I'd
> need to check each entry for correctness. 
>
(If you wonder why I make late entries into some of the discussions - it's
because I am not up at 2:30 am)

Yes I agree, we cannot assume that library catalogues are correct - even the
sainted US LOC. I was told recently that a common library cataloguing
practice, and one used at my university, is that when a new book comes in to
be catalogued, the cataloguer does a world-wide library search and copies the
first cataloguing entry found. Now if they all do this, all the libraries have
copies of the very first cataloguing entry produced for that book by X from
library Y, and X may not be all that skilled at writing them because he or she
mostly spends their time copying other libraries' efforts.
 
This also partly explains why books on the same topics are not always together
on the shelves.

Also the libraries I have used often have problems collecting  the books of 
one author under the same author listing. So you have books by 

Smith Fred, S
Smith Fred, S  (1934- )
Smith Fred, S  (1934-1987)

(Which will look poor in your Dissertation, and be even worse if you assumed 
they were different people)

So the point is that collecting internet cataloguing data will not be a magic 
corrector of data. Useful, but it will still need checking by the user.

David
-- 
---
David N. Wilson
Co-Project Lead for the Bibliographic 
OpenOffice Project
http://bibliographic.openoffice.org




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matthias Steffens
On Wed, 5-Apr-2006 15:29 -0400, Bruce D'Arcus wrote:

> >>> Ideally, multiple identifiers would be stored and sent to the
> >>> bibliographic database which could then decide what to do.
> >>
> >> Yes, ideally. But I'm not sure how practical that is (to get
> >> implemented).
> >
> > The database would need to perform additional queries if the first
> > choice doesn't result in a single record being found. I think that
> > this is feasible though.
> 
> I'm talking more at the document format level. The citation proposal
> only allows one key/id.

Does that mean that you can only specify one single identifier? Be it
ISBN, DOI, local record ID or user-specific cite key? You can't specify
multiple identifiers? This would mean that all our discussion is
meaningless, wouldn't it?

> Allowing more complicated coding in an already complicated spec would
> no doubt be controversial for the TC, and for implementors. Moreover,
> it would treat citations as a special class of object, which would
> also probably be controversial.

I understand that. But the TC folks should also understand that the
entire bibliographic database will be completely useless, if people
can't link to their own records. This will be a major frustration.
Being modern and ideal is nice, but if it only suits 5% of the crowd,
something is wrong.

> > Sounds reasonable, but maybe it should read:
> >
> >  "For identifying citations, prefer:"
> >
> > "Prefer" would indicate that both identifiers will be used but with
> > different priorities.
> 
> Yes, absolutely. And come to think of it, there should be another 
> config option for preferred sources, with optional user parameter(s).

Yes, personal info such as usernames may be different across the
various databases.

> As far as I can see, the ONLY reason to have a natural language key
> is because one doesn't have a universal identifier. Your concern is
> mostly about *where* you get your records from, not how you identify
> them (you want *your* records because you trust them).

Basically, yes. It's not only the source ("*where*") but I want
specifically my own records ("*your*"). So even within the same source,
I don't want the buggy & incomplete records of my colleague but my own
ones.

> So for me, I'd want a rule that says to use universal ids wherever
> possible, and to fall back to a label I provide where necessary.

I'd say it the other way 'round: prefer my own records wherever
possible but fall back to universal ids where necessary, e.g. if nothing
is found or when collaborating with others.

> I'd also perhaps default to my database and user account, with
> options to ping other servers if data is missing.

Yes, exactly!

> Wouldn't that solve the problems with the best balance of concerns?

Yes.

Matthias




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 3:13 PM, Matthias Steffens wrote:


>>> Ideally, multiple identifiers would be stored and sent to the
>>> bibliographic database which could then decide what to do.
>>
>> Yes, ideally. But I'm not sure how practical that is (to get
>> implemented).
>
> The database would need to perform additional queries if the first
> choice doesn't result in a single record being found. I think that this
> is feasible though.


I'm talking more at the document format level. The citation proposal 
only allows one key/id. Allowing more complicated coding in an already 
complicated spec would no doubt be controversial for the TC, and for 
implementors. Moreover, it would treat citations as a special class of 
object, which would also probably be controversial.



> Sounds reasonable, but maybe it should read:
>
>  "For identifying citations, prefer:"
>
> "Prefer" would indicate that both identifiers will be used but with
> different priorities.


Yes, absolutely. And come to think of it, there should be another 
config option for preferred sources, with optional user parameter(s).


As far as I can see, the ONLY reason to have a natural language key is 
because one doesn't have a universal identifier. Your concern is mostly 
about *where* you get your records from, not how you identify them (you 
want *your* records because you trust them).


I.e., this goes back to my distinction between identification and
sourcing.


So for me, I'd want a rule that says to use universal ids wherever
possible, and to fall back to a label I provide where necessary. I'd
also perhaps default to my database and user account, with options to
ping other servers if data is missing.


Wouldn't that solve the problems with the best balance of concerns?

Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matthias Steffens
On Wed, 5-Apr-2006 14:45 -0400, Bruce D'Arcus wrote:

> On Apr 5, 2006, at 2:24 PM, Matthias Steffens wrote:
> 
> > However, when I'm writing a paper, 95% of the cited references do 
> > already exist in my bibliographic database and I want to use these 
> > (and not a copy from somewhere else)

> OK, but I take it you're using RefBase; a single database?
> 
> What do you do for Matt, who has different databases, where the same
> reference has different local db numbers and cite keys?

Yes, that must be taken into account. A global source identifier such
as a URI would be optimal. But since local/internal databases will also
be used, I guess that the easiest method would be for the document to
store IDs recording which citations/references originated from which
sources:

 source:xxx
 person:[EMAIL PROTECTED]:smith99

> > Ideally, multiple identifiers would be stored and sent to the 
> > bibliographic database which could then decide what to do.
> 
> Yes, ideally. But I'm not sure how practical that is (to get 
> implemented).

The database would need to perform additional queries if the first
choice doesn't result in a single record being found. I think that this
is feasible though.
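
A small sketch of such a query loop, assuming the database is handed a
priority-ordered list of identifiers and accepts a result only when
exactly one record matches. The "field:value" identifier format, the
RECORDS list and both function names are made up for this example.

```python
def find_unique(identifiers, search):
    """Try identifiers in priority order; return the first unambiguous hit."""
    for identifier in identifiers:
        matches = search(identifier)
        if len(matches) == 1:
            return matches[0]
    return None  # every identifier was either unknown or ambiguous


RECORDS = [
    {"citekey": "smith99", "owner": "a", "isbn": "32343545"},
    {"citekey": "smith99", "owner": "b", "isbn": "11111111"},
]


def search(identifier):
    """Toy lookup: an identifier is '<field>:<value>'."""
    field, _, value = identifier.partition(":")
    return [r for r in RECORDS if r.get(field) == value]


# The cite key alone is ambiguous (two users share it), so the second
# identifier decides.
hit = find_unique(["citekey:smith99", "isbn:32343545"], search)
print(hit)
```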

> My sense is that we could have rules and configuration options to set 
> these options. Am not exactly sure what they'd be, but it probably 
> wouldn't be too hard to figure out. Maybe:
> 
> For identifying citations, use:
> 
>   universal identifiers (enhances portability)
>   user-specific labels

Sounds reasonable, but maybe it should read:

 "For identifying citations, prefer:"

"Prefer" would indicate that both identifiers will be used but with
different priorities.

Matthias




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 2:24 PM, Matthias Steffens wrote:

> However, when I'm writing a paper, 95% of the cited references do
> already exist in my bibliographic database and I want to use these
> (and not a copy from somewhere else) since I know that I've verified
> my own entries for correctness (multiple times). The same cannot be
> said for any remotely fetched data and I'd need to check each entry
> for correctness. This is just an example.


OK, but I take it you're using RefBase; a single database?

What do you do for Matt, who has different databases, where the same 
reference has different local db numbers and cite keys?


FWIW, the way Endnote handles this is that citations include author and 
year, so if it can't find the proper record by id, it uses those to 
present users choices.
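
A rough sketch of that kind of fallback. This is not Endnote's actual
implementation, only an illustration of matching on author and year
when the id cannot be resolved; the record layout and function name
are assumptions.

```python
def candidates_by_author_year(citation, records):
    """Return local records that share the citation's author and year."""
    author = citation["author"].casefold()
    return [
        r for r in records
        if r["author"].casefold() == author and r["year"] == citation["year"]
    ]


records = [
    {"id": 17, "author": "Smith", "year": 1999, "title": "First paper"},
    {"id": 42, "author": "Smith", "year": 1999, "title": "Second paper"},
    {"id": 43, "author": "Doe", "year": 1999, "title": "Other"},
]

# The citation's id (999) is unknown locally, so author and year are
# used to build the list of choices shown to the user.
choices = candidates_by_author_year(
    {"id": 999, "author": "smith", "year": 1999}, records
)
print([r["id"] for r in choices])
```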



> My point here is that it really depends on the user's specific needs.

True.

> Thus, the solution should be to simply allow for both methods.

OK.

> Ideally, multiple identifiers would be stored and sent to the
> bibliographic database which could then decide what to do.


Yes, ideally. But I'm not sure how practical that is (to get 
implemented).


> One logic could be: If the database-dependent information (username,
> cite key, local record ID) can be resolved, prefer this method to
> fetch the user's personal entry, otherwise try to fetch the data from
> trusted sources (such as LoC) using the database-independent
> identifiers.


I think I'd separate this out further:

1) how to identify (local vs. universal id)
2) how to locate (generic vs. user-based)

As I said before, one could use an isbn-based uri to grab a record from 
a local db.


My sense is that we could have rules and configuration options to set 
these options. Am not exactly sure what they'd be, but it probably 
wouldn't be too hard to figure out. Maybe:


For identifying citations, use:

universal identifiers (enhances portability)
user-specific labels
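
One way such options might be modelled, keeping the two questions
(how to identify, how to locate) as separate settings. This is a
hypothetical sketch: the class names, the "universal"/"user-label"
values and the example.org URL are illustrative assumptions, not an
existing OOoBib configuration format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    url: str
    username: Optional[str] = None  # optional user parameter per source


@dataclass
class CitationPrefs:
    # 1) how to identify: kinds of id, most preferred first
    id_preference: List[str] = field(
        default_factory=lambda: ["universal", "user-label"]
    )
    # 2) how to locate: databases to ask, most preferred first
    sources: List[Source] = field(default_factory=list)

    def choose_id(self, universal_id: Optional[str], user_label: str) -> str:
        """Pick the id to write into the document for one citation."""
        for kind in self.id_preference:
            if kind == "universal" and universal_id:
                return universal_id
            if kind == "user-label":
                return user_label
        return user_label


prefs = CitationPrefs(
    sources=[
        Source("http://refbase.net", username="msteffens"),
        Source("http://example.org/catalogue"),
    ]
)
with_isbn = prefs.choose_id("urn:isbn:32343545", "smith99")
without_isbn = prefs.choose_id(None, "smith99")
label_first = CitationPrefs(id_preference=["user-label", "universal"])
print(with_isbn, without_isbn, label_first.choose_id("urn:isbn:32343545", "smith99"))
```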

Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matthias Steffens
On Wed, 5-Apr-2006 10:27 -0400, Bruce D'Arcus wrote:
> >> If I ping a server and give a list of isbns and dois and an
> >> optional username, problem solved; no?
> >
> > Yes and no. It depends on the quality of the bibliographic data.
[...]
> OK, fair enough. But we need to design the system to enable what I'm
> arguing for as ideal I think.

Yep, and this is a good thing, otherwise there wouldn't be any
progress! ;-) It's important to stay as backwards compatible as
possible, though.

> The fallback can well be that for some users with poor data sources,
> their citation uris are rather dumb uris like:
> 
>   person:[EMAIL PROTECTED]:smith99

Something like this would be good and I'd consider it equally important.

> But we need to start getting with the network here, so that it'll
> even be possible for a user to be reading a book they want to cite,
> add a citation by typing in the isbn in the citation id field, and
> OOoBib will grab that relevant record from the Library of Congress
> server. Likewise for DOIs.

Yes, that's a very nice feature. However, when I'm writing a paper, 95%
of the cited references do already exist in my bibliographic database
and I want to use these (and not a copy from somewhere else) since I
know that I've verified my own entries for correctness (multiple
times). The same cannot be said for any remotely fetched data and I'd
need to check each entry for correctness. This is just an example. My
point here is that it really depends on the user's specific needs.

On Wed, 5-Apr-2006 10:34 -0400, Bruce D'Arcus wrote:

> As I said, I know there are real world difficulties with this
> approach, but consider all the (much greater) problems of the
> alternative: every user has their own unique reference scheme. Two
> users collaborate on a document, one citing an article using "xyz" and the
> other the exact same article using "123". Imagine THAT headache!

That's a very good point. Still, I think that this, again, is
completely dependent on the user's individual needs. If you're writing
your thesis, collaboration may be less important. But it may be
absolutely crucial when writing a scientific paper together with your
co-authors.

In summary, I think that both keying methods (database-independent and
database-dependent) have major advantages and disadvantages. Thus, the
solution should be to simply allow for both methods. Ideally, multiple
identifiers would be stored and sent to the bibliographic database
which could then decide what to do. One logic could be: If the
database-dependent information (username, cite key, local record ID)
can be resolved, prefer this method to fetch the user's personal entry,
otherwise try to fetch the data from trusted sources (such as LoC)
using the database-independent identifiers.
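
The logic described above could be sketched like this, assuming each
database is reachable through a simple lookup function that returns a
record or None. The dictionaries stand in for a personal database and
a trusted remote source; the keys and all names are hypothetical.

```python
from typing import Callable, List, Optional

Lookup = Callable[[str], Optional[dict]]


def resolve(personal_key: Optional[str],
            universal_ids: List[str],
            personal_db: Lookup,
            trusted_sources: List[Lookup]) -> Optional[dict]:
    """Prefer the user's own record; otherwise ask trusted sources."""
    if personal_key is not None:
        record = personal_db(personal_key)
        if record is not None:
            return record
    # database-dependent info could not be resolved: use universal ids
    for source in trusted_sources:
        for uid in universal_ids:
            record = source(uid)
            if record is not None:
                return record
    return None


my_records = {"msteffens:doe99": {"title": "Verified entry", "source": "personal"}}
remote = {"urn:isbn:32343545": {"title": "Remote entry", "source": "remote"}}

own = resolve("msteffens:doe99", ["urn:isbn:32343545"], my_records.get, [remote.get])
fetched = resolve("unknown:key", ["urn:isbn:32343545"], my_records.get, [remote.get])
print(own["source"], fetched["source"])
```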

Matthias




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 12:28 PM, Matt Price wrote:


> in this context does "standard embedded metadata" mean "metadata
> that follows already existing standards" or "a new OASIS standard for
> document metadata"?


It means the former, or perhaps in the case of RDF, it might mean a 
specific "profile" (basically a kind of subset). For example RSS 1.0 is 
a specific profile of RDF; it has a set of rules about exactly how one 
should write out the RDF to make it easier for non-RDF tools to 
process.


There's a caveat to what "standard" means here though. What is likely 
to be standardized is the model and serialization (some kind of RDF 
subset), and some common core properties (Dublin Core, for example). 
The rest would be flexible. So this might be explicitly supported in 
ODF:



  Some Title


... while here the additional element would not be, but it would still 
be valid:



  Some Title
  ST


The model would say that both child elements of rdf:Description are
properties of the resource identified by that uri.
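
The markup of the two examples above did not survive in this archive,
so the following is only a guess at their general shape, not a
reconstruction. It assumes the title is a Dublin Core dc:title, uses a
made-up ex:shortTitle property in a hypothetical example.org namespace
for the additional element, and borrows "urn:isbn:32343545" from
elsewhere in this thread as the resource uri.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/terms#"  # hypothetical namespace for the extra property

ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("ex", EX)

root = ET.Element(f"{{{RDF}}}RDF")
desc = ET.SubElement(
    root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": "urn:isbn:32343545"}
)
ET.SubElement(desc, f"{{{DC}}}title").text = "Some Title"  # core property
ET.SubElement(desc, f"{{{EX}}}shortTitle").text = "ST"     # extension property

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# A generic consumer reads every child element as a property of the
# resource named by rdf:about, whether or not it knows the vocabulary.
subject = desc.get(f"{{{RDF}}}about")
properties = {child.tag: child.text for child in desc}
print(subject, properties)
```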


This is the approach that Adobe has taken with their XMP metadata 
system, in fact. And Adobe is represented on the TC and the metadata 
SC, so there's possibility for some collaboration.


Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matt Price
On Wed, Apr 05, 2006 at 12:18:27PM -0400, Bruce D'Arcus wrote:
> 
> On Apr 5, 2006, at 12:12 PM, Matt Price wrote:
> 
> >sorry, I didn't mean URIs, I meant "the metadata work at the ODF TC".
> 
> OIC.
> 
> There's nothing yet, but so long as we agree on allowing standard 
> embedded metadata, I believe there's consensus support for defining one 
> or more linking attributes that would associate content (like 
> citations) with that metadata.
> 
> That was uncontroversial when we last talked about it at least.

in this context does "standard embedded metadata" mean "metadata
that follows already existing standards" or "a new OASIS standard for
document metadata"?

m

> 
> Bruce
> 
> 

--
 .''`.   Matt Price 
: :'  :  Debian User
`. `'`   & hemi-geek
  `- 
-- 




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 12:12 PM, Matt Price wrote:


> sorry, I didn't mean URIs, I meant "the metadata work at the ODF TC".


Actually, you can see how I frame the approach here:



I think in general people on the TC agree with this, but we'll have to
see how it goes. We now have the metadata SC, which first has to get a
set of use cases approved by the TC, etc.


Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 12:12 PM, Matt Price wrote:


> sorry, I didn't mean URIs, I meant "the metadata work at the ODF TC".


OIC.

There's nothing yet, but so long as we agree on allowing standard 
embedded metadata, I believe there's consensus support for defining one 
or more linking attributes that would associate content (like 
citations) with that metadata.


That was uncontroversial when we last talked about it at least.

Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matt Price
On Wed, Apr 05, 2006 at 12:07:15PM -0400, Bruce D'Arcus wrote:
> 
> On Apr 5, 2006, at 11:50 AM, Matt Price wrote:
> 
> >should it be a little more extensive here?  so for instance:  I am
> >extremely disorganized, and in the absence of a satisfactory
> >bibliographic solution have dealt with various bibs in the last few
> >years.  On one paper I use one bib, for another project I may have a
> >wholly different one.  So should the uri be:
> >
> >   person:[EMAIL PROTECTED]:SOME_HASH_HERE:smith99
> 
> I'm not really sure exactly what it should be, but yeah, it'd take some 
> thought.
> 
> >>I should also add that using uris for association is likely what will
> >>be the outcome of the metadata work at the ODF TC. It provides a
> >>standard and general mechanism to link content and metadata.
> >>
> >>How's that?
> >
> >do you guys have some docs on this emerging standard?
> 
> It's not emerging; it's already widely used:
> 
>   
> 
> See how examples like RDF and XLink use uris for linking. One example 
> of the former relevant to this discussion:
> 
>   

sorry, I didn't mean URIs, I meant "the metadata work at the ODF TC".
standard was perhaps the wrong word.

Matt


> 
> Bruce
> 
> 

--
 .''`.   Matt Price 
: :'  :  Debian User
`. `'`   & hemi-geek
  `- 
-- 




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 11:50 AM, Matt Price wrote:


> should it be a little more extensive here?  so for instance:  I am
> extremely disorganized, and in the absence of a satisfactory
> bibliographic solution have dealt with various bibs in the last few
> years.  On one paper I use one bib, for another project I may have a
> wholly different one.  So should the uri be:
>
>    person:[EMAIL PROTECTED]:SOME_HASH_HERE:smith99


I'm not really sure exactly what it should be, but yeah, it'd take some 
thought.



>> I should also add that using uris for association is likely what will
>> be the outcome of the metadata work at the ODF TC. It provides a
>> standard and general mechanism to link content and metadata.
>>
>> How's that?
>
> do you guys have some docs on this emerging standard?


It's not emerging; it's already widely used:



See how examples like RDF and XLink use uris for linking. One example 
of the former relevant to this discussion:




Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matt Price
On Wed, Apr 05, 2006 at 10:27:40AM -0400, Bruce D'Arcus wrote:
> 
> On Apr 5, 2006, at 10:19 AM, Matthias Steffens wrote:
> 
> 
> OK, fair enough. But we need to design the system to enable what I'm 
> arguing for as ideal I think. The fallback can well be that for some 
> users with poor data sources, their citation uris are rather dumb uris 
> like:
> 
>   person:[EMAIL PROTECTED]:smith99

should it be a little more extensive here?  so for instance:  I am
extremely disorganized, and in the absence of a satisfactory
bibliographic solution have dealt with various bibs in the last few
years.  On one paper I use one bib, for another project I may have a
wholly different one.  So should the uri be:

   person:[EMAIL PROTECTED]:SOME_HASH_HERE:smith99

> 
> But we need to start getting with the network here, so that it'll even 
> be possible for a user to be reading a book they want to cite, add a 
> citation by typing in the isbn in the citation id field, and OOoBib 
> will grab that relevant record from the Library of Congress server. 
> Likewise for DOIs.
> 
> E.g., I would hope that in five years, users NEVER have to create their 
> own (bad) citation data.

do I EVER look forward to that!

> 
> I should also add that using uris for association is likely what will 
> be the outcome of the metadata work at the ODF TC. It provides a 
> standard and general mechanism to link content and metadata.
> 
> How's that?

do you guys have some docs on this emerging standard?

matt

--
 .''`.   Matt Price 
: :'  :  Debian User
`. `'`   & hemi-geek
  `- 
-- 




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 10:28 AM, Matt Price wrote:


> N. Wiener to J. von Neumann, 16 Jy. 1955, Box III, Wiener Archives,
> MIT.
> There won't be any standard urn for this kind of reference, will
> there?
>
> So I agree with Matthias that a secondary key mechanism is important.


I sometimes use archival data, so am aware of this problem. Like I 
said, there would be a fallback uri mechanism to cover this sort of 
thing.
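
A sketch of how such a fallback might be chosen, assuming a record is
a dict with optional "doi" and "isbn" fields and a user-supplied
"label". The id shapes follow the ones mentioned in this thread
(info:doi/..., urn:isbn:..., person:<owner>:<label>); the function
name, the order in which DOI and ISBN are tried, and the owner value
are arbitrary assumptions.

```python
def citation_uri(record, owner):
    """Use a universal id when one exists, else a person-scoped label."""
    if record.get("doi"):
        return f"info:doi/{record['doi']}"
    if record.get("isbn"):
        return f"urn:isbn:{record['isbn']}"
    # archival material and the like: no universal id is available
    return f"person:{owner}:{record['label']}"


book = {"isbn": "32343545", "label": "smith99"}
letter = {"label": "wiener1955"}  # e.g. an archival letter

book_uri = citation_uri(book, "user@example.org")
letter_uri = citation_uri(letter, "user@example.org")
print(book_uri)
print(letter_uri)
```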


As I said, I know there are real world difficulties with this approach,
but consider all the (much greater) problems of the alternative: every
user has their own unique reference scheme. Two users collaborate on a
document, one citing an article using "xyz" and the other the exact
same article using "123". Imagine THAT headache!


In any case, whatever we decide will be part of the more general 
metadata-in-ODF discussion.


Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matt Price
On Wed, Apr 05, 2006 at 03:58:15PM +0200, Matthias Steffens wrote:
> On Wed, 5-Apr-2006 09:13 -0400, Bruce D'Arcus wrote:
> 
> > Bingo, which is why we need to find a way to use standardized uris
> > where possible to define citation keys. In some cases (dois, isbns,
> > etc.) this is easier than others, but the pay-off will be large. If
> > my citation key in the document is identified with
> > "urn:isbn:32343545" it's trivial to grab the associate bib record
> > from many sources.  If it's some user-specific key ("smith99") or
> > local database number, then stuff breaks.
> 
> Agreed. But how would you handle entries which have no ISBN, DOI or
> other unique identifier? I guess that OpenURLs could be used for older
> journal articles? But what about items that cannot be identified by
> means of the above, how would you handle these?

...or what about the fact that I may not have this kind of information
available to me when I enter the citation?  So, for instance, a common
case for a historian:
E. A. Poe, "The Raven" (1868) cited in Smith 1977, p.34. 

I don't have further information about the original citation, because
citation practices in the 1970s were less rigorous.  Not such a
problem with Poe, but what about:

J. W. Mitchell, "Physiological Origins of the Phantom Limb Sensation",
cited in James 1888, p. 367.  

This somewhat more obscure original text may be entirely irrecoverable.

or:

N. Wiener to J. von Neumann, 16 Jy. 1955, Box III, Wiener Archives,
MIT.

There won't be any standard URN for this kind of reference, will
there?

So I agree with Matthias that a secondary key mechanism is important.

Matt

--
 .''`.   Matt Price 
: :'  :  Debian User
`. `'`   & hemi-geek
  `- 
-- 




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 10:19 AM, Matthias Steffens wrote:


> no offense intended, I was just interested in the details. ;-)


None taken Matthias :-)


> On Wed, 5-Apr-2006 10:06 -0400, Bruce D'Arcus wrote:
>
> > > Also, how do you account for duplicate database entries where only one
> > > of these duplicates is the user's desired record? In institutional
> > > databases, duplicate entries are not uncommon and their accuracy or
> > > quality may differ substantially. Thus, it *does* matter which of these
> > > copies you fetch. User-specific keys as well as local database numbers
> > > would solve this particular problem.
> >
> > Hold on now; the user *may* be important, but not the user-specific key.
> >
> > If I ping a server and give a list of isbns and dois and an optional
> > username, problem solved; no?
>
> Yes and no. It depends on the quality of the bibliographic data. If all
> of the duplicate records contain an ISBN or DOI, then I agree that the
> username is all that's needed. But very often, users are lazy and enter
> only the basic bibliographic fields. Or, the data were imported from
> legacy data (such as BibTeX records). This means that it is possible
> that none of the records in question feature any unique identifier. But
> often, a cite key is present. In this case, the username would not
> suffice while the cite key or a local database number would do.
>
> Please note that I'm not arguing against you here, I just know the
> (often frustrating) reality of dealing with an institutional literature
> database. At our institute, we have a lot of entries that would fit the
> above scenario.


OK, fair enough. But I think we need to design the system to enable 
what I'm arguing for as the ideal. The fallback can well be that, for 
some users with poor data sources, the citation uris are rather dumb 
ones like:


person:[EMAIL PROTECTED]:smith99

But we need to start getting with the network here, so that it'll even 
be possible for a user to be reading a book they want to cite, add a 
citation by typing in the isbn in the citation id field, and OOoBib 
will grab that relevant record from the Library of Congress server. 
Likewise for DOIs.
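
A minimal sketch of the first step in that workflow, turning whatever the user types into the citation id field into a URI-style key. The function name `to_citation_uri` is hypothetical, the `urn:isbn:` and `info:doi/` forms are the ones used elsewhere in this thread, and the patterns are deliberately loose (no ISBN checksum validation). Fetching the record from a catalogue server is left out entirely.

```python
import re


def to_citation_uri(raw: str) -> str:
    """Map a typed ISBN or DOI to a URI-style citation key (hypothetical helper)."""
    text = raw.strip()
    # DOIs start with "10.", a numeric registrant code, and a slash.
    if re.match(r"^10\.\d{4,9}/\S+$", text):
        return "info:doi/" + text
    # ISBNs: drop hyphens and spaces, then expect 10 or 13 characters.
    digits = re.sub(r"[\s-]", "", text).upper()
    if re.fullmatch(r"\d{9}[\dX]|\d{13}", digits):
        return "urn:isbn:" + digits
    # Anything else (e.g. "smith99") is not a standard identifier.
    raise ValueError(f"not a recognised ISBN or DOI: {raw!r}")
```

Anything the function rejects would have to go through the fallback key scheme discussed above.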


E.g., I would hope that in five years, users NEVER have to create their 
own (bad) citation data.


I should also add that using uris for association is likely what will 
be the outcome of the metadata work at the ODF TC. It provides a 
standard and general mechanism to link content and metadata.


How's that?

Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matthias Steffens
Hi Bruce,

no offense intended, I was just interested in the details. ;-)

On Wed, 5-Apr-2006 10:06 -0400, Bruce D'Arcus wrote:

> > Also, how do you account for duplicate database entries where only one
> > of these duplicates is the user's desired record. In institutional
> > databases, duplicate entries are not uncommon and their accuracy or
> > quality may differ substantially. Thus, it *does* matter which of these
> > copies you fetch. User-specific keys as well as local database numbers
> > would solve this particular problem.
> 
> Hold on now; the user *may* be important, but not the user-specific key.
> 
> If I ping a server and give a list of isbns and dois and an optional 
> username, problem solved; no?

Yes and no. It depends on the quality of the bibliographic data. If all
of the duplicate records contain an ISBN or DOI, then I agree that the
username is all that's needed. But very often, users are lazy and enter
only the basic bibliographic fields. Or, the data were imported from
legacy data (such as BibTeX records). This means that it is possible
that none of the records in question feature any unique identifier. But
often, a cite key is present. In this case, the username would not
suffice while the cite key or a local database number would do.
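
The case described here can be sketched roughly as follows. This is only an illustration under assumed names: the `Record` fields and `find_record` are hypothetical and do not reflect any real database schema. The lookup first tries ISBN/DOI restricted to the user's own records, then falls back to the cite key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    serial: int                     # local database number
    owner: str                      # user who entered the record
    cite_key: Optional[str] = None
    doi: Optional[str] = None
    isbn: Optional[str] = None


def find_record(records, owner, doi=None, isbn=None, cite_key=None):
    """Return the one record the user meant, or None if it stays ambiguous."""
    mine = [r for r in records if r.owner == owner]
    # First try the database-independent identifiers.
    if doi or isbn:
        hits = [r for r in mine
                if (doi and r.doi == doi) or (isbn and r.isbn == isbn)]
        if len(hits) == 1:
            return hits[0]
    # Fall back to the cite key when no unique identifier matched.
    if cite_key:
        hits = [r for r in mine if r.cite_key == cite_key]
        if len(hits) == 1:
            return hits[0]
    return None
```

With records that lack a DOI or ISBN, the username plus identifier query finds nothing, and only the cite key narrows the result to one record.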

Please note that I'm not arguing against you here, I just know the
(often frustrating) reality of dealing with an institutional literature
database. At our institute, we have a lot of entries that would fit the
above scenario.

Matthias




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 5, 2006, at 9:58 AM, Matthias Steffens wrote:


> On Wed, 5-Apr-2006 09:13 -0400, Bruce D'Arcus wrote:
>
> > Bingo, which is why we need to find a way to use standardized uris
> > where possible to define citation keys. In some cases (dois, isbns,
> > etc.) this is easier than others, but the pay-off will be large. If
> > my citation key in the document is identified with
> > "urn:isbn:32343545" it's trivial to grab the associate bib record
> > from many sources.  If it's some user-specific key ("smith99") or
> > local database number, then stuff breaks.
>
> Agreed. But how would you handle entries which have no ISBN, DOI or
> other unique identifier? I guess that OpenURLs could be used for older
> journal articles? But what about items that cannot be identified by
> means of the above, how would you handle these?


Like I said, some are easier than others. There are other identifiers 
for this stuff though (SICI, etc.) that we'd have to fall back on, and 
where not we'd have to use some other conventions (maybe openurl).



> Also, how do you account for duplicate database entries where only one
> of these duplicates is the user's desired record. In institutional
> databases, duplicate entries are not uncommon and their accuracy or
> quality may differ substantially. Thus, it *does* matter which of these
> copies you fetch. User-specific keys as well as local database numbers
> would solve this particular problem.


Hold on now; the user *may* be important, but not the user-specific key.

If I ping a server and give a list of isbns and dois and an optional 
username, problem solved; no?


Bruce




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Matthias Steffens
On Wed, 5-Apr-2006 09:13 -0400, Bruce D'Arcus wrote:

> Bingo, which is why we need to find a way to use standardized uris
> where possible to define citation keys. In some cases (dois, isbns,
> etc.) this is easier than others, but the pay-off will be large. If
> my citation key in the document is identified with
> "urn:isbn:32343545" it's trivial to grab the associate bib record
> from many sources.  If it's some user-specific key ("smith99") or
> local database number, then stuff breaks.

Agreed. But how would you handle entries which have no ISBN, DOI or
other unique identifier? I guess that OpenURLs could be used for older
journal articles? But what about items that cannot be identified by
means of the above, how would you handle these?

Also, how do you account for duplicate database entries where only one
of these duplicates is the user's desired record? In institutional
databases, duplicate entries are not uncommon and their accuracy or
quality may differ substantially. Thus, it *does* matter which of these
copies you fetch. User-specific keys as well as local database numbers
would solve this particular problem.

I fully agree that database-independent keys should be preferred but I
think that the issue is more complex than it may seem initially.
Passing database-independent keys *as well as* database-specific keys
could be a solution.
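
As a rough illustration of passing both kinds of key: a reference carries an ordered list of identifiers, portable ones first, and each is tried in turn. The `resolve` function, the `lookups` mapping, and the `user:` scheme handler are assumptions made for this sketch, not part of any existing tool.

```python
def resolve(identifiers, lookups):
    """Try each identifier in order; return the first record any lookup finds."""
    for ident in identifiers:
        # The part before the first colon selects the lookup to use.
        scheme = ident.split(":", 1)[0]
        lookup = lookups.get(scheme)
        if lookup is None:
            continue  # no handler for this scheme, try the next identifier
        record = lookup(ident)
        if record is not None:
            return record
    return None


# Example: the DOI lookup fails, so the database-specific key is used.
local_db = {"user:msteffens:doe99": {"title": "Example record"}}
lookups = {
    "info": lambda ident: None,   # stand-in for a DOI resolver that finds nothing
    "user": local_db.get,         # stand-in for a query against the user's database
}
record = resolve(["info:doi/10.1000/182", "user:msteffens:doe99"], lookups)
```

The order of the list expresses the preference: database-independent keys win when they resolve, and the database-specific key only acts as a fallback.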

Matthias




Re: [dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-05 Thread Bruce D'Arcus


On Apr 4, 2006, at 11:34 PM, Matej Cepl wrote:


> Well, it depends on the quality of key design ...


Bingo, which is why we need to find a way to use standardized uris 
where possible to define citation keys. In some cases (dois, isbns, 
etc.) this is easier than others, but the pay-off will be large. If my 
citation key in the document is identified with "urn:isbn:32343545" 
it's trivial to grab the associate bib record from many sources.  If 
it's some user-specific key ("smith99") or local database number, then 
stuff breaks.


Bruce




[dev-biblio] Re: Re: embedded references/functional requirements wiki page

2006-04-04 Thread Matej Cepl
Matt Price wrote:
> so the question is:  how important is this use case?  And will it be
> possible to accommodate it?  After all, in 5 years your whole
> bibliographic management system may be utterly transformed -- what
> then?

Well, it depends on the quality of key design and stability of bibliographic
database format. I use BibTeX and $TEXMF/bibtex/base/bibshare key design
and I started my biblio database sometime in 1999 and I still use it (yes,
it has 187k).

Matej

-- 
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC
http://www.ceplovi.cz/matej/blog/, Jabber: [EMAIL PROTECTED]
23 Marion St. #3, (617) 876-1259, ICQ 132822213
 
He has all the virtues I dislike and none of the vices I admire.
  -- Winston Churchill

