On May 21, 2009, at 15:02, Bruce D'Arcus wrote:

Except the assumption that BibTeX is widely used is overdrawn once you
get out of the technology and sciences sectors.

OK.

This doesn't mean that BibTeX is a bad basis. The set of types and fields is
limited, though.

It's limited, and it's flat.

In order to not get completely ignored in the technology and sciences sectors, a bibliography microdata format needs to be able to plug into the network effects of BibTeX. Having a non-flat microdata format while BibTeX remains flat would seriously hinder conversions from microdata to BibTeX.

How are non-flat bibliographies (beyond an article being in a book / journal / Web site) presented?

Since renderings of a bibliography usually don't show the type of the
reference, having to use 'misc' for almost everything isn't a practical
problem, although it is aesthetically displeasing.

But this is not the point of adding structured data to HTML; it's to
allow it to be extracted, and subsequently processed, as data.

More to the point, to allow it to be extracted and used as bibliography source data for another publication, avoiding repetitive data entry.

Citation and bibliographic formatting conventions do include
information that suggests type; it's not that it requires a human
reader to decipher.

OK. In the styles I've observed, where a difference isn't traceable to which fields are available on an item, the distinction has mainly been between atomic publications and compilations.

• Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in
very general ways.

Do you have an example? (I've never used the other formats.)

Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
few others; PO from the BBC, and SIOC):

<https://www.zotero.org/trac/wiki/BiboMapping>

On the surface, it seems that it would be possible to mint more field types and publications for BibTeX to support those cases, but what is the publication type information used for? Are there as many different entry presentations as there are entry types? Or are the type tokens supposed to be mapped to localized human-readable label strings?

Also, the non-flatness I see is an item being part of a compilation, which is already supported by BibTeX without allowing the whole model to generalize into a graph.
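
To make the one-level case concrete, here is a minimal Python sketch of flattening an item that sits inside a compilation into BibTeX-style fields. The nested input shape (the `container` and `kind` keys) and the function names `flatten_item` and `to_bibtex` are assumptions made up for illustration; `journal` and `booktitle` are standard BibTeX fields. Anything nested more than one level deep has no obvious flat target, so the sketch refuses it rather than guessing.

```python
def flatten_item(item):
    """Flatten a one-level nested bibliography item into a flat dict
    of BibTeX-style fields. The input shape is hypothetical."""
    flat = {k: v for k, v in item.items() if k != "container"}
    container = item.get("container")
    if container is None:
        return flat
    if "container" in container:
        # A compilation inside a compilation: no standard flat field for it.
        raise ValueError("nesting deeper than one level cannot be flattened")
    # BibTeX names the containing work differently depending on its kind.
    field = "journal" if container.get("kind") == "journal" else "booktitle"
    flat[field] = container["title"]
    return flat


def to_bibtex(entry_type, key, fields):
    """Serialize flat fields as a BibTeX entry (no escaping; sketch only)."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"
```

An article in a journal or a chapter in a book survives the round trip; a graph of related items would need invented fields or would lose information.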

Here's some info on Microsoft's bib format for OOXML, that will give
you some info:

<http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14>

It seems relatively straightforward technically to extend BibTeX with the field types from OOXML that BibTeX doesn't cover. The main issue seems to be the bikeshed of what names to use.

Here's the type schema for CSL (though it needs work, and we
de-emphasize this for formatting in any case; CSL is oriented towards
output formatting only really):

<http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup>

Here's the variable list:

<http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941&view=markup>

I don't see a fundamental reason why the BibTeX vocabulary couldn't be extended with stuff from there.

• The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way. This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the
first case, we introduce potential interoperability problems.

In practice, those problems have already been introduced. For some reason I don't understand, there's an existing pattern of calling a field 'doi' but putting an absolute URI in the value. (As opposed to using a field name
'url' or a value that contains only the DOI-significant part.)
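
A consumer that wants to cope with both spellings of a 'doi' value could normalize them. The following is a minimal Python sketch under stated assumptions: the function name `normalize_doi` is hypothetical, the list of resolver prefixes is illustrative rather than exhaustive, and the final pattern is only a rough check that the result looks like a DOI (it starts with "10.", a registrant code and a slash).

```python
import re

# Resolver prefixes sometimes found in front of a DOI (illustrative list).
_RESOLVER_PREFIX = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE
)


def normalize_doi(value):
    """Return the bare DOI from a 'doi' field value, or None if the value
    does not look like a DOI once a resolver prefix is stripped."""
    bare = _RESOLVER_PREFIX.sub("", value.strip())
    # Rough shape check only: "10.", registrant code, "/", non-empty suffix.
    if re.match(r"^10\.\d+(\.\d+)*/\S+$", bare):
        return bare
    return None
```

A value that is an arbitrary URL returns `None` here, which a converter could treat as a hint to move it to a 'url' field instead.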

The point is, when you get beyond dealing with secondary literature
(the domain of BibTeX and the sciences), the range of possible data
expands significantly. Things can get really complicated.

Consider what's actually pretty simple comparatively:

An English translation of a "classic" work. You often need original
publication information such as title (in the original language),
publisher and issued date, etc.

With a flat model, you have to invent new properties to accommodate
every little exception like this.

What formats/software do people use for cases like that in practice?

If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C, neither of which has expertise in this domain, become the gate-keepers for such extensions. In either case, we have
a rather brittle and anachronistic approach to extension.

Problems of this nature haven't stopped the WHATWG in the past. :-)

• The BibTeX model conflicts with Dublin Core and with vCard, both of which are quite sensibly used elsewhere in the microdata spec to encode information related to the document proper. There seems little justification in having two different ways to represent a document depending on whether
it is THIS document or THAT document.

When you are referring to THAT document, you generally want the names of the authors--not their full business cards. Therefore, vCard is overkill, and conversion to .bib is more useful than conversion to vCard for this use
case.

Well, vCard is just an example of a structured representation; in
BIBO, we prefer to recommend FOAF. The point is simply that authors
and other contributors are not strings; they're people (and sometimes
organizations).

What software currently supports FOAF in bibliographies?

My suggestion instead?
       • reuse Dublin Core and vCard for the generic data: titles,
creators/contributors, publisher, dates, part/version relations, etc., and only add those properties (volume, issue, pages, editors, etc.) that they
omit

This would make conversion to and from the dominant bibliography format
(.bib) more complex.

BibTeX is NOT "the dominant bibliography format." This is exactly part
of my point in this.

Furthermore, there's a risk of a GIGO effect where the
conversion can't be done algorithmically. (IIRC, you can't algorithmically map a .bib author name to the vCard name structure without a huge dictionary
of names.)

Both FOAF and vCard have unstructured personal name properties
(foaf:name and v:fn) that address this.

But vCard requires both N and FN, so if you only have FN, you can't get an N without a lot of dictionary-based domain knowledge and special rules. (Or you can make a GIGO N...)
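
The asymmetry can be sketched in a few lines of Python. This is a hypothetical helper (`split_author` is not from any library) and deliberately naive: it ignores BibTeX's three-part "von Last, Jr, First" form and brace-protected names. When the source uses the comma form, the family/given boundary is explicit; when it is a single display string, the split is a guess, and the function says so.

```python
def split_author(name):
    """Split one author name into (family, given, reliable).

    'reliable' is True only when the source used the comma form
    'Family, Given', where the boundary is explicit.
    """
    if "," in name:
        family, given = (part.strip() for part in name.split(",", 1))
        return family, given, True
    parts = name.split()
    if not parts:
        return "", "", False
    # No comma: guess that the last whitespace-separated token is the
    # family name. This is wrong for multi-word family names, so the
    # result is marked unreliable.
    return parts[-1], " ".join(parts[:-1]), False
```

For example, `split_author("Ludwig van Beethoven")` puts "van" with the given names, which is exactly the kind of garbage N a converter would emit without extra knowledge about names.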

       • make it possible for people to interweave other, richer,
vocabularies such as bibo within such item descriptions. In other words,
extension properties should be URIs.
• define the mapping to RDF of such an “item” description; can we
say, for example, that it constitutes a dct:references link from the
document to the described source?

How are these useful for conversions to and from the incumbent format
(BibTeX)? (Only BibTeX is supported by all of Google Scholar, the ACM
Portal, Stanford Spires, NASA ADS at Harvard and Citebase.org. The last
three are databases that arXiv seems to delegate to.)

All of these examples are either from the sciences (they certainly
don't represent the humanities or law), or deal exclusively with
secondary scholarly literature.

Maybe there are different needs for humanities and law. I don't know, though I'm skeptical. Is there one dominant format for humanities and one dominant format for law? (I notice that ACM and Google have EndNote in common in addition to BibTeX.)

It doesn't make sense to adopt something less established in order to avoid favoring sciences. That is, it may turn out that some fields need a format that is less flat than BibTeX, but offering that kind of generality where the flatness of BibTeX works seems to be the kind of complication that only makes people stick to the simpler thing they already have, i.e. BibTeX.

So if we're talking about HTML5 and
the microdata proposal, the conversion would be from DC to BibTeX.

Is conversion from DC to BibTeX well-defined? Wouldn't it open all the same issues that extending BibTeX vocabulary involves? What bibliography generators support DC as source data?

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/

