Re: [whatwg] on bibtex-in-html5

Henri Sivonen Thu, 21 May 2009 06:51:41 -0700

On May 21, 2009, at 15:02, Bruce D'Arcus wrote:

Except the assumption that BibTeX is widely used is overdrawn once you
get out of the technology and sciences sectors.

OK.

This doesn't mean that BibTeX is a bad basis. The set of types and fields is
limited, though.
It's limited, and it's flat.

In order to not get completely ignored in the technology and sciences sectors, a bibliography microdata format needs to be able to plug into the network effects of BibTeX. Having a non-flat microdata format while BibTeX remains flat would seriously hinder conversions from microdata to BibTeX.
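
To make the flatness concern concrete, here is a minimal sketch of a microdata-to-BibTeX serializer, assuming the extracted item is already a plain mapping of field names to strings. The function name `to_bibtex` and the choice to raise an error on nested values are illustrative assumptions, not part of any proposal in this thread.

```python
def to_bibtex(entry_type, key, fields):
    """Serialize a flat mapping of field names to strings as one BibTeX entry.

    Hypothetical helper for illustration only: a nested value (dict or list)
    has no direct BibTeX equivalent, so it is rejected instead of guessed at.
    """
    lines = []
    for name, value in fields.items():
        if isinstance(value, (dict, list)):
            raise ValueError(
                f"field {name!r} is nested; BibTeX fields are flat strings"
            )
        lines.append(f"  {name} = {{{value}}}")
    return "@%s{%s,\n%s\n}" % (entry_type, key, ",\n".join(lines))


if __name__ == "__main__":
    print(to_bibtex("article", "example2009", {"title": "A Title", "year": "2009"}))
```

A flat item converts mechanically. An item whose "journal" is itself a structured object would need a flattening rule chosen by whoever writes the converter, which is where the conversion stops being straightforward.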

How are non-flat bibliographies (beyond an article being in a book / journal / Web site) presented?

Since renderings of bibliography don't show the type of the reference
usually, having to use 'misc' for almost everything isn't a practical
problem although it is aesthetically displeasing.


But this is not the point of adding structured data to HTML; it's to
allow it to be extracted, and subsequently processed, as data.

More to the point, to allow it to be extracted and used as bibliography source data for another publication, to avoid repetitive data entry.

Citation and bibliographic formatting conventions do include
information that suggests type; it's not that it requires a human
reader to decipher.

OK. The styles that I've observed make a difference that isn't traceable to the availability of fields on an item have mainly made a distinction between atomic publications and compilations.

• Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in
very general ways.
Do you have an example? (I've never used the other formats.)
Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
few others; PO from the BBC, and SIOC):

<https://www.zotero.org/trac/wiki/BiboMapping>

On the surface, it seems that it would be possible to mint more field types and publications for BibTeX to support those cases, but what is the publication type information used for? Are there as many different entry presentations as there are entry types? Or are the type tokens supposed to be mapped to localized human-readable label strings?

Also, the non-flatness I see is an item being part of a compilation, which is already supported by BibTeX without allowing the whole model to generalize into a graph.

Here's some info on Microsoft's bib format for OOXML, that will give
you some info:
<http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14>

It seems relatively straightforward technically to extend BibTeX with the field types from OOXML that BibTeX doesn't cover. The main issue seems to be the bikeshed of what names to use.

Here's the type schema for CSL (though it needs work, and we
de-emphasize this for formatting in any case; CSL is oriented towards
output formatting only really):
<http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup>
Here's the variable list:
<http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941&view=markup>

I don't see a fundamental reason why the BibTeX vocabulary couldn't be extended with stuff from there.

• The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way. This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the
first case, we introduce potential interoperability problems.
In practice, those problems have already been introduced. For some reason I don't understand, there's an existing pattern of calling a field 'doi' but putting an absolute URI in the value. (As opposed to using a field name
'url' or a value that contains only the DOI-significant part.)
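
As a rough illustration of the 'doi' inconsistency described above, a consumer could normalize the field before use. This is only a sketch: the function name `normalize_doi` and the list of prefixes it strips (`doi:` and resolver URLs on `doi.org` / `dx.doi.org`) are assumptions about what appears in real .bib files, not a standardized rule.

```python
import re

# Assumed prefixes seen in 'doi' fields: a "doi:" label or a resolver URL.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(value):
    """Return only the DOI-significant part of a 'doi' field value.

    Hypothetical helper: values with none of the assumed prefixes
    are returned unchanged apart from surrounding whitespace.
    """
    return _DOI_PREFIX.sub("", value.strip())


if __name__ == "__main__":
    for raw in ("http://dx.doi.org/10.1000/182", "doi:10.1000/182", "10.1000/182"):
        print(normalize_doi(raw))
```

Every consumer that wants reliable DOIs ends up carrying a workaround of this kind, which is the interoperability cost of leaving the field's value format uncontrolled.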
The point is, when you get beyond dealing with secondary literature
(the domain of BibTeX and the sciences), the range of possible data
expands significantly. Things can get really complicated.

Consider what's actually pretty simple comparatively:

An English translation of a "classic" work. You often need original
publication information such as title (in the original language),
publisher and issued date, etc.

With a flat model, you have to invent new properties to accommodate
every little exception like this.


What formats/software do people use for cases like that in practice?

If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C, neither of which have expertise in this domain, become the gate-keepers for such extensions. In either case, we have
a rather brittle and anachronistic approach to extension.
Problems of this nature haven't stopped the WHATWG in the past. :-)
• The BibTeX model conflicts with Dublin Core and with vCard, both of which are quite sensibly used elsewhere in the microdata spec to encode information related to the document proper. There seems little justification in having two different ways to represent a document depending on whether
it is THIS document or THAT document.
When you are referring to THAT document, you generally want the names of the authors--not their full business cards. Therefore, vCard is an overkill, and conversion to .bib is more useful than conversion to vCard for this use
case.
Well, vCard is just an example of a structured representation; in
BIBO, we prefer to recommend FOAF. The point is simply that authors
and other contributors are not strings; they're people (and sometimes
organizations).


What software currently supports FOAF in bibliographies?

My suggestion instead?
       • reuse Dublin Core and vCard for the generic data: titles,
creators/contributors, publisher, dates, part/version relations, etc., and only add those properties (volume, issue, pages, editors, etc.) that they
omit
This would make conversion to and from the dominant bibliography format
(.bib) more complex.
BibTeX is NOT "the dominant bibliography format." This is exactly part
of my point in this.
Furthermore, there's a risk of a GIGO effect where the
conversion can't be done algorithmically. (IIRC, you can't algorithmically map a .bib author name to the vCard name structure without a huge dictionary
of names.)
Both FOAF and vCard have unstructured personal name properties
(foaf:name and v:fn) that address this.

But vCard required both N and FN, so if you only have FN, you can't get an N without a lot of dictionary-based domain knowledge and special rules. (Or you can make a GIGO N...)
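
A small sketch of what a "GIGO N" looks like, assuming the simplest possible rule: treat the last whitespace-separated token of FN as the family name. The function name `naive_fn_to_n` is hypothetical, and only the first three components of the vCard N structure (family, given, additional) are produced.

```python
def naive_fn_to_n(fn):
    """Guess (family, given, additional) from a formatted name string.

    Illustrative only: the last token is assumed to be the family name,
    the first token the given name, and anything between is 'additional'.
    This rule is known to be wrong for many real names.
    """
    parts = fn.split()
    if not parts:
        return ("", "", "")
    family = parts[-1]
    given = parts[0] if len(parts) > 1 else ""
    additional = " ".join(parts[1:-1])
    return (family, given, additional)


if __name__ == "__main__":
    print(naive_fn_to_n("Henri Sivonen"))
    print(naive_fn_to_n("Ludwig van Beethoven"))
```

The rule works for "Henri Sivonen" but files "van" as an additional name for "Ludwig van Beethoven" rather than as part of the family name. Getting such cases right is what calls for the dictionary-based knowledge and special rules mentioned above.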

       • make it possible for people to interweave other, richer,
vocabularies such as bibo within such item descriptions. In other words,
extension properties should be URIs.
• define the mapping to RDF of such an “item” description; can we
say, for example, that it constitutes a dct:references link from the
document to the described source?
How are these useful for conversions to and from the incumbent format
(BibTeX)? (Only BibTeX is supported by all of Google Scholar, the ACM
Portal, Stanford Spires, NASA ADS at Harvard and Citebase.org. The three
last ones being databases that arXiv seems to delegate to.)


All of these examples are either from the sciences (they certainly
don't represent the humanities or law), or deal exclusively with
secondary scholarly literature.

Maybe there are different needs for humanities and law. I don't know, though I'm skeptical. Is there one dominant format for humanities and one dominant format for law? (I notice that ACM and Google have EndNote in common in addition to BibTeX.)

It doesn't make sense to adopt something less established in order to avoid favoring sciences. That is, it may turn out that some fields need a format that is less flat than BibTeX, but offering that kind of generality where the flatness of BibTeX works seems to be the kind of complication that only makes people stick to the simpler thing they already have, i.e. BibTeX.

So if we're talking about HTML5 and
the microdata proposal, the conversion would be from DC to BibTeX.

Is conversion from DC to BibTeX well-defined? Wouldn't it open all the same issues that extending BibTeX vocabulary involves? What bibliography generators support DC as source data?


--
Henri Sivonen
[email protected]
http://hsivonen.iki.fi/
