Re: [whatwg] on bibtex-in-html5

2009-07-07 Thread Ian Hickson

Based on the feedback below, I've removed the BibTeX vocabulary from 
HTML5. The primary use case -- enabling drag-and-drop in a manner that the 
target document could automatically add a reference to the source document 
-- can still be done between cooperating sources, it's just no longer a 
first-class citizen in the automatically generated drag-and-drop JSON 
object. (The previous mechanism found relevant citation information in the 
page or section footer and automatically included that.)

I would encourage people interested in enabling this use case to develop a 
format for this to expose in the drag-and-drop API, along with some 
scripts to enable it. This doesn't really require built-in support so long 
as scripting is enabled; the APIs do provide the power to do this already.

On Wed, 10 Jun 2009, Simon Spiegel wrote:
   
   Most of them are defined as aliases and are handled just fine by 
   biblatex. For example, journal works just as well as journaltitle. 
   While there may be small differences, they're definitely not 
   essential. In real life, most of the bibtex data publicly available 
   differs from pure bibtex to about the same degree. There are very 
   few places where you can get 100% correct bibtex. Biblatex certainly 
   doesn't bring a new level of incompatibility here.
  
  My original point was just that it seems unnecessarily incompatible 
  with BibTeX, and that the latter appears to have more deployed 
  support.
  
  I disagree that using the same term to mean something else (as in the 
  inbook case) is a small difference that is not essential.
 
 Are we talking on a theoretical level about what would be best in principle, or 
 do we talk about what actually happens? From the fact that you 
 originally chose BibTeX I inferred that you want to go for a practical 
 solution which takes account of what is used in the real world. Now if 
 we do that, we must also look at what actually happens in the real 
 world. And although this may just be anecdotal evidence, I can assure you 
 that in my experience a) 100% correct BibTeX is the exception 
 and b) the compatibility problems between BibTeX data that you can 
 download from various sites and biblatex are no big deal. Just about every 
 BibTeX style introduces its own quirks; in the majority of cases you 
 have to clean your data anyway after you have downloaded it. So I really 
 don't see a fundamental problem here. But I certainly do see a 
 fundamental problem, both theoretical and practical, if you go for a 
 standard which is limited in major ways and which from the start 
 excludes just about everything which is not English-speaking hard science.
 
 There will always be a tradeoff, the question is which is the lesser 
 evil.

On Wed, 10 Jun 2009, Simon Spiegel wrote:
 On 10.06.2009, at 11:44, Ian Hickson wrote:
  On Wed, 20 May 2009, Bruce D'Arcus wrote:
   
   Re: the recent microdata work and the subsequent effort to include 
   BibTeX in the spec, I summarized my argument against this on my 
   blog:
   
   http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5
  
  | 1. BibTeX is designed for the sciences, which typically only cite
  |secondary academic literature. It is thus inadequate for, and not widely
  |used in, many fields outside of the sciences: the humanities and law
  |being quite obvious examples. For this reason, BibTeX cannot by
  |default adequately represent even the use cases Ian has identified.
  |For example, there are many citations on Wikipedia that can only be
  |represented using effectively useless types such as misc and which
  |require new properties to be invented.
  
  We will probably have to increase the coverage in due course, yes. 
  However, we should verify that the mechanism works in principle before 
  investing the time to extend the vocabulary.
 
 I really don't think that a body like WHATWG is suited for this task. 
 Especially since other groups have already been working on this exact 
 issue.
 
  | 2. Related, BibTeX cannot represent much of the data in widely used
  |bibliographic applications such as Endnote, RefWorks and Zotero except
  |in very general ways.
  
  If such data is important, we can always add support when this becomes 
  clear.
 
 What does this mean? When would it become clear? BibTeX's deficits have 
 been clear for ages. Just about everyone who works in the humanities knows that 
 every bibliographic solution which has been introduced in the past was 
 too limited. Why do we have to go through the same things over and over 
 again? The problems of the current standards are known; that's why new 
 solutions like biblatex or the bibliographic ontology have been 
 developed.

On Wed, 10 Jun 2009, Bruce D'Arcus wrote:
 
 No; you should drop this proposal and move it to an experimental annex.
 
 If you do insist, against all reason, on pushing forward with this 
 without modification, then I suggest you 

Re: [whatwg] on bibtex-in-html5

2009-07-06 Thread Ian Hickson
On Wed, 10 Jun 2009, Julian Reschke wrote:
 Ian Hickson wrote:
 
  So far based on my experience with the Workers, Storage, Web Sockets, 
  and Server-sent Events sections, I'm not convinced that the advantage 
  of getting more review is real. Those sections in particular got more 
  review while in the HTML5 spec proper than they have since.
 
 So you are putting stuff you're personally interested in into the HTML5 
 spec, so that people read it?

No; but I don't think taking things out of the HTML5 spec gets them more 
review, so if someone did want to reorganise the specs to get more review, 
I don't think that would be the way to do it.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] on bibtex-in-html5

2009-06-11 Thread Simon Spiegel


On 11.06.2009, at 00:44, Jonas Sicking wrote:

 On Wed, Jun 10, 2009 at 3:12 AM, Julian Reschke <julian.resc...@gmx.de> wrote:
  Ian Hickson wrote:
   ...
   So far based on my experience with the Workers, Storage, Web Sockets, and
   Server-sent Events sections, I'm not convinced that the advantage of getting
   more review is real. Those sections in particular got more review while in
   the HTML5 spec proper than they have since.
   ...

  So you are putting stuff you're personally interested in into the HTML5
  spec, so that people read it?

 Calling it "stuff Ian is personally interested in" seems unnecessarily
 inflammatory. These are all use cases that other people have put
 forward.

 However, like others, I'd prefer to see these things developed
 elsewhere. Mostly because the people with expertise in developing
 a better version of bibtex are not the people in this WG.


I completely agree with this conclusion. I also think that it would be  
a big mistake to include bibtex and then extend it later as Ian has  
suggested.


Let me give a concrete example. Take the following bibliographic  
entry: Doe, John: Foreword. In: Doe, Jane: The Book. Middle-Earth 2008.


What we have here is a chapter by an author in a book by someone else.  
This someone else is not the editor, though, but the author of the  
book. This kind of text is fairly common in my field, but it cannot be  
expressed in bibtex, since bibtex originally only has fields for  
'author' and 'editor', but not for 'bookauthor'.


According to Ian, something like this could be covered by extending  
the bibtex vocabulary. For me, two problems pop up here:


Who will decide how the vocabulary gets extended? And on what will  
these decisions be based?


Now let's say that some kind of process to extend the bibtex vocabulary  
can be established and that the addition of a 'bookauthor' field is  
decided. The problem then is that something gets added to bibtex  
which no existing bibtex style (and no other tool which can import  
bibtex) knows about. AFAIK only biblatex has a 'bookauthor' field. In  
other words: we then have data which is not usable with the  
traditional bibtex tools (they don't break, they just won't process  
the new fields). If bibtex gets extended (which would be absolutely  
necessary since all kinds of additional fields are needed), we  
unavoidably end up with some kind of superbibtex which no tool in the  
world can process. In other words: we then have a new format which  
looks like bibtex but which cannot be used in a traditional bibtex  
workflow. At this point the whole argument for using bibtex  
in this spec breaks down. Ian is in favor of bibtex because it is  
widely used; but if we unavoidably end up with an unusable  
superbibtex, this argument becomes moot.
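The sketch below illustrates the point that classic tools do not fail on the new field but simply skip it. It is a minimal Python example. `CLASSIC_INCOLLECTION` is an abridged, illustrative subset of the fields the standard BibTeX styles read for `@incollection`. `bookauthor` is the biblatex field mentioned above. Both helper functions are hypothetical.

```python
from typing import Dict, List, Set

# Abridged, illustrative subset of the fields the standard BibTeX styles
# read for an @incollection entry.
CLASSIC_INCOLLECTION: Set[str] = {
    "author", "title", "booktitle", "editor", "publisher",
    "address", "year", "pages", "chapter", "note",
}

def to_bibtex(key: str, entry_type: str, entry: Dict[str, str]) -> str:
    """Serialize a flat field dictionary as a BibTeX entry."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in entry.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"

def ignored_fields(entry: Dict[str, str], known: Set[str]) -> List[str]:
    """Fields that a tool knowing only `known` would silently skip."""
    return sorted(name for name in entry if name not in known)

# The example above: a foreword by one author in a book by another author.
foreword = {
    "author": "Doe, John",
    "title": "Foreword",
    "bookauthor": "Doe, Jane",  # biblatex field; classic BibTeX has no equivalent
    "booktitle": "The Book",
    "address": "Middle-Earth",
    "year": "2008",
}

print(to_bibtex("doe2008", "incollection", foreword))
print(ignored_fields(foreword, CLASSIC_INCOLLECTION))  # ['bookauthor']
```

The entry still parses. With a classic style, however, the book's author simply drops out of the formatted reference.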


If compatibility with existing formats is the main objective, we simply  
can't extend an old format like bibtex. If the goal is to cover  
substantially more than bibtex does, we need a different format.


Simon
--
Simon Spiegel
Steinhaldenstr. 50
8002 Zürich

Telephon: ++41 44 451 5334
Mobophon: ++41 76 459 60 39

http://www.simifilm.ch

„In a world getting more and more democratic, film directing is the  
last resort for dictators.“ Francis Ford Coppola






Re: [whatwg] on bibtex-in-html5

2009-06-11 Thread David Gerard
2009/6/3 Bruce D'Arcus bdar...@gmail.com:

 Newspaper articles are cited a LOT; they're all over the place on
 wikipedia. And this doesn't even get into patents, or hearing
 transcripts, or legal opinions, or films. We need to be able to
 represent all of these, and bibtex is of little help here.


I was about to mention Wikipedia! The citation templates there would
be an excellent set of examples of what a citation format would need
to cover in practical use. See:

http://en.wikipedia.org/wiki/Category:Citation_templates

There's a lot there, but many aren't that heavily used. You can see
how many uses there are of a template, or if there are any at all, by
going to the template page and clicking on "What links here" in the
sidebar. The ones whose names start with "Template:Cite ..." include the
biggies.

These constitute a bunch of special cases, but you'll be pleased to
know that similar templates tend to get combined with time. I
certainly wouldn't suggest a set of special cases in a spec for this.
But these will be useful for ideas and examples of what sort of
citations are in demand on the web.


- d.


Re: [whatwg] on bibtex-in-html5

2009-06-11 Thread Bruce D'Arcus
On Thu, Jun 11, 2009 at 5:02 AM, David Gerard <dger...@gmail.com> wrote:
 2009/6/3 Bruce D'Arcus bdar...@gmail.com:

 Newspaper articles are cited a LOT; they're all over the place on
 wikipedia. And this doesn't even get into patents, or hearing
 transcripts, or legal opinions, or films. We need to be able to
 represent all of these, and bibtex is of little help here.


 I was about to mention Wikipedia! The citation templates there would
 be an excellent set of examples of what a citation format would need
 to cover in practical use. See:

 http://en.wikipedia.org/wiki/Category:Citation_templates

I didn't know about this; awesome resource!

Bruce


Re: [whatwg] on bibtex-in-html5

2009-06-10 Thread Ian Hickson
On Wed, 20 May 2009, Bruce D'Arcus wrote:

 Re: the recent microdata work and the subsequent effort to include 
 BibTeX in the spec, I summarized my argument against this on my blog:
 
 http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5

| 1. BibTeX is designed for the sciences, which typically only cite
|secondary academic literature. It is thus inadequate for, and not widely
|used in, many fields outside of the sciences: the humanities and law
|being quite obvious examples. For this reason, BibTeX cannot by
|default adequately represent even the use cases Ian has identified.
|For example, there are many citations on Wikipedia that can only be
|represented using effectively useless types such as misc and which
|require new properties to be invented.

We will probably have to increase the coverage in due course, yes. 
However, we should verify that the mechanism works in principle before 
investing the time to extend the vocabulary.


| 2. Related, BibTeX cannot represent much of the data in widely used
|bibliographic applications such as Endnote, RefWorks and Zotero except
|in very general ways.

If such data is important, we can always add support when this becomes 
clear.


| 3. The BibTeX extensibility model puts a rather large burden on inventing
|new properties to accommodate data not in the core model. For example,
|the core model has no way to represent a DOI identifier (this is no
|surprise, as BibTeX was created before DOIs existed). As a
|consequence, people have gradually added this to their BibTeX records
|and styles in a more ad hoc way. This ad hoc approach to extensibility
|has one of two consequences: either the vocabulary terms are
|understood as completely uncontrolled strings, or one needs to
|standardize them. If we assume the first case, we introduce potential
|interoperability problems. If we assume the second, we have an
|organizational and process problem: that the WHATWG and/or the
|W3C, neither of which has expertise in this domain, become the
|gate-keepers for such extensions. In either case, we have a rather
|brittle and anachronistic approach to extension.

I don't see any of this as a problem.


| 4. The BibTeX model conflicts with Dublin Core and with vCard, both of
|which are quite sensibly used elsewhere in the microdata spec to
|encode information related to the document proper. There seems little
|justification in having two different ways to represent a document
|depending on whether it is THIS document or THAT document.

I don't understand this point. Could you provide an example of this 
conflict?


| 5. Aspects of BibTeX's core model are ambiguous/confusing. For example,
|what does "number" refer to? Is it a document number, or an
|issue number?

What's the difference? Why does it matter?


| My suggestion instead?
| 1. reuse Dublin Core and vCard for the generic data: titles,
|creators/contributors, publisher, dates, part/version relations, etc.,
|and only add those properties (volume, issue, pages, editors, etc.)
|that they omit

This seems unduly heavy duty (especially the use of vCard for author 
names) when all that is needed is brief bibliographic entries.


| 2. typing should NOT be handled by a bibtex-type property, but the same way
|everything else is typed in the microdata proposal: a global
|identifier

Why?


| 3. make it possible for people to interweave other, richer, vocabularies
|such as bibo within such item descriptions. In other words, extension
|properties should be URIs.

This is already possible.


| 4. define the mapping to RDF of such an item description; can we say,
|for example, that it constitutes a dct:references link from the
|document to the described source?

The mapping to RDF is already defined; further mappings can be done using 
the sameAs mechanism.


On Thu, 21 May 2009, Henri Sivonen wrote:
 
 The set of fields is more of an issue, but it can be fixed by inventing 
 more fields--it doesn't mean the whole base solution needs to be 
 discarded. Fortunately, having custom fields in .bib doesn't break 
 existing pre-Web, pre-ISBN bibliography styles. I've used at least these 
 custom fields:
 
 key: Show this citation pseudo-id in rendering instead of the actual id used
 for matching.
 url: The absolute URL of a resource that is on the Web.
 refdate: The date when the author made the reference to an ephemeral source
 such as a Web page.
 isbn: The ISBN of a publication.
 stdnumber: RFC or ISO number. e.g. RFC 2397 or ISO/IEC 10646:2003(E)
 
 Particularly the 'url' and 'isbn' field names should be obvious and 
 uncontroversial additions.

url seems widely supported and I included it. I haven't added any other 
fields yet; I imagine that once this feature gets traction, we'll have 
more direct data as to which fields would be most useful, and 

Re: [whatwg] on bibtex-in-html5

2009-06-10 Thread Julian Reschke

Ian Hickson wrote:

 ...
 So far based on my experience with the Workers, Storage, Web Sockets, and 
 Server-sent Events sections, I'm not convinced that the advantage of 
 getting more review is real. Those sections in particular got more review 
 while in the HTML5 spec proper than they have since.

 ...


So you are putting stuff you're personally interested in into the HTML5 
spec, so that people read it?


What a cunning plan.

BR, Julian




Re: [whatwg] on bibtex-in-html5

2009-06-10 Thread Bruce D'Arcus
I'm cc-ing the Zotero dev list just for posterity ...

On Wed, Jun 10, 2009 at 5:44 AM, Ian Hickson <i...@hixie.ch> wrote:
 On Wed, 20 May 2009, Bruce D'Arcus wrote:

 Re: the recent microdata work and the subsequent effort to include
 BibTeX in the spec, I summarized my argument against this on my blog:

 http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5

 | 1. BibTeX is designed for the sciences, which typically only cite
 |    secondary academic literature. It is thus inadequate for, and not widely
 |    used in, many fields outside of the sciences: the humanities and law
 |    being quite obvious examples. For this reason, BibTeX cannot by
 |    default adequately represent even the use cases Ian has identified.
 |    For example, there are many citations on Wikipedia that can only be
 |    represented using effectively useless types such as misc and which
 |    require new properties to be invented.

 We will probably have to increase the coverage in due course, yes.
 However, we should verify that the mechanism works in principle before
 investing the time to extend the vocabulary.

No; you should drop this proposal and move it to an experimental annex.

If you do insist, against all reason, on pushing forward with this
without modification, then I suggest you explain how this process of
extension will work. If, as I suspect, it'll be another case of a
centralized authority (you; who have admitted you really know nothing
about this space), then that's a deal-breaker from my perspective.

 | 2. Related, BibTeX cannot represent much of the data in widely used
 |    bibliographic applications such as Endnote, RefWorks and Zotero except
 |    in very general ways.

 If such data is important, we can always add support when this becomes
 clear.

Man this is frustrating.

 | 3. The BibTeX extensibility model puts a rather large burden on inventing
 |    new properties to accommodate data not in the core model. For example,
 |    the core model has no way to represent a DOI identifier (this is no
 |    surprise, as BibTeX was created before DOIs existed). As a
 |    consequence, people have gradually added this to their BibTeX records
 |    and styles in a more ad hoc way. This ad hoc approach to extensibility
 |    has one of two consequences: either the vocabulary terms are
 |    understood as completely uncontrolled strings, or one needs to
 |    standardize them. If we assume the first case, we introduce potential
 |    interoperability problems. If we assume the second, we have an
 |    organizational and process problem: that the WHATWG and/or the
 |    W3C, neither of which has expertise in this domain, become the
 |    gate-keepers for such extensions. In either case, we have a rather
 |    brittle and anachronistic approach to extension.

 I don't see any of this as a problem.

The problem, to repeat myself, is related to the "we'll
extend it as we see fit" issue above.

The two biggest problems in bibtex are two properties:

book
journal

They're a problem because they're both horribly concrete/narrow, and
(arguably) redundant.

If those were instead replaced with something more generic like either:

1) publication-title

... or, better yet ...

2) a nested/related object (call it publication or container or isPartOf)

... then extension becomes easier. If I need to encode a newspaper
article, then I just do:

title = Some Article
publication-title = Some Newspaper

.. or (better, because I can attach other information to the container):

title = Some Article
publication = [ title = Some Newspaper ]

As is, you need to add stuff like this just to resolve the problems
I've repeatedly pointed out:

newspaper-title
magazine-title
court-reporter-title
television-program-title
radio-program-title

Aside: of course, some of the above could be collapsed into more
generic stuff like broadcast-title, but I'm just following the same,
broken, approach as bibtex.
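The sketch below compares the flat and nested approaches in minimal Python. The per-type field names come from the list above, alongside BibTeX's own `journal` and `booktitle`. `publication` is one of the container names suggested earlier. The helper functions, the `podcast-title` field and all sample values are hypothetical.

```python
from typing import Optional

# Flat, BibTeX-style records: each kind of container gets its own title field,
# so a consumer has to know every one of them in advance.
TYPE_SPECIFIC_TITLE_FIELDS = [
    "journal", "booktitle", "newspaper-title", "magazine-title",
    "court-reporter-title", "television-program-title", "radio-program-title",
]

def container_title_flat(record: dict) -> Optional[str]:
    for name in TYPE_SPECIFIC_TITLE_FIELDS:
        if name in record:
            return record[name]
    return None  # a field name missing from the list is silently missed

# Nested records: the container is just another item, so a single lookup
# works for newspapers, journals, broadcasts and anything added later.
def container_title_nested(record: dict) -> Optional[str]:
    container = record.get("publication")
    return container.get("title") if container else None

flat = {"title": "Some Article", "newspaper-title": "Some Newspaper"}
nested = {
    "title": "Some Article",
    # other data can hang off the container, e.g. its publisher
    "publication": {"title": "Some Newspaper", "publisher": "Some Publisher"},
}
# "podcast-title" is a made-up field that the flat list does not know about.
podcast = {"title": "Some Episode", "podcast-title": "Some Show"}
podcast_nested = {"title": "Some Episode", "publication": {"title": "Some Show"}}

print(container_title_flat(flat))              # Some Newspaper
print(container_title_nested(nested))          # Some Newspaper
print(container_title_flat(podcast))           # None
print(container_title_nested(podcast_nested))  # Some Show
```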

This stuff isn't theoretical, Ian. Just look through this wikipedia
page, for example:

http://en.wikipedia.org/wiki/Guantanamo_Bay_detention_camp

The citations include references to legal cases and briefs, and news
articles (television, radio and print). Your proposal doesn't cover
this stuff.

OTOH, applications like Zotero can.

 | 4. The BibTeX model conflicts with Dublin Core and with vCard, both of
 |    which are quite sensibly used elsewhere in the microdata spec to
 |    encode information related to the document proper. There seems little
 |    justification in having two different ways to represent a document
 |    depending on whether it is THIS document or THAT document.

 I don't understand this point. Could you provide an example of this
 conflict?

Here's an academic article in an open access biology journal.

http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.182

THIS article refers to the metadata about the document proper, with
the title Accelerated Adaptive Evolution 

Re: [whatwg] on bibtex-in-html5

2009-06-10 Thread simon
  | 1. BibTeX is designed for the sciences, which typically only cite
  |    secondary academic literature. It is thus inadequate for, and not widely
  |    used in, many fields outside of the sciences: the humanities and law
  |    being quite obvious examples. For this reason, BibTeX cannot by
  |    default adequately represent even the use cases Ian has identified.
  |    For example, there are many citations on Wikipedia that can only be
  |    represented using effectively useless types such as misc and which
  |    require new properties to be invented.
 
  We will probably have to increase the coverage in due course, yes.
  However, we should verify that the mechanism works in principle before
  investing the time to extend the vocabulary.
 
 No; you should drop this proposal and move it to an experimental annex.
 
 If you do insist, against all reason, on pushing forward with this
 without modification, then I suggest you explain how this process of
 extension will work. If, as I suspect, it'll be another case of a
 centralized authority (you; who have admitted you really know nothing
 about this space), then that's a deal-breaker from my perspective.

Related to this, I want to make some remarks on a more general level: we are 
currently experiencing major changes in the world of bibliographic software. At 
least, this is how I experience it. After years of limited and/or closed 
formats and models like BibTeX or Endnote, we finally see new models like CSL or 
biblatex emerging which try to learn the lessons of the past. Of course, 
I do not know how things will evolve, but looking at the success of solutions 
like Zotero I think it's not so bold to say that things will change quite a bit 
in the coming years.

And then we have HTML5, an emerging standard which is now getting support from 
the newest browsers. I know even less about how HTML5 will evolve and what 
impact it will have on the web. But it's probably fair to say that widespread 
adoption of HTML5 will not happen overnight.

Honestly, I really don't get why a coming web standard should support a 
bibliographic standard which is obviously outdated. The fact that BibTeX is 
widely used is really a non-argument, because if we follow this logic we won't 
have any development. By the same logic you should avoid something like video 
– after all, there isn't any support for it *yet*. If HTML5 wants to be 
forward-looking, it certainly shouldn't adopt a twenty-year-old standard but 
should instead try to support something new which is really up to date and has 
a chance of being useful in the future.

simon





Re: [whatwg] on bibtex-in-html5

2009-06-03 Thread Bruce D'Arcus
On Tue, Jun 2, 2009 at 12:05 PM, James Graham jgra...@opera.com wrote:
 Bruce D'Arcus wrote:

 So exactly what is the process by which this gets resolved? Is there one?

 Hixie will respond to substantive emails sent to this list at some point.
 However there are some hundreds of outstanding emails (see [1]) so the
 responses can take a while. If you have a pressing deadline that would
 benefit from your issue being addressed sooner, I suggest you talk to Hixie
 about it.

No problem; I just wanted to know how things worked here. Thanks.

 FWIW I have a few general thoughts about the bibtex section which may or may
 not be interesting:

 1) It seems like this and similar sections (bibtex, vCard, iCalendar) could
 be productively split out of the main spec into separate normative
 documents, since they are rather self-contained and have rather obvious
 interest for communities who are unlikely to find them at present or to be
 interested in the rest of the spec.

+1 to splitting them off.

I think there's still an open question, however, about whether any of
these—and particularly the bibliographic one (at least as it's
currently specified)—should be normative. I don't believe they should
be.

But, moving on ...

 Although the drag and drop stuff being
 dependent on them does mean that you'd need some circular references.

 2) For the bibliographic data the most important issues that I see are ease
 of use and ease of export. Although I am not attached to the bibtex format
 per-se I would be extremely disappointed if a different, harder to author,
 format were used. Formats that are flexible but rarely used are less useful
 overall than more limited formats with ubiquitous deployment. In addition
 formats that are hard to use make it more likely that people will make
 accidental mistakes, so decreasing the reliability of the data and devaluing
 tools that consume the data.

 Although I don't think we have to use bibtex as the basis for the format, I
 do think a canonical mapping to bibtex is a requirement. Obviously this
 reflects my background in the physical sciences but, at least in that field,
 LaTeX and, by association, bibtex are overwhelmingly popular. I am well
 aware that the situation in other fields is different, but without clean,
 high fidelity, bibtex export (at least to the extent required to support
 common citation patterns within the physical sciences) the format will lose
 out on a large audience with a higher than average number of potential early
 adopters.

Fair enough; all I'm saying is the same deference should be paid to
other research fields. The sciences for too long have dominated these
discussions, to the detriment of other fields. So I would hope we
could avoid that here.

Let's move on to a use case or two to illustrate the issues here.

Zotero is likely to be an early adopter of microdata as well,
certainly as a consumer of these data, and perhaps also as a producer.

http://www.zotero.org/

Zotero is a Firefox extension that can import and export BibTeX, among
a variety of other formats (RIS, MODS, and the new BIBO/DC RDF work,
which is its primary format). It includes a number of components that
allow citation and document metadata to be extracted, and later
republished.

So, for example, a user is browsing the web, and they are reading this
article from the NY Times.

http://www.nytimes.com/2009/06/03/world/asia/03military.html

Zotero has a translator (basically, a dedicated screen-scraper) for
the NY Times, and so the user can simply click an icon in their
toolbar to extract the metadata into their database.  They can then
later cite it in their own documents, and Zotero will be responsible
for correctly formatting those citations and bibliographic entries.

So I have questions on this use case:

1) how do these data about the article get encoded in microdata in
such a way that Zotero (or any other similar tool) doesn't have to
continue to write and maintain dedicated translators for every site?
E.g. how should the newspaper article metadata be encoded?

It seems the assumption that bibtex is only for bibliographies leaves
that out. Instead, the current draft of the spec tells us the title of
the document corresponds to dc:title, and not much else.

My argument is to beef up the ability to describe documents in
general.* In strawman pseudo-code:

title = doc.title
type = doc.type
source = doc.isPartOf.title # or if not dc:isPartOf, something similarly generic
issued = doc.issued
creators = doc.creator
print creators[0].name **

E.g. don't pretend that document metadata is different than
bibliographic metadata. The latter is simply a reference to the former
(usually; there are some exceptions where people cite events).
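The strawman above can be made runnable roughly as follows. This is a minimal Python sketch, not an existing API. The `Document` and `Agent` classes are hypothetical. Their attribute names loosely mirror the Dublin Core terms the strawman uses (`title`, `type`, `isPartOf`, `issued`, `creator`). The type strings, names and date are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Agent:
    name: str

@dataclass
class Document:
    title: str
    type: str
    issued: Optional[str] = None                    # date string, e.g. "2009-06-03"
    creator: List[Agent] = field(default_factory=list)
    isPartOf: Optional["Document"] = None           # container: newspaper, journal, ...

# The same class describes "this" document and any document it cites.
newspaper = Document(title="Some Newspaper", type="newspaper")
doc = Document(
    title="Some Article",
    type="article-newspaper",
    issued="2009-06-03",
    creator=[Agent("Jane Doe")],
    isPartOf=newspaper,
)

title = doc.title
type_ = doc.type  # trailing underscore avoids shadowing the built-in type()
source = doc.isPartOf.title if doc.isPartOf else None
issued = doc.issued
creators = doc.creator
print(creators[0].name)  # Jane Doe
```

Nothing in this model is specific to bibliographies. A citation is just a reference to another `Document`.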

2) If Zotero consumes these data and then the user cites it in their
document, and elects to export to HTML5, how should that same
newspaper article data be encoded in the bibliography?

BibTeX isn't terribly helpful; example:


Re: [whatwg] on bibtex-in-html5

2009-06-02 Thread Bruce D'Arcus
So exactly what is the process by which this gets resolved? Is there one?

On Sun, May 24, 2009 at 10:17 AM, Bruce D'Arcus bdar...@gmail.com wrote:
 On Sat, May 23, 2009 at 5:35 PM, Ian Hickson i...@hixie.ch wrote:

 ...

 I agree that BibTeX is suboptimal. But what should we use instead?

 As I've suggested:

 1) use Dublin Core.

 This gives you the basic critical properties: literals for titles and
 dates, and relations for versions, part/containers, contributors,
 subjects.

 You then have a consistent and general way to represent (HTML)
 documents and embedded references to other documents, etc. (citation
 references). This would cover the most important areas that BibTeX
 covers.

 2) this goes far, but you're then left with a few missing pieces for 
 citations:

 a. more specific contributors (like editors and translators)
 b. identifiers (there's dc:identifier, but no way to explicitly denote
 that it's a doi, isbn, issn, etc.)
 c. what I call locators; volume, issue, pages, etc.
 d. types (book, article, patent, etc.)

 If there's some consensus on this basic way forward, we can talk about
 details on 2.

 Bruce
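Below is a minimal Python sketch of a record along these lines. The `dcterms:` keys are real Dublin Core terms. The `contributors`, `identifiers`, `locators` and `type` keys stand in for the four missing pieces (a to d) and are hypothetical. The helper functions and all sample values, including the placeholder ISBN, are also hypothetical.

```python
from typing import List, Optional

record = {
    # Generic parts, using Dublin Core terms.
    "dcterms:title": "Some Chapter",
    "dcterms:issued": "2008",
    "dcterms:creator": ["Doe, John"],
    "dcterms:isPartOf": {"dcterms:title": "The Book"},
    # a. more specific contributors, each with an explicit role
    "contributors": [{"name": "Roe, Richard", "role": "translator"}],
    # b. identifiers that state which scheme they belong to (placeholder value)
    "identifiers": [{"scheme": "isbn", "value": "978-0-00-000000-0"}],
    # c. locators
    "locators": {"volume": "2", "pages": "10-25"},
    # d. types
    "type": "chapter",
}

def contributors_with_role(rec: dict, role: str) -> List[str]:
    """Names of contributors carrying the given role (e.g. 'editor')."""
    return [c["name"] for c in rec.get("contributors", []) if c["role"] == role]

def identifier(rec: dict, scheme: str) -> Optional[str]:
    """First identifier of the given scheme, or None if there is none."""
    for ident in rec.get("identifiers", []):
        if ident["scheme"] == scheme:
            return ident["value"]
    return None

print(contributors_with_role(record, "translator"))  # ['Roe, Richard']
print(identifier(record, "doi"))                      # None
```

With a bare `dcterms:identifier`, a consumer would have to guess the scheme from the shape of the string. Carrying the scheme explicitly is what point b asks for.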



Re: [whatwg] on bibtex-in-html5

2009-06-02 Thread James Graham

Bruce D'Arcus wrote:

So exactly what is the process by which this gets resolved? Is there one?


Hixie will respond to substantive emails sent to this list at some 
point. However there are some hundreds of outstanding emails (see [1]) 
so the responses can take a while. If you have a pressing deadline that 
would benefit from your issue being addressed sooner, I suggest you talk 
to Hixie about it.


FWIW I have a few general thoughts about the bibtex section which may or 
may not be interesting:


1) It seems like this and similar sections (bibtex, vCard, iCalendar) 
could be productively split out of the main spec into separate normative 
documents, since they are rather self-contained and have rather obvious 
interest for communities who are unlikely to find them at present or to 
be interested in the rest of the spec. Although the drag and drop stuff 
being dependent on them does mean that you'd need some circular references.


2) For the bibliographic data the most important issues that I see are 
ease of use and ease of export. Although I am not attached to the bibtex 
format per-se I would be extremely disappointed if a different, harder 
to author, format were used. Formats that are flexible but rarely used 
are less useful overall than more limited formats with ubiquitous 
deployment. In addition formats that are hard to use make it more likely 
that people will make accidental mistakes, so decreasing the reliability 
of the data and devaluing tools that consume the data.


Although I don't think we have to use bibtex as the basis for the 
format, I do think a canonical mapping to bibtex is a requirement. 
Obviously this reflects my background in the physical sciences, but at
least in that field LaTeX and, by association, bibtex are overwhelmingly
popular. I am well aware that the situation in other fields is different,
but without clean, high-fidelity bibtex export (at least to the extent
required to support common citation patterns within the physical 
sciences) the format will lose out on a large audience with a higher 
than average number of potential early adopters.


[1] http://www.whatwg.org/issues/data.html



On Sun, May 24, 2009 at 10:17 AM, Bruce D'Arcus bdar...@gmail.com wrote:

On Sat, May 23, 2009 at 5:35 PM, Ian Hickson i...@hixie.ch wrote:

...


I agree that BibTeX is suboptimal. But what should we use instead?

As I've suggested:

1) use Dublin Core.

This gives you the basic critical properties: literals for titles and
dates, and relations for versions, part/containers, contributors,
subjects.

You then have a consistent and general way to represent (HTML)
documents and embedded references to other documents, etc. (citation
references). This would cover the most important areas that BibTeX
covers.

2) this goes far, but you're then left with a few missing pieces for citations:

a. more specific contributors (like editors and translators)
b. identifiers (there's dc:identifier, but no way to explicitly denote
that it's a doi, isbn, issn, etc.)
c. what I call locators; volume, issue, pages, etc.
d. types (book, article, patent, etc.)

If there's some consensus on this basic way forward, we can talk about
details on 2.

Bruce





Re: [whatwg] on bibtex-in-html5

2009-05-24 Thread Bruce D'Arcus
On Sat, May 23, 2009 at 5:35 PM, Ian Hickson i...@hixie.ch wrote:

...

 I agree that BibTeX is suboptimal. But what should we use instead?

As I've suggested:

1) use Dublin Core.

This gives you the basic critical properties: literals for titles and
dates, and relations for versions, part/containers, contributors,
subjects.

You then have a consistent and general way to represent (HTML)
documents and embedded references to other documents, etc. (citation
references). This would cover the most important areas that BibTeX
covers.

2) this goes far, but you're then left with a few missing pieces for citations:

a. more specific contributors (like editors and translators)
b. identifiers (there's dc:identifier, but no way to explicitly denote
that it's a doi, isbn, issn, etc.)
c. what I call locators; volume, issue, pages, etc.
d. types (book, article, patent, etc.)

If there's some consensus on this basic way forward, we can talk about
details on 2.

Bruce


Re: [whatwg] on bibtex-in-html5

2009-05-24 Thread Kristof Zelechovski
If markup for a publication identifier in a reference is required, can this
identifier be URN-encoded? The NID will tell what kind of identifier
it is.
I have used <q cite="urn:ISBN:whatever"> myself, perhaps not quite in line
with the definition of the Q element, but since the cite attribute in XHTML
is not universal (as it is in XHTML2), I have not been able to find a better
container element to attach a URN to.
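A minimal sketch of Chris's suggestion, assuming the usual `urn:<NID>:<NSS>` syntax in which the namespace identifier (NID) names the identifier scheme. `parse_urn`, `identifier_kind` and the `KNOWN_NIDS` table are hypothetical helpers, and no ISBN or ISSN checksum validation is attempted:

```python
def parse_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<NID>:<NSS>' into (NID, NSS).

    The 'urn' scheme and the NID are case-insensitive, so the NID is
    lowercased; the namespace-specific string (NSS) is returned as-is.
    """
    scheme, sep, rest = urn.partition(":")
    if scheme.lower() != "urn" or not sep:
        raise ValueError(f"not a URN: {urn!r}")
    nid, sep, nss = rest.partition(":")
    if not nid or not sep or not nss:
        raise ValueError(f"URN lacks an NID or NSS: {urn!r}")
    return nid.lower(), nss


# Illustrative table: NIDs for two publication identifier schemes.
KNOWN_NIDS = {"isbn": "ISBN", "issn": "ISSN"}


def identifier_kind(urn: str) -> str:
    """Name the identifier scheme a URN uses, or 'unknown'."""
    nid, _ = parse_urn(urn)
    return KNOWN_NIDS.get(nid, "unknown")
```

A consumer reading `<q cite="urn:ISBN:whatever">` would thus learn that the value is an ISBN without any extra markup.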
IMHO,
Chris



Re: [whatwg] on bibtex-in-html5

2009-05-24 Thread Bruce D'Arcus
On Sun, May 24, 2009 at 12:35 PM, Kristof Zelechovski
giecr...@stegny.2a.pl wrote:
 If markup for a publication identifier in a reference is required, can this
 identifier be URN-encoded? The NID will tell what kind of identifier
 it is.
 I have used <q cite="urn:ISBN:whatever"> myself, perhaps not quite in line
 with the definition of the Q element, but since the cite attribute in XHTML
 is not universal (as it is in XHTML2), I have not been able to find a better
 container element to attach a URN to.

Man, this is a big can-of-worms.

Clearly, yes, a goal to work towards would be that citation sources be
identified by URI, and that ideally those URIs be HTTP-resolvable.
Indeed, we've been talking a lot about that at the Zotero project.

But that's not inconsistent with this discussion (except that BibTeX
uses a local ID, which conflicts with this goal).

Bruce


Re: [whatwg] on bibtex-in-html5

2009-05-23 Thread Simon Spiegel
Sorry for my intrusion on this list. I realize that it's cheeky to
come to a list only to rant about a specific detail, but I feel that
more support for Bruce's position is needed. Just a bit about my
background: I don't have any technical training or expertise in
software or programming. I'm a scholar in the humanities (film studies and
German literature) and wrote my PhD thesis in film studies using
LaTeX. Although I'm not a programmer by any means, I consider myself
an 'advanced user' and quite well informed in terms of bibliographic
software. There aren't many bibliographic programs or other
solutions I haven't looked at in the last couple of years.


After this introduction, let me just state one thing: To base any kind
of future software on BibTeX would really be like using ASCII instead
of UTF-8. Yes, it's really that bad. BibTeX is now almost 20 years old
and its shortcomings are well known and have been discussed endlessly.
It has an extremely limited model which basically only covers the
English-speaking sciences. As soon as you leave this area (as I have to do
daily), you're out of luck with traditional BibTeX. Sure, there are
all kinds of extensions, but most of them are limited as well and none
of them is standardized. Just take ‘bookauthor’: until recently, no
BibTeX style supported this field, although it's really a basic thing for
the humanities. (Now, if anyone asks why you would need a 'bookauthor'
field, I have only one answer: find out what is needed in
different disciplines before settling on a standard.)


It's a sad fact that the same mistakes are repeated over and over  
again in the area of bibliographic software. It seems like a natural  
law that every new software solution dealing with bibliographies  
always has to start with an extremely limited set of fields like  
BibTeX. It took nearly two decades until biblatex got rid of most of  
the basic shortcomings of BibTeX, but somehow other projects don’t  
seem to learn from this. It doesn't have to be this way. The problems  
of the existing solutions are known, alternatives do exist. So please  
hear my plea: Don't go with an ancient model whose shortcomings are
well known; use something modern instead. If you absolutely have to
use BibTeX, please at least use biblatex, which addresses most of the
problems of traditional BibTeX.


simon
--
Simon Spiegel
Steinhaldenstr. 50
8002 Zürich

Phone: ++41 44 451 5334
Mobile: ++41 76 459 60 39


http://www.simifilm.ch

“Met Goethe. Impressed.” Unknown





Re: [whatwg] on bibtex-in-html5

2009-05-23 Thread Ian Hickson
On Sat, 23 May 2009, Simon Spiegel wrote:

 Sorry for my intrusion on this list. I realize that it's cheeky to come
 to a list only to rant about a specific detail, but I feel that more
 support for Bruce's position is needed. Just a bit about my background:
 I don't have any technical training or expertise in software or
 programming. I'm a scholar in the humanities (film studies and German
 literature) and wrote my PhD thesis in film studies using LaTeX.
 Although I'm not a programmer by any means, I consider myself an
 'advanced user' and quite well informed in terms of bibliographic
 software. There aren't many bibliographic programs or other
 solutions I haven't looked at in the last couple of years.
 
 After this introduction, let me just state one thing: To base any kind
 of future software on BibTeX would really be like using ASCII instead of
 UTF-8. Yes, it's really that bad. BibTeX is now almost 20 years old and
 its shortcomings are well known and have been discussed endlessly. It
 has an extremely limited model which basically only covers the
 English-speaking sciences. As soon as you leave this area (as I have to do
 daily), you're out of luck with traditional BibTeX. Sure, there are all
 kinds of extensions, but most of them are limited as well and none of
 them is standardized. Just take ‘bookauthor’: until recently, no BibTeX
 style supported this field, although it's really a basic thing for the
 humanities. (Now, if anyone asks why you would need a 'bookauthor' field,
 I have only one answer: find out what is needed in different
 disciplines before settling on a standard.)
 
 It's a sad fact that the same mistakes are repeated over and over again 
 in the area of bibliographic software. It seems like a natural law that 
 every new software solution dealing with bibliographies always has to 
 start with an extremely limited set of fields like BibTeX. It took 
 nearly two decades until biblatex got rid of most of the basic 
 shortcomings of BibTeX, but somehow other projects don’t seem to learn 
 from this. It doesn't have to be this way. The problems of the existing 
 solutions are known, alternatives do exist. So please hear my plea: 
 Don't go with an ancient model whose shortcomings are well known; use
 something modern instead. If you absolutely have to use BibTeX, please
 at least use biblatex, which addresses most of the problems of traditional
 BibTeX.

I agree that BibTeX is suboptimal. But what should we use instead?

(The biblatex vocabulary seems unnecessarily incompatible with BibTeX's, 
and the latter appears to have more deployed support, which was one of the 
primary concerns that led to its vocabulary being picked.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] on bibtex-in-html5

2009-05-23 Thread Bruce D'Arcus
On Sat, May 23, 2009 at 5:35 PM, Ian Hickson i...@hixie.ch wrote:

...

 I agree that BibTeX is suboptimal. But what should we use instead?

 (The biblatex vocabulary seems unnecessarily incompatible with BibTeX's,
 and the latter appears to have more deployed support, which was one of the
 primary concerns that led to its vocabulary being picked.)

I think if you really insist on including a bibliographic vocabulary
in the HTML 5 spec (which, as I've said, I don't really agree with,
precisely because this is hard stuff), then you need to
reassess/clarify the requirements a bit.

For example, what do you mean by "unnecessarily incompatible with
BibTeX"? If you simply mean it's a superset, and that therefore going
from biblatex to bibtex cannot be totally lossless, then that's
unavoidable, and I think that's a requirement that needs changing.
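Bruce's point about supersets can be illustrated with a small down-converter. This is a hedged sketch, not an existing tool: `to_bibtex` is hypothetical, the rename table covers only two biblatex fields whose classic names (`journal`, `address`) biblatex still accepts as aliases, and the set of classic BibTeX fields is an illustrative subset. Anything without a BibTeX counterpart is reported rather than silently discarded:

```python
# biblatex field -> classic BibTeX field (biblatex also accepts the
# right-hand names as aliases for the left-hand ones).
RENAMES = {"journaltitle": "journal", "location": "address"}

# Illustrative subset of fields that classic BibTeX styles read.
BIBTEX_FIELDS = {
    "author", "editor", "title", "booktitle", "journal", "year", "month",
    "volume", "number", "pages", "publisher", "address", "edition",
    "series", "chapter", "note", "howpublished", "institution", "school",
}


def to_bibtex(fields: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Return (converted, dropped) for one biblatex entry's fields."""
    converted: dict[str, str] = {}
    dropped: dict[str, str] = {}
    for name, value in fields.items():
        key = name.lower()
        if key == "date":
            # biblatex ISO dates such as "2004-12": keep only the year; an
            # explicit 'year' field, if present, takes precedence.
            converted.setdefault("year", value[:4])
            continue
        target = RENAMES.get(key, key)
        if target in BIBTEX_FIELDS:
            converted[target] = value
        else:
            dropped[name] = value  # e.g. 'bookauthor' has no BibTeX slot
    return converted, dropped
```

Simon's ‘bookauthor’ example is exactly the kind of field that ends up in `dropped`.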

Or is there some other aspect of incompatibility you're seeing?

Bruce


Re: [whatwg] on bibtex-in-html5

2009-05-23 Thread Simon Spiegel


On 23.05.2009, at 23:35, Ian Hickson wrote:


On Sat, 23 May 2009, Simon Spiegel wrote:





I agree that BibTeX is suboptimal. But what should we use instead?

(The biblatex vocabulary seems unnecessarily incompatible with BibTeX's,
and the latter appears to have more deployed support, which was one of the
primary concerns that led to its vocabulary being picked.)


Biblatex is only incompatible insofar as it offers many more fields
and possibilities than standard BibTeX. But that's true of almost any
newer BibTeX style and is a direct consequence of the extreme
limitations of standard BibTeX.


simon
--

„When you only have a hammer, you tend to see every problem as a  
nail.“ Abraham Maslow






Re: [whatwg] on bibtex-in-html5

2009-05-22 Thread Bruce D'Arcus
Just to put a fine point on this ...

On Thu, May 21, 2009 at 12:11 PM, Bruce D'Arcus bdar...@gmail.com wrote:

...

 Or consider the user or developer who can't figure out how to
 represent their data in bibtex-in-html5 because its designers simply
 didn't consider it. In that case, people go elsewhere, or invent
 their own solutions.

... a Zotero forum post from earlier today:

http://forums.zotero.org/discussion/7138/export-patent-information-as-bibtex

So, user has a patent item stored in their database. When they go to
export it to bibtex, much of the data gets lost.

I probably wasn't clear enough in my list posts, but I'd prefer that
the bibtex stuff be removed entirely from the spec. It'd be fine,
however, if it was in a non-normative document apart from the spec.

I'd also prefer that microdata itself be removed in favor of RDFa, but
I proposed an alternative approach given the way this has gone so far
(namely that this effort seems a fait accompli).

Bruce


Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Henri Sivonen

On May 20, 2009, at 19:24, Bruce D'Arcus wrote:


Re: the recent microdata work and the subsequent effort to include
BibTeX in the spec, I summarized my argument against this on my blog:

http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5 


Quoting from the blog post:
On the last use case, he has chosen BibTeX, on the basis that it is  
widely used and simple to author and process.



Those are good criteria.
	• BibTeX is designed for the sciences, which typically only cite
secondary academic literature. It is thus inadequate for, and not widely
used in, many fields outside of the sciences: the humanities and law
being quite obvious examples. For this reason, BibTeX cannot by
default adequately represent even the use cases Ian has identified.  
For example, there are many citations on Wikipedia that can only be  
represented using effectively useless types such as “misc” and which  
require new properties to be invented.


This doesn't mean that BibTeX is a bad basis. The set of types and  
fields is limited, though.


Since renderings of bibliography don't show the type of the reference  
usually, having to use 'misc' for almost everything isn't a practical  
problem although it is aesthetically displeasing.


The set of fields is more of an issue, but it can be fixed by  
inventing more fields--it doesn't mean the whole base solution needs  
to be discarded. Fortunately, having custom fields in .bib doesn't  
break existing pre-Web, pre-ISBN bibliography styles. I've used at  
least these custom fields:


key: Show this citation pseudo-id in rendering instead of the actual  
id used for matching.

url: The absolute URL of a resource that is on the Web.
refdate: The date when the author made the reference to an ephemeral  
source such as a Web page.

isbn: The ISBN of a publication.
stdnumber: RFC or ISO number. e.g. RFC 2397 or ISO/IEC 10646:2003(E)

Particularly the 'url' and 'isbn' field names should be obvious and  
uncontroversial additions.


	• Related, BibTeX cannot represent much of the data in widely used  
bibliographic applications such as Endnote, RefWorks and Zotero  
except in very general ways.


Do you have an example? (I've never used the other formats.)

	• The BibTeX extensibility model puts a rather large burden on  
inventing new properties to accommodate data not in the core model.  
For example, the core model has no way to represent a DOI identifier  
(this is no surprise, as BibTeX was created before DOIs existed). As  
a consequence, people have gradually added this to their BibTeX
records and styles in a more ad hoc way. This ad hoc approach to  
extensibility has one of two consequences: either the vocabulary  
terms are understood as completely uncontrolled strings, or one  
needs to standardize them. If we assume the first case, we introduce  
potential interoperability problems.


In practice, those problems have already been introduced. For some  
reason I don't understand, there's an existing pattern of calling a  
field 'doi' but putting an absolute URI in the value. (As opposed to  
using a field name 'url' or a value that contains only the
DOI-significant part.)
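Henri's observation suggests that consumers should normalize whatever they find in a 'doi' field. A minimal sketch, assuming the only variants to handle are a resolver URL prefix (`http://dx.doi.org/`, `https://doi.org/`) or a `doi:` prefix; percent-encoded DOIs inside URLs are not decoded, and `normalize_doi` is hypothetical:

```python
import re

# Prefixes commonly found in front of a DOI in 'doi' fields.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(value: str) -> str:
    """Return only the DOI-significant part, e.g. '10.1080/...'."""
    doi = _PREFIX.sub("", value.strip())
    if not doi.startswith("10."):
        raise ValueError(f"does not look like a DOI: {value!r}")
    return doi
```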


If we assume the second, we have an organizational and process  
problem: that the WHATWG and/or the W3C—neither of which have  
expertise in this domain—become the gate-keepers for such  
extensions. In either case, we have a rather brittle and  
anachronistic approach to extension.


Problems of this nature haven't stopped the WHATWG in the past. :-)

	• The BibTeX model conflicts with Dublin Core and with vCard, both  
of which are quite sensibly used elsewhere in the microdata spec to  
encode information related to the document proper. There seems  
little justification for having two different ways to represent a
document depending on whether it is THIS document or THAT
document.


When you are referring to THAT document, you generally want the names  
of the authors--not their full business cards. Therefore, vCard is an  
overkill, and conversion to .bib is more useful than conversion to  
vCard for this use case.



My suggestion instead?
	• reuse Dublin Core and vCard for the generic data: titles,  
creators/contributors, publisher, dates, part/version relations,  
etc., and only add those properties (volume, issue, pages, editors,
etc.) that they omit


This would make conversion to and from the dominant bibliography  
format (.bib) more complex. Furthermore, there's a risk of a GIGO  
effect where the conversion can't be done algorithmically. (IIRC, you  
can't algorithmically map a .bib author name to the vCard name  
structure without a huge dictionary of names.)
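To illustrate why the name mapping is hard, here is a simplified sketch of how BibTeX itself splits an author string into First, von, Last and Jr parts. It is an approximation under stated assumptions: `split_bibtex_name` is hypothetical, it ignores brace groups (which the real parser treats specially), and it relies on a capitalization heuristic. Even with this split, nothing says whether "E." in "Donald E. Knuth" is a given name or an additional name in the vCard N structure (family; given; additional; prefix; suffix):

```python
def _starts_lower(word: str) -> bool:
    return word[:1].islower()


def split_bibtex_name(name: str) -> dict[str, str]:
    """Approximate BibTeX's First/von/Last/Jr split (no brace handling)."""
    parts = [p.strip() for p in name.split(",")]
    jr: list[str] = []
    if len(parts) == 1:
        # "First von Last": von spans the first through last lowercase word
        # before the final word; whatever follows it is Last.
        words = parts[0].split()
        lower = [i for i, w in enumerate(words[:-1]) if _starts_lower(w)]
        if lower:
            first = words[:lower[0]]
            von = words[lower[0]:lower[-1] + 1]
            last = words[lower[-1] + 1:]
        else:
            first, von, last = words[:-1], [], words[-1:]
    else:
        # "von Last, First" or "von Last, Jr, First".
        words = parts[0].split()
        n = 0
        while n < len(words) - 1 and _starts_lower(words[n]):
            n += 1
        von, last = words[:n], words[n:]
        if len(parts) == 3:
            jr = parts[1].split()
        first = parts[-1].split()
    return {"first": " ".join(first), "von": " ".join(von),
            "last": " ".join(last), "jr": " ".join(jr)}
```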


	• typing should NOT be handled via a bibtex-type property, but the same
way everything else is typed in the microdata proposal: a global  
identifier


Why is typing even needed except for separating articles from  
compilations?
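One reading of the "global identifier" suggestion, sketched under assumptions: each BibTeX entry type is mapped to a class URI in the BIBO ontology (namespace `http://purl.org/ontology/bibo/`, the vocabulary also used in the Zotero mapping and RDFa example elsewhere in this thread). The table and `type_uri` are hypothetical and incomplete; unknown types fall back to the generic `bibo:Document`:

```python
BIBO = "http://purl.org/ontology/bibo/"

# Illustrative mapping from BibTeX entry types to BIBO class names.
BIBTEX_TO_BIBO = {
    "article": "Article",
    "book": "Book",
    "inbook": "Chapter",
    "phdthesis": "Thesis",
    "mastersthesis": "Thesis",
    "techreport": "Report",
    "manual": "Manual",
    "proceedings": "Proceedings",
    "patent": "Patent",  # common extension, not a standard BibTeX type
    "misc": "Document",
}


def type_uri(bibtex_type: str) -> str:
    """Return a global type identifier (a URI) for a BibTeX entry type."""
    return BIBO + BIBTEX_TO_BIBO.get(bibtex_type.lower(), "Document")
```

The mapping is not invertible: `phdthesis` and `mastersthesis` both become `bibo:Thesis`, so converting back to BibTeX would need some additional property to recover the distinction.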


	• make it possible for people to 

Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Bruce D'Arcus
Hi Henri,

On Thu, May 21, 2009 at 4:00 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On May 20, 2009, at 19:24, Bruce D'Arcus wrote:

 Re: the recent microdata work and the subsequent effort to include
 BibTeX in the spec, I summarized my argument against this on my blog:


 http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5

 Quoting from the blog post:

 On the last use case, he has chosen BibTeX, on the basis that it is widely
 used and simple to author and process.

 Those are good criteria.

Except the assumption that BibTeX is widely used is overdrawn once you
get out of the technology and sciences sectors.

        • BibTeX is designed for the sciences, which typically only cite
 secondary academic literature. It is thus inadequate for, and not widely used
 in, many fields outside of the sciences: the humanities and law being quite
 obvious examples. For this reason, BibTeX cannot by default adequately
 represent even the use cases Ian has identified. For example, there are many
 citations on Wikipedia that can only be represented using effectively
 useless types such as “misc” and which require new properties to be
 invented.

 This doesn't mean that BibTeX is a bad basis. The set of types and fields is
 limited, though.

It's limited, and it's flat.

 Since renderings of bibliography don't show the type of the reference
 usually, having to use 'misc' for almost everything isn't a practical
 problem although it is aesthetically displeasing.

But this is not the point of adding structured data to HTML; it's to
allow it to be extracted, and subsequently processed, as data.

Citation and bibliographic formatting conventions do include
information that suggests type; it's not that it requires a human
reader to decipher. Surely that should not limit how we address this
going forward?

 The set of fields is more of an issue, but it can be fixed by inventing more
 fields--it doesn't mean the whole base solution needs to be discarded.
 Fortunately, having custom fields in .bib doesn't break existing pre-Web,
 pre-ISBN bibliography styles. I've used at least these custom fields:

 key: Show this citation pseudo-id in rendering instead of the actual id used
 for matching.
 url: The absolute URL of a resource that is on the Web.
 refdate: The date when the author made the reference to an ephemeral source
 such as a Web page.
 isbn: The ISBN of a publication.
 stdnumber: RFC or ISO number. e.g. RFC 2397 or ISO/IEC 10646:2003(E)

 Particularly the 'url' and 'isbn' field names should be obvious and
 uncontroversial additions.

Trust me: this is not nearly as simple as you think. More below ...

        • Related, BibTeX cannot represent much of the data in widely used
 bibliographic applications such as Endnote, RefWorks and Zotero except in
 very general ways.

 Do you have an example? (I've never used the other formats.)

Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
few others; PO from the BBC, and SIOC):

https://www.zotero.org/trac/wiki/BiboMapping

Here's some info on Microsoft's bib format for OOXML, that will give
you some info:

http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14

Here's the type schema for CSL (though it needs work, and we
de-emphasize this for formatting in any case; CSL is oriented towards
output formatting only really):

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup

Here's the variable list:

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941view=markup

        • The BibTeX extensibility model puts a rather large burden on
 inventing new properties to accommodate data not in the core model. For
 example, the core model has no way to represent a DOI identifier (this is no
 surprise, as BibTeX was created before DOIs existed). As a consequence,
 people have gradually added this to their BibTeX records and styles in a
 more ad hoc way. This ad hoc approach to extensibility has one of two
 consequences: either the vocabulary terms are understood as completely
 uncontrolled strings, or one needs to standardize them. If we assume the
 first case, we introduce potential interoperability problems.

 In practice, those problems have already been introduced. For some reason I
 don't understand, there's an existing pattern of calling a field 'doi' but
 putting an absolute URI in the value. (As opposed to using a field name
 'url' or a value that contains only the DOI-significant part.)

The point is, when you get beyond dealing with secondary literature
(the domain of BibTeX and the sciences), the range of possible data
expands significantly. Things can get really complicated.

Consider what's actually pretty simple comparatively:

An English translation of a classic work. You often need original
publication information such as title (in the original language),
publisher and issued date, etc.


Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Bruce D'Arcus
Oops; two quick things ...

On Thu, May 21, 2009 at 8:02 AM, Bruce D'Arcus bdar...@gmail.com wrote:


 Citation and bibliographic formatting conventions do include
 information that suggests type; it's not that it requires a human
 reader to decipher.

I meant it's JUST that ...



 Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
 few others; PO from the BBC, and SIOC):

 https://www.zotero.org/trac/wiki/BiboMapping

FWIW, the Zotero types here refer to what's in their UI ATM. They
will, however, be moving to a more flexible and relational UI model
here that more closely reflects the BIBO model. Reason? Users were
asking for things not easily accommodated in the current, flat,
approach (example: a review might be published in a newspaper or a
journal, or broadcast on the radio or on a podcast).

Bruce


Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Henri Sivonen

On May 21, 2009, at 15:02, Bruce D'Arcus wrote:


Except the assumption that BibTeX is widely used is overdrawn once you
get out of the technology and sciences sectors.


OK.

This doesn't mean that BibTeX is a bad basis. The set of types and
fields is limited, though.


It's limited, and it's flat.


In order to not get completely ignored in the technology and sciences  
sectors, a bibliography microdata format needs to be able to plug into  
the network effects of BibTeX. Having a non-flat microdata format  
while BibTeX remains flat would seriously hinder conversions from  
microdata to BibTeX.


How are non-flat bibliographies (beyond an article being in a book /  
journal / Web site) presented?



Since renderings of bibliography don't show the type of the reference
usually, having to use 'misc' for almost everything isn't a practical
problem although it is aesthetically displeasing.


But this is not the point of adding structured data to HTML; it's to
allow it to be extracted, and subsequently processed, as data.


More to the point, allow it to be extracted and used as bibliography
source data for another publication to avoid repetitive data entry.



Citation and bibliographic formatting conventions do include
information that suggests type; it's not that it requires a human
reader to decipher.


OK. The styles that I've observed making a distinction that isn't
traceable to the availability of fields on an item have mainly
distinguished between atomic publications and compilations.


   • Related, BibTeX cannot represent much of the data in widely used
bibliographic applications such as Endnote, RefWorks and Zotero except in
very general ways.


Do you have an example? (I've never used the other formats.)


Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
few others; PO from the BBC, and SIOC):

https://www.zotero.org/trac/wiki/BiboMapping


On the surface, it seems that it would be possible to mint more field
types and publication types for BibTeX to support those cases, but what is
the publication type information used for? Are there as many different  
entry presentations as there are entry types? Or are the type tokens  
supposed to be mapped to localized human-readable label strings?


Also, the non-flatness I see is an item being part of a compilation  
which is already supported by BibTeX without allowing the whole model  
to generalize into a graph.



Here's some info on Microsoft's bib format for OOXML, that will give
you some info:

http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14 



It seems relatively straightforward technically to extend BibTeX with
the field types from OOXML that BibTeX doesn't cover. The main issue  
seems to be the bikeshed of what names to use.



Here's the type schema for CSL (though it needs work, and we
de-emphasize this for formatting in any case; CSL is oriented towards
output formatting only really):

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup 



Here's the variable list:

http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941view=markup 



I don't see a fundamental reason why the BibTeX vocabulary couldn't be  
extended with stuff from there.


   • The BibTeX extensibility model puts a rather large burden on
inventing new properties to accommodate data not in the core model. For
example, the core model has no way to represent a DOI identifier (this is no
surprise, as BibTeX was created before DOIs existed). As a consequence,
people have gradually added this to their BibTeX records and styles in a
more ad hoc way. This ad hoc approach to extensibility has one of two
consequences: either the vocabulary terms are understood as completely
uncontrolled strings, or one needs to standardize them. If we assume the
first case, we introduce potential interoperability problems.


In practice, those problems have already been introduced. For some reason I
don't understand, there's an existing pattern of calling a field 'doi' but
putting an absolute URI in the value. (As opposed to using a field name
'url' or a value that contains only the DOI-significant part.)


The point is, when you get beyond dealing with secondary literature
(the domain of BibTeX and the sciences), the range of possible data
expands significantly. Things can get really complicated.

Consider what's actually pretty simple comparatively:

An English translation of a classic work. You often need original
publication information such as title (in the original language),
publisher and issued date, etc.

With a flat model, you have to invent new properties to accommodate
every little exception like this.


What formats/software do people use for cases like that in practice?

If we assume the second, we have an organizational and process problem:
that the WHATWG and/or the W3C—neither of which 

Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Bruce D'Arcus
On Thu, May 21, 2009 at 9:51 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On May 21, 2009, at 15:02, Bruce D'Arcus wrote:

 Except the assumption that BibTeX is widely used is overdrawn once you
 get out of the technology and sciences sectors.

 OK.

 This doesn't mean that BibTeX is a bad basis. The set of types and fields
 is limited, though.

 It's limited, and it's flat.

 In order to not get completely ignored in the technology and sciences
 sectors, a bibliography microdata format needs to be able to plug into the
 network effects of BibTeX. Having a non-flat microdata format while BibTeX
 remains flat would seriously hinder conversions from microdata to BibTeX.

All that matters from a BibTeX perspective is that the data is a clean
superset. E.g. so long as a book, chapter, article, etc. can be
reliably converted to and from BibTeX, there's no problem.

The same is true of all the other bib formats out there: RIS, NLM,
MODS, PRISM, OOXML, etc.

 How are non-flat bibliographies (beyond an article being in a book / journal
 / Web site) presented?

A journal article is always a good example. If you like, take a look
at the RDFa embedded in this example:

http://bruce.darcus.name/publications/articles/outside-agitator

Now, let's consider the most basic and important distinction: how you
represent the journal title.

In BibTeX, it's (typically) a flat journal key.

In the DC/BIBO representation here, you use a dc:isPartOf relation, so
that the triples look like:

<http://bruce.darcus.name/publications/articles/outside-agitator> a bibo:AcademicArticle ;
    dc:title "Dissent, Public Space and the Politics of Citizenship: Riots and the Outside Agitator"@en ;
    bibo:doi "10.1080/1356257042000309652" ;
    bibo:issue "3" ;
    bibo:pageEnd "370" ;
    bibo:pageStart "355" ;
    bibo:volume "8" ;
    dc:creator <http://bruce.darcus.name/about#me> ;
    dc:isPartOf [ dc:title "Space & Polity" ] .

So that same mechanism can be used to represent related titles of all
sorts: weblogs, magazines and newspapers, court reporters (which are
really just periodicals that publish legal decisions), etc.

The alternative in a totally flat model is having to invent new title
properties every time you come across new data (or using a more
generic key than journal to represent the containing title).
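A hedged sketch of what exporting such a nested record to flat BibTeX involves. The `Work` class, its `kind` tokens and the `CONTAINER_FIELD` table are all hypothetical: the container's title moves into whichever flat field BibTeX offers for that pairing, and a pairing with no such field (say, a review broadcast on a radio program) can only fall back to a catch-all like `note`:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Work:
    title: str
    kind: str                          # hypothetical type token
    part_of: Optional["Work"] = None   # the dc:isPartOf relation
    extra: dict[str, str] = field(default_factory=dict)


# Flat BibTeX field for a container title, keyed by (item kind, container kind).
CONTAINER_FIELD = {
    ("article", "journal"): "journal",
    ("chapter", "book"): "booktitle",
    ("paper", "proceedings"): "booktitle",
}


def flatten(work: Work) -> dict[str, str]:
    """Collapse one level of part-of nesting into flat BibTeX-style fields."""
    flat = {"title": work.title, **work.extra}
    if work.part_of is not None:
        target = CONTAINER_FIELD.get((work.kind, work.part_of.kind), "note")
        flat[target] = work.part_of.title
    return flat
```

Journal articles round-trip cleanly, which matches the "clean superset" criterion; the fallback case is where structured information degrades into free text.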

I explain the basic thinking behind this using some actual examples
from citation styles here:

http://www.users.muohio.edu/darcusb/misc/citations-spec.html

They're really just design notes, but I think communicate the point.

 Since renderings of bibliography don't show the type of the reference
 usually, having to use 'misc' for almost everything isn't a practical
 problem although it is aesthetically displeasing.

 But this is not the point of adding structured data to HTML; it's to
 allow it to be extracted, and subsequently processed, as data.

 More to the point, allow it to be extracted and used as bibliography source
 data for another publication to avoid repetitive data entry.

Yes.

 Citation and bibliographic formatting conventions do include
 information that suggests type; it's not that it requires a human
 reader to decipher.

 OK. The styles I've observed that make a distinction not traceable to
 the availability of fields on an item have mainly distinguished between
 atomic publications and compilations.

Yes. But you also have styles with conventions like "if it's a book,
format the title in italics, else ...". So there are little hints
like that which give a (human) reader information they can use to find
the source in question.
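A tiny Python sketch of such a type-dependent convention follows. Three things are assumptions for illustration: the function names, the HTML `<i>` markup for italics, and quoting as the "else" branch. It also shows how lossy the hint is: a chapter and an article end up rendered identically, so a reader can recover only "book or not".

```python
# Assumption for illustration: which item types count as standalone works.
STANDALONE_TYPES = {"book"}


def format_title(title: str, item_type: str) -> str:
    """Render a title with a type-dependent convention (italics vs quotes)."""
    if item_type in STANDALONE_TYPES:
        return f"<i>{title}</i>"
    return f'"{title}"'


def looks_standalone(rendered: str) -> bool:
    """What a reader can recover from the output: only the italics hint."""
    return rendered.startswith("<i>") and rendered.endswith("</i>")


for title, item_type in [("Some Book", "book"),
                         ("Some Chapter", "chapter"),
                         ("Some Article", "article")]:
    rendered = format_title(title, item_type)
    print(rendered, looks_standalone(rendered))
```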

As the creator of CSL, I've always said my intention is to contribute
toward helping us move beyond some of these eccentric traditions,
though!

 • Related, BibTeX cannot represent much of the data in widely used
 bibliographic applications such as EndNote, RefWorks and Zotero except in
 very general ways.

 Do you have an example? (I've never used the other formats.)

 Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a
 few others; PO from the BBC, and SIOC):

 https://www.zotero.org/trac/wiki/BiboMapping

 On the surface, it seems that it would be possible to mint more field and
 publication types for BibTeX to support those cases, but what is the
 publication type information used for? Are there as many different entry
 presentations as there are entry types? Or are the type tokens supposed to be
 mapped to localized human-readable label strings?

It depends. For Zotero, a lot of it is about mapping to particular UI
configurations for data entry and editing.

But they can also be used for mapping to output styling as defined in
CSL (which is loosely inspired by BibTeX's BST language, but is XML).

 Also, the non-flatness I see is an item being part of a compilation which is
 already supported by BibTeX without allowing the whole model to generalize
 into a graph.

Where is the generic BibTeX key to denote a containing item? There's
no publication-title or 

Re: [whatwg] on bibtex-in-html5

2009-05-21 Thread Edward O'Connor
 Both FOAF and vCard have unstructured personal name properties
 (foaf:name and v:fn) that address this.

 But vCard requires both N and FN, so if you only have FN, you can't get an N
 without a lot of dictionary-based domain knowledge and special rules. (Or
 you can make a GIGO N...)

 Hmm ... that's not how it's implemented in hCard.

It is, actually. hCard requires both FN and N, but allows N to be
implied by FN in some cases.

http://microformats.org/wiki/hcard#Implied_.22n.22_Optimization
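Below is a rough Python sketch of the two-word "implied n" rule as I read it on the wiki page above. The function name is hypothetical. The sketch also skips conditions described there, for example that the optimization does not apply when fn names an organization, so check the page before relying on it.

```python
import re
from typing import Optional, Tuple


def implied_n(fn: str) -> Optional[Tuple[str, str]]:
    """Return (given_name, family_name) implied by a two-word fn, else None."""
    words = fn.split()
    if len(words) != 2:
        return None  # only exactly-two-word names are covered by this rule
    first, second = words
    # Exception: "Family, Given" or "Family G." puts the family name first.
    if first.endswith(",") or re.fullmatch(r"\w\.?", second):
        return second.rstrip("."), first.rstrip(",")
    # Default: "Given Family".
    return first, second


print(implied_n("Ian Hickson"))    # ('Ian', 'Hickson')
print(implied_n("Hickson, Ian"))   # ('Ian', 'Hickson')
print(implied_n("Hickson I."))     # ('I', 'Hickson')
```

Anything other than exactly two words returns `None`, which is the point being made: hCard only lets N be implied in narrow cases and otherwise still expects an explicit N.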


Ted


[whatwg] on bibtex-in-html5

2009-05-20 Thread Bruce D'Arcus
Re: the recent microdata work and the subsequent effort to include
BibTeX in the spec, I summarized my argument against this on my blog:

http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5

I think it's fair to say that the Zotero project [1] agrees with my
thoughts; Dan Cohen just said as much in an email, where he expressed
concern both with the introduction of a new generic metadata-in-HTML
spec alongside RDFa, and also the use of BibTeX in particular.

Thanks to Ian for the email conversation on this, BTW, and for taking
the basic use case seriously.

Bruce

PS - Brief background on me: I am a professional scholar (a social
scientist) [2], and author or co-author of some relevant work in this
area: the Bibliographic Ontology [3], and the Citation Style Language
(CSL) [4], both of which have been collaborations with Zotero (among
others). I also had a major hand in the new RDF/RDFa-based extensible
metadata support in OpenDocument 1.2. So I have quite a bit of
practical experience on different sides of this: both user and
developer.

[1] http://zotero.org
[2] http://bruce.darcus.name
[3] http://bibliontology.com/
[4] http://xbiblio.sourceforge.net/csl/