Re: [whatwg] on bibtex-in-html5
Based on the feedback below, I've removed the BibTeX vocabulary from HTML5. The primary use case -- enabling drag-and-drop in such a manner that the target document could automatically add a reference to the source document -- can still be done between cooperating sources, it's just no longer a first-class citizen in the automatically generated drag-and-drop JSON object. (The previous mechanism found relevant citation information in the page or section footer and automatically included that.) I would encourage people interested in enabling this use case to develop a format for this to expose in the drag-and-drop API, along with some scripts to enable it. This doesn't really require built-in support so long as scripting is enabled; the APIs do provide the power to do this already. On Wed, 10 Jun 2009, Simon Spiegel wrote: Most of them are defined as aliases and are handled just fine by biblatex. For example, journal works just as well as journaltitle. While there may be small differences they're definitely not essential. In real life, most of the bibtex data publicly available differs from pure bibtex to about the same degree. There are very few places where you can get 100% correct bibtex. Biblatex certainly doesn't bring a new level of incompatibility here. My original point was just that it seems unnecessarily incompatible with BibTeX, and that the latter appears to have more deployed support. I disagree that using the same term to mean something else (as in the inbook case) is a small difference that is not essential. Are we talking on a theoretical level about what would be best in principle, or about what actually happens? From the fact that you originally chose BibTeX I inferred that you want to go for a practical solution which takes account of what is used in the real world. Now if we do that, we must also take a look at what actually happens in the real world.
And although this may just be anecdotal evidence, I can assure you that in my experience a) 100% correct BibTeX is the exception and b) the compatibility problems between biblatex and the BibTeX data you can download from various sites are no big deal. Just about every BibTeX style introduces its own quirks; in the majority of cases you have to clean your data anyway after you download it. So I really don't see a fundamental problem here. But I certainly do see a fundamental problem, both theoretical and practical, if you go for a standard which is limited in major ways and which from the start excludes just about everything which is not English-speaking hard science. There will always be a tradeoff; the question is which is the lesser evil. On Wed, 10 Jun 2009, Simon Spiegel wrote: On 10.06.2009, at 11:44, Ian Hickson wrote: On Wed, 20 May 2009, Bruce D'Arcus wrote: Re: the recent microdata work and the subsequent effort to include BibTeX in the spec, I summarized my argument against this on my blog: http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5 | 1. BibTeX is designed for the sciences, that typically only cite |secondary academic literature. It is thus inadequate for, nor widely |used, in many fields outside of the sciences: the humanities and law |being quite obvious examples. For this reason, BibTeX cannot by |default adequately represent even the use cases Ian has identified. |For example, there are many citations on Wikipedia that can only be |represented using effectively useless types such as misc and which |require new properties to be invented. We will probably have to increase the coverage in due course, yes. However, we should verify that the mechanism works in principle before investing the time to extend the vocabulary. I really don't think that a body like WHATWG is suited for this task, especially since other groups have already been working on this exact issue. | 2.
Related, BibTeX cannot represent much of the data in widely used |bibliographic applications such as Endnote, RefWorks and Zotero except |in very general ways. If such data is important, we can always add support when this becomes clear. What does this mean? When would it become clear? BibTeX's deficits have been clear for ages. Just about everyone who works in the humanities knows that every bibliographic solution which has been introduced in the past was too limited. Why do we have to go through the same things over and over again? The problems of the current standards are known; that's why new solutions like biblatex or the bibliographic ontology have been developed. On Wed, 10 Jun 2009, Bruce D'Arcus wrote: No; you should drop this proposal and move it to an experimental annex. If you do insist, against all reason, on pushing forward with this without modification, then I suggest you
Re: [whatwg] on bibtex-in-html5
On Wed, 10 Jun 2009, Julian Reschke wrote: Ian Hickson wrote: So far based on my experience with the Workers, Storage, Web Sockets, and Server-sent Events sections, I'm not convinced that the advantage of getting more review is real. Those sections in particular got more review while in the HTML5 spec proper than they have since. So you are putting stuff you're personally interested in into the HTML5 spec, so that people read it? No; but I don't think taking things out of the HTML5 spec gets them more review, so if someone did want to reorganise the specs to get more review, I don't think that would be the way to do it. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] on bibtex-in-html5
On 11.06.2009, at 00:44, Jonas Sicking wrote: On Wed, Jun 10, 2009 at 3:12 AM, Julian Reschke julian.resc...@gmx.de wrote: Ian Hickson wrote: ... So far based on my experience with the Workers, Storage, Web Sockets, and Server-sent Events sections, I'm not convinced that the advantage of getting more review is real. Those sections in particular got more review while in the HTML5 spec proper than they have since. ... So you are putting stuff you're personally interested in into the HTML5 spec, so that people read it? Calling it stuff Ian is personally interested in seems unnecessarily inflammatory. These are all use cases that other people have put forward. However, like others, I'd prefer to see these things developed elsewhere, mostly because the people with expertise in developing a better version of bibtex are not the people in this WG. I completely agree with this conclusion. I also think that it would be a big mistake to include bibtex and then extend it later as Ian has suggested. Let me give a concrete example; take the following bibliographic entry: Doe, John: Foreword. In: Doe, Jane: The Book. Middle-Earth 2008. What we have here is a chapter by one author in a book by someone else. This someone else is not the editor, though, but the author of the book. This kind of text is fairly common in my field, but it cannot be expressed in bibtex, since bibtex originally only has fields for 'author' and 'editor', but not for 'bookauthor'. According to Ian, something like this could be covered by extending the bibtex vocabulary. For me, two problems pop up here: Who will decide how the vocabulary gets extended? And on what will these decisions be based? Now let's say that some kind of process to extend the bibtex vocabulary can be established and that the addition of a 'bookauthor' field will be decided. The problem then is that something gets added to bibtex which no existing bibtex style (and no other tool which can import bibtex) knows about.
AFAIK only biblatex has a 'bookauthor' field. In other words: We then have data which is not usable with the traditional bibtex tools (they don't break, they just won't process the new fields). If bibtex gets extended (which would be absolutely necessary, since all kinds of additional fields are needed), we unavoidably end up with some kind of superbibtex which no tool in the world can process. In other words: We then have a new format which looks like bibtex but which cannot be used in a traditional bibtex workflow. At this point the whole argument for why bibtex should be used in this spec breaks down. Ian is in favor of bibtex because it is widely used; but if we unavoidably end up with an unusable superbibtex, this argument becomes moot. If compatibility with existing formats is the main objective, we simply can't extend an old format like bibtex. If the goal is to cover substantially more than bibtex does, we need a different format. Simon -- Simon Spiegel Steinhaldenstr. 50 8002 Zürich Telephon: ++41 44 451 5334 Mobophon: ++41 76 459 60 39 http://www.simifilm.ch „In a world getting more and more democratic, film directing is the last resort for dictators.“ Francis Ford Coppola
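Simon's 'bookauthor' example can be made concrete with a small sketch. The Python below is purely illustrative and not part of any spec: `CLASSIC_BIBTEX_FIELDS` is the standard field list from the BibTeX documentation, and `unknown_fields` is a hypothetical helper that reports which fields of an entry fall outside that list, i.e. the ones a traditional BibTeX style would pass over.

```python
# Standard fields documented for classic BibTeX (Patashnik's "BibTeXing").
CLASSIC_BIBTEX_FIELDS = frozenset({
    "address", "annote", "author", "booktitle", "chapter", "crossref",
    "edition", "editor", "howpublished", "institution", "journal", "key",
    "month", "note", "number", "organization", "pages", "publisher",
    "school", "series", "title", "type", "volume", "year",
})


def unknown_fields(entry: dict[str, str]) -> list[str]:
    """Return, sorted, the field names outside the classic BibTeX field set."""
    return sorted(name for name in entry if name.lower() not in CLASSIC_BIBTEX_FIELDS)


# Simon's example: a foreword by one author inside a book by another author.
foreword = {
    "author": "Doe, John",
    "title": "Foreword",
    "bookauthor": "Doe, Jane",  # biblatex field; no classic equivalent
    "booktitle": "The Book",
    "address": "Middle-Earth",
    "year": "2008",
}

print(unknown_fields(foreword))  # ['bookauthor']
```

Nothing breaks, which matches Simon's point: the entry still formats with a classic style, but the author of the book silently disappears.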
Re: [whatwg] on bibtex-in-html5
2009/6/3 Bruce D'Arcus bdar...@gmail.com: Newspaper articles are cited a LOT; they're all over the place on wikipedia. And this doesn't even get into patents, or hearing transcripts, or legal opinions, or films. We need to be able to represent all of these, and bibtex is of little help here. I was about to mention Wikipedia! The citation templates there would be an excellent set of examples of what a citation format would need to cover in practical use. See: http://en.wikipedia.org/wiki/Category:Citation_templates There's a lot there, but many aren't that heavily used. You can see how many uses there are of a template, or if there are any at all, by going to the template page and clicking on "What links here" in the sidebar. The ones whose names start with "Template:Cite ..." include the biggies. These constitute a bunch of special cases, but you'll be pleased to know that similar templates tend to get combined with time. I certainly wouldn't suggest a set of special cases in a spec for this. But these will be useful for ideas and examples of what sort of citations are in demand on the web. - d.
Re: [whatwg] on bibtex-in-html5
On Thu, Jun 11, 2009 at 5:02 AM, David Gerard dger...@gmail.com wrote: 2009/6/3 Bruce D'Arcus bdar...@gmail.com: Newspaper articles are cited a LOT; they're all over the place on wikipedia. And this doesn't even get into patents, or hearing transcripts, or legal opinions, or films. We need to be able to represent all of these, and bibtex is of little help here. I was about to mention Wikipedia! The citation templates there would be an excellent set of examples of what a citation format would need to cover in practical use. See: http://en.wikipedia.org/wiki/Category:Citation_templates I didn't know about this; awesome resource! Bruce
Re: [whatwg] on bibtex-in-html5
On Wed, 20 May 2009, Bruce D'Arcus wrote: Re: the recent microdata work and the subsequent effort to include BibTeX in the spec, I summarized my argument against this on my blog: http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5 | 1. BibTeX is designed for the sciences, that typically only cite |secondary academic literature. It is thus inadequate for, nor widely |used, in many fields outside of the sciences: the humanities and law |being quite obvious examples. For this reason, BibTeX cannot by |default adequately represent even the use cases Ian has identified. |For example, there are many citations on Wikipedia that can only be |represented using effectively useless types such as misc and which |require new properties to be invented. We will probably have to increase the coverage in due course, yes. However, we should verify that the mechanism works in principle before investing the time to extend the vocabulary. | 2. Related, BibTeX cannot represent much of the data in widely used |bibliographic applications such as Endnote, RefWorks and Zotero except |in very general ways. If such data is important, we can always add support when this becomes clear. | 3. The BibTeX extensibility model puts a rather large burden on inventing |new properties to accommodate data not in the core model. For example, |the core model has no way to represent a DOI identifier (this is no |surprise, as BibTeX was created before DOIs existed). As a |consequence, people have gradually added this to their BibTeX records |and styles in a more ad hoc way. This ad hoc approach to extensibility |has one of two consequences: either the vocabulary terms are |understood as completely uncontrolled strings, or one needs to |standardize them. If we assume the first case, we introduce potential |interoperability problems. 
If we assume the second, we have an |organizational and process problem: that the WHATWG and/or the |W3C, neither of which have expertise in this domain, become the |gate-keepers for such extensions. In either case, we have a rather |brittle and anachronistic approach to extension. I don't see any of this as a problem. | 4. The BibTeX model conflicts with Dublin Core and with vCard, both of |which are quite sensibly used elsewhere in the microdata spec to |encode information related to the document proper. There seems little |justification in having two different ways to represent a document |depending on whether it is THIS document or THAT document. I don't understand this point. Could you provide an example of this conflict? | 5. Aspects of BibTeX's core model are ambiguous/confusing. For example, |what number does number refer to? Is it a document number, or an |issue number? What's the difference? Why does it matter? | My suggestion instead? | 1. reuse Dublin Core and vCard for the generic data: titles, |creators/contributors, publisher, dates, part/version relations, etc., |and only add those properties (volume, issue, pages, editors, etc.) |that they omit. This seems unduly heavy-duty (especially the use of vCard for author names) when all that is needed is brief bibliographic entries. | 2. typing should NOT be handled by a bibtex-type property, but the same way |everything else is typed in the microdata proposal: a global |identifier. Why? | 3. make it possible for people to interweave other, richer, vocabularies |such as bibo within such item descriptions. In other words, extension |properties should be URIs. This is already possible. | 4. define the mapping to RDF of such an item description; can we say, |for example, that it constitutes a dct:references link from the |document to the described source? The mapping to RDF is already defined; further mappings can be done using the sameAs mechanism.
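Point 4 of Bruce's suggestion asks whether a citation item could be read as a `dct:references` link from the citing document to the cited source. A minimal Python sketch of what that reading would produce, written as N-Triples; the document and source URIs are placeholders, and this is an illustration of the idea, not the RDF mapping the spec actually defines.

```python
# Dublin Core Terms property: "a related resource that is referenced, cited,
# or otherwise pointed to by the described resource".
DCT_REFERENCES = "http://purl.org/dc/terms/references"


def references_triples(doc_uri: str, cited_uris: list[str]) -> list[str]:
    """Emit one N-Triples statement per cited source: <doc> dct:references <source> ."""
    return [f"<{doc_uri}> <{DCT_REFERENCES}> <{src}> ." for src in cited_uris]


# Placeholder URIs; the second source is identified by a URN rather than a URL.
for line in references_triples(
    "http://example.org/essay",
    ["http://example.org/cited-article", "urn:isbn:0451450523"],
):
    print(line)
```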
On Thu, 21 May 2009, Henri Sivonen wrote: The set of fields is more of an issue, but it can be fixed by inventing more fields--it doesn't mean the whole base solution needs to be discarded. Fortunately, having custom fields in .bib doesn't break existing pre-Web, pre-ISBN bibliography styles. I've used at least these custom fields: key: Show this citation pseudo-id in rendering instead of the actual id used for matching. url: The absolute URL of a resource that is on the Web. refdate: The date when the author made the reference to an ephemeral source such as a Web page. isbn: The ISBN of a publication. stdnumber: RFC or ISO number, e.g. RFC 2397 or ISO/IEC 10646:2003(E). Particularly the 'url' and 'isbn' field names should be obvious and uncontroversial additions. url seems widely supported, so I included it. I haven't added any other fields yet; I imagine that once this feature gets traction, we'll have more direct data as to which fields would be most useful, and
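Henri's point that custom fields do not break existing styles follows from how a .bib entry is written: an extra field is just one more `name = {value}` pair. A small Python serializer illustrating this with his `key`, `stdnumber` and `url` fields; the `to_bibtex` helper is hypothetical, invented for this example, and does no escaping of special characters.

```python
def to_bibtex(entry_type: str, cite_key: str, fields: dict[str, str]) -> str:
    """Render one .bib entry; custom fields are written exactly like standard ones."""
    lines = [f"@{entry_type}{{{cite_key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


print(to_bibtex("misc", "rfc2397", {
    "title": 'The "data" URL scheme',
    "year": "1998",
    "key": "RFC 2397",        # Henri's usage: pseudo-id shown in rendering
    "stdnumber": "RFC 2397",  # custom field: standard number
    "url": "http://www.ietf.org/rfc/rfc2397.txt",  # custom field
}))
```

A style that does not declare `stdnumber` or `url` simply never reads them, so the entry still formats as before.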
Re: [whatwg] on bibtex-in-html5
Ian Hickson wrote: ... So far based on my experience with the Workers, Storage, Web Sockets, and Server-sent Events sections, I'm not convinced that the advantage of getting more review is real. Those sections in particular got more review while in the HTML5 spec proper than they have since. ... So you are putting stuff you're personally interested in into the HTML5 spec, so that people read it? What a cunning plan. BR, Julian
Re: [whatwg] on bibtex-in-html5
I'm cc-ing the Zotero dev list just for posterity ... On Wed, Jun 10, 2009 at 5:44 AM, Ian Hickson i...@hixie.ch wrote: On Wed, 20 May 2009, Bruce D'Arcus wrote: Re: the recent microdata work and the subsequent effort to include BibTeX in the spec, I summarized my argument against this on my blog: http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5 | 1. BibTeX is designed for the sciences, that typically only cite | secondary academic literature. It is thus inadequate for, nor widely | used, in many fields outside of the sciences: the humanities and law | being quite obvious examples. For this reason, BibTeX cannot by | default adequately represent even the use cases Ian has identified. | For example, there are many citations on Wikipedia that can only be | represented using effectively useless types such as misc and which | require new properties to be invented. We will probably have to increase the coverage in due course, yes. However, we should verify that the mechanism works in principle before investing the time to extend the vocabulary. No; you should drop this proposal and move it to an experimental annex. If you do insist, against all reason, on pushing forward with this without modification, then I suggest you explain how this process of extension will work. If, as I suspect, it'll be another case of a centralized authority (you, who have admitted you really know nothing about this space), then that's a deal-breaker from my perspective. | 2. Related, BibTeX cannot represent much of the data in widely used | bibliographic applications such as Endnote, RefWorks and Zotero except | in very general ways. If such data is important, we can always add support when this becomes clear. Man, this is frustrating. | 3. The BibTeX extensibility model puts a rather large burden on inventing | new properties to accommodate data not in the core model.
For example, | the core model has no way to represent a DOI identifier (this is no | surprise, as BibTeX was created before DOIs existed). As a | consequence, people have gradually added this to their BibTeX records | and styles in a more ad hoc way. This ad hoc approach to extensibility | has one of two consequences: either the vocabulary terms are | understood as completely uncontrolled strings, or one needs to | standardize them. If we assume the first case, we introduce potential | interoperability problems. If we assume the second, we have an | organizational and process problem: that the WHATWG and/or the | W3C, neither of which have expertise in this domain, become the | gate-keepers for such extensions. In either case, we have a rather | brittle and anachronistic approach to extension. I don't see any of this as a problem. The problem, to repeat myself again, is related to the above "we'll extend it as we see fit" issue. The two biggest problems in bibtex are two properties: book, journal. They're a problem because they're both horribly concrete/narrow, and (arguably) redundant. If those were instead replaced with something more generic like either: 1) publication-title ... or, better yet ... 2) a nested/related object (call it publication or container or isPartOf) ... then extension becomes easier. If I need to encode a newspaper article, then I just do:

    title = Some Article
    publication-title = Some Newspaper

... or (better, because I can attach other information to the container):

    title = Some Article
    publication = [ title = Some Newspaper ]

As is, you need to add stuff like this just to resolve the problems I've repeatedly pointed out: newspaper-title, magazine-title, court-reporter-title, television-program-title, radio-program-title. Aside: of course, some of the above could be collapsed into more generic stuff like broadcast-title, but I'm just following the same, broken, approach as bibtex. This stuff isn't theoretical, Ian.
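Bruce's suggestion of one generic container instead of type-specific fields can be sketched as a conversion. In the hypothetical Python below, `nest_container` folds the container fields classic BibTeX does define (`journal`, `booktitle`) into a single nested `publication` object; a newspaper article then needs no new field at all. The field name `publication` follows his example and is not taken from any published vocabulary.

```python
# Container-title fields that classic BibTeX spreads across entry types.
CONTAINER_FIELDS = ("journal", "booktitle")


def nest_container(entry: dict[str, str]) -> dict[str, object]:
    """Replace journal/booktitle with one generic nested 'publication' object."""
    out: dict[str, object] = {
        k: v for k, v in entry.items() if k not in CONTAINER_FIELDS
    }
    for name in CONTAINER_FIELDS:
        if name in entry:
            # The container is an object, so other data (issue, date, ...) can hang off it.
            out["publication"] = {"title": entry[name]}
            break
    return out


print(nest_container({"title": "Some Article", "journal": "Some Journal", "volume": "3"}))

# A newspaper article uses the same shape; no newspaper-title field is needed.
newspaper_article = {"title": "Some Article", "publication": {"title": "Some Newspaper"}}
```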
Just look through this wikipedia page, for example: http://en.wikipedia.org/wiki/Guantanamo_Bay_detention_camp The citations include references to legal cases and briefs, and news articles (television, radio and print). Your proposal doesn't cover this stuff. OTOH, applications like Zotero can. | 4. The BibTeX model conflicts with Dublin Core and with vCard, both of | which are quite sensibly used elsewhere in the microdata spec to | encode information related to the document proper. There seems little | justification in having two different ways to represent a document | depending on whether it is THIS document or THAT document. I don't understand this point. Could you provide an example of this conflict? Here's an academic article in an open access biology journal. http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.182 THIS article refers to the metadata about the document proper, with the title Accelerated Adaptive Evolution
Re: [whatwg] on bibtex-in-html5
| 1. BibTeX is designed for the sciences, that typically only cite | secondary academic literature. It is thus inadequate for, nor widely | used, in many fields outside of the sciences: the humanities and law | being quite obvious examples. For this reason, BibTeX cannot by | default adequately represent even the use cases Ian has identified. | For example, there are many citations on Wikipedia that can only be | represented using effectively useless types such as misc and which | require new properties to be invented. We will probably have to increase the coverage in due course, yes. However, we should verify that the mechanism works in principle before investing the time to extend the vocabulary. No; you should drop this proposal and move it to an experimental annex. If you do insist, against all reason, on pushing forward with this without modification, then I suggest you explain how this process of extension will work. If, as I suspect, it'll be another case of a centralized authority (you, who have admitted you really know nothing about this space), then that's a deal-breaker from my perspective. Related to this, I want to make some remarks on a more general level: We are currently experiencing major changes in the world of bibliographic software. At least, this is how I experience it. After years of limited and/or closed formats and models like BibTeX or Endnote, we finally see new models like CSL or biblatex emerging which try to learn the lessons of the past. Of course, I do not know how things will evolve, but looking at the success of solutions like Zotero I think it's not so bold to say that things will change quite a bit in the coming years. And then we have HTML5, an emerging standard which is now getting support from the newest and latest browsers. I know even less about how HTML5 will evolve and what impact it will have on the web. But it's probably fair to say that widespread adoption of HTML5 will not happen overnight.
Honestly, I really don't get why an upcoming web standard should support a bibliographic standard which is obviously outdated. The fact that BibTeX is widely used is really a non-argument, because if we follow this logic we won't have any development. By the same logic you should avoid something like video; after all, there isn't any support for it *yet*. If HTML5 wants to be forward-looking, it certainly shouldn't adopt a twenty-year-old standard but should instead try to support something new which is really up to date and has a chance of being useful in the future. simon
Re: [whatwg] on bibtex-in-html5
On Tue, Jun 2, 2009 at 12:05 PM, James Graham jgra...@opera.com wrote: Bruce D'Arcus wrote: So exactly what is the process by which this gets resolved? Is there one? Hixie will respond to substantive emails sent to this list at some point. However there are some hundreds of outstanding emails (see [1]) so the responses can take a while. If you have a pressing deadline that would benefit from your issue being addressed sooner, I suggest you talk to Hixie about it. No problem; I just wanted to know how things worked here. Thanks. FWIW I have a few general thoughts about the bibtex section which may or may not be interesting: 1) It seems like this and similar sections (bibtex, vCard, iCalendar) could be productively split out of the main spec into separate normative documents, since they are rather self-contained and have rather obvious interest for communities who are unlikely to find them at present or to be interested in the rest of the spec. +1 to splitting them off. I think there's still an open question, however, about whether any of these—and particularly the bibliographic one (at least as it's currently specified)—should be normative. I don't believe they should be. But, moving on ... Although the drag and drop stuff being dependent on them does mean that you'd need some circular references. 2) For the bibliographic data the most important issues that I see are ease of use and ease of export. Although I am not attached to the bibtex format per-se I would be extremely disappointed if a different, harder to author, format were used. Formats that are flexible but rarely used are less useful overall than more limited formats with ubiquitous deployment. In addition formats that are hard to use make it more likely that people will make accidental mistakes, so decreasing the reliability of the data and devaluing tools that consume the data. 
Although I don't think we have to use bibtex as the basis for the format, I do think a canonical mapping to bibtex is a requirement. Obviously this reflects my background in the physical sciences but, at least in that field, LaTeX and, by association, bibtex are overwhelmingly popular. I am well aware that the situation in other fields is different but without clean, high-fidelity bibtex export (at least to the extent required to support common citation patterns within the physical sciences) the format will lose out on a large audience with a higher than average number of potential early adopters. Fair enough; all I'm saying is the same deference should be paid to other research fields. The sciences for too long have dominated these discussions, to the detriment of other fields. So I would hope we could avoid that here. Let's move on to a use case or two to illustrate the issues here. Zotero is likely to be an early adopter of microdata as well, certainly as a consumer of these data, and perhaps also as a producer. http://www.zotero.org/ Zotero is a Firefox extension that can import and export BibTeX, among a variety of other formats (RIS, MODS, and the new BIBO/DC RDF work, which is its primary format). It includes a number of components that allow citation and document metadata to be extracted, and later republished. So, for example, a user is browsing the web, and they are reading this article from the NY Times. http://www.nytimes.com/2009/06/03/world/asia/03military.html Zotero has a translator (basically, a dedicated screen-scraper) for the NY Times, and so the user can simply click an icon in their toolbar to extract the metadata into their database. They can then later cite it in their own documents, and Zotero will be responsible for correctly formatting those citations and bibliographic entries.
So I have questions on this use case: 1) how do these data about the article get encoded in microdata in such a way that Zotero (or any other similar tool) doesn't have to continue to write and maintain dedicated translators for every site? E.g. how should the newspaper article metadata be encoded? It seems the assumption that bibtex is only for bibliographies leaves that out. Instead, the current draft of the spec tells us the title of the document corresponds to dc:title, and not much else. My argument is to beef up the ability to describe documents in general.* In strawman pseudo-code:

    title = doc.title
    type = doc.type
    source = doc.isPartOf.title  # or if not dc:isPartOf, something similarly generic
    issued = doc.issued
    creators = doc.creator
    print creators[0].name

* E.g. don't pretend that document metadata is different than bibliographic metadata. The latter is simply a reference to the former (usually; there are some exceptions where people cite events). 2) If Zotero consumes these data and then the user cites it in their document, and elects to export to HTML5, how should that same newspaper article data be encoded in the bibliography? BibTeX isn't terribly helpful; example:
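The strawman pseudo-code can be turned into runnable Python, offered as a sketch only. The `Agent` and `Doc` classes are hypothetical stand-ins for whatever a microdata parser would return; attribute names mirror the pseudo-code, and the comments note the Dublin Core term each is meant to suggest. The article's title and author are placeholders; only the date comes from the NY Times URL in the use case.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    name: str


@dataclass
class Doc:
    title: str                                          # cf. dc:title
    type: str                                           # cf. dc:type
    issued: Optional[str] = None                        # cf. dcterms:issued
    creator: list[Agent] = field(default_factory=list)  # cf. dc:creator
    isPartOf: Optional["Doc"] = None                    # cf. dcterms:isPartOf


def describe(doc: Doc) -> dict[str, Optional[str]]:
    """Read the same generic properties whether doc is THIS page or a cited one."""
    return {
        "title": doc.title,
        "type": doc.type,
        "source": doc.isPartOf.title if doc.isPartOf else None,
        "issued": doc.issued,
        "first_creator": doc.creator[0].name if doc.creator else None,
    }


# Placeholder newspaper article; only the date is taken from the URL above.
article = Doc(
    title="Some Article",
    type="article-newspaper",
    issued="2009-06-03",
    creator=[Agent("A. Reporter")],
    isPartOf=Doc(title="The New York Times", type="periodical"),
)
print(describe(article)["source"])  # The New York Times
```

The same `describe` call works for the page's own metadata and for a cited source, which is the point of not treating document metadata and bibliographic metadata differently.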
Re: [whatwg] on bibtex-in-html5
So exactly what is the process by which this gets resolved? Is there one?
Re: [whatwg] on bibtex-in-html5
Bruce D'Arcus wrote: So exactly what is the process by which this gets resolved? Is there one? Hixie will respond to substantive emails sent to this list at some point. However there are some hundreds of outstanding emails (see [1]) so the responses can take a while. If you have a pressing deadline that would benefit from your issue being addressed sooner, I suggest you talk to Hixie about it. FWIW I have a few general thoughts about the bibtex section which may or may not be interesting: 1) It seems like this and similar sections (bibtex, vCard, iCalendar) could be productively split out of the main spec into separate normative documents, since they are rather self-contained and have rather obvious interest for communities who are unlikely to find them at present or to be interested in the rest of the spec. Although the drag and drop stuff being dependent on them does mean that you'd need some circular references. 2) For the bibliographic data the most important issues that I see are ease of use and ease of export. Although I am not attached to the bibtex format per-se I would be extremely disappointed if a different, harder to author, format were used. Formats that are flexible but rarely used are less useful overall than more limited formats with ubiquitous deployment. In addition formats that are hard to use make it more likely that people will make accidental mistakes, so decreasing the reliability of the data and devaluing tools that consume the data. Although I don't think we have to use bibtex as the basis for the format, I do think a canonical mapping to bibtex is a requirement. Obviously this reflects my background in the physical sciences but, at least in that field LaTeX and, by association, bibtex are overwhelmingly popular. 
I am well aware that the situation in other fields is different but without clean, high-fidelity bibtex export (at least to the extent required to support common citation patterns within the physical sciences) the format will lose out on a large audience with a higher than average number of potential early adopters. [1] http://www.whatwg.org/issues/data.html
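James's requirement of a canonical mapping to BibTeX can be checked against Bruce's newspaper example with a short sketch. In the hypothetical Python below, the generic type names are modeled on CSL-style types and `TYPE_MAP` is an illustrative table, not a proposed standard: scientific types map cleanly, while anything else falls back to `misc` with the container title pushed into `howpublished`, which is the loss of fidelity the humanities side of this thread is worried about.

```python
# Hypothetical generic type -> (BibTeX entry type, field holding the container title).
TYPE_MAP = {
    "article-journal": ("article", "journal"),
    "paper-conference": ("inproceedings", "booktitle"),
    "chapter": ("incollection", "booktitle"),
    "book": ("book", None),
}


def export_bibtex_fields(record: dict) -> tuple[str, dict[str, str]]:
    """Map a generic record onto a BibTeX entry type and field dict."""
    entry_type, container_field = TYPE_MAP.get(record["type"], ("misc", None))
    fields = {"title": record["title"]}
    if record.get("authors"):
        fields["author"] = " and ".join(record["authors"])  # BibTeX name separator
    if record.get("year"):
        fields["year"] = record["year"]
    container = record.get("container")
    if container and container_field:
        fields[container_field] = container
    elif container:
        fields["howpublished"] = container  # lossy fallback: the container's role is lost
    return entry_type, fields


print(export_bibtex_fields({"type": "article-newspaper", "title": "Some Article",
                            "container": "Some Newspaper", "year": "2009"}))
```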
Re: [whatwg] on bibtex-in-html5
On Sat, May 23, 2009 at 5:35 PM, Ian Hickson i...@hixie.ch wrote: ... I agree that BibTeX is suboptimal. But what should we use instead?

As I've suggested:

1) Use Dublin Core. This gives you the basic critical properties: literals for titles and dates, and relations for versions, part/containers, contributors, subjects. You then have a consistent and general way to represent (HTML) documents and embedded references to other documents, etc. (citation references). This would cover the most important areas that BibTeX covers.

2) This goes far, but you're then left with a few missing pieces for citations:
a. more specific contributors (like editors and translators)
b. identifiers (there's dc:identifier, but no way to explicitly denote that it's a doi, isbn, issn, etc.)
c. what I call locators: volume, issue, pages, etc.
d. types (book, article, patent, etc.)

If there's some consensus on this basic way forward, we can talk about details on 2.

Bruce
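A minimal sketch of how the Dublin Core-plus-extensions shape above could still honour the canonical bibtex mapping requested earlier in the thread. The `Reference` record, the `to_bibtex` helper and the type table are hypothetical illustrations, not a published schema; anything without a BibTeX equivalent (a patent, say) falls back to `misc` and loses its container title, which is exactly the lossiness being debated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical record: Dublin Core-style core properties plus the four
# extensions (a)-(d) listed above. Not a published schema.
@dataclass
class Reference:
    ref_type: str                  # (d) type: "article", "chapter", "book", ...
    title: str                     # dc:title
    creators: List[str]            # dc:creator, each "Family, Given"
    date: str                      # dc:date, ISO 8601 (year first)
    part_of: Optional[str] = None  # dc:title of the dc:isPartOf container
    editors: List[str] = field(default_factory=list)           # (a) roles
    identifiers: Dict[str, str] = field(default_factory=dict)  # (b) "doi", "isbn", ...
    locators: Dict[str, str] = field(default_factory=dict)     # (c) volume, issue, pages

# Illustrative type table: generic type -> (BibTeX entry type, container field).
BIBTEX_TYPES = {
    "article": ("article", "journal"),
    "chapter": ("incollection", "booktitle"),
    "book": ("book", None),
}
# BibTeX calls the issue "number"; other locator names pass through unchanged.
LOCATOR_FIELDS = {"issue": "number"}

def to_bibtex(key: str, ref: Reference) -> str:
    """Render a Reference as a BibTeX entry (special characters are not escaped)."""
    entry_type, container_field = BIBTEX_TYPES.get(ref.ref_type, ("misc", None))
    fields = {
        "title": ref.title,
        "author": " and ".join(ref.creators),
        "year": ref.date[:4],
    }
    if ref.editors:
        fields["editor"] = " and ".join(ref.editors)
    if ref.part_of and container_field:
        fields[container_field] = ref.part_of  # silently dropped for unmapped types
    for name, value in ref.locators.items():
        fields[LOCATOR_FIELDS.get(name, name)] = value
    # Typed identifiers become non-standard fields such as "doi" or "isbn".
    fields.update(ref.identifiers)
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"

if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = Reference("article", "An Example Article", ["Doe, Jane"], "2009-05-20",
                        part_of="Journal of Examples",
                        identifiers={"doi": "10.1000/xyz"},
                        locators={"volume": "8", "issue": "3", "pages": "355--370"})
    print(to_bibtex("doe2009", example))
```

The direction matters: going from the richer record to BibTeX is a straightforward projection, while the reverse needs the type table to be inverted and cannot recover what `misc` threw away.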
Re: [whatwg] on bibtex-in-html5
If markup for a publication identifier in a reference is required, can this identifier be URN-encoded? The NID will tell what kind of identifier it is. I have used <q cite="urn:ISBN:whatever"> myself, perhaps not quite in line with the definition of the Q element, but since the cite attribute in XHTML is not universal (as it is in XHTML2), I have not been able to find a better container element to attach a URN to. IMHO, Chris
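A minimal sketch of how a consumer might read the NID out of a `cite` value such as `urn:ISBN:…`, assuming plain `urn:<NID>:<NSS>` syntax. The `parse_urn` and `identifier_kind` helpers are hypothetical, the NID table lists only two namespaces for illustration, and no checksum or NSS validation is attempted.

```python
from typing import Tuple

# Illustrative subset of URN namespace IDs; NIDs are case-insensitive.
KNOWN_NIDS = {"isbn": "ISBN", "issn": "ISSN"}

def parse_urn(uri: str) -> Tuple[str, str]:
    """Split 'urn:<NID>:<NSS>' into (lower-cased NID, NSS)."""
    scheme, sep, rest = uri.strip().partition(":")
    if not sep or scheme.lower() != "urn":
        raise ValueError(f"not a URN: {uri!r}")
    nid, sep, nss = rest.partition(":")
    if not nid or not sep or not nss:
        raise ValueError(f"malformed URN: {uri!r}")
    return nid.lower(), nss

def identifier_kind(cite: str) -> str:
    """Return a label such as 'ISBN' for a URN cite value, or 'unknown'."""
    try:
        nid, _ = parse_urn(cite)
    except ValueError:
        return "unknown"
    return KNOWN_NIDS.get(nid, "unknown")
```

Because the identifier type travels inside the URI itself, no extra attribute or field name is needed to say "this is an ISBN".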
Re: [whatwg] on bibtex-in-html5
On Sun, May 24, 2009 at 12:35 PM, Kristof Zelechovski giecr...@stegny.2a.pl wrote: If markup for a publication identifier in a reference is required, can this identifier be URN-encoded? The NID will tell what kind of identifier it is. I have used <q cite="urn:ISBN:whatever"> myself, perhaps not quite in line with the definition of the Q element, but since the cite attribute in XHTML is not universal (as it is in XHTML2), I have not been able to find a better container element to attach a URN to. Man, this is a big can of worms. Clearly, yes, it'd be a goal to work towards having citation sources identified by URI, and ideally having those URIs be HTTP-resolvable. Indeed, we've been talking a lot about that at the Zotero project. But that's not inconsistent with this discussion (except that BibTeX uses a local ID, which conflicts with this goal). Bruce
Re: [whatwg] on bibtex-in-html5
Sorry for my intrusion on this list. I realize that it's cheeky to come to a list only to rant about a specific detail, but I feel that more support for Bruce's position is needed. Just a bit about my background: I don't have any technical training or expertise in software or programming. I'm a scholar in the humanities (film studies and German literature) and wrote my PhD thesis in film studies using LaTeX. Although I'm not a programmer by any means, I consider myself an 'advanced user' and quite well informed about bibliographic software. There aren't many bibliographic programs or other solutions I haven't looked at in the last couple of years. After this introduction, let me just state one thing: basing any kind of future software on BibTeX would really be like using ASCII instead of UTF-8. Yes, it's really that bad. BibTeX is now almost 20 years old and its shortcomings are well known and have been discussed endlessly. It has an extremely limited model which basically only covers the English-speaking sciences. As soon as you leave this area (as I have to do daily), you're out of luck with traditional BibTeX. Sure, there are all kinds of extensions, but most of them are limited as well and none of them is standardized. Just take ‘bookauthor’. Until recently, no BibTeX style supported this field, although it's a really basic thing for the humanities. (Now, if anyone asks why you would need a 'bookauthor' field, I have only one thing to answer: find out what is needed in different disciplines before settling on a standard.) It's a sad fact that the same mistakes are repeated over and over again in the area of bibliographic software. It seems like a natural law that every new software solution dealing with bibliographies always has to start with an extremely limited set of fields like BibTeX. It took nearly two decades until biblatex got rid of most of the basic shortcomings of BibTeX, but somehow other projects don’t seem to learn from this.
It doesn't have to be this way. The problems of the existing solutions are known, and alternatives do exist. So please hear my plea: don't go with an ancient model whose shortcomings are well known, but use something modern instead. If you absolutely have to use BibTeX, please at least use biblatex, which addresses most of the problems of traditional BibTeX. simon
--
Simon Spiegel
Steinhaldenstr. 50
8002 Zürich
Telephone: ++41 44 451 5334
Mobile: ++41 76 459 60 39
http://www.simifilm.ch
“Met Goethe. Impressed.” Unknown
Re: [whatwg] on bibtex-in-html5
On Sat, 23 May 2009, Simon Spiegel wrote: Sorry for my intrusion on this list. I realize that it's cheeky to come to a list only to rant about a specific detail, but I feel that more support for Bruce's position is needed. Just a bit about my background: I don't have any technical training or expertise in software or programming. I'm a scholar in the humanities (film studies and German literature) and wrote my PhD thesis in film studies using LaTeX. Although I'm not a programmer by any means, I consider myself an 'advanced user' and quite well informed about bibliographic software. There aren't many bibliographic programs or other solutions I haven't looked at in the last couple of years. After this introduction, let me just state one thing: basing any kind of future software on BibTeX would really be like using ASCII instead of UTF-8. Yes, it's really that bad. BibTeX is now almost 20 years old and its shortcomings are well known and have been discussed endlessly. It has an extremely limited model which basically only covers the English-speaking sciences. As soon as you leave this area (as I have to do daily), you're out of luck with traditional BibTeX. Sure, there are all kinds of extensions, but most of them are limited as well and none of them is standardized. Just take ‘bookauthor’. Until recently, no BibTeX style supported this field, although it's a really basic thing for the humanities. (Now, if anyone asks why you would need a 'bookauthor' field, I have only one thing to answer: find out what is needed in different disciplines before settling on a standard.) It's a sad fact that the same mistakes are repeated over and over again in the area of bibliographic software. It seems like a natural law that every new software solution dealing with bibliographies always has to start with an extremely limited set of fields like BibTeX.
It took nearly two decades until biblatex got rid of most of the basic shortcomings of BibTeX, but somehow other projects don’t seem to learn from this. It doesn't have to be this way. The problems of the existing solutions are known, and alternatives do exist. So please hear my plea: don't go with an ancient model whose shortcomings are well known, but use something modern instead. If you absolutely have to use BibTeX, please at least use biblatex, which addresses most of the problems of traditional BibTeX.

I agree that BibTeX is suboptimal. But what should we use instead? (The biblatex vocabulary seems unnecessarily incompatible with BibTeX's, and the latter appears to have more deployed support, which was one of the primary concerns that led to its vocabulary being picked.)
--
Ian Hickson http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] on bibtex-in-html5
On Sat, May 23, 2009 at 5:35 PM, Ian Hickson i...@hixie.ch wrote: ... I agree that BibTeX is suboptimal. But what should we use instead? (The biblatex vocabulary seems unnecessarily incompatible with BibTeX's, and the latter appears to have more deployed support, which was one of the primary concerns that led to its vocabulary being picked.) I think if you really insist on including a bibliographic vocabulary in the HTML 5 spec (which, as I've said, I don't really agree with, precisely because this is hard stuff), then you need to reassess/clarify the requirements a bit. For example, what do you mean by "unnecessarily incompatible" with BibTeX? If you simply mean it's a superset, and that therefore going from biblatex to bibtex cannot be totally lossless, then that's unavoidable, and I think that's a requirement that needs changing. Or is there some other aspect of incompatibility you're seeing? Bruce
Re: [whatwg] on bibtex-in-html5
On 23.05.2009, at 23:35, Ian Hickson wrote: On Sat, 23 May 2009, Simon Spiegel wrote: I agree that BibTeX is suboptimal. But what should we use instead? (The biblatex vocabulary seems unnecessarily incompatible with BibTeX's, and the latter appears to have more deployed support, which was one of the primary concerns that led to its vocabulary being picked.) Biblatex is only incompatible insofar as it offers many more fields and possibilities than standard BibTeX. But that's true of just about any newer BibTeX style and is a direct consequence of the extreme limitations of standard BibTeX. simon
Re: [whatwg] on bibtex-in-html5
Just to put a fine point on this ... On Thu, May 21, 2009 at 12:11 PM, Bruce D'Arcus bdar...@gmail.com wrote: ... Or consider the user or developer who can't figure out how to represent their data in bibtex-in-html5 because its designers simply didn't consider it. In that case, people go elsewhere, or invent their own solutions. ... a Zotero forum post from earlier today: http://forums.zotero.org/discussion/7138/export-patent-information-as-bibtex So, a user has a patent item stored in their database. When they go to export it to bibtex, much of the data gets lost. I probably wasn't clear enough in my list posts, but I'd prefer that the bibtex stuff be removed entirely from the spec. It would be fine, however, if it were in a non-normative document separate from the spec. I'd also prefer that microdata itself be removed in favor of RDFa, but I proposed an alternative approach given the way this has gone so far (namely, that this effort seems a fait accompli). Bruce
Re: [whatwg] on bibtex-in-html5
On May 20, 2009, at 19:24, Bruce D'Arcus wrote: Re: the recent microdata work and the subsequent effort to include BibTeX in the spec, I summarized my argument against this on my blog: http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5 Quoting from the blog post: On the last use case, he has chosen BibTeX, on the basis that it is widely used and simple to author and process. Those are good criteria. • BibTeX is designed for the sciences, which typically only cite secondary academic literature. It is thus neither adequate for, nor widely used in, many fields outside of the sciences: the humanities and law being quite obvious examples. For this reason, BibTeX cannot by default adequately represent even the use cases Ian has identified. For example, there are many citations on Wikipedia that can only be represented using effectively useless types such as “misc” and which require new properties to be invented. This doesn't mean that BibTeX is a bad basis. The set of types and fields is limited, though. Since bibliography renderings don't usually show the type of the reference, having to use 'misc' for almost everything isn't a practical problem, although it is aesthetically displeasing. The set of fields is more of an issue, but it can be fixed by inventing more fields; it doesn't mean the whole base solution needs to be discarded. Fortunately, having custom fields in .bib doesn't break existing pre-Web, pre-ISBN bibliography styles. I've used at least these custom fields:
key: Show this citation pseudo-id in rendering instead of the actual id used for matching.
url: The absolute URL of a resource that is on the Web.
refdate: The date when the author made the reference to an ephemeral source such as a Web page.
isbn: The ISBN of a publication.
stdnumber: RFC or ISO number, e.g. RFC 2397 or ISO/IEC 10646:2003(E).
Particularly the 'url' and 'isbn' field names should be obvious and uncontroversial additions.
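The claim that custom fields don't break existing styles can be seen from the consumer side: classic BibTeX styles skip fields they don't declare. The sketch below uses a hypothetical `split_entry` helper that parses one simple entry (braced values only, no nested braces, no `@string` macros) and separates the standard fields documented in btxdoc from extensions such as `url`, `refdate`, `isbn` and `stdnumber`. It is a rough illustration, not a real .bib parser.

```python
import re
from typing import Dict, Tuple

# Standard fields listed in the BibTeX documentation (btxdoc).
STANDARD_FIELDS = {
    "address", "annote", "author", "booktitle", "chapter", "crossref",
    "edition", "editor", "howpublished", "institution", "journal", "key",
    "month", "note", "number", "organization", "pages", "publisher",
    "school", "series", "title", "type", "volume", "year",
}

# One entry: @type{key, ...fields... }
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
# One field with a braced value that contains no further braces.
FIELD_RE = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")

def split_entry(text: str) -> Tuple[str, str, Dict[str, str], Dict[str, str]]:
    """Return (entry type, citation key, standard fields, extension fields)."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError("not a simple BibTeX entry")
    entry_type, key, body = match.groups()
    standard: Dict[str, str] = {}
    extensions: Dict[str, str] = {}
    for name, value in FIELD_RE.findall(body):
        name = name.lower()  # BibTeX field names are case-insensitive
        target = standard if name in STANDARD_FIELDS else extensions
        target[name] = value.strip()
    return entry_type.lower(), key, standard, extensions
```

A style that only knows the standard set sees the first dictionary; a Web-aware consumer can additionally read the second, which is why such additions degrade gracefully but also why nothing stops two groups from naming the same extension differently.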
• Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways. Do you have an example? (I've never used the other formats.) • The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way. This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems. In practice, those problems have already been introduced. For some reason I don't understand, there's an existing pattern of calling a field 'doi' but putting an absolute URI in the value. (As opposed to using a field name 'url' or a value that contains only the DOI-significant part.) If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C, neither of which have expertise in this domain, become the gate-keepers for such extensions. In either case, we have a rather brittle and anachronistic approach to extension. Problems of this nature haven't stopped the WHATWG in the past. :-) • The BibTeX model conflicts with Dublin Core and with vCard, both of which are quite sensibly used elsewhere in the microdata spec to encode information related to the document proper. There seems little justification in having two different ways to represent a document depending on whether it is THIS document or THAT document. When you are referring to THAT document, you generally want the names of the authors--not their full business cards.
Therefore, vCard is overkill, and conversion to .bib is more useful than conversion to vCard for this use case. My suggestion instead? • reuse Dublin Core and vCard for the generic data: titles, creators/contributors, publisher, dates, part/version relations, etc., and only add those properties (volume, issue, pages, editors, etc.) that they omit. This would make conversion to and from the dominant bibliography format (.bib) more complex. Furthermore, there's a risk of a GIGO effect where the conversion can't be done algorithmically. (IIRC, you can't algorithmically map a .bib author name to the vCard name structure without a huge dictionary of names.) • typing should NOT be handled as a bibtex-type property, but the same way everything else is typed in the microdata proposal: a global identifier. Why is typing even needed except for separating articles from compilations? • make it possible for people to
Re: [whatwg] on bibtex-in-html5
Hi Henri, On Thu, May 21, 2009 at 4:00 AM, Henri Sivonen hsivo...@iki.fi wrote: On May 20, 2009, at 19:24, Bruce D'Arcus wrote: Re: the recent microdata work and the subsequent effort to include BibTeX in the spec, I summarized my argument against this on my blog: http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5 Quoting from the blog post: On the last use case, he has chosen BibTeX, on the basis that it is widely used and simple to author and process. Those are good criteria. Except the assumption that BibTeX is widely used is overstated once you get out of the technology and sciences sectors. • BibTeX is designed for the sciences, which typically only cite secondary academic literature. It is thus neither adequate for, nor widely used in, many fields outside of the sciences: the humanities and law being quite obvious examples. For this reason, BibTeX cannot by default adequately represent even the use cases Ian has identified. For example, there are many citations on Wikipedia that can only be represented using effectively useless types such as “misc” and which require new properties to be invented. This doesn't mean that BibTeX is a bad basis. The set of types and fields is limited, though. It's limited, and it's flat. Since bibliography renderings don't usually show the type of the reference, having to use 'misc' for almost everything isn't a practical problem, although it is aesthetically displeasing. But this is not the point of adding structured data to HTML; it's to allow it to be extracted, and subsequently processed, as data. Citation and bibliographic formatting conventions do include information that suggests type; it's not that it requires a human reader to decipher. Surely that should not limit how we address this going forward? The set of fields is more of an issue, but it can be fixed by inventing more fields; it doesn't mean the whole base solution needs to be discarded.
Fortunately, having custom fields in .bib doesn't break existing pre-Web, pre-ISBN bibliography styles. I've used at least these custom fields:
key: Show this citation pseudo-id in rendering instead of the actual id used for matching.
url: The absolute URL of a resource that is on the Web.
refdate: The date when the author made the reference to an ephemeral source such as a Web page.
isbn: The ISBN of a publication.
stdnumber: RFC or ISO number, e.g. RFC 2397 or ISO/IEC 10646:2003(E).
Particularly the 'url' and 'isbn' field names should be obvious and uncontroversial additions. Trust me: this is not nearly as simple as you think. More below ... • Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways. Do you have an example? (I've never used the other formats.) Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a few others; PO from the BBC, and SIOC): https://www.zotero.org/trac/wiki/BiboMapping Here's a post on Microsoft's bib format for OOXML that will give you some info: http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14 Here's the type schema for CSL (though it needs work, and we de-emphasize this for formatting in any case; CSL is really oriented towards output formatting only): http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup Here's the variable list: http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941&view=markup • The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way.
This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems. In practice, those problems have already been introduced. For some reason I don't understand, there's an existing pattern of calling a field 'doi' but putting an absolute URI in the value. (As opposed to using a field name 'url' or a value that contains only the DOI-significant part.) The point is, when you get beyond dealing with secondary literature (the domain of BibTeX and the sciences), the range of possible data expands significantly. Things can get really complicated. Consider what's actually pretty simple comparatively: An English translation of a classic work. You often need original publication information such as title (in the original language), publisher and issued date, etc.
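One way a consumer could cope with the inconsistent 'doi' values described above is to reduce them to the bare, DOI-significant part. This is a sketch under stated assumptions: `normalize_doi` is a hypothetical helper, the accepted wrappers (`doi:`, `http(s)://doi.org/`, `http(s)://dx.doi.org/`) are just the common ones, and the syntax check only requires the `10.` prefix, a numeric registrant code, a slash and a non-empty suffix.

```python
import re

# Common ways a DOI ends up wrapped: a "doi:" label or a resolver URL.
WRAPPER_RE = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)
# "10." + registrant code (dot-separated digits) + "/" + non-empty suffix.
DOI_RE = re.compile(r"^10\.\d+(?:\.\d+)*/\S+$")

def normalize_doi(value: str) -> str:
    """Reduce a 'doi' field value to its DOI-significant part."""
    candidate = WRAPPER_RE.sub("", value.strip(), count=1)
    if not DOI_RE.match(candidate):
        raise ValueError(f"not a DOI: {value!r}")
    return candidate
```

Every consumer having to carry this kind of clean-up code is the practical cost of the "uncontrolled strings" branch of the argument.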
Re: [whatwg] on bibtex-in-html5
Oops; two quick things ... On Thu, May 21, 2009 at 8:02 AM, Bruce D'Arcus bdar...@gmail.com wrote: Citation and bibliographic formatting conventions do include information that suggests type; it's not that it requires a human reader to decipher. I meant it's JUST that ... Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a few others; PO from the BBC, and SIOC): https://www.zotero.org/trac/wiki/BiboMapping FWIW, the Zotero types here refer to what's in their UI ATM. They will, however, be moving to a more flexible and relational UI model that more closely reflects the BIBO model. Reason? Users were asking for things not easily accommodated in the current flat approach (example: a review might be published in a newspaper or a journal, or broadcast on the radio or as a podcast). Bruce
Re: [whatwg] on bibtex-in-html5
On May 21, 2009, at 15:02, Bruce D'Arcus wrote: Except the assumption that BibTeX is widely used is overstated once you get out of the technology and sciences sectors. OK. This doesn't mean that BibTeX is a bad basis. The set of types and fields is limited, though. It's limited, and it's flat. In order not to be completely ignored in the technology and sciences sectors, a bibliography microdata format needs to be able to plug into the network effects of BibTeX. Having a non-flat microdata format while BibTeX remains flat would seriously hinder conversions from microdata to BibTeX. How are non-flat bibliographies (beyond an article being in a book / journal / Web site) presented? Since bibliography renderings don't usually show the type of the reference, having to use 'misc' for almost everything isn't a practical problem, although it is aesthetically displeasing. But this is not the point of adding structured data to HTML; it's to allow it to be extracted, and subsequently processed, as data. More to the point, to allow it to be extracted and used as bibliography source data for another publication to avoid repetitive data entry. Citation and bibliographic formatting conventions do include information that suggests type; it's not that it requires a human reader to decipher. OK. The styles I've observed that make a distinction not traceable to the availability of fields on an item have mainly distinguished between atomic publications and compilations. • Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways. Do you have an example? (I've never used the other formats.)
Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a few others; PO from the BBC, and SIOC): https://www.zotero.org/trac/wiki/BiboMapping On the surface, it seems that it would be possible to mint more field types and publication types for BibTeX to support those cases, but what is the publication type information used for? Are there as many different entry presentations as there are entry types? Or are the type tokens supposed to be mapped to localized human-readable label strings? Also, the non-flatness I see is an item being part of a compilation, which is already supported by BibTeX without allowing the whole model to generalize into a graph. Here's a post on Microsoft's bib format for OOXML that will give you some info: http://community.muohio.edu/blogs/darcusb/archives/2006/09/05/open-xml-draft-14 It seems relatively straightforward technically to extend BibTeX with the field types from OOXML that BibTeX doesn't cover. The main issue seems to be the bikeshed of what names to use. Here's the type schema for CSL (though it needs work, and we de-emphasize this for formatting in any case; CSL is really oriented towards output formatting only): http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-types.rnc?view=markup Here's the variable list: http://xbiblio.svn.sourceforge.net/viewvc/xbiblio/csl/schema/branches/split/csl-variables.rnc?revision=941&view=markup I don't see a fundamental reason why the BibTeX vocabulary couldn't be extended with stuff from there. • The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way.
This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems. In practice, those problems have already been introduced. For some reason I don't understand, there's an existing pattern of calling a field 'doi' but putting an absolute URI in the value. (As opposed to using a field name 'url' or a value that contains only the DOI-significant part.) The point is, when you get beyond dealing with secondary literature (the domain of BibTeX and the sciences), the range of possible data expands significantly. Things can get really complicated. Consider what's actually pretty simple comparatively: An English translation of a classic work. You often need original publication information such as title (in the original language), publisher and issued date, etc. With a flat model, you have to invent new properties to accommodate every little exception like this. What formats/software do people use for cases like that in practice? If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C—neither of which
Re: [whatwg] on bibtex-in-html5
On Thu, May 21, 2009 at 9:51 AM, Henri Sivonen hsivo...@iki.fi wrote: On May 21, 2009, at 15:02, Bruce D'Arcus wrote: Except the assumption that BibTeX is widely used is overstated once you get out of the technology and sciences sectors. OK. This doesn't mean that BibTeX is a bad basis. The set of types and fields is limited, though. It's limited, and it's flat. In order not to be completely ignored in the technology and sciences sectors, a bibliography microdata format needs to be able to plug into the network effects of BibTeX. Having a non-flat microdata format while BibTeX remains flat would seriously hinder conversions from microdata to BibTeX. All that matters from a BibTeX perspective is that the data is a clean superset. E.g. so long as a book, chapter, article, etc. can be reliably converted to and from BibTeX, there's no problem. The same is true of all the other bib formats out there: RIS, NLM, MODS, PRISM, OOXML, etc. How are non-flat bibliographies (beyond an article being in a book / journal / Web site) presented? A journal article is always a good example. If you like, take a look at the RDFa embedded in this example: http://bruce.darcus.name/publications/articles/outside-agitator Now, let's consider the most basic and important distinction: how you represent the journal title. In BibTeX, it's (typically) a flat journal key. In the DC/BIBO representation here, you use a dc:isPartOf relation, so that the triples look like:

<http://bruce.darcus.name/publications/articles/outside-agitator> a bibo:AcademicArticle ;
    dc:title "Dissent, Public Space and the Politics of Citizenship: Riots and the Outside Agitator"@en ;
    bibo:doi "10.1080/1356257042000309652" ;
    bibo:issue "3" ;
    bibo:pageEnd "370" ;
    bibo:pageStart "355" ;
    bibo:volume "8" ;
    dc:creator <http://bruce.darcus.name/about#me> ;
    dc:isPartOf [ dc:title "Space & Polity" ] .
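To make the flattening problem concrete, here is a hypothetical sketch (the `flatten` helper and the `CONTAINER_FIELDS` table are assumptions, not part of BIBO or BibTeX) that walks an item's isPartOf chain and copies each container title into whichever flat BibTeX field exists for that pair of types. It assumes every container carries an explicit type, which the triples above do not state; relations with no flat field end up in `note` rather than being silently dropped.

```python
from typing import Dict, Optional, Tuple

# Illustrative table: (item type, container type) -> flat BibTeX field.
CONTAINER_FIELDS: Dict[Tuple[str, str], str] = {
    ("article", "journal"): "journal",
    ("chapter", "book"): "booktitle",
    ("book", "series"): "series",
}

def flatten(item: dict) -> Dict[str, str]:
    """Collapse a nested isPartOf chain into flat BibTeX-style fields."""
    fields = {"title": item["title"]}
    unmapped = []
    child: dict = item
    parent: Optional[dict] = item.get("isPartOf")
    while parent is not None:
        name = CONTAINER_FIELDS.get((child["type"], parent["type"]))
        if name is None:
            # No flat field exists for this relation; keep it as free text.
            unmapped.append(f"{parent['type']}: {parent['title']}")
        else:
            fields[name] = parent["title"]
        child, parent = parent, parent.get("isPartOf")
    if unmapped:
        fields["note"] = "; ".join(unmapped)
    return fields
```

Each new kind of container (a podcast, a court reporter) needs either a new row in the table, meaning a new flat field name, or it degrades to free text, which is the trade-off discussed in this exchange.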
So that same mechanism can be used to represent related titles of all sorts: weblogs, magazines and newspapers, court reporters (which are really just periodicals that publish legal decisions), etc. The alternative in a totally flat model is having to invent new title properties every time you come across new data (or using a more generic key than journal to represent the containing title). I explain the basic thinking behind this using some actual examples from citation styles here: http://www.users.muohio.edu/darcusb/misc/citations-spec.html They're really just design notes, but I think they communicate the point. Since bibliography renderings don't usually show the type of the reference, having to use 'misc' for almost everything isn't a practical problem, although it is aesthetically displeasing. But this is not the point of adding structured data to HTML; it's to allow it to be extracted, and subsequently processed, as data. More to the point, to allow it to be extracted and used as bibliography source data for another publication to avoid repetitive data entry. Yes. Citation and bibliographic formatting conventions do include information that suggests type; it's not that it requires a human reader to decipher. OK. The styles I've observed that make a distinction not traceable to the availability of fields on an item have mainly distinguished between atomic publications and compilations. Yes. But you also have styles with conventions like "if you have a book, format the title in italics, else ..." So there are little hints like that which give a (human) reader information they can use to find the source in question. As the creator of CSL, I've always said my intention is to contribute toward helping us move beyond some of these eccentric traditions, though! • Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways. Do you have an example?
(I've never used the other formats.) Here's the in-progress mapping of Zotero's types to RDF (BIBO, and a few others; PO from the BBC, and SIOC): https://www.zotero.org/trac/wiki/BiboMapping On the surface, it seems that it would be possible to mint more field types and publication types for BibTeX to support those cases, but what is the publication type information used for? Are there as many different entry presentations as there are entry types? Or are the type tokens supposed to be mapped to localized human-readable label strings? It depends. For Zotero, a lot of it is about mapping to particular UI configurations for data entry and editing. But they can also be used for mapping to output styling as defined in CSL (which is loosely inspired by BibTeX's BST language, but is XML). Also, the non-flatness I see is an item being part of a compilation, which is already supported by BibTeX without allowing the whole model to generalize into a graph. Where is the generic BibTeX key to denote a containing item? There's no publication-title or
Re: [whatwg] on bibtex-in-html5
Both FOAF and vCard have unstructured personal name properties (foaf:name and v:fn) that address this. But vCard requires both N and FN, so if you only have FN, you can't get an N without a lot of dictionary-based domain knowledge and special rules. (Or you can make a GIGO N...) Hmm ... that's not how it's implemented in hCard. It is, actually. hCard requires both FN and N, but allows N to be implied by FN in some cases. http://microformats.org/wiki/hcard#Implied_.22n.22_Optimization Ted
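A simplified reading of the implied-"n" rule described on the microformats wiki page linked above: N is only inferred when FN has exactly two words and is not the same as ORG. `implied_n` is a hypothetical helper written for illustration, not code from any microformats parser; anything longer than two words returns `None`, which is precisely the case where, as noted above, dictionary knowledge would otherwise be needed.

```python
from typing import Dict, Optional

def implied_n(fn: str, org: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Infer hCard N from a two-word FN; return None when it can't be implied."""
    if org is not None and fn.strip() == org.strip():
        return None  # FN equal to ORG marks an organization, not a person
    words = fn.split()
    if len(words) != 2:
        return None  # needs an explicit N
    first, second = words
    second_bare = second.rstrip(".")
    if first.endswith(",") or len(second_bare) == 1:
        # "Family, Given" or "Family G." ordering
        return {"family-name": first.rstrip(","), "given-name": second_bare}
    return {"given-name": first, "family-name": second}
```

The rule deliberately gives up rather than guessing, so a three-word name such as "Jane Q. Doe" still needs explicit N markup.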
[whatwg] on bibtex-in-html5
Re: the recent microdata work and the subsequent effort to include BibTeX in the spec, I summarized my argument against this on my blog: http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5 I think it's fair to say that the Zotero project [1] agrees with my thoughts; Dan Cohen just said as much in an email, where he expressed concern both about the introduction of a new generic metadata-in-HTML spec alongside RDFa and about the use of BibTeX in particular. Thanks to Ian for the email conversation on this, BTW, and for taking the basic use case seriously. Bruce

PS - Brief background on me: I am a professional scholar (a social scientist) [2], and author or co-author of some relevant work in this area: the Bibliographic Ontology [3], and the Citation Style Language (CSL) [4], both of which have been collaborations with Zotero (among others). I also had a major hand in the new RDF/RDFa-based extensible metadata support in OpenDocument 1.2. So I have quite a bit of practical experience on different sides of this: both user and developer.

[1] http://zotero.org
[2] http://bruce.darcus.name
[3] http://bibliontology.com/
[4] http://xbiblio.sourceforge.net/csl/