
Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-07-07 Thread Ian Hickson

On Tue, 9 Jun 2009, Frank Hellenkamp wrote:
 
  I agree entirely. I actually tried to find a workable solution to 
  address this but unfortunately the only general solutions I could come 
  up with that would allow this were selector-based, and in practice 
  authors are still having trouble understanding how to use Selectors 
  even with CSS.
 
 At least simple selectors are well understood and a well established 
 technique on the web.

Sure, but with tables, you can't use simple selectors. Simple selectors 
(e.g. a class attribute on each cell) wouldn't be any better than 
repeating itemprop= everywhere.


 There is widespread use for it in CSS (so it is very simple to test whether 
 your selector matches the correct set of elements).

It'd actually be quite hard to test a selector layer for microdata, 
relative at least to the testing that (say) CSS gets. The thing is, with 
CSS, if there's a mistake then the worst that will happen is that the rule 
will be ignored, or will apply in some way you didn't realise, but with 
the end result being what you want.

With microdata, if you get the rules wrong, you won't really know, until 
someone tries to apply the data in some way you didn't expect, and then 
it'll fail in ways you won't know about.


 And with a selector-based approach it is far easier to add 
 metadata-information to existing content, than with the 
 metadata-proposal. So for authors it would be much easier, I think.
 
 It would work like a decentralized microformats approach (btw. it would 
 be easy to map the existing microformats to such a css-based 
 metadata-format), with the benefit that you can simply map your own 
 classes and ids to global ones like foaf, dc or hcard.
 
 And you could easily use such profiles from other pages, e.g.: Someone 
 could mark up the songs on his page the way last.fm does and then simply 
 use a copy of their meta-data profile (basically in the same way we use 
 microformats now).

A selectors-based approach would be similar to GRDDL in this respect.

I don't think this really needs support within HTML; I would encourage you 
to work on this as a stand-alone technology.


  There's also the problem with separating the data from the rules that 
  say how to interpret the data, which would likely lead to more 
  problems than the typos one would get from repeating the itemprop=s.
 
 The only real problem I see is the unfortunate fact that it is harder 
 for browser implementors to write good copy & paste code which 
 preserves all metadata from one source to another.

That's one symptom of the above, yes.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-07-01 Thread Ian Hickson

On Tue, 9 Jun 2009, Jonas Sicking wrote:
 
  Some of the improvement suggestions that I have heard that sound 
  interesting, though possibly for the next version of microdata.
 
  * Support for specifying a machine-readable value, such as for dates, 
  colors, numbers, etc.
 
  I expect we will add support for these based on demand, the same way 
  we added <time> in the first place.
 
 Using dedicated elements for each data type seems like it will 
 eventually bloat the language.

Only if people don't show restraint in extending the language.


 For example, what use would a color element or a number element serve?

It would allow conformance checkers to do type checking for the most 
commonly used types, and might allow (for number, anyway) localised 
formatting.


 If instead machine-readable values could be added using a generic 
 method, such as an 'itemvalue' or 'propvalue' attribute, each microdata 
 format could define how to interpret the values, be they numbers, dates, 
 body parts, or chemical formulas.

You can do that now with <meta itemprop=... content=...>.


  I even wonder if it would allow replacing the <time> element with a 
  standardized microformat, such as:
 
  Christmas is going down on <span item="w3c.time" 
  itemvalue="12-25-2009">The 25th day of December</span>!
 
  I don't really understand how that would be better than dedicated 
  elements.
 
 The idea would be to reduce the size of the language. I.e. if a feature 
 isn't heavily used, it might be better expressed as a microdata format. 

Well, you can do it today as:

   Christmas is going down on <meta itemprop="w3c.time" 
   content="12-25-2009">The 25th day of December!

...which (assuming that in your example you meant itemprop and not 
item, and assuming that you didn't mean the contents of the span to 
have any effect on the microdata processing model) would result in exactly 
the same name/value pair being generated into the relevant item.

On the other hand, if you really meant item=, which I guess you might 
have meant... you could do that today as:

   Christmas is going down on <span item="w3c.time"><meta itemprop="value"
   content="12-25-2009">The 25th day of December</span>!

...or some such (it doesn't matter what the textual contents of the span 
are in this example). However, this is going to result in much more 
painful structures, and you'd still need to link the item with a parent 
item (assuming there is one), as in:

   <div item="com.example.somethingorother">
Christmas is going down on <span itemprop="com.example.startdate" 
item="w3c.time"><meta itemprop="value" content="12-25-2009">The 25th 
day of December</span>!
   </div>

...which is really getting complicated compared to just:

   <div item="com.example.somethingorother">
Christmas is going down on <meta itemprop="w3c.time" 
content="12-25-2009">The 25th day of December!
   </div>

...or (preferred today):

   <div item="com.example.somethingorother">
Christmas is going down on <time itemprop="w3c.time" 
datetime="12-25-2009">The 25th day of December</time>!
   </div>
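To make the claim that these forms yield "exactly the same name/value pair" concrete, here is a minimal Python sketch built on the standard library's html.parser. It is a simplified illustration, not the draft spec's processing model. It ignores item nesting and the item= attribute entirely, and it assumes three rules: <meta> contributes its content= attribute, <time> contributes its datetime= attribute, and any other element contributes its text content. The names PropExtractor and extract_pairs are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID = {"meta", "link", "img", "br", "hr", "input"}


class PropExtractor(HTMLParser):
    """Simplified sketch: collects (itemprop, value) pairs, ignoring item nesting."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # stack of [tag, itemprop-or-None, text parts]

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = a.get("itemprop")
        if tag == "meta":
            # The machine-readable value lives in content=; there is no text.
            if name is not None:
                self.pairs.append((name, a.get("content") or ""))
            return
        if tag == "time" and name is not None and a.get("datetime") is not None:
            # The machine-readable value lives in datetime=; the human text is ignored.
            self.pairs.append((name, a["datetime"]))
            name = None
        if tag not in VOID:
            self._open.append([tag, name, []])

    def handle_data(self, data):
        # Text counts towards the text content of every open element.
        for entry in self._open:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if all(entry[0] != tag for entry in self._open):
            return  # stray end tag, e.g. the implicit one for <meta ... />
        while self._open:
            open_tag, name, parts = self._open.pop()
            if name is not None:
                self.pairs.append((name, "".join(parts)))
            if open_tag == tag:
                break


def extract_pairs(html: str) -> list:
    parser = PropExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


with_meta = ('<div item="com.example.somethingorother">Christmas is going down on '
             '<meta itemprop="w3c.time" content="12-25-2009">The 25th day of December!</div>')
with_time = ('<div item="com.example.somethingorother">Christmas is going down on '
             '<time itemprop="w3c.time" datetime="12-25-2009">The 25th day of December</time>!</div>')
print(extract_pairs(with_meta))  # [('w3c.time', '12-25-2009')]
print(extract_pairs(with_time))  # [('w3c.time', '12-25-2009')]
```

The same sketch also handles the later pattern of nesting a <meta> inside a <span> to split the machine value from the visible text. Both `<span itemprop="x">y</span>` and `<span><meta itemprop="x" content="y">z</span>` come out as the pair ('x', 'y').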


 For example, why didn't you add elements for bibtex or vCard, but 
 instead used microdata?

New elements didn't really fit the use cases as well.


 Another reason is as a test of the microdata feature itself. Microdata 
 is a sort of extension mechanism to HTML 5. In software development, it 
 is common to test your extension system by developing parts of the 
 product using the extension system. This way you can both keep the core 
 code small, and you get a good test bed for your extension system.

Indeed.


 You have already done this with the predefined vocabularies

Right.


 and apparently the lack of ability to define a machine-readable value 
 separate from the human-readable one was not a problem. However, it would 
 seem that the same does not hold true for <time>.

Right, that's why I adapted <time> into the microdata model.


  * Support for tabular data.
 
  This would be nice if we can find a way to do it that doesn't put 
  undue burdens on simple implementations. (e.g. I would imagine that 
  while a microdata implementation today can be a few hundred lines 
  total, adding support for the table model could easily double that.)

 Quite possibly.
 
 In both these cases I'm perfectly happy to hold off on adding more 
 features to microdata for now and see if what we have is successful, 
 before we start over-engineering it to cover every imaginable case.

Agreed.


Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-06-09 Thread Ian Hickson

On Mon, 11 May 2009, Simon Pieters wrote:
 On Sun, 10 May 2009 12:32:34 +0200, Ian Hickson i...@hixie.ch wrote:
 
 Page 3:
 <h2>My Cats</h2>
 <dl>
  <dt>Schr&ouml;dinger
  <dd item="com.damowmow.cat">
   <meta property="com.damowmow.name" content="Schr&ouml;dinger">
   <meta property="com.damowmow.age" content="9">
   <p property="com.damowmow.desc">Orange male.
  <dt>Erwin
  <dd item="com.damowmow.cat">
   <meta property="com.damowmow.name" content="Lord Erwin">
   <meta property="com.damowmow.age" content="3">
   <p property="com.damowmow.desc">Siamese color-point.
   <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg">
 </dl>
 
 Given the microdata solution and this example, there is now a reason other
 than styling to introduce <di>, since here you duplicate the <dt> information
 in <meta>.
 
   <dl>
<di item="com.damowmow.cat">
 <dt property="com.damowmow.name">Schr&ouml;dinger
 <dd>
  <meta property="com.damowmow.age" content="9">
  <p property="com.damowmow.desc">Orange male.
</di>
...
 
 The styling problem is discussed at
 http://forums.whatwg.org/viewtopic.php?t=47

Yeah, I noticed that. I agree that if it turns out that this is a common 
authoring pattern (and assuming we can work around the difficulties in 
adjusting the parser to handle this), we should probably introduce <di> 
after all. I intend to wait and see what happens first though.


On Mon, 11 May 2009, Giovanni Gentili wrote:
 Ian Hickson:
  USE CASE: Annotate structured data that HTML has no semantics for, and
  which nobody has annotated before, and may never again, for private use or
  use in a small self-contained community.
  (..)
  SCENARIOS:
 
 Among the scenarios, this case should also be considered:
 
 * a user (or group of users) wants to annotate
 items present on a generic web page with
 additional properties in a certain vocabulary.
 For example, Joe wants to gather in a blog
 a series of personal annotations on movies
 (or other types of items) present on imdb.com.

This isn't really a use case, it's a solution. What is the end-user 
scenario that the author is trying to address? For example, what kind of 
software will collect this information? What problem are we solving?


 a) For properties specified on an element that has no ancestor with 
 an item attribute, should the corresponding item be the 
 document? (i.e. the body element with an implicit item attribute).

We already have mechanisms for providing name-value pairs for a document; 
namely, <meta name> and <link rel>.
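As a small illustration of those existing document-level mechanisms, here is a Python sketch using the standard library's html.parser. It gathers name/value pairs from <meta name content> and <link rel href>. The class name DocMetadata and the sample values are hypothetical, and the sketch ignores everything else a real consumer would have to handle, such as http-equiv and relative URL resolution.

```python
from html.parser import HTMLParser


class DocMetadata(HTMLParser):
    """Collects document-level (name, value) pairs from <meta> and <link>."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") is not None:
            self.pairs.append((a["name"], a.get("content") or ""))
        elif tag == "link" and a.get("rel") is not None:
            # rel= holds a space-separated list of link types.
            for rel in a["rel"].split():
                self.pairs.append((rel, a.get("href") or ""))


head = ('<meta name="author" content="Joe">'
        '<link rel="license" href="http://example.com/license">')
doc = DocMetadata()
doc.feed(head)
doc.close()
print(doc.pairs)  # [('author', 'Joe'), ('license', 'http://example.com/license')]
```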


 b) Do we need to require UAs to offer a standard way to visualize (at 
 least as an option left to the user) the structured information carried 
 in microdata?

Not as far as I can tell; what use case would this be for?


 And copy & paste?

The spec already requires user agents to include microdata in copy and 
paste.


On Tue, 12 May 2009, Tim Tepaße wrote:
  
  (Note the <meta>s in the last example -- since sometimes the 
  information isn't visible, rather than requiring that people put it in 
  and hide it with display:none, which has a rather poor accessibility 
  story, I figured we could just allow <meta> anywhere, if it has a 
  property= attribute.)
 
 That seems to be a solution optimised for extremely invisible metadata 
 but not for metadata which differs from the human visible data.

It handles both -- instead of:

   <span itemprop="x">y</span>

...you can do:

   <span><meta itemprop="x" content="y">z</span>


 Imagine as an example the simple act of marking up a number (and 
 ignoring what the number denotes). For human consumption a thousands 
 separator is often used; the type of separator differs by language, 
 locale and context. Just in my little world I regularly see the 
 point, the comma, the space, the thin space and sometimes the 
 apostrophe. Parsing different representations of numbers would be a 
 chore. The value of textContent of the element <span 
 itemprop="com.example.price">€&nbsp;1&thinsp;000&thinsp;000,&mdash;</span> 
 is clearly unusable, demanding an additional invisible <meta 
 property="com.example.price" content="100">.
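A short Python sketch shows why parsing human-formatted numbers is "a chore". The helper naive_parse is hypothetical. It keeps only ASCII digits plus whichever decimal separator the caller assumes, and discards everything else. The correct result therefore depends on knowing the locale, which the text content alone does not reveal. The test string below is assumed to match the price in the example, starting with a euro sign.

```python
def naive_parse(text: str, decimal_sep: str) -> float:
    # Hypothetical helper: keep ASCII digits and the *assumed* decimal separator,
    # drop everything else (currency sign, no-break/thin spaces, apostrophes,
    # dashes), then parse. Raises ValueError if what remains is not a number.
    kept = "".join(ch for ch in text if ch in "0123456789" or ch == decimal_sep)
    return float(kept.replace(decimal_sep, "."))


# The same visible string means different numbers under different conventions.
print(naive_parse("1.000", decimal_sep="."))  # 1.0    (English reading)
print(naive_parse("1.000", decimal_sep=","))  # 1000.0 (German reading)
print(naive_parse("1'000'000.50", decimal_sep="."))  # 1000000.5

# Euro sign, no-break space, thin spaces as grouping, and ",—" for "no cents".
visible = "\u20ac\u00a01\u2009000\u2009000,\u2014"
print(naive_parse(visible, decimal_sep=","))  # 1000000.0
```

A content attribute in one fixed, locale-independent format avoids this guesswork entirely. That is the motivation for the separate machine-readable value, whichever element ends up carrying it.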

Right.


 My irritation lies in the element proliferation, requiring one element/ 
 attribute combination for machines and one element/text content combination 
 for humans. Of course, any sane author would arrange both elements in a 
 close relation, as parent/child or siblings, but there would still be two 
 different elements to maintain, leading to a higher cognitive load. Not 
 just for authors but also for programmers: a fluctuating price would have to 
 be updated on two different elements; tree-walking DOM scripts would have to 
 take <meta> elements into account. Furthermore it clashes with the familiar 
 pattern of other elements in HTML. A hyperlink is one element with a 
 machine-readable attribute and human-readable text content. A citation 
 is one element with a machine-readable reference and human-readable text 
 content. The same model is used in <meter>, <progress>, <time>, <abbr> 
 ... but not in user-defined

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-06-09 Thread Jonas Sicking

 Some of the improvement suggestions that I have heard that sound
 interesting, though possibly for the next version of microdata.

 * Support for specifying a machine-readable value, such as for dates,
 colors, numbers, etc.

 I expect we will add support for these based on demand, the same way we
 added <time> in the first place.

Using dedicated elements for each data type seems like it will
eventually bloat the language. For example, what use would a color
element or a number element serve? If instead machine-readable values
could be added using a generic method, such as an 'itemvalue' or
'propvalue' attribute, each microdata format could define how to
interpret the values, be they numbers, dates, body parts, or chemical
formulas.

 I even wonder if it would allow replacing the <time> element with a
 standardized microformat, such as:

 Christmas is going down on <span item="w3c.time"
 itemvalue="12-25-2009">The 25th day of December</span>!

 I don't really understand how that would be better than dedicated
 elements.

The idea would be to reduce the size of the language. I.e. if a
feature isn't heavily used, it might be better expressed as a
microdata format. For example, why didn't you add elements for bibtex
or vCard, but instead used microdata?

However, it's quite possible that <time> is going to be commonly used
enough that it's worth using an element rather than a microdata
format.

Another reason is as a test of the microdata feature itself. Microdata
is a sort of extension mechanism to HTML 5. In software development,
it is common to test your extension system by developing parts of the
product using the extension system. This way you can both keep the
core code small, and you get a good test bed for your extension
system.

You have already done this with the predefined vocabularies, and
apparently the lack of ability to define a machine-readable value
separate from the human-readable one was not a problem. However, it
would seem that the same does not hold true for <time>.

 * Support for tabular data.

 This would be nice if we can find a way to do it that doesn't put undue
 burdens on simple implementations. (e.g. I would imagine that while a
 microdata implementation today can be a few hundred lines total, adding
 support for the table model could easily double that.)

Quite possibly.

In both these cases I'm perfectly happy to hold off on adding more
features to microdata for now and see if what we have is successful,
before we start over-engineering it to cover every imaginable case.

/ Jonas

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-06-09 Thread Kristof Zelechovski

The problem of W3C DTD DDoS does not apply to CURIE because software
processing RDF does not need to retrieve the resources referenced on a
regular basis.  Even in the case of DTD, the problem is that some software
does not cache, not that some software tries to access it.
IMHO,
Chris

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-06-09 Thread Frank Hellenkamp

Ian Hickson wrote:
 I agree entirely. I actually tried to find a workable solution to address 
 this but unfortunately the only general solutions I could come up with 
 that would allow this were selector-based, and in practice authors are 
 still having trouble understanding how to use Selectors even with CSS. 
 There's also the problem with separating the data from the rules that say 
 how to interpret the data, which would likely lead to more problems than 
 the typos one would get from repeating the itemprop=s.

I am sorry, but I cannot agree on this one.

At least simple selectors are well understood and a well established
technique on the web.

There is widespread use for it in CSS (so it is very simple to test whether
your selector matches the correct set of elements).

And the fact that jquery is *so* successful is based on jquery's
capability to work with selectors in such an easy way – not the other
way around.

And with a selector-based approach it is far easier to add
metadata-information to existing content, than with the
metadata-proposal. So for authors it would be much easier, I think.

It would work like a decentralized microformats approach (btw. it would
be easy to map the existing microformats to such a css-based
metadata-format), with the benefit that you can simply map your own
classes and ids to global ones like foaf, dc or hcard.

And you could easily use such profiles from other pages, e.g.:
Someone could mark up the songs on his page the way last.fm does and
then simply use a copy of their meta-data profile (basically in the same
way we use microformats now).
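The proposed "profile" idea can be sketched in Python. A profile is a mapping from an author's own class names to global property names, applied to unmodified markup. This is only a guess at how such a scheme could behave, not an existing specification. The profile format, the property names under com.example, and the helper names are all hypothetical, and only single-class selectors are supported. A real selector engine would need far more.

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "img", "br", "hr", "input"}


class ProfileExtractor(HTMLParser):
    """Emits (global_property, text) for elements whose class list contains
    a class named in the profile (single-class selectors only)."""

    def __init__(self, profile):
        super().__init__()
        self.profile = profile
        self.pairs = []
        self._stack = []  # (tag, mapped properties, text parts)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        props = [self.profile[c] for c in classes if c in self.profile]
        self._stack.append((tag, props, []))

    def handle_data(self, data):
        for _tag, _props, parts in self._stack:
            parts.append(data)

    def handle_endtag(self, tag):
        if all(t != tag for t, _p, _parts in self._stack):
            return  # stray end tag
        while self._stack:
            t, props, parts = self._stack.pop()
            text = " ".join("".join(parts).split())  # collapse whitespace
            self.pairs.extend((p, text) for p in props)
            if t == tag:
                break


def apply_profile(html, profile):
    parser = ProfileExtractor(profile)
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical profile: local class names -> global property names.
profile = {"title": "com.example.title", "artist": "com.example.artist"}
page = ('<li class="song"><span class="title">Song A</span> by '
        '<span class="artist">Band B</span></li>')
print(apply_profile(page, profile))
# [('com.example.title', 'Song A'), ('com.example.artist', 'Band B')]
```

The markup itself carries no metadata here. All interpretation lives in the separate profile, which is what makes reuse across sites easy. It is also the separation of data from interpretation rules that the reply above warns about.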


The only real problem I see is the unfortunate fact, that it is harder
for browser-implementors to write a good copy  paste code which
preserves all metadata from one source to another.


Best regards

Frank

-- 
frank hellenkamp | interface designer
solmsstraße 7 | 10961 berlin

+49.30.49 78 20 70 | tel
+49.173.70 55 781 | mbl
+49.3212.100 35 22 | fax
jo...@depagecms.net

http://www.depagecms.net
http://immerdasgleiche.de
http://everydayisexactlythesame.net/





Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Henri Sivonen


On May 14, 2009, at 23:52, Eduard Pascual wrote:

On Thu, May 14, 2009 at 3:54 PM, Philip Taylor excors+wha...@gmail.com 
 wrote:

It doesn't matter one syntax or another. But if a syntax already
exists (RDFa), building a new syntax should be properly justified.


It was at the start of this thread:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html


As of now, the only supposed benefit I have heard of for this syntax is
that it avoids CURIEs... yet it replaces them with reversed domains??
Is that a benefit?


There's no indirection. A decade of Namespaces in XML shows that both  
authors and implementors have trouble getting prefix-based indirection  
right.


(If we were limited to reasoning about something that we don't have  
experience with yet, I might believe that people can't be too inept to  
use prefix-based indirection. However, a decade of actual evidence  
shows that actual behavior defies reasoning here and prefix-based  
indirection is something that both authors and implementors get wrong  
over and over again.)



I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll agree
that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.


Problems shared by CURIEs, URIs and reverse DNS names:
 * Long.
 * Identifiers outlive organization charts.

Problems that reverse DNS names don't have but CURIEs and URIs do have:
 * "http://": 7 characters of extra length.
 * Affordance of dereferenceability when mere identifier semantics are meant.


Problems that reverse DNS names and URIs don't have but CURIEs have:
 * Prefix-based indirection.
 * Violation of the DOM Consistency Design Principle if xmlns:foo is used.
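The "prefix-based indirection" item can be illustrated with a simplified Python sketch of CURIE expansion. The meaning of an attribute value such as foaf:name depends on a prefix mapping declared elsewhere in the document, so the same value can fail or change meaning when copied into another page. A reverse-DNS name is used verbatim. This sketch is an assumption-laden simplification: actual RDFa CURIE processing has further rules (safe CURIEs in brackets, blank nodes, default prefixes) that are omitted here, and expand_curie is a hypothetical helper.

```python
def expand_curie(curie: str, prefixes: dict) -> str:
    # Simplified: 'prefix:reference' -> namespace URI + reference, where the
    # namespace comes from a mapping declared elsewhere (e.g. xmlns:foaf=...).
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + reference


# Page A declares the prefix somewhere up the tree.
page_a = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(expand_curie("foaf:name", page_a))  # http://xmlns.com/foaf/0.1/name

# The same attribute value copied into page B, which declares no prefixes.
try:
    expand_curie("foaf:name", {})
except KeyError as err:
    print("cannot expand:", err)

# A reverse-DNS name needs no lookup table: the string is the identifier.
reverse_dns = "com.example.price"
```

If page B instead bound foaf to some other namespace, the expansion would silently succeed with a different meaning. That is harder to notice than the explicit failure shown here.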

(I understand that if the microdata syntax offered no advantages
over RDFa, then it would be a wasted effort to diverge.

Which are the advantages it offers?


The syntax is simpler for the use cases it was designed for. It uses a  
simpler conceptual model (trees as opposed to graphs). It allows short  
token identifiers. It doesn't use prefix-based indirection. It doesn't  
violate the DOM Consistency Design Principle.


On May 15, 2009, at 14:11, Eduard Pascual wrote:

On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak m...@apple.com  
wrote:

[...]
From my cursory study, I think microdata could subsume many of the
use cases of both microformats and RDFa.

Maybe. But microformats and RDFa can handle *all* of these cases.
Again, which are the benefits of creating something entirely new to
replace what already exists while it can't even handle all the cases
of what it is replacing?


Compared to microformats, microdata defines the processing model and  
conformance criteria. The microformats community has failed to provide  
processing model and conformance criteria on similar level of detail.  
The processing model side is perceived to be such a serious issue that  
the lack of a unified microformats parsing spec is cited as a  
motivation to use RDFa instead of microformats.


It seems to me that it avoids much of what microformats advocates  
find objectionable

Could you specify, please? Do you mean anything else than WHATWG's
almost irrational hate toward CURIEs and everything that involves
prefixes?


RDFa uses a data model that is an overkill for the use cases.


but at the same time it seems it can represent a full RDF data
model.

No, it *can't* represent a full RDF model: it has already been shown
several times on this thread.


That's a feature.


Wait. Are you referring to microdata as an incremental improvement over
RDFa?? IMO, it's rather a decremental enworsement.


That depends on the point of view. I'm sensing two major points of view:

1) Graphs are more general than trees. Hence, being able to serialize  
graphs is better.


2) Graphs are more general than trees. Hence, graphs are harder to  
design UIs for, harder to traverse and harder for authors to grasp.  
Hence, if trees are enough to address use cases, we should only enable  
trees to be serialized.


I subscribe to view #2, and it seems that trees are indeed enough for  
the use cases (that were stipulated by the pro-graph people!).
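The traversal part of view #2 can be made concrete with a small Python sketch. In this hypothetical representation, each item is a dict of property lists and nested items are dict values. Items that nest strictly as a tree can be walked with plain recursion. An RDF-style graph, by contrast, may contain cycles, so any traversal has to track visited nodes or it would never terminate. The vocabulary names other than com.damowmow.* are hypothetical.

```python
def count_props(item: dict) -> int:
    """Tree-shaped items: plain recursion always terminates."""
    total = 0
    for values in item["props"].values():
        for value in values:
            total += 1
            if isinstance(value, dict):  # a nested item
                total += count_props(value)
    return total


def reachable(edges: dict, start: str) -> set:
    """Graph-shaped data may contain cycles, so remember visited nodes."""
    seen, todo = set(), [start]
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        todo.extend(edges.get(node, []))
    return seen


erwin = {"type": "com.damowmow.cat",
         "props": {"com.damowmow.name": ["Lord Erwin"], "com.damowmow.age": ["3"]}}
owner = {"type": "com.example.person",  # hypothetical vocabulary
         "props": {"com.example.name": ["Owner"], "com.example.pet": [erwin]}}
print(count_props(owner))  # 4

knows = {"alice": ["bob"], "bob": ["alice"]}  # a cycle, which nesting cannot express
print(sorted(reachable(knows, "alice")))  # ['alice', 'bob']
```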



- Microdata can't represent the full RDF data model (while RDFa can):
some complex structures are just not expressible with microdata.


That's not a use case. That's theoretical purity.


- Microdata relies on reversed domains. While some people argue these
to be better than CURIEs, they are equally horrendous for the average
user, and have the additional disadvantage that they don't map to
anything useful (if they map to something at all), while CURIEs map to
the descriptions and/or definitions of what they represent.


I consider it an advantage that reverse domains don't suggest that you  
should try dereferencing identifiers as if they were addresses.


--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Julian Reschke


Henri Sivonen wrote:
There's no indirection. A decade of Namespaces in XML shows that both 
authors and implementors have trouble getting prefix-based indirection 
right.


It's true that people get this wrong again and again. But it's also true 
that lots of developers understand it once and for all, and then 
consistently get it right.


The interesting question here is whether there's a better system.


I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll agree
that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.


Problems shared by CURIEs, URIs and reverse DNS names:
 * Long.
 * Identifiers outlive organization charts.


That depends on the choice of the URI scheme.


Problems that reverse DNS names don't have but CURIEs and URIs do have:
 * "http://": 7 characters of extra length.
 * Affordance of dereferenceability when mere identifier semantics are meant.


Again, that depends on the URI scheme.


Problems that reverse DNS names and URIs don't have but CURIEs have:
 * Prefix-based indirection.


HTML developers regularly have to deal with a much more complicated 
indirection mechanism (CSS).



 * Violation of the DOM Consistency Design Principle if xmlns:foo is used.


I think there is consensus that this is a drawback, but not about how 
significant this is.


The syntax is simpler for the use cases it was designed for. It uses a 
simpler conceptual model (trees as opposed to graphs). It allows short 
token identifiers. It doesn't use prefix-based indirection. It doesn't 
violate the DOM Consistency Design Principle.


(devil's advocate argument) - so how does the syntax behave for those 
use cases it *hasn't* been designed for?


Compared to microformats, microdata defines the processing model and 
conformance criteria. The microformats community has failed to provide 
processing model and conformance criteria on similar level of detail. 


Indeed.

The processing model side is perceived to be such a serious issue that 
the lack of a unified microformats parsing spec is cited as a motivation 
to use RDFa instead of microformats.


Indeed.


RDFa uses a data model that is an overkill for the use cases.


It would be interesting to understand which use cases that RDFa can do 
are not supported by microdata (I don't understand enough about the 
subject to try myself), and whether the potential advantage of having a 
simpler model outweighs the disadvantage of not using network effects 
and creating a competing syntax.



...


BR, Julian

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Henri Sivonen


On May 18, 2009, at 12:18, Julian Reschke wrote:


Henri Sivonen wrote:
There's no indirection. A decade of Namespaces in XML shows that  
both authors and implementors have trouble getting prefix-based  
indirection right.


It's true that people get this wrong again and again. But it's also  
true that lots of developers understand it once for all, and then  
consistently get it right.


The interesting question here is whether there's a better system.


 1) Centralized allocation of short names.
 2) Prefixing a short name by (an abbreviation of) the name of the  
vocabulary, which makes the probability of collision negligible once  
the designer has googled to check the probable absence of public  
collisions at minting time (e.g. openid.delegate).



I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll agree
that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.

Problems shared by CURIEs, URIs and reverse DNS names:
* Long.
* Identifiers outlive organization charts.


That depends on the choice of the URI scheme.


I guess one could use e.g. data:,foo URIs as a namespace URI, but  
why not just use foo?



Problems that reverse DNS names and URIs don't have but CURIEs have:
* Prefix-based indirection.


HTML developers regularly have to deal with a much more complicated  
indirection mechanism (CSS).


This would be a persuasive argument if we were reasoning about a  
feature we don't have experience with yet. However, experience shows  
prefix-based indirection is too hard. If at the same time CSS isn't  
too hard, I just have to accept the evidence from the real world even  
if it defies reasoning.


The syntax is simpler for the use cases it was designed for. It  
uses a simpler conceptual model (trees as opposed to graphs). It  
allows short token identifiers. It doesn't use prefix-based  
indirection. It doesn't violate the DOM Consistency Design Principle.


(devil's advocate argument) - so how does the syntax behave for  
those use cases it *hasn't* been designed for?


That's hard to test, because the use case search has been exhausted  
for the moment. It seems we'd need to wait for new use cases to pop  
up.



RDFa uses a data model that is an overkill for the use cases.


It would be interesting to understand which use cases that RDFa can  
do are not supported by microdata (I don't understand enough about  
the subject to try myself), and whether the potential advantage of  
having a simpler model outweighs the disadvantage of not using  
network effects and creating a competing syntax.


Are there use cases of RDFa that are currently known but that the call  
for use cases didn't turn up?


Either @prefix or RDFa-profiles would break the network effects of the  
deployment of outside-of-REC RDFa-in-XHTML-as-text/html, so if  
breaking network effects is on the table in the form of @prefix and  
RDFa-profiles, I don't see why microdata wouldn't be on the table as  
far as network effects go.



Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Julian Reschke


Henri Sivonen wrote:

The interesting question here is whether there's a better system.


 1) Centralized allocation of short names.


Sounds like urn: to me. Registry is defined in RFC 3406.

 2) Prefixing a short name by (an abbreviation of) the name of the 
vocabulary, which makes the probability of collision negligible once the 
designer has googled to check the probable absence of public collisions 
at minting time (e.g. openid.delegate).


Too fragile for disambiguation for my taste.


That depends on the choice of the URI scheme.


I guess one could use e.g. data:,foo URIs as a namespace URI, but why 
not just use foo?



URIs give you the choice of having something easily referenceable (if you 
want), or not.



Problems that reverse DNS names and URIs don't have but CURIEs have:
* Prefix-based indirection.


HTML developers regularly have to deal with a much more complicated 
indirection mechanism (CSS).


This would be a persuasive argument if we were reasoning about a feature 
we don't have experience with yet. However, experience shows 
prefix-based indirection is too hard. If at the same time CSS isn't too 
hard, I just have to accept the evidence from the real world even if it 
defies reasoning.


No, I don't think we have evidence that prefix-based indirection is too 
hard. There are way too many people getting it right.



...
Either @prefix or RDFa-profiles would break the network effects of the 
deployment of outside-of-REC RDFa-in-XHTML-as-text/html, so if breaking 
network effects is on the table in the form of @prefix and 
RDFa-profiles, I don't see why microdata wouldn't be on the table as far 
as network effects go.


Introducing @prefix will be much simpler to deploy than introducing a 
completely different system.


That being said, I do agree that the current situation is a mess, and 
that the RDFa-in-XHTML spec has created it.


Given the current situation, the simplest possible solution probably is 
to live with it, and use xmlns declarations in HTML for the purpose of 
RDFa as well.


BR, Julian

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Eduard Pascual

On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen hsivo...@iki.fi wrote:
 On May 14, 2009, at 23:52, Eduard Pascual wrote:

 On Thu, May 14, 2009 at 3:54 PM, Philip Taylor excors+wha...@gmail.com
 wrote:
 It doesn't matter one syntax or another. But if a syntax already
 exists (RDFa), building a new syntax should be properly justified.

 It was at the start of this thread:
 http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html
Ian's initial message goes step by step through the creation of this
new syntax, but it does *not* mention at all *why* it was being created
in the first place. The insight into the choices made is indeed a
good thing, and I thank Ian for it; but he omitted to provide insight
into the first choice made: discarding the multiple options already
available (not only Microformats and RDFa, but also other less
discussed ones such as eRDF, EASE, etc). Sure, there has been a lot of
discussion on this topic, and it's possible that the choice was made
as part of such discussions. In any case, I think Ian should have
clearly stated the reasons to build a brand new solution when many
others have been out for a while and users have been able to try and
test them.
Please keep in mind that I'm not criticizing the choice itself (at
least, not now), but the lack of information and reasoning behind that
choice.

 As
 of now, the only supposed benefit I have heard of for this syntax is
 that it avoids CURIEs... yet it replaces them with reversed domains??
 Is that a benefit?

 There's no indirection. A decade of Namespaces in XML shows that both
 authors and implementors have trouble getting prefix-based indirection
 right.
Really? I haven't seen any hint of that. Sure, there will be some
people who have trouble understanding namespaces, just as there are
some people who have trouble understanding why something like
<tr><td>foo</td><td>bar</tr></td> is wrong.
Please, could you quote a source for that claim? I could also claim
something like "fifteen years of Java show that reversed domains are
error-prone and harmful", and even argue about it; but arguments of
this kind, without a serious analysis or study to back them, are
completely meaningless and definitely subjective.

 (If we were limited to reasoning about something that we don't have
 experience with yet, I might believe that people can't be too inept to use
 prefix-based indirection. However, a decade of actual evidence shows that
 actual behavior defies reasoning here and prefix-based indirection is
 something that both authors and implementors get wrong over and over again.)
Curious: you refer to a decade of actual evidence, but you fail to
refer to any actual evidence. I'm eager to see that evidence; could
you share it with us? Thank you.

 I have been a Java programmer for some years, and
 still find that convention absurd, horrible, and annoying. I'll agree
 that CURIEs are ugly, and maybe hard to understand, but reversed
 domains are equally ugly and hard to understand.

 Problems shared by CURIEs, URIs and reverse DNS names:
  * Long.
  * Identifiers outlive organization charts.
Ehm. CURIEs ain't really long: the main point of prefixes is to make
them as short as reasonably possible.
Good identifiers outlive bad organization charts. Good organization
outlives bad identifiers. Good organization and good identifiers tend
to outlive the context they are used in.

 Problems that reverse DNS names don't have but CURIEs and URIs do have:
  * http://; 7 characters of even extra length.
  * Affordance of dereferenceability when mere identifier semantics are meant.
A CURIE (at least as typed by an author) doesn't have the http://:
it is a prefix, a colon, and whatever goes after it. Once resolved
(i.e. after replacing the prefix and colon with what the prefix
represents), what you get is no longer a CURIE, but a URI like the ones
you'd type in your browser or inside a link's href attribute.
Dereferenceability is not a problem in itself: having more than what is
strictly needed can be either irrelevant or an advantage, not a
problem. Of course, it *may* be the cause of some actual problem, but
in that case you should rather describe the problem itself, so it can
be evaluated.
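The resolution step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `expand_curie` and `PREFIXES` are hypothetical names, the `d` prefix and `http://damowmow.com/` namespace are borrowed from Ian's RDFa example, and the finer points of the CURIE syntax (safe CURIEs in square brackets, default prefixes) are ignored.

```python
# Hypothetical prefix map, corresponding to xmlns:d="http://damowmow.com/"
# in Ian's RDFa example.
PREFIXES = {"d": "http://damowmow.com/"}


def expand_curie(curie: str, prefixes: dict) -> str:
    """Resolve a CURIE like 'd:name' by replacing 'prefix:' with its URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefixes:
        # The declaration is missing, e.g. the markup was copied without
        # the element that carried xmlns:d, so nothing can be resolved.
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + reference


print(expand_curie("d:name", PREFIXES))  # http://damowmow.com/name
```

The dictionary lookup is the step Henri calls prefix-based indirection: the same `d:name` string fails to resolve when the declaration is absent, whereas a reverse-DNS name or a full URI carries its whole value in place.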

 Problems that reverse DNS names and URIs don't have but CURIEs have:
  * Prefix-based indirection.
Indirection can't be taken as a problem when most currently used RDFa
tools don't use it at all (which proves that they can work without
relying on it). Sure, it's not as big an advantage as some may claim
it to be. But the ability to use indirection, even if not 100%
guaranteed to work, is an actual advantage. As a real-world
example, I have been able to learn about vocabularies I didn't know by
following the links on prefix declarations in documents using them.
  * Violation of the DOM Consistency Design Principle if xmlns:foo used.
*if* xmlns:foo is used. Very strong emphasis on the conditional, and
on the multiple possibilities that have already been proposed to deal

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Maciej Stachowiak



On May 18, 2009, at 6:05 AM, Eduard Pascual wrote:

On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen hsivo...@iki.fi  
wrote:

On May 14, 2009, at 23:52, Eduard Pascual wrote:

On Thu, May 14, 2009 at 3:54 PM, Philip Taylor excors+wha...@gmail.com
wrote:
It doesn't matter one syntax or another. But if a syntax already
exists (RDFa), building a new syntax should be properly justified.


It was at the start of this thread:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019681.html

Ian's initial message goes step by step through the creation of this
new syntax, but does *not* mention at all *why* it was being created
in the first place. The insight into the choices taken is indeed a
good thing, and I thank Ian for it; but he omitted to provide insight
into the first choice taken: discarding the multiple options already
available (not only Microformats and RDFa, but also other less
discussed ones such as eRDF, EASE, etc).



I think Ian did explain why he discarded RDFa as an option.

In the email linked above, Ian Hickson wrote:

Another solution we could consider is RDFa:

 <section typeof="d:cat" xmlns:d="http://damowmow.com/">
  <h1 property="d:name">Hedral</h1>
  <p property="d:desc">Hedral is a male american domestic shorthair,
  with a fluffy black fur with white paws and belly.</p>
  <img src="hedral.jpeg" alt="" title="Hedral, age 18 months"
  class="photo" rel="d:img">
 </section>

This unfortunately also has a number of problems.

 - it uses prefixes, which most authors simply do not understand, and
   which many implementors end up getting wrong (e.g. SearchMonkey
   hard-coded certain prefixes in its first implementation, Google's
   handling of RDF blocks for license declarations is all done with
   regular expressions instead of actually parsing the namespaces,
   etc).

   Even if implemented right, namespaces still lead to flaky
   copy-and-paste behaviour.

 - it sometimes uses rel= and sometimes uses property= and it's
   hard to know when to use one or the other.

 - it introduces much more power than is necessary to solve this
   problem.



I believe Microformats were discarded as a solution because the  
proposed use case was as follows:


USE CASE: Annotate structured data that HTML has no semantics for,  
and which nobody has annotated before, and may never again, for  
private use or use in a small self-contained community.


But Microformats are only intended for widely used and generally  
agreed upon public vocabularies. The Microformats process is not  
applicable to private-use/small-community vocabularies. And  
Microformats define specific vocabularies, not a general way to add  
new kinds of semantic markup. I expect Microformats experts would  
agree with this assessment.



So I think it is clear why neither Microformats nor RDFa were seen as  
suitable solutions to the use case, even if the matter was addressed  
somewhat briefly.



Regards,
Maciej

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-18 Thread Henri Sivonen

On May 18, 2009, at 16:05, Eduard Pascual wrote:

On Mon, May 18, 2009 at 10:38 AM, Henri Sivonen hsivo...@iki.fi
wrote:

(If we were limited to reasoning about something that we don't have
experience with yet, I might believe that people can't be too inept
to use prefix-based indirection. However, a decade of actual evidence
shows that actual behavior defies reasoning here and prefix-based
indirection is something that both authors and implementors get wrong
over and over again.)

Curious: you refer to a decade of actual evidence, but you fail to
refer to any actual evidence. I'm eager to see that evidence; could
you share it with us? Thank you.

I thought everyone had seen the confusion. There are pointers at
http://wiki.whatwg.org/wiki/Namespace_confusion
The wiki page is less than a decade old, so its length isn't quite
that impressive.

I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll agree
that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.

Problems shared by CURIEs, URIs and reverse DNS names:
* Long.
* Identifiers outlive organization charts.

Ehm. CURIEs ain't really long: the main point of prefixes is to make
them as short as reasonably possible.

You need to consider the length of the prefix declarations, too.

Problems that reverse DNS names and URIs don't have but CURIEs have:
* Prefix-based indirection.

Indirection can't be taken as a problem when most currently used RDFa
tools don't use it at all (which proves that they can work without
relying on it).

What do you mean? Current RDFa tools don't use prefixes?

(I understand that if the microdata syntax offered no advantages over
RDFa, then it would be a wasted effort to diverge.

Which are the advantages it offers?

The syntax is simpler for the use cases it was designed for. It uses a
simpler conceptual model (trees as opposed to graphs). It allows short
token identifiers. It doesn't use prefix-based indirection. It doesn't
violate the DOM Consistency Design Principle.

Ok, the syntax is simpler for a subset of the use cases; but it leaves
out the rest of the use cases entirely.

What are the rest of the use cases? Why weren't they put forward when
Hixie asked for use cases?

The DOM Consistency again is not an advantage of the microdata syntax
because this could have been fulfilled with other syntaxes as well.

It's an advantage over RDFa-in-XHTML-served-as-text/html. It's not an
advantage over microformats or may not be an advantage over a
speculative yet undefined variation of RDFa.

It seems to me that it avoids much of what microformats advocates
find objectionable

Could you specify, please? Do you mean anything else than WHATWG's
almost irrational hate toward CURIEs and everything that involves
prefixes?

RDFa uses a data model that is an overkill for the use cases.

Which use cases?

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-April/019374.html

No, it *can't* represent a full RDF model: it has already been shown
several times on this thread.

That's a feature.

What?? Being unable to deal with all the use cases is a feature??

Being simpler while addressing all the use cases is a feature.

Wait. Are you referring to microdata as an incremental improvement
over RDFa?? IMO, it's rather a decremental enworsement.

That depends on the point of view. I'm sensing two major points of
view:

1) Graphs are more general than trees. Hence, being able to serialize
graphs is better.

2) Graphs are more general than trees. Hence, graphs are harder to
design UIs for, harder to traverse and harder for authors to grasp.
Hence, if trees are enough to address use cases, we should only enable
trees to be serialized.
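The difference between the two points of view can be sketched in Python. This is an illustration under simple assumptions, not how microdata or RDFa processors actually work: items are modelled as nested dicts, blank nodes are named `_:b1`, `_:b2`, ..., and `tree_to_triples`, `in_degree` and the property names and values are hypothetical. Every tree flattens into triples, but a graph in which one node is referenced from two places has no tree form without duplicating that node.

```python
from itertools import count


def tree_to_triples(item, ids=None, subject=None):
    """Flatten a tree of nested dicts into (subject, property, value) triples.

    Each dict becomes a blank node named '_:b1', '_:b2', ... in visit order.
    """
    ids = ids if ids is not None else count(1)
    subject = subject or f"_:b{next(ids)}"
    triples = []
    for prop, value in item.items():
        if isinstance(value, dict):
            child = f"_:b{next(ids)}"
            triples.append((subject, prop, child))
            triples.extend(tree_to_triples(value, ids, child))
        else:
            triples.append((subject, prop, value))
    return triples


def in_degree(triples, node):
    """Count how many triples point at `node` as their object."""
    return sum(1 for _, _, obj in triples if obj == node)


# Hypothetical tree: one item with a nested item.
cat = {"name": "Hedral", "owner": {"name": "Alice"}}
print(tree_to_triples(cat))
# [('_:b1', 'name', 'Hedral'), ('_:b1', 'owner', '_:b2'), ('_:b2', 'name', 'Alice')]

# Hypothetical graph: two items share one owner node. In a tree each
# nested item has exactly one parent, so this would need a duplicated owner.
shared = [
    ("_:cat1", "owner", "_:person"),
    ("_:cat2", "owner", "_:person"),
    ("_:person", "name", "Alice"),
]
print(in_degree(shared, "_:person"))  # 2
```

In triples produced from a tree, every blank node has at most one incoming edge; whether use cases ever need the shared-node shape is exactly what the two points of view disagree about.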

¬¬ Again, what's your basis for deciding that trees are enough to
address use cases?? Of course, they are enough to solve some use
cases, but the convenience of dealing with just trees is not worth
sacrificing the needs of those use cases you are arbitrarily deciding
to ignore.

I don't see anything on http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-April/019374.html
that doesn't boil down to trees or simple key-value pairs attached
to an item.

I subscribe to view #2, and it seems that trees are indeed enough for
the use cases (that were stipulated by the pro-graph people!).

- Microdata can't represent the full RDF data model (while RDFa can):
some complex structures are just not expressible with microdata.

That's not a use case. That's theoretical purity.

It's not theoretical purity, it's something simpler:
*extensibility*. And, with over two decades between versions of the
specs, this is a strong requirement: if a problem is noticed after
HTML5 becomes the standard, it's essential to be able to solve it
without waiting 10 or 20 years for HTML6 to come out.

Well, you have to commit to some bounds on

Re: [whatwg] Annotating structured data that HTML has no semantics

2009-05-17 Thread Eduard Pascual

On Sat, May 16, 2009 at 10:02 AM, Leif Halvard Silli l...@malform.no wrote:
 [...]
 But maybe, after all, it ain't so bad. It is good to have the opportunity.
 :-)
This is exactly the point (at least, IMO): RDFa may be quite good
at embedding inline metadata, but can't deal at all with describing
the semantics that are inherent to the structure. OTOH, EASE handles
the latter quite well, but can't handle the former at all.
That's why I was advocating for a solution that allows either
approach, and even mixing both when appropriate.

On a side note, about the idea of mixing CSS+EASE or CSS+CRDF or
CSS+whatever: my PoV is that these *should* not be mixed; but any
CSS-like semantic description would benefit from some foolproofing,
ensuring that if an author puts CRDF code where a CSS parser sees it, it
gets ignored (and vice versa). In addition, CSS's error-handling rules make
this kind of shielding relatively easy. OTOH, adding the semantic code
as part of the CSS styling, or trying to consider this as part (or
even as an extension) of the CSS language is wrong by definition:
semantics is not styling; and we should try to make authors aware
enough of the difference.

Regards,
Eduard Pascual

Re: [whatwg] Annotating structured data that HTML has no semantics

2009-05-16 Thread Leif Halvard Silli


Tab Atkins Jr. On 09-05-15 22.15:
On Wed, May 13, 2009 at 10:04 AM, Leif Halvard Silli 
  

Toby Inkster on Wed May 13 02:19:17 PDT 2009:


Hear hear.  Let's call it Cascading RDF Sheets.


http://buzzword.org.uk/2008/rdf-ease/spec
http://buzzword.org.uk/2008/rdf-ease/reactions
  
RDFa is better though.
  

What does 'better' mean in this context? Why and how? Because it is easier
to process? But EASE seems more compatible with microformats, and is
better in that sense.



I'd also like clarification here.  I dislike *all* of the inline
metadata proposals to some degree, for the same reasons that I dislike
inline @style and @onfoo handlers.  A Selector-based way of applying
semantics fits my theoretical needs much better.
  


A possibly 10-year-old use case where I think EASE (or GRDDL as such) 
should fit in:


Shelley and Geoffrey reminded us that RSS 1.0 stands for RDF Site 
Summary 1.0. The W3 also uses RSS 1.0 for its feed[1]. The feed is 
generated via a profile transformation [2] that happens with XSLT. The 
profile defines <div class="item"> as news items (note the 
combination of element and class, as in EASE). But the profile also 
implements particular rules for particular elements without looking at 
the @class (e.g. each <div class="item"> must contain an <h2> or <h3>).


All in all, it sounds very similar to what the newer technology GRDDL 
does, since it is all happening based on a profile and some class names 
and specific element structures. And, this is possible to test with the 
W3 GRDDL service, which produces a feed that in fact, when you look 
with the right eyes, is the same as the published homepage feed[3].


If the microdata becomes part of the final version of HTML 5, then 
GRDDL  (with or without EASE) will probably prosper, since it probably 
doesn't matter to GRDDL whether it looks into @class or @item, as long 
as the thing is part of the profile and the profile transformation. 
(But it would be interesting if someone in the know could test if the 
triples would be the same, etc ...) And if so, then the introduction 
of microdata increases the need for @profile in HTML 5.


[1] http://www.w3.org/2000/08/w3c-synd/home.rss
[2] http://www.w3.org/2000/08/w3c-synd/
[3] http://www.w3.org/2007/08/grddl/?docAddr=http%3A%2F%2Fwww.w3.org%2F&output=rdfxml



I read all the reactions you pointed to. Some made the claim that EASE would
move semantics out of the HTML file, and that microformats were better as they
keep the semantics inside the file. But I of course agree with you that
EASE just underlines/outlines the semantics already in the file.



Yup.  The appropriate critique of separated metadata is that the
*data* is moved out of the document, where it will inevitably decay
compared to the live document.  RDF-EASE keeps all the data stored in
the live document, and merely specifies how to extract it.  The only
way you can lose data then is by changing the html structure itself,
which is much less common than just changing the content.
  


That the structure changes seldom, /could/ be a reason for using RDFa to 
store the meta info in the very element instead of using EASE (or even 
Dublin Core in meta elements in the head). OTOH, that the structure 
changes little, could also be something that /permits/ the use of GRDDL 
... So it depends on how you see it.



From the EASE draft:


All properties in RDF-EASE begin with the string -rdf-, as per §4.1.2.1
Vendor-specific extensions in [CSS21]. This allows RDF-EASE and CSS to be
safely mixed in one file, [...]
  

I wonder why you think it is so important to be able to mix CSS and EASE. It
seems better to separate the two completely.



I'm not thrilled with the mixture of CSS and metadata either.  Just
because it uses Selectors doesn't mean it needs to be specifiable
alongside CSS.  jQuery uses Selectors too, but it stays where it
belongs.  ^_^  (That being said, there's a plugin for it that allows
you to specify js in your CSS, and it gets applied to the matching
elements from the block's selector.)
  


But maybe, after all, it ain't so bad. It is good to have the 
opportunity. :-) (Since you, as I perceived it, disagreed with yourself 
above, I continue the tradition.) :-)

--
leif halvard silli

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Eduard Pascual

On Thu, May 14, 2009 at 10:17 PM, Maciej Stachowiak m...@apple.com wrote:
 [...]
 From my cursory study, I think microdata could subsume many of the use cases
 of both microformats and RDFa.
Maybe. But microformats and RDFa can handle *all* of these cases.
Again, which are the benefits of creating something entirely new to
replace what already exists while it can't even handle all the cases
of what it is replacing? Both the new syntax, and the cases
restrictions, are costs: what are these costs buying? If it's not
clear what we are getting for these costs, it is impossible to
evaluate whether the costs are worth it or not.

 It seems to me that it avoids much of what microformats advocates find 
 objectionable
Could you specify, please? Do you mean anything else than WHATWG's
almost irrational hate toward CURIEs and everything that involves
prefixes?

 but at the same time it seems it can represent a full RDF data
 model.
No, it *can't* represent a full RDF model: it has already been shown
several times on this thread.

 Thus, I think we have the potential to get one solution that works for 
 everyone.
RDFa itself doesn't work for everyone; but microdata is even more
restricted: it leaves out the cases that RDFa leaves out, but it also
leaves out some cases that RDFa was able to handle. So, where do you
see such potential?

 I'm not 100% sure microdata can really achieve this, but I think making the
 attempt is a positive step.
What do you mean by making the attempt? If there is something
microdata can't handle, it won't be able to handle it without changing
the spec. If you meant that evolving the microdata proposal towards
something that works for everyone is a positive step, then I agree;
but if you meant engraving this microdata approach into the spec,
setting it in stone, and then attempting to get everyone to accept it,
then I definitely disagree. So, please, could you clarify the meaning
of that statement? Thanks.

 One other detail that it seems not many people have picked up on yet is that
 microdata proposes a DOM API to extract microdata-based info from a live
 document on the client side. In my opinion this is huge and has the
 potential to greatly increase author interest in semantic markup.
All right, an API may be a benefit. Most probably it is. However, a
similar API could have been built on RDFa, or eRDF, or EASE, or any
other already existing or new solution; so it doesn't justify creating
a new syntax. I have to insist: what are the benefits of such a
built-from-the-ground-up, restrictive *syntax*? That's what we need to
know to evaluate it against its costs.

 Now, it may be that microdata will ultimately fail, either because it is
 outcompeted by RDFa, or because not enough people care about semantic
 markup, or whatever. But at least for now, I don't see a reason to strangle
 it in the cradle.
At least for now, I don't see a reason why it was created to begin
with. Maybe if somebody could enlighten us with this detail, this
discussion could evolve into something more useful and productive.

On Fri, May 15, 2009 at 6:53 AM, Maciej Stachowiak m...@apple.com wrote:

 On May 14, 2009, at 1:30 PM, Shelley Powers wrote:

 So, if I'm pushing for RDFa, it's not because I want to win. It's
 because I have things I want to do now, and I would like to make sure they have a
 reasonable chance of working a couple of years in the future. And yeah, once
 SVG is in HTML5, and RDFa can work with HTML5, maybe I wouldn't mind giving
 old HTML a try again. Lord knows I'd like to use ampersands again.

 It sounds like your argument comes down to this: you have personally
 invested in RDFa, therefore having a competing technology is bad, regardless
 of the technical merits.
Pause, please. Before going on, I need to ask again: which are those
technical merits??

 I don't mean to parody here - I am somewhat sympathetic to this line of 
 argument.
I think I'm interpreting Shelley's argument slightly differently. She
didn't choose RDFa because it was better than microdata. She chose RDFa
because it was better than other options, and microdata didn't even
exist yet. Now microdata comes out, some drawbacks are highlighted in
comparison with RDFa (lack of typing, inability to depict the full RDF
model, and reversed domains that are as ugly as CURIEs; at least CURIEs
resolve to something useful, while reversed domains often don't
resolve at all), and you ask RDFa proponents to give microdata a
chance, to not strangle it in the cradle; but nobody seems willing
to answer the one question: what does microdata provide to make up for
its drawbacks?

 Often pragmatic concerns mean that an incremental improvement just isn't 
 worth the cost of switching
Wait. Are you referring to microdata as an incremental improvement over
RDFa?? IMO, it's rather a decremental enworsement.

 My personal judgment is that we're not past the point of
 no return on data embedding. There's microformats, RDFa, and then dozens of
 other serializations of

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Shelley Powers


Maciej Stachowiak wrote:


On May 14, 2009, at 1:30 PM, Shelley Powers wrote:

So, if I'm pushing for RDFa, it's not because I want to win. It's 
because I have things I want to do now, and I would like to make sure 
they have a reasonable chance of working a couple of years in the future. 
And yeah, once SVG is in HTML5, and RDFa can work with HTML5, maybe I 
wouldn't mind giving old HTML a try again. Lord knows I'd like to 
use ampersands again.


It sounds like your argument comes down to this: you have personally 
invested in RDFa, therefore having a competing technology is bad, 
regardless of the technical merits. I don't mean to parody here - I am 
somewhat sympathetic to this line of argument. Often pragmatic 
concerns mean that an incremental improvement just isn't worth the 
cost of switching (for example HTML vs. XHTML). My personal judgment 
is that we're not past the point of no return on data embedding. 
There's microformats, RDFa, and then dozens of other serializations of 
RDF (some of which you cited). This doesn't seem like a space on the 
verge of picking a single winner, and the players seem willing to 
experiment with different options.



There are not dozens of other serializations of RDF.

The point I was trying to make is, I'd rather put my time into something 
that exists now, than have to watch the wheel re-invented. I'd rather 
see semantic metadata become a reality. I'm glad that you personally 
feel that companies will be just peachy keen on having to support 
multiple parsers to get the same data.


On the HTML WG side, I will never support microdata, because no case has 
been made for its existence.




The point is, people in the real world have to use this stuff. It 
helps them if they have one, generally agreed on approach. As it 
is, folks have to contend with both RDFa and microformats, but at 
least we know these have different purposes.


From my cursory study, I think microdata could subsume many of the 
use cases of both microformats and RDFa. It seems to me that it 
avoids much of what microformats advocates find objectionable, and 
provides a good basis for new microformats; but at the same time it 
seems it can represent a full RDF data model. Thus, I think we have 
the potential to get one solution that works for everyone.


I'm not 100% sure microdata can really achieve this, but I think 
making the attempt is a positive step.



It can't, don't you see?

Microdata will only work in HTML5/XHTML5. XHTML 1.1 and yes, 2.0 will 
be around for years, decades. In addition, XHTML5 already supports RDFa.


Supporting XHTML 1.1 has about 0.001% as much value as 
supporting text/html. XHTML 2.0 is completely irrelevant to the Web, 
and looks on track to remain so. So I don't find this point very 
persuasive.


I don't think you'll find that the world is breathlessly waiting for 
HTML5. I think you'll find that XHTML 1.1 will have wider use than HTML5 
for the next decade. If not longer. I wouldn't count out XHTML 2.0, 
either.  And in a decade, a lot can change.


Why you think something completely brand new, no vendor support, 
drummed up in a few hours or a day or so is more robust, and a better 
option than a mature spec in wide use, well frankly boggles my mind.


I haven't evaluated it enough to know for sure (as I said). I do think 
avoiding CURIEs is extremely valuable from the point of view of sane 
text/html semantics and ease of authoring; and RDF experts seem to 
think it works fine for representing RDF data models. So tentatively, 
I don't see any gaping holes. If you see a technical problem, and not 
just potential competition for the technology you've invested in, then 
you should definitely cite it.


I don't think CURIEs are that difficult, nor impossible, no matter the 
arguments that Henri brings out.


I am impressed with your belief in HTML5.

But
One other detail that it seems not many people have picked up on yet 
is that microdata proposes a DOM API to extract microdata-based info 
from a live document on the client side. In my opinion this is huge 
and has the potential to greatly increase author interest in 
semantic markup.




Not really. Can do this now with RDFa in XHTML. And I don't need any 
new DOM to do it.


The power of semantic markup isn't really seen until you take that 
markup data _outside_ the document. And merge that data with data 
from other documents. Google rich snippets. Yahoo searchmonkey. Heck, 
even an application that manages the data from different subsites of 
one domain.


I respectfully disagree. An API to do things client-side that doesn't 
require an external library is extremely powerful, because it lets 
content authors easily make use of the very same semantic markup that 
they are vending for third parties, so they have more incentive to use 
it and get it right.



Sure, we'll have to disagree on this one.


Now, it may be that microdata will ultimately fail, either because 
it is outcompeted by RDFa,

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Simon Pieters

On Thu, 14 May 2009 22:30:41 +0200, Shelley Powers shell...@burningbird.net 
wrote:

 I'm not 100% sure microdata can really achieve this, but I think making  
 the attempt is a positive step.

 It can't, don't you see?

 Microdata will only work in HTML5/XHTML5.

Actually, as specified, it would work for any text/html and any XHTML content. 
It would just be valid in (X)HTML5, but it would work even if the input is not 
valid (X)HTML5 or looks like HTML4 or XHTML 1.1.


 XHTML 1.1 and yes, 2.0 will be  
 around for years, decades. In addition, XHTML5 already supports RDFa.

XHTML5 supports RDFa to the same extent that XHTML 1.1 supports microdata (in 
both cases, it would work but is not valid).

-- 
Simon Pieters
Opera Software

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-15 Thread Tab Atkins Jr.

On Wed, May 13, 2009 at 10:04 AM, Leif Halvard Silli l...@malform.no wrote:
 Toby Inkster on Wed May 13 02:19:17 PDT 2009:

 Leif Halvard Silli wrote:

  Hear hear.  Let's call it Cascading RDF Sheets.

 http://buzzword.org.uk/2008/rdf-ease/spec

 http://buzzword.org.uk/2008/rdf-ease/reactions

 I have actually implemented it. It works.

 Oh! Thanks for sharing.

Indeed, RDF-EASE seems fairly nice!

 RDFa is better though.

 What does 'better' mean in this context? Why and how? Because it is easier
 to process? But EASE seems more compatible with microformats, and is
 better in that sense.

I'd also like clarification here.  I dislike *all* of the inline
metadata proposals to some degree, for the same reasons that I dislike
inline @style and @onfoo handlers.  A Selector-based way of applying
semantics fits my theoretical needs much better.

 I read all the reactions you pointed to. Some made the claim that EASE would
 move semantics out of the HTML file, and that microformats were better as they
 keep the semantics inside the file. But I of course agree with you that
 EASE just underlines/outlines the semantics already in the file.

Yup.  The appropriate critique of separated metadata is that the
*data* is moved out of the document, where it will inevitably decay
compared to the live document.  RDF-EASE keeps all the data stored in
the live document, and merely specifies how to extract it.  The only
way you can lose data then is by changing the html structure itself,
which is much less common than just changing the content.
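The "data stays in the live document" point can be illustrated with a minimal Python sketch. It assumes class-only selectors and a hypothetical class-to-property mapping (`CLASS_TO_PROPERTY`, `ClassExtractor` and `extract` are illustrative names; the FOAF property URIs are real, but the mapping is not taken from RDF-EASE). Actual RDF-EASE sheets use full CSS Selectors and `-rdf-` prefixed properties; this only shows the extraction principle.

```python
from html.parser import HTMLParser

# Hypothetical class-to-property mapping standing in for an RDF-EASE sheet.
CLASS_TO_PROPERTY = {
    "fn": "http://xmlns.com/foaf/0.1/name",
    "nickname": "http://xmlns.com/foaf/0.1/nick",
}

# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ClassExtractor(HTMLParser):
    """Collect the text of elements whose class appears in a mapping."""

    def __init__(self, mapping):
        super().__init__()
        self.mapping = mapping
        self.stack = []    # one entry per open element: None or [property, chunks]
        self.results = []  # (property, text) pairs, emitted as elements close

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((self.mapping[c] for c in classes if c in self.mapping), None)
        self.stack.append([prop, []] if prop else None)

    def handle_endtag(self, tag):
        # Assumes well-formed markup: each end tag closes the latest open element.
        if tag in VOID or not self.stack:
            return
        entry = self.stack.pop()
        if entry is not None:
            prop, chunks = entry
            self.results.append((prop, "".join(chunks).strip()))

    def handle_data(self, data):
        # Text counts toward every matching element that is currently open.
        for entry in self.stack:
            if entry is not None:
                entry[1].append(data)


def extract(html, mapping):
    parser = ClassExtractor(mapping)
    parser.feed(html)
    parser.close()
    return parser.results


page = ('<p class="vcard"><span class="fn">Jane Doe</span> aka '
        '<span class="nickname">JD</span></p>')
for prop, value in extract(page, CLASS_TO_PROPERTY):
    print(prop, value)
# http://xmlns.com/foaf/0.1/name Jane Doe
# http://xmlns.com/foaf/0.1/nick JD
```

Because the mapping only says where to look, editing the text of the page changes the extracted values automatically; renaming a class or restructuring the markup is what silently loses data.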

 From the EASE draft:

 All properties in RDF-EASE begin with the string -rdf-, as per §4.1.2.1
 Vendor-specific extensions in [CSS21]. This allows RDF-EASE and CSS to be
 safely mixed in one file, [...]

 I wonder why you think it is so important to be able to mix CSS and EASE. It
 seems better to separate the two completely.

I'm not thrilled with the mixture of CSS and metadata either.  Just
because it uses Selectors doesn't mean it needs to be specifiable
alongside CSS.  jQuery uses Selectors too, but it stays where it
belongs.  ^_^  (That being said, there's a plugin for it that allows
you to specify js in your CSS, and it gets applied to the matching
elements from the block's selector.)

~TJ

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread James Graham


jgra...@opera.com wrote:

Quoting Philip Taylor excors+wha...@gmail.com:


On Sun, May 10, 2009 at 11:32 AM, Ian Hickson i...@hixie.ch wrote:


One of the more elaborate use cases I collected from the e-mails sent in
over the past few months was the following:

  USE CASE: Annotate structured data that HTML has no semantics for, and
  which nobody has annotated before, and may never again, for private
  use or use in a small self-contained community.

[...]

To address this use case and its scenarios, I've added to HTML5 a simple
syntax (three new attributes) based on RDFa.


There's a quickly-hacked-together demo at
http://philip.html5.org/demos/microdata/demo.html (works in at least
Firefox and Opera), which attempts to show you the JSON serialisation
of the embedded data, which might help in examining the proposal.


I have a *totally unfinished* demo that does something rather similar
at [1]. It is highly likely to break and/or give incorrect results**.
If you use it for anything important you are insane :)


I have now added extremely preliminary RDF support with output as N3 and 
 RDF/XML courtesy of rdflib. It is certain to be buggy.

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Shelley Powers


James Graham wrote:

jgra...@opera.com wrote:

Quoting Philip Taylor excors+wha...@gmail.com:


On Sun, May 10, 2009 at 11:32 AM, Ian Hickson i...@hixie.ch wrote:


One of the more elaborate use cases I collected from the e-mails sent in
over the past few months was the following:

  USE CASE: Annotate structured data that HTML has no semantics for, and
  which nobody has annotated before, and may never again, for private use
  or use in a small self-contained community.

[...]

To address this use case and its scenarios, I've added to HTML5 a simple
syntax (three new attributes) based on RDFa.


There's a quickly-hacked-together demo at
http://philip.html5.org/demos/microdata/demo.html (works in at least
Firefox and Opera), which attempts to show you the JSON serialisation
of the embedded data, which might help in examining the proposal.


I have a *totally unfinished* demo that does something rather similar
at [1]. It is highly likely to break and/or give incorrect results**.
If you use it for anything important you are insane :)


I have now added extremely preliminary RDF support with output as N3
and RDF/XML courtesy of rdflib. It is certain to be buggy.


So much concern about generating RDF, makes one wonder why we didn't 
just implement RDFa...


Shelley

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Dan Brickley


On 14/5/09 14:18, Shelley Powers wrote:

James Graham wrote:

jgra...@opera.com wrote:

Quoting Philip Taylor excors+wha...@gmail.com:


On Sun, May 10, 2009 at 11:32 AM, Ian Hickson i...@hixie.ch wrote:


One of the more elaborate use cases I collected from the e-mails sent in
over the past few months was the following:

USE CASE: Annotate structured data that HTML has no semantics for, and
which nobody has annotated before, and may never again, for private use
or use in a small self-contained community.

[...]

To address this use case and its scenarios, I've added to HTML5 a simple
syntax (three new attributes) based on RDFa.


There's a quickly-hacked-together demo at
http://philip.html5.org/demos/microdata/demo.html (works in at least
Firefox and Opera), which attempts to show you the JSON serialisation
of the embedded data, which might help in examining the proposal.


I have a *totally unfinished* demo that does something rather similar
at [1]. It is highly likely to break and/or give incorrect results**.
If you use it for anything important you are insane :)


I have now added extremely preliminary RDF support with output as N3
and RDF/XML courtesy of rdflib. It is certain to be buggy.


So much concern about generating RDF, makes one wonder why we didn't
just implement RDFa...


Having HTML5-microdata -to- RDF parsers is pretty critical to having 
test cases that help us all understand where RDFa-Classic and HTML5 
diverge. I'm very happy to see this work being done and that there are 
multiple implementations.


As far as I can see, the main point of divergence is around URI 
abbreviation mechanisms. But also HTML5 might not have a notion 
equivalent to RDF/RDFa's bNodes construct. The sooner we have these 
parsers the sooner we'll know for sure.


Dan

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Shelley Powers

Dan Brickley wrote:

On 14/5/09 14:18, Shelley Powers wrote:

James Graham wrote:

jgra...@opera.com wrote:

Quoting Philip Taylor excors+wha...@gmail.com:

On Sun, May 10, 2009 at 11:32 AM, Ian Hickson i...@hixie.ch wrote:

One of the more elaborate use cases I collected from the e-mails sent in
over the past few months was the following:

USE CASE: Annotate structured data that HTML has no semantics for, and
which nobody has annotated before, and may never again, for private use
or use in a small self-contained community.

[...]

To address this use case and its scenarios, I've added to HTML5 a simple
syntax (three new attributes) based on RDFa.

There's a quickly-hacked-together demo at
http://philip.html5.org/demos/microdata/demo.html (works in at least
Firefox and Opera), which attempts to show you the JSON serialisation
of the embedded data, which might help in examining the proposal.

I have a *totally unfinished* demo that does something rather similar
at [1]. It is highly likely to break and/or give incorrect results**.
If you use it for anything important you are insane :)

I have now added extremely preliminary RDF support with output as N3
and RDF/XML courtesy of rdflib. It is certain to be buggy.

So much concern about generating RDF, makes one wonder why we didn't
just implement RDFa...

Having HTML5-microdata -to- RDF parsers is pretty critical to having
test cases that help us all understand where RDFa-Classic and HTML5
diverge. I'm very happy to see this work being done and that there are
multiple implementations.

As far as I can see, the main point of divergence is around URI
abbreviation mechanisms. But also HTML5 might not have a notion
equivalent to RDF/RDFa's bNodes construct. The sooner we have these
parsers the sooner we'll know for sure.

Dan

Actually, I believe there are other differences, as others have pointed
out.

http://www.jenitennison.com/blog/node/103

http://realtech.burningbird.net/semantic-web/semantic-web-issues-and-practices/holding-on-html5

Some of the differences have resulted in more modifications to the
underlying HTML5 spec, which is curious, because Ian has stated in
comments that support for RDF is only a side interest and not the main
purpose behind the microdata section.

With the statement that support for RDF isn't a particular goal of
microdata, Dan, I think you're being optimistic about the good this
effort will generate for RDFa. But, more power to you.

Shelley

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Philip Taylor

On Thu, May 14, 2009 at 1:25 PM, Dan Brickley dan...@danbri.org wrote:
 Having HTML5-microdata -to- RDF parsers is pretty critical to having test
 cases that help us all understand where RDFa-Classic and HTML5 diverge. I'm
 very happy to see this work being done and that there are multiple
 implementations.

 As far as I can see, the main point of divergence is around URI abbreviation
 mechanisms. But also HTML5 might not have a notion equivalent to RDF/RDFa's
 bNodes construct. The sooner we have these parsers the sooner we'll know for
 sure.

If I understand RDF correctly, the idea is that everything can be
URIs, subjects and objects can instead be blank nodes, and objects can
instead be literals. If we restrict literals to strings (optionally
with languages), then I think all triples must follow one of these
eight patterns:

  <urn:subject> <urn:predicate> <urn:object> .
  <urn:subject> <urn:predicate> "object" .
  <urn:subject> <urn:predicate> "object"@lang .
  <urn:subject> <urn:predicate> _:X .
  _:X <urn:predicate> <urn:object> .
  _:X <urn:predicate> "object" .
  _:X <urn:predicate> "object"@lang .
  _:X <urn:predicate> _:Y .

These cases can be trivially mapped into HTML5 microdata as:

  <div item>
    <link itemprop=about href=urn:subject>
    <link itemprop=urn:predicate href=urn:object>
  </div>

  <div item>
    <link itemprop=about href=urn:subject>
    <meta itemprop=urn:predicate content=object>
  </div>

  <div item>
    <link itemprop=about href=urn:subject>
    <meta itemprop=urn:predicate content=object lang=lang>
  </div>

  <div item>
    <link itemprop=about href=urn:subject>
    <meta itemprop=urn:predicate item id=X>
  </div>

  <link subject=X itemprop=urn:predicate href=urn:object>

  <meta subject=X itemprop=urn:predicate content=object>

  <meta subject=X itemprop=urn:predicate content=object lang=lang>

  <meta subject=X itemprop=urn:predicate item id=Y>

(There's the caveat about <link> and <meta> being moved into <head> in
some browsers; you can replace them with <a> and <span> instead.)

These aren't the most elegant ways of expressing complex structures
(because they don't make much use of nesting), but hopefully they
demonstrate that it's possible to express any RDF graph (that only
uses string literals) by decomposing into triples and then writing as
HTML with these patterns.

(If all the triples using a blank node have the same subject, then you
don't need to use 'id' and 'subject' because you can just nest the
markup instead, I think.)
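The patterns above are mechanical enough to generate. A rough Python sketch, assuming blank nodes are strings starting with `_:`, URIs are any other string, and string literals (with an optional language) use a small `Literal` type. The attribute names (`item`, `itemprop`, `subject`, and `about` as a property) are the ones used in the patterns above; `triple_to_microdata` and its helpers are hypothetical. Attribute values are quoted and escaped here, which the examples above omit but HTML permits either way.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional, Union


@dataclass(frozen=True)
class Literal:
    """A string literal with an optional language tag."""
    value: str
    lang: Optional[str] = None


# Terms: "_:X" is a blank node, any other str is a URI, Literal is a literal.
Term = Union[str, Literal]


def _attr(value: str) -> str:
    # Quote and escape an attribute value.
    return '"' + escape(value, quote=True) + '"'


def _property_element(p: str, o: Term, subject_attr: str = "") -> str:
    """The element that carries one predicate/object pair."""
    if isinstance(o, Literal):
        lang = f" lang={_attr(o.lang)}" if o.lang else ""
        return f"<meta{subject_attr} itemprop={_attr(p)} content={_attr(o.value)}{lang}>"
    if o.startswith("_:"):
        # Blank-node object: a nested item that other elements refer to by id.
        return f"<meta{subject_attr} itemprop={_attr(p)} item id={_attr(o[2:])}>"
    return f"<link{subject_attr} itemprop={_attr(p)} href={_attr(o)}>"


def triple_to_microdata(s: str, p: str, o: Term) -> str:
    """Render one triple using the eight patterns listed in this message."""
    if s.startswith("_:"):
        # Blank-node subject: attach the property to that item via subject=.
        return _property_element(p, o, f" subject={_attr(s[2:])}")
    # URI subject: wrap in an item whose 'about' property names the subject.
    return (
        "<div item>\n"
        f"  <link itemprop=about href={_attr(s)}>\n"
        f"  {_property_element(p, o)}\n"
        "</div>"
    )
```

Concatenating the output for each triple gives the flat, un-nested form described above; the nesting shortcut for blank nodes with a single subject is not attempted.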

With my parser (in Firefox 3.0), the output triples (sorted into a
clearer order) are:

  <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
  <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
  <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
  <> <http://www.w3.org/1999/xhtml/vocab#item> <urn:subject> .
  <urn:subject> <urn:predicate> <urn:object> .
  <urn:subject> <urn:predicate> "object" .
  <urn:subject> <urn:predicate> "object"@lang .
  <urn:subject> <urn:predicate> _:n0 .
  _:n0 <urn:predicate> <urn:object> .
  _:n0 <urn:predicate> "object" .
  _:n0 <urn:predicate> "object"@lang .
  _:n0 <urn:predicate> _:n1 .

which corresponds to what was desired.

So, I can't see any limits on expressivity other than that literals
must be strings. (But I'm not at all an expert on RDF, and I may have
missed something in the microdata spec, so please let me know if I'm
wrong!)

-- 
Philip Taylor
exc...@gmail.com

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Maciej Stachowiak



On May 14, 2009, at 5:18 AM, Shelley Powers wrote:

So much concern about generating RDF, makes one wonder why we didn't  
just implement RDFa...


If it's possible to produce RDF triples from microdata, and if RDF  
triples of interest can be expressed with microdata, why does it  
matter if the concrete syntax is the same as RDFa? Isn't the important  
thing about RDF the data model, not the surface syntax?


(I understand that if the microdata syntax offered no advantages over  
RDFa, then it would be a wasted effort to diverge. But my impression  
is that you'd object to anything that isn't exactly identical to RDFa,  
even if it can easily be used in the same way.)


Regards,
Maciej

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Shelley Powers


Maciej Stachowiak wrote:


On May 14, 2009, at 5:18 AM, Shelley Powers wrote:

So much concern about generating RDF, makes one wonder why we didn't 
just implement RDFa...


If it's possible to produce RDF triples from microdata, and if RDF 
triples of interest can be expressed with microdata, why does it 
matter if the concrete syntax is the same as RDFa? Isn't the important 
thing about RDF the data model, not the surface syntax?


(I understand that if the microdata syntax offered no advantages over 
RDFa, then it would be a wasted effort to diverge. But my impression 
is that you'd object to anything that isn't exactly identical to RDFa, 
even if it can easily be used in the same way.)


Regards,
Maciej


Because one would assume that one way to accomplish a task would be more 
attractive to web developers, designers, parser developers, browsers, et 
al.


In addition, one would also assume that one way to accomplish a task 
would be more attractive in regards to testing, maintaining and moving 
on in the future.


Notice how there is only VHS and not Betamax?

Notice the same about Blu-Ray and HD DVD? People won't buy into something
while there are competitive specs, and these are competitive in that
it makes little sense to use both in a document, though you can now.


The point is, people in the real world have to use this stuff. It helps 
them if they have one, generally agreed on approach. As it is, folks 
have to contend with both RDFa and microformats, but at least we know 
these have different purposes.


Shelley

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Maciej Stachowiak



On May 14, 2009, at 1:04 PM, Shelley Powers wrote:


Maciej Stachowiak wrote:


On May 14, 2009, at 5:18 AM, Shelley Powers wrote:

So much concern about generating RDF, makes one wonder why we  
didn't just implement RDFa...


If it's possible to produce RDF triples from microdata, and if RDF  
triples of interest can be expressed with microdata, why does it  
matter if the concrete syntax is the same as RDFa? Isn't the  
important thing about RDF the data model, not the surface syntax?


(I understand that if the microdata syntax offered no advantages  
over RDFa, then it would be a wasted effort to diverge. But my  
impression is that you'd object to anything that isn't exactly  
identical to RDFa, even if it can easily be used in the same way.)


Regards,
Maciej


Because one would assume that one way to accomplish a task would be  
more attractive to web developers, designers, parser developers,  
browsers, et al.


In addition, one would also assume that one way to accomplish a task  
would be more attractive in regards to testing, maintaining and  
moving on in the future.


Notice how there is only VHS and not Betamax?

Notice the same about Blu-Ray and HD DVD? People won't buy into
something while there are competitive specs, and these are
competitive in that it makes little sense to use both in a
document, though you can now.


Physical media do tend to converge due to network effects. I think the  
effect is less strong for digital file formats. For example, MP3 and  
AAC are both fairly successful; similarly, MPEG-4, Windows Media and  
Ogg are all getting some degree of traction. But you may be right that  
ultimately there will be only one winner.


The point is, people in the real world have to use this stuff. It  
helps them if they have one, generally agreed on approach. As it is,  
folks have to contend with both RDFa and microformats, but at least  
we know these have different purposes.


From my cursory study, I think microdata could subsume many of the  
use cases of both microformats and RDFa. It seems to me that it avoids  
much of what microformats advocates find objectionable, and provides a  
good basis for new microformats; but at the same time it seems it can  
represent a full RDF data model. Thus, I think we have the potential  
to get one solution that works for everyone.


I'm not 100% sure microdata can really achieve this, but I think  
making the attempt is a positive step.


One other detail that it seems not many people have picked up on yet  
is that microdata proposes a DOM API to extract microdata-based info  
from a live document on the client side. In my opinion this is huge  
and has the potential to greatly increase author interest in semantic  
markup.


Now, it may be that microdata will ultimately fail, either because it  
is outcompeted by RDFa, or because not enough people care about  
semantic markup, or whatever. But at least for now, I don't see a  
reason to strangle it in the cradle.


Regards,
Maciej

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Shelley Powers


Maciej Stachowiak wrote:


On May 14, 2009, at 1:04 PM, Shelley Powers wrote:


Maciej Stachowiak wrote:


On May 14, 2009, at 5:18 AM, Shelley Powers wrote:

So much concern about generating RDF, makes one wonder why we 
didn't just implement RDFa...


If it's possible to produce RDF triples from microdata, and if RDF 
triples of interest can be expressed with microdata, why does it 
matter if the concrete syntax is the same as RDFa? Isn't the 
important thing about RDF the data model, not the surface syntax?


(I understand that if the microdata syntax offered no advantages 
over RDFa, then it would be a wasted effort to diverge. But my 
impression is that you'd object to anything that isn't exactly 
identical to RDFa, even if it can easily be used in the same way.)


Regards,
Maciej


Because one would assume that one way to accomplish a task would be 
more attractive to web developers, designers, parser developers, 
browsers, et al.


In addition, one would also assume that one way to accomplish a task 
would be more attractive in regards to testing, maintaining and 
moving on in the future.


Notice how there is only VHS and not Betamax?

Notice the same about Blu-Ray and HD DVD? People won't buy into
something while there are competitive specs, and these are
competitive in that it makes little sense to use both in a
document, though you can now.


Physical media do tend to converge due to network effects. I think the 
effect is less strong for digital file formats. For example, MP3 and 
AAC are both fairly successful; similarly, MPEG-4, Windows Media and 
Ogg are all getting some degree of traction. But you may be right that 
ultimately there will be only one winner.


Now, that's the problem with all of this effort...winners and losers.

I don't support a spec because it gives me grins and giggles. I have 
certain tasks I want to do, and I look for what is the technology that 
has the most support in order to do them.


I've long been an adherent of RDF, which isn't really up for debate.
Originally, I was an RDF/XML person, until the RDF-in-XHTML folks 
changed my mind.


What I see of RDFa is a specification that has been through a very long 
period of time, testing, commenting, being implemented by major players. 
I also have tools, right now, that I can use to process the RDFa, as 
well as support by two major search engine companies.


As Dan pointed out earlier, microdata seems to support most of RDF. 
Well, I know that RDFa does. It makes little sense to me to start from 
scratch when a mature specification with multi-vendor support already 
exists.


Especially when Drupal 7 rolls out with RDFa baked in. That's 1.7 
million sites supporting the spec. Then there's the new Google snippet 
thing -- who knows how many additional sites we'll now find supporting RDFa.


So, if I'm pushing for RDFa, it's not because I want to win. It's 
because I have things I want to do now, and I would like to make sure 
have a reasonable chance of working a couple of years in the future. And 
yeah, once SVG is in HTML5, and RDFa can work with HTML5, maybe I 
wouldn't mind giving old HTML a try again. Lord knows I'd like to user 
ampersands again.




The point is, people in the real world have to use this stuff. It 
helps them if they have one, generally agreed on approach. As it is, 
folks have to contend with both RDFa and microformats, but at least 
we know these have different purposes.


From my cursory study, I think microdata could subsume many of the use 
cases of both microformats and RDFa. It seems to me that it avoids 
much of what microformats advocates find objectionable, and provides a 
good basis for new microformats; but at the same time it seems it can 
represent a full RDF data model. Thus, I think we have the potential 
to get one solution that works for everyone.


I'm not 100% sure microdata can really achieve this, but I think 
making the attempt is a positive step.



It can't, don't you see?

Microdata will only work in HTML5/XHTML5. XHTML 1.1 and yes, 2.0 will be 
around for years, decades. In addition, XHTML5 already supports RDFa.


Why you think something completely brand new, no vendor support, drummed 
up in a few hours or a day or so is more robust, and a better option 
than a mature spec in wide use, well frankly boggles my mind.


I am impressed with your belief in HTML5.

But
One other detail that it seems not many people have picked up on yet 
is that microdata proposes a DOM API to extract microdata-based info 
from a live document on the client side. In my opinion this is huge 
and has the potential to greatly increase author interest in semantic 
markup.




Not really. Can do this now with RDFa in XHTML. And I don't need any new 
DOM to do it.


The power of semantic markup isn't really seen until you take that 
markup data _outside_ the document. And merge that data with data from 
other documents. Google rich snippets. Yahoo SearchMonkey. Heck, even an
application that manages the data from different subsites of one domain.

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Eduard Pascual

On Thu, May 14, 2009 at 3:54 PM, Philip Taylor excors+wha...@gmail.com wrote:
 [...]
 If we restrict literals to strings [...]
But *why* restrict literals to strings?? Being unable to state that
2009-05-14 is a date makes that value completely useless: it would
only be useful in contexts where a date is expected (basically,
because it is a date), but it can't be used in such contexts because
the tool retrieving the value has no hint about it being a date. Same
is true for integers, prices (a.k.a. decimals plus a currency symbol),
geographic coordinates, iguana descriptions, and so on.
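The problem is easy to see from the consumer's side. A minimal Python sketch, assuming RDF-style typed literals that pair a lexical form with an XML Schema datatype IRI; `interpret` and `PARSERS` are hypothetical names. The same string only becomes a date when a datatype says it is one; without that hint the consumer is left holding text.

```python
from datetime import date
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Datatype IRI -> function turning the lexical form into a native value.
PARSERS = {
    XSD + "date": date.fromisoformat,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
}


def interpret(lexical, datatype=None):
    """Return a native value for a typed literal; untyped literals stay strings."""
    parse = PARSERS.get(datatype)
    return parse(lexical) if parse is not None else lexical


print(repr(interpret("2009-05-14")))                # '2009-05-14' (just text)
print(repr(interpret("2009-05-14", XSD + "date")))  # datetime.date(2009, 5, 14)
```

Prices would need more than `Decimal` (the currency is a second piece of data), which is the kind of structure a string-only literal cannot carry.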

On Thu, May 14, 2009 at 8:25 PM, Maciej Stachowiak m...@apple.com wrote:

 On May 14, 2009, at 5:18 AM, Shelley Powers wrote:

 So much concern about generating RDF, makes one wonder why we didn't just
 implement RDFa...

 If it's possible to produce RDF triples from microdata, and if RDF triples
 of interest can be expressed with microdata, why does it matter if the
 concrete syntax is the same as RDFa? Isn't the important thing about RDF the
 data model, not the surface syntax?
It doesn't matter whether it's one syntax or another. But if a syntax already
exists (RDFa), building a new syntax should be properly justified. As
of now, the only supposed benefit I have heard of for this syntax is
that it avoids CURIEs... yet it replaces them with reversed domains??
Is that a benefit? I have been a Java programmer for some years, and
still find that convention absurd, horrible, and annoying. I'll agree
that CURIEs are ugly, and maybe hard to understand, but reversed
domains are equally ugly and hard to understand.

 (I understand that if the microdata syntax offered no advantages over RDFa,
 then it would be a wasted effort to diverge.
Which are the advantages it offers? I asked about them yesterday, and
no one has answered, so I'm asking again: please, enlighten me on this
because if I see no advantages myself and nobody else tells me about
any advantage, then the only conclusion a rational mind can take is
that there are no advantages. So, that's the position I'm in. I can
easily change my mind if anyone points out some advantage that might
actually help me more than RDFa when trying to add semantics and
metadata to my pages.

 But my impression is that you'd
 object to anything that isn't exactly identical to RDFa, even if it can
 easily be used in the same way.)
Actually, I do object to RDFa itself. Since the very first moment I
saw discussions about it on these lists, I have been trying to
highlight its flaws and to suggest ideas for alternatives.
Now, would you really expect me not to object to what, at least from
my current PoV, is simply worse than RDFa? IMHO, RDFa is just
*passable*, and microdata is too *mediocre* to get a pass. I don't
know about any solution that would be perfect, but I really think that
this community is definitely capable of producing something that is,
at least, *good*.

Of course, these are just my opinions, but I have also said what they
are based on. I'm eager to change my mind if there is a basis for it.

Regards,
Eduard Pascual

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Jonas Sicking

On Thu, May 14, 2009 at 5:00 PM, Jonas Sicking jo...@sicking.cc wrote:
 * Support for specifying a machine-readable value, such as for dates,
 colors, numbers, etc.
 * Support for tabular data.

 Especially the former is very interesting to me. I even wonder if it
 would allow replacing the time element with a standardized
 microformat, such as:

 Christmas is going down on <span item=w3c.time
 itemvalue=12-25-2009>The 25th day of December</span>!

 (Though I'd probably avoid prefixes for 'standardized' item names).

Hmm.. I guess the syntax would be

<span item itemprop=w3c.time propvalue=12-25-2009>

Not very nice I admit.
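A consumer for this proposal would read the attribute when present and fall back to the element's text otherwise. A rough Python sketch; `propvalue` is the attribute name from this message and is not part of the draft, and `ValueExtractor`/`extract_values` are hypothetical. The example value is month-day-year rather than ISO 8601, so the consumer still has to know which format to expect.

```python
from datetime import datetime
from html.parser import HTMLParser


class ValueExtractor(HTMLParser):
    """Maps itemprop names to values, preferring a machine-readable
    propvalue attribute (hypothetical, from the proposal) over text."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._open = None  # (prop, tag, text chunks) while reading element text

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        if prop is None or self._open is not None:
            return
        if a.get("propvalue") is not None:
            self.values[prop] = a["propvalue"]  # machine-readable value wins
        else:
            self._open = (prop, tag, [])        # fall back to the text content

    def handle_endtag(self, tag):
        # Simplification: assumes no same-named element nests inside.
        if self._open is not None and tag == self._open[1]:
            prop, _, chunks = self._open
            self.values[prop] = "".join(chunks).strip()
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)


def extract_values(html):
    parser = ValueExtractor()
    parser.feed(html)
    parser.close()
    return parser.values


doc = ('Christmas is going down on <span item itemprop=w3c.time '
       'propvalue=12-25-2009>The 25th day of December</span>!')
raw = extract_values(doc)["w3c.time"]
# The example value is month-day-year, so that format has to be known.
christmas = datetime.strptime(raw, "%m-%d-%Y").date()
```

The human-readable text ("The 25th day of December") is ignored when the attribute is present, which is the point of the proposal: authors keep free-form prose while tools get a value they can parse.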

/ Jonas

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Philip Taylor

On Thu, May 14, 2009 at 2:54 PM, Philip Taylor excors+wha...@gmail.com wrote:
 [...]
  <urn:subject> <urn:predicate> _:X .
 [...]
  <div item>
    <link itemprop=about href=urn:subject>
    <meta itemprop=urn:predicate item id=X>
  </div>
 [...]
 So, I can't see any limits on expressivity other than that literals
 must be strings.

Hmm, I think I'm wrong here. 'id' has to be unique, which means this
pattern won't work if _:X is the object for triples with two different
subjects.

Additionally, there must be a chain from every blank node back to <>
via http://www.w3.org/1999/xhtml/vocab#item, else it won't get
serialised (since serialisation starts from top-level items and
recurses down the correspondence chains). As a consequence of this and
the previous point, it is impossible to express cycles (e.g. _:X
urn:predicate _:X, or any longer cycles) unless the cycle contains
<>.

So there are these two restrictions on the shapes of expressible RDF
graphs. (I can't think of any other restrictions, though...)
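Both restrictions can be checked mechanically. A minimal Python sketch of one reading of them, assuming blank nodes are strings beginning with `_:`; `expressibility_problems` is a hypothetical helper, not part of any parser mentioned in this thread. It flags blank nodes that are the object of triples with more than one distinct subject (the unique-`id` problem), and blank nodes whose chain of referring subjects loops among blank nodes without reaching a URI. A blank node that nothing points at is treated as fine, on the assumption (not stated above) that it can be written as its own top-level item.

```python
from collections import defaultdict


def is_blank(term):
    # Assumption of this sketch: blank nodes are strings starting with "_:".
    return isinstance(term, str) and term.startswith("_:")


def expressibility_problems(triples):
    """Return reasons a list of (s, p, o) triples hits either restriction."""
    parents = defaultdict(set)  # blank node -> distinct subjects pointing at it
    blanks = set()
    for s, _p, o in triples:
        for term in (s, o):
            if is_blank(term):
                blanks.add(term)
        if is_blank(o):
            parents[o].add(s)

    problems = []
    for b in sorted(blanks):
        # Restriction 1: the id=X pattern allows only one referring subject.
        if len(parents[b]) > 1:
            problems.append(f"{b} is the object of triples with "
                            f"{len(parents[b])} different subjects")
            continue
        # Restriction 2: follow the unique parent upwards. Stopping at a URI
        # (or at a parentless top-level item) is fine; revisiting a node
        # means the chain loops among blank nodes and is never reached.
        seen, node = set(), b
        while is_blank(node) and len(parents[node]) == 1:
            if node in seen:
                problems.append(f"{b} depends on a cycle of blank nodes")
                break
            seen.add(node)
            (node,) = parents[node]
    return problems
```

An empty result means the graph avoids both problems described above; it does not prove there are no others.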

-- 
Philip Taylor
exc...@gmail.com

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-14 Thread Maciej Stachowiak



On May 14, 2009, at 1:30 PM, Shelley Powers wrote:

So, if I'm pushing for RDFa, it's not because I want to win. It's  
because I have things I want to do now, and I would like to make
sure they have a reasonable chance of working a couple of years in the
future. And yeah, once SVG is in HTML5, and RDFa can work with
HTML5, maybe I wouldn't mind giving old HTML a try again. Lord knows
I'd like to use ampersands again.


It sounds like your argument comes down to this: you have personally  
invested in RDFa, therefore having a competing technology is bad,  
regardless of the technical merits. I don't mean to parody here - I am  
somewhat sympathetic to this line of argument. Often pragmatic  
concerns mean that an incremental improvement just isn't worth the  
cost of switching (for example HTML vs. XHTML). My personal judgment
is that we're not past the point of no return on data embedding.  
There's microformats, RDFa, and then dozens of other serializations of  
RDF (some of which you cited). This doesn't seem like a space on the  
verge of picking a single winner, and the players seem willing to  
experiment with different options.






The point is, people in the real world have to use this stuff. It  
helps them if they have one, generally agreed on approach. As it  
is, folks have to contend with both RDFa and microformats, but at  
least we know these have different purposes.


From my cursory study, I think microdata could subsume many of the  
use cases of both microformats and RDFa. It seems to me that it  
avoids much of what microformats advocates find objectionable, and  
provides a good basis for new microformats; but at the same time it  
seems it can represent a full RDF data model. Thus, I think we have  
the potential to get one solution that works for everyone.


I'm not 100% sure microdata can really achieve this, but I think  
making the attempt is a positive step.



It can't, don't you see?

Microdata will only work in HTML5/XHTML5. XHTML 1.1 and yes, 2.0  
will be around for years, decades. In addition, XHTML5 already  
supports RDFa.


Supporting XHTML 1.1 has about 0.001% as much value as
supporting text/html. XHTML 2.0 is completely irrelevant to the Web,
and looks on track to remain so. So I don't find this point very  
persuasive.


Why you think something completely brand new, no vendor support,  
drummed up in a few hours or a day or so is more robust, and a  
better option than a mature spec in wide use, well frankly boggles  
my mind.


I haven't evaluated it enough to know for sure (as I said). I do think  
avoiding CURIEs is extremely valuable from the point of view of sane  
text/html semantics and ease of authoring; and RDF experts seem to  
think it works fine for representing RDF data models. So tentatively,  
I don't see any gaping holes. If you see a technical problem, and not  
just potential competition for the technology you've invested in, then  
you should definitely cite it.




I am impressed with your belief in HTML5.

But
One other detail that it seems not many people have picked up on  
yet is that microdata proposes a DOM API to extract microdata-based  
info from a live document on the client side. In my opinion this is  
huge and has the potential to greatly increase author interest in  
semantic markup.




Not really. Can do this now with RDFa in XHTML. And I don't need any  
new DOM to do it.


The power of semantic markup isn't really seen until you take that  
markup data _outside_ the document. And merge that data with data  
from other documents. Google rich snippets. Yahoo SearchMonkey.
Heck, even an application that manages the data from different  
subsites of one domain.


I respectfully disagree. An API to do things client-side that doesn't  
require an external library is extremely powerful, because it lets  
content authors easily make use of the very same semantic markup that  
they are vending for third parties, so they have more incentive to use  
it and get it right.




Now, it may be that microdata will ultimately fail, either because  
it is outcompeted by RDFa, or because not enough people care about  
semantic markup, or whatever. But at least for now, I don't see a  
reason to strangle it in the cradle.




Outcompeted...wow, what a way to think of it. Sorry, but competition  
has no place in spec work.


With due respect, you're the one who brought competition into this  
discussion by saying there can only be one winner. I don't really  
think that's true, in this case.


Regards,
Maciej

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-13 Thread Toby Inkster

Leif Halvard Silli wrote:

 Hear hear. Let's call it Cascading RDF Sheets.

http://buzzword.org.uk/2008/rdf-ease/spec

http://buzzword.org.uk/2008/rdf-ease/reactions

I have actually implemented it. It works. RDFa is better though.

-Toby

[whatwg] Annotating structured data that HTML has no semantics for

2009-05-13 Thread Leif Halvard Silli


Toby Inkster on Wed May 13 02:19:17 PDT 2009:

Leif Halvard Silli wrote:

 Hear hear. Let's call it Cascading RDF Sheets.

http://buzzword.org.uk/2008/rdf-ease/spec

http://buzzword.org.uk/2008/rdf-ease/reactions

I have actually implemented it. It works.


Oh! Thanks for sharing.


RDFa is better though.


What does 'better' mean in this context? Why and how? Because it is 
easier to process? But EASE seems more compatible with microformats, and 
is better in that sense.


I read all the reactions you pointed to. Some made the claim that EASE
would move semantics out of the HTML file, and that microformats were
better as they keep the semantics inside the file. But I of course agree
with you that EASE just underlines/outlines the semantics already in the file.


The thing that probably is most different from (most) microformats (and 
RDFa?) is that EASE can apply semantics even to bare naked elements 
without any @class, @id or other attributes. However, EASE does not
/require/ one to use it like that. One may choose to create an entirely
class-based EASE document.


It would even be possible to use EASE together with Ian's microdata, 
don't you think?


From the EASE draft:
All properties in RDF-EASE begin with the string -rdf-, as per 
§4.1.2.1 Vendor-specific extensions in [CSS21]. This allows RDF-EASE 
and CSS to be safely mixed in one file, [...]
I wonder why you think it is so important to be able to mix CSS and 
EASE. It seems better to separate the two completely.


From the EASE draft:
The algorithm assumes that the document is held in a DOM-compatible 
representation,
Side note: meta is proposed as part of microdata. But both Firefox and 
Safari will, in the DOM, render meta as part of head regardless.

--
leif halvard silli

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-13 Thread Giovanni Gentili

 In terms of prefixes, I find that 'com.foaf-project.name' is a lot more
 difficult to write than 'foaf:name'. Reverse domain names are
 non-intuitive for non-programmer types (or non-Java programmers).

 If we can come up with a way of using the string foaf:name without
 having to declare foaf in each document, I'm totally in agreement. I've
 considered maybe registering the foaf URL scheme, or using some other
 punctuation character and having people register prefixes, but I don't
 know what punctuation character to use (':' and '.' are both taken).

Put some predefined prefixes for @itemprop in HTML5:

dc = http://purl.org/dc/terms/
foaf = http://xmlns.com/foaf/0.1/
vcard = http://www.w3.org/2001/vcard-rdf/3.0#
owl = http://www.w3.org/2002/07/owl#
rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs = http://www.w3.org/2000/01/rdf-schema#
sioc = http://rdfs.org/sioc/ns#
skos = http://www.w3.org/2004/02/skos/core#
xsd = http://www.w3.org/2001/XMLSchema#
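A minimal sketch of how a consumer might expand itemprop values against such a built-in table, assuming the list above were hard-coded into HTML5 (it is not; the table and function names are hypothetical). Values whose prefix is not in the table are returned unchanged, so reverse-domain names like com.example.price would keep working:

```python
# Hypothetical built-in prefix table mirroring the list proposed above.
# Nothing like this exists in HTML5; it only illustrates the idea.
PREDEFINED_PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "sioc": "http://rdfs.org/sioc/ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_itemprop(value: str) -> str:
    """Expand 'prefix:local' using the predefined table.

    Anything whose text before the first ':' is not a known prefix
    (including full URLs, whose "prefix" would be e.g. 'http') is
    returned as-is.
    """
    prefix, sep, local = value.partition(":")
    if sep and prefix in PREDEFINED_PREFIXES:
        return PREDEFINED_PREFIXES[prefix] + local
    return value


print(expand_itemprop("foaf:name"))          # http://xmlns.com/foaf/0.1/name
print(expand_itemprop("com.example.price"))  # com.example.price
```

The ambiguity in the quoted concern remains: ':' is also the URL scheme separator, so a predefined prefix that collides with a scheme name could not be told apart from a URL by this function.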

Also, instead of @item, @itemprop and @subject, it would be better to use
@item, @prop and @subj, or @rdf-typeof, @rdf-property and @rdf-about
(and @rdf-rel).
-- 
Giovanni Gentili

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-13 Thread Eduard Pascual

Let me start with some apologies:

On Tue, May 12, 2009 at 12:55 PM, Eduard Pascual herenva...@gmail.com wrote:
 [...]
 Seeing that solutions are already being discussed
 here, I'm trying to put the ideas into a human-readable document that
 I plan to submit to this list either late today or early tomorrow for
 your review and consideration.
Oops, I'm already late with that. I had some unexpected commitments and
had no time to finish that doc. I still hope, however, to publish it today.

On Tue, May 12, 2009 at 12:55 PM, Eduard Pascual herenva...@gmail.com wrote:
 [...]
 Third issue: also a flaw inherited from RDFa, it can be summarized as
 completely ignoring the requirement I submitted to this list on April
 28th, in reply to Ian asking us to review the use cases [1].
 [...]
 [1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019487.html

On Tue, May 12, 2009 at 7:30 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 Well, he didn't quite *ignore* it - he did explicitly call out that
 requirement to say that his solution didn't solve it at all.

I missed that part of Ian's post, sorry. I really read it from top to bottom,
but it was quite long. I guess I should have re-read it.
Now, after some re-reading, I have noticed a point I should reply to:

On Sun, May 10, 2009 at 12:32 PM, Ian Hickson i...@hixie.ch wrote:
 [...]
 * Any additional markup or data used to allow the machine to understand
   the actual information shouldn't be redundantly repeated (e.g. on each
   cell of a table, when setting it on the column is possible).

 This isn't met at all with the current proposal. Unfortunately the only
 general solutions I could come up with that would allow this were
 selector-based, and in practice authors are still having trouble
 understanding how to use Selectors even with CSS.

First, I'd like to ask Ian for a clarification: what do you mean by
"authors are still having trouble understanding how to use Selectors"?
If you mean that they have trouble when trying to select something like
"the second cell of the first row that has a 'foo' attribute different
from 'bar' within tables that have four or more rows", or even more
obscure stuff, then I agree: most authors will definitely have trouble
dealing with such complex cases, and I bet many always will. However, if
you mean that authors can't deal with simple class, id, and/or
child/descendant selectors, then I think you are seriously
underestimating authors.
On a side note, I'd like to mention in advance that my idea, despite
being Selector-based (actually, I should say CSS-based: it reuses much
more than selectors), wouldn't require authors to use selectors at all,
at least for the cases that can currently be solved by RDFa (or, FWIW,
with the current Microdata approach in the spec), the same way a page
can be completely styled with CSS without using selectors, via the
style attribute.

On Tue, May 12, 2009 at 1:59 PM, Philip Taylor excors+wha...@gmail.com wrote:
 On Tue, May 12, 2009 at 11:55 AM, Eduard Pascual herenva...@gmail.com wrote:
 [...]
 (at least for now: many RDFa-aware agents vs. zero HTML5's
 microdata -aware agents)

 HTML5 microdata parsers seem pretty trivial to write -
 http://philip.html5.org/demos/microdata/demo.html is only about two
 hundred lines to read all the data and to produce JSON and
 N3-serialised RDF. It shouldn't take more than a few hours to produce
 a similar library for other languages, including the time taken to
 read the spec, so the implementation cost for generic parser libraries
 doesn't seem like a significant problem.

Actually, I was thinking about the cost of deploying implementations,
rather than writing them, since RDFa consumers are already out there and
working. This, however, strays a bit from the original idea: it's not
really a matter of how big the cost is on its own, but of what you get
for that cost. This is probably my own fault, but I still fail to see
what Ian's suggestion offers that RDFa doesn't; so my impression is that
these costs, even if they are small, are buying nothing, so they are not
worth it. If someone is willing to highlight what makes this proposal
worth the costs (i.e., what makes it better than RDFa), I'm willing to
listen.

On Tue, May 12, 2009 at 2:30 PM, Shelley Powers
shell...@burningbird.net wrote:
 [...] Eduard, looking forward to seeing your own interpretation
 of the best metadata annotation.

Hey, who said my proposal will be, or try to be, the best one?
I definitely didn't.
Actually, the reason to submit it here will be to have other people look
at it and figure out ways to improve it (and I'm quite sure it can be
improved; I'm human after all).
Please let me explicitly state that I don't claim that idea to be the
best solution. Since neither RDFa, nor Microformats, nor Ian's proposal
could solve my needs, my goal was to build a solution that solves both
my needs and those solved by other approaches, as a proof that

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Eduard Pascual

I don't really like to be harsh, but I have some criticism of this,
and it's going to be quite blunt. However, my goal in pointing out what
I consider such big mistakes is to help HTML5 become as good as it
could be.

First issue: it solves a (major) subset of what RDFa would solve.
However, it has been taken as a requirement to avoid
clashes/incompatibilities with RDFa. In other words, as things stand,
authors will face two options: either use RDFa in HTML5, which would
forsake validation but actually work; or take a less powerful, less
supported option (at least for now: many RDFa-aware agents vs. zero
HTML5-microdata-aware agents) that validates but provides no pragmatic
advantages.
IMO, an approach that forces authors to choose between
validity/conformance that doesn't *yet* work and invalid solutions
that actually work is a horrible idea: it encourages authors to
forsake validity if they want things to work.
Wouldn't the RDFa + @prefix solution suggested many times work better
and require less effort (for spec writers, for implementors, and for
content authors)? Keep in mind that I don't think RDFa + @prefix is
the solution we need; I'm just trying to point out that the current
approach is even worse than that.

Second issue: as the decaffeinated RDFa it is, the HTML5 Microdata
approach tends to fail where RDFa itself fails. It's nice that, thanks
to the time element, the problem with trying to reuse human-readable
dates as machine-readable is dodged; but there are other cases where
separate values might be needed: for example using a street address
for the human-readable representation of a location and the exact
geographic coordinates as the machine-readable (since not all
micro-data parsers can rely on Google Maps's database to resolve
street addresses, you know); or using a colored name (such as "lime
green" displayed in a lime green color) as the human-readable
representation of a color, and the hexcode (like #00FF00) as the
machine-readable representation. These are just the cases off the top
of my head, and this can't be considered in any way a complete list.
While *favoring* the reuse of human-readable values as the
machine-readable ones is appropriate, because it's by far the most
common case, *forcing* that reuse is quite a bad idea, because it is
*not* the *only* case.

Third issue: also a flaw inherited from RDFa, it can be summarized as
completely ignoring the requirement I submitted to this list on April
28th, in reply to Ian asking us to review the use cases [1]. I'll try
to illustrate it with an example, inspired by the original use case:
Let's say someone's marking up a collection of iguanas (or cats, or
even CDs; it doesn't really make a difference when illustrating this
issue), making a page for each iguana (or whatever) with all the
details for it; and then making an index page listing maybe 20
iguanas with their name, picture, and link to the corresponding page.
Adding micro-data to that index, either with RDFa or with Ian's
microdata proposal, would involve stating 20 times in the markup
something like "this is the iguana's picture; this is the iguana's
name; and this is the iguana's URL". It would be preferable to be able
to state something like "each row (tr) in the table describes an
iguana: the imgs are each iguana's picture, the contents of the
a's are the names, and the @href of the a's are the URLs to their
main pages" just once. If I only need to state the table headings once
for the users to understand this concept, why should a micro-data
consumer require me to state it 20 times, once for each row?
Please note how painful such a page would be to maintain: any
mistake in the micro-data markup would generate invalid data and
require a manual harvest of the data on the page, thus killing the
whole purpose of micro-data. And repeating something 20 (or more)
times brings a lot of chances to introduce a typo, miss an
attribute, or make some other minor but devastating mistake.
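The "state it once" idea can be sketched with Python's standard html.parser: one mapping declared for the whole table and applied to every row. The mapping format, the RowHarvester class and the sample page are hypothetical illustrations, not RDFa, microdata, or the forthcoming proposal mentioned in this message:

```python
from html.parser import HTMLParser

# Hypothetical "say it once" mapping: every <tr> is one iguana; the <img>
# src is its picture, the <a> href its URL, and the <a> text its name.
# An attribute of None means "use the element's text content".
ROW_MAPPING = {
    ("img", "src"): "picture",
    ("a", "href"): "url",
    ("a", None): "name",
}


class RowHarvester(HTMLParser):
    """Apply ROW_MAPPING to every table row, producing one dict per row."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.current = None    # properties of the <tr> being read
        self.text_prop = None  # property currently collecting text
        self.text_tag = None   # element whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tr":
            self.current = {}
            return
        if self.current is None:
            return  # outside any row
        for (elem, attr), prop in ROW_MAPPING.items():
            if elem != tag:
                continue
            if attr is None:
                self.current[prop] = ""
                self.text_prop, self.text_tag = prop, tag
            elif attr in attrs:
                self.current[prop] = attrs[attr]

    def handle_data(self, data):
        if self.current is not None and self.text_prop is not None:
            self.current[self.text_prop] += data

    def handle_endtag(self, tag):
        if tag == self.text_tag:
            self.text_prop = self.text_tag = None
        elif tag == "tr" and self.current is not None:
            self.items.append(self.current)
            self.current = None


index_page = """
<table>
  <tr><td><img src="/img/iggy.jpg" alt=""></td><td><a href="/iguanas/iggy">Iggy</a></td></tr>
  <tr><td><img src="/img/spike.jpg" alt=""></td><td><a href="/iguanas/spike">Spike</a></td></tr>
</table>
"""

harvester = RowHarvester()
harvester.feed(index_page)
harvester.close()
for item in harvester.items:
    print(item)
```

With the mapping stated once, a typo in it affects every row in the same way, which is arguably more likely to be noticed than one bad attribute among twenty repeated ones.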

Last, but not least, I'm not sure it was wise to start defining a
solution while some of the requirements still seem to be under
discussion. Actually, I had a possible solution in mind, but I was
holding it back while reviewing it against the requirements being
discussed, so I could adapt it to any requirements I might have
initially missed. Seeing that solutions are already being discussed
here, I'm trying to put the ideas into a human-readable document that
I plan to submit to this list either late today or early tomorrow for
your review and consideration.


Regards,
Eduard Pascual

[1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-April/019487.html

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Philip Taylor

On Tue, May 12, 2009 at 11:55 AM, Eduard Pascual herenva...@gmail.com wrote:
 [...]
 (at least for now: many RDFa-aware agents vs. zero HTML5's
 microdata -aware agents)

HTML5 microdata parsers seem pretty trivial to write -
http://philip.html5.org/demos/microdata/demo.html is only about two
hundred lines to read all the data and to produce JSON and
N3-serialised RDF. It shouldn't take more than a few hours to produce
a similar library for other languages, including the time taken to
read the spec, so the implementation cost for generic parser libraries
doesn't seem like a significant problem.

The cost of integration with backend RDF-based systems seems more
significant - hopefully you could simply replace the frontend RDFa
parser with a microdata parser and generate the same RDF triples and
it would all work fine, but I don't know whether that's true in
practice (because maybe the microdata syntax is too restrictive to
represent the vocabularies people want to use, and so they'd have to
go to lots of extra effort to create a new vocabulary).

 [...] there are other cases where
 separate values might be needed: for example using a street address
 for the human-readable representation of a location and the exact
 geographic coordinates as the machine-readable (since not all
 micro-data parsers can rely on Google Maps's database to resolve
 street addresses, you know); or using a colored name (such as lime
 green displayed on lime green color) as the human-readable
 representation of a color, and the hexcode (like #00FF00) as the
 machine-readable representation.

You could replace
  <span itemprop="color">lime green</span>
  <span itemprop="location">1 High Street</span>
with
  <meta itemprop="color" content="#00FF00"><span>lime green</span>
  <meta itemprop="location.lat" content="56.78">
  <meta itemprop="location.long" content="-12.34"><span>1 High Street</span>
to get the desired output. (Not particularly elegant syntax, though.)
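The effect of this workaround on a consumer can be sketched with a simplified collector built on Python's standard html.parser. It assumes one rule only: a meta element contributes its content attribute and any other element with itemprop contributes its text content. URL-valued elements and nested items are deliberately left out, and the class and function names are hypothetical:

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "img", "br", "hr", "input"}


class PropertyCollector(HTMLParser):
    """Collect (itemprop, value) pairs from a flat HTML fragment.

    Simplified sketch: <meta> contributes @content; any other element
    carrying itemprop contributes its text content. Item scoping and
    URL-valued elements (<a>, <img>, <link>) are ignored.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # stack of [tag, itemprop or None, text chunks]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if tag == "meta" and prop is not None:
            # The machine-readable value lives in @content, not in text.
            self.pairs.append((prop, attrs.get("content") or ""))
        if tag not in VOID:
            self._open.append([tag, prop, []])

    def handle_data(self, data):
        for entry in self._open:  # text counts toward every open ancestor
            entry[2].append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self._open):
            return  # stray end tag (or a void element): ignore it
        while True:
            open_tag, prop, chunks = self._open.pop()
            if prop is not None:
                self.pairs.append((prop, "".join(chunks)))
            if open_tag == tag:
                break


def collect(fragment):
    parser = PropertyCollector()
    parser.feed(fragment)
    parser.close()
    return parser.pairs


before = ('<span itemprop="color">lime green</span>'
          '<span itemprop="location">1 High Street</span>')
after = ('<meta itemprop="color" content="#00FF00"><span>lime green</span>'
         '<meta itemprop="location.lat" content="56.78">'
         '<meta itemprop="location.long" content="-12.34">'
         '<span>1 High Street</span>')
print(collect(before))
print(collect(after))
```

The human-readable spans still render, but under this rule only the meta values reach the consumer, which is the separation of human and machine values asked for above.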

-- 
Philip Taylor
exc...@gmail.com

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Shelley Powers


Philip Taylor wrote:

On Tue, May 12, 2009 at 11:55 AM, Eduard Pascual herenva...@gmail.com wrote:
  

[...]
(at least for now: many RDFa-aware agents vs. zero HTML5's
microdata -aware agents)



HTML5 microdata parsers seem pretty trivial to write -
http://philip.html5.org/demos/microdata/demo.html is only about two
hundred lines to read all the data and to produce JSON and
N3-serialised RDF. It shouldn't take more than a few hours to produce
a similar library for other languages, including the time taken to
read the spec, so the implementation cost for generic parser libraries
doesn't seem like a significant problem.
  


Writing something that will produce triples may be easy, but what's 
important is that you're producing an RDF model.


Philip, I've been looking at your application, and you're not producing 
the same model for Ian's microdata proposal that is produced using 
either eRDF or RDFa. I'll have more on this later.

The cost of integration with backend RDF-based systems seems more
significant - hopefully you could simply replace the frontend RDFa
parser with a microdata parser and generate the same RDF triples and
it would all work fine, but I don't know whether that's true in
practice (because maybe the microdata syntax is too restrictive to
represent the vocabularies people want to use, and so they'd have to
go to lots of extra effort to create a new vocabulary).

  

[...] there are other cases where
separate values might be needed: for example using a street address
for the human-readable representation of a location and the exact
geographic coordinates as the machine-readable (since not all
micro-data parsers can rely on Google Maps's database to resolve
street addresses, you know); or using a colored name (such as lime
green displayed on lime green color) as the human-readable
representation of a color, and the hexcode (like #00FF00) as the
machine-readable representation.



You could replace
  <span itemprop="color">lime green</span>
  <span itemprop="location">1 High Street</span>
with
  <meta itemprop="color" content="#00FF00"><span>lime green</span>
  <meta itemprop="location.lat" content="56.78">
  <meta itemprop="location.long" content="-12.34"><span>1 High Street</span>
to get the desired output. (Not particularly elegant syntax, though.)

  


It's funny, but oddly enough, this discussion reminds me of when I 
started at Boeing, right after college. I started just when the great 
debate between SQL and QUEL was ending, in SQL's favor. Most folks still 
feel that QUEL was the superior option, but SQL won out in the end 
because it had widespread use, and was supported by more of the 
(powerful) database companies, and hence the companies using the databases.


The same could be said of Betamax versus VHS, and even the recent HD DVD 
versus Blu-ray debate: we can get caught up in issues of superiority and 
argue the fine points of (mostly) obscure markup until the cows come 
home, but at some point you have to pick a standard to get behind, or no 
one will have any confidence in _any_ of the options being proposed, and 
the concept underlying the competing technologies (or standards) is 
hindered, perhaps for years.


Sorry, I digress. Eduard, looking forward to seeing your own 
interpretation of the best metadata annotation.


Shelley

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Ian Hickson

On Tue, 12 May 2009, Peter Mika wrote:

 Just a quick comment on:
 
   it uses prefixes, which most authors simply do not understand, and
   which many implementors end up getting wrong (e.g. SearchMonkey
   hard-coded certain prefixes in its first implementation, Google's
   handling of RDF blocks for license declarations is all done with
 
 Actually, the problem we see is not so much the prefixes themselves but rather
 the cumbersome way of specifying namespace prefix definitions using xmlns. So
 I think it would make sense to have some mechanism for referencing bundles of
 namespace prefixes ('profiles') or namespace registries, in order to ease
 authoring.
 
 In terms of prefixes, I find that 'com.foaf-project.name' is a lot more 
 difficult to write than 'foaf:name'. Reverse domain names are 
 non-intuitive for non-programmer types (or non-Java programmers).

If we can come up with a way of using the string foaf:name without 
having to declare foaf in each document, I'm totally in agreement. I've 
considered maybe registering the foaf URL scheme, or using some other 
punctuation character and having people register prefixes, but I don't 
know what punctuation character to use (':' and '.' are both taken).

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Shelley Powers


Ian Hickson wrote:

On Tue, 12 May 2009, Peter Mika wrote:
  

Just a quick comment on:

  it uses prefixes, which most authors simply do not understand, and
  which many implementors end up getting wrong (e.g. SearchMonkey
  hard-coded certain prefixes in its first implementation, Google's
  handling of RDF blocks for license declarations is all done with

Actually, the problem we see is not so much the prefixes themselves but rather
the cumbersome way of specifying namespace prefix definitions using xmlns. So
I think it would make sense to have some mechanism for referencing bundles of
namespace prefixes ('profiles') or namespace registries, in order to ease
authoring.

In terms of prefixes, I find that 'com.foaf-project.name' is a lot more 
difficult to write than 'foaf:name'. Reverse domain names are 
non-intuitive for non-programmer types (or non-Java programmers).



If we can come up with a way of using the string foaf:name without 
having to declare foaf in each document, I'm totally in agreement. I've 
considered maybe registering the foaf URL scheme, or using some other 
punctuation character and having people register prefixes, but I don't 
know what punctuation character to use (':' and '.' are both taken).


  
But then we would lose the extensibility, which is the power behind all 
of this.


If I remember correctly, Henri had an issue with the DOM when it came to 
support of namespaces in XHTML, and not in HTML, which was the reason 
that @prefix or something along those lines was proposed. There was quite 
positive progress in this regard, too. I don't know what happened to 
that progress.


But regardless, the majority of people will include metadata markup by 
installing a plug-in or module, and making a couple of choices. And if 
you put together a good ten-minute tutorial for the average developer, 
they'll have no problem with foaf:name. Training and clarity of 
communication are much more important than form; that has always been 
true with technology.


The examples you come up with just don't justify discarding 
consideration of a capability that just started getting incorporated 
into Google search. I would say if your fellow Google developers could 
understand how this all works, there is hope for others.


Shelley

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Sam Ruby

On Tue, May 12, 2009 at 4:34 PM, Shelley Powers
shell...@burningbird.net wrote:

 I
 would say if your fellow Google developers could understand how this all
 works, there is hope for others.

if

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html

 Shelley

- Sam Ruby

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Shelley Powers


Sam Ruby wrote:

On Tue, May 12, 2009 at 4:34 PM, Shelley Powers
shell...@burningbird.net wrote:
  

I
would say if your fellow Google developers could understand how this all
works, there is hope for others.



if

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html

  

- Sam Ruby

  
Ah heck, I've made mistakes with vocabularies too. That's why you ask 
for feedback. Unfortunately, asking for feedback isn't an option when 
you're creating secret stuff.


I could have wished Google used FOAF or DC, too, but it's a start.

Shelley

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Philip Taylor

On Tue, May 12, 2009 at 10:21 PM, Sam Ruby ru...@intertwingly.net wrote:
 On Tue, May 12, 2009 at 4:34 PM, Shelley Powers
 shell...@burningbird.net wrote:

 I
 would say if your fellow Google developers could understand how this all
 works, there is hope for others.

 if

 http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0064.html

Also: The instructions at
http://google.com/support/webmasters/bin/answer.py?answer=146898 (and
related pages) alternate between
 xmlns:v="http://rdf.data-vocabulary.org"
and
 xmlns:v="http://rdf.data-vocabulary.org/"
seemingly at random.

(The first means that property="v:name" abbreviates the bogus URI
"http://rdf.data-vocabulary.orgname", if I understand correctly. The
second means it's "http://rdf.data-vocabulary.org/name", which is a
404. Perhaps they meant xmlns:v="http://rdf.data-vocabulary.org/#",
which would point at the relevant bit of the vocabulary RDF file?
Hopefully people won't actually deploy content using the inconsistent
namespaces before the documentation is fixed...)
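The difference between the three declarations is plain string concatenation (an RDFa CURIE resolves to the prefix mapping followed by the local part), which a short sketch makes visible. The suspicious_namespace check is a hypothetical lint heuristic, not part of RDFa:

```python
def expand_curie(namespace: str, local: str) -> str:
    # RDFa resolves a prefixed name by concatenating the namespace
    # mapping and the local part; no separator is inserted.
    return namespace + local


def suspicious_namespace(namespace: str) -> bool:
    # Hypothetical lint heuristic: vocabulary namespaces normally end in
    # '/' or '#'; anything else glues the local name onto the host or
    # onto the last path segment.
    return not namespace.endswith(("/", "#"))


for ns in ("http://rdf.data-vocabulary.org",
           "http://rdf.data-vocabulary.org/",
           "http://rdf.data-vocabulary.org/#"):
    flag = "  (suspicious)" if suspicious_namespace(ns) else ""
    print(expand_curie(ns, "name") + flag)
```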

(They've also got a <span><strong property="v:name"> and <span><span
property="v:locality"> and some unclosed <a>s, so it seems the
documentation writers are having difficulty even writing plain HTML.)

-- 
Philip Taylor
exc...@gmail.com

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-12 Thread Leif Halvard Silli


Tab Atkins Jr. on Tue, 12 May 2009 12:30:27 -0500:


On Tue, May 12, 2009 at 5:55 AM, Eduard Pascual:
  

 [...] It would be preferable to be able
 to state something like each (row) tr in the table describes an
 iguana: the imgs are each iguana's picture, the contents of the
 a's are the names, and the @href of the a's are the URLs to their
 main pages just once.

Indeed.

 If I only need to state the table headings once
 for the users to understand this concept, why should a micro-data
 consumer require me to state it 20 times, once for each row?
 Please note how such a page would be quite painful to maintain: any
 mistake in the micro-data mark-up would generate invalid data and
 require a manual harvest of the data on the page, thus killing the
 whole purpose of micro-data.


Indeed. (But of course, for copy-paste safety, the format has to be 
wordy and repetitive.)



 And repeating something 20 (or more)
 times brings a lot of chances to put a typo in, or to miss an
 attribute, or any minor but devastating mistake like these.



Well, he didn't quite *ignore* it - he did explicitly call out that
requirement to say that his solution didn't solve it at all.  He also
laid down the reason why - it's unlikely that any reasonable simple
in-place metadata solution would allow you to do that.  You either
need significant complexity, some reliance on language semantics (like
tables can rely on their headers), or moving to out-of-band
specification, likely through a Selectors-based model.
  
Indeed. And Ian's argument against a selector-based model (the claim 
that authors have problems understanding selectors) was one of the least 
convincing arguments he made, I think. CSS and selectors appear to be 
among the best understood technologies of the web.

The last is likely the best solution for that, and is even easier to
implement within Ian's simplified proposal.  I don't see a good reason
why that can't advance on a separate track, as (being out-of-band) it
doesn't require changes to HTML to be usable.

I floated a basic proposal for Cascading RDF[1] several months ago,
and someone else (I think Eduard?  I'd have to check my archives) did
something very similar.

[1]: http://www.xanthir.com/rdfa-vs-crdf.php
  


Hear hear. Let's call it Cascading RDF Sheets. It could be used for 
the following purposes:


1. The IRI of the Cascading RDF Sheet could serve the role of profile URI;
2. The Cascading RDF Sheet itself could serve the role of a profile 
document; (Finally we could get some kind of registered profile format.)
3. Just as CSS sheets are today, a cRDFsheet could be used as an authoring 
aid when authoring with a microformat. HTML editing programs could 
offer the elements + classes in the Cascading RDF Sheet to authors, the 
same way that some editors today use the selectors in stylesheets as 
a vocabulary repository for the current file or project. CSS selectors 
are already a well-known format. (One may, of course, already use a 
CSS style sheet for this, kind of. But this soon becomes clumsy. Better 
to separate styling from semantics and structure.)


In fact, I myself began looking into creating something along these 
lines ... Though rather than a Cascading RDF Sheet, I looked into 
creating a Profile Style Sheet which could be used to define a machine 
readable microformat profile. My motivation for doing this was the 
authoring side of things, as I have been using a text editor which more 
or less uses CSS selectors the same way. (Instead of only offering me 
<p>, it also offers me <p class="a">, etc.) Ian's proposal does not give 
much thought to the authoring side, I feel, except for the more casual 
author. For authors, it is helpful to have a recipe document and to 
avoid repetition and data rot, as you mentioned in another message.


The inner logic of Ian's microdata format is easy to grasp; that is a 
good side of the proposal, and it could help it get used. But when it 
comes to authors' and author groups' ability to define their own, 
decentralised semantics etc., a decent profile format, which could be 
easily and simply integrated with authoring tools, seems just as 
important an issue as a super simple microdata format.


The microformats.org community does not really have a machine-parsable 
profile format. If there were such a format, I believe we would see 
more decentralized microformats.

--
leif halvard silli

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-11 Thread Simon Pieters


On Sun, 10 May 2009 12:32:34 +0200, Ian Hickson i...@hixie.ch wrote:


   Page 3:
   <h2>My Cats</h2>
   <dl>
    <dt>Schr&ouml;dinger
    <dd item="com.damowmow.cat">
     <meta property="com.damowmow.name" content="Schr&ouml;dinger">
     <meta property="com.damowmow.age" content="9">
     <p property="com.damowmow.desc">Orange male.
    <dt>Erwin
    <dd item="com.damowmow.cat">
     <meta property="com.damowmow.name" content="Lord Erwin">
     <meta property="com.damowmow.age" content="3">
     <p property="com.damowmow.desc">Siamese color-point.
     <img property="com.damowmow.img" alt="" src="/images/erwin.jpeg">
   </dl>


Given the microdata solution and this example, there is now a reason other than styling to 
introduce di, since here you duplicate the dt information in meta.

  <dl>
   <di item="com.damowmow.cat">
    <dt property="com.damowmow.name">Schr&ouml;dinger
    <dd>
     <meta property="com.damowmow.age" content="9">
     <p property="com.damowmow.desc">Orange male.
   </di>
   ...


The styling problem is discussed at http://forums.whatwg.org/viewtopic.php?t=47
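Without a di wrapper, a consumer that wants to reuse the dt text as the item's name has to infer which dt belongs with which dd from sibling order alone. A minimal sketch of that inference, assuming the dl's children are already available as a flat list of (tag, text) pairs (a hypothetical input format, not a DOM API); the grouping it computes is what a di element would state explicitly in the markup:

```python
from typing import Dict, List, Tuple


def group_dl(children: List[Tuple[str, str]]) -> List[Dict[str, List[str]]]:
    """Group a flat sequence of ("dt" | "dd", text) children of a <dl>.

    Each <dt> after a description starts a new group; consecutive <dt>s
    (several names for one description) share a group; <dd>s attach to
    the most recent group. A <dd> before any <dt> is dropped.
    """
    groups: List[Dict[str, List[str]]] = []
    for tag, text in children:
        if tag == "dt":
            if groups and not groups[-1]["descs"]:
                groups[-1]["names"].append(text)
            else:
                groups.append({"names": [text], "descs": []})
        elif tag == "dd" and groups:
            groups[-1]["descs"].append(text)
    return groups


cats = [("dt", "Schrödinger"), ("dd", "Orange male."),
        ("dt", "Erwin"), ("dd", "Siamese color-point.")]
for group in group_dl(cats):
    print(group)
```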

--
Simon Pieters
Opera Software

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-11 Thread Giovanni Gentili

Ian Hickson:
   USE CASE: Annotate structured data that HTML has no semantics for, and
   which nobody has annotated before, and may never again, for private use or
   use in a small self-contained community.
 (..)
   SCENARIOS:

Among the scenarios, this case should also be considered:

* a user (or group of users) wants to annotate
items present on a generic web page with
additional properties in a certain vocabulary.
For example, Joe wants to gather in a blog
a series of personal annotations on movies
(or other types of items) present on imdb.com.

Other examples of external annotation could
be derived from this document [1].

This option requires that @subject accept:

1) the ID of an element with an item attribute, in the same Document
or
2) a valid URL of an element with an item attribute elsewhere on the web
or
3) a valid URL (the item is the referred document or fragment)

This raises two other questions:

a) In the case of properties specified for an element
without an ancestor with an item attribute specified,
should the corresponding item be the document
(the body element with an implicit item attribute)?

b) Do we need to require UAs to offer a standard
way to visualize (at least as an option left to the user)
the structured information carried in microdata?
And copy-and-paste? See also this email [2].

[1] http://www.w3.org/TR/2009/WD-media-annot-reqs-20090119/#req-r01
[2] http://lists.w3.org/Archives/Public/public-html/2009Jan/0082.html

-- 
Giovanni Gentili

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-11 Thread Philip Taylor

On Mon, May 11, 2009 at 6:15 PM, Giovanni Gentili
giovanni.gent...@gmail.com wrote:
 * a user (or groups of users) wants to annotate
 items present on a generic web page with
 additional properties in a certain vocabulary.
 for example Joe wants to gather in a blog
 a series of personal annotation to movies
 (or other type of items) present in imdb.com.

 [...]

 this option require that @subject accept:

 1) ID of an element with an item attribute, in the same Document
 or
 2) valid URL of an element with an item attribute elsewhere in the web
 or
 3) a valid URL (ithe item is the referred document or fragment)

For the RDF output, you can use <link property="about"
href="http://subject/"> to create triples whose subject is a URL. (I
believe in general you can also do:
  <meta item id="n0">
  <link subject="n0" property="about" href="http://subject/">
  <link subject="n0" property="http://predicate1/" href="http://object1/">
  <meta subject="n0" property="http://predicate2/" content="object2">
to represent arbitrary RDF triples.)

I don't think it would make sense for @subject to be a URL when
generating JSON output, because there wouldn't be anywhere to
represent that URL in the output structure. But there could be a
convention that properties called "about" indicate the URLs that the
item applies to, and then it would work with exactly the same markup
as the RDF case.
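Under that convention the JSON side needs no special slot: "about" is serialised like any other property, and a consumer that knows the convention reads it back out. A minimal Python sketch, with hypothetical helper names `item_to_json` and `subjects_of`:

```python
import json


def item_to_json(props):
    """Group (name, value) pairs of one item into a JSON-ready dict.

    Under the suggested convention, 'about' is just another property, so the
    URL the item describes survives as ordinary data in the JSON output.
    """
    item = {}
    for name, value in props:
        item.setdefault(name, []).append(value)  # properties may repeat
    return item


def subjects_of(item):
    """URLs the item applies to, per the 'about' convention (may be empty)."""
    return item.get("about", [])


props = [("about", "http://subject/"),
         ("http://predicate1/", "http://object1/")]
print(json.dumps(item_to_json(props)))
# {"about": ["http://subject/"], "http://predicate1/": ["http://object1/"]}
```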

-- 
Philip Taylor
exc...@gmail.com

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-11 Thread Tim Tepaße

A cursory glance at the new section 5 raises two questions about
indirection:


 (Note the <meta>s in the last example -- since sometimes the information
 isn't visible, rather than requiring that people put it in and hide it
 with display:none, which has a rather poor accessibility story, I figured
 we could just allow <meta> anywhere, if it has a property= attribute.)


That seems to be a solution optimised for entirely invisible metadata,
but not for metadata which differs from the human-visible data.
Imagine as an example the simple act of marking up a number (and
ignoring what the number denotes). For human consumption a thousands
separator is often used, and the type of separator differs by language,
locale and context. Just in my little world I regularly see the point,
the comma, the space, the thin space and sometimes the apostrophe.
Parsing different representations of numbers would be a chore. The
textContent of the element
<span itemprop="com.example.price">€&nbsp;1&thinsp;000&thinsp;000,&mdash;</span>
is clearly unusable, demanding an additional invisible
<meta property="com.example.price" content="1000000">.
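To make the "chore" concrete, here is a rough Python sketch (not part of any proposal) that normalises such strings. Note that it only works if the caller already knows the locale's decimal separator. The text alone does not say: "1.000.000" is a million with a comma decimal separator but 1.0 with a point. That missing information is exactly what a machine-readable value would carry.

```python
import re


def parse_amount(text: str, decimal_sep: str) -> float:
    """Parse a human-formatted amount, given the locale's decimal separator.

    Sketch only: keeps ASCII digits plus the separators mentioned above
    (point, comma, apostrophe, space, no-break space, thin space), drops
    everything else (currency signs, dashes), then treats every separator
    other than decimal_sep as a thousands separator.
    """
    kept = re.sub("[^0-9.,' \u00a0\u2009]", "", text)
    integer, _, fraction = kept.partition(decimal_sep)
    integer = "".join(ch for ch in integer if ch.isdigit())
    fraction = "".join(ch for ch in fraction if ch.isdigit())
    return float(f"{integer}.{fraction or '0'}")
```

For the price in the example, the caller passes `","` as the decimal separator. The trailing dash after the comma is dropped, leaving an empty fraction.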


My irritation lies in the element proliferation: it requires one
element/attribute combination for machines and one element/text-content
combination for humans. Of course, any sane author would arrange both
elements in a close relation, as parent/child or siblings, but there
would still be two different elements to maintain, leading to a higher
cognitive load. Not just for authors but also for programmers: a
fluctuating price would have to be updated on two different elements,
and tree-walking DOM scripts would have to take <meta> elements into
account. Furthermore, it clashes with the familiar habit of other
elements in HTML. A hyperlink is one element with a machine-readable
attribute and human-readable text content. A citation is one element
with a machine-readable reference and human-readable text content. The
same model is used in <meter>, <progress>, <time>, <abbr> ... but not
in user-defined objects. I'd prefer an additional @content-like
attribute that supersedes the text content, and maybe even the default
values of the other value-bearing elements, reducing the two elements
to maintain or change to just one.
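A minimal sketch of the suggested lookup order, in Python. It assumes the new attribute is spelled `content=` (as on `<meta>`) and uses a small, illustrative mapping from value-bearing elements to the attribute that carries their machine-readable value. Neither the attribute name nor the mapping comes from the proposal.

```python
# Assumed defaults for value-bearing elements (illustrative, not from a spec).
DEFAULT_VALUE_ATTR = {
    "a": "href",
    "time": "datetime",
    "meter": "value",
    "progress": "value",
    "q": "cite",
    "blockquote": "cite",
}


def property_value(tag, attrs, text_content, override_attr="content"):
    """Machine-readable value of one property element.

    Order: explicit @content-like attribute, then the element's own
    machine-readable attribute, then the human-readable text content.
    """
    if override_attr in attrs:
        return attrs[override_attr]
    default = DEFAULT_VALUE_ATTR.get(tag)
    if default and default in attrs:
        return attrs[default]
    return text_content
```

With this order, a single `<span>` could carry both the formatted price for humans and the plain number for machines, which is the one-element model the message asks for.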



 Instead, let us try using the regular IDREF functionality that HTML uses
 in a variety of other places, like <label for="">. For this we'll need a
 new attribute, but unfortunately we can't use about= (which would be the
 obvious name to use), because that would conflict with RDFa, so instead
 we'll use subject=:



I'm slightly irritated by the implied change from active, possessive
phrasing (“The cat has the name Hedral.”) to something more passive
(“Hedral is a name owned by that cat.”). My mental model for property
relationships is oriented more towards the former wording; link
relationships are similar in that regard. @about/@subject are like
@rev; a @resource (alias @rel) would feel more natural. The missing
@resource has practical consequences, I think. Imagine a document
describing a household, and a household vocabulary which allows
triples of humans who are in an owner relationship to a cat. Given a
household of two humans and one cat, how does one mark up the
statement that the cat has two owners?
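In triple terms, the question asks for two statements that share the cat as their object. The Python sketch below models the forward, @resource-style direction Tim finds more natural. Each human item refers to the cat by IDREF, so the cat is described once and both ownership triples follow directly. The IDs and the "owns" and "name" terms are hypothetical. The sketch only makes the target data concrete. It does not show how the subject= syntax would express it.

```python
# Hypothetical household: each human item points forward at the cat by IDREF.
household = {
    "alice": {"owns": ["hedral"]},
    "bob": {"owns": ["hedral"]},
    "hedral": {"name": ["Hedral"]},
}


def triples(items):
    """Flatten {subject: {property: [values]}} into (s, p, o) tuples."""
    return [(subject, prop, value)
            for subject, props in items.items()
            for prop, values in props.items()
            for value in values]


def owners_of(items, cat_id):
    """Reverse lookup: every subject with an 'owns' triple pointing at cat_id."""
    return [s for s, p, o in triples(items) if p == "owns" and o == cat_id]


print(owners_of(household, "hedral"))  # ['alice', 'bob']
```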

[whatwg] Annotating structured data that HTML has no semantics for

2009-05-10 Thread Ian Hickson


One of the more elaborate use cases I collected from the e-mails sent in 
over the past few months was the following:

   USE CASE: Annotate structured data that HTML has no semantics for, and
   which nobody has annotated before, and may never again, for private use or
   use in a small self-contained community.

   SCENARIOS:
 * A group of users want to mark up their iguana collections so that they
   can write a script that collates all their collections and presents
   them in a uniform fashion.
 * A scholar and teacher wants other scholars (and potentially students)
   to be able to easily extract information about what he teaches to add
   it to their custom applications.
 * The list of specifications produced by W3C, for example, and various
   lists of translations, are produced by scraping source pages and
   outputting the result. This is brittle. It would be easier if the data
   was unambiguously obtainable from the source pages. This is a custom
   set of properties, specific to this community.
 * Chaals wants to make a list of the people who have translated W3C
   specifications or other documents, and then use this to search for
   people who are familiar with a given technology at least at some
   level, and happen to speak one or more languages of interest.
 * Chaals wants to have a reputation manager that can determine which of
   the many emails sent to the WHATWG list might be more than usually
   valuable, and would like to seed this reputation manager from
   information gathered from the same source as the scraper that
   generates the W3C's TR/ page.
 * A user wants to write a script that finds the price of a book from an
   Amazon page.
 * Todd sells an HTML-based content management system, where all
   documents are processed and edited as HTML, sent from one editor to
   another, and eventually published and indexed. He would like to build
   up the editorial metadata used by the system within the HTML documents
   themselves, so that it is easier to manage and less likely to be lost.
 * Tim wants to make a knowledge base seeded from statements made in
   Spanish and English, e.g. from people writing down their thoughts
   about George W. Bush and George H.W. Bush, and has either convinced
   the people making the statements that they should use a common
   language-neutral machine-readable vocabulary to describe their
   thoughts, or has convinced some other people to come in after them and
   process the thoughts manually to get them into a computer-readable
   form.

   REQUIREMENTS:
 * Vocabularies can be developed in a manner that won't clash with future
   more widely-used vocabularies, so that those future vocabularies can
   later be used in a page making use of private vocabularies without
   making the earlier annotations ambiguous.
 * Using the data should not involve learning a plethora of new APIs,
   formats, or vocabularies (today it is possible, e.g., to get the price
   of an Amazon product, but it requires learning a new API; similarly
   it's possible to get information from sites consistently using 'class'
   values in a documented way, but doing so requires learning a new
   vocabulary).
 * Shouldn't require the consumer to write XSLT or server-side code to
   process the annotated data.
 * Machine-readable annotations shouldn't be on a separate page from
   human-readable annotations.
 * The information should be convertible into a dedicated form (RDF,
   JSON, XML) in a consistent manner, so that tools that use this
   information separate from the pages on which it is found have a
   standard way of conveying the information.
 * Should be possible for different parts of an item's data to be given
   in different parts of the page, for example two items described in the
   same paragraph. (The two lamps are A and B. The first is $20, the
   second $30. The first is 5W, the second 7W.)
 * It should be possible to define globally-unique names, but the syntax
   should be optimised for a set of predefined vocabularies.
 * Adding this data to a page should be easy.
 * The syntax for adding this data should encourage the data to remain
   accurate when the page is changed.
 * The syntax should be resilient to intentional copy-and-paste
   authoring: people copying data into the page from a page that already
   has data should not have to know about any declarations far from the
   data.
 * The syntax should be resilient to unintentional copy-and-paste
   authoring: people copying markup from the page who do not know about
   these features should not inadvertently mark up their page with
   inapplicable data.
 * Any additional markup or data used to allow the machine to understand

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-10 Thread Philip Taylor

On Sun, May 10, 2009 at 11:32 AM, Ian Hickson i...@hixie.ch wrote:

 One of the more elaborate use cases I collected from the e-mails sent in
 over the past few months was the following:

   USE CASE: Annotate structured data that HTML has no semantics for, and
   which nobody has annotated before, and may never again, for private use or
   use in a small self-contained community.

 [...]

 To address this use case and its scenarios, I've added to HTML5 a simple
 syntax (three new attributes) based on RDFa.

There's a quickly-hacked-together demo at
http://philip.html5.org/demos/microdata/demo.html (works in at least
Firefox and Opera), which attempts to show you the JSON serialisation
of the embedded data, which might help in examining the proposal.

-- 
Philip Taylor
exc...@gmail.com

Re: [whatwg] Annotating structured data that HTML has no semantics for

2009-05-10 Thread jgraham


Quoting Philip Taylor excors+wha...@gmail.com:


There's a quickly-hacked-together demo at
http://philip.html5.org/demos/microdata/demo.html (works in at least
Firefox and Opera), which attempts to show you the JSON serialisation
of the embedded data, which might help in examining the proposal.


I have a *totally unfinished* demo that does something rather similar
at [1]. It is highly likely to break and/or give incorrect results**.
If you use it for anything important you are insane :)

My general impression from writing the tool is that this proposal is,
at least, easy to write consumers for. I get the feeling that the
production side will also be within the grasp of most authors,
although it is hard to say for sure since I haven't really tried
authoring anything.
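The point that consumers are easy to write can be illustrated with a deliberately small Python sketch using only the standard library's `html.parser`. It assumes a simplified subset of the syntax discussed in this thread: item= on a container, property= on its descendants, and values taken from content= or href= before text. It also assumes well-formed markup with explicit end tags. It ignores subject=, nested items as values, and URL resolution. It is neither the spec's processing model nor either of the linked demos.

```python
import json
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ItemCollector(HTMLParser):
    """Collect {property: [values]} dicts for elements carrying item=."""

    def __init__(self):
        super().__init__()
        self.items = []       # finished items, in the order they close
        self.open_items = []  # items whose element is still open
        self.stack = []       # (is_item, prop, owner, text_parts) per open element

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        owner = self.open_items[-1] if self.open_items else None
        prop = a.get("property")
        if prop and owner is not None:
            # Attribute-carried values (content=, href=) win over text content.
            for attr in ("content", "href"):
                if attr in a:
                    owner.setdefault(prop, []).append(a[attr])
                    prop = None
                    break
        if tag in VOID_ELEMENTS:
            return  # void elements have no end tag and no text content
        is_item = "item" in a
        if is_item:
            self.open_items.append({})
        self.stack.append((is_item, prop, owner, []))

    def handle_data(self, data):
        # Text counts toward every open property element that still needs it.
        for _, prop, _, parts in self.stack:
            if prop:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.stack:
            return
        is_item, prop, owner, parts = self.stack.pop()
        if prop and owner is not None:
            owner.setdefault(prop, []).append("".join(parts).strip())
        if is_item:
            self.items.append(self.open_items.pop())


parser = ItemCollector()
parser.feed("""
<div item>
  <span property="name">Hedral</span>
  <meta property="species" content="cat">
  <a property="home" href="http://example.com/hedral">home page</a>
</div>
""")
parser.close()
print(json.dumps(parser.items))
```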

[1] http://james.html5.org/microdata/

** Known bugs include: incorrect lowercasing of non-ASCII characters,
lack of support for resolving URIs, lack of RDF output, and some others
that I forget.
