Re: [whatwg] Creative Commons Rights Expression Language

2008-09-01 Thread Henri Sivonen

On Sep 1, 2008, at 06:20, Karl Dubost wrote:


On 29 August 2008 at 23:04, Henri Sivonen wrote:
Also, having more metadata leads to UI clutter and data entry  
fatigue that alienates users. In the past, I worked on a content  
repository project that failed because (among other things) the  
content upload UI asked for an insane amount (a couple of  
screenfuls back then; probably a screenful today) of metadata when  
it didn't occur to system specifiers to invest in full text search.  
More metadata isn't better. Instead, systems should ask for the  
least amount of metadata that can possibly work (when the metadata  
must be entered by humans as opposed to being captured by machines  
like EXIF data). See also

http://www.w3.org/QA/2008/08/the-digital-stakhanovite


hehe. This was a-good-try-but-mischaracterization-from-the-ministry- 
of-truth


That was uncalled for.


to associate this article with the rants on metadata :) Let's clarify.


It's an excellent article. Thank you for writing it.

What I explain in the article is not the volume of metadata, but the  
volume of items and the context of usage.


  1. Extract anything you can from the data itself (exif, iptc, xmp,  
modifications, date)


Yes.

It's sad how some systems ask the user for a title when the title is  
already in an HTML or PDF file but it never occurred to the specifiers  
of the system that files can actually be parsed. It's even sadder to ask  
the user for keywords, because it never occurred to the specifiers of  
a system that full-text search has been invented.
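
A minimal sketch of what "parsing the file" could mean for the HTML case, using only Python's standard library. The names `TitleExtractor` and `extract_title` are made up for illustration, and a real upload pipeline would also need charset detection and a PDF path, which are not shown here:

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element of a document."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace; report None when nothing was found.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_title(html_text):
    """Return the document title, or None if the upload UI has to ask."""
    parser = TitleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

With something like this in place, the upload form only needs a title field as a fallback for files where `extract_title` returns `None`.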



  2. Give a possibility in the UI to modify or add data.


Even the *possibility* to add costs UI real estate, so specifiers of a  
system should be very, very careful in what possibilities they offer.


In a business environment, you might have to give metadata about a  
work. I do it in my every day job. I give titles to my emails, I put  
comments in my cvs commits, etc. etc. These are all constraints. Not  
adding the data would still work technically.


Sure. However, writing a string that appears in mailbox list view or  
in a list view of commits is the baseline of user-entered metadata.  
Everything else is something *more*.


Just because something happens in a business setting where people can  
be fired doesn't mean that more metadata is better. I've seen metadata  
fail even in the military where they thought they could *order* people  
to enter metadata (and where they have a more elaborate punishment  
structure than in an ordinary working environment).


Having a UI cluttered with fields to enter is not a failure of  
metadata, it is a failure of the project in the social and business  
constraints of the project.


It's definitely a failure of the project in the social and business  
constraints. The reason for failure was a line of thought that went  
something like this:
"Metadata is good. Therefore, let's have more of it. Let's model what  
can be said about the domain. We are in a position to require people  
to enter the metadata."


The process didn't try to seriously find out what the real must-have  
hard social and business constraints were.


My point is that "metadata is useful" isn't the whole story.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Creative Commons Rights Expression Language

2008-09-01 Thread ddailey
Sorry for joining in naively to a conversation I've not been following, but 
reading Karl's remarks on the facilitation of metadata entry for users, some 
discussions in the vicinity of the recent SVGOpen that concerned usability, 
accessibility,  and metadata made me think the following (that I suppose is 
rather outside the realm of HTML):


Suppose the user or author (since in an app the distinction is blurred 
somewhat) is building something like a graph (in the discrete math sense), 
an image repository, or even a diagram (though the categories of content 
here are heterogeneous, making the argument a bit more tenuous) using a 
GUI web app (like Inkscape for diagrams or 
http://srufaculty.sru.edu/david.dailey/svg/graphs30.svg for graphs). Let's 
say there are n basic entities (like graphs or images) for which metadata is 
required. Let us furthermore assume the metadata description language is of 
order 0, 1, 2, 3, or 4 (*) and that the minimum number of user operations 
required to complete the metadata description for a single entity is bounded 
above by k.


We then may plot a user performance function that estimates the probability, 
p,  that users will actually succeed in entering data (as a function perhaps 
of not only n and k, but of the user's investment in the process). Clearly 
as n and k grow and as the user's investment in the process declines, so 
does p. We are interested, through the interface, in maximizing p.
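
One way to make that function concrete is a toy model, which is purely an assumption for illustration and not something measured: treat each of the (at most) n * k operations as completed independently with some probability q that stands in for the user's investment. The function name and the independence assumption are both hypothetical:

```python
def completion_probability(n, k, q):
    """Toy estimate of p: the chance that all n * k operations get completed.

    n: number of entities needing metadata
    k: upper bound on user operations per entity
    q: assumed per-operation completion probability (proxy for investment)

    Because k is an upper bound, n * k is the worst case, so this is a
    pessimistic estimate under the independence assumption.
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability between 0 and 1")
    return q ** (n * k)
```

Under these assumptions, even a diligent user (q = 0.99) facing 10 entities with 5 operations each finishes everything only about 60% of the time, and the estimate falls as n or k grows or as q declines, which matches the qualitative claim above.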


I have a hunch (in math it is called a conjecture, but in CHI it is more 
like a hunch) that not only how, but also when, this conversation between 
user and software takes place affects the probability. For example if an 
artist were using Inkscape to draw SVG, then mandating a conversation about 
metadata each time a curve or gradient is completed is likely to drive users 
to AutoCAD for their diagrams, even if wine is served.


In certain cases, it makes most sense to build that conversation as an exit 
interview. If we will have k phrases to enter (using a grammar of graph 
theoretic phrases) for each of n objects, then we may wish to build a very 
comfortable GUI to facilitate that for all the affected entities upon 
closing the app: Dear user, you have just completed a schematic drawing for 
the Intel i-Chore 42x processor, would you now like to a) save b) enter 
appropriate metadata c) save and enter data d) drink wine. The notion is 
that a GUI enabling such, could if it were viewed as a stage or mode of 
development a) rely on the visualization of the opus as thus far created b) 
be appropriately rich to the order of the metadata description language and 
c) make the data entry process unbundled from the creation process, hence 
allowing diversification of the assignments of tasks to workers (e.g. the 
familiar phrase of the assessment revolt of 2028: let the bureaucrats do the 
bureaucracy!). That isn't to say that we should not also facilitate the 
entry of data at each stage of the drawing process, with a sub-interface of 
the master metadata editor, but given the complexity that some metadata 
editors may have to convey, the nature of the conversation between user and 
software may not be allowed to remain entirely casual (that is, wine may 
need to be upgraded to tequila).


/fwiw
David

(by the way, an Intellectual Property/provenance description language such 
as the library and visual rights communities work with might be an 
interesting overlay for the web, provided both free and corporate models 
(together with ample graph theory) are included)


* define the order of a metadata description language as 0 if it consists of 
simple non-delimited strings, 1 if it consists of delimited strings (with a 
single delimiter), 2 if the delimiters are parentheses (required to match), 
3 if the delimiters act like parentheses of multiple flavors as in XML,  and 
4 if the language is fully graph theoretic (parenthesized strings plus cross 
linkages -- footnotes).
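
To illustrate the step from order 1 to orders 2 and 3, here is a small sketch of the "required to match" check. The bracket characters stand in for whatever delimiters a given language uses, and `delimiters_match` is a hypothetical helper, not part of any metadata standard:

```python
# Closing delimiter -> the opening delimiter it must pair with.
# One pair corresponds to order 2; several flavors correspond to order 3.
PAIRS = {")": "(", "]": "[", "}": "{"}


def delimiters_match(text, pairs=PAIRS):
    """Return True if every closer pairs with the most recent unclosed opener."""
    openers = set(pairs.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer with nothing open, or of the wrong flavor, is a mismatch.
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Anything still open at the end is also a mismatch.
    return not stack
```

An order 0 or order 1 string passes trivially because it contains no nesting delimiters; order 4 would additionally need cross-references between the nested parts, which a stack alone cannot check.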



Re: [whatwg] Creative Commons Rights Expression Language

2008-08-31 Thread Karl Dubost


On 29 August 2008 at 23:04, Henri Sivonen wrote:
Also, having more metadata leads to UI clutter and data entry  
fatigue that alienates users. In the past, I worked on a content  
repository project that failed because (among other things) the  
content upload UI asked for an insane amount (a couple of screenfuls  
back then; probably a screenful today) of metadata when it didn't  
occur to system specifiers to invest in full text search. More  
metadata isn't better. Instead, systems should ask for the least  
amount of metadata that can possibly work (when the metadata must be  
entered by humans as opposed to being captured by machines like EXIF  
data). See also

http://www.w3.org/QA/2008/08/the-digital-stakhanovite


hehe. This was a-good-try-but-mischaracterization-from-the-ministry-of- 
truth to associate this article with the rants on metadata :) Let's  
clarify.


What I explain in the article is not the volume of metadata, but the  
volume of items and the context of usage.


   1. Extract anything you can from the data itself (exif, iptc, xmp,  
modifications, date)

   2. Give a possibility in the UI to modify or add data.

In a business environment, you might have to give metadata about a  
work. I do it in my every day job. I give titles to my emails, I put  
comments in my cvs commits, etc. etc. These are all constraints. Not  
adding the data would still work technically.


For my own personal photos, I don't (want/have) time to add plenty of  
metadata. And that's fine. I do, though, add metadata in bulk at a regular  
pace, for location (e.g. all these selected photos were taken in  
Taiwan), with the help of GUI tools. Yes, tools save my life.



Having a UI cluttered with fields to enter is not a failure of  
metadata, it is a failure of the project in the social and business  
constraints of the project.




--
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool








Re: [whatwg] Creative Commons Rights Expression Language

2008-08-29 Thread Henri Sivonen

On Aug 28, 2008, at 15:31, Paul Prescod wrote:


I don't really understand why there is any debate about the utility of
metadata in general. Are you also against microformats? Title
elements? The meta element?

It seems obvious to me that a) metadata has been a huge success on the
web (the success of other techniques like NLP and PageRank
notwithstanding) and b) we haven't yet invented every metadata tag we
need. I think it is worthwhile to debate whether RDFa is the right
solution but do we really want to go back to a debate over whether
metadata is valuable or not?

This is useful stuff, right?


Some metadata may be useful. A lot of it isn't. Sturgeon's Revelation  
applies.


I don't know what the right way to find the useful bits is, but just  
telling people out there to publish metadata and expecting use cases  
to emerge later isn't a good way, since that approach wastes a lot of  
people's effort. (I'm not suggesting that you are telling people to  
just go publish a lot of stuff. However, the upwards-scalable RDF  
naming approach and the approach of ignoring triples the consumer  
doesn't know about seem to be designed for erring on the side of  
publishing too much whereas the Microformats Process and the WHATWG  
approach ask for use cases first.)


One example of useless metadata evangelism that I myself fell for 8  
years ago was embedding Dublin Core metadata in HTML. It wasn't nice  
to realize that I had been tricked into something totally pointless.  
(The data was redundant with HTML and HTTP native data.)


Also, having more metadata leads to UI clutter and data entry fatigue  
that alienates users. In the past, I worked on a content repository  
project that failed because (among other things) the content upload UI  
asked for an insane amount (a couple of screenfuls back then; probably  
a screenful today) of metadata when it didn't occur to system  
specifiers to invest in full text search. More metadata isn't better.  
Instead, systems should ask for the least amount of metadata that can  
possibly work (when the metadata must be entered by humans as opposed  
to being captured by machines like EXIF data). See also

http://www.w3.org/QA/2008/08/the-digital-stakhanovite

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/




Re: [whatwg] Creative Commons Rights Expression Language

2008-08-29 Thread Ben Adida
Henri Sivonen wrote:
 I don't know what the right way to find the useful bits is, but just
 telling people out there to publish metadata and expecting use cases to
 emerge later isn't a good way, since that approach wastes a lot of
 people's effort.

In this email you claim there are no use cases.

But in another email only 6 hours earlier, you said:

 I'm getting mixed signals about the extent to which RDFa is
 envisioned to be browser-sensitive. Weren't browsers supposed to do
 cool stuff with it according to some emails in this thread?

So, clearly, there are use cases we've explained. Here they are again,
just in case:

SearchMonkey:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2008-August/015967.html

Ubiquity:
http://lists.w3.org/Archives/Public/www-archive/2008Aug/0127.html
(I can't find the WHATWG link, but it was sent to WHATWG, too...)

 (I'm not suggesting that you are telling people to just
 go publish a lot of stuff. However, the upwards-scalable RDF naming
 approach and the approach of ignoring triples the consumer doesn't know
 about seem to be designed for erring on the side of publishing too much
 whereas the Microformats Process and the WHATWG approach ask for use
 cases first.)

You could put it that way, but what RDF is really about is publishing
data in a fine-grained enough manner that applications can easily
overlap. That's why you can ignore parts of the data if you don't need
it. You get a much more loosely-coupled, opportunistic Web, that way,
which is exactly the kind of opportunity explored by tools like Ubiquity.

 One example of useless metadata evangelism that I myself fell for 8
 years ago was embedding Dublin Core metadata in HTML. It wasn't nice to
 realize that I had been tricked into something totally pointless. (The
 data was redundant with HTML and HTTP native data.)

I don't think anyone was trying to trick you (did someone make money or
acquire fame off your DC markup?), but certainly it's true that the
infrastructure wasn't quite there (and the flexibility of adding other
vocabularies wasn't there, either.) We tried to remedy that with RDFa,
and we can already see from the CC uptake and tools that the situation
is quite a bit better.

 Also, having more metadata leads to UI clutter and data entry fatigue
 that alienates users. In the past, I worked on a content repository
 project that failed because (among other things) the content upload UI
 asked for an insane amount (a couple of screenfuls back then; probably a
 screenful today) of metadata when it didn't occur to system specifiers
 to invest in full text search. More metadata isn't better. Instead,
 systems should ask for the least amount of metadata that can possibly
 work (when the metadata must be entered by humans as opposed to being
 captured by machines like EXIF data). See also
 http://www.w3.org/QA/2008/08/the-digital-stakhanovite

That is an extremely limited view of how this might be used, as if every
web site that wants to publish RDFa is going to prompt users for 20
fields. No. Take Craigslist again: 5 structured fields is plenty to do
super interesting stuff, but we'd have to come up with a special
microformat for apartment listings if we reject fine-grained metadata
like RDF.
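
As a rough sketch of the "one parser, many sites" idea: the code below collects property/value pairs from any page that marks up its fields with a `property` attribute. It is a deliberate simplification and not a conforming RDFa parser (no prefix resolution, no subjects, no nested properties), and the `ex:` property names and the sample listing are invented for illustration:

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collects (property, value) pairs from elements with a `property` attribute.

    The value is the `content` attribute if present, else the element's text.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = None  # [tag, property, text parts, depth] while collecting text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._open is not None:
            # Track same-named nested tags so the right end tag closes the property.
            if tag == self._open[0]:
                self._open[3] += 1
            return
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            self.pairs.append((prop, attrs["content"]))
        else:
            self._open = [tag, prop, [], 1]

    def handle_endtag(self, tag):
        if self._open is None or tag != self._open[0]:
            return
        self._open[3] -= 1
        if self._open[3] == 0:
            _, prop, parts, _ = self._open
            self.pairs.append((prop, " ".join("".join(parts).split())))
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)


def extract_properties(html_text):
    """Return the (property, value) pairs found in html_text, in document order."""
    collector = PropertyCollector()
    collector.feed(html_text)
    collector.close()
    return collector.pairs


# Hypothetical apartment listing with a handful of structured fields.
LISTING = """
<div>
  <span property="ex:rent" content="1450">$1,450 / month</span>
  <span property="ex:bedrooms">2</span>
  <span property="ex:address">123 Example St, Springfield</span>
</div>
"""
```

The same `extract_properties` call works on any site that uses the attribute, and a consumer that only cares about `ex:address` can ignore the other pairs.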

And in some cases, you *do* need to be able to output lots of metadata
(think publication records at pubmed, other online journals.)

Some past approaches to this problem have failed for reasons we believe
we've identified:
- repetition of rendered and machine-readable data leading to staleness
- too hard to include (modifying the HEAD, separate file) for non-savvy
web publishers
- vocabularies are monolithic and non-remixable, limiting reuse and
ability of little guys to participate

We've worked hard to address these in RDFa, and the publisher interest
we're seeing shows we've done *something* right.

Maybe it's time to let go of old ghosts and explore how this new
solution may address some of the problems of the past?

-Ben


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-28 Thread Ben Adida
Ian Hickson wrote:
 Clearly, and as the voice-over states, the site needs embedded metadata 
 that easily connects what the user is pointing to to the structured 
 data required for mapping.
 
 Since Craigslist doesn't have structured data now, that seems like a 
 verifiably false claim. :-)

Did you listen to the video? It clearly states that they wrote a
specific hack for Craigslist, but that they expect this to work more
generically. Site-specific hacks don't scale to the Web. A solution that
scales will require a single parser, not site-specific parsers (though
site-specific parsers will likely be a transition path.)

The video's comments about microformats should make that clear.

 In fact, Craigslist is a great example. Given how hostile Craigslist has 
 been to people reusing their data,

You're confusing two issues. Craigslist doesn't want other *web sites*
redistributing their data. I doubt they would take issue with users
trying to process the data for their own private needs.

Craigslist mostly relies on its "no bots" Terms of Use to prevent other
sites from reusing their data. They certainly don't make it too
difficult to screen-scrape, given their simple templates.

 what reason do we have to believe that they would ever make their data 
 accessible using RDFa? (Or any other metadata system in fact.)

So, assuming you're right about Craigslist (and I think you're wrong, as
mentioned above), in your opinion, there won't be a reasonable number of
publishers who want to publish RDFa (or something like it?) Everyone
will just obscure their data so it's only human readable?

That's a rather limited view of the potential of the web. Do you not see
the value that's unleashed by tools like Ubiquity, and the incentive
that web sites will have to plug in?

-Ben


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-28 Thread Ian Hickson
On Wed, 27 Aug 2008, Ben Adida wrote:
 Ian Hickson wrote:
  Clearly, and as the voice-over states, the site needs embedded 
  metadata that easily connects what the user is pointing to to the 
  structured data required for mapping.
  
  Since Craigslist doesn't have structured data now, that seems like a 
  verifiably false claim. :-)
 
 Did you listen to the video? It clearly states that they wrote a 
 specific hack for Craigslist, but that they expect this to work more 
 generically.

Sure, I'm just debating needs. It is possible to do it without 
structured data, indeed the flagship example here doesn't have any. I'm 
not saying that that's a better design (on the contrary). It's just the 
way it is today.


 Site-specific hacks don't scale to the Web. A solution that scales will 
 require a single parser, not site-specific parsers (though site-specific 
 parsers will likely be a transition path.)

To scale to the whole Web, the only thing I can see working is the 
computers understanding human language. I just don't see the whole Web 
marking up their data using fine grained semantic markup. We have enough 
trouble getting them to use h1 and p.

Examine the markup of this page (which I originally stumbled across a few 
months ago, but which was updated just yesterday):

   http://puysl.com/view.htm

This is the level of authoring that we have to deal with if we're 
targeting the whole Web. That page is a microcosm of specialness, but 
pages like it abound.


 So, assuming you're right about Craigslist (and I think you're wrong, as 
 mentioned above), in your opinion, there won't be a reasonable number of 
 publishers who want to publish RDFa (or something like it?) Everyone 
 will just obscure their data so it's only human readable?

Not everyone, no. Some, many even, will get the religion and mark up their 
data in useful ways. But I don't see any evidence to suggest that a 
critical mass will do so.


 That's a rather limited view of the potential of the web. Do you not see 
 the value that's unleashed by tools like Ubiquity, and the incentive 
 that web sites will have to plug in?

I absolutely see the value. I would absolutely love for the Semantic Web 
vision to be the future. However, just because I want it to come true 
doesn't mean it will come true. It fundamentally relies on humans acting 
in a way that we _know_ they don't. We can't just ignore 18 years of 
experience with the Web and Web authors and say "well, our idea is so great 
that authors will all magically make it happen".

I think (some hip) sites will totally plug in, just as they already have, 
using site-specific scripts that can be downloaded by the users of those 
sites. I think a few will use simple domain-specific fine grained markup 
conventions (like Microformats); I think fewer still, possibly many but 
likely not a critical mass, will use RDF and RDFa.

This mirrors what happens today (e.g. GMail and other big sites have 
contacts APIs, a small number of sites have hCard, a very few have FOAF).

I don't see that tools like Ubiquity give any incentive to use RDF. The 
immediate reward from a hard-coded site-specific script is more effective 
than the compound reward of writing a generic script (typically a harder 
task), convincing at least one site to rewrite its markup to use a 
suitable convention, and then debugging the script to work around the bugs 
that that site has, even if one eventually convinces multiple sites to 
support the same conventions.

(Also, note that as much as things like Ubiquity are great for people like 
us, they, like Quicksilver before them, and the Unix command line before 
that, would totally confuse regular users. The concept of using a site 
for a single task, and copying the output of that site into another site, 
resonates with users in a way that "just trust us, if you tell the 
computer what you want it'll do it somehow" doesn't. If power like 
Ubiquity is the goal, we haven't yet found the UI for it.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-28 Thread Paul Prescod
On Wed, Aug 27, 2008 at 8:17 PM, Ian Hickson [EMAIL PROTECTED] wrote:
 On Wed, 27 Aug 2008, Ben Adida wrote:

 Consider specifically the Craigslist example, where the user selects a
 few of the apartments and says "map these".

 Clearly, and as the voice-over states, the site needs embedded metadata
 that easily connects what the user is pointing to to the structured
 data required for mapping.

 Since Craigslist doesn't have structured data now, that seems like a
 verifiably false claim. :-)

 In fact, Craigslist is a great example. Given how hostile Craigslist has
 been to people reusing their data, and how unstructured their page is now,
 what reason do we have to believe that they would ever make their data
 accessible using RDFa? (Or any other metadata system in fact.)

I don't really understand why there is any debate about the utility of
metadata in general. Are you also against microformats? Title
elements? The meta element?

It seems obvious to me that a) metadata has been a huge success on the
web (the success of other techniques like NLP and PageRank
notwithstanding) and b) we haven't yet invented every metadata tag we
need. I think it is worthwhile to debate whether RDFa is the right
solution but do we really want to go back to a debate over whether
metadata is valuable or not?

This is useful stuff, right?

http://googlemapsapi.blogspot.com/2007/06/microformats-in-google-maps.html
http://greasemonkey.makedatamakesense.com/google_hcalendar/

 Paul Prescod


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-28 Thread Paul Prescod
On Thu, Aug 28, 2008 at 2:28 AM, Ian Hickson [EMAIL PROTECTED] wrote:
...

 Site-specific hacks don't scale to the Web. A solution that scales will
 require a single parser, not site-specific parsers (though site-specific
 parsers will likely be a transition path.)

 To scale to the whole Web, the only thing I can see working is the
 computers understanding human language. I just don't see the whole Web
 marking up their data using fine grained semantic markup. We have enough
 trouble getting them to use h1 and p.

When did it become necessary for every new HTML element to be used by
every author of every web page on the web? A huge amount of browsing
time is spent on the top hundred web sites. If they do it right, it
will filter down. If it doesn't, the web is still a better place than
if those top hundred sites did not use standards for representing
metadata.

 I think (some hip) sites will totally plug in, just as they already have,
 using site-specific scripts that can be downloaded by the users of those
 sites. I think a few will use simple domain-specific fine grained markup
 conventions (like Microformats); I think fewer still, possibly many but
 likely not a critical mass, will use RDF and RDFa.

Why would hip sites prefer site-specific scripts to standard
markups, standard scripts and/or browser features? Is it really
logical for each of the top sites to invent their own markup and
scripts rather than cooperate on common tools?

 ...
 I don't see that tools like Ubiquity give any incentive to use RDF. The
 immediate reward from a hard-coded site-specific script is more effective
 than the compound reward of writing a generic script (typically a harder
 task), convincing at least one site to rewrite its markup to use a
 suitable convention, and then debugging the script to work around the bugs
 that that site has, even if one eventually convinces multiple sites to
 support the same conventions.

Good point. It turns out that we don't need standards bodies at all.
It is also easier *at first* for every site to write their own vector
markup or stylesheet language. It is even easier to invent your own
networking protocol than to get one standardized. (after all, you must
invent it before you can get it standardized)

I don't see why you believe that metadata is uniquely immune to the
forces of standardization.

 This mirrors what happens today (e.g. GMail and other big
 sites have contacts APIs, a small number of sites have hCard, a very
 few have FOAF).

HTML has no standard mechanism for embedding contacts. hCard is a sort
of de facto mechanism. Given how long it takes Web standards to work
their way through the ecosystem, I think it's doing okay. Google
supports it on some key sites. Yahoo supports it on some as well. Does
it really need to be supported on Bob's Hockey Team site in order to
be a success? It should be available and accessible to Bob if he wants
the feature, but if not, that's cool too. Javascript is not necessary
for every site out there either.

 Paul Prescod


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-28 Thread Kristof Zelechovski
I thought the standard mechanism for embedding contacts is
OBJECT[type=text/vcard].
Chris




Re: [whatwg] Creative Commons Rights Expression Language

2008-08-28 Thread Ben Adida
Ian Hickson wrote:
 Did you listen to the video? It clearly states that they wrote a 
 specific hack for Craigslist, but that they expect this to work more 
 generically.
 
 Sure, I'm just debating needs. It is possible to do it without 
 structured data, indeed the flagship example here doesn't have any.

The video clearly states that they have a site-specific hack for now,
and how it would be better if they could instead parse something like
microformats.

It sounds like you're saying it's not already deployed everywhere, so
we don't need to deploy it.

We're trying to put together the pieces to make it more easily deployable!

 To scale to the whole Web, the only thing I can see working is the 
 computers understanding human language. I just don't see the whole Web 
 marking up their data using fine grained semantic markup. We have enough 
 trouble getting them to use h1 and p.

As Paul said well, I don't think the feature needs to be used by
everyone, not even close. How many publishers will really know how to use
the browser SQL? I'd say in the end, the potential # of publishers is
lower for browser SQL, because you need serious tech chops to make that
work, whereas RDFa is as easy as copying and pasting a chunk of HTML
that someone (like CC) gives you into your web page.

(Total number of *end-users* will surely be higher for SQL, given the
reach of gmail and Google in general, but you keep referring to
difficulty for the *publisher*, so it's important to point out how
difficult it's going to be to get offline+browser-SQL working for the
average publisher, especially compared to markup like RDFa which
typically requires just modifying a JSP/ASP/etc... template.)

 Examine the markup of this page (which I originally stumbled across a few 
 months ago, but which was updated just yesterday):
 
http://puysl.com/view.htm

And by that reasoning, I think there are a lot of other HTML5 features
you need to kill, starting with browser SQL.

 Not everyone, no. Some, many even, will get the religion and mark up their 
 data in useful ways. But I don't see any evidence to suggest that a 
 critical mass will do so.

As I mentioned above, if you're talking about *publishers*, I think many
more will find RDFa useful before they find SQL-in-the-browser useful,
especially with client-side tools like Ubiquity.

 I absolutely see the value.

Okay, I think that's major progress: we agree that there's value :)

 I would absolutely love for the Semantic Web 
 vision to be the future. However, just because I want it to come true 
 doesn't mean it will come true.

How about letting it happen with a well-thought-out plan that tries to
grow semantics out of the existing Web, and seeing if it does succeed?
The cost is minimal, a number of publishers are interested, and the
tools are easy to build (9 implementations of RDFa parsers already, full
test suite, attribute-focused implementation, etc...)

 It fundamentally relies on humans acting 
 in a way that we _know_ they don't.

That's a false comparison. You're going back to the argument that there
is no user incentive or feedback for users to produce structured data.
But I just gave you two very high-profile examples: Ubiquity and
SearchMonkey. Both of those provide strong user incentive to play in the
structured data space, as long as that space is generic enough for small
publishers to hook in. Same tool, many publishers.

 We can't just ignore 18 years

18 years where we didn't have well thought-out metadata schemes for the
web, nor the client-side programmability of Firefox to stitch things
together. This is not the same old thing.

 I think (some hip) sites will totally plug in, just as they already have, 
 using site-specific scripts that can be downloaded by the users of those 
 sites. I think a few will use simple domain-specific fine grained markup 
 conventions (like Microformats); I think fewer still, possibly many but 
 likely not a critical mass, will use RDF and RDFa.

So you continue to confuse *publishers* and *end-users*. If you're
arguing that a small number of publishers means the feature shouldn't be
used, then you've got a number of features in HTML5 that need killing (SQL.)

 This mirrors what happens today (e.g. GMail and other big sites have 
 contacts APIs, a small number of sites have hCard, a very few have FOAF).

What happens today is limited by what's allowed in HTML. Your argument
is circular. We'd like RDFa to validate so people can feel more
comfortable adding it to their production sites.

 I don't see that tools like Ubiquity give any incentive to use RDF. The 
 immediate reward from a hard-coded site-specific script is more effective 
 than the compound reward of writing a generic script (typically a harder 
 task), convincing at least one site to rewrite its markup to use a 
 suitable convention, and then debugging the script to work around the bugs 
 that that site has, even if one eventually convinces multiple sites to 
 support the 

Re: [whatwg] Creative Commons Rights Expression Language

2008-08-27 Thread Ian Hickson
On Wed, 27 Aug 2008, Ben Adida wrote:
 
 Consider specifically the Craigslist example, where the user selects a 
 few of the apartments and says map these.
 
 Clearly, and as the voice-over states, the site needs embedded metadata 
 that easily connects what the user is pointing to to the structured 
 data required for mapping.

Since Craigslist doesn't have structured data now, that seems like a 
verifiably false claim. :-)

In fact, Craigslist is a great example. Given how hostile Craigslist has 
been to people reusing their data, and how unstructured their page is now, 
what reason do we have to believe that they would ever make their data 
accessible using RDFa? (Or any other metadata system in fact.)


(I fully intend to reply to the rest of the e-mails sent on the topic of 
RDFa in due course, by the way; unfortunately it is not my top priority 
right now and so I can only spend so much time on it each day. You can 
track what feedback is outstanding on the topic here:

   http://www.whatwg.org/issues/#rdfa

All those e-mails will get a reply from me in due course. Sorry for the 
delay.)

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-25 Thread Ben Adida

Ian,

Thanks for the details.

Some questions below.

Ian Hickson wrote:
 The Database stuff was mostly driven by requests from large Web 
 application authors (including, for example, GMail), who wanted to be able 
 to offer their services even while their users were offline.

I am quite favorable to the SQL DB in the browser approach, in that I
think it is an enabler of many great things.

However, it's pretty clear that this need comes from technically capable
folks, not from the bulk of users, right? The need from the bulk of
users is, at best, "I'd like to access my email offline."

So then, why is IMAP insufficient? And why SQL? Why not something a
little simpler?

I ask these questions because there's a parallel to RDFa here. A number
of web publishers / application authors want to use RDFa, because they
know it will enable many new applications. End-users likely won't know
much about RDFa, nor should they. They'll just know that suddenly their
browsers can recall articles similarly tagged in their history, that
search engines like Yahoo's SearchMonkey can surface significant useful
information directly in their search results, that reusing CC-licensed
works is now much easier with automated attribution, etc...

Pick any single application here, and you could come up with an easier
alternative than RDFa. But putting them together, you need something
more generic. The SQL database of interoperable web data, in a sense.
And that's where RDF (and thus RDFa) comes in.

So my question is: what would it take to convince you that we need
something more generic than the one-off solutions you and others have
been suggesting?

How did the gmail proposal become "we need more than just IMAP, we need
the generic SQL DB, and we will change the way the web browser works
forever by enabling offline access"?

Again, I like the SQL-DB idea, I'm not arguing against it, I'm just
wondering how it got through your stringent process. And I note that
RDFa is a much more modest proposal that requires almost no work for
browser implementors.

 Consider it from our side. How would you feel if you asked a question and 
 I told you the answer was somewhere in the HTML5 spec?

Not quite the same thing, ccREL is the complete reasoning for this
particular problem, from one party (the equivalent of gmail to
SQL-in-browser).

 We have to address problems that people know they have, or would agree 
 they have if told they had them, because people won't spend any effort to 
 address problems they don't think they have.

So, just to be clear, how does that link up with SQL-in-browser? When
you say people, do you mean web publishers / application builders?

 The word "problem" doesn't appear once in the ccREL paper. Where is the 
 statement of what ccREL is trying to solve?

Well, the exact word doesn't have to appear, does it? Here are the first
two sentences of the introduction:

This paper introduces the Creative Commons Rights Expression
Language (ccREL), the standard recommended by Creative Commons
(CC) for machine-readable expression of copyright licensing terms
and related information. ccREL and its description in this
paper supersede all previous Creative Commons recommendations for
expressing licensing metadata.

From this it's pretty clear that we're trying to express copyright
licensing information (with all of the sub-fields it implies and all the
possible data types we might license) in machine-readable form.

 But I'm more concerned about RDFa, since presumably if we addressed the 
 problems of RDFa, ccREL would be automatically resolved.

Sure, although if you want to understand the use case, ccREL is fairly
important.

 The ccREL paper is long, wordy, and doesn't really seem to clearly state 
 the answers to the questions I listed above.

Interestingly, ccREL has been extremely well received in the
non-technical space. But I guess you can't please all the people all the
time :)

 I'm really just looking for a 
 simple one-page answer.

A one-page answer? That's only possible if you're willing to accept
premises like "RDF is a good way to express interoperable data on the web."

Imagine trying to convince someone about SQL-in-browser when that
someone doesn't believe that SQL is the right approach, rather that it
should be XML object and XPath. Can you do that in one page?

So, if you're willing to start with "RDF is a good idea for
interoperable web data," then we can probably put together a short
proposal. But without a baseline, you're sending me on a fool's errand.

-Ben


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-25 Thread Ian Hickson
On Mon, 25 Aug 2008, Ben Adida wrote:
 Ian Hickson wrote:
  The Database stuff was mostly driven by requests from large Web 
  application authors (including, for example, GMail), who wanted to be 
  able to offer their services even while their users were offline.
 
 However, it's pretty clear that this need comes from technically capable 
 folks, not from the bulk of users, right? The need from the bulk of 
 users is, at best, "I'd like to access my email offline."

Right, it came from a group of folk who clearly described their problem 
(you can't browse GMail offline, you can't use Google Reader offline, 
every time you query the user's data in the server-side database, you have 
to do a network round-trip) and their requirements (e.g. has to work for 
Web apps, has to work for a variety of application types, has to work 
offline, has to be able to support full-text search, has to be able to 
support structured data, has to be synchronisable).

I have no idea what problem RDFa is trying to solve. I have no idea what 
the requirements are.

If you want this seriously considered for HTML5, please write a clear and 
concise e-mail that explains what the needs are.


 So my question is: what would it take to convince you that we need 
 something more generic than the one-off solutions you and others have 
 been suggesting?

I have no idea what problem you're trying to solve, so it's hard for me to 
answer this question.


  We have to address problems that people know they have, or would agree 
  they have if told they had them, because people won't spend any effort 
  to address problems they don't think they have.
 
 So, just to be clear, how does that link up with SQL-in-browser? When 
 you say people, do you mean web publishers / application builders?

Users. One of the problems, for instance, was that they could not access 
their GMail while offline.


  The word "problem" doesn't appear once in the ccREL paper. Where is 
  the statement of what ccREL is trying to solve?
 
 Well, the exact word doesn't have to appear, does it? Here are the first 
 two sentences of the introduction:
 
 This paper introduces the Creative Commons Rights Expression Language 
 (ccREL), the standard recommended by Creative Commons (CC) for 
 machine-readable expression of copyright licensing terms and related 
 information. ccREL and its description in this paper supersede all 
 previous Creative Commons recommendations for expressing licensing 
 metadata.

That's not a problem statement, sorry. It's a description of what it does, 
but it doesn't say why anyone needs that.


 From this it's pretty clear that we're trying to express copyright 
 licensing information (with all of the sub-fields it implies and all the 
 possible data types we might license) in machine-readable form.

I know _what_ you're trying to do, it's _why_ you're trying to do it that 
matters.


 A one-page answer? That's only possible if you're willing to accept 
 premises like "RDF is a good way to express interoperable data on the 
 web."

The one-page answer should be explaining why you need to express 
interoperable data on the Web in the first place. It shouldn't even 
mention RDF. RDF is part of a proposed solution, it's not part of the 
problem.


 Imagine trying to convince someone about SQL-in-browser when that 
 someone doesn't believe that SQL is the right approach, rather that it 
 should be XML object and XPath. Can you do that in one page?

The point isn't to convince me about the solution. The point is to 
convince me that there is a problem at all, so that we can consider what 
solutions might exist.


 But without a baseline, you're sending me on a fool's errand.

I'm trying my best to explain to you how you can get somewhere here. This 
is a good faith effort at trying to help. If you think my advice is 
somehow intended to waste your time then I can't help you.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-25 Thread Julian Reschke

Ian Hickson wrote:

...
So, just to be clear, how does that link up with SQL-in-browser? When 
you say people, do you mean web publishers / application builders?


Users. One of the problems, for instance, was that they could not access 
their GMail while offline.

...


So, out of curiosity, where did the requirement to be SQL-based come 
from? Were different technologies ever considered? Such as XML, triple 
stores or JCR?


BR, Julian


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-25 Thread Ian Hickson
On Mon, 25 Aug 2008, Julian Reschke wrote:
 Ian Hickson wrote:
  ...
   So, just to be clear, how does that link up with SQL-in-browser? When you
   say people, do you mean web publishers / application builders?
  
  Users. One of the problems, for instance, was that they could not access
  their GMail while offline.
  ...
 
 So, out of curiosity, where did the requirement to be SQL-based come 
 from? Were different technologies ever considered? Such as XML, triple 
 stores or JCR?

Using table storage with a SQL query front-end wasn't a requirement, it 
was a solution, just like using XML, using triple stores, etc could have 
been. It just happened that SQL solved the problem better. For example, 
one of the use cases was the ability to easily use the same kind of data 
model as was being used server-side. We actually tried exposing an XML 
front-end at one point, but implementors didn't want to implement it (see 
the old API for what was called globalStorage at the time).
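
As a rough illustration of the "same kind of data model as was being used server-side" use case, here is a minimal sketch that uses Python's standard sqlite3 module as a stand-in for a client-side SQL store. It is not the HTML5 database API; the table layout, the addresses and the subjects are invented for the example.

```python
import sqlite3

# An in-memory database stands in for the client-side store (assumption).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender TEXT, subject TEXT)"
)
db.executemany(
    "INSERT INTO messages (sender, subject) VALUES (?, ?)",
    [
        ("ben@example.org", "RDFa in HTML5"),
        ("ian@example.org", "Re: RDFa in HTML5"),
        ("dan@example.org", "FOAF and SIOC"),
    ],
)
db.commit()

# Once the rows are local, a query like this needs no network round-trip.
rows = db.execute(
    "SELECT sender FROM messages WHERE subject LIKE ? ORDER BY id",
    ("%RDFa%",),
).fetchall()
```

The same table definition and the same SELECT could in principle run on the server, which is the reuse the paragraph above describes.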

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-23 Thread Ben Adida

[I've been asked to bring this back to the WHATWG list, so I'm doing so
now. For folks who want to look at the beginning of this thread on
www-archive, it begins here:

http://lists.w3.org/Archives/Public/www-archive/2008Aug/0024.html

]

Kristof Zelechovski wrote:
 Forcing metadata into content is an incompatible modification.

So, that would squarely contradict Ian's point that we can already do
this with existing HTML extensibility.

But let's dig in for a second. Incompatible with what? What principle of
HTML or existing feature of HTML would be broken by adding metadata into
content? (Not to mention that Julian is right, the distinction between
metadata and data is often irrelevant.)

Also, does that mean microformats go against the principle of HTML?
After all, they include calendar event markup in the HTML body.

-Ben


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-23 Thread Ben Adida
Ian Hickson wrote:
 On Fri, 22 Aug 2008, Ben Adida wrote:
 cc:attributionName, cc:attributionURL, dc:title, dc:type, dc:date, 
 
 Notice how these are so unique already that you didn't have to give their 
 full names, these short names were enough for everyone to know what you 
 were talking about without risk of clashes.

So, you're looking at the web purely as humans browsing web sites?

I think Dan Brickley described it well, so I'll just point to his answer
and say I agree with it 100%:


Actually we can do a fair bit more than simply have human readable
strings. For example from the CC case, we've got a sub-property
relationship between cc:license and dc:license. RDF often (more often,
even) has relationships amongst classes too, and between classes and
properties. So for example, the SIOC vocabulary defines a class
sioc:User as a subclass of foaf:OnlineAccount; this is mechanically
evident from http://rdfs.org/sioc/ns#

http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2008-August/015933.html

The idea here is to begin to build a web of data, but to do so by simply
sprinkling a bit of metadata onto the existing HTML.

For the web of data to be useful, some amount of automated data
processing has to be possible. We're not simply trying to do
spreadsheets in HTML. Data field names mean things, they can be related
to other field names, etc...
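
To make "field names can be related to other field names" concrete, here is a small sketch of the kind of automated processing a sub-property relationship allows. The sub-property statement (cc:license under dc:license) is the one mentioned in the quoted text; the single triple, the dictionary representation and the function name are assumptions for illustration, not an RDF library API.

```python
# Hypothetical schema statement: cc:license is a sub-property of dc:license.
SUB_PROPERTY_OF = {"cc:license": "dc:license"}

# Hypothetical data: one asserted (subject, property, object) triple.
triples = {
    (
        "http://lessig.org/blog/",
        "cc:license",
        "http://creativecommons.org/licenses/by/3.0/",
    ),
}


def infer_super_properties(triples, sub_property_of):
    """Return the triples plus those implied by sub-property statements."""
    inferred = set(triples)
    pending = list(triples)
    while pending:
        s, p, o = pending.pop()
        parent = sub_property_of.get(p)
        if parent is not None and (s, parent, o) not in inferred:
            inferred.add((s, parent, o))
            pending.append((s, parent, o))
    return inferred


closure = infer_super_properties(triples, SUB_PROPERTY_OF)
```

A consumer that only knows dc:license can then find the license without having heard of the cc: vocabulary.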

-Ben


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-22 Thread Henri Sivonen

On Aug 21, 2008, at 21:53, Ben Adida wrote:
Not to mention that our design approach was specifically tailored to  
be HTML5-friendly.



It really isn't HTML5-friendly, since it depends on the namespace  
mapping context at a node.



Henri Sivonen writes:

and those additions use a Namespace-dependent
anti-pattern, so they aren't portable to HTML.


Namespaces are an anti-pattern, really? Says who?


The anti-pattern I was referring to was qnames-in-content. (But, I'm  
not saying that Namespaces in XML were not themselves an  
anti-pattern. :-)



The web is inherently namespaced. Everything you go to is scoped to a
URL prefix. There isn't one Paris or one New York, there is
wikipedia/paris, and nyc.gov/NewYork.


At least in the case of New York, the settlers had the good sense to  
choose a short disambiguating prefix instead of thinking they were off  
in a different default namespace like Texas and free to reuse local  
names causing problems with global map search usability later.



So is it the : that bothers you? Is that really relevant?


It's not the colon per se, although now that XML and HTML do DOM-wise  
different things with the colon, the colon is trouble for element and  
attribute names.


Here's what bothers me about namespaces:
 1) I need to write namespace URIs several times a day, but the URIs  
aren't memorable. Mistyping an NS URI would waste even more time as  
bugs than looking URIs up for copying and pasting, so I look them up  
for copying and pasting, and it's a huge waste of time.

 2) The indirection layer from prefix to URI confuses people.
 3) Namespaces not inheriting to attributes confuses people. (I have  
had to give a crash course in how namespaces work on W3C telecons and  
f2f meetings! Others have had to do it as well. This point is so  
confusing that people whose job is working on Web specs get it wrong.  
I've been told about a professor teaching a class about XML who got it  
wrong.)
 4) Instead of comparing names against string literals, you have to  
compare two datums against two literals. That is, instead of doing  
"foo-bar".equals(name), you have to do  
"http://www.example.com/2008/08/namespace#".equals(uri) && "bar".equals(localName).
 5) Removing uri,local pairs from XML parsing context makes it hard  
to write the full name in a compact form. Witness the NSResolver  
complications with XPath and Selectors DOM APIs.
 6) That the prefix is semantically not important confuses people who  
go and write uninteroperable software thinking that they should be  
comparing the prefix instead of the URI.
 7) The design of namespaces considers parsing. It doesn't consider  
serialization. Writing an XML serializer that doesn't suck isn't  
trivial, and one will spend most of the development time on dealing  
with Namespaces. (The prefixes aren't important but people still have  
aesthetic opinions about how they should be generated...)
 8) Namespaces dropped the HTML ball a decade ago letting the HTML  
and XML DOMs diverge.
 9) Namespaces stuff their syntax into attributes as opposed to  
having syntax on their own meaning that certain magic attribute names  
need blacklisting both in parsing and in serialization.
10) Namespaces slow down parsing. (By over 20% with Xerces-J and the  
Wikipedia front page!)
11) I've spent *a lot* of time writing code that is Namespace-wise  
excruciatingly correct. Yet, Namespaces have never actually solved a  
problem for me. My software developer friends complain to me about how  
Namespaces cause them grief. No one can remember Namespaces solving a  
real problem. It's like feeding a white elephant.
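
Points 4 and 6 in the list above can be sketched in a few lines of Python. The prefix maps, the example URI and the helper names (expand_qname, is_bar, is_foo_bar) are assumptions made for illustration and do not come from any spec or library.

```python
NS = "http://www.example.com/2008/08/namespace#"

# Hypothetical prefix-to-URI mappings, as two different authors might declare them.
DOC_A = {"ex": NS}
DOC_B = {"foo": NS}


def expand_qname(qname, prefixes):
    """Split 'prefix:local' and return the (uri, local) pair."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix], local


def is_bar(qname, prefixes):
    # Two comparisons against two literals (point 4).
    uri, local = expand_qname(qname, prefixes)
    return uri == NS and local == "bar"


def is_foo_bar(name):
    # A fixed-prefix name needs a single string comparison.
    return name == "foo-bar"


# Different prefixes, same expanded name: the prefix carries no meaning (point 6).
same = expand_qname("ex:bar", DOC_A) == expand_qname("foo:bar", DOC_B)
```

Software that compared the prefix text ("ex" against "foo") would wrongly treat these two names as different.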


Qnames in content have further problems: They complicate APIs and the  
application layer when the mapping context needs to leak to the  
application instead of being a parser-internal thing. Under scripted  
DOM scenarios, there's the issue of the mapping context not getting  
captured at node creation time thereby making the meaning of qnames  
brittle under tree mutations. Finally, serializing XML that *may* have  
qnames in content without the serializer knowing which values are  
qnames (i.e. writing a generic serializer) is complex. (See also the  
TAG finding about problems with digital signatures.)
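
The brittleness under tree mutations can be shown with a toy model. This is only a sketch: the Node class, the lookup rule and the second namespace URI are invented for illustration and are much simpler than a real DOM.

```python
class Node:
    """Toy element: optional prefix declarations plus a parent link."""

    def __init__(self, prefixes=None, parent=None):
        self.prefixes = dict(prefixes or {})
        self.parent = parent

    def lookup(self, prefix):
        # Walk up the ancestors until a declaration for the prefix is found.
        node = self
        while node is not None:
            if prefix in node.prefixes:
                return node.prefixes[prefix]
            node = node.parent
        return None


def resolve(node, qname):
    """Expand a 'prefix:local' attribute value in the node's current context."""
    prefix, _, local = qname.partition(":")
    uri = node.lookup(prefix)
    return None if uri is None else uri + local


# Two hypothetical subtrees that bind the same prefix to different URIs.
a = Node({"cc": "http://creativecommons.org/ns#"})
b = Node({"cc": "http://example.org/other#"})

link = Node(parent=a)  # imagine it carries rel="cc:attributionURL"
before = resolve(link, "cc:attributionURL")

link.parent = b  # a script moves the node; the attribute text is unchanged
after = resolve(link, "cc:attributionURL")
```

The attribute value is the same string before and after the move, yet it expands to two different URIs, because the mapping context was never captured when the node was created.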



Just look at what microformats are forced to do, which is effectively
re-inventing ad-hoc namespaces with - separators.


That's different. When the prefixes are fixed and go inside a name  
token without an indirection layer or without the name becoming a  
tuple, that's fine. You can still do "foo-bar".equals(name).



The "namespaces are bad" argument is the most mind-boggling web-tech
meme I've seen in a while.


It's Namespaces in XML that are bad--not *necessarily* lower-case 'n'  
namespaces. Also, qname-in-content are even worse than just Namespaces  
in XML.



making them to identify which CC
license they mean, making them understand what permissions they are
giving irrevocably to others upon granting a license and 

Re: [whatwg] Creative Commons Rights Expression Language

2008-08-22 Thread Dan Brickley

Bonner, Matt wrote:

On Wed, Aug 20, 2008 at 5:22 PM, Bonner, Matt wrote:

Hola,

I see that the Creative Commons has proposed additions to HTML
to support licenses (ccREL):
http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/
 ...



Tab Atkins Jr. replied:
The whole thing would be best expressed as a microformat, as the
entire thing can be made just as machine- and human-readable without
having to introduce an entire new addition to html.  I think someone
is a little confused about the importance of CC...


then Dan Brickley wrote:

I encourage you to (re)-read
http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/ ... the spec
explains that all of CC's concrete markup requirements are addressed
by the HTML additions in the RDFa spec. It does not propose *any* new
HTML markup to address CC's specific needs. 


(big snip)


In other words, adding 'about', 'property', 'resource', 'datatype' and
'typeof' and a namespace-URI association convention to HTML5 ...


Just so I understand you, are you saying that attributes aren't markup?
Because first you say no new markup, then you list 5 attributes to add.


Ah, sorry for the unclarity. Attributes are markup. The sentence comes 
as a whole: I meant that ccREL proposes no new *CC-specific* attributes 
or elements. They get their job done using general RDFa markup.


Second, the Introduction cites RDFa, which footnote 4 describes as an
emerging collection of attributes and processing rules for extending
XHTML to support RDF.  However, the Introduction text and example go
on to talk about HTML.  Independent of any other discussions, I think it
behooves the authors to clarify their intent. Is this for XHTML, HTML or
both?


Yes, this could be clearer. The group's general line (Ben feel free to 
correct me) is that this attribute-driven markup style is intended to be 
largely neutral of its 'carrier' format, but that RDFa-in-XHTML is the 
only version that is fully specified with implementor tests etc 
underway. For this markup to work in other XML languages would require 
some more work; for it to be deployed in non-XML HTML (HTML5 etc) 
requires even more. But the general notion is that these attributes 
could be deployed in SVG-based, HTML5/6-based etc. languages too, ie. 
that this isn't a project tightly bound to (some specific version of) 
XHTML. Of course in a non-XML context, some other mechanism is needed 
(eg. link rels) to associate abbreviations with URLs.


Also in http://www.w3.org/TR/rdfa-syntax/ (now in CR at W3C, 
http://www.w3.org/TR/2008/CR-rdfa-syntax-20080620/)

[[
RDFa is a specification for attributes to be used with languages such as 
HTML and XHTML to express structured data. [...] This document only 
specifies the use of the RDFa attributes with XHTML.

]]

Does that help?

cheers

Dan

--
http://danbri.org/


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-22 Thread Kristof Zelechovski
12. DOCTYPE declarations have to use prefixes where the corresponding
namespaces are yet undeclared.  The same problem affects external CSS.  This
effectively fixes the prefixes, making the redirection to the URL redundant.
Chris






Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Dan Brickley

+cc: Ben Adida

Tab Atkins Jr. wrote:
On Wed, Aug 20, 2008 at 5:22 PM, Bonner, Matt wrote:


Hola,

I see that the Creative Commons has proposed additions to HTML
to support licenses (ccREL):
http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/

As an example, they offer:

<div about="http://lessig.org/blog/"
     xmlns:cc="http://creativecommons.org/ns#">
   This page, by
   <a property="cc:attributionName" rel="cc:attributionURL"
      href="http://lessig.org/">
      Lawrence Lessig
   </a>,
   is licensed under a
   <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
      Creative Commons Attribution License
   </a>.
</div>

Unless I missed something in the HTML5 spec, at the least this would add
the property attribute to <a>.  Wouldn't ccREL be expressed better
using <link> instead of <a>?
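
For readers who want to see what a consumer could pull out of markup like the example above, here is a minimal sketch using Python's standard html.parser module. It is not an RDFa processor: it only collects the about subject, the xmlns: prefix declarations and the rel/property/href attributes of each a element. The class name and the output layout are assumptions for illustration.

```python
from html.parser import HTMLParser

MARKUP = """
<div about="http://lessig.org/blog/"
     xmlns:cc="http://creativecommons.org/ns#">
  This page, by
  <a property="cc:attributionName" rel="cc:attributionURL"
     href="http://lessig.org/">Lawrence Lessig</a>,
  is licensed under a
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
    Creative Commons Attribution License</a>.
</div>
"""


class LinkAnnotations(HTMLParser):
    """Collect the subject, prefix declarations and annotated links."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.prefixes = {}
        self.links = []  # one (rel, property, href) tuple per <a> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name, value in attrs.items():
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        if "about" in attrs:
            self.subject = attrs["about"]
        if tag == "a":
            self.links.append(
                (attrs.get("rel"), attrs.get("property"), attrs.get("href"))
            )


parser = LinkAnnotations()
parser.feed(MARKUP)
parser.close()
```

To a tolerant HTML parser the property attribute is just another attribute; turning "cc:attributionName" into a full URI still depends on the collected prefix map.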

Matt
--
Matt Bonner
Hewlett-Packard Company


The whole thing would be best expressed as a microformat, as the entire 
thing can be made just as machine- and human-readable without having to 
introduce an entire new addition to html.  I think someone is a little 
confused about the importance of CC...


(Note: the someone is not you, Matt, but the drafters of this proposal.  
Also, I love CC as much as the next guy, but there's absolutely no 
reason to extend html to accommodate it, as everything they want to 
express can be done in existing html and formatted as a microformat.)


I encourage you to (re)-read 
http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/ ... the spec 
explains that all of CC's concrete markup requirements are addressed by 
the HTML additions in the RDFa spec. It does not propose *any* new HTML 
markup to address CC's specific needs. Instead, they're telling the 
world that CC's needs (including their own requirement for independent 
extensions) are well-handled by RDFa.


RDFa adds a set of attributes; 
http://www.w3.org/MarkUp/2008/ED-rdfa-syntax-20080403/#rdfa-attributes 
has a full list. The ccREL spec shows these in an XHTML+RDFa XHTML 
format. There's a strong case to add them to HTML5 too, in my view.


In other words, adding 'about', 'property', 'resource', 'datatype' and 
'typeof' and a namespace-URI association convention to HTML5 wouldn't 
merely be addressing the important needs of the Creative Commons 
community. It would allow the expression of properties defined by any 
decentralised community, without the need for central coordination. This 
includes not just CC, but every group worldwide who are extending and 
customising CC for their own needs. Not just FOAF, but groups extending 
it for modelling forum posts and social media (eg. SIOC), or opensource 
projects (DOAP). Not just Dublin Core, but the huge range of projects 
that extend it to handle educational metadata (which itself varies 
nationally), rights, aggregation, classification etc. The addition of 
the RDFa attributes would allow HTML5 to carry structured data expressed 
in all/any of these vocabularies.


The Microformats.org community have done wonderful work and have 
inspired many others, but it is unfair on them (and unrealistic) to 
pressure their community, mailing lists and wiki by expecting their 
process to be a central bottleneck for all markup extensions to HTML. 
The Web serves a massive and fast growing community, many of whom don't 
speak English and whose markup needs aren't core business for 
Microformats.org. By using RDFa and associating each vocabulary with a 
URI, we can spread the workload a bit more evenly.


Note also that every new vocabulary initiative at Microformats.org 
creates real and non-trivial work for parser writers, as well as work 
for vocabulary authors in specifying what it means to mix each pair of 
vocabularies. For ccREL (and FOAF, Dublin Core, SIOC, DOAP, ...), this 
is largely handled by RDF/RDFa: it can be freely mixed with any other 
RDF vocabulary, and reliably parsed by generic parser code. The tradeoff 
here is that the markup is less hand optimised for beauty than with 
microformats. (When extra-pretty custom markup is important, RDF 
provides GRDDL as a way of using XSLT to specify a mapping into its 
common data model.)


For more on RDFa, see the primer, http://www.w3.org/2006/07/SWD/RDFa/primer/

For a microformat parser that also handles RDFa, see 
http://buzzword.org.uk/cognition/   ... or an RDF toolkit that also 
parses some popular microformats, see http://arc.semsol.org/


For RDFa parsing in Javascript, see 
http://www.w3.org/2006/07/SWD/RDFa/impl/js/


cheers,

Dan

ps. my slides from a recent talk on rdf and microformats are here, if 
anyone's interested. It's more about how enthusiasts from each effort 
can learn from each other, than about the technical detail: 
http://www.slideshare.net/danbri/one-big-happy-family/ via 
http://microformats.eventwax.com/vevent



--
http://danbri.org/




Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Ian Hickson
On Thu, 21 Aug 2008, Dan Brickley wrote:
 
> The Microformats.org community have done wonderful work and have
> inspired many others, but it is unfair on them (and unrealistic) to
> pressure their community, mailing lists and wiki by expecting their
> process to be a central bottleneck for all markup extensions to HTML.

I don't think anyone is suggesting that all such ideas should go through 
the Microformats community. What is being suggested is that instead of 
adding more features to HTML, the people who want to annotate their HTML 
documents with metadata, like Creative Commons, merely use some of the 
many existing HTML extension mechanisms, like class=, rel=, etc.

Microformats.org has shown several things; one is that it is important to 
actually make sure the problem you are solving is one that needs solving, 
another is that it is possible to use the existing HTML extension 
mechanisms to mark up very rich semantic data.
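
As an illustration of the rel= mechanism mentioned above (this sketch is not from the original thread), a consumer could find license links with nothing more than a stock HTML parser. The class and function names below are hypothetical, and the sample markup is an assumption modelled on the ccREL example discussed in this thread:

```python
from html.parser import HTMLParser


class LicenseLinkParser(HTMLParser):
    """Collects href values of <a> and <link> elements whose rel includes 'license'."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel holds a space-separated list of tokens; compare case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rel_tokens and href:
            self.licenses.append(href)


def find_license_links(html_text):
    """Return the license URIs referenced by rel="license" in the given HTML."""
    parser = LicenseLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.licenses


# Hypothetical sample page fragment.
sample = (
    '<p>This page is licensed under a '
    '<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">'
    'Creative Commons Attribution License</a>.</p>'
)
print(find_license_links(sample))
```

This only covers the plain rel="license" case; it makes no attempt to handle attribution names or any other ccREL property.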

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Henri Sivonen

On Aug 21, 2008, at 10:49, Dan Brickley wrote:

> I encourage you to (re)-read http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/
> ... the spec explains that all of CC's concrete markup requirements
> are addressed by the HTML additions in the RDFa spec.


The RDFa spec doesn't make any additions to HTML. It only specifies
additions to XHTML, and those additions use a Namespace-dependent
anti-pattern, so they aren't portable to HTML.


> In other words, adding 'about', 'property', 'resource', 'datatype'
> and 'typeof' and a namespace-URI association convention to HTML5
> wouldn't merely be addressing the important needs of the Creative
> Commons community.


It seems to me that the Creative Commons community has more pressing
needs that aren't related to RDF syntax. Specifically: getting people
to refer to the license URI at all, getting them to identify which CC
license they mean, making them understand what permissions they are
irrevocably giving to others upon granting a license, and making them
understand what licenses used by others mean (NonCommercial, anyone?).
Syntax doesn't solve any of these.


People don't know what they are doing when they flip those Flickr  
settings:

http://diveintomark.org/archives/2008/02/05/writing-with-ease#comment-11272

At least in a non-RDF context, pointing to the license by URI seems  
too hard. See

http://intertwingly.net/blog/2008/02/09/Mashups-Smashups#c1202660852

Also note that even CC leadership omits the license URI. I encourage
you to examine the last frames of the videos at
http://lessig.blip.tv/. The latest video
(http://lessig.blip.tv/file/1185352/) works as an example. Whenever
the last frame acknowledges the use of CC-licensed photos, it doesn't
show the URI of the license. In fact, it doesn't even state in words
or icons *which* CC license the photos were used under!


Getting back to the comment thread on intertwingly.net, a later  
comment contained this gem:

http://intertwingly.net/blog/2008/02/09/Mashups-Smashups#c1202810109
My sarcasm detector isn't quite working, so I can't tell if the  
comment was *meant* to mock RDF, but the follow-up comment is spot on:

http://intertwingly.net/blog/2008/02/09/Mashups-Smashups#c1202870522

> It would allow the expression of properties defined by any
> decentralised community, without the need for central coordination.
> This includes not just CC, but every group worldwide who are
> extending and customising CC for their own needs. Not just FOAF, but
> groups extending it for modelling forum posts and social media (eg.
> SIOC), or opensource projects (DOAP). Not just Dublin Core,


Interesting that you mention Dublin Core. It's a great example of why  
it's bad to just rush embedding an RDF vocabulary into HTML without a  
semantic overlap unification process like the Microformats Process.  
Most of the original DC elements duplicate native metadata facilities  
of HTML and HTTP. There will always be more content using HTML title  
than DC title, so consumers will be better off being able to consume  
HTML title. There will always be more consuming apps for HTML  
title than DC title, so publishers will be better off using HTML  
titles.
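
To make the consumer side of this argument concrete, here is a minimal Python sketch (not from the original thread) of a consumer that reads the native HTML title first and only falls back to a Dublin Core title. It assumes the DC title is embedded as a meta element named "DC.title"; the class and function names are hypothetical:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Records the text of <title> and the content of <meta name="DC.title">."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.html_title = ""
        self.dc_title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            name = (attr_map.get("name") or "").lower()
            # Keep only the first DC.title encountered.
            if name == "dc.title" and self.dc_title is None:
                self.dc_title = attr_map.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.html_title += data


def best_title(html_text):
    """Prefer the native HTML title; fall back to DC.title; else None."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.html_title.strip() or parser.dc_title


# Hypothetical document carrying both kinds of title.
doc = (
    "<html><head><title>Board report</title>"
    '<meta name="DC.title" content="DC Board report"></head></html>'
)
print(best_title(doc))
```

A consumer written this way works on every page that has a title element, whether or not the publisher added any Dublin Core metadata.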


> but the huge range of projects that extend it to handle educational
> metadata (which itself varies nationally), rights, aggregation,
> classification etc. The addition of the RDFa attributes would allow
> HTML5 to carry structured data expressed in all/any of these
> vocabularies.


RDFa including namespace-URI mappings isn't the only possible way to  
accomplish RDF embedding into HTML5. RDFa uses CURIEs which take the  
qnames-in-content anti-pattern and keep digging the hole. I think we  
shouldn't introduce the complexity of Namespaces and qnames-in-content  
to HTML5.


Aside: The TAG has a finding saying that qnames-in-content are  
problematic:

http://www.w3.org/2001/tag/doc/qnameids.html

There's an obvious way RDFa could have been adjusted to avoid the
ills of Namespaces and qnames-in-content: using full URIs instead of
CURIEs. N-Triples demonstrates that RDF triples can be serialized
without a prefix binding layer.
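
The difference can be sketched in a few lines of Python (an illustration, not from the original thread). The helper names are hypothetical, the literal escaping is deliberately simplified, and the triple is an assumption taken from the attribution example in the ccREL submission:

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' to a full URI using the prefix map in scope."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no binding in scope for {curie!r}")
    return prefixes[prefix] + reference


def ntriples_line(subject, predicate, obj, obj_is_uri=True):
    """Serialize one triple N-Triples style: full URIs, no prefix bindings."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Simplified escaping: only backslashes and double quotes are handled.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


# With CURIEs, the consumer needs the prefix map before the name means anything.
prefixes = {"cc": "http://creativecommons.org/ns#"}
predicate = expand_curie("cc:attributionName", prefixes)

# With full URIs, the serialized line is self-contained.
print(ntriples_line("http://lessig.org/blog/", predicate,
                    "Lawrence Lessig", obj_is_uri=False))
```

The expansion step is the only part that depends on context outside the attribute value; once full URIs are written out, that step disappears.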


Even if RDFa were adjusted to use full URIs, there'd still be the  
issue of objections to the additional attributes by people who not  
only think Namespaces are bad but think that embedding RDF in HTML at  
all is bad. I sent an outline of a possible way to route around this  
issue to the HTML WG and xml-dev, but my trial balloon got Warnocked:

http://lists.w3.org/Archives/Public/public-html/2008Aug/0231.html

Note: I'm not suggesting that it would be good for CC to promote  
something as complex as that. I wish CC weren't telling people to use  
RDF with any syntax (or with the NonCommercial license element, but  
that's off-topic here). However, something like the eRDF5 trial  
balloon could work for communities who, unlike CC, aren't trying to  
meet the general public and, therefore, can afford more 

Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Kristof Zelechovski
Can't you just embed your XML metadata in a SCRIPT element?
Chris




Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Kristof Zelechovski
If I understand it correctly, we do not have a problem with the colon as a
namespace separator.  Our problem is that a:x sometimes means the same as
b:x and there is no reasonable way to make legacy browsers support this.
Different URLs, OTOH, are not expected to mean the same thing even if one is
an alias for another.
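
A small Python sketch of the problem (an illustration, not from the original thread; the bindings and helper names are hypothetical): prefixed names cannot be compared as strings, because their meaning depends on whatever bindings happen to be in scope.

```python
def expand(token, bindings):
    """Resolve a 'prefix:name' token against the prefix bindings in scope."""
    prefix, sep, name = token.partition(":")
    if not sep or prefix not in bindings:
        raise ValueError(f"unbound prefix in {token!r}")
    return bindings[prefix] + name


def same_property(token_a, bindings_a, token_b, bindings_b):
    """Two tokens name the same property only if their expansions match."""
    return expand(token_a, bindings_a) == expand(token_b, bindings_b)


# Hypothetical bindings for two documents (or two subtrees of one document).
doc1 = {"a": "http://creativecommons.org/ns#"}
doc2 = {"b": "http://creativecommons.org/ns#", "a": "http://example.org/other#"}

print(same_property("a:license", doc1, "b:license", doc2))  # True: different text, same meaning
print(same_property("a:license", doc1, "a:license", doc2))  # False: same text, different meaning
```

A consumer that matches on the literal attribute text, as legacy scripts and selectors do, gets both of these cases wrong.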
Chris

-----Original Message-----
From: Ben Adida [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 21, 2008 8:53 PM
To: Dan Brickley
Cc: Tab Atkins Jr.; Bonner, Matt; WHAT-WG; Ian Hickson; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: [whatwg] Creative Commons Rights Expression Language


Namespaces are an anti-pattern, really? Says who? The web is inherently
namespaced. Everything you go to is scoped to a URL prefix. There isn't
one Paris or one New York, there is wikipedia/paris, and
nyc.gov/NewYork. So is it the : that bothers you? Is that really relevant?





Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Kristof Zelechovski
I was trying to explain the rejection of namespaces in general because it is
a general decision.  It is not enough to make sure this particular use case
does not cause problems.
AFAIK, you can make a legacy browser that supports custom elements and
scripting display a progress bar.  This probably means you are partially
right: Lynx, NCSA Mosaic and MacWeb cannot render a progress bar element.
Chris

-----Original Message-----
From: Ben Adida [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 21, 2008 9:36 PM
To: Kristof Zelechovski
Cc: 'Dan Brickley'; 'Tab Atkins Jr.'; 'Bonner, Matt'; 'WHAT-WG'; 'Ian
Hickson'; [EMAIL PROTECTED]
Subject: Re: [whatwg] Creative Commons Rights Expression Language

Kristof Zelechovski wrote:
> If I understand it correctly, we do not have a problem with the colon as a
> namespace separator.  Our problem is that a:x sometimes means the same as
> b:x and there is no reasonable way to make legacy browsers support this.

But... legacy browsers have no way to display a Progress Bar either, right?

RDFa does *not* affect how something is rendered. It just tells you what
portions of the page mean what exactly (this is a license, this is a
tag, etc...) So we're okay if legacy browsers don't understand it, they
can simply ignore it. In fact, even new browsers can ignore RDFa,
leaving the job to an extension. But of course, everyone is much better
off if RDFa can be validated in HTML/XHTML.

-Ben



Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Ben Adida
Kristof Zelechovski wrote:
> If I understand it correctly, we do not have a problem with the colon as a
> namespace separator.  Our problem is that a:x sometimes means the same as
> b:x and there is no reasonable way to make legacy browsers support this.

But... legacy browsers have no way to display a Progress Bar either, right?

RDFa does *not* affect how something is rendered. It just tells you what
portions of the page mean what exactly (this is a license, this is a
tag, etc...) So we're okay if legacy browsers don't understand it, they
can simply ignore it. In fact, even new browsers can ignore RDFa,
leaving the job to an extension. But of course, everyone is much better
off if RDFa can be validated in HTML/XHTML.

-Ben



Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Silvia Pfeiffer
Just a little side-track for the video issues around this thread:

On Fri, Aug 22, 2008 at 4:53 AM, Ben Adida [EMAIL PROTECTED] wrote:
> > Also note that even CC leadership omits the license URI.
>
> So you want a URI in the video content itself? What good would that do?

With links directly in the video, copies of the videos will continue
to contain the license, so there is a reason for putting metadata such
as the license inside the video. In fact, RDF inside video would be a
big step forward to deal with the DRM issues around videos.

> With ccREL (and specifically RDFa), the surrounding HTML can easily say
> *this* video is licensed under *that* license.

This is a good solution in the current situation, where there is no
standard video and video annotation format. If there were a standard
video annotation format, the video's DOM could be accessible directly
in the browser, and questions such as "what is the video's license?"
could be answered directly. I think this may come out of the newly
proposed W3C video activity:
http://www.w3.org/QA/2008/04/proposed_activity_for_video_on.html

Regards,
Silvia.


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-21 Thread Ben Adida
Silvia Pfeiffer wrote:
> With links directly in the video, copies of the videos will continue
> to contain the license, so there is a reason for putting metadata such
> as the license inside the video.

Ah yes, in this case I agree: if the metadata were machine-readable,
that would be great. I was talking about the URL not appearing in the
actual human-visible content.

I do hope we see some good standardization on embedded metadata.

-Ben



[whatwg] Creative Commons Rights Expression Language

2008-08-20 Thread Bonner, Matt
Hola,

I see that the Creative Commons has proposed additions to HTML
to support licenses (ccREL): 
http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/

As an example, they offer:

<div about="http://lessig.org/blog/"
     xmlns:cc="http://creativecommons.org/ns#">
  This page, by
  <a property="cc:attributionName" rel="cc:attributionURL"
     href="http://lessig.org/">
    Lawrence Lessig
  </a>,
  is licensed under a
  <a rel="license" href="http://creativecommons.org/licenses/by/3.0/">
    Creative Commons Attribution License
  </a>.
</div>

Unless I missed something in the HTML5 spec, at the least this would add
the property attribute to <a>.  Wouldn't ccREL be expressed better
using <link> instead of <a>?

Matt
--
Matt Bonner
Hewlett-Packard Company




Re: [whatwg] Creative Commons Rights Expression Language

2008-08-20 Thread Ian Hickson
On Wed, 20 Aug 2008, Tab Atkins Jr. wrote:
 
> > Unless I missed something in the HTML5 spec, at the least this would
> > add the property attribute to <a>.  Wouldn't ccREL be expressed
> > better using <link> instead of <a>?
>
> The whole thing would be best expressed as a microformat, as the entire
> thing can be made just as machine- and human-readable without having to
> introduce an entire new addition to html.  I think someone is a little
> confused about the importance of CC...
>
> (Note: the "someone" is not you, Matt, but the drafters of this proposal.
> Also, I love CC as much as the next guy, but there's absolutely no
> reason to extend html to accommodate it, as everything they want to
> express can be done in existing html and formatted as a microformat.)

I tend to agree with Tab here.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] Creative Commons Rights Expression Language

2008-08-20 Thread Karl Dubost


On 21 August 2008 at 07:22, Bonner, Matt wrote:

> I see that the Creative Commons has proposed additions to HTML
> to support licenses (ccREL):
> http://www.w3.org/Submission/2008/SUBM-ccREL-20080501/



Here is a practical implementation of it. Click on the logo at the
bottom, and it returns the license with parsed information from the
initial page.

http://joi.ito.com/weblog/2008/08/06/board-report-fr.html#cc

There is also a CC License Validator (under maintenance at the time
of this email).

http://validator.creativecommons.org/

--
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool








Re: [whatwg] Creative Commons Rights Expression Language

2008-08-20 Thread Bonner, Matt
Ian Hickson wrote:
> On Wed, 20 Aug 2008, Tab Atkins Jr. wrote:
>
> > > Unless I missed something in the HTML5 spec, at the least this would
> > > add the property attribute to <a>.  Wouldn't ccREL be expressed
> > > better using <link> instead of <a>?
> >
> > The whole thing would be best expressed as a microformat, as the
> > entire thing can be made just as machine- and human-readable without
> > having to introduce an entire new addition to html.  I think someone
> > is a little confused about the importance of CC...
> >
> > (Note: the "someone" is not you, Matt, but the drafters of this
> > proposal. Also, I love CC as much as the next guy, but there's
> > absolutely no reason to extend html to accommodate it, as everything
> > they want to express can be done in existing html and formatted as a
> > microformat.)
>
> I tend to agree with Tab here.

Thank you both for not calling me confused.  :-)  I have no
problem with using a micro-format.  That might offer better
extensibility to other file types, too.  Just wanted to make
sure HTML people were aware of the proposal so that any needed
response to it would be timely.

best regards,
Matt
-- 
Matt Bonner
Hewlett-Packard Company

