[whatwg] Trying to work out the problems solved by RDFa

2008-12-31 Thread Ian Hickson

One of the outstanding issues for HTML5 is the question of whether HTML5 
should solve the problem that RDFa solves, e.g. by embedding RDFa straight 
into HTML5, or by some other method.

Before I can determine whether we should solve this problem, and before I 
can evaluate proposals for solving this problem, I need to learn what the 
problem is.

Earlier this year, there was a thread on RDFa on the WHATWG list. Very 
little of the thread focused on describing the problem. This e-mail is an 
attempt to work out what the problem is based on that feedback, on 
discussions at the recent TPAC, and on other research I have done.


On Mon, 25 Aug 2008, Manu Sporny wrote:
 Ian Hickson wrote:
  I have no idea what problem RDFa is trying to solve. I have no idea 
  what the requirements are.
 
 Web browsers currently do not understand the meaning behind human 
 statements or concepts on a web page. If web browsers could understand 
 that a particular page was describing a piece of music, a movie, an 
 event, a person or a product, the browser could then help the user find 
 more information about the particular item in question. It would help 
 automate the browsing experience. Not only would the browsing experience 
 be improved, but search engine indexing quality would be better due to a 
 spider's ability to understand the data on the page with more accuracy.

Let's see if I can rephrase that in terms of requirements.

* Web browsers should be able to help users find information related to 
  the items that the page they are looking at discusses.

* Search engines should be able to determine the contents of pages with 
  more accuracy than today.

Is that right?

Are those the only requirements/problems that RDFa is attempting to 
address? If not, what other requirements are there?


 The Microformats community has done a remarkable job of working on the 
 web semantics problem, creating several different methods of expressing 
 common human concepts (contact information (hCard), events (hCalendar), 
 and audio recordings (hAudio)).

Right; with Microformats, each Microformat has its own problem space and 
thus each one can be evaluated separately. It is much harder to evaluate 
something when the problem space is as generic as it appears RDFa's is.


 The results of the first set of Microformats efforts were some pretty 
 cool applications, like the following one demonstrating how a web 
 browser could forward event information from your PC web browser to your 
 phone via Bluetooth:
 
 http://www.youtube.com/watch?v=azoNnLoJi-4

It's a technically very interesting application. What has the adoption 
rate been like? How does it compare to other solutions to the problem, 
like CalDAV, iCal, or Microsoft Exchange? Do people publish calendar 
events much? There are a lot of Web-based calendar systems, like MobileMe 
or WebCalendar. Do people expose data on their Web page that can be used 
to import calendar data to these systems?


 Here is another demonstration of how one could use music metadata 
 embedded in a web page to find more information about your favorite 
 band:
 
 http://www.youtube.com/watch?v=oPWNgZ4peuI

There are two main demos in that video.

The first one shows a way to solve the problem of getting all the sample 
tracks from a bitmunk page. Here are the steps that the video shows:

 * Go to the bitmunk Web page.
 * Notice that the Web page has a music note icon in the location bar.
 * Click that icon, and then select the album from the drop down menu.
 * Click the Get Sample button on the auto-generated dialog.

Here are the steps that users do today to solve the same problem:

 * Go to the bitmunk Web page.
 * Click the Play all samples link.


The second demo shows how to solve the problem of getting data out of a 
poorly written page. However, the example seems contrived; why would an 
author manage to write accurate RDFa statements but fail so utterly to 
write a usable Web page otherwise?

Also, the example goes on to show how given some RDFa, one can do a custom 
search on another site without having to type in any search keywords. But 
that is already possible without RDFa; for example, one can select any 
text on Mac OS X and search for that string in Google ([Start Wearing 
Purple] returns a number of hits for lyrics, videos, tabs, etc. about the 
song; [Start Wearing Purple Gogol] returns even more). IE8 has even more 
detailed features along these lines: select some text and you get an 
accelerator menu which can be extended to include whatever searches or 
tools you want to use.

So it's not clear that RDFa solves this particular problem better than 
other existing solutions, and in particular, it is not clear that in the 
case actually put forward by that video -- namely, a poorly written page 
-- that RDFa would be able to solve the problem at all, whereas the other 
solutions of today would not be hampered by poor markup.


 or how one could use movie metadata on a web page to find 

[whatwg] asynchronous data providers

2008-12-31 Thread Alex Russell
Hello,

As per a discussion with Ian on IRC, several issues jumped out at me
when looking over the proposed data provider APIs for the datagrid
tag (DataGridDataProvider):

  * most of the APIs for providing data are synchronous, implying that
the entire data set be local or that systems that want to do something
smarter must attempt to block (synchronous XHR, e.g.). In the case of
some forms of network request, this may not even be possible (e.g.,
JSON-P requests for x-domain data). Either assumption (local data or
blocking network I/O) poses a challenge to efficiently handling very
large data sets.
  * the data provider does not issue requests for rows as a block.
Instead, it passes an individual rowspec to each call of getCellData.
This makes it difficult for smart providers to bundle requests for
data in a particular range (assuming network I/O).
  * functions seem to be called to provide the results of editing for
a particular data item (editCell(...)), but no event is thrown on the
grid to implement custom value editors and it's not clear how to plug
into the grid to inform it that editing has finished.
  * the data provider API expects a real answer about how many
children a row may have (getRowCount(row)), but in the case of a
deeply nested tree and a lazy-loading data provider, this information
isn't likely to be available up-front.

These concerns stem from real-world experience with the Dojo Grid
component and the abstract data store system (dojo.data) that backs it
and allows it to handle tens of thousands of rows efficiently.

The design of that system was adapted to these needs by stipulating that:

 * data providers must always inform grids of how many rows they will
show *in total* for a particular query, even if they only return a
fraction of those rows at a time.
 * access to rows be in the form of ranges (start offset and count)
inside the # of possible returned items at any level.
 * to make programming to the system sane, property access (cell value
fetching from a particular row) is synchronous
 * all other operations are asynchronous, based on the Deferred class
found in Twisted Python, MochiKit, and Dojo. Such a promise to return
data later makes programming to asynchronous systems somewhat easier.
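
The stipulations above can be sketched roughly as follows. This is only an illustration: RangeDataProvider, fetchRange and getValue are hypothetical names that appear neither in the datagrid proposal nor in dojo.data, a native Promise stands in for a Deferred, and setTimeout stands in for network I/O.

```javascript
// Hypothetical range-based provider; none of these names come from the
// datagrid proposal or from dojo.data.
class RangeDataProvider {
  constructor(rows) {
    this.rows = rows; // stands in for a remote data set
  }

  // Asynchronous: resolves with the total row count for the query plus the
  // requested slice, so a grid can size itself before all rows arrive.
  fetchRange(start, count) {
    return new Promise((resolve) => {
      // setTimeout simulates network latency.
      setTimeout(() => {
        resolve({
          total: this.rows.length,
          start,
          items: this.rows.slice(start, start + count),
        });
      }, 0);
    });
  }

  // Synchronous: reading a cell from a row that has already been fetched.
  getValue(item, column) {
    return item[column];
  }
}

const provider = new RangeDataProvider(
  Array.from({ length: 10000 }, (_, i) => ({ id: i, name: "row " + i }))
);

// Ask for rows 100..102 only; the answer still reports all 10000 rows.
const firstPage = provider.fetchRange(100, 3).then((result) => {
  const names = result.items.map((item) => provider.getValue(item, "name"));
  console.log(result.total, names);
  return result;
});
```

The grid asks for a block of rows in one call, which gives a network-backed provider the chance to bundle the request, while per-cell reads stay synchronous once the rows are local.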

Regards


Re: [whatwg] number-related feedback

2008-12-31 Thread Jonas Sicking
On Wed, Dec 31, 2008 at 4:59 AM, Ian Hickson i...@hixie.ch wrote:
 On Wed, 31 Dec 2008, Jonas Sicking wrote:
 On Wed, Dec 31, 2008 at 3:17 AM, Jonas Sicking jo...@sicking.cc wrote:
  On Mon, Dec 29, 2008 at 11:37 AM, Ian Hickson i...@hixie.ch wrote:
  On Fri, 22 Aug 2008, Shannon wrote:
  Either way I would recommend making a decision on minimum and maximum
  integer values and using them consistently. If not I can imagine the
  rapid adoption of 64-bit systems will cause unexpected errors when the
  same code is run on older 32-bit systems. There are valid arguments for
  letting each system use its native integer but if this is the case then
  perhaps the spec should require MIN_INT and MAX_INT be made available as
  constants.
 
  ECMAScript does define a range, and the limits of that range are exposed
  to scripts. Are there cases where there are non-script limits that would
  benefit from being exposed? Use cases would be helpful here.
 
  I thought ECMAScript defined the value to be an IEEE 754 64-bit float.

 Ah, sorry, I missed that you didn't have a 'not' in your response :)

 There are in fact interop issues given the fact that ECMAScript allows
 for a range bigger than a 32bit integer can fit. For example you could
 do

 myInput.maxLength = 50;

 This is within the bounds and precision of ECMAScript, but won't
 work in a 32bit integer implementation.
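
The mismatch can be illustrated with a short sketch. The value 2^40 is an arbitrary choice for illustration, not one taken from the thread, and an Int32Array element is used only as a stand-in for a 32-bit integer field in an implementation.

```javascript
// 2^40 is an arbitrary illustrative value, not one taken from the thread.
const big = Math.pow(2, 40);

// An IEEE 754 double represents every integer up to 2^53 exactly.
const exactInDouble = Number.isSafeInteger(big + 5);

// Int32Array stores values modulo 2^32, as a signed 32-bit field would.
const asInt32 = new Int32Array([big + 5])[0];

// A value that fits survives the round trip unchanged.
const small = new Int32Array([50])[0];

console.log(exactInDouble, asInt32, small);
```

The large value is exact as an ECMAScript number, yet only its low 32 bits survive once it is stored in a 32-bit integer.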

 WebIDL defines how to handle that, though, right? (Each DOM attribute has
 an explicit bit width.) The problem, if there is a problem, would be with
 the content attribute alone.

So how would something like

input maxlength=50

be parsed? Is it defined in terms of setting the .maxLength DOM
attribute, so that its behavior depends on what WebIDL says? Or
something else?

/ Jonas


Re: [whatwg] Spellchecking mark III

2008-12-31 Thread Maciej Stachowiak


On Dec 30, 2008, at 7:20 AM, Kornel Lesiński wrote:



On 30.12.2008, at 13:45, Geoffrey Sneddon wrote:


I have therefore not added this feature to HTML5 for the time  
being. If

there is more interest in this feature, please speak up.


This seems stupid. If I want to have spell-checking, let me. Don't  
force it off. I don't see any reason to have it forced off, ever.



It's useful for fields that contain non-textual content, e.g.  
product ID, license plate number, CAPTCHA answer, etc.
Browser would mark these as misspelt, which might be confusing or at  
least distracting.


It does make sense I guess, that certain fields should not be subject  
to automatic spellchecking. However, three counterpoints:


1) At least Safari's spellchecking won't mark a word misspelled until  
you hit a space; fields that contain data which would be flagged by  
the spellchecker but which are also likely to contain internal  
whitespace are rare.


2) The proposal Hixie linked seems way overengineered for this  
purpose. First, it allows spellchecking to be explicitly turned on,  
potentially overriding normal defaults, but that seems wrong; an  
input type=email should never spellcheck regardless of what the page  
author says. I can't see any valid use case for the author turning  
spellchecking on regardless of UA defaults or user preferences.  
Second, it allows spellchecking to be controlled at a finer  
granularity than editability, for which again I think there is no  
valid use case. Both of these aspects make the feature more  
complicated to implement and harder to understand, compared to just  
having a way to only disable spellchecking at the same granularity as  
editing.


In general it would be helpful if some of the Google folks who  
requested this feature and some of the Chrome folks who (apparently)  
implemented it could explain the actual use cases they had in mind.


Regards,
Maciej



Re: [whatwg] Spellchecking mark III

2008-12-31 Thread Kornel Lesiński

On 31.12.2008, at 15:15, Maciej Stachowiak wrote:

It does make sense I guess, that certain fields should not be  
subject to automatic spellchecking. However, three counterpoints:


1) At least Safari's spellchecking won't mark a word misspelled  
until you hit a space; fields that contain data which would be  
flagged by the spellchecker but which are also likely to contain  
internal whitespace are rare.


In WebKit, spellchecking is also done when the field loses focus, so even  
single-word fields would be flagged.


2) The proposal Hixie linked seems way overengineered for this  
purpose. First, it allows spellchecking to be explicitly turned on,  
potentially overriding normal defaults, but that seems wrong; an  
input type=email should never spellcheck regardless of what the page  
author says. I can't see any valid use case for the author turning  
spellchecking on regardless of UA defaults or user preferences.  
Second, it allows spellchecking to be controlled at a finer  
granularity than editability, for which again I think there is no  
valid use case. Both of these aspects make the feature more  
complicated to implement and harder to understand, compared to just  
having a way to only disable spellchecking at the same granularity  
as editing.


I don't like the current proposal either, because a true/false value is  
inconsistent with other boolean attributes in HTML. IMHO it should be  
nospellcheck=nospellcheck (which also solves problem of forcing  
spellchecking where it doesn't make sense).


--
regards, Kornel





Re: [whatwg] Spellchecking mark III

2008-12-31 Thread Robert O'Callahan
On Thu, Jan 1, 2009 at 4:15 AM, Maciej Stachowiak m...@apple.com wrote:

 2) The proposal Hixie linked seems way overengineered for this purpose.
 First, it allows spellchecking to be explicitly turned on, potentially
 overriding normal defaults, but that seems wrong; an input type=email
 should never spellcheck regardless of what the page author says. I can't see any
 valid use case for the author turning spellchecking on regardless of UA
 defaults or user preferences.


It allows you to have a region of text where spellchecking is disabled via
the spellcheck attribute, but containing subregions where spellchecking is
enabled.

Second, it allows spellchecking to be controlled at a finer granularity than
 editability, for which again I think there is no valid use case. Both of
 these aspects make the feature more complicated to implement and harder to
 understand, compared to just having a way to only disable spellchecking at
 the same granularity as editing.


A use case is editable program code, where spellchecking is disabled, but
where spellchecking is enabled inside comments. Maybe that sounds a little
far-fetched for today's Web applications, but some IDEs (e.g. Eclipse)
support this so it seems like something we'd want in the future.
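
A rough sketch of how such nested regions could resolve, assuming the nearest explicit setting wins and anything unset falls back to the UA default. Plain objects stand in for DOM nodes here, and effectiveSpellcheck is a hypothetical helper, not part of the proposal.

```javascript
// Plain objects stand in for DOM nodes; `spellcheck` is true, false, or
// undefined (meaning "inherit"). The inheritance rule is an assumption.
function effectiveSpellcheck(node, uaDefault) {
  for (let n = node; n; n = n.parent) {
    if (n.spellcheck !== undefined) return n.spellcheck;
  }
  return uaDefault;
}

// An editable code region with checking off, containing a comment with
// checking switched back on.
const editor = { name: "code", spellcheck: false, parent: null };
const comment = { name: "comment", spellcheck: true, parent: editor };
const identifier = { name: "identifier", parent: editor };
const word = { name: "word", parent: comment };

for (const node of [editor, identifier, comment, word]) {
  console.log(node.name, effectiveSpellcheck(node, true));
}
```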

Rob
-- 
He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all. [Isaiah
53:5-6]


Re: [whatwg] Spellchecking mark III

2008-12-31 Thread timeless
On Wed, Dec 31, 2008 at 3:22 AM, Robert O'Callahan rob...@ocallahan.org wrote:
 That handles some cases, but not others --- e.g. text boxes that contain
 program code.

I run spell checkers on code blocks.

the number of misspellings that could have been avoided by using them 

they're actually useful for spellcheckers.

and for slashdot's really lame captcha they help there too


Re: [whatwg] Spellchecking mark III

2008-12-31 Thread timeless
2008/12/30 Giovanni Campagna scampa.giova...@gmail.com:
 maybe we could just say that spellchecking is disabled when type is not text
 (for email, uri and number you have validation) and when a pattern attribute
 is specified

Personally, if I were to write Gionvanni Campagna into a multiline
text field, I'd like it to match the thing that i wrote into the email
field (it turns out that I've managed to misspell your name, I'm
sorry, but that's the point). So ideally the system which i use to
spell check would be able to share information with my contacts and
would also enable me to teach it spelling based on the email address
fields.


Re: [whatwg] number-related feedback

2008-12-31 Thread Ian Hickson
On Wed, 31 Dec 2008, Jonas Sicking wrote:
 
 So how would something like
 
 input maxlength=50
 
 be parsed? Is it defined in terms of setting the .maxLength DOM 
 attribute, so that its behavior depends on what WebIDL says? Or 
 something else?

The UA would set a limit on the value it accepts for maxlength=, and 
then cap the result at that, preventing someone from entering more than 
4GB (or 2GB, or 4TB, or whatever limit the UA has). Does that answer your 
question? In practice I would expect other limitations to come into play 
long before a test for this limit could be triggered.
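
As a sketch of the capping behaviour described above: UA_MAX_LENGTH and effectiveMaxLength are made-up names, the limit is arbitrary, and the digit check is a simplification rather than the spec's parsing rules.

```javascript
// UA_MAX_LENGTH is a made-up limit; the actual value is left to each UA.
const UA_MAX_LENGTH = Math.pow(2, 32) - 1;

function effectiveMaxLength(attributeValue) {
  // Simplification: accept only a run of ASCII digits; anything else is
  // treated as "no limit set".
  if (!/^[0-9]+$/.test(attributeValue)) return null;
  // Cap oversized values at the UA limit instead of rejecting them.
  return Math.min(Number(attributeValue), UA_MAX_LENGTH);
}

console.log(effectiveMaxLength("50"));
console.log(effectiveMaxLength("99999999999999999999") === UA_MAX_LENGTH);
```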

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'


Re: [whatwg] number-related feedback

2008-12-31 Thread Cameron McCormack
Hi Ian, Jonas.

Ian Hickson:
 The UA would set a limit on the value it accepts for maxlength=, and 
 then cap the result at that, preventing someone from entering more than 
 4GB (or 2GB, or 4TB, or whatever limit the UA has). Does that answer your 
 question? In practice I would expect other limitations to come into play 
 long before a test for this limit could be triggered.

I don’t think it does answer the question, since you need to know what
happens if you do:

  e.setAttribute('maxlength', '50');
  alert(e.maxLength);

The text currently in the spec isn’t clear:

  If a reflecting DOM attribute is an unsigned integer type (unsigned
  long) then, on getting, the content attribute must be parsed according
  to rules for parsing non-negative integers, and if that is successful,
  the resulting value must be returned. If, on the other hand, it fails,
  or if the attribute is absent, the default value must be returned
  instead, or 0 if there is no default value.

The “rules for parsing non-negative integers” algorithm can return any
non-negative integer.  Web IDL doesn’t define what to do if a spec
defines an operation to return a value that is not a member of its
return type.  I’d classify that as a bug in the description of
reflecting DOM attributes.

I suggest to reword that paragraph to something like the following:

  If a reflecting DOM attribute is an unsigned integer type (unsigned
  long) then, on getting, the content attribute must be parsed according
  to rules for parsing non-negative integers, and if that successfully
  returns a value in the range of an unsigned long, that resulting value
  must be returned. If, on the other hand, it fails, returns an out of
  range value, or if the attribute is absent, the default value must be
  returned instead, or 0 if there is no default value.

Similar wording would be needed for other paragraphs in this section.
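
A minimal sketch of the getter behaviour in the suggested wording. The names are hypothetical, and parseNonNegativeInteger is a simplified stand-in for the spec's "rules for parsing non-negative integers", not a faithful implementation of them.

```javascript
const UNSIGNED_LONG_MAX = 4294967295; // 2^32 - 1

// Simplified stand-in for the spec's "rules for parsing non-negative
// integers": skip leading whitespace, then read the leading ASCII digits.
function parseNonNegativeInteger(input) {
  const match = /^\s*([0-9]+)/.exec(input);
  return match ? Number(match[1]) : null; // null signals a parse failure
}

// contentAttribute is the attribute's string value, or null when absent.
function reflectUnsignedLong(contentAttribute, defaultValue = 0) {
  if (contentAttribute === null) return defaultValue; // attribute absent
  const parsed = parseNonNegativeInteger(contentAttribute);
  // Parse failure and out-of-range values both fall back to the default.
  if (parsed === null || parsed > UNSIGNED_LONG_MAX) return defaultValue;
  return parsed;
}

console.log(reflectUnsignedLong("50"));
console.log(reflectUnsignedLong("5000000000"));
console.log(reflectUnsignedLong(null, 7));
```

Under this reading an out-of-range content attribute is treated the same way as an unparsable one, so the getter never has to return a value outside its declared type.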

-- 
Cameron McCormack ≝ http://mcc.id.au/


Re: [whatwg] Spellchecking mark III

2008-12-31 Thread Maciej Stachowiak


On Dec 31, 2008, at 12:26 PM, Robert O'Callahan wrote:

On Thu, Jan 1, 2009 at 4:15 AM, Maciej Stachowiak m...@apple.com  
wrote:
2) The proposal Hixie linked seems way overengineered for this  
purpose. First, it allows spellchecking to be explicitly turned on,  
potentially overriding normal defaults, but that seems wrong; an  
input type=email should never spellcheck regardless of what the page  
author says. I can't see any valid use case for the author turning  
spellchecking on regardless of UA defaults or user preferences.


It allows you to have a region of text where spellchecking is  
disabled via the spellcheck attribute, but containing subregions  
where spellchecking is enabled.


It seems to me you would have to have a lot of custom code to maintain  
the boundaries between such regions during editing operations for this  
to ever work right. Normal text editing would easily lead to text  
moving across the boundaries. There would have to be strong motivating  
examples to justify such a hard-to-use feature.




Second, it allows spellchecking to be controlled at a finer  
granularity than editability, for which again I think there is no  
valid use case. Both of these aspects make the feature more  
complicated to implement and harder to understand, compared to just  
having a way to only disable spellchecking at the same granularity  
as editing.


A use case is editable program code, where spellchecking is  
disabled, but where spellchecking is enabled inside comments. Maybe  
that sounds a little far-fetched for today's Web applications, but  
some IDEs (e.g. Eclipse) support this so it seems like something  
we'd want in the future.



This sounds like a pretty ill-conceived feature. It is very common for  
comments to include code, or fragments of code (such as variable  
names) mixed with natural language. (I was unable to find any evidence  
of spellchecking comments in the copy of Eclipse I downloaded, so I  
can't comment on the details.)


Furthermore, other IDEs generally don't attempt to do this, and I  
can't think of other application categories that would do something  
similar.


So I don't think this makes for a very compelling use case. It's like  
arguing for a page layout feature based on something only WordPerfect  
does.


Regards,
Maciej



Re: [whatwg] Spellchecking mark III

2008-12-31 Thread Robert O'Callahan
On Thu, Jan 1, 2009 at 2:04 PM, Maciej Stachowiak m...@apple.com wrote:

 On Dec 31, 2008, at 12:26 PM, Robert O'Callahan wrote:

 A use case is editable program code, where spellchecking is disabled, but
 where spellchecking is enabled inside comments. Maybe that sounds a little
 far-fetched for today's Web applications, but some IDEs (e.g. Eclipse)
 support this so it seems like something we'd want in the future.

 This sounds like a pretty ill-conceived feature. It is very common for
 comments to include code, or fragments of code (such as variable names)
 mixed with natural language. (I was unable to find any evidence of
 spellchecking comments in the copy of Eclipse I downloaded, so I can't
 comment on the details.)


OK. It's there, though.

Furthermore, other IDEs generally don't attempt to do this, and I can't
 think of other application categories that would do something similar.


Seems to me that an HTML source view with spellchecking of the non-markup
text would be useful.

For what it's worth, it seemed easy to implement the general spellcheck
behaviour in Gecko, once we'd decided to allow any author spellcheck control
at all (you seem to have agreed that spellcheck=no is useful). But I
really don't feel strongly one way or the other. Peter Kasting or Brett
Wilson should speak up.

Rob
-- 
He was pierced for our transgressions, he was crushed for our iniquities;
the punishment that brought us peace was upon him, and by his wounds we are
healed. We all, like sheep, have gone astray, each of us has turned to his
own way; and the LORD has laid on him the iniquity of us all. [Isaiah
53:5-6]


Re: [whatwg] Trying to work out the problems solved by RDFa

2008-12-31 Thread Charles McCathieNevile

Summary:

I believe that there are use cases for RDFa - and that they are precisely  
the sort of thing that Yahoo, Google, Ask, and their ilk are not going to  
be interested in, since they are based on solving problems that those  
search engines do not efficiently solve, such as (among others) using  
private data or dealing with trustworthy data to answer very specific  
questions automatically.


If Ian needs to understand the Semantic Web Industry and why people have  
invested in the RDFa proposal, then it is important to identify the right  
questions, and having him alone identify the sub-questions when he doesn't  
understand the issue isn't going to help him make a well-informed decision.


Some of Ian's questions are discussed here. I cut the mail short since I  
think it is already too long for many people, which means that the debate  
will simply pass without their reading or input.


On Wed, 31 Dec 2008 20:46:01 +1100, Ian Hickson i...@hixie.ch wrote:


One of the outstanding issues for HTML5 is the question of whether HTML5
should solve the problem that RDFa solves, e.g. by embedding RDFa

...

Before I can determine whether we should solve this problem, and before I
can evaluate proposals for solving this problem, I need to learn what the
problem is.

Earlier this year, there was a thread on RDFa on the WHATWG list. Very
little of the thread focused on describing the problem. This e-mail is an
attempt to work out what the problem is based on that feedback, on
discussions at the recent TPAC, and on other research I have done.


On Mon, 25 Aug 2008, Manu Sporny wrote:

Ian Hickson wrote:
 I have no idea what problem RDFa is trying to solve. I have no idea
 what the requirements are.

Web browsers currently do not understand the meaning behind human
statements or concepts on a web page. If web browsers could understand
that a particular page was describing a piece of music, a movie, an
event, a person or a product, the browser could then help the user find
more information about the particular item in question. It would help
automate the browsing experience. Not only would the browsing experience
be improved, but search engine indexing quality would be better due to a
spider's ability to understand the data on the page with more accuracy.


Let's see if I can rephrase that in terms of requirements.

* Web browsers should be able to help users find information related to
  the items that the page they are looking at discusses.

* Search engines should be able to determine the contents of pages with
  more accuracy than today.

Is that right?

Are those the only requirements/problems that RDFa is attempting to
address? If not, what other requirements are there?


I don't think so. I think there are some other requirements:

A standard way to include arbitrary data in a web page and extract it for  
machine processing, without having to pre-coordinate their data models.


Since many people use RDF as an interchange, storage and processing format  
for this kind of data (because it provides for automated mapping of data  
from one schema to many others, without requiring anyone to touch the  
original schemata or agree in advance how they should be created), I  
believe there is a requirement for a method that allows third parties to  
include RDF data in, and extract it from, information encoded within an  
HTML page.
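
To make the "no pre-coordination" point concrete, here is a very rough sketch. It uses plain arrays rather than any RDF library, every URI is invented, and a lookup table stands in for real schema mapping; mergeTriples and objectsOf are hypothetical helpers.

```javascript
// Triples are [subject, predicate, object]. Every URI here is invented.
const fromSiteA = [
  ["http://example.org/album/1", "http://a.example/vocab#title", "First Album"],
];
const fromSiteB = [
  ["http://example.org/album/2", "http://b.example/terms#name", "Second Album"],
];

// Declared after the fact by a third party; neither publisher changed
// anything or agreed on a schema in advance.
const sameAs = new Map([
  ["http://b.example/terms#name", "http://a.example/vocab#title"],
]);

// Rewrite predicates to one vocabulary so the data can be queried together.
function mergeTriples(...sources) {
  return sources.flat().map(([s, p, o]) => [s, sameAs.get(p) || p, o]);
}

// Collect the objects of every triple that uses the given predicate.
function objectsOf(triples, predicate) {
  return triples.filter(([, p]) => p === predicate).map(([, , o]) => o);
}

const merged = mergeTriples(fromSiteA, fromSiteB);
const titles = objectsOf(merged, "http://a.example/vocab#title");
console.log(titles);
```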



The Microformats community has done a remarkable job of working on the
web semantics problem, creating several different methods of expressing
common human concepts (contact information (hCard), events (hCalendar),
and audio recordings (hAudio)).


Right; with Microformats, each Microformat has its own problem space and
thus each one can be evaluated separately. It is much harder to evaluate
something when the problem space is as generic as it appears RDFa's is.


The point is that there is a very large set of very small problem spaces  
relevant to a small group at a time. Like RDF itself, RDFa is meeting the  
problem of allowing these people to share machine-processable data without  
previously coordinating their approach.



The results of the first set of Microformats efforts were some pretty
cool applications, like the following one demonstrating how a web
browser could forward event information from your PC web browser to your
phone via Bluetooth:

http://www.youtube.com/watch?v=azoNnLoJi-4


It's a technically very interesting application. What has the adoption
rate been like? How does it compare to other solutions to the problem,
like CalDAV, iCal, or Microsoft Exchange? Do people publish calendar
events much? There are a lot of Web-based calendar systems, like MobileMe
or WebCalendar. Do people expose data on their Web page that can be used
to import calendar data to these systems?


In some cases this data is indeed exposed to Webpages. However, anecdotal  
evidence (which unfortunately is all that is available when trying to  
study the enormous collections of data