[whatwg] Trying to work out the problems solved by RDFa
One of the outstanding issues for HTML5 is the question of whether HTML5 should solve the problem that RDFa solves, e.g. by embedding RDFa straight into HTML5, or by some other method. Before I can determine whether we should solve this problem, and before I can evaluate proposals for solving this problem, I need to learn what the problem is. Earlier this year, there was a thread on RDFa on the WHATWG list. Very little of the thread focused on describing the problem. This e-mail is an attempt to work out what the problem is based on that feedback, on discussions at the recent TPAC, and on other research I have done. On Mon, 25 Aug 2008, Manu Sporny wrote: Ian Hickson wrote: I have no idea what problem RDFa is trying to solve. I have no idea what the requirements are. Web browsers currently do not understand the meaning behind human statements or concepts on a web page. If web browsers could understand that a particular page was describing a piece of music, a movie, an event, a person or a product, the browser could then help the user find more information about the particular item in question. It would help automate the browsing experience. Not only would the browsing experience be improved, but search engine indexing quality would be better due to a spider's ability to understand the data on the page with more accuracy. Let's see if I can rephrase that in terms of requirements. * Web browsers should be able to help users find information related to the items that the page they are looking at discusses. * Search engines should be able to determine the contents of pages with more accuracy than today. Is that right? Are those the only requirements/problems that RDFa is attempting to address? If not, what other requirements are there?
The Microformats community has done a remarkable job of working on the web semantics problem, creating several different methods of expressing common human concepts (contact information (hCard), events (hCalendar), and audio recordings (hAudio)). Right; with Microformats, each Microformat has its own problem space and thus each one can be evaluated separately. It is much harder to evaluate something when the problem space is as generic as it appears RDFa's is. The results of the first set of Microformats efforts were some pretty cool applications, like the following one demonstrating how a web browser could forward event information from your PC web browser to your phone via Bluetooth: http://www.youtube.com/watch?v=azoNnLoJi-4 It's a technically very interesting application. What has the adoption rate been like? How does it compare to other solutions to the problem, like CalDav, iCal, or Microsoft Exchange? Do people publish calendar events much? There are a lot of Web-based calendar systems, like MobileMe or WebCalendar. Do people expose data on their Web page that can be used to import calendar data to these systems? Here is another demonstration of how one could use music metadata embedded in a web page to find more information about your favorite band: http://www.youtube.com/watch?v=oPWNgZ4peuI There are two main demos in that video. The first one shows a way to solve the problem of getting all the sample tracks from a bitmunk page. Here are the steps that the video shows: * Go to the bitmunk Web page. * Notice that the Web page has a music note icon in the location bar. * Click that icon, and then select the album from the drop down menu. * Click the Get Sample button on the auto-generated dialog. Here are the steps that users do today to solve the same problem: * Go to the bitmunk Web page. * Click the Play all samples link. The second demo shows how to solve the problem of getting data out of a poorly written page. 
However, the example seems contrived; why would an author manage to write accurate RDFa statements but fail so utterly to write a usable Web page otherwise? Also, the example goes on to show how, given some RDFa, one can do a custom search on another site without having to type in any search keywords. But that is already possible without RDFa; for example, one can select any text on Mac OS X and search for that string in Google ([Start Wearing Purple] returns a number of hits for lyrics, videos, tabs, etc. about the song; [Start Wearing Purple Gogol] returns even more). IE8 has even more detailed features along these lines: select some text and you get an accelerator menu which can be extended to include whatever searches or tools you want to use. So it's not clear that RDFa solves this particular problem better than other existing solutions, and in particular, it is not clear that, in the case actually put forward by that video -- namely, a poorly written page -- RDFa would be able to solve the problem at all, whereas the other solutions of today would not be hampered by poor markup. or how one could use movie metadata on a web page to find
[whatwg] asynchronous data providers
Hello,

As per a discussion with Ian on IRC, several issues jumped out at me when looking over the proposed data provider APIs for the datagrid tag (DataGridDataProvider):

* most of the APIs for providing data are synchronous, implying that the entire data set be local or that systems that want to do something smarter must attempt to block (synchronous XHR, e.g.). In the case of some forms of network request, this may not even be possible (e.g., JSON-P requests for x-domain data). Either assumption (local data or blocking network I/O) poses a challenge to efficiently handling very large data sets.
* the data provider does not issue requests for rows as a block. Instead, it passes an individual rowspec to each call of getCellData. This makes it difficult for smart providers to bundle requests for data in a particular range (assuming network I/O).
* functions seem to be called to provide the results of editing for a particular data item (editCell(...)), but no event is thrown on the grid to implement custom value editors and it's not clear how to plug into the grid to inform it that editing has finished.
* the data provider API expects a real answer about how many children a row may have (getRowCount(row)), but in the case of a deeply nested tree and a lazy-loading data provider, this information isn't likely to be available up-front.

These concerns stem from real-world experience with the Dojo Grid component and the abstract data store system (dojo.data) that backs it and allows it to handle tens of thousands of rows efficiently. The design of that system was adapted to these needs by stipulating that:

* data providers must always inform grids of how many rows they will show *in total* for a particular query, even if they only return a fraction of those rows at a time.
* access to rows be in the form of ranges (start offset and count) inside the number of possible returned items at any level.
* to make programming to the system sane, property access (cell value fetching from a particular row) is synchronous
* all other operations are asynchronous, based on the Deferred class found in Twisted Python, MochiKit, and Dojo. Such a promise to return data later makes programming to asynchronous systems somewhat easier.

Regards
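The stipulations above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name RangeDataProvider, the method fetchRange, and the in-memory backingRows array are all hypothetical and are not part of the datagrid proposal or of dojo.data, and a native Promise stands in for the Deferred class mentioned above.

```javascript
// Hypothetical lazy provider: names and shapes are illustrative only,
// not part of the datagrid proposal or of dojo.data.
class RangeDataProvider {
  constructor(backingRows) {
    this.backingRows = backingRows; // stands in for a remote data set
    this.cache = new Map();         // row index -> row object, filled by fetchRange
  }

  // Asynchronous: resolves with the total row count for the query plus
  // only the requested slice [start, start + count).
  fetchRange(start, count) {
    return new Promise((resolve) => {
      // setTimeout simulates network latency.
      setTimeout(() => {
        const rows = this.backingRows.slice(start, start + count);
        rows.forEach((row, i) => this.cache.set(start + i, row));
        resolve({ total: this.backingRows.length, start, rows });
      }, 0);
    });
  }

  // Synchronous: cell access only works for rows that were already fetched.
  getCellData(rowIndex, column) {
    const row = this.cache.get(rowIndex);
    return row === undefined ? undefined : row[column];
  }
}
```

A grid built on such a provider would request a visible range, learn the total from the same response, and read cells synchronously once the range has arrived.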
Re: [whatwg] number-related feedback
On Wed, Dec 31, 2008 at 4:59 AM, Ian Hickson i...@hixie.ch wrote: On Wed, 31 Dec 2008, Jonas Sicking wrote: On Wed, Dec 31, 2008 at 3:17 AM, Jonas Sicking jo...@sicking.cc wrote: On Mon, Dec 29, 2008 at 11:37 AM, Ian Hickson i...@hixie.ch wrote: On Fri, 22 Aug 2008, Shannon wrote: Either way I would recommend making a decision on minimum and maximum integer values and using them consistently. If not I can imagine the rapid adoption of 64-bit systems will cause unexpected errors when the same code is run on older 32-bit systems. There are valid arguments for letting each system use its native integer but if this is the case then perhaps the spec should require MIN_INT and MAX_INT be made available as constants. ECMAScript does define a range, and the limits of that range are exposed to scripts. Are there cases where there are non-script limits that would benefit from being exposed? Use cases would be helpful here. I thought ECMAScript defined the value to be an IEEE 754 64bit float. Ah, sorry, I missed that you didn't have a 'not' in your response :) There are in fact interop issues given the fact that ECMAScript allows for a range bigger than a 32bit integer can fit. For example you could do myInput.maxLength = 50; This is within the bounds and precision of ECMAScript, but won't work in a 32bit integer implementation. WebIDL defines how to handle that, though, right? (Each DOM attribute has an explicit bit width.) The problem, if there is a problem, would be with the content attribute alone. So how would something like input maxlength=50 be parsed? Is it defined in terms of setting the .maxLength DOM attribute, so that its behavior depends on what WebIDL says? Or something else? / Jonas
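The gap between the ECMAScript number range and a 32-bit integer can be seen in a small sketch. The value 2 ** 50 is an assumption chosen purely for illustration; it does not come from the thread.

```javascript
// 2 ** 50 is a hypothetical large value chosen for illustration.
const big = 2 ** 50;

// A double represents this integer exactly (it is below 2 ** 53).
const exactInDouble = Number.isSafeInteger(big);

// The 32-bit conversions keep only the low 32 bits, which are all zero here.
const asInt32 = big | 0;    // ToInt32
const asUint32 = big >>> 0; // ToUint32

console.log(exactInDouble, asInt32, asUint32); // true 0 0
```

So a value that a script can hold without loss may still collapse to something unrelated once an implementation stores it in a 32-bit field, which is the interop concern raised above.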
Re: [whatwg] Spellchecking mark III
On Dec 30, 2008, at 7:20 AM, Kornel Lesiński wrote: On 30.12.2008, at 13:45, Geoffrey Sneddon wrote: I have therefore not added this feature to HTML5 for the time being. If there is more interest in this feature, please speak up. This seems stupid. If I want to have spell-checking, let me. Don't force it off. I don't see any reason to have it forced off, ever. It's useful for fields that contain non-textual content, e.g. product ID, license plate number, CAPTCHA answer, etc. Browser would mark these as misspelt, which might be confusing or at least distracting. It does make sense, I guess, that certain fields should not be subject to automatic spellchecking. However, three counterpoints: 1) At least Safari's spellchecking won't mark a word misspelled until you hit a space; fields that contain data which would be flagged by the spellchecker but which are also likely to contain internal whitespace are rare. 2) The proposal Hixie linked seems way overengineered for this purpose. First, it allows spellchecking to be explicitly turned on, potentially overriding normal defaults, but that seems wrong; an input type=email should never spellcheck regardless of what the page author says. I can't see any valid use case for the author turning spellchecking on regardless of UA defaults or user preferences. Second, it allows spellchecking to be controlled at a finer granularity than editability, for which again I think there is no valid use case. Both of these aspects make the feature more complicated to implement and harder to understand, compared to just having a way to only disable spellchecking at the same granularity as editing. In general it would be helpful if some of the Google folks who requested this feature and some of the Chrome folks who (apparently) implemented it could explain the actual use cases they had in mind. Regards, Maciej
Re: [whatwg] Spellchecking mark III
On 31.12.2008, at 15:15, Maciej Stachowiak wrote: It does make sense I guess, that certain fields should not be subject to automatic spellchecking. However, three counterpoints: 1) At least Safari's spellchecking won't mark a word misspelled until you hit a space; fields that contain data which would be flagged by the spellchecker but which are also likely to contain internal whitespace are rare. In WebKit, spellchecking is also done when a field loses focus, so even single-word fields would be flagged. 2) The proposal Hixie linked seems way overengineered for this purpose. First, it allows spellchecking to be explicitly turned on, potentially overriding normal defaults, but that seems wrong; an input type=email should never spellcheck regardless of what the page author says. I can't see any valid use case for the author turning spellchecking on regardless of UA defaults or user preferences. Second, it allows spellchecking to be controlled at a finer granularity than editability, for which again I think there is no valid use case. Both of these aspects make the feature more complicated to implement and harder to understand, compared to just having a way to only disable spellchecking at the same granularity as editing. I don't like the current proposal either, because a true/false value is inconsistent with other boolean attributes in HTML. IMHO it should be nospellcheck=nospellcheck (which also solves the problem of forcing spellchecking where it doesn't make sense). -- regards, Kornel
Re: [whatwg] Spellchecking mark III
On Thu, Jan 1, 2009 at 4:15 AM, Maciej Stachowiak m...@apple.com wrote: 2) The proposal Hixie linked seems way overengineered for this purpose. First, it allows spellchecking to be explicitly turned on, potentially overriding normal defaults, but that seems wrong; an input type=email should never spellcheck regardless of the page author says. I can't see any valid use case for the author turning spellchecking on regardless of UA defaults or user preferences. It allows you to have a region of text where spellchecking is disabled via the spellcheck attribute, but containing subregions where spellchecking is enabled. Second, it allows spellchecking to be controlled at a finer granularity than editability, for which again I think there is no valid use case. Both of these aspects make the feature more complicated to implement and harder to understand, compared to just having a way to only disable spellchecking at the same granularity as editing. A use case is editable program code, where spellchecking is disabled, but where spellchecking is enabled inside comments. Maybe that sounds a little far-fetched for today's Web applications, but some IDEs (e.g. Eclipse) support this so it seems like something we'd want in the future. Rob -- He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
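The nested-region case described above (checking off for code, back on inside comments) could be modelled with a nearest-ancestor lookup. This is a minimal sketch under assumptions: the node shape, the function name effectiveSpellcheck, and the rule that the closest explicit setting wins are all hypothetical, and are not taken from the linked proposal or from any browser implementation.

```javascript
// Hypothetical model: each node is { spellcheck: true | false | undefined, parent }.
// undefined means "not specified here, inherit from the parent".
function effectiveSpellcheck(node, uaDefault) {
  // Walk up until some ancestor states a preference explicitly.
  for (let n = node; n; n = n.parent) {
    if (n.spellcheck !== undefined) return n.spellcheck;
  }
  // Nobody said anything: fall back to the user agent's default.
  return uaDefault;
}

const editor = { spellcheck: false, parent: null };   // code editor: checking off
const codeLine = { parent: editor };                  // inherits "off"
const comment = { spellcheck: true, parent: editor }; // comment: checking back on
```

With these nodes, codeLine resolves to false and comment to true. The sketch does not address the objection that ordinary editing can move text across such region boundaries.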
Re: [whatwg] Spellchecking mark III
On Wed, Dec 31, 2008 at 3:22 AM, Robert O'Callahan rob...@ocallahan.org wrote: That handles some cases, but not others --- e.g. text boxes that contain program code. I run spell checkers on code blocks. Given the number of misspellings that could have been avoided by using them, they're actually useful there. And for Slashdot's really lame CAPTCHA, they help there too.
Re: [whatwg] Spellchecking mark III
2008/12/30 Giovanni Campagna scampa.giova...@gmail.com: maybe we could just say that spellchecking is disabled when type is not text (for email, uri and number you have validation) and when a pattern attribute is specified Personally, if I were to write Gionvanni Campagna into a multiline text field, I'd like it to match the thing that I wrote into the email field (it turns out that I've managed to misspell your name, I'm sorry, but that's the point). So ideally the system which I use to spell check would be able to share information with my contacts and would also enable me to teach it spelling based on the email address fields.
Re: [whatwg] number-related feedback
On Wed, 31 Dec 2008, Jonas Sicking wrote: So how would something like input maxlength=50 be parsed? Is it defined in terms of setting the .maxLength DOM attribute, so that its behavior depends on what WebIDL says? Or something else? The UA would set a limit on the value it accepts for maxlength=, and then cap the result at that, preventing someone from entering more than 4GB (or 2GB, or 4TB, or whatever limit the UA has). Does that answer your question? In practice I would expect other limitations to come into play long before a test for this limit could be triggered. -- Ian Hickson http://ln.hixie.ch/ Things that are impossible just take longer.
Re: [whatwg] number-related feedback
Hi Ian, Jonas. Ian Hickson: The UA would set a limit on the value it accepts for maxlength=, and then cap the result at that, preventing someone from entering more than 4GB (or 2GB, or 4TB, or whatever limit the UA has). Does that answer your question? In practice I would expect other limitations to come into play long before a test for this limit could be triggered. I don’t think it does answer the question, since you need to know what happens if you do: e.setAttribute('maxlength', '50'); alert(e.maxLength); The text currently in the spec isn’t clear: If a reflecting DOM attribute is an unsigned integer type (unsigned long) then, on getting, the content attribute must be parsed according to rules for parsing non-negative integers, and if that is successful, the resulting value must be returned. If, on the other hand, it fails, or if the attribute is absent, the default value must be returned instead, or 0 if there is no default value. The “rules for parsing non-negative integers” algorithm can return any non-negative integer. Web IDL doesn’t define what to do if a spec defines an operation to return a value that is not a member of its return type. I’d classify that as a bug in the description of reflecting DOM attributes. I suggest rewording that paragraph to something like the following: If a reflecting DOM attribute is an unsigned integer type (unsigned long) then, on getting, the content attribute must be parsed according to rules for parsing non-negative integers, and if that successfully returns a value in the range of an unsigned long, that resulting value must be returned. If, on the other hand, it fails, returns an out of range value, or if the attribute is absent, the default value must be returned instead, or 0 if there is no default value. Similar wording would be needed for other paragraphs in this section. -- Cameron McCormack ≝ http://mcc.id.au/
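The suggested rewording can be sketched as a getter. This is only an approximation under assumptions: parseNonNegativeInteger is a simplified stand-in for the spec's "rules for parsing non-negative integers" (leading whitespace, then leading digits), and the function names and the null-for-absent convention are hypothetical.

```javascript
const UNSIGNED_LONG_MAX = 2 ** 32 - 1; // largest value of a 32-bit unsigned long

// Simplified stand-in for the spec's parsing rules: skip leading whitespace,
// then read leading digits. Returns null when no digits are found.
function parseNonNegativeInteger(text) {
  const match = /^\s*(\d+)/.exec(text);
  return match ? Number(match[1]) : null;
}

// Getter behaviour under the suggested rewording: out-of-range values fall
// back to the default, just like parse failures and absent attributes.
function reflectUnsignedLong(contentValue, defaultValue = 0) {
  if (contentValue === null) return defaultValue; // attribute absent
  const parsed = parseNonNegativeInteger(contentValue);
  if (parsed === null || parsed > UNSIGNED_LONG_MAX) return defaultValue;
  return parsed;
}
```

Under this sketch, "50" reflects as 50, while "4294967296" (one past the unsigned long range) reflects as the default rather than as a capped or wrapped value, which differs from the capping behaviour Ian describes.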
Re: [whatwg] Spellchecking mark III
On Dec 31, 2008, at 12:26 PM, Robert O'Callahan wrote: On Thu, Jan 1, 2009 at 4:15 AM, Maciej Stachowiak m...@apple.com wrote: 2) The proposal Hixie linked seems way overengineered for this purpose. First, it allows spellchecking to be explicitly turned on, potentially overriding normal defaults, but that seems wrong; an input type=email should never spellcheck regardless of the page author says. I can't see any valid use case for the author turning spellchecking on regardless of UA defaults or user preferences. It allows you to have a region of text where spellchecking is disabled via the spellcheck attribute, but containing subregions where spellchecking is enabled. It seems to me you would have to have a lot of custom code to maintain the boundaries between such regions during editing operations for this to ever work right. Normal text editing would easily lead to text moving across the boundaries. There would have to be strong motivating examples to justify such a hard-to-use feature. Second, it allows spellchecking to be controlled at a finer granularity than editability, for which again I think there is no valid use case. Both of these aspects make the feature more complicated to implement and harder to understand, compared to just having a way to only disable spellchecking at the same granularity as editing. A use case is editable program code, where spellchecking is disabled, but where spellchecking is enabled inside comments. Maybe that sounds a little far-fetched for today's Web applications, but some IDEs (e.g. Eclipse) support this so it seems like something we'd want in the future. This sounds like a pretty ill-conceived feature. It is very common for comments to include code, or fragments of code (such as variable names) mixed with natural language. (I was unable to find any evidence of spellchecking comments in the copy of Eclipse I downloaded, so I can't comment on the details.) 
Furthermore, other IDEs generally don't attempt to do this, and I can't think of other application categories that would do something similar. So I don't think this makes for a very compelling use case. It's like arguing for a page layout feature based on something only WordPerfect does. Regards, Maciej
Re: [whatwg] Spellchecking mark III
On Thu, Jan 1, 2009 at 2:04 PM, Maciej Stachowiak m...@apple.com wrote: On Dec 31, 2008, at 12:26 PM, Robert O'Callahan wrote: A use case is editable program code, where spellchecking is disabled, but where spellchecking is enabled inside comments. Maybe that sounds a little far-fetched for today's Web applications, but some IDEs (e.g. Eclipse) support this so it seems like something we'd want in the future. This sounds like a pretty ill-conceived feature. It is very common for comments to include code, or fragments of code (such as variable names) mixed with natural language. (I was unable to find any evidence of spellchecking comments in the copy of Eclipse I downloaded, so I can't comment on the details.) OK. It's there, though. Furthermore, other IDEs generally don't attempt to do this, and I can't think of other application categories that would do something similar. Seems to me that an HTML source view with spellchecking of the non-markup text would be useful. For what it's worth, it seemed easy to implement the general spellcheck behaviour in Gecko, once we'd decided to allow any author spellcheck control at all (you seem to have agreed that spellcheck=no is useful). But I really don't feel strongly one way or the other. Peter Kasting or Brett Wilson should speak up. Rob -- He was pierced for our transgressions, he was crushed for our iniquities; the punishment that brought us peace was upon him, and by his wounds we are healed. We all, like sheep, have gone astray, each of us has turned to his own way; and the LORD has laid on him the iniquity of us all. [Isaiah 53:5-6]
Re: [whatwg] Trying to work out the problems solved by RDFa
Summary: I believe that there are use cases for RDFa - and that they are precisely the sort of thing that Yahoo, Google, Ask, and their ilk are not going to be interested in, since they are based on solving problems that those search engines do not efficiently solve, such as (among others) using private data or dealing with trustworthy data to answer very specific questions automatically. If Ian needs to understand the Semantic Web Industry and why people have invested in the RDFa proposal, then it is important to identify the right questions, and having him alone identify the sub-questions when he doesn't understand the issue isn't going to help him make a well-informed decision. Some of Ian's questions are discussed here. I cut the mail short since I think it is already too long for many people, which means that the debate will simply pass without their reading or input. On Wed, 31 Dec 2008 20:46:01 +1100, Ian Hickson i...@hixie.ch wrote: One of the outstanding issues for HTML5 is the question of whether HTML5 should solve the problem that RDFa solves, e.g. by embedding RDFa ... Before I can determine whether we should solve this problem, and before I can evaluate proposals for solving this problem, I need to learn what the problem is. Earlier this year, there was a thread on RDFa on the WHATWG list. Very little of the thread focused on describing the problem. This e-mail is an attempt to work out what the problem is based on that feedback, on discussions at the recent TPAC, and on other research I have done. On Mon, 25 Aug 2008, Manu Sporny wrote: Ian Hickson wrote: I have no idea what problem RDFa is trying to solve. I have no idea what the requirements are. Web browsers currently do not understand the meaning behind human statements or concepts on a web page. 
If web browsers could understand that a particular page was describing a piece of music, a movie, an event, a person or a product, the browser could then help the user find more information about the particular item in question. It would help automate the browsing experience. Not only would the browsing experience be improved, but search engine indexing quality would be better due to a spider's ability to understand the data on the page with more accuracy. Let's see if I can rephrase that in terms of requirements. * Web browsers should be able to help users find information related to the items that page they are looking at discusses. * Search engines should be able to determine the contents of pages with more accuracy than today. Is that right? Are those the only requirements/problems that RDFa is attempting to address? If not, what other requirements are there? I don't think so. I think there are some other requirements: A standard way to include arbitrary data in a web page and extract it for machine processing, without having to pre-coordinate their data models. Since many people use RDF as an interchange, storage and processing format for this kind of data (because it provides for automated mapping of data from one schema to many others, without requiring anyone to touch the original schemata or agree in advance how they should be created), I believe there is a requirement for a method that allows third parties to include RDF data in, and extract it from information encoded within an HTML page. The Microformats community has done a remarkable job of working on the web semantics problem, creating several different methods of expressing common human concepts (contact information (hCard), events (hCalendar), and audio recordings (hAudio)). Right; with Microformats, each Microformat has its own problem space and thus each one can be evaluated separately. It is much harder to evaluate something when the problem space is as generic as it appears RDFa's is. 
The point is that there are a very large set of very small problem spaces relevant to a small group at a time. Like RDF itself, RDFa is meeting the problem of allowing these people to share machine-processable data without previously coordinating their approach. The results of the first set of Microformats efforts were some pretty cool applications, like the following one demonstrating how a web browser could forward event information from your PC web browser to your phone via Bluetooth: http://www.youtube.com/watch?v=azoNnLoJi-4 It's a technically very interesting application. What has the adoption rate been like? How does it compare to other solutions to the problem, like CalDav, iCal, or Microsoft Exchange? Do people publish calendar events much? There are a lot of Web-based calendar systems, like MobileMe or WebCalendar. Do people expose data on their Web page that can be used to import calendar data to these systems? In some cases this data is indeed exposed to Webpages. However, anecdotal evidence (which unfortunately is all that is available when trying to study the enormous collections of data