Re: SpellCheck API?
On Tue, May 10, 2011 at 06:49, Olli Pettay olli.pet...@helsinki.fi wrote: On 05/10/2011 01:44 AM, Aryeh Gregor wrote: On Mon, May 9, 2011 at 3:49 PM, Boris Zbarsky bzbar...@mit.edu wrote: This does mean firing tens of thousands of events during load on some pages (e.g. Wikipedia article edit pages). Maybe that's not a big deal. If that's too many events, couldn't the browser optimize by not spellchecking words until they scroll into view? I imagine that might not be terribly simple, depending on how the browser is designed, but maybe tens of thousands of events aren't too expensive anyway. I don't know; it's up to implementers whether it's doable. I'm assuming here that there's effectively no cost if no one's registered a spellcheck handler, so it won't penalize authors who don't use the feature. Just a quick test on a Nokia N900 (which is already a somewhat old mobile phone) using a recent browser: dispatching 10000 events to a deep (depth 100) DOM (without listeners for the event, for testing purposes) takes about 3 seconds. If there is a listener, the test takes 4-5s per 10000 events. If the DOM is shallow, the test without listeners takes about 1s, and with a listener about 2-3s. This is just one browser engine, but based on my testing on desktop, the differences between browser engines aren't an order of magnitude in this case. On a fast desktop those tests take 50-200ms. So, tens of thousands of events don't sound like a fast enough solution for mobile devices, but would be OK for desktop, I think. -Olli On the desktop I wouldn't call that an acceptable solution; requiring 200ms+ just to spell check words on a page? That sounds like a slippery slope where lots of things become acceptable when they _only_ take x00ms. -- Adam Shannon Web Developer University of Northern Iowa Sophomore -- Computer Science B.S. http://ashannon.us
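For intuition, Olli's measurement can be approximated outside a browser with a toy model. Everything below is my own assumption for illustration (the node shape, the dispatch scheme, and the event count), not any browser's internals: it bubbles synthetic "spellcheck" notifications through a depth-100 parent chain with a single listener, which is the shape of DOM the test describes.

```javascript
// Build a parent chain of the given depth; return the deepest node.
function makeChain(depth) {
  let node = { parent: null, listeners: [] };
  for (let i = 0; i < depth; i++) node = { parent: node, listeners: [] };
  return node;
}

// Walk from the target up to the root, invoking listeners (a crude
// stand-in for the bubble phase); return how many listeners ran.
function dispatch(target, event) {
  let calls = 0;
  for (let n = target; n !== null; n = n.parent) {
    for (const fn of n.listeners) { fn(event); calls++; }
  }
  return calls;
}

const leaf = makeChain(100);
let seen = 0;
leaf.parent.listeners.push(() => { seen++; }); // one listener near the leaf

let totalCalls = 0;
for (let i = 0; i < 10000; i++) {
  totalCalls += dispatch(leaf, { type: 'spellcheck', word: 'example' });
}
```

In a real engine each dispatch also allocates an event object and runs capture/bubble bookkeeping per ancestor, which is where the mobile cost Olli measured comes from.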
Re: [widgets] Dig Sig spec
On Mon, May 2, 2011 at 5:25 PM, Marcos Caceres marcosscace...@gmail.com wrote: On Friday, April 29, 2011 at 8:19 PM, frederick.hir...@nokia.com wrote: Marcos, I'd suggest you first send an email with the top 10 substantive changes to the list, e.g. which algorithms change from mandatory to optional or optional to mandatory, which processing rules you are relaxing, etc. This would take less time for you and be much clearer to all. I could only come up with 5... as I previously mentioned, the spec just contained a ton of redundancies (4 pages' worth), but the conformance requirements are all pretty much the same. The draft is up at: http://dev.w3.org/2006/waf/widgets-digsig/ As I previously stated, the purpose of this fix-up was to make concessions for WAC 1.0 runtimes, which use the default canonicalization algorithm of XML Dig Sig 1.1. Additional changes are based on working with various vendors who implemented WAC 1.0 or are working on the WAC 2.0 specs (including the implementation that was done at Opera). I've made C14N11 the recommended canonicalization method throughout. However, user agents are free to use whatever they want so long as it complies with XML Dig Sig 1.1. 1. Validators and signers are now true [XMLDSIG11] implementations. No exceptions. This means that the test suite can be greatly reduced because everything is palmed off to [XMLDSIG11]. There is now a clear separation between a signer and a validator, meaning that the implementation product is no longer needed. 2. Validators and signers now rely on [Signature Properties] for generation and validation of signature properties (as it should have been from the start). This removes a bunch of redundant text from the spec. The validation process is now written as an algorithm, as is the signing process. This makes it easy to generate or validate a signature by simply following the steps. 
In the old spec, one had to jump all over the spec to check all sorts of things (e.g., Common Constraints for Signature Generation and Validation and the Requirements on Widget Signatures, both of which are now folded into the appropriate algorithms). The spec now also links to the right places in [XMLDSIG11] and [Signature Properties]. I've added the ability for user agents to optionally support all signature properties (i.e., a signer can include them, and a validator can validate them, if they want). 3. The specification now only RECOMMENDs algorithms, key lengths, and certificate formats. Validators are no longer forced to fail on certain certificate formats or algorithms. The only exception is the minimum key lengths, which are still enforced during verification (they are impossible to test during signing without verifying, so the requirement is somewhat useless there). 4. The specification strengthens the recommended key lengths to be a little bit stronger (buying a few more years). 5. The spec now only contains 3 conformance criteria. [ To digitally sign the contents of a widget package with an author signature, a signer MUST run the algorithm to generate a digital signature. To digitally sign the contents of a widget package with a distributor signature, a signer MUST run the algorithm to generate a digital signature. To validate the signature files of a widget package, a validator MUST run the algorithm to validate digital signatures. ] I've since condensed it down to just two conformance requirements by merging the two signing requirements into one: To digitally sign the contents of a widget package with an author signature or with a distributor signature, a signer MUST run the algorithm to generate a digital signature. -- Marcos Caceres http://datadriven.com.au
Re: Proposal: Navigation of JSON documents with html-renderer-script link relation
Is there an appropriate next step to advance this proposal? It seems like there is interest in this approach. Does it need to be written up in a more formal spec? Thanks, Kris On 2/18/2011 10:03 AM, Sean Eagan wrote: Very exciting proposal! I hope my comments below can help move it along. Regarding media type choices, the following two snippets from RFC 5988 are relevant: 1) “Registered relation types MUST NOT constrain the media type of the context IRI” Thus the link context resource should not be constrained to just application/json. Other text-based media types such as XML and HTML should be applicable as renderable content as well. The proposed event interface already includes a subset of XMLHttpRequest, whose support for text-based media types could be leveraged. To do this, the content property could be replaced with XMLHttpRequest’s responseText and responseXML properties, and one could even add “responseJSON”, similar to “responseXML” but containing any JSON.parse()’ed “application/json” content, and “responseHTML”, containing an Element with any “text/html” content as its outerHTML. Also useful would be “status” and “statusText”, and possibly abort(). The DONE readystatechange event would correspond to “onContentLoad”. The “onContentProgress” events, though, might not make sense for non-JSON media types. If enough of the XMLHttpRequest interface were deemed applicable, the event object could instead include an actual asynchronous XMLHttpRequest object, initially in the LOADING state, as its “request” property. In this case, an “onNavigation” event would initially be sent, corresponding to the XMLHttpRequest’s LOADING readystatechange event, which would not itself be fired. This might also facilitate adding cross-origin resource sharing [1] support within this proposal. 2) “, and MUST NOT constrain the available representation media types of the target IRI. 
However, they can specify the behaviours and properties of the target resource (e.g., allowable HTTP methods, request and response media types that must be supported).” Thus the link target resource media type should also probably not be constrained; instead, support for HTML and/or JavaScript could be specified as required. Accordingly, the link relation name should be media-type agnostic; some options might be “renderer”, “view”, or “view-handler”. HTML does seem to me like it would be the most natural for both web authors and user agent implementers; here are some additional potential advantages: * Only need to include one link, and match one URI, when searching for matching existing browsing contexts to send navigation events to. * Could provide a JavaScript binding to the link target URI via document.initialLocation, similar to document.location (window.initialLocation might get confused with the window’s initial history location). * Allows including static Loading... UIs while apps load. * More script loading control via the async and defer script tag attributes. Regarding event handling: The browser should not assign the link context URI to window.location / update history until the app has fully handled the navigation event. This would allow browsing contexts to internally cancel the event, for example if they determine that their state is saturated and a new or alternate browsing context should handle the navigation instead. Events could be canceled via event.preventDefault(). One case in which navigation should not be cancelable is explicit window.location assignment, in which case the event’s “cancelable” property should be set to false. In order to stop event propagation to any further browsing contexts, event.stopPropagation() could be used. Since the new window.location will not be available during event handling, the event should include a “location” property containing the new URL. 
Also, suppose a user navigates from “example.com?x=1” to “example.com?y=2”. An app may wish to retain the “x=1” state, and instead navigate to “example.com?x=1&y=2”. This could be supported by making the event’s “location” property assignable. Event completion could be defined either in an implicit fashion, such as completion of all event handlers, or if necessary in an explicit fashion, such as setting an event “complete” property to true. Browsing contexts should not be required to have been initialized via the link relation to receive navigation events; browsing contexts having been initialized via traditional direct navigation to the link target resource should be eligible as well. This way, link relation indirection can be avoided during initial navigations directly to the app root. Also, navigation events not triggered by window.location assignment should be allowed to be sent to any existing matching browsing context, not just the browsing context in which the event originated (if any), or could ignore existing browsing contexts (as with “Open link in New Tab”). This
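The “responseJSON” idea floated above can be sketched as a small helper. To be clear, responseJSON is a hypothetical property from this proposal, not a shipped API; the function name and the behavior on malformed input below are my assumptions.

```javascript
// Derive a hypothetical "responseJSON" value from an XMLHttpRequest-style
// Content-Type header and responseText, per the proposal's description:
// only "application/json" content is parsed; anything else yields null.
function deriveResponseJSON(contentType, responseText) {
  const mime = contentType.split(';')[0].trim().toLowerCase();
  if (mime !== 'application/json') return null;
  try {
    return JSON.parse(responseText);
  } catch (e) {
    return null; // malformed JSON: no responseJSON rather than an exception
  }
}
```

A user agent would presumably compute this lazily on first access, the way responseXML is handled today.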
Re: ISSUE-137 (IME-keypress): Should keypress events fire when using an IME? [DOM3 Events]
Hi, Hallvord- The testing we did on this issue was inconsistent between different implementations in combination with different IMEs. So, we did add a note mentioning possible key suppression. http://dev.w3.org/2006/webapi/DOM-Level-3-Events/html/DOM3-Events.html#key-IME-suppress Please let us know if this satisfies your issue. Regards- -Doug Web Applications Working Group Issue Tracker wrote (on 10/6/10 2:16 AM): ISSUE-137 (IME-keypress): Should keypress events fire when using an IME? [DOM3 Events] http://www.w3.org/2008/webapps/track/issues/137 Raised by: Doug Schepers On product: DOM3 Events Hallvord R. M. Steen http://lists.w3.org/Archives/Public/www-dom/2010JulSep/0176.html: [[ The current spec text says about the keypress event: This event type shall be generated after the keyboard mapping but before the processing of an input method editor, normally associated with the dispatching of a compositionstart, compositionupdate, or compositionend event. I think this is wrong; if an IME is actively processing the input, no keypress event should fire. ]]
Re: SpellCheck API?
On Tue, May 10, 2011 at 1:42 PM, Olli Pettay olli.pet...@helsinki.fi wrote: Something like that might be better. Do you have the exact API in mind? Well, just the same as I originally proposed, except with arrays instead of scalars. But Hironori Bono's reply has mooted this idea anyway. 2011/5/11 Hironori Bono (坊野 博典) hb...@google.com: When I talked with web-application developers, some of them wanted to check the spelling of words not only when a user types words but also when JavaScript code creates an editable element with prepopulated text. For example, a web application checks text in a To: field with its custom spellchecker and adds misspelling underlines under invalid e-mail addresses. (This example is mainly for notifying users that they may be replying to phishing e-mails.) Some other web-application developers would also like to check the spelling of words in an editable element before sending text to a server. To satisfy these requests, a user agent may need to send spellcheck events also when JavaScript code creates an editable node or changes text in an editable node. (Even though I have not measured how much time it takes to send these events without JavaScript execution, it may hurt the speed of JavaScript code.) This shouldn't be a problem to do. For instance, we could have a method like .spellcheck() that asks the browser to fire spellcheck events for particular nodes. When I talked with web-application developers, some of them wanted to integrate n-gram spellcheckers so they can correct simple grammatical errors, such as article-noun mismatches (a apple - an apple) and subject-verb mismatches (he have - he has). To satisfy their requests, a user agent may need to send two or more words (up to all words in an editable element). Hmm, okay. This means authors will have to reimplement a lot of things: * Word-breaking. 
* Handling changes: they want to make sure to re-check only the text the user changed, not the whole textarea, to avoid their checking being O(N) in the length of the text. * When text is preloaded, the custom spellchecker will have to check all the text, not just visible text. Maybe this is fast enough to be okay, though, if it's only on load and not on every change. However, maybe this API will only be useful to very large sites anyway, which can do all these things. Other sites can use the built-in spellchecker, or rely on a library that did all the hard work. Then we want to be flexible, even if it's harder to use. But also, we'll have to specify extra things, like: how should markers change when the text changes? If I type Foo bar and the author's spellchecker marks Foo, and I type baz so it's now Foo bar baz, does the marker on Foo get cleared automatically? What if I change it to Fooo bar? Or Floo bar? Anyway, here's some more detailed feedback on your original idea, taking the above into account: 2011/5/9 Hironori Bono (坊野 博典) hb...@google.com: This example adds two methods. * The window.spellCheckController.removeMarkers() method Removes all the misspelling underlines and suggestions in the specified node. The node parameter represents the DOM node from which a web application would like to remove all the misspelling underlines and suggestions. Why do you want to put it on a new global object? Wouldn't it make more sense as a method on the node itself? Like HTMLElement.removeSpellingMarkers(). Also, what if the author wants to remove only one spelling marker? If markers don't get automatically cleared, and the user changed some text, maybe the author wants to only clear a few existing markers without recalculating all the others. * The window.spellCheckController.addMarker() method Attaches a misspelling underline and suggestions to the specified range of a node. The node parameter represents a DOM node in which a user agent adds a misspelling underline. 
The start and length parameters represent a range of text in the DOM node specified by the node parameter. (We do not use a Range object here because it is hard to specify a range of text in a textarea element or an input element with it.) The suggestions parameter represents a list of words suggested by the custom spellchecker. When a custom spellchecker does not provide any suggestions, this parameter should be an empty list. Do we want this to be usable for contenteditable/designMode documents as well as textarea/input? If so, we also need an API that supports Ranges, or something equivalent. This example adds two more methods to merge the results of the spellcheckers integrated into user agents. * The window.spellCheckController.checkWord() method Checks the spelling of the specified word with the spellchecker integrated into the hosting user agent. When the specified word is well spelled, this method returns true. When the specified word is misspelled, or the user agent does not have an integrated spellchecker, this method returns
CfC: publish a new LCWD of DOM 3 Events; deadline May 18
The people working on the DOM 3 Events spec have resolved all the issues we believe are critical for DOM3 Events vis-à-vis the September 2010 LCWD [LC-2010], and have addressed the issues regarding discrepancies between D3E and DOM Core [Mins]. As such, they now propose the WG publish a new LCWD, and this is a Call for Consensus (CfC) to do so: http://dev.w3.org/2006/webapi/DOM-Level-3-Events/html/DOM3-Events.html This CfC satisfies the group's requirement to record the group's decision to request advancement for this LCWD. Note the Process Document states the following regarding the significance/meaning of a LCWD: [[ http://www.w3.org/2005/10/Process-20051014/tr.html#last-call Purpose: A Working Group's Last Call announcement is a signal that: * the Working Group believes that it has satisfied its relevant technical requirements (e.g., of the charter or requirements document) in the Working Draft; * the Working Group believes that it has satisfied significant dependencies with other groups; * other groups SHOULD review the document to confirm that these dependencies have been satisfied. In general, a Last Call announcement is also a signal that the Working Group is planning to advance the technical report to later maturity levels. ]] Positive response to this CfC is preferred and encouraged, and silence will be assumed to mean agreement with the proposal. The deadline for comments is May 18. Please send all comments to: www-...@w3.org -Art Barstow [LC-2010] http://www.w3.org/TR/2010/WD-DOM-Level-3-Events-20100907 [Mins] http://www.w3.org/2011/05/11-webapps-minutes.html#item10
Re: SpellCheck API?
2011/5/11 Aryeh Gregor simetrical+...@gmail.com: Here's an alternative suggestion that addresses the issues I had above, while (I think) still addressing all your use cases. Create a new interface:

interface SpellcheckRange {
  readonly attribute unsigned long start;
  readonly attribute unsigned long length;
  readonly attribute DOMStringList suggestions;
  readonly attribute unsigned short options; // defaults to 0
  const unsigned short NO_ERROR = 1;
  const unsigned short ADD_SUGGESTIONS = 2;
};

length could be end instead, whichever is more consistent. options is a bitfield. NO_ERROR means that there is no error in this range, and the UA should not mark any words there as being errors even if the spellcheck attribute is enabled. (If the author wants to completely disable built-in suggestions, they can set spellcheck=false.) ADD_SUGGESTIONS means that the provided suggestions should be given in addition to the UA's suggestions, instead of replacing them -- by default, the UA's suggestions for that range are replaced. (The default could be the other way around if that's better.) These two features allow the author to control default UA suggestions without being able to know what they are, so there's no privacy violation. With this model, I'd want the UA to provide instances for words which are misspelled according to its standard dictionary but which are in its user's custom dictionary. The web page can try to make suggestions, but generally the UA will choose to ignore the words because it knows that the user is happy with the current word.
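The proposed flag semantics can be sketched in plain JavaScript. Nothing here is a shipped API -- SpellcheckRange is only a proposal in this thread -- and the merge logic below is one possible reading of how a UA might combine its own suggestions with an author-supplied range.

```javascript
// Bitfield values from the proposed SpellcheckRange interface.
const NO_ERROR = 1;
const ADD_SUGGESTIONS = 2;

// Combine the UA's suggestions for a range with the author's, honoring
// the proposal's flags: NO_ERROR suppresses the error entirely,
// ADD_SUGGESTIONS appends to the UA's list, and the default replaces it.
function effectiveSuggestions(range, uaSuggestions) {
  if (range.options & NO_ERROR) return [];
  if (range.options & ADD_SUGGESTIONS) {
    return uaSuggestions.concat(range.suggestions);
  }
  return range.suggestions.slice();
}
```

Because the merge happens inside the UA, the page never observes the UA's own suggestion list, which is the privacy property Aryeh is after.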
Re: SpellCheck API?
2011/5/11 timeless timel...@gmail.com: With this model, i'd want the UA to provide instances for words which are misspelled according to its standard dictionary but which are in its user's custom dictionary. The web page can try to make suggestions, but generally the UA will choose to ignore the words because it knows that the user is happy with the current word. This is tricky if the author's spellcheck breaks words differently from the built-in spellcheck. Also, the website might have its own idea of the user's custom dictionary, which might be more correct than the browser's (like if all Google sites tracked a custom dictionary for you and you were using someone else's computer). And the author's spellcheck might actually be flagging a grammar error, or using context-sensitive info to figure out that even though a particular word is fine in general it's a mistake here. So I'd be hesitant to say anything like this.
[FileAPI] Updates to FileAPI Editor's Draft
The Editor's Draft of the FileAPI -- http://dev.w3.org/2006/webapi/FileAPI/ -- has had some updates. These are the notable changes: 1. Blob.slice behavior has changed to more closely match String.prototype.slice from ECMAScript (and, semantically, Array.prototype.slice). I think we're the first host object to have a slice outside of the ECMAScript primitives; some builds of browsers have already vendor-prefixed slice till it becomes more stable (and till the new behavior becomes more widespread on the web -- Blob will soon be used in the Canvas API, etc.). I'm optimistic this will happen soon enough. Thanks to all the browser projects that helped initiate the change -- the consistency is desirable. 2. The read methods on FileReader raise a new exception -- OperationNotAllowedException -- if multiple concurrent reads are invoked. I talked this over with Jonas; we think that rather than reuse DOMException error codes (like INVALID_STATE_ERR), these kinds of scenarios should throw a distinct exception. Some things on the web (as in life) are simply not allowed. It may be useful to reuse this exception in other places. 3. FileReader.abort() behavior has changed. 4. There is closer integration with event loops as defined by HTML. For browser projects with open bug databases, I'll log some bugs based on test cases I've run on each implementation. A few discrepancies exist in the implementations I've tested; for instance, setting FileReader.result to the empty string vs. setting it to null, and when exceptions are thrown vs. when the error event is used. Feedback encouraged! Draft at http://dev.w3.org/2006/webapi/FileAPI/ -- A*
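For reference, the String.prototype.slice-style index handling that Blob.slice moved to can be sketched as pure index arithmetic: negative indices count back from the end, and everything is clamped to [0, size]. This is an illustration of the ECMAScript clamping rules, not the spec's normative text (the real Blob.slice also defaults missing arguments).

```javascript
// Normalize (start, end) slice arguments against a blob of the given size,
// following ECMAScript slice semantics: negative values are offsets from
// the end, results are clamped to [0, size], and a reversed range is empty.
function normalizeSlice(size, start, end) {
  const s = start < 0 ? Math.max(size + start, 0) : Math.min(start, size);
  const e = end < 0 ? Math.max(size + end, 0) : Math.min(end, size);
  return { start: s, length: Math.max(e - s, 0) };
}
```

So, for a 10-byte blob, slicing (-3, 10) yields the final 3 bytes, and a reversed range like (8, 2) yields an empty slice instead of throwing -- the same behavior strings and arrays already have.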
Re: SpellCheck API?
On 05/12/2011 01:29 AM, Aryeh Gregor wrote: 2011/5/11 timeless timel...@gmail.com: With this model, I'd want the UA to provide instances for words which are misspelled according to its standard dictionary but which are in its user's custom dictionary. The web page can try to make suggestions, but generally the UA will choose to ignore the words because it knows that the user is happy with the current word. This is tricky if the author's spellcheck breaks words differently from the built-in spellcheck. Indeed, this is tricky. But I agree with timeless, and I think that the user or UA should be able to (or by default should) ignore suggestions from the web page if the user's custom dictionary (which can even be OS-level) recognizes the word. Also, the website might have its own idea of the user's custom dictionary, which might be more correct than the browser's Or it can be just very wrong. Also, the user may have trained the browser's spellchecker to fit perfectly into his or her habits. (like if all Google sites tracked a custom dictionary for you and you were using someone else's computer). And the author's spellcheck might actually be flagging a grammar error, or using context-sensitive info to figure out that even though a particular word is fine in general it's a mistake here. So I'd be hesitant to say anything like this. Well, I guess if a web page provides spellcheck suggestions, those suggestions are just hints which the UA may or may not utilize. -Olli
Re: CfC: new WD of Clipboard API and Events; deadline April 5
I think I have raised my concern before, but what should happen if script calls getData() within a copy/cut event handler? Should it return the clipboard content after taking into account the values set by setData()? Or should it always return the same value? Or should script be banned from calling getData() during copy/cut events altogether? Also, scripts shouldn't be able to call clearData() during copy/cut events, correct? For 10. Cross-origin copy/paste of source code, we might also want to consider stripping elements that can refer to external URLs, such as link, meta, base, etc. - Ryosuke On Tue, Mar 29, 2011 at 4:37 AM, Arthur Barstow art.bars...@nokia.com wrote: This is a Call for Consensus to publish a new Working Draft of Hallvord's Clipboard API and Events spec: http://dev.w3.org/2006/webapi/clipops/clipops.html If you have any comments or concerns about this proposal, please send them to public-webapps by April 5 at the latest. As with all of our CfCs, positive response is preferred and encouraged, and silence will be assumed to be agreement with the proposal. Please note that during this CfC, Hallvord will continue to edit the ED and will create a Table of Contents before the spec is published in w3.org/TR/. -Art Barstow
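Ryosuke's getData()/setData() question can be made concrete with a minimal mock. This is NOT the real DataTransfer implementation -- its behavior during copy/cut is exactly what is undecided here -- but it demonstrates one of the candidate answers: getData() reflects whatever setData() stored during the same copy/cut event.

```javascript
// A toy clipboardData-like store, illustrating the "getData returns what
// setData set" reading of the question. All names here are illustrative.
function makeClipboardData() {
  const store = new Map();
  return {
    setData(type, data) { store.set(type, String(data)); },
    // The real DataTransfer.getData returns '' for unknown types,
    // which this mock mirrors.
    getData(type) { return store.has(type) ? store.get(type) : ''; },
    clearData(type) {
      if (type === undefined) store.clear();
      else store.delete(type);
    },
  };
}
```

The alternatives Ryosuke lists -- getData() always returning the pre-event clipboard contents, or being disallowed entirely during copy/cut -- would simply not route reads through this store.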
Re: [File API: FileSystem] Path restrictions and case-sensitivity
I've grouped responses to bits of this thread so far below: Glenn said: If *this API's* concept of filenames is case-insensitive, then IMAGE.JPG and image.jpg represent the same file on English systems and two different files on Turkish systems, which is an interop problem. Timeless replied: no, if the API is case insensitive, then it's case insensitive *everywhere*, both on Turkish and on English systems. Things could only be case sensitive when serialized to a real file system outside of the API. I'm not proposing a case-insensitive system which is locale aware; I'm proposing one which always folds. You're proposing not just a case-insensitive system, but one that forces e.g. an English locale on all users, even those in a Turkish locale. I don't think that's an acceptable solution. I also don't think having code that works in one locale and not another [Glenn's image.jpg example] is fantastic. It was what we were stuck with when I was trying to allow implementers the choice of a pass-through implementation, but given that that's fallen to the realities of path lengths on Windows, I feel like we should try to do better. Glenn: This can be solved at the application layer in applications that want it, without baking it into the filesystem API. This is mostly true; you'd have to make sure that all alterations to the filesystem went through a single choke point, or you'd have the potential for race conditions [or you'd need to store the original-case filenames yourself, and send the folded case down to the filesystem API]. Glenn: A virtual FS as the backing for the filesystem API does not resolve that core issue. It makes sense to encourage authors to gracefully handle errors thrown by creating files and directories. Such a need has already been introduced via Google Chrome's unfortunate limitation of a 255-byte max path length. That limitation grew out of the OS-dependent passthrough implementation. We're fixing that right now, with this proposal. 
The one takeaway I have from that bug: it would have been nice to have a more descriptive error message. It took a while to figure out that the path length was too long for the implementation. I apologize for that--it was an oversight. If we can relax the restrictions to a small set, it'll be more obvious what the problems are. IIRC this problem was particularly confusing because we were stopping you well short of the allowed 255 bytes, due to your profile's nesting depth. I'd like to obviate the need for complicated exceptions or APIs that suggest better names, by leaving naming up to the app developer as much as possible. [segue into other topics] Glenn asked about future expansions of IndexedDB to handle Blobs, specifically with respect to FileWriter and efficient incremental writes. Jonas replied: A combination of FileWriter and IndexedDB should be able to handle this without problem. This would go beyond what is currently in the IndexedDB spec, but it's this part that we're planning on experimenting with. The way I have envisioned it to work is to add a function called createFileEntry somewhere, for example on the IDBFactory interface. This would return a FileEntry which you could then write to using FileWriter, as well as store in the database using normal database operations. As Jonas and I have discussed in the past, I think that storing Blobs by reference in IDB works fine, but when you make them modifiable FileEntries instead, you either have to give up IDB's transactional nature or you have to give up efficiency. For large mutable Blobs, I don't think there's going to be a clean interface there. Still, I look forward to seeing what you come up with. Eric
Re: [File API: FileSystem] Path restrictions and case-sensitivity
On Thu, May 12, 2011 at 2:08 AM, Eric U er...@google.com wrote: Timeless replied: no, if the api is case insensitive, then it's case insensitive *everywhere*, both on Turkish and on English systems. Things could only be case sensitive when serialized to a real file system outside of the API. I'm not proposing a case insensitive system which is locale aware, i'm proposing one which always folds. You're proposing not just a case-insensitive system, but one that forces e.g. an English locale on all users, even those in a Turkish locale. I don't think that's an acceptable solution. No, I proposed case preserving. If the file is first created with a dotless i, that hint is preserved and a user agent could and should retain this (e.g. for when it serializes to a real file system). I'm just suggesting not allowing an application to ask for distinct dotted and dotless instances of the same approximate file name. There's a reasonable chance that case collisions will be disastrous when serialized, thus it's better to prevent case collisions when an application tries to create the file - the application can accept a suggested filename or generate a new one.
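timeless's case-preserving, collision-rejecting scheme can be sketched as follows. The folding step uses toLowerCase() as a rough stand-in for real Unicode case folding (a real implementation would use the Unicode folding tables), and all names here are illustrative, not part of any proposed API.

```javascript
// A toy directory: lookups are case-insensitive, the first spelling of a
// name is preserved, and creating a *differently cased* variant of an
// existing name is rejected instead of silently aliasing it.
function makeDirectory() {
  const entries = new Map(); // folded name -> original (preserved) name
  return {
    create(name) {
      const key = name.toLowerCase(); // stand-in for Unicode case folding
      if (entries.has(key) && entries.get(key) !== name) {
        throw new Error('name collides with existing entry: ' + entries.get(key));
      }
      entries.set(key, name); // first spelling wins and is preserved
      return name;
    },
    lookup(name) { return entries.get(name.toLowerCase()) ?? null; },
  };
}
```

On a collision the application can then prompt for, or generate, an alternate name, which is exactly the error-handling flow timeless describes.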
Re: [File API: FileSystem] Path restrictions and case-sensitivity
On Wed, May 11, 2011 at 7:08 PM, Eric U er...@google.com wrote: no, if the api is case insensitive, then it's case insensitive *everywhere*, both on Turkish and on English systems. Things could only be case sensitive when serialized to a real file system outside of the API. I'm not proposing a case insensitive system which is locale aware, I'm proposing one which always folds. You're proposing not just a case-insensitive system, but one that forces e.g. an English locale on all users, even those in a Turkish locale. I don't think that's an acceptable solution. I also don't think having code that works in one locale and not another [Glenn's image.jpg example] is fantastic. It was what we were stuck with when I was trying to allow implementers the choice of a pass-through implementation, but given that that's fallen to the realities of path lengths on Windows, I feel like we should try to do better. To clarify something which I wasn't aware of before digging into this deeper: Unicode case folding is *not* locale-sensitive. Unlike lowercasing, it uses the same rules in all locales, except Turkish. Turkish isn't just an easy-to-explain example of one of many differences (as it is with Unicode lowercasing); it is, as far as I can see, the *only* exception. Unicode's case folding rules have a special flag to enable the Turkish behavior in case folding, which we can safely ignore here--nobody uses it for filenames. (Windows filenames don't honor that special case on Turkish systems, so those users are already accustomed to that.) That said, it's still uncomfortable having a dependency on the Unicode folding table here: if it ever changes, it'll cause both interop problems and data consistency problems (two filenames which used to be distinct turning into the same filename after a browser update to its Unicode data). Granted, either case would probably be vanishingly rare in practice at this point. 
All that aside, I think a much stronger argument for case-sensitive filenames is the ability to import files from essentially any environment; this API's filename rules are almost entirely a superset of all other filesystems and file containers. For example, sites can allow importing (once the needed APIs are in place) directories of data into the sandbox, without having to modify any filenames to make it fit a more constrained API. Similarly, sites can extract tarballs directly into the sandbox. (I've seen tars containing both Makefile and makefile; maybe people only do that to confound Windows users, but they exist.) I'm not liking the backslash exception. It's the only thing that prevents this API from being a complete superset, as far as I can see, of all production filesystems. Can we drop that rule? It might be a little surprising to developers who have only worked in Windows, but they'll be surprised anyway, and it shouldn't lead to latent bugs. Glenn: This can be solved at the application layer in applications that want it, without baking it into the filesystem API. This is mostly true; you'd have to make sure that all alterations to the filesystem went through a single choke-point or you'd have the potential for race conditions [or you'd need to store the original-case filenames yourself, and send the folded case down to the filesystem API]. Yeah, it's not necessarily easy to get right, particularly if you have multiple threads running... (The rest was Charles, by the way.) A virtual FS as the backing for the filesystem API does not resolve that core issue. It makes sense to encourage authors to gracefully handle errors thrown by creating files and directories. Such a need has already been introduced via Google Chrome's unfortunate limitation of a 255 byte max path length. -- Glenn Maynard
Re: [File API: FileSystem] Path restrictions and case-sensitivity
On Wed, May 11, 2011 at 4:47 PM, timeless timel...@gmail.com wrote: On Thu, May 12, 2011 at 2:08 AM, Eric U er...@google.com wrote: Timeless replied: no, if the api is case insensitive, then it's case insensitive *everywhere*, both on Turkish and on English systems. Things could only be case sensitive when serialized to a real file system outside of the API. I'm not proposing a case insensitive system which is locale aware, i'm proposing one which always folds. You're proposing not just a case-insensitive system, but one that forces e.g. an English locale on all users, even those in a Turkish locale. I don't think that's an acceptable solution. No, I proposed case preserving. If the file is first created with a dotless i, that hint is preserved and a user agent could and should retain this (e.g. for when it serializes to a real file system). I'm just suggesting not allowing an application to ask for distinct dotted and dotless instances of the same approximate file name. There's a reasonable chance that case collisions will be disastrous when serialized, thus it's better to prevent case collisions when an application tries to create the file - the application can accept a suggested filename or generate a new one. There are a few things going on here: 1) Does the filesystem preserve case? If it's case-sensitive, then yes. If it's case-insensitive, then maybe. 2) Is it case-sensitive? If not, you have to decide how to do case folding, and that's locale-specific. As I understand it, Unicode case-folding isn't locale specific, except when you choose to use the Turkish rules, which is exactly the problem we're talking about. 3) If you're case folding, are you going to go with a single locale everywhere, or are you going to use the locale of the user? 4) [I think this is what you're talking about w.r.t. 
not allowing both dotted and dotless i]: Should we attempt to detect filenames that are /too similar/ for some definition of /too similar/, ostensibly to avoid confusing the user? As I read what you wrote, you wanted: 1) yes 2) no 3) a new locale in which I, ı, İ and i all fold to the same letter, everywhere 4) yes, possibly only for the case of I, ı, İ and i 4 is, in the general case, impossible. It's not well-defined, and is just as likely to cause problems as solve them. If you *just* want to check for ı vs. i, it's possible, but it's still not clear to me that what you're doing will be the correct behavior in Turkish locales [are there any Turkish words, names, abbreviations, etc. that only differ in that character?] and it doesn't matter elsewhere.
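A quick check of the locale-independence claim, using Python's str.casefold() (which implements Unicode default case folding; this is a side illustration, not part of any proposed API): under default folding, I and i fold together, but ı and İ do not fold to plain i, so point 3 really would require a new, nonstandard folding rule.

```python
# Unicode default (locale-independent) case folding via Python's str.casefold().
# The four Turkish-relevant letters: I, i, dotless ı (U+0131), dotted İ (U+0130).
names = ["I", "i", "\u0131", "\u0130"]
folded = [n.casefold() for n in names]

# "I" folds to "i"; dotless "ı" is unchanged; dotted "İ" folds to
# "i" + U+0307 (combining dot above), not to plain "i".
assert folded == ["i", "i", "\u0131", "i\u0307"]
```

So under default folding only two of the four letters actually collide; folding all four together, everywhere, is exactly the extra rule being debated.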
Re: [File API: FileSystem] Path restrictions and case-sensitivity
On Wed, May 11, 2011 at 4:52 PM, Glenn Maynard gl...@zewt.org wrote: On Wed, May 11, 2011 at 7:08 PM, Eric U er...@google.com wrote: no, if the API is case insensitive, then it's case insensitive *everywhere*, both on Turkish and on English systems. Things could only be case sensitive when serialized to a real file system outside of the API. I'm not proposing a case-insensitive system which is locale aware, I'm proposing one which always folds. You're proposing not just a case-insensitive system, but one that forces e.g. an English locale on all users, even those in a Turkish locale. I don't think that's an acceptable solution. I also don't think having code that works in one locale and not another [Glenn's image.jpg example] is fantastic. It was what we were stuck with when I was trying to allow implementers the choice of a pass-through implementation, but given that that's fallen to the realities of path lengths on Windows, I feel like we should try to do better. To clarify something which I wasn't aware of before digging into this deeper: Unicode case folding is *not* locale-sensitive. Unlike lowercasing, it uses the same rules in all locales, except Turkish. Turkish isn't just an easy-to-explain example of one of many differences (as it is with Unicode lowercasing); it is, as far as I can see, the *only* exception. Unicode's case folding rules have a special flag to enable Turkish in case folding, which we can safely ignore here--nobody uses it for filenames. (Windows filenames don't honor that special case on Turkish systems, so those users are already accustomed to that.) So it's not locale-sensitive unless it is, but nobody does that anyway, so don't worry about it? I'm a bit uneasy about that in general, but Windows not supporting it is a good point. Anyone know about Mac or Linux systems? 
That said, it's still uncomfortable having a dependency on the Unicode folding table here: if it ever changes, it'll cause both interop problems and data consistency problems (two files which used to be distinct filenames turning into two files with the same filenames due to a browser update updating its Unicode data). Granted, either case would probably be vanishingly rare in practice at this point. Agreed [both in the discomfort and the rarity], but I think it's a very ugly dependency anyway. All that aside, I think a much stronger argument for case-sensitive filenames is the ability to import files from essentially any environment; this API's filename rules are almost entirely a superset of all other filesystems and file containers. For example, sites can allow importing (once the needed APIs are in place) directories of data into the sandbox, without having to modify any filenames to make it fit a more constrained API. Similarly, sites can extract tarballs directly into the sandbox. (I've seen tars containing both Makefile and makefile; maybe people only do that to confound Windows users, but they exist.) I've actually ended up in that situation on Linux, with tools that autogenerated makefiles, but were run from Makefiles. It's not a situation I really wanted to be in, but it was nice that it actually worked without me having to hack around it. I'm not liking the backslash exception. It's the only thing that prevents this API from being a complete superset, as far as I can see, of all production filesystems. Can we drop that rule? It might be a little surprising to developers who have only worked in Windows, but they'll be surprised anyway, and it shouldn't lead to latent bugs. It can't be a complete superset of all filesystems in that it doesn't allow forward slash in filenames either. However, I see your point. You could certainly have a filename with a backslash in it on a Linux/ext2 system. 
Does anyone else have an opinion on whether it's worth the confusion potential? Glenn: This can be solved at the application layer in applications that want it, without baking it into the filesystem API. This is mostly true; you'd have to make sure that all alterations to the filesystem went through a single choke-point or you'd have the potential for race conditions [or you'd need to store the original-case filenames yourself, and send the folded case down to the filesystem API]. Yeah, it's not necessarily easy to get right, particularly if you have multiple threads running... (The rest was Charles, by the way.) Ah, sorry Glenn and Charles. A virtual FS as the backing for the filesystem API does not resolve that core issue. It makes sense to encourage authors to gracefully handle errors thrown by creating files and directories. Such a need has already been introduced via Google Chrome's unfortunate limitation of a 255 byte max path length. -- Glenn Maynard
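The "single choke-point" approach Eric describes (the app keeps the original-case names itself, sends folded names down to the case-sensitive filesystem API, and serializes all mutations through one point to avoid the race) might be sketched roughly as follows. This is a hypothetical illustration in Python rather than the API's JavaScript; the class and method names are invented.

```python
import threading

class CaseInsensitiveLayer:
    """App-level case-insensitive naming on top of a case-sensitive store."""
    def __init__(self):
        self._lock = threading.Lock()
        self._display = {}  # folded name -> original-case name for display

    def create(self, name):
        # All mutations pass through this one choke point, so two threads
        # can't race to create "IMAGE.JPG" and "image.jpg" simultaneously.
        with self._lock:
            key = name.casefold()
            if key in self._display:
                raise FileExistsError(self._display[key])
            self._display[key] = name  # remember the original case
            # ...here the folded `key` would be sent to the real filesystem API...
            return key

layer = CaseInsensitiveLayer()
layer.create("IMAGE.JPG")      # stored under "image.jpg", displayed as "IMAGE.JPG"
try:
    layer.create("image.jpg")  # collides, regardless of the user's locale
except FileExistsError as e:
    print("name collides with", e)
```

The point of the sketch is only that the locking and the folded/original-case bookkeeping are the application's burden once the API itself stays case-sensitive.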
Re: [File API: FileSystem] Path restrictions and case-sensitivity
On Wed, May 11, 2011 at 8:13 PM, Eric U er...@google.com wrote: So it's not locale-sensitive unless it is, but nobody does that anyway, so don't worry about it? I'm a bit uneasy about that in general, but Windows not supporting it is a good point. It's not locale-sensitive at all, unless the one special case, Turkish, is enabled explicitly. I think the norm is to ignore Turkish entirely for purposes of case folding. (I wasn't even able to find a way to do a Turkish-enabled case folding with libicu, though the header constant U_FOLD_CASE_EXCLUDE_SPECIAL_I suggests it's in there somewhere.) Anyone know about Mac or Linux systems? Native Linux filesystems are case-sensitive, so I'm not sure there's anything to compare against there. (glibc itself doesn't have direct support for case folding, as far as I know; you use a library like libicu for that sort of thing, and libicu does consider i == I when case folding, including in Turkish locales.) I'm not liking the backslash exception. It's the only thing that prevents this API from being a complete superset, as far as I can see, of all production filesystems. Can we drop that rule? It might be a little surprising to developers who have only worked in Windows, but they'll be surprised anyway, and it shouldn't lead to latent bugs. It can't be a complete superset of all filesystems in that it doesn't allow forward slash in filenames either. However, I see your point. You could certainly have a filename with a backslash in it on a Linux/ext2 system. Does anyone else have an opinion on whether it's worth the confusion potential? Of all production end-user filesystems--on any systems where they're allowed, users are going to be used to this being incompatible with the rest of the world already. I guess there's one other case where it's not necessarily a superset: filenames containing invalid byte sequences which can't be represented in UTF-16. I do end up with these from time to time, e.g. 
when extracting a ZIP containing non-UTF-8 filenames. I think I'm not very worried about this (at least for the sandbox case)--this is an error recovery case, where backslashes in filenames are legitimate, if uncommon. -- Glenn Maynard
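As an aside on those undecodable byte sequences: one common recovery technique (not anything specced for this API, just an illustration of why such names are awkward for a UTF-16 string type) is surrogate-escaping, where each invalid byte is mapped to a lone surrogate. A Python sketch:

```python
# A filename from a ZIP in a legacy encoding (here, Shift-JIS bytes for
# a Japanese name ending in ".txt") that is not valid UTF-8.
raw = b"\x83t\x83@\x83C\x83\x8b.txt"

# Decoding with surrogateescape replaces each invalid byte 0xNN with the
# lone surrogate U+DCNN, so the original bytes round-trip losslessly...
name = raw.decode("utf-8", "surrogateescape")
assert name.encode("utf-8", "surrogateescape") == raw

# ...but the resulting string contains unpaired surrogates, i.e. it is not
# well-formed UTF-16 -- which is exactly the representation problem above.
assert "\udc83" in name
```

Whether a filesystem API should accept such strings is a separate question; the sketch only shows they are expressible at all.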
Re: Concerns regarding cross-origin copy/paste security
On Wed, May 4, 2011 at 2:46 PM, Daniel Cheng dch...@chromium.org wrote: From my understanding, we are trying to protect against [1] hidden data being copied without a user's knowledge and [2] XSS via pasting hostile HTML. In my opinion, the algorithm as written is either going to remove too much information or not enough. If it removes too much, the HTML paste is effectively useless to a client app. If it doesn't remove enough, then the client app is going to have to sanitize the HTML itself anyway. I would argue that we should primarily be trying to prevent [1] and leave it up to web pages to prevent [2]. [2] is no different from using data from any other untrusted source, like dragging HTML or data from an XHR. It doesn't make sense to special-case HTML pastes. However, the fragment parsing algorithm as spec'ed in HTML5 already prevents [2]. It removes event handlers, script elements, etc. To me, it doesn't make sense to remove the other elements:
- OBJECT: Could be used for SVG, as I understand.
- FORM: Essentially harmless once the action attribute is cleared.
- INPUT (non-hidden, non-password): Content is already available via text/plain.
- TEXTAREA: See above.
- BUTTON, INPUT buttons: Most of the content is already available via text/plain. We can scrub the value attribute if there is concern about that.
- SELECT/OPTION/OPTGROUP: See above.
I'm also curious as to why these elements are being removed. Hallvord? Should this sanitization be done during a copy as well, to prevent a paste in a non-conforming browser from producing unexpected things? We already do some of this stuff in WebKit. For example, we avoid serializing non-rendered contents. - Ryosuke
Re: [File API: FileSystem] Path restrictions and case-sensitivity
On Wednesday, May 11, 2011, Eric U er...@google.com wrote: I've grouped responses to bits of this thread so far below: Glenn said: If *this API's* concept of filenames is case-insensitive, then IMAGE.JPG and image.jpg represent the same file on English systems and two different files on Turkish systems, which is an interop problem. Timeless replied: no, if the api is case insensitive, then it's case insensitive *everywhere*, both on Turkish and on English systems. Things could only be case sensitive when serialized to a real file system outside of the API. I'm not proposing a case insensitive system which is locale aware, i'm proposing one which always folds. You're proposing not just a case-insensitive system, but one that forces e.g. an English locale on all users, even those in a Turkish locale. I don't think that's an acceptable solution. I also don't think having code that works in one locale and not another [Glenn's image.jpg example] is fantastic. It was what we were stuck with when I was trying to allow implementers the choice of a pass-through implementation, but given that that's fallen to the realities of path lengths on Windows, I feel like we should try to do better. Glenn: This can be solved at the application layer in applications that want it, without baking it into the filesystem API. This is mostly true; you'd have to make sure that all alterations to the filesystem went through a single choke-point or you'd have the potential for race conditions [or you'd need to store the original-case filenames yourself, and send the folded case down to the filesystem API]. Glenn: A virtual FS as the backing for the filesystem API does not resolve that core issue. It makes sense to encourage authors to gracefully handle errors thrown by creating files and directories. Such a need has already been introduced via Google Chrome's unfortunate limitation of a 255 byte max path length. That limitation grew out of the OS-dependent passthrough implementation. 
We're fixing that right now, with this proposal. The one take-away I have from that bug: it would have been nice to have a more descriptive error message. It took a while to figure out that the path length was too long for the implementation. I apologize for that--it was an oversight. If we can relax the restrictions to a small set, it'll be more obvious what the problems are. IIRC this problem was particularly confusing because we were stopping you well short of the allowed 255 bytes, due to your profile's nesting depth. I'd like to obviate the need for complicated exceptions or APIs that suggest better names, by leaving naming up to the app developer as much as possible. [segue into other topics] Glenn asked about future expansions of IndexedDB to handle Blobs, specifically with respect to FileWriter and efficient incremental writes. Jonas replied: A combination of FileWriter and IndexedDB should be able to handle this without problem. This would go beyond what is currently in the IndexedDB spec, but it's this part that we're planning on experimenting with. The way I have envisioned it to work is to add a function called createFileEntry somewhere, for example the IDBFactory interface. This would return a fileEntry which you could then write to using FileWriter as well as store in the database using normal database operations. As Jonas and I have discussed in the past, I think that storing Blobs via reference in IDB works fine, but when you make them modifiable FileEntries instead, you either have to give up IDB's transactional nature or you have to give up efficiency. For large mutable Blobs, I don't think there's going to be a clean interface there. Still, I look forward to seeing what you come up with. Why not simply make the API case sensitive and allow *any* filename that can be expressed in JavaScript strings. 
Implementations can do their best to make the on-filesystem-filename match as close as they can to the filename exposed in the API and keep a map which maps between OS filename and API filename for the cases when the two can't be the same. So if the page creates two files named Makefile and makefile on a system that is case insensitive, the implementation could call the second file makefile(2) and keep track of that mapping. This removes any concerns about case, internationalization and system limitation issues and thereby makes things very easy for web authors. I might be missing something obvious as I haven't followed the discussion in detail. Apologies if that's the case. / Jonas
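Jonas's mapping could be sketched like this (a hypothetical illustration; the Makefile/makefile(2) renaming convention comes from his example, everything else here is invented): the API-level names stay fully case-sensitive, and the implementation probes for a free on-disk name only when the backing store would collide.

```python
class NameMapper:
    """Maps case-sensitive API filenames to unique names usable on a
    case-insensitive backing filesystem (the Makefile/makefile example)."""
    def __init__(self):
        self._owner = {}  # casefolded on-disk name -> API name

    def os_name_for(self, api_name):
        candidate, n = api_name, 1
        # Probe candidate names until one is free (or already owned by this
        # API name) under case-insensitive comparison.
        while self._owner.get(candidate.casefold(), api_name) != api_name:
            n += 1
            candidate = "%s(%d)" % (api_name, n)
        self._owner[candidate.casefold()] = api_name
        return candidate

m = NameMapper()
assert m.os_name_for("Makefile") == "Makefile"
assert m.os_name_for("makefile") == "makefile(2)"  # collision renamed
```

The lookup is idempotent, so repeated requests for the same API name return the same on-disk name; a real implementation would of course also have to persist the map.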
Re: [File API: FileSystem] Path restrictions and case-sensitivity
On Wed, May 11, 2011 at 7:14 PM, Jonas Sicking jo...@sicking.cc wrote: On Wednesday, May 11, 2011, Eric U er...@google.com wrote: I've grouped responses to bits of this thread so far below: Glenn said: If *this API's* concept of filenames is case-insensitive, then IMAGE.JPG and image.jpg represent the same file on English systems and two different files on Turkish systems, which is an interop problem. Timeless replied: no, if the api is case insensitive, then it's case insensitive *everywhere*, both on Turkish and on English systems. Things could only be case sensitive when serialized to a real file system outside of the API. I'm not proposing a case insensitive system which is locale aware, i'm proposing one which always folds. You're proposing not just a case-insensitive system, but one that forces e.g. an English locale on all users, even those in a Turkish locale. I don't think that's an acceptable solution. I also don't think having code that works in one locale and not another [Glenn's image.jpg example] is fantastic. It was what we were stuck with when I was trying to allow implementers the choice of a pass-through implementation, but given that that's fallen to the realities of path lengths on Windows, I feel like we should try to do better. Glenn: This can be solved at the application layer in applications that want it, without baking it into the filesystem API. This is mostly true; you'd have to make sure that all alterations to the filesystem went through a single choke-point or you'd have the potential for race conditions [or you'd need to store the original-case filenames yourself, and send the folded case down to the filesystem API]. Glenn: A virtual FS as the backing for the filesystem API does not resolve that core issue. It makes sense to encourage authors to gracefully handle errors thrown by creating files and directories. Such a need has already been introduced via Google Chrome's unfortunate limitation of a 255 byte max path length. 
That limitation grew out of the OS-dependent passthrough implementation. We're fixing that right now, with this proposal. The one take-away I have from that bug: it would have been nice to have a more descriptive error message. It took a while to figure out that the path length was too long for the implementation. I apologize for that--it was an oversight. If we can relax the restrictions to a small set, it'll be more obvious what the problems are. IIRC this problem was particularly confusing because we were stopping you well short of the allowed 255 bytes, due to your profile's nesting depth. I'd like to obviate the need for complicated exceptions or APIs that suggest better names, by leaving naming up to the app developer as much as possible. [segue into other topics] Glenn asked about future expansions of IndexedDB to handle Blobs, specifically with respect to FileWriter and efficient incremental writes. Jonas replied: A combination of FileWriter and IndexedDB should be able to handle this without problem. This would go beyond what is currently in the IndexedDB spec, but it's this part that we're planning on experimenting with. The way I have envisioned it to work is to add a function called createFileEntry somewhere, for example the IDBFactory interface. This would return a fileEntry which you could then write to using FileWriter as well as store in the database using normal database operations. As Jonas and I have discussed in the past, I think that storing Blobs via reference in IDB works fine, but when you make them modifiable FileEntries instead, you either have to give up IDB's transactional nature or you have to give up efficiency. For large mutable Blobs, I don't think there's going to be a clean interface there. Still, I look forward to seeing what you come up with. Why not simply make the API case sensitive and allow *any* filename that can be expressed in JavaScript strings. That's the way I'm leaning. 
Implementations can do their best to make the on-filesystem-filename match as close as they can to the filename exposed in the API and keep a map which maps between OS filename and API filename for the cases when the two can't be the same. We're not speccing out anything outside the sandbox yet, and we've decided that a pass-through implementation is impractical, so we don't need this approach yet, there being no on-filesystem-filename. It certainly could work for the oft-mentioned My Photos extension, when we get around to that. So if the page creates two files named Makefile and makefile on a system that is case insensitive, the implementation could call the second file makefile(2) and keep track of that mapping. This removes any concerns about case, internationalization and system limitation issues and thereby makes things very easy for web authors. I might be missing something obvious as I haven't
Re: SpellCheck API?
Greetings Aryeh, et al, Thank you for your alternative suggestion. In my honest opinion, I am not attached to my interfaces if there are better alternatives. My proposal is just based on my prototype, which has been uploaded to http://webkit.org/b/59693, and I hope someone on this ML provides better alternatives. On Thu, May 12, 2011 at 5:42 AM, Aryeh Gregor simetrical+...@gmail.com wrote: Hmm, okay. This means authors will have to reimplement a lot of things: * Word-breaking. * Handling changes: they want to make sure to re-check only the text the user changed, not the whole textarea, to avoid their checking being O(N) in the length of the text. * When text is preloaded, the custom spellchecker will have to check all the text, not just visible text. Maybe this is fast enough to be okay, though, if it's only on load and not on every change. However, maybe this API will only be useful to very large sites anyway, which can do all these things. Other sites can use the built-in spellchecker, or rely on a library that did all the hard work. Then we want to be flexible, even if it's harder to use. But also, we'll have to specify extra things, like: how should markers change when the text changes? If I type Foo bar and the author's spellchecker marks Foo, and I type baz so it's now Foo bar baz, does the marker on Foo get cleared automatically? What if I change it to Fooo bar? Or Floo bar? Yes, it is a difficult question, and it was out of scope when I sent my original e-mail. When a web application needs to use my API to handle this case, it needs to compare the text in the focused node when it receives a DOM event (such as keydown or keyup), clean up all markers, re-check all the text, and add markers. (It is indeed inefficient even when the web application has a cache.) Anyway, here's some more detailed feedback on your original idea, taking the above into account: 2011/5/9 Hironori Bono (坊野 博典) hb...@google.com: This example adds two methods. 
* The window.spellCheckController.removeMarkers() method Removes all the misspelled underlines and suggestions in the specified node. The node parameter represents the DOM node from which a web application would like to remove all the misspelled underlines and suggestions. Why do you want to put it on a new global object? Wouldn't it make more sense as a method on the node itself? Like HTMLElement.removeSpellingMarkers(). Also, what if the author wants to remove only one spelling marker? If markers don't get automatically cleared, and the user changed some text, maybe the author wants to only clear a few existing markers without recalculating all the others. Thank you for noticing it. I had not considered that, since I added this method just before sending my original e-mail. As written in your alternative, it is much better to have a method that removes a single misspelled marker. * The window.spellCheckController.addMarker() method Attaches a misspelled underline and suggestions to the specified range of a node. The node parameter represents a DOM node in which a user agent adds a misspelled underline. The start and length parameters represent a range of text in the DOM node specified by the node parameter. (We do not use a Range object here because it is hard to specify a range of text in a textarea element or an input element with it.) The suggestions parameter represents a list of words suggested by the custom spellchecker. When a custom spellchecker does not provide any suggestions, this parameter should be an empty list. Do we want this to be usable for contenteditable/designMode documents as well as textarea/input? If so, we also need an API that supports Ranges, or something equivalent. This example adds two more methods to merge the results of the spellcheckers integrated into user agents. * The window.spellCheckController.checkWord() method Checks the spelling of the specified word with the spellchecker integrated into the hosting user agent. 
When the specified word is a well-spelled one, this method returns true. When the specified word is a misspelled one or the user agent does not have integrated spellcheckers, this method returns false. The word parameter represents the DOM string to check its spelling. The language parameter represents a BCP-47 http://www.rfc-editor.org/rfc/bcp/bcp47.txt tag indicating the language code used by the integrated spellchecker. * The window.spellCheckController.getSuggestionsForWord() method Returns the list of suggestions for the specified word. This method returns a DOMStringList object consisting of words suggested by the integrated spellchecker. When the specified words is a well-spelled word, this method returns an empty list. When the user agent does not have integrated spellcheckers, this method returns null. The word parameter represents the DOM string to check its spelling. The