On 05/09/2011 11:58 AM, Hironori Bono (坊野 博典) wrote:
Greetings,

I'm Hironori Bono, a software engineer for Google Chrome.
We recently received requests from web-application developers (and
extension developers) who would like to use the spellchecker
integrated into Google Chrome and to replace the spellchecker with
their spellcheckers implemented in JavaScript as written in the
following document.

Quite different targets.

To satisfy their requests, I would like to propose
adding an API that controls the spellchecker integrated into a user
agent, if it has one. I am not sure whether all user agents need this
API, so I would appreciate your feedback.

Thank you in advance for your interest.

1. Introduction
HTML5 provides a spellcheck attribute to enable or disable the
spellcheckers integrated into user agents in an editable element. This
attribute prevents the spellcheckers from checking text in an editable
element where web applications do not want it, e.g. e-mail addresses,
URLs, etc. Some user agents provide scripting access to spellcheckers.

Providing scripting access to the built-in spellchecker is a privacy
violation (this has been discussed on the @whatwg mailing list):
a web page could learn which language the user uses/has for spellchecking
and whether the user has added new words to the known-words list.



For example, Internet Explorer allows using the spellchecker
integrated into Microsoft Word via ActiveX as listed in the following
code snippet.

function CheckText(text) {
   var result = [];
   // Load the text into a new Word document (Internet Explorer only).
   var app = new ActiveXObject('Word.Application');
   var doc = app.Documents.Add();
   doc.Content = text;
   // Word collections are 1-based, so the loops start at 1.
   for (var i = 1; i <= doc.SpellingErrors.Count; i++) {
     var spellingError = doc.SpellingErrors.Item(i);
     for (var j = 1; j <= spellingError.Words.Count; j++) {
       var word = spellingError.Words.Item(j);
       var error = {};
       error.word = word.Text;
       error.start = word.Start;
       error.length = word.Text.length;
       error.suggestions = [];
       var suggestions = word.GetSpellingSuggestions();
       for (var k = 1; k <= suggestions.Count; k++) {
         error.suggestions.push(suggestions.Item(k).Name);
       }
       result.push(error);
     }
   }
   return result;
}

On the other hand, it is not so easy for web-application developers to
integrate custom spellcheckers (e.g. a spellchecker that uses a
contact list to check e-mail addresses, names, street addresses, etc.)
into their web applications. Even though several web applications
(such as GMail) have integrated custom spellcheckers, such web
applications use content-editable <div> elements to render misspelling
underlines and the ‘z-index’ property to show suggestions.

Oh, I didn't know that if I teach my browser's spellchecker to know
the words I use commonly, GMail can't handle that. Interesting.

Unfortunately, it is not so easy to apply these
techniques when web applications use <textarea> elements or <input>
elements for user input because it is pretty hard to identify the
position of misspelled words in these elements. To solve this problem,
it would be great for user agents to provide scripting access to their
spell-checker framework so web-application developers can integrate
their custom spellcheckers into their web applications, as listed in
the following code snippet.

Adding support for custom spellcheckers seems reasonable.
We just need to make sure that a web page doesn't get access to the
native spellcheck data (at least not without permission).

function CheckTextOfNode(node) {
   // Remove all the previous spellchecking results.
   window.spellCheckController.removeMarkers(node);

   // Check the text in the specified node.
   var result = CheckText(node.innerText ? node.innerText : node.value);
   for (var i = 0; i < result.length; i++) {
     // Add a misspelled underline and suggestions to the specified word.
     window.spellCheckController.addMarker(
         node, result[i].start, result[i].length, result[i].suggestions);
   }
}
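
The CheckText() call in the snippet can be backed by any checker that
returns entries of the form { word, start, length, suggestions }. As a
rough sketch of the contact-list idea mentioned earlier, and not part of
the proposal itself, a custom checker might look like the following. The
names KNOWN_WORDS, editDistance and CheckTextWithContacts are
hypothetical, and the tiny word list is invented for illustration.

```javascript
// Hypothetical contact list; a real application would load its own data.
var KNOWN_WORDS = ['alice', 'bob', 'hironori'];

// Levenshtein distance: the number of single-character insertions,
// deletions, or substitutions needed to turn a into b.
function editDistance(a, b) {
  var i, j;
  var prev = [];
  for (j = 0; j <= b.length; j++) prev[j] = j;
  for (i = 1; i <= a.length; i++) {
    var cur = [i];
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Returns one entry per word that is not in KNOWN_WORDS, in the same
// shape as the entries produced by CheckText().
function CheckTextWithContacts(text) {
  var result = [];
  var pattern = /[A-Za-z]+/g;
  var match;
  while ((match = pattern.exec(text)) !== null) {
    var word = match[0];
    var lower = word.toLowerCase();
    if (KNOWN_WORDS.indexOf(lower) !== -1) continue;
    // Suggest known words that are at most two edits away.
    var suggestions = KNOWN_WORDS.filter(function (known) {
      return editDistance(lower, known) <= 2;
    });
    result.push({
      word: word,
      start: match.index,
      length: word.length,
      suggestions: suggestions
    });
  }
  return result;
}
```

With such a small list, ordinary words are flagged too; a real checker
would combine the contact list with a general dictionary.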

This example adds two methods.
   * The window.spellCheckController.removeMarkers() method
     Removes all the misspelling underlines and suggestions in the
specified node.
     The node parameter represents the DOM node from which a web
application would like to remove all the misspelling underlines and
suggestions.
   * The window.spellCheckController.addMarker() method
     Attaches a misspelling underline and suggestions to the specified
range of a node.
     The node parameter represents the DOM node to which a user agent
adds a misspelling underline.
     The start and length parameters represent a range of text in the
DOM node specified by the node parameter. (We do not use a Range
object here because it is hard to specify a range of text in a
<textarea> element or an <input> element with it.)
     The suggestions parameter represents a list of words suggested by
the custom spellchecker. When a custom spellchecker does not provide
any suggestions, this parameter should be an empty list.
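
To make the intended behaviour of these two methods concrete, here is a
minimal in-memory stand-in that only records markers instead of drawing
underlines. It is a sketch under assumptions: the factory name
createMockSpellCheckController and the markersFor() helper are
hypothetical, plain objects stand in for DOM nodes, and the meaning of
addMarker()'s return value (true when the range fits inside the node's
text) is a guess, since the proposal does not define it.

```javascript
// Hypothetical stand-in for the proposed window.spellCheckController.
function createMockSpellCheckController() {
  var entries = [];  // one { node, markers } entry per node seen so far

  function find(node) {
    for (var i = 0; i < entries.length; i++) {
      if (entries[i].node === node) return entries[i];
    }
    return null;
  }

  return {
    // Drops every marker previously attached to the node.
    removeMarkers: function (node) {
      var entry = find(node);
      if (entry) entry.markers = [];
    },

    // Records a marker for text[start, start + length).
    // Assumption: returns false when the range is outside the text.
    addMarker: function (node, start, length, suggestions) {
      var text = node.value !== undefined ? node.value : (node.innerText || '');
      if (start < 0 || length <= 0 || start + length > text.length) {
        return false;
      }
      var entry = find(node);
      if (!entry) {
        entry = { node: node, markers: [] };
        entries.push(entry);
      }
      entry.markers.push({
        start: start,
        length: length,
        suggestions: suggestions.slice()
      });
      return true;
    },

    // Inspection helper for this sketch only; not part of the proposal.
    markersFor: function (node) {
      var entry = find(node);
      return entry ? entry.markers.slice() : [];
    }
  };
}
```

For example, with var field = { value: 'teh cat' }, calling
addMarker(field, 0, 3, ['the']) records one marker for "teh", and a later
removeMarkers(field) clears it again.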

Even though these functions are sufficient for web-application
developers who use only their custom spellcheckers, they are not
sufficient for those who use both their custom spellcheckers and the
one integrated into user agents. (For example, web applications that
use the integrated spellcheckers only for words which their custom
spellcheckers treat as misspelled.)

function CheckTextOfNode(node) {
   // Reset all the previous spellcheck results.
   window.spellCheckController.removeMarkers(node);

   // Check the text with our custom spellchecker.
   var result = CheckText(node.innerText ? node.innerText : node.value);
   for (var i = 0; i < result.length; i++) {
     // Use the integrated spellchecker to check a misspelled word.
     if (!window.spellCheckController.checkWord(result[i].word)) {
       // Append the integrated suggestions (null when the user agent
       // has no integrated spellchecker).
       var extra =
           window.spellCheckController.getSuggestionsForWord(result[i].word);
       for (var j = 0; extra && j < extra.length; j++) {
         result[i].suggestions.push(extra[j]);
       }
       window.spellCheckController.addMarker(
           node, result[i].start, result[i].length, result[i].suggestions);
     }
   }
}

This example adds two more methods to merge the results of the
spellcheckers integrated into user agents.
   * The window.spellCheckController.checkWord() method
     Checks the spelling of the specified word with the spellchecker
integrated into the hosting user agent. When the specified word is a
well-spelled one, this method returns true. When the specified word is
a misspelled one or the user agent does not have integrated
spellcheckers, this method returns false.
     The word parameter represents the DOMString whose spelling is checked.
     The language parameter represents a BCP-47
<http://www.rfc-editor.org/rfc/bcp/bcp47.txt> tag indicating the
language code used by the integrated spellchecker.

This is the privacy violation, and not acceptable as such.
I wonder how to avoid exposing native spellchecker data to the web page, yet
support this use case. Or do we need yet another permission, which the user
has to give to the page before the spellchecker API fully works?
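
To illustrate the concern, the sketch below shows how a page could use a
checkWord()-style method to learn what is in the user's personal word
list. It does not call any real user-agent API: createStubController and
probeUserDictionary are hypothetical names, and the dictionary contents
are invented.

```javascript
// Stub standing in for a user agent's integrated spellchecker.
// userAddedWords models the words the user has taught the spellchecker.
function createStubController(userAddedWords) {
  var builtIn = ['hello', 'world'];  // invented stock dictionary
  return {
    checkWord: function (word, language) {
      var lower = word.toLowerCase();
      return builtIn.indexOf(lower) !== -1 ||
             userAddedWords.indexOf(lower) !== -1;
    }
  };
}

// A page probes words that no stock dictionary should accept. Every
// word reported as well-spelled was most likely added by the user.
function probeUserDictionary(controller, candidates, language) {
  return candidates.filter(function (word) {
    return controller.checkWord(word, language);
  });
}
```

Probing with a list of company names, project code names, or surnames
would reveal which of them the user has added, which is why a permission
step or some other restriction seems necessary.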



   * The window.spellCheckController.getSuggestionsForWord() method
     Returns the list of suggestions for the specified word. This
method returns a DOMStringList object consisting of words suggested by
the integrated spellchecker. When the specified word is a
well-spelled word, this method returns an empty list. When the user
agent does not have integrated spellcheckers, this method returns
null.
     The word parameter represents the DOMString whose spelling is checked.
     The language parameter represents a BCP-47
<http://www.rfc-editor.org/rfc/bcp/bcp47.txt> tag indicating the
language code used by the integrated spellchecker.

This is also part of the privacy problem.
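
Since getSuggestionsForWord() is described as returning a list, an empty
list, or null, code that merges its result with custom suggestions has to
handle all three cases. A minimal sketch, assuming a hypothetical helper
named mergeSuggestions and using plain arrays in place of DOMStringList
(both expose length and indexed access):

```javascript
// Merges custom suggestions with those from the integrated spellchecker.
// integrated may be a list, an empty list, or null (no integrated
// spellchecker); duplicates are skipped and custom suggestions come first.
function mergeSuggestions(custom, integrated) {
  var merged = custom.slice();
  if (integrated === null) return merged;
  for (var i = 0; i < integrated.length; i++) {
    if (merged.indexOf(integrated[i]) === -1) {
      merged.push(integrated[i]);
    }
  }
  return merged;
}
```

The input array is copied rather than modified, so the custom
spellchecker's own result stays unchanged.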




2. Interfaces

Window implements SpellCheckController;

[Supplemental, NoInterfaceObject]
interface SpellCheckController {
   void removeMarkers(Node node);
   boolean addMarker(Node node, long start, long length,
                     DOMStringList suggestions);
   boolean checkWord(DOMString word, DOMString language);
   DOMStringList getSuggestionsForWord(DOMString word, DOMString language);
};

Regards,

Hironori Bono
E-mail: hb...@google.com





So the API itself looks reasonable, but the privacy problem is quite a major one.


-Olli

