On 8/17/2013 5:16 AM, Brendan Long wrote:
> On 05/01/2013 10:57 PM, Brett Zamir wrote:
>> I wanted to propose (if work has not already been done in this area)
>> creating an HTTP extension to allow querying for retrieval and
>> updating of portions of HTML (or XML) documents where the server is so
>> capable and enabled, obviating the need for a separate database (or,
>> more accurately, bringing the database to the web server layer).
> Can't you use JavaScript to do this already? Just put each part of the
> page in a separate HTML or XML file, then have JavaScript request the
> parts it needs and insert them into the DOM as needed.

Yes, one can, but:

1. It won't allow users to have their browser (or privileged add-on code) make such universal, cross-domain partial-document requests to any webpage they wish (at least, to any webpage on a server where a drop-in server module or script aware of this standard protocol has been deployed).

Imagine, for example, if all a government had to do to release its data online was to save a Word doc, Excel file, Access database, etc. as HTML and FTP it to a publicly-accessible directory on its server (and add a server module, aware of the HTML Query API, which intercepted queries sent to files in that public directory, handled the XPath/CSS selector processing, and sent back CORS headers with the modified response). Bam: there would now be a genuine, queryable database on the Web, available to the world.
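
To make that concrete, such an exchange might look something like the following (the "xpath" query parameter name, the selector syntax, and the response shape are purely illustrative assumptions on my part; the actual wire format would be whatever the standard defined, and the XPath would be percent-encoded in a real request):

GET /public/budget.html?xpath=//table[1]//tr[position()<=10] HTTP/1.1
Host: data.example.gov

HTTP/1.1 200 OK
Content-Type: text/html
Access-Control-Allow-Origin: *

<table>...only the ten matched rows, not the rest of the document...</table>

The Access-Control-Allow-Origin header is what would let any web page (not just privileged browser/add-on code) consume the result cross-domain.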

One could obtain subsets of such data stores without the document owner (in this case, the government) needing to jump through hoops to convert their documents/data into JSON/XML/etc., provide custom REST APIs, create a search interface, and so on. (This protocol would still let people store their data in a JSON database, etc. if they wished, but they could also just upload static HTML files.)

Consumers of this data (whether web developers or users of the browser/add-on concept mentioned above) would have no need for inefficient screen scraping that first grabs entire documents just to extract the useful parts. Nor would there be any need for server-side-only solutions (at least when coming from a privileged environment such as a browser/add-on, when the document owner has enabled CORS on their server, or when the site is one's own).

2. Such JavaScript solutions as you mention are custom: they require developers to learn different client-side (and server-side) libraries and different server APIs. With a standard HTML Query API, one would need to know nothing more than the URL of the data store (and the structure of the contents one was seeking) to get away with bare XMLHttpRequest (or $.ajax) calls that do what one wants against the data store; there would be no need to learn which specific query strings to add to meet the requirements of a custom server-side API. (In some cases it may admittedly be more convenient to have a succinct query syntax optimized for a specific document format, but it is nice to always have the generic query option.)
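
For instance, a bare XMLHttpRequest call against such a store might look like this rough sketch (the "xpath" parameter, URL, and element IDs are illustrative assumptions, not part of any existing standard; only XMLHttpRequest itself is standard):

var xhr = new XMLHttpRequest();
// Hypothetical query syntax: ask the server for just the rows we care about
var query = encodeURIComponent('//table[@id="budget"]//tr[position() <= 20]');
xhr.open('GET', 'http://data.example.gov/public/budget.html?xpath=' + query);
xhr.onload = function () {
  // The response would contain only the matched fragment, not the whole page
  document.getElementById('results').innerHTML = xhr.responseText;
};
xhr.send();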

3. Custom JavaScript requires sites to include such code in every file and to write scripts. Of course, SOME data necessitates customized access control, such as a website's user database (though even here, one could use HTTP Basic authentication, http://en.wikipedia.org/wiki/Basic_access_authentication , to avoid scripting).

But even with scripts determining access control, many sites could still benefit. A site could, say, create, upload, and manage a Word document saved as HTML, containing a table whose (WYSIWYG) columns were "user" and "password", and then, as per #2 above, use a single reusable server-side library implementing the standard to query this document. The site could, if it wished, later switch to importing the document into a real database while keeping the same HTML Query API library calls. And if a server-side script wanted to, say, let authenticated administrators query or alter the user table, the server-side code could output client-side JavaScript to them that conducted queries against the user table in the same familiar manner.
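
As a rough sketch of what that might look like, assuming a hypothetical reusable server-side helper queryDocument(file, xpath) implementing the standard (the function name, file name, and table structure are all illustrative):

// Hypothetical: queryDocument(file, xpath) returns an array of elements
// matched in the given HTML file. A real implementation would also need to
// escape the username before interpolating it into the XPath expression.
function findUserRow(username) {
  var rows = queryDocument('users.html',
    '//table[@id="users"]//tr[td[1] = "' + username + '"]');
  return rows.length ? rows[0] : null;
}

The same findUserRow call would keep working if users.html were later imported into a real database exposed behind the same query interface.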

4. If markup were added to HTML which coordinated intelligently with this query scheme, say for example to allow querying of documents with known paragraph numbering (there are more interesting and frequently needed use cases than this, involving tables and lists, as I'm planning to explain in my response to Ian, but I'll use a simpler example in this response)...

a. The document creator could create:

<article paragraphRange="">
  <p>This is par. 1</p>
  <p>This is par. 2</p>
  ...
  <p>This is par. 500</p>
</article>

b. an intermediary server plugin would detect the "paragraphRange" attribute and auto-strip all of the inner paragraphs before delivering the document to the user; unless, say, other markup were present on <article>, such as `showRange="1-20"`, in which case it would strip out only paragraphs 21-500, or `paragraphsPerPage="5"` (without showRange), in which case it would strip out pars. 6-500 (a sketch of this stripping logic follows the list below).

c. the browser, when it received the document, could then recognize the "paragraphRange" attribute and know that it should add its own search interface widget at this point in the document, which might contain:

   1. A generic browser-localized label, e.g. "Choose a range of paragraphs"
   2. Two numeric text boxes to allow the user to request a paragraph range from the server, e.g., 23-45
   3. A "get all paragraphs" button or link (as an alternative to the range) to obtain all paragraphs beneath the widget (or "get the remaining paragraphs" had the "showRange" attribute been used)
   4. If the "paragraphsPerPage" attribute were present, a link to "get the next 5 paragraphs"
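
To illustrate step (b), here is a rough sketch of the stripping logic such an intermediary plugin might apply, assuming it has parsed the document into a standard DOM (server-side this would require a DOM implementation; the attribute names are the hypothetical ones proposed above):

function stripParagraphs(article) {
  var showRange = article.getAttribute('showRange');       // e.g. "1-20"
  var perPage = article.getAttribute('paragraphsPerPage'); // e.g. "5"
  var start = 1, end = 0;                                  // default: strip all
  if (showRange) {
    var parts = showRange.split('-');
    start = parseInt(parts[0], 10);
    end = parseInt(parts[1], 10);
  } else if (perPage) {
    end = parseInt(perPage, 10); // first "page" only, e.g. pars. 1-5
  }
  // Copy the live collection into an array first (removing nodes would
  // otherwise shift it under us), then drop every paragraph whose 1-based
  // position falls outside [start, end].
  var paras = Array.prototype.slice.call(article.getElementsByTagName('p'));
  paras.forEach(function (p, i) {
    if (i + 1 < start || i + 1 > end) {
      p.parentNode.removeChild(p);
    }
  });
}

The plugin would run this just before serving the document, so the user's initial page load stays small while the full document remains reachable through subsequent range queries.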

Although custom scripts could do this, that would require the markup creator, including those on public content-creation sites such as wikis, blogs, and discussion forums, to include such a custom script as well.

Even if the WHATWG did not wish to adopt such specific markup conventions until implementation experience had been gained and demand for these widgets assessed, having an official HTML Query Language by which widget creators could pass information back and forth between client and server in a uniform manner would still facilitate the development process, as per #2 above.
