On Apr 8, 2009, at 2:51 PM, Vladimir Vukicevic wrote:
There's been a lot of interest around the Web Storage spec (formerly
part of whatwg HTML5), which exposes a SQL database to web
applications to use for data storage, both for online and offline
use. It presents a simple API designed for executing SQL statements
and reading result rows. But there's an interesting problem with
this; unlike the rest of the HtML5, this section defines a core
piece of functionality in terms of an undefined chunk referenced as
"SQL".
Treating SQL as an undefined chunk is not unprecedented. Most database
APIs platforms do not require a restricted syntax of SQL to be
supported in the underlying database. For example, X/OPEN SQL CLI [1]
was based on SQL 92 but its successors (JDBC, ODBC) go beyond this and
support any additional SQL syntax supported by the underlying data
source.
The initial implementations of Web Storage are both based on SQLite,
and expose the dialect of SQL understood by SQLite to web content.
I'm actually a big fan of SQLite, and was one of the advocates for
pulling it into the Gecko platform. However, SQLite implements a
variant of SQL, with a number of deviations from other SQL engines,
especially in terms of the types of data that can be placed in
columns.
Data types are certainly relevant here because with JavaScript you
never know what arguments will translate to which values and types.
For example, what does NULL translate to and what about undefined? My
observation is that undefined is translated to the text value
"undefined" and NULL translates to the SQL NULL. But there is no
specification for this behavior.
Web content that is created to use database storage with SQLite as
the backing is unlikely to work with any other backend database.
Similarly, if another database was chosen as a browser's backing
implementation, web content that works with it is unlikely to work
with anything else. This is a serious interop problem, the root of
which is that there really isn't a useful core SQL standard. SQL92
is generally taken as a base, but is often extended or altered by
implementations. Even beyond the parser issues (which could be
resolved by defining a strict syntax to be used by Web Storage), the
underlying implementation details will affect results.
There is inherent challenge in embedding as potent a capability as
data access inside a platform since there is a lot of variation in its
own design and use. Still, the question in my mind is not whether an
unanchored reference to SQL is fine as much as whether SQL is the
right way (and for years to come, the only structured way) to think of
Web application's (locally persistent) data.
So, the only option is for the Web Storage portion of the spec to
state "do what SQLite does". This isn't specified in sufficient
detail anywhere to be able to reimplement it from the documents, so
it would be even worse — "do what this exact version of SQLite
does", because there are no guarantees that SQLite won't make any
incompatible changes. For example, a future SQLite 4 may introduce
some changes or some new syntax which wouldn't be supported by
earlier versions. Thus, it requires every single browser developer
to accept SQLite as part of their platform. This may not be
possible for any number of reasons, not the least of which is it
essentially means that every web browser is on the hook for
potential security issues within SQLite.
Instead of all of this, I think it's worth stepping back and
consider exactly what functionality web developers actually want.
Oracle certainly supports this endeavor to understand exactly what
kind of local storage capabilities are required.
It's certainly much easier to say "well, server developers are used
to working with SQL, so let's just put SQL into the client", but
it's certainly not ideal — most people working with SQL tend to end
up writing wrappers to map their database into a saner object API.
There is no end to how much Oracle and various other companies shield
developers using their platforms from using raw SQL. There are more
reasons for that than I can list here, but suffice it to say that the
Web Storage spec should consider techniques that are better matched to
the Web as a data access platform - i.e., in terms of URLs and HTTP
methods.
So, I would propose stepping back from Web Storage as written and
looking at the core pieces that we need to bring to web developers.
I believe that the solution needs to have a few characteristics.
First, it should be able to handle large data sets efficiently; in
particular, it should not require that the entire data set fit into
memory at one time. Second, it should be able to execute queries
over the entire dataset. Finally, it should integrate well with the
web, and in particular with JavaScript.
With these needs in mind, I think there are other options that
should be considered, even beyond a subset of SQL; for example, an
object-oriented database approach might serve those needs better. A
good prototype example of what such a system could look like is
jLINQ, which implements client-side querying on JavaScript objects
and arrays. As such, a basic implementation is simple; more complex
ones can have browser support for efficient indexing, triggers,
rapid serialization to and deserialization from disk, etc. An
implementation could even map all of this on top of an underlying
SQL engine. Another option is something like CouchDB. I was also
just pointed at Persevere, which looks quite cool; much in the same
way as CouchDB, the same API could be implemented both client-side
and server-side, for efficient online/offline switching.
I have been drawing attention towards the lack of adequate thinking
about synchronization before pushing a half-hearted data access
specification. I was all set to highlight this at the TPAC last year
in a lightning talk but could not deliver it due to a family emergency.
An approach such as one of these could well serve the web better
than just throwing a SQL dialect over the web content fence.
This is a conversation that I think is worth having, both to figure
out what could be done about the issues with directly exposing SQL/
SQLite, and also to step back and explore alternate approaches to
getting the same functionality in web developers' hands.
I am glad you are espousing this as it seems that no real debate has
taken place about this at all. I tried, in vain [3], to raise the
awareness about the lack of adequate vetting of choices about
structured storage and am glad to re-engage with everyone to go
through the choices to end up with a stable (and hopefully gracefully
evolving) API for local data access.
- Vlad
[1] http://www.opengroup.org/products/publications/catalog/c451.htm
[2] http://people.ok.ubc.ca/rlawrenc/teaching/304/Notes/jdbc_3_spec.pdf
[3] http://lists.w3.org/Archives/Public/public-webapps/2009JanMar/0121.html