Re: Futures and transactions

2013-04-09 Thread Joran Greef
The problem with IndexedDB transactions is when you need to start doing any 
kind of streaming, where there is the potential for the stream write buffer to 
fill up, e.g. syncing over the network:

1. Get references to objects within a collection within a transaction.
2. Compare these to objects over the network.
3. Start writing objects to the network, waiting for the network to drain 
(assuming web sockets) before writing more data.

While this is essentially a long-lived read transaction, this won't work with 
IDB.

Some have argued that the design goal was to avoid long-lived transactions, but 
there is a difference between long-lived read transactions and long-lived write 
transactions.

For MVCC transactions, which I think IDB was once supposed to be aiming for, 
there is by definition no problem with long running readers, since they do not 
block each other or writers, they simply read the database at a snapshot in 
time.

The browser is starting to support stream APIs, and I think with that, we need 
transactions that can be "retained". That is, keep the same semantics as per 
IDB transactions, but with an additional method "retain(milliseconds)" that 
would keep the transaction alive for a certain amount of time.

Joran Greef



Re: Sandbox

2012-09-17 Thread Joran Greef
On 17 Sep 2012, at 2:33 PM, Florian Bösch  wrote:

> Security is a pretty serious concern if you're distributing apps without any 
> oversight to billions of users automatically upon a single link click.

You are conflating web apps (trusted, installed) with web pages (single link 
click).

>> No TCP.
> Wrong, see websockets which upgrade to plain old TCP after the handshake.

No, WebSockets are not "plain old TCP".

>> No UDP.
> Coming with WebRTC in the form of unreliable data channels.

WebRTC is above UDP. It's not UDP. WebRTC is a massive conglomeration of 
protocols and codecs and opinions.

>> No POSIX.
> Why would you need cross-OS posix standards and operating system shells when 
> you already have a browser which abstracts cross-OS APIs in its own fashion?

How do you fsync in a browser?

>> Tim Berners-Lee raised this point first awhile back on Public Web Apps: 
>> http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0464.html
> I believe his point was subtly different. He was arguing for vendors to come 
> up with ways to solve the usecases he mentioned, not arguing to just blast 
> the OS at the JS developer and let the ensuing security armageddon sort 
> itself out.

No, not at all. Nowhere did he ask for browser vendors "to solve the use cases 
he mentioned".



Re: Sandbox

2012-09-17 Thread Joran Greef
Apps (native/web) need direct access to bare metal.

Browser vendors need to move away from the "we do all the thinking and 
designing and implementing" top-down model of innovation.

Browser vendors need to provide minimal core OS APIs and get out of the way and 
let open source grow around and do the rest.

For too long now the typical response to this kind of proposal has been "how do 
you propose solving the security problems?"

That is to say, we should not do any of this unless we can perfectly solve the 
security problems. As if they can be perfectly solved.

And so our most perfect solution has been to completely cripple web apps:

No TCP.

No UDP.

No POSIX.

No Hardware.

Tim Berners-Lee raised this point first awhile back on Public Web Apps: 
http://lists.w3.org/Archives/Public/public-webapps/2012JanMar/0464.html

As a user, I want to write a web app. I trust it. I want to give it UDP, TCP 
and POSIX anointing. I want it to use the resources of my machine to act on my 
behalf and assist me in my work. The browser won't let me. Why?



Drag & Drop Web Apps

2012-08-10 Thread Joran Greef
Given the advance of HTML 5, and in the interest of developing web apps with 
average functionality, would it now be possible to:

1. Drag files and folders into a web app?
2. Drag files and folders out of a web app?
3. Drag a spreadsheet out of a web app onto the icon of Excel in the dock and 
have it open in Excel?
4. Monitor that same spreadsheet's content (originally provided by the web app) 
for changes when the user edits it and presses CTRL+S?

Or is it only possible to drag things into a browser window but not back out 
and nothing else?

Can a user drag a piece of data into a browser window… and then drag it back 
out?

For example, a user may want to use a Contacts web app, and drag a contact out 
of the browser window as a piece of vcard data and land this onto the Contacts app 
in the dock, which would then import the contact, all in a single mouse gesture?

Or is it not possible to provide that kind of user experience?

For example, a user may want to use a PDF web app, and transfer a piece of PDF 
data to the Preview app, but be forced to click a link to download the PDF, 
click the very small "Keep" button next to the "This type of file can harm your 
computer. Do you want to keep anyway?" warning, and then drag the PDF onto the 
Preview app, and then go to the Downloads folder to delete the "download". At 
least 5 mouse clicks and then a CMD+backspace to accomplish what (from the 
user's point of view at least) should have only taken one drag and drop?

And then this may be vendor specific, but if a user created a piece of PDF data 
and dragged it into the browser window in the first place, does it still make 
sense to warn them that "this type of file can harm your computer"?

The browser takes on too much responsibility for things it can't possibly 
reason about, and seeks not enough advice from the user where it could. It 
often seems that the browser is built to lecture the user, rather than the 
other way round. I use the browser every day at work, and sometimes you have to 
ask yourself: who's serving who. Does the user serve the browser, or does the 
browser serve the user?



Non-persistent in-memory storage accessible by same domain tabs

2012-05-24 Thread Joran Greef
Web applications need a way to communicate between two same domain
tabs without polling LocalStorage and without hitting the disk.

It would be useful to have an in-memory get/set/compare_and_set hash
table exposed to scripts running in same-domain tabs, which is discarded
by the browser when those tabs are closed.

Use cases:

1. Coordinate replication between tabs for an offline app, i.e. one
tab takes responsibility for syncing a user's data to and from
IndexedDB.
2. Sign out from one tab triggers sign out from all other tabs.
3. If something like LevelDB were exposed directly to JS, one could
implement MVCC on top using the shared hash.
4. Library authors would be able to implement their own cross-tab postMessage.

It's difficult to implement these use cases with LocalStorage except
at a coarse resolution, and risky even then, because LocalStorage
lacks a compare-and-set primitive.
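
A minimal sketch of the proposed table, modelled as a plain in-memory class. No such browser API exists; `SharedTable` and its method names are hypothetical, and the example only illustrates how compare-and-set would let exactly one tab claim the sync role from use case 1.

```typescript
// Hypothetical in-memory table. In the proposal the browser would share one
// instance between all tabs of the same domain and discard it when they close.
class SharedTable {
  private entries = new Map<string, string>();

  get(key: string): string | undefined {
    return this.entries.get(key);
  }

  set(key: string, value: string): void {
    this.entries.set(key, value);
  }

  // Store `next` only if the current value equals `expected`
  // (`undefined` means "the key must be absent"). Returns true on success.
  compareAndSet(key: string, expected: string | undefined, next: string): boolean {
    if (this.entries.get(key) !== expected) return false;
    this.entries.set(key, next);
    return true;
  }
}

// Two tabs race to become the one that syncs with IndexedDB.
const table = new SharedTable();
const tabAWins = table.compareAndSet("sync-owner", undefined, "tab-a"); // true
const tabBWins = table.compareAndSet("sync-owner", undefined, "tab-b"); // false
```

With only get and set, both tabs could read "no owner" and then both write, which is the race that polling LocalStorage cannot rule out.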



IndexedDB: Binary Keys

2012-05-21 Thread Joran Greef
IndexedDB supports binary values as per the structured clone algorithm
as implemented in Chrome and Firefox.

IndexedDB needs to support binary keys (ArrayBuffer, TypedArrays).

Many popular KV stores accept binary keys (BDB, Tokyo, LevelDB). The
Chrome implementation of IDB is already serializing keys to binary.

JS is moving more and more towards binary data across the board
(WebSockets, TypedArrays, FileSystemAPI). IDB is not quite there if it
does not support binary keys.

Binary keys are more efficient than Base 64 encoded keys, e.g. a 128
bit key in base 256 is 16 bytes, but 22 bytes in base 64.

Am working on a production system storing 3 million keys in IndexedDB.
In about 6 months it will be storing 60 million keys in IndexedDB.

Without support for binary keys, that's about 360 MB of wasted storage
(60,000,000 * (22 - 16) bytes), not to mention the CPU overhead spent
Base64 encoding and decoding keys.
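
The key-size arithmetic can be checked with a short sketch. The helper and constant names are illustrative; the key width and key count are the figures quoted in this message, and the sketch assumes unpadded base64 stored at one byte per character. The product works out to 360,000,000 bytes (roughly 343 MiB).

```typescript
// Length of an unpadded base64 encoding: each character carries 6 bits.
function base64Length(byteLength: number): number {
  return Math.ceil((byteLength * 8) / 6);
}

const keyBytes = 128 / 8;                        // 16 bytes for a 128-bit key
const encodedBytes = base64Length(keyBytes);     // 22 characters
const overheadPerKey = encodedBytes - keyBytes;  // 6 bytes per key
const keyCount = 60000000;
const wastedBytes = keyCount * overheadPerKey;   // 360,000,000 bytes
```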



IndexedDB: Retrieving a slice of a record value.

2012-04-17 Thread Joran Greef
It would be great if there was a way to use IndexedDB to get just a
slice of a record value, not the entire value.

For example, when storing many large binary values, there may be
useful meta or header info at the start or end of each value, which
could be retrieved directly.

It would be a waste to have to store this data twice, or to read the
entire value.
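
As a sketch of the request, suppose values are byte arrays and the store offered a range read. `ByteStore` and `getSlice` are hypothetical names, not part of IndexedDB, and the in-memory `Map` only stands in for on-disk storage.

```typescript
// Hypothetical store of binary values keyed by string.
class ByteStore {
  private values = new Map<string, Uint8Array>();

  put(key: string, value: Uint8Array): void {
    this.values.set(key, value);
  }

  // Return a view of bytes [start, end) of the stored value.
  // A real implementation would read only these bytes from disk.
  getSlice(key: string, start: number, end: number): Uint8Array | undefined {
    const value = this.values.get(key);
    return value === undefined ? undefined : value.subarray(start, end);
  }
}

const store = new ByteStore();
const record = new Uint8Array(1024);
record.set([0x4a, 0x47, 0x01, 0x00]); // 4-byte header; the rest is payload
store.put("blob-1", record);
const header = store.getSlice("blob-1", 0, 4); // header only, not all 1024 bytes
```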



Re: Installing Web Apps

2012-02-17 Thread Joran Greef
The problem is we're framing the discussion in terms of "installing" web apps.

We're answering the wrong question.

The real question is whether we want to start seeing powerful applications 
running in the browser.

If we do, then we'll figure out a way to get there. Be it "installing", 
"permissions", or letting apps use as much storage as they need, but just 
giving me a way to keep tabs on what they're using so I can uninstall them if I 
want. Or letting apps use as much bandwidth or CPU or whatever they need, but 
just giving me a way to keep tabs. Or if I'm really security conscious there 
could be a firewall to let me as user defend certain "system calls" or 
whitelist specific apps but only if I want.

But none of that is really the issue now. The issue now is that some are 
unimaginatively saying "what, browser in a browser?". It's the "nobody would 
ever want a personal computer" attitude and this needs to change so that the 
next unforeseen innovation can take place.

What do you want to build in the browser?

1. Dropbox (e.g. drag and drop files into the browser, click a link in the app 
to open them in native applications such as Excel, poll the file for changes 
from the browser and sync the chunks that changed)?
2. Web browser?
3. Proxyless POP and SMTP clients that don't waste server bandwidth and let 
users go direct?
4. Spotify client?
5. Skype client?

I want to be building all of the above.




Re: Enable Compression Of A Blob To .zip File

2011-11-30 Thread Joran Greef
It would be great to have a native binding to Zlib and Snappy exposed to 
Javascript in the browser. Zlib covers the expensive disk use-cases, Snappy 
covers the expensive CPU use-cases.

Also a native binding to basic crypto primitives, even if that means just SHA1 
to start, and even if the Node.js crypto api is copied verbatim.

TypedArrays in current implementations are too slow to help with these, as 
far as I have tried.




Re: [IndexedDB] Transaction Auto-Commit

2011-08-04 Thread Joran Greef
> On 03 Aug 2011, at 7:33 PM, Jonas Sicking wrote:
> 
>>> "Note that reads are also blocked if the long-running transaction is a 
>>> READ_WRITE transaction."
>> 
>> Is it acceptable for a writer to block readers? What if one tab is 
>> downloading a gigabyte of user data (using a workload-configurable Merkle 
>> tree scheme), and another tab for the same application needs to show data?
> 
> This is exactly why transactions are auto-committing. We don't want
> someone to start a transaction, download a gigabyte of data, write it
> to the database, and only after commit the transaction. The
> auto-committing behavior forces you to download the data first, only
> then can you start a transaction to insert that data into the
> database.

If someone were syncing a gigabyte of data using a Merkle tree scheme they 
would probably not consider using a single transaction to persist the data nor 
would they find it necessary. Rather the point was made to emphasize that a 
write-intensive task may take place where many write transactions are required, 
one after the other. For instance, in the previous example, a gigabyte of data 
may likely consist of a million 1KB text objects, or 250,000 4KB objects, each 
of which may require a write transaction to update a few parts of the database. 
Any implementation of IDB where writers blocked readers would perform poorly in 
this case.

But all of this is orthogonal to the question of auto-commit. Are there other 
reasons in favor of auto-committing transactions? I'm not sure that library 
writers stand to gain from it, and it forces one to use other methods of 
concurrency control to match the semantics of server-side databases.

> IndexedDB allows MVCC in that it allows writers to start while there
> are still reading transactions running. Firefox currently isn't
> implementing this though since our underlying storage engine doesn't
> permit it.
> 
> IndexedDB does however not allow readers to start once a writing
> transaction has started. I thought that that was common behavior even
> for MVCC databases. Is that not the case? Is it more common that
> readers can start whenever and always just see the data that was
> committed by the time the reading transaction started?

If your database supports MVCC, then by definition there is no reason for 
writers to block readers.
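
A toy illustration of the MVCC point, as a sketch only: readers take a snapshot of committed state, so a writer can start and commit without blocking them. `SnapshotStore` and `WriteTransaction` are hypothetical names, the store is in memory and synchronous, and copying the whole map per snapshot is a simplification that a real engine would avoid.

```typescript
interface WriteTransaction {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  commit(): void;
  abort(): void;
}

// Hypothetical in-memory store with snapshot reads.
class SnapshotStore {
  private committed = new Map<string, string>();
  private writerActive = false;

  // Readers get their own copy of the committed state, so a later commit
  // never changes what they see and they never wait for the writer.
  beginRead(): ReadonlyMap<string, string> {
    return new Map(this.committed);
  }

  // Only one write transaction may be open at a time.
  beginWrite(): WriteTransaction {
    if (this.writerActive) throw new Error("a write transaction is already open");
    this.writerActive = true;
    const pending = new Map(this.committed);
    return {
      get: (key) => pending.get(key),
      set: (key, value) => { pending.set(key, value); },
      commit: () => { this.committed = pending; this.writerActive = false; },
      abort: () => { this.writerActive = false; },
    };
  }
}

const mvcc = new SnapshotStore();
const w1 = mvcc.beginWrite();
w1.set("k", "v1");
w1.commit();

const reader = mvcc.beginRead();  // snapshot taken while k = v1
const w2 = mvcc.beginWrite();     // writer starts while the reader is still open
w2.set("k", "v2");
w2.commit();

const seenByReader = reader.get("k");              // still "v1"
const seenByNewReader = mvcc.beginRead().get("k"); // "v2"
```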


Re: [IndexedDB] Transaction Auto-Commit

2011-08-02 Thread Joran Greef
I have been spending time on IDB lately and wanted to give feedback as to the 
transaction auto-commit interface:

I am trying to write a wrapper around IDB to match the interface of my 
server-side data store, which allows you to:

1. Request a read or write transaction asynchronously.
2. GET, MGET, EXISTS or SET against that transaction asynchronously.
3. COMMIT when done to release and commit the transaction or ABORT to release 
but not commit the transaction.
4. Have many concurrent read transactions.
5. Have one write transaction at a time (without blocking readers - MVCC).

As you can imagine, IDB does not support this, since it forces you to issue 
requests against an IDB transaction synchronously (from the viewpoint of the 
rest of the application). In other words, once you have obtained an IDB 
transaction, it is automatically released when your code returns control so 
there is no way to do something such as get a value from IDB, do something 
taking a millisecond or two such as reading from WebSQL and then writing a 
value back to IDB, all within the same IDB transaction. You'd have to use 
multiple IDB transactions which would be fine if the user only had your 
application open in one tab, but not in multiple tabs.

To get around this, I thought one could use optimistic concurrency control to 
write a nonce to IDB whenever a write transaction is requested from my IDB 
wrapper, use separate IDB transactions, and when writing, generate a conflict 
error if the nonce has changed.
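
The nonce scheme could look roughly like the following sketch, with a plain `Map` standing in for the IDB object store. The names `beginWrite`, `setValue` and the reserved `"nonce"` key are hypothetical; in a real wrapper the nonce check and the write would have to happen inside the same short IDB transaction to be atomic.

```typescript
// A Map stands in for an IDB object store. "nonce" is a reserved key that is
// replaced every time a logical write transaction begins.
const db = new Map<string, string>();
let nonceCounter = 0;

interface WriteTxn {
  nonce: string;
}

// Begin a logical write transaction by stamping a fresh nonce into the store.
function beginWrite(): WriteTxn {
  const nonce = String(++nonceCounter);
  db.set("nonce", nonce);
  return { nonce };
}

// Each SET re-reads the nonce and refuses to write if another writer
// (for example another tab) has replaced it since this transaction began.
function setValue(txn: WriteTxn, key: string, value: string): void {
  if (db.get("nonce") !== txn.nonce) {
    throw new Error("conflict: another writer started since this transaction began");
  }
  db.set(key, value);
}

const first = beginWrite();
setValue(first, "a", "1");      // succeeds
const second = beginWrite();    // e.g. issued from a second tab
let conflicted = false;
try {
  setValue(first, "a", "2");    // the first writer's nonce is now stale
} catch (err) {
  conflicted = err instanceof Error;
}
```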

The problem is it's significantly slower to do each GET, MGET, EXISTS, or SET 
on a separate IDB transaction. I think it works out to an extra millisecond or 
two overhead. If you're doing 10 or 20 operations, however small, that's an 
extra 10-20ms wasted overhead.

So then I thought I would request an IDB transaction when a transaction is 
requested from my wrapper, and then check the active flag when it's needed, and 
if active is set to false then re-request the transaction. The trouble is that 
the active flag does not appear to be exposed to JS as far as I can see.

Then I tried using a try/catch whenever an object store is requested from an 
IDB transaction so as to reset the IDB transaction if it's expired. Chrome 
returns "NOT_ALLOWED_ERR" instead of "...INACTIVE…" as it should. But I also 
found that the UA sometimes updates the active flag when my code has not 
returned control so there's a race condition somewhere in there I think, which 
may make this trick impossible. It works fine if I schedule a delay between 
operations of 10ms or more. When it gets down to 1ms though, it starts failing 
every now and then.

I tried the same thing using transaction.oncomplete to set my own active flag, 
but this did not work either.

Throughout, IDB in Chrome performs at least an order of magnitude slower than 
the same code running against an in-house MVCC database on the same machine. 
Firefox is significantly slower than Chrome. Would anyone know what the LevelDB 
benchmark would look like if run through IDB on Chrome?

>> "Note that reads are also blocked if the long-running transaction is a 
>> READ_WRITE transaction."

Is it acceptable for a writer to block readers? What if one tab is downloading 
a gigabyte of user data (using a workload-configurable Merkle tree scheme), and 
another tab for the same application needs to show data?

On 25 Jul 2011, at 8:38 PM, Jonas Sicking wrote:

On Mon, Jul 25, 2011 at 6:28 AM, Joran Greef  wrote:
> Regarding transactions in the IndexedDB specification (3.1.7 Transaction):
> 
>>> "Once a transaction no longer can become active, and if the transaction 
>>> hasn't been aborted, the implementation must automatically attempt to 
>>> commit it. This usually happens after all requests placed against the 
>>> transaction has been executed and their returned results handled, but no 
>>> new requests has been placed against the transaction."
> 
> What does "no longer can become active" mean?

Well.. generally it's exactly the text you are quoting. "after all
requests placed against the transaction has been executed and their
returned results handled, but no new requests has been placed against
the transaction".

If you want the full exact definition, look for all the places that
references the "active" flag for transactions.

>>> "Authors can still cause transactions to run for a long time, however this 
>>> is generally not a usage pattern which is recommended and can lead to bad 
>>> user experience in some implementations."
> 
> How exactly can an author still cause a transaction to span several 
> asynchronous events?

All transactions span all the asynchronously firing events that are
fired against the requests placed against the transaction. So as lon

[IndexedDB] Transaction Auto-Commit

2011-07-25 Thread Joran Greef
Regarding transactions in the IndexedDB specification (3.1.7 Transaction):

>> "Once a transaction no longer can become active, and if the transaction 
>> hasn't been aborted, the implementation must automatically attempt to commit 
>> it. This usually happens after all requests placed against the transaction 
>> has been executed and their returned results handled, but no new requests 
>> has been placed against the transaction."

What does "no longer can become active" mean?

>> "Authors can still cause transactions to run for a long time, however this 
>> is generally not a usage pattern which is recommended and can lead to bad 
>> user experience in some implementations."

How exactly can an author still cause a transaction to span several 
asynchronous events? For example, start a transaction, read a value, use that 
value to do something asynchronous outside of IDB (perhaps for a millisecond or 
two or up to a second), and then write the result of that back to the 
transaction?

If it is indeed possible for an author to prolong a transaction, does that mean 
the UA is implementing a delay to give transactions with asynchronous 
dependencies the chance to add requests?

Surely an explicit commit in this case would be preferable for performance 
reasons (with a UA timeout protecting against developer forgetfulness)? Then 
again, if a developer forgot an explicit commit, it would only block writes for 
his particular application.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 7:42 PM, Boris Zbarsky wrote:

> On 4/6/11 10:30 AM, Joran Greef wrote:
>> If Mozilla enjoys using the latest version of SQLite (and I assume they are 
>> not planning on replacing internal SQLite embeddings with IndexedDB - not at 
>> this stage at least), then web developers deserve the latest version.
> 
> This is not obvious a priori, for what it's worth.

The point was made with reference to Mozilla expecting web developers to run 
production client code on IndexedDB, when Mozilla themselves run production 
code on SQLite.

Boris, Jonas and Shaun, we could talk round and round in circles. It seems 
you're not too concerned by any of the performance and design problems re: 
IndexedDB that I have raised. You ask for "proposals" but it's clear you're not 
sold on these issues. If you were, I am sure you would be among the first to 
provide them.

Do you have real-world experience developing web-based applications, targeting 
mobile and desktop, with offline support for storing, indexing, migrating and 
synchronizing several million objects? Or are we all arguing in the realm of 
conjecture ("it should be able to") without having encountered any of these 
issues ourselves, or having any basis for our claims?


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 7:24 PM, Tab Atkins Jr. wrote:

> When a security bug is encountered, either the browsers update to a
> new version of sqlite (if it's already been fixed), thus potentially
> breaking sites, or they patch sqlite and then upgrade to the patched
> version, thus potentially breaking sites, or they fork sqlite and
> patch the error only in their forked version, still potentially
> breaking sites but also forking the project.  The only thing that is
> *not* a valid possibility is the browsers staying on the single fixed
> version, thus continuing to expose their users to the security bug.
> 
> ~TJ

Browser vendors are moving to shorter and shorter release cycles. People have 
stopped viewing these things through the "IE6-here-forever" lens. Browsers are 
starting to update themselves automatically, even nightly. If a security issue 
were to be found, it would be highly unlikely that its patch would break any 
SQL interface of SQLite.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 7:14 PM, Shawn Wilsher wrote:

> On 4/6/2011 9:44 AM, Joran Greef wrote:
>> We only need one fixed version of SQLite to be shipped across Chrome, 
>> Safari, Opera, Firefox and IE. That in itself would represent a tremendous 
>> goal for IndexedDB to target and to try and achieve. When it actually does, 
>> and surpasses the fixed version of SQLite, those developers requiring the 
>> raw performance and reliability of SQLite could then switch over.
> I don't believe any browser vendor would be interested in shipping two 
> different versions of SQLite (one for internal use, and one for the web).  I 
> can say, with certainty, that Mozilla is not.
> 
> Cheers,
> 
> Shawn

If Mozilla enjoys using the latest version of SQLite (and I assume they are not 
planning on replacing internal SQLite embeddings with IndexedDB - not at this 
stage at least), then web developers deserve the latest version.

Ship the latest version of SQLite (even with the -moz prefix). Developers 
targeting "HTML 5" are used to API changes, waiting on browsers and trying to 
reason about broken implementations. The library writers will quickly grow over 
any SQLite version changes should they even ever arise.

Would you run the Mozilla production database on any browser's implementation 
of IndexedDB? How can you expect developers to run their production client code 
on IndexedDB? It's simply not ready and will not be for at least a year or two 
or three. How likely is it that SQLite (given its history) will remove the 
SELECT, INSERT, UPDATE, DELETE statements before then?


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 6:49 PM, Shawn Wilsher wrote:

> On 4/4/2011 10:18 AM, Joran Greef wrote:
>> How would you create an index on an existing object store in IndexedDB 
>> containing more than 50,000 objects on an iPad, without incurring any object 
>> deserialization/serialization overhead, without being an order of magnitude 
>> slower than SQLite, and without bringing the iPad to its knees? If you can 
>> do it with even one IndexedDB implementation out there then kudos and hats 
>> off to you. :)
> You keep bringing this point up, but only a naive implementation of IndexedDB 
> would bring a device to its knees (or a poorly implemented thread scheduler, 
> which I don't expect the iPad to have).  The API is asynchronous, which means 
> it doesn't need to (nor should it) happen on any thread that the UI is being 
> drawn on.
> 
> You still have a point about it possibly taking longer, but even then, that 
> will be implementation dependent.
> 
> Cheers,
> 
> Shawn
> 

I bring up the iPad example because I had experience with a LocalStorage 
implementation (I think it was Safari) loading the contents of LocalStorage 
into memory synchronously on first access, blocking the UI thread. I am 
probably wrong on this one but I think I remember reading on Web Apps that this 
was one of the motivations behind limiting LocalStorage quota to around 10mb. 
At the time I was one of those who believed that LocalStorage would support 
storage of at least 10 GB as a matter of course. I hope you can understand my 
slight distrust of subsequent storage APIs (other than those of proven track 
record) in this light.

It would still take longer (easily 30-50 seconds per 50,000 objects more than 
an opaque key-value store built on SQLite) even if the IndexedDB implementation 
was asynchronous. The developer would also have a tough time reasoning about 
when index migrations would be finished, since IndexedDB offers no control over 
the migration process and provides no way to modify index memberships directly. 
For those that care about these things, IndexedDB does not provide sufficient 
low-level storage primitives.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 6:26 PM, Shawn Wilsher wrote:

> On 4/4/2011 8:07 AM, Joran Greef wrote:
>> SQLite has a fantastic track record of maintaining backwards compatibility.
> Sort of.  They didn't between SQLite 2 and SQLite 3.  There also have been 
> some (albeit minor) backwards compatibility issues with SQLite 3.x releases.  
> The most serious of which deal with performance characteristics changing 
> because they changed how the optimizer works.
> 
> These type of things are acceptable to deal with in browser code because you 
> can change your code unlike on the web (unless you want to have different 
> code for each browser, and then each browser version).  It's that, or 
> browsers can ship one version of SQLite for all eternity.
> 
> Cheers,
> 
> Shawn

We only need one fixed version of SQLite to be shipped across Chrome, Safari, 
Opera, Firefox and IE. That in itself would represent a tremendous goal for 
IndexedDB to target and to try and achieve. When it actually does, and 
surpasses the fixed version of SQLite, those developers requiring the raw 
performance and reliability of SQLite could then switch over.

It is too soon to deprecate SQLite in the browser. IndexedDB is only getting 
started. It is beta and nowhere near the performance and test coverage of 
SQLite.

A fixed version of SQLite across browsers would be helpful at this stage. If 
Mozilla could lead the way on this it would be fantastic. Perhaps that would 
satisfy all parties on these issues?

It would also give IndexedDB implementors sufficient incentive to optimize 
their implementations, and developers the safety net of SQLite until such time 
as they do.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-06 Thread Joran Greef
On 06 Apr 2011, at 8:56 AM, Jonas Sicking wrote:
> 
>> 1. Treat object values as opaque (necessary to avoid 
>> deserialization/serialization overhead, this is mandatory for storing 
>> anything over 50,000 objects on a device like an iPad or iPhone).
> 
> Please explain this in more detail as I have no idea what you mean by
> "treat as opaque". Are you saying that we should not allow storing
> objects but rather only allow storing strings? If not, surely any type
> of object needs to be serialized upon storage. If you are simply
> suggesting forbidding storing objects, then this doesn't seem like a
> blocker. Simply store a string and we won't serialize anything.
> 
> I'm also interested in what you are basing the claim on overhead on.
> Have you profiled a IndexedDB implementation? If so, which? And if
> Firefox, did you do so before or after we switched away from using a
> JSON serializer?

Yes, it must accept a string value and store that directly. The "opaque" 
terminology comes from some of the BDB papers.

I tested both Chrome and Firefox implementations 3 weeks ago. Both were an 
order of magnitude slower than using SQLite as a key-value store (storing 
strings as blobs). You can use whatever serializer you like, but it will always 
be slower than avoiding serialization completely (this is possible by the way, 
my application does not deserialize objects received from the server before 
storing them). Even if your serializer takes only 1ms per serialize call, 
that's 50 seconds for 50,000 objects. For my use-case that is unacceptable, 
considering that SQLite is available in Chrome and Safari. I will encourage my 
users to use those browsers and continue developing for SQLite until IndexedDB 
resolves this issue.

How would you support indices (see below) if you say "Simply store a string and 
we won't serialize anything."?

>> 2. Enable indices to be modified at time of putting/deleting objects (index 
>> references provided by application at time of putObject/deleteObject call).
> 
> I don't believe that this is a blocker. You can simply modify the
> object you are storing to add properties and then index of these
> properties. What you are suggesting only has the advantage that it
> allows storing objects without modifying them. While that can be
> important, it isn't a blocker to at least creating a prototype
> implementation.

How would you index objects passed to putObject as a string (see above)? Plus 
you have the unnecessary object creation overhead. How fast is it to create 
50,000 objects on an iPad? What would that do to the GC and why would you want 
to do that if you don't need to?

I would like to see Mozilla "do as they say": re-implement a SQLite on 
IndexedDB themselves, that is just as fast and memory efficient as the 
original, before suggesting that this is possible and that the web should 
therefore be deprived of SQLite. Furthermore, I would like Mozilla to stop using SQLite for all 
internal use, and rely solely on IndexedDB instead. That is essentially the 
request that Mozilla are making of web developers today.

It's clear that scores of web developers are upset with the decision to 
deprecate WebSQL. It's not clear that IndexedDB provides anything close in 
terms of actual raw performance. This surprised me greatly since I assumed 
IndexedDB would naturally leverage established indexed key-value ideas (for 
instance to quote BDB - "In Berkeley DB, the key and value in a record are 
opaque to Berkeley DB") which would give it an edge over SQLite.

Pragmatically speaking, would it really be so hard for Mozilla to join Chrome, 
Safari and Opera and provide an embedding of SQLite along with IndexedDB?

If IndexedDB is as good as you suggest it is, then I am sure developers will 
flock to it, and you won't need to speculate as to whether or not SQLite will 
take over the web and then break backwards compatibility (despite a stated 
objective and proven track record of not doing so). And if SQLite did ever 
break backwards compatibility then developers would have IndexedDB. And if 
applications relying on SQLite are abandoned by their authors and broken as a 
result of not upgrading, then arguably those applications should be deprecated 
and not SQLite.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-05 Thread Joran Greef
On 06 Apr 2011, at 2:53 AM, Pablo Castro wrote:

> The goal of IndexedDB has always been to enable things like RelationalDB and 
> CouchDB to be built on top, while maintaining a reasonable level of 
> functionality for those that wanted to use it directly. I really like the 
> idea of thinking of RelationalDB as something that's built as a library on 
> top of IndexedDB. Are there specific tweaks we can make to IndexedDB so it 
> can be a good lower-layer for RelationalDB, such that RelationalDB could be 
> built as a pure JavaScript library?
> 
> Thanks
> -pablo

1. Treat object values as opaque (necessary to avoid 
deserialization/serialization overhead, this is mandatory for storing anything 
over 50,000 objects on a device like an iPad or iPhone).
2. Enable indices to be modified at time of putting/deleting objects (index 
references provided by application at time of putObject/deleteObject call).
3. Provide a simpler, more powerful locking mechanism, opaque to IndexedDB, to 
provide finer-grained application-specific locking (i.e. have we just entered 
into a sync process with the master database).

If I may say so, it does seem odd that some would point to the difficulty of 
speccing merely the interface of something like SQLite, and then advise others 
to re-implement it entirely. If there was a specific BTree API in the browser 
and a powerful asynchronous LocalStorage mechanism this might be something for 
the brave, but IndexedDB is a little too tightly coupled to its own interface 
agenda at the moment to make this goal possible.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 7:28 PM, Mikeal Rogers wrote:

> the biggest bottleneck here in the current implementation would be the 
> transaction overhead on a database this size, which is because of performance 
> problems in sqlite which is underlying the implementation. sqlite can't fix 
> this, it's currently the problem. the object serialization is not a huge 
> performance issue, performance issues in databases are almost always due to IO 
> or transaction locks.


You do not have me convinced. I have tried these things (and was once an avid 
CouchDB user), and one of the first things I learnt was that object 
deserialization/serialization incurs a massive performance penalty. Just 
measure the time it takes to JSON.parse/JSON.stringify 50,000 objects on an 
iPad and then implement an indexing scheme that avoids this overhead and 
compare the performance times.
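
For anyone who wants to take that measurement, here is a minimal sketch. The object shape, the 2 KB padding and the function name are assumptions chosen for illustration; actual numbers will depend heavily on the device and the JavaScript engine.

```javascript
// Hypothetical micro-benchmark: time a JSON round trip for `count` objects.
// The object shape and the padding size are assumptions for illustration.
function timeJsonRoundTrip(count, paddingBytes) {
  const padding = "x".repeat(paddingBytes);
  const serialized = [];
  for (let i = 0; i < count; i++) {
    serialized.push(JSON.stringify({ id: i, name: "user" + i, body: padding }));
  }

  // Only the parse/stringify loop is timed, not the setup above.
  const start = Date.now();
  let characters = 0;
  for (let i = 0; i < serialized.length; i++) {
    const object = JSON.parse(serialized[i]);        // deserialize
    characters += JSON.stringify(object).length;     // serialize again
  }
  return { count: count, characters: characters, milliseconds: Date.now() - start };
}

const result = timeJsonRoundTrip(50000, 2048);
console.log(result.count + " objects, " + result.characters + " characters, " +
  result.milliseconds + " ms");
```

An indexing scheme that treats values as opaque strings would skip the timed loop entirely, which is the comparison being suggested.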

> you should most definitely be able build sqlite on top of IDB, there would be 
> a performance penalty of course, which we can address, but you should be able 
> to do it. if you can't then we need to extend the specification.

Trust me on this Mikeal, you cannot build SQLite on top of IDB, the primitives 
are simply not there. I have been asking for the specification to be extended 
(namely with regards to schema-less index operation, set operations on indices, 
and opaque objects) and one or two of the contributors have expressed interest 
but Mozilla do not appear to be enthralled.

Read up on SQLite if you have not yet had the chance to understand the mammoth 
collective effort it represents: http://www.sqlite.org (it's a stellar project)


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 6:04 PM, Tab Atkins Jr. wrote:

> It's new.

Do you think it would be wise then to advocate doing away with SQLite before 
IndexedDB has had a chance to prove itself? Surely two competing APIs would be 
the fastest way to bring IndexedDB up to speed?

> Ironically, the poor performance is because it's using sqlite as a
> backing-store in the current implementation.  That's being fixed by
> replacing sqlite.

Yes I am aware of this. There are some design flaws in IndexedDB. For instance, 
it does not regard objects as opaque (as would a typical key-value store), 
which means that creating an index on an existing object store would require 
deserializing/serializing every object therein. Doing that for 50,000 objects 
on an iPad would be breathtaking.

I have written object stores on top of SQLite and they are already an order of 
magnitude faster than IndexedDB with a more powerful and memory efficient API 
to boot.

> Kinda the point, in that the power/complexity of SQL confuses a huge
> number of developers, who end up coding something which doesn't
> actually use the relational model in any significant way, but still
> pays the cost of it in syntax.

I was not referring to SQL but to the underlying primitives exposed through the 
SQL interface. For example, set operations on indices, or the ability to index 
objects with array values.
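
To illustrate what a set operation on indices means here, a minimal sketch that intersects and unions two index entries. Modelling an index entry as a sorted array of unique object ids is an assumption made for the example, as are the function names; this is not an IndexedDB or SQLite interface.

```javascript
// Each index entry is modelled as a sorted array of unique object ids (an assumption).

// Ids present in both entries, found with a single merge pass.
function intersect(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { out.push(a[i]); i++; j++; }
    else if (a[i] < b[j]) i++;
    else j++;
  }
  return out;
}

// Ids present in either entry, still sorted and without duplicates.
function union(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length || j < b.length) {
    if (j >= b.length || (i < a.length && a[i] < b[j])) out.push(a[i++]);
    else if (i >= a.length || b[j] < a[i]) out.push(b[j++]);
    else { out.push(a[i]); i++; j++; }
  }
  return out;
}

const tagged = [1, 3, 5, 8];  // hypothetical ids in one index entry
const unread = [3, 4, 8, 9];  // hypothetical ids in another index entry
console.log(intersect(tagged, unread)); // [ 3, 8 ]
console.log(union(tagged, unread));     // [ 1, 3, 4, 5, 8, 9 ]
```

Both operations run in one pass over the ids without touching the stored objects themselves.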



Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 6:10 PM, Mikeal Rogers wrote:

> it's not very hard to write the abstraction you're talking about on top of 
> IndexedDB, and until you do it i'm going to have a hard time taking you 
> seriously because it's clearly doable.


You assume I have not written the abstraction I am talking about on top of 
IndexedDB?

> the constructs in IndexedDB are pretty low level but sufficient if you know 
> how to implement databases. performance is definitely an issue, but making 
> these constructs faster would be much easier than trying to tweak an off the 
> shelf SQL implementation to your use case.


How exactly would you make a schema-enforcing interface faster than a stateless 
interface?

How would you implement application-managed indices on top of IndexedDB without 
being slower than SQLite?

How would you implement set operations on indices in IndexedDB without being 
slower or less memory efficient than SQLite?

How would you create an index on an existing object store in IndexedDB 
containing more than 50,000 objects on an iPad, without incurring any object 
deserialization/serialization overhead, without being an order of magnitude 
slower than SQLite, and without bringing the iPad to its knees? If you can do 
it with even one IndexedDB implementation out there then kudos and hats off to 
you. :)

I understand your point of view. I once thought the same. You would think that 
IndexedDB would be more than satisfactory for these things. The question is 
whether IndexedDB provides adequate and performant database primitives, to the 
same degree as SQLite (and of course SQL is merely an interface to database 
storage primitives, I do not recall saying otherwise).

You can build IndexedDB on top of SQLite (as some browsers are indeed doing), 
but you cannot build SQLite on IndexedDB.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 5:26 PM, Keean Schupke wrote:

> This is ignoring the possibility that something like RelationalDB could be 
> used, where a well defined common subset of SQL can be used (and I use 
> well-defined in the formal sense). This would allow a relatively thin wrapper 
> on top of most SQL implementations and would allow SQLite (or BDB) to be used 
> as the backend.

Yes, if an implementation of RelationalDB arrives which is solid and fast with 
support for set operations that would be great. The important thing is that we 
have two competing APIs (and preferably a strong API with a great track record).


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-04 Thread Joran Greef
On 04 Apr 2011, at 4:39 PM, Jonas Sicking wrote:

> Hence it would still be the case that we would be relying on the
> SQLite developers to maintain a stable SQL interpretation...

SQLite has a fantastic track record of maintaining backwards compatibility.

IndexedDB has as yet no track record, no consistent implementations, no 
widespread deployment, only measurably poor performance and a lukewarm indexing 
and querying API.

If anything it's the other way round. You have yet to convince developers that 
IndexedDB will be faster, more stable, more powerful, more memory efficient 
than SQLite and with better test coverage at that.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-02 Thread Joran Greef
> I am incredibly uncomfortable with the idea of putting the
> responsibility of the health of the web in the hands of one project.
> In fact, one of the main reasons I started working at Mozilla was to
> prevent this.
> 
> / Jonas

I agree with you. All the more reason to support both WebSQL and IndexedDB. It 
is not a case of either/or. It would be healthy to have competing APIs.


Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-04-02 Thread Joran Greef
On Sat, Apr 2, 2011 at 00:42:40, Glenn Maynard wrote:

> You can certainly ask if they're interested in doing so, not for "our"
> benefit (whoever "our" means), but for the benefit of the Web as a whole,
> and there's nothing at all rude in asking.  I'd say the opposite: it's rude
> to assume they wouldn't be interested, rather than asking and letting them
> come to their own decision.  (I don't know where the notion of "forcing"
> them to do anything came from.)

I have been reading up more on the history of SQLite. It is a stellar 
implementation, just to highlight a few points:

1. "Most of the SQLite source code is devoted purely to testing and 
verification. An automated test suite runs millions and millions of test cases 
involving hundreds of millions of individual SQL statements and achieves 100% 
branch test coverage."

2. "SQLite can also be made to run in minimal stack space (4KiB) and very 
little heap (100KiB), making SQLite a popular database engine choice on memory 
constrained gadgets such as cellphones, PDAs, and MP3 players."

3. "Faster than popular client/server database engines for most common 
operations."

4. "Supports terabyte-sized databases and gigabyte-sized strings and blobs."

5. "The developers continue to expand the capabilities of SQLite and enhance 
its reliability and performance while maintaining backwards compatibility with 
the published interface spec, SQL syntax, and database file format."

It is easier to build a performant IndexedDB on SQLite than to build a 
performant SQLite on IndexedDB. Maybe that is something to think about. 
Developers need working database primitives, more than they need convenience.

There may be conjectural reasons for Mozilla not implementing WebSQL, but the 
track record of SQLite is hard to ignore. Mozilla is already embedding SQLite 
for other uses, and appears to be a sponsor of the project.

SQLite may not be a specification in "our" sense of the word, but in a Web 
sense of the word, it is so widely deployed already that it would be hard not 
to call it a standard.



Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 10:07 PM, Shawn Wilsher wrote:

> On 3/31/2011 11:47 AM, Joran Greef wrote:
> Let those who introduced these design flaws be among the first to take 
> responsibility and fix them.
> You aren't being constructive, and that's a surefire way to be ignored.  You 
> have yet to convince the working group that these are "design flaws" in the 
> first place.
> 
> /sdwilsh

Agreed. I am actively using the API with real-world data and I am providing 
feedback. You are welcome to use it or not. It is not for me to convince 
anyone. As I said, if people think there is a problem, let those who introduced 
it fix it.

Joran Greef


Re: Mail List Etiquette [Was: WebSQL] Any future plans, or has IndexedDB replaced WebSQL?]

2011-03-31 Thread Joran Greef
Thank you Art.

To clarify, I have heard from a contributor to the specification in question 
who referred to LocalStorage himself as "little more than a toy", expressing 
his frustrations at the specification. It is well known that most LocalStorage 
implementations do not support more than 10 MB, some load the entire contents 
into memory synchronously on first access, and there were some issues around 
locking that were not addressed as far as I recall. LocalStorage does not work 
as advertised. Many developers, including myself, got excited, spent hours with 
it, only to see these issues left unresolved. It would be true to say that most 
LocalStorage implementations are "crippled" in this sense. No one need be 
offended since specification and implementation are two separate things. I do 
wish however, that the specification would have addressed large quota support, 
and encouraged certain implementation practices, and in this sense I feel that 
not enough was done. The same with WebSQL. And recently I learned that IDB 
prevents applications from managing indices? These things are disappointing to 
us developers. I think we have a right to be critical on these issues where 
criticism is due. If the specification is inadequate, or burdened by politics, 
we should be free to say so (respectfully and professionally of course, but 
also honestly and directly and with the right measure of urgency), without 
fear of offending anyone or being policed for it.

Joran Greef

On 31 Mar 2011, at 9:37 PM, Arthur Barstow wrote:

>> This is painful to read.  WebSQL development died because SQLite, the most 
>> widely-deployed database software in the world, was too good?  That sounds 
>> like a catastrophic failure of the W3C process.
>> 
>> -- 
>> Glenn Maynard
> Hear.
> 
> I am starting to think that Mozilla will step up and provide an embedding of 
> SQLite, even if it has to only think of it as such. It will have to.
> 
> People would rather use a working database than something crippled albeit 
> "specced" (see LocalStorage or IndexedDB).
> 
> It was things like XHR in all their unspecced glory that brought the web to 
> where it is today.

Joran - as one of the moderators of public-webapps, I find your comments above 
offensive to those that work on the specs you mention.

All - this is a reminder that all e-mails on this list are expected to be 
respectful and professional.

Please see the following for more information about the etiquette and usage of 
this list:

  http://lists.w3.org/Archives/Public/public-webapps/2009JulSep/1216.html

-Regards, Art Barstow






Re: [WebSQL] Any future plans, or has IndexedDB replaced WebSQL?

2011-03-31 Thread Joran Greef
> This is painful to read.  WebSQL development died because SQLite, the most 
> widely-deployed database software in the world, was too good?  That sounds 
> like a catastrophic failure of the W3C process.
> 
> -- 
> Glenn Maynard

Hear.

I am starting to think that Mozilla will step up and provide an embedding of 
SQLite, even if it has to only think of it as such. It will have to.

People would rather use a working database than something crippled albeit 
"specced" (see LocalStorage or IndexedDB).

It was things like XHR in all their unspecced glory that brought the web to 
where it is today.


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 7:41 PM, Jonas Sicking wrote:

> So pretty please, with sugar on top, please come up with a proposal
> for the full API rather than bits and pieces.

Let those who introduced these design flaws be among the first to take 
responsibility and fix them.


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 7:27 PM, Jeremy Orlow wrote:

>> 1. Provide the application with a first-class means to manage indexes at 
>> time of putting/deleting objects.
> 
> I'm OK with doing this for v1 if the others are.  It doesn't seem like that 
> big of an addition and it would give a decent amount of additional 
> flexibility.

Thanks Jeremy that would be great.

>> (reduces serialization/deserialization overhead where application already 
>> has the object as a string)
> 
> I'm not sure why you think this would reduce overhead.

How long would it take an iPad to JSON deserialize/serialize 500 / 5,000 / 
50,000 / 500,000 / 5,000,000 2KB objects? That's a reasonable device and those 
are reasonable workloads. In its present state, IndexedDB needs to do this 
every time setVersion is called with a createIndex in there... you see the 
problem is there's no way for the application to control this. The application 
would arguably be able to find better ways of migrating indexes than using key 
paths which necessitate deserialization/serialization to be performed on the 
client. For instance, you could use batch jobs on the server to do this on 
behalf of clients, and this would make sense especially where many 
clients/devices share the same objects. With IndexedDB this is not possible. 
With pure storage primitives it would have been possible. This is just one 
use-case, and for every one of these there will be plenty more.

> Like I said above, although I think we should make it possible to operate 
> more statelessly, I don't see a reason we need to remove stuff like this. 
> Some users will find it more convenient to work this way.

Agreed on both counts. It is clearly too late to remove it now. But it may be a 
good idea in future to keep the focus on providing low-level primitives rather 
than convenience features, since the latter often get in the way of the former.


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 12:52 PM, Keean Schupke wrote:

> I totally agree with everything so far...
> 
>> 3. This requires an adjustment to the putObject and deleteObject interfaces 
>> (see previous threads).
> 
> I disagree that a simple API change is the answer. The problem is 
> architectural, not just a superficial API issue.

Yes, for IndexedDB to be stateless with respect to application schema, one 
would need to:

1. Provide the application with a first-class means to manage indexes at time 
of putting/deleting objects.
2. Treat objects as opaque (remove key path, structured clone mechanisms, 
application must provide an id and JSON value to put/delete calls, reduces 
serialization/deserialization overhead where application already has the object 
as a string).
3. Remove setVersion (redundant, application migrates objects and indexes using 
transactions as it needs to).
4. Remove createIndex.

This would rip so much from the spec as to reduce it to a bunch of tatters, 
defining nothing more than an interface for index/key/value primitives in terms 
of well-established interfaces.

Essentially, we need LocalStorage with asynchronous IO (based on Node's 
callback style), large quota support, and a BTree API. Failing that, a decent 
FileSystem API on which to build these.
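
As a rough picture of that shape, a minimal in-memory sketch. The AsyncStorage name, its methods and the range scan are all assumptions for illustration, not a browser API; a real implementation would sit on persistent storage and a BTree rather than a Map that is sorted on demand.

```javascript
// Hypothetical AsyncStorage: an in-memory stand-in for "LocalStorage with
// asynchronous IO". All names here are assumptions, not a browser API.
function AsyncStorage() {
  this.items = new Map(); // key (string) -> opaque string value
}

// Deliver results on a later tick, Node style: callback(error, result).
function defer(callback, error, result) {
  setTimeout(function () { callback(error, result); }, 0);
}

AsyncStorage.prototype.set = function (key, value, callback) {
  if (typeof value !== "string") {
    return defer(callback, new Error("value must be an opaque string"));
  }
  this.items.set(key, value);
  defer(callback, null);
};

AsyncStorage.prototype.get = function (key, callback) {
  defer(callback, null, this.items.has(key) ? this.items.get(key) : null);
};

AsyncStorage.prototype.remove = function (key, callback) {
  defer(callback, null, this.items.delete(key));
};

// Ordered scan of keys in [start, end), standing in for what a BTree would provide.
AsyncStorage.prototype.range = function (start, end, callback) {
  const keys = Array.from(this.items.keys())
    .filter(function (key) { return key >= start && key < end; })
    .sort();
  defer(callback, null, keys);
};

const storage = new AsyncStorage();
storage.set("user:1", '{"name":"Ada"}', function (error) {
  if (error) throw error;
  storage.get("user:1", function (error, value) {
    if (error) throw error;
    console.log(value); // the stored string, returned untouched
  });
});
```

The store never parses the value, so the application decides when, and whether, to pay for serialization.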


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 9:34 AM, Jeremy Orlow wrote:

> We have made an effort to understand other "contributions to the field".
> 
> I'm not convinced that these are "essential database concepts" and having 
> personally spent quite some time working with the API in JS and implementing 
> it, I feel pretty confident that what we have for v1 is pretty solid.  There 
> are definitely some things I wouldn't mind re-visiting or looking at closer, 
> possibly even for v1, but they all seem reasonable to study further for v2 as 
> well.
> 
> We've spent a lot of time over the last year and a half talking about 
> IndexedDB.  But now it's shipping in Firefox 4 and soon Chrome 11.  So 
> realistically v1 is not going to change much unless we are convinced that 
> what's there is fundamentally broken.
> 
> We intentionally limited the scope of v1, which is why we know there'll be a 
> v2.  We can't solve all the problems at once, and the difficulty of speccing 
> something is typically exponential to the size of the API.
> 
> Maybe a constructive way to discuss this would be to look at what use cases 
> will be difficult or impossible to achieve with the current design?

Application-managed indices for starters. I would consider that to be essential 
when designing indexed key/value stores, and I would consider that to be the 
contribution made by almost every other indexed key/value store to date. If we 
have to use IDB the way FriendFeed used MySQL to achieve application-managed 
indices then I would argue that the API is in fact "fundamentally broken" and 
we would be better off with an embedding of SQLite by Mozilla.

Regarding "the difficulty of speccing something is typically exponential to the 
size of the API", if people want to build a Rube Goldberg device then they must 
deal with the spec issues of that.

If we were provided with the primitives for an indexed key/value store with 
application-managed indices (as Nikunj suggested at the time), we would have 
been well out of the starting blocks by now, and issues such as "computed 
indexes", "indexing array values" etc. would have been non-issues.

Summary:

1. There's a problem.
2. It can still be fixed with a minimum of fuss.
3. This requires an adjustment to the putObject and deleteObject interfaces 
(see previous threads).


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 9:53 AM, Jonas Sicking wrote:

> I previously have asked for a detailed proposal, but so far you have
> not supplied one but instead keep referring to other unnamed database
> APIs.

I have already provided an adequate interface proposal for putObject and 
deleteObject.

I have already referenced at least Redis and Tokyo Cabinet as examples of 
"stateless" database interfaces, on numerous occasions.

> For example, you've asked for callbacks to
> implement collations, but what do we do if those callbacks don't
> return consistent results?

I have not once asked for callbacks, let alone callbacks to implement 
collations. You have jumped to this conclusion from my previous post, and 
missed the point of it entirely.


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-31 Thread Joran Greef
On 31 Mar 2011, at 1:01 AM, Jonas Sicking wrote:

> Anyhow, I do think that the idea of passing in index values at the
> same time as an entry is created/modified is an interesting idea. And I
> have said so in the past on this list. It's definitely something we
> should consider for v2.

> Oh, and if we did this, I wouldn't really know how to support things
> like collations. Neither if you did collations using built in sets of
> locales (like in Pablo's recent proposal), nor if you used some sort
> of callback to do collation.
> 
> / Jonas

That's fine. You don't need to figure it out. Just look at how stateless 
databases have done it (or not done it) and do likewise.

I submit to you that there is inadequate understanding of the concerns raised, 
hence the lack of urgency in trying to address them. That there is even a need 
for a "V2" is symptomatic of this.

It may be a good idea to start looking at these things not as "interesting 
ideas" but as essential database concepts.

If someone were trying to build some kind of transactional indexed key value 
store for the web, and they wanted to do a truly great job of it, they would 
certainly want to learn everything they could from databases that have made 
contributions to the field.


Re: [IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-26 Thread Joran Greef
> On 26 Mar 2011, at 10:14 AM, Nikunj Mehta wrote:
> 
> What is the minimum that can be in IDB? I am guessing the following:
> 
> 1. Sorted key-opaque value transactional store
> 2. Lookup of keys by values (or parts thereof)

Yes, this is what we need. In programmer speak: objects (opaque strings), sets 
(hash indexes), sorted sets (range indexes).

> I know of no efficient way of doing callbacks with JS. Moreover, avoiding 
> indices completely seems to miss the point.

Callbacks are unnecessary. This is what you would want to do as a developer 
using the current form of IDB:

objectStore.putObject(
  { name: "Joran", emails: ["jo...@gmail.com", "jo...@ronomon.com"] },
  {
    id: "arbitraryObjectIdProvidedByTheApplication",
    indexes: ["emails=jo...@gmail.com", "emails=jo...@ronomon.com", "name=Joran"]
  }
);

IDB would then store the user object using the id provided by the application, 
and make sure it's referenced by this id in the "emails=jo...@gmail.com", 
"emails=jo...@ronomon.com", "name=Joran" index references provided (creating 
these indexes along the way if need be). The application is responsible for 
passing in the extra "id" and "indexes" options to putObject.
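
The behaviour described above can be modelled in a few lines. This is a minimal in-memory sketch, not part of the IndexedDB specification: the OpaqueStore name, the query method, the deleteObject signature and the sample data are assumptions for illustration, and the value is stored as an opaque string.

```javascript
// Hypothetical in-memory model of the proposed putObject/deleteObject options.
function OpaqueStore() {
  this.objects = new Map(); // id -> opaque string value
  this.indexes = new Map(); // index reference -> Set of ids
}

// Store the value under the application's id and add the id to each index
// reference supplied by the application, creating indexes on demand.
OpaqueStore.prototype.putObject = function (value, options) {
  this.objects.set(options.id, value);
  for (const reference of options.indexes) {
    if (!this.indexes.has(reference)) this.indexes.set(reference, new Set());
    this.indexes.get(reference).add(options.id);
  }
};

// Remove the object and the memberships the application names, dropping any
// index that becomes empty. Memberships not named here are left in place.
OpaqueStore.prototype.deleteObject = function (options) {
  this.objects.delete(options.id);
  for (const reference of options.indexes) {
    const members = this.indexes.get(reference);
    if (!members) continue;
    members.delete(options.id);
    if (members.size === 0) this.indexes.delete(reference);
  }
};

// Return the opaque values of every object referenced by an index.
OpaqueStore.prototype.query = function (reference) {
  const members = this.indexes.get(reference) || new Set();
  return Array.from(members, (id) => this.objects.get(id));
};

const store = new OpaqueStore();
const indexes = ["name=Ada", "emails=ada@example.com"]; // hypothetical data
store.putObject('{"name":"Ada"}', { id: "user-1", indexes: indexes });
console.log(store.query("name=Ada")); // [ '{"name":"Ada"}' ]
store.deleteObject({ id: "user-1", indexes: indexes });
console.log(store.query("name=Ada")); // []
```

Nothing in the store inspects the value, so there is no schema to declare in advance and nothing to migrate when the application changes how it indexes.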

Supporting range indexes would be a question of expanding the above to let the 
developer pass in a sort score along with the index reference.
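
A sort score could work along these lines. The sketch below is an assumption about how such a range index might behave, loosely in the manner of a sorted set; the SortedIndex name and its methods are hypothetical.

```javascript
// Hypothetical sorted index: members are ordered by an application-supplied score.
function SortedIndex() {
  this.entries = []; // [{ score: number, id: string }], ascending by score, then id
}

SortedIndex.prototype.add = function (id, score) {
  this.remove(id); // an id appears at most once
  let low = 0, high = this.entries.length;
  while (low < high) { // binary search for the insertion point
    const middle = (low + high) >>> 1;
    const entry = this.entries[middle];
    if (entry.score < score || (entry.score === score && entry.id < id)) low = middle + 1;
    else high = middle;
  }
  this.entries.splice(low, 0, { score: score, id: id });
};

SortedIndex.prototype.remove = function (id) {
  const position = this.entries.findIndex(function (entry) { return entry.id === id; });
  if (position !== -1) this.entries.splice(position, 1);
};

// Ids whose score lies in the inclusive range [min, max], in score order.
SortedIndex.prototype.range = function (min, max) {
  return this.entries
    .filter(function (entry) { return entry.score >= min && entry.score <= max; })
    .map(function (entry) { return entry.id; });
};

const byAge = new SortedIndex(); // hypothetical "age" range index
byAge.add("user-1", 36);
byAge.add("user-2", 29);
byAge.add("user-3", 41);
console.log(byAge.range(30, 45)); // [ 'user-1', 'user-3' ]
```

The score is computed by the application, so the index itself never needs to know which field, or which combination of fields, it came from.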

> Next, originally, I also had floated the idea of application managed indices, 
> but implementors thought of it as cruft.

I can understand how application managed indices would lead to less work on the 
part of the spec committee. There seems to be some perverse human 
characteristic that likes to make easy things difficult. Ships will sail around 
the world but the Flat Earth Society will flourish.

> I, for one, am not enamored by key paths. However, I am also morbidly aware 
> of the perils in JS land when using callback like mechanisms. Certainly, I 
> would like to hear from developers like you how you find IDB if you were to 
> not use any createIndex at all. Or at least that you would like to manage 
> your own indices.

I am begging to be able to manage my indices. I know my data. I do not want to 
use any createIndex to declare indexes in advance of when I may or may not use 
them. What advantage would that give me? I want to create/update indexes only 
when I put or delete objects and I want to have control over which indexes to 
update accordingly. With one small change to the putObject and deleteObject 
interfaces, in the form of the "indexes" option, we can make that possible.

We need these primitives in IDB: opaque strings, sets, sorted sets. Ideally, 
IDB need simply store these things and provide the standard interfaces (see 
Redis) to them along with a transactional mechanism. That's the perfect 
low-level API on which to build almost any database wrapper.


[IndexedDB] Design Flaws: Not Stateless, Not Treating Objects As Opaque

2011-03-20 Thread Joran Greef

> On 20 Mar 2011, at 4:54 AM, Jonas Sicking wrote:
> 
> I don't understand what you are saying about application state though,
> so please do start that as a separate thread.

At present, there's no way for an application to tell IDB what indexes to 
modify w.r.t. an object at the exact moment when putting or deleting that 
object. That's because this behavior is defined in advance using "createIndex" 
in a "setVersion" transaction. And then how IDB extracts the referenced value 
from the object is done using an IDB idea of "key paths". But right there, in 
defining the indexes in advance (and not when the index is actually modified, 
which is when the object itself is modified), you've captured application state 
(data relationships that should be known only to the application) within IDB. 
Because this is done in advance (because IDB seems to have inherited this 
assumption that this is just the way MySQL happens to do it), there's a 
disconnect between when the index is defined and when it's actually used. And 
because of "key paths" you now need to spec out all kinds of things like how to 
handle compound keys, multiple values. It's becoming a bit of a spec-fest.
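
To make the coupling concrete, a minimal sketch of what maintaining an index by key path implies: the store has to look inside every stored value to find what to index. The function names and the dotted-path handling are assumptions for illustration, not the algorithm from the IDB draft, and array-valued or compound keys are deliberately left out.

```javascript
// Hypothetical key-path evaluation: walk a dotted path through a parsed value.
function evaluateKeyPath(value, keyPath) {
  let current = value;
  for (const step of keyPath.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = current[step];
  }
  return current;
}

// Building an index over already-stored values means parsing every one of them.
function buildIndex(serializedValues, keyPath) {
  const index = new Map(); // extracted key -> positions of matching values
  serializedValues.forEach(function (serialized, position) {
    const key = evaluateKeyPath(JSON.parse(serialized), keyPath);
    if (key === undefined) return; // no value at this path, nothing to index
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

const stored = [ // hypothetical stored values
  '{"name":"Ada","address":{"city":"Cape Town"}}',
  '{"name":"Bob"}'
];
console.log(buildIndex(stored, "address.city")); // Map(1) { 'Cape Town' => [ 0 ] }
```

Every rule about what the path may point at (arrays, missing properties, compound keys) has to be specified by the store, because the store is the one doing the extraction.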

That this bubble of state gets captured in IDB, it also means that IDB now 
needs to provide ways of updating that captured state within IDB when it 
changes in the application (which will happen, so essentially you now have your 
indexing logic stuck in the database AND in the application and the application 
developer now has to try and keep BOTH in sync using this awkward pre-defined 
indexes interface), thus the need for a setVersion transaction in the first 
place. None of this would be necessary if the application could reference 
indexes to be modified (and created if they don't exist, or deleted if they 
would then become empty) AT THE POINT of putting or deleting an object. Things 
like data migrations would also be better served if this were possible since 
this is something the application would need to manage anyway. Do you follow?

The application is the right place to be handling indexing logic. IDB just 
needs to provide an interface to the indexing implementation, but not handle 
extracting values from objects or deciding which indexes to modify. That's the 
domain of the application. It's a question of encapsulation. IDB is crossing 
the boundaries by demanding to know ABOUT the data stored, and not just 
providing a simple way to put an object, and a simple way to put a reference to 
an object to an index, and a simple way to query an index and intersect or 
union an index with another. Essentially an object and its index memberships 
need to be completely opaque to IDB and you are doing the opposite. Take a look 
at the BDB interface. Do you see a setVersion or createIndex semantic in there? 
Take a look at Redis and Tokyo and many other things. Do you see a setVersion 
or createIndex semantic in there? Do these databases have any idea about the 
contents of objects? Any concept of key paths? No, and that's the whole reason 
these databases were created in the first place. I'm sure you have read the BDB 
papers. Obviously this is not the approach of MySQL. But if IDB is trying to be 
MySQL but saying it wants to be BDB then I don't know. In any event, Firefox 
would be brave to also embed SQLite. Let the better API win.

How much simpler could it be? At the end of the day, it's all objects and sets 
and sorted sets, and see Redis' epiphany on this point. IDB just needs to 
provide transactional access to these sets. The application must decide what 
goes in and out of these sets, and must be able to do it when it wants to, not 
some time in advance. I bring this up because I once wrote the exact same kind 
of database that you are writing now (where one thinks it would be good if the 
database did NOT treat objects as opaque... that the database should be smart 
about the contents of objects and share control for how objects relate to each 
other etc.) and I have since seen how much better, simpler, faster the 
alternative is. So unless you have formidable reasons for maintaining the 
status quo in light of the above, even if you don't understand this concept of 
application state getting stuck in IDB, and even though you advocate that 
WebSQL is not deprecated and that we can consider LocalStorage to be an 
alternative, then it is my hope that you will heed this and make something of 
it. I'm sorry if this is not the kind of feedback you want to hear at this 
stage, but IDB needs to be good for more than just HTML 5 todo list demos.


Re: [IndexedDB] Compound and multiple keys

2011-03-18 Thread Joran Greef
> On 16 Mar 2011, at 7:59 PM, Jonas Sicking wrote:
> 
> The best way to do this is likely to start a new thread (as the changes you
> are suggesting isn't limited to "Compound and multiple keys"), and put a
> draft proposal there.
> 
> It by no means has to be perfect (it took us a long time to polish IDB
> into what it is today), but it needs to be more detailed than what you
> are saying above.
> 

More thoughts:

Firstly, my proposal for handling compound and multiple keys has already been 
put forward in a previous thread (i.e. adding the option to specify indexes to 
be modified when putting/deleting objects) so I see no need to create yet 
another thread.

Secondly, in terms of IDB storing parts of application state, it is clear that 
this is a problem that needs to be addressed. I think you have said as much 
yourself? If so, then those drafting the IDB specification must take 
responsibility for fixing this, since it is an issue they created in the first 
place. Unless, of course they do not really believe it to be an issue, in which 
case it would be a filibuster to ask for a "draft proposal".


Re: [IndexedDB] Compound and multiple keys

2011-03-16 Thread Joran Greef
> On 16 Mar 2011, at 7:59 PM, Jonas Sicking wrote:
> 
> It seems like you are suggesting pretty big changes. The best way to
> do this is likely to start a new thread (as the changes you are
> suggesting isn't limited to "Compound and multiple keys"), and put a
> draft proposal there.

Not necessarily. Adding the option to specify indexes to be modified when 
putting or deleting an object would go a long way already, solving the problem 
of compound and multiple keys in the process.

The next step after that, supporting compose-able set operations on indexes, 
would take some work, in terms of figuring out the best interface for doing it, 
hopefully keeping it fairly tightly coupled to the standard set operations 
themselves.

> It by no means has to be perfect (it took us a long time to polish IDB
> into what it is today), but it needs to be more detailed than what you
> are saying above.

Will do. The proposed changes have the potential to reduce the spec and 
implementation of IDB. The problem of IDB being exposed to a dose of 
application state certainly needs to be addressed.

> Also, I should mention that time is running out on major changes. We
> already have two database APIs, WebSQL and IDB, (three if you count
> localStorage), so there both needs to be significant advantages over
> the already existing APIs, and you would make yourself a favor by
> acting fast as the other specifications are gaining momentum literally
> by the day.
> 
> / Jonas

Do you really consider LocalStorage to be a database and what do you mean by 
database then? And how can you say that we "have" a database API in WebSQL if 
it is currently deprecated? Are there plans afoot to embed SQLite in Firefox? 
That would be a great idea by the way.

As far as I am aware, LocalStorage cannot be used as a database. I have tried. 
Most browsers do not permit more than 10 MB and do not provide a means for the 
user to adjust storage quota. Browsers provide no locking mechanism (although 
you could simulate a lock service on top of LocalStorage if you could tolerate 
the latency) and some implementations (Safari as far as I can recall) load the 
entire contents of LocalStorage into memory on first access, blocking the UI. 
As you know, WebSQL is deprecated and only available in WebKit and Opera. 
Chrome as far as I am aware provides no mechanism to adjust WebSQL quota limits.

So that means we actually only have one potential cross-browser database API 
(and not three as you have stated), and that is IDB. It may be a good idea to 
slow down and get it right.




Re: [IndexedDB] Compound and multiple keys

2011-03-16 Thread Joran Greef
> On 3/9/2011 09:45:51 Shawn Wilsher wrote:
> 
> That makes sense since the original proposal was heavily based on BDB. 
> It's shifted a bit as we have made tweaks to improve it for the web.
> 
> Cheers
> 
> Shawn

I agree. If I may add my two cents worth: one thing that IDB has not yet 
learned from BDB is statelessness. At the moment IDB requires a bit of 
application state to be mixed up in IDB (i.e. by predefining indexes as opposed 
to allowing the application to specify indexes to be modified when putting or 
deleting objects). So it's not a pure data+indexes store, it's actually a 
data+indexes+application state store. This is making IDB more complex than it 
needs to be and is making the IDB interface less powerful (things like compound 
keys etc. would already be possible if IDB were stateless). For instance, if 
IDB is to store application state, then the spec needs to define what happens 
when the application state changes. If IDB were stateless, this would not be 
necessary. After the web having had no options for offline storage for so many 
years, it is probably safe to say that web applications do not need help with 
things like migrations, pre-defined schemas or anything fancy or "helpful" like 
that, they just need a pure data+indexes solution (but they need this to be 
comprehensive: at least set operations supported on indexes, and indexes 
defined by the application when putting or deleting objects and NOT before). In 
my honest opinion, IDB is not yet there and from the discussions does not seem 
to be headed in that direction. It's trying to make unnecessary things easy 
when it really needs to be just a powerful low-level data store with 
first-class indexing. I'm not sure how many users of IDB are actively involved 
in this discussion, but after spending hours on it over the past few months, 
and having built databases over LocalStorage and WebSQL, as a real-world user, 
may I ask that these concerns begin to be addressed?


Re: [IndexedDB] Two Real World Use-Cases

2011-03-07 Thread Joran Greef
On 08 Mar 2011, at 7:23 AM, Dean Landolt wrote:

> This doesn't seem right. Assuming your WebSQL implementation had all the same 
> indexes isn't it doing pretty much the same things as using separate 
> objectStores in IDB? Why would it be an order of magnitude slower? I'm sure 
> whatever implementation you're using hasn't seen much optimization but you 
> seem to be implying there's something more fundamental? The only thing I can 
> think of to blame would be the fat in the objectStore interface -- like, for 
> instance, the index building facilities. It seems to me your proposed 
> solution is to add yet more fat to the interface (more complex indexing), but 
> wouldn't it be just as suitable to instead strip down objectStores to their 
> bare essentials to make them more suitable to act as indexes? Then the 
> indexing functionality and all the hard decisions could be punted to 
> libraries where they'd be free to innovate.

Exactly. It's not what one would expect, and an indication of the poor state of 
the IDB implementation (which is essentially a wrapper around SQLite anyway).

If someone is advising that object stores be used to handle indexes then may I 
be the first to raise a red flag and say that IDB is failing us (and it would 
have been better for the spec team to provide a locking mechanism for 
LocalStorage so it could be used in that way). The whole point of IDB as far as 
I can see is to provide transactional indexed access to a key value store.

> Why? You wouldn't necessarily have to store the whole object in each index, 
> just the index key, a value and some pointer to the original source object. 
> Something to resolve this pointer to the source would need to be spec'd (a la 
> couchdb's include_docs), but that's simple. Even better, say it were possible 
> to define a link relation on an object store that can resolve to its source 
> object -- you could define a source link relation and the property to use -- 
> and this would have the added bonus of being more broadly applicable than 
> just linking an index record to its source instance.

Think of the object creation and JSON serialization/deserialization overhead 
for putting 50 indexes and you have got more than enough waste there already.

> We can fix all of this right now very simply:
> 
> 1. Enable objectStore.put and objectStore.delete to accept a setIndexes 
> option and an unsetIndexes option. The value passed for either option would 
> be an array (string list) of index references.
> 
> This would only work for indexes arrays of strings, right? Things can get 
> much more complicated than that, and when they do you'd have to use an 
> objectStore to do your indexing anyway, right?

No it would work for pretty much anything. The application would be free to 
determine the indexes, and also to convert query parameters into indexes when 
querying. It's essentially "computed indexes" without the hassles of IDB trying 
to do it (there was an interesting thread last year on the challenges of 
storing an index-computing function in IDB).

> Why is it more theoretically performant than using objectStores in the raw?

It's a more direct interface. Think about it for a second. Using objectStores 
in the raw costs one function call per index record, O(n) calls for n indexes, 
to give just one reason. If IDB can receive a list of indexes to add and remove an 
object to and from, then it can also do things like perform a set difference 
first to save unnecessary IO. I have written a database or two with this 
technique and it's certainly faster.
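
A minimal sketch of the set-difference step, assuming index references are plain strings. The names `planIndexChanges` and `IndexPlan` and the sample index names are hypothetical and not part of any IDB interface; the point is only that the store can skip indexes the object already belongs to.

```typescript
// Hypothetical helper: none of these names come from the IndexedDB spec.
interface IndexPlan {
  toAdd: string[];
  toRemove: string[];
}

// Compare the index references an object currently has with the ones the
// application now wants, and return only the changes that must be written.
function planIndexChanges(current: string[], wanted: string[]): IndexPlan {
  const currentSet = new Set(current);
  const wantedSet = new Set(wanted);
  return {
    // Wanted but not yet present: these need an index write.
    toAdd: Array.from(wantedSet).filter((ref) => !currentSet.has(ref)),
    // Present but no longer wanted: these need an index delete.
    toRemove: Array.from(currentSet).filter((ref) => !wantedSet.has(ref)),
  };
}

const plan = planIndexChanges(
  ["folder:inbox", "unread", "from:alice"],
  ["folder:archive", "from:alice"],
);
// "from:alice" appears in both lists, so it causes no IO at all.
console.log(plan.toAdd, plan.toRemove);
```

With one call carrying both lists, the unchanged references drop out before any index is touched.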

> I don't necessarily understand the stateful vs. stateless distinction here. I 
> don't see how your proposed solution removes the requirement for IDB to 
> enforce constraints when certain indexes are present. Developers would 
> already be able to use IDB statefully (with predefined schemas) -- they'd 
> just use a library that has a schema mechanism. I doubt such a library for 
> IDB already exists, but it'd be quite easy to port perstore, for instance, 
> which is derived from the IDB API and already has this functionality using 
> json-schema. There will no doubt be many ORM-like libraries that will pop up 
> as soon as IDB starts to stabilize (or as soon as it gets a node.js 
> implementation).

The trouble is you always think a database would "be quite easy" until you 
actually try to do it yourself. At first when I dug into IDB I didn't think 
there would be any problems that could not be handled in some way. I have 
actually switched back to WebSQL now and will encourage my users to use Safari 
or Chrome as long as these browsers support WebSQL (and I hope Chrome will at 
least finish up by adding a quota interface for WebSQL). IDB right now is like 
a completely neutered slower SQLite without any of the benefits to be expected 
of a transactional indexed KV store. It's really sad.

For examples of stateless databases see the interfaces for Redis (the best 
example, and a perfect target

Re: [IndexedDB] Two Real World Use-Cases

2011-03-06 Thread Joran Greef
> On 05 Mar 2011, at 3:50 AM, Jonas Sicking wrote:
> 
> What we do need to do sooner rather than later though is allowing
> multiple index values for a given entry using arrays. We also need to
> add support for compound keys. But lets deal with those issues in a
> separate thread.

Multiple index values for a given entry using arrays, as well as compound keys, 
can be handled by letting the application provide an array of index references 
when putting or deleting objects. There is no need to make a Rube Goldberg 
device out of it.
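
A rough illustration of how an application might derive such an array of index references, assuming references are plain strings. The `Message` shape, the `indexRefs` function and the naming scheme (`label:` and `folder+year:` prefixes) are all made up for this sketch.

```typescript
// Hypothetical object shape, used only for illustration.
interface Message {
  id: string;
  folder: string;
  labels: string[];
  year: number;
}

// Build the index references the application would pass along with a put.
function indexRefs(message: Message): string[] {
  const refs: string[] = [];
  // One reference per array element covers multi-valued properties.
  message.labels.forEach((label) => refs.push(`label:${label}`));
  // Joining two properties into one reference stands in for a compound key.
  refs.push(`folder+year:${message.folder}|${message.year}`);
  return refs;
}

const refs = indexRefs({
  id: "m1",
  folder: "inbox",
  labels: ["work", "urgent"],
  year: 2011,
});
console.log(refs);
// ["label:work", "label:urgent", "folder+year:inbox|2011"]
```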

Regards

Joran Greef


Re: [IndexedDB] Two Real World Use-Cases

2011-03-03 Thread Joran Greef
Hi Jonas

I have been trying out your suggestion of using a separate object store to do 
manual indexing (and so support compound indexes or index object properties 
with arrays as values).

There are some problems with this approach:

1. It's far too slow. To put an object and insert 50 index records (typical 
when updating an inverted index) this way takes 100ms using IDB versus 10ms 
using WebSQL (with a separate indexes table and compound primary key on index 
name and object key). For instance, my application has a real requirement to 
replicate 4,000,000 emails between client and server and I would not be 
prepared to accept latencies of 100ms to store each object. That's more than 
the network latency.

2. It's a waste of space.

Using a separate object store to do manual indexing may work in theory but it 
does not work in practice. I do not think it can even be remotely suggested as 
a panacea, however temporary it may be.
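
To put the latency figures from point 1 in perspective, here is the arithmetic as a small sketch. It assumes objects are stored strictly one after another at a constant per-object latency, which is a simplification; `totalHours` is a hypothetical helper.

```typescript
// Total wall-clock hours to store `objectCount` objects sequentially,
// given a fixed per-object latency in milliseconds.
function totalHours(objectCount: number, latencyMs: number): number {
  const totalSeconds = (objectCount * latencyMs) / 1000;
  return totalSeconds / 3600;
}

const idbHours = totalHours(4000000, 100); // about 111.1 hours (roughly 4.6 days)
const webSqlHours = totalHours(4000000, 10); // about 11.1 hours
console.log(idbHours.toFixed(1), webSqlHours.toFixed(1));
```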

We can fix all of this right now very simply:

1. Enable objectStore.put and objectStore.delete to accept a setIndexes option 
and an unsetIndexes option. The value passed for either option would be an 
array (string list) of index references.

2. The object would first be removed as a member from any indexes referenced by 
the unsetIndexes option. Any referenced indexes which would be empty thereafter 
would be removed.

3. The object would then be added as a member to any indexes referenced by the 
setIndexes option. Any referenced indexes which do not yet exist would be 
created.

This would provide the much-needed indexing capabilities presently lacking in 
IDB without sacrificing performance.
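
The three steps above can be sketched with an in-memory model. This is not the IndexedDB API: `SketchStore`, `IndexOptions`, `keysIn` and `hasIndex` are hypothetical names, and a real implementation would do this inside a transaction on disk.

```typescript
// Hypothetical option bag mirroring the proposed setIndexes / unsetIndexes.
interface IndexOptions {
  setIndexes?: string[];
  unsetIndexes?: string[];
}

class SketchStore {
  private objects = new Map<string, unknown>();
  private indexes = new Map<string, Set<string>>(); // index name -> object keys

  put(key: string, value: unknown, options: IndexOptions = {}): void {
    this.objects.set(key, value);
    this.updateIndexes(key, options);
  }

  delete(key: string, options: IndexOptions = {}): void {
    this.objects.delete(key);
    this.updateIndexes(key, options);
  }

  get(key: string): unknown {
    return this.objects.get(key);
  }

  private updateIndexes(key: string, options: IndexOptions): void {
    // Step 2: remove membership first and drop indexes that become empty.
    (options.unsetIndexes || []).forEach((name) => {
      const members = this.indexes.get(name);
      if (!members) return;
      members.delete(key);
      if (members.size === 0) this.indexes.delete(name);
    });
    // Step 3: add membership, creating indexes that do not yet exist.
    (options.setIndexes || []).forEach((name) => {
      let members = this.indexes.get(name);
      if (!members) {
        members = new Set<string>();
        this.indexes.set(name, members);
      }
      members.add(key);
    });
  }

  keysIn(indexName: string): string[] {
    const members = this.indexes.get(indexName);
    return members ? Array.from(members).sort() : [];
  }

  hasIndex(indexName: string): boolean {
    return this.indexes.has(indexName);
  }
}

const store = new SketchStore();
store.put("msg1", { subject: "Hello" }, { setIndexes: ["folder:inbox", "unread"] });
// Marking the message as read moves it between indexes in one call.
store.put("msg1", { subject: "Hello" }, { unsetIndexes: ["unread"], setIndexes: ["read"] });
console.log(store.keysIn("read"), store.hasIndex("unread"));
```

No index is declared in advance here; indexes exist only while some object is a member.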

It would also enable developers to use IDB statefully (MySQL-like pre-defined 
schemas with the DB taking on the complexities of schema migration and data 
migration) or statelessly (See Berkeley DB with the application responsible for 
the complexities of data maintenance) rather than enforcing an assumption at 
such an early stage.

Regards

Joran Greef


Re: [IndexedDB] Two Real World Use-Cases

2011-03-02 Thread Joran Greef
On 02 Mar 2011, at 1:31 PM, Jonas Sicking wrote:

> I agree that we are currently enforcing a bit of schema due to the way
> indexes work. However I think it's a good approach for an initial
> version of this API as it covers the most simple use cases. Note that
> the more complex use cases are still very possible by simply using a
> separate objectStore as an index and manually add/remove things there.
> 
> I still believe that using a function, which is persisted in the
> database, is very doable. And yes, the function needs to be stateless
> and it needs to be possible to change the set of functions which
> manage the set of indexes associated with a given objectStore
> (probably by simply allowing indexes to be created and removed, which
> is already the case).
> 
> / Jonas

Thank you Jonas, I'm using your multi objectStore trick at the moment to store 
indexes.

It just seems that the most direct way of doing all of this, would just be to 
let the application pass in the relevant index references when it makes put or 
delete calls. IDB is almost becoming a Rube Goldberg device trying to find 
other ways of doing this.

The reason I bring it up, is because I just made this same change with my 
server database, which used to require schema knowledge, so it could compute 
indexes etc., and then I realized this could all be eliminated completely by 
just passing indexes per put and delete call.

I really don't think IDB should try and dip its toes into application state in 
the first place, let alone try and keep up with application state thereafter. 
What is the motivation for doing that? It's not absolutely necessary. It's an 
assumption that is bloating almost every part of the spec. It's not the killer 
feature of IDB, and it's getting in the way of things that could be, such as 
indexing and querying. If version 1 is done right, there will be no need for 
version 2. There's been a tremendous amount of discussion regarding IDB and 
people like yourself and Jeremy have certainly contributed massively, but I do 
get the feeling (as may you) that version 2 is becoming a stopover for things 
that have not been thought through completely, for which a solution is not yet 
clear. Something's not right. I only say this from recently re-writing a 
database after making the same mistake.




Re: [IndexedDB] Two Real World Use-Cases

2011-03-01 Thread Joran Greef
On 01 Mar 2011, at 7:27 PM, Jeremy Orlow wrote:

> 1. Be able to put an object and pass an array of index names which must 
> reference the object. This may remove the need for a complicated indexing 
> spec (perhaps the reason why this issue has been pushed into the future) and 
> give developers all the flexibility they need.
> 
> You're talking about having multiple entries in a single index that point 
> towards the same primary key?  If so, then I strongly agree, and I think 
> others agree as well.  It's mostly a question of syntax.  A while ago we 
> brainstormed a couple possibilities.  I'll try to send out a proposal this 
> week.  I think this + compound keys should probably be our last v1 features 
> though.  (Though they almost certainly won't make Chrome 11 or Firefox 4, 
> unfortunately, hopefully they'll be done in the next version of each, and 
> hopefully that release with be fairly soon after for both.)

Yes, for example this user object { name: "Joran Greef", emails: 
["jo...@ronomon.com", "jorangr...@gmail.com"] } with indexes on the "emails" 
property, would be found in the "jo...@ronomon.com" index as well as in the 
"jorangr...@gmail.com" index.

What I've been thinking though is that the problem even with formally 
specifying indexes in advance of object put calls, is that this pushes too much 
application model logic into the database layer, making the database enforce a 
schema (at least in terms of indexes). Of course IDB facilitates migrations in 
the form of setVersion, but most schema migrations are also coupled with 
changes to the data itself, and this would still have to be done by the 
application in any event. So at the moment IDB takes too much responsibility on 
behalf of the application (computing indexes, pre-defined indexes, pseudo 
migrations) and not enough responsibility for pure database operations (index 
intersections and index unions).

I would argue that things like migrations and schemas are best handled by the 
application, even if this is more work for the application, as most people will 
write wrappers for IDB in any event and IDB is supposed to be a core-level API. 
The acid test must be that the database is oblivious to schemas or anything 
pre-defined or application-specific (i.e. stateless). Otherwise IDB risks being 
a database for newbies who wouldn't use it, and a database that others would 
treat as a KV anyway (see MySQL at FriendFeed).

A suggested interface then for putting or deleting objects, would be: 
objectStore.put(object, ["indexname1", "indexname2", "indexname3"]) and then 
IDB would need to ensure that the object would be referenced by the given index 
names. When removing the object, the application would need to provide the 
indexes again (or IDB could keep track of the indexes associated with an 
object).
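
A sketch of the second variant, where the store keeps track of the indexes associated with each object so that a delete needs no index list. This is an in-memory model under assumed names (`TrackingStore`, `idsIn`), with the object id passed in explicitly rather than derived from a key path.

```typescript
class TrackingStore {
  private objects = new Map<string, unknown>();
  private indexes = new Map<string, Set<string>>(); // index name -> object ids
  private memberships = new Map<string, string[]>(); // object id -> index names

  // Store the object and make exactly the given indexes reference it.
  put(id: string, object: unknown, indexNames: string[]): void {
    this.unlink(id); // forget memberships from any earlier put
    this.objects.set(id, object);
    this.memberships.set(id, indexNames.slice());
    indexNames.forEach((name) => {
      const members = this.indexes.get(name) || new Set<string>();
      members.add(id);
      this.indexes.set(name, members);
    });
  }

  // No index list needed: the store remembers where the object is referenced.
  delete(id: string): void {
    this.unlink(id);
    this.objects.delete(id);
  }

  idsIn(indexName: string): string[] {
    const members = this.indexes.get(indexName);
    return members ? Array.from(members).sort() : [];
  }

  private unlink(id: string): void {
    (this.memberships.get(id) || []).forEach((name) => {
      const members = this.indexes.get(name);
      if (!members) return;
      members.delete(id);
      if (members.size === 0) this.indexes.delete(name);
    });
    this.memberships.delete(id);
  }
}

const tracked = new TrackingStore();
tracked.put("u1", { name: "Example" }, ["indexname1", "indexname2"]);
tracked.put("u1", { name: "Example" }, ["indexname2", "indexname3"]);
console.log(tracked.idsIn("indexname1"), tracked.idsIn("indexname3"));
tracked.delete("u1");
```

The trade-off is one extra record per object (its membership list) in exchange for a simpler delete call.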

Using a function to compute indexes would not work as this would entrap 
application-specific schema knowledge within the function (which would need to 
be persisted) and these may subsequently change in the application, which would 
then need a way to modify the function again. The key is that these things must 
be stateless.

The objects must be opaque to IDB (no need for serialization/deserialization 
overhead at the DB layer). Things like key-paths etc. could be removed and the 
object id just passed in to put or delete calls.

> 2. Be able to intersect and union indexes. This covers a tremendous amount of 
> ground in terms of authorization and filtering.
> 
> Our plan was to punt some sort of join language to v2.  Could you give a more 
> concrete proposal for what we'd add?  It'd make it easier to see if it's 
> something realistic for v1 or not.

If you can perform intersect or union operations (and combinations of these) on 
indexes (which are essentially sets or sorted sets), then this would be the 
join language. It has the benefit that the interface would then be described in 
terms of operations on data structures (set operations on sets) rather than a 
custom language which would take longer to spec out.

I've written databases over append-only files, S3, WebSQL and even LocalStorage 
(!) and from what I've found with my own applications, you could handle 
everything from multi-tenant authorization to adequate filtering with the 
following operations:

1. intersect([ index1, index2 ])
2. union([ index1, index2 ])
3. intersect([ union([ index1, index2 ]), index3, index4, index5, index6, 
index7 ])

Hopefully, a join language described in terms of pure set operations would be 
much simpler to implement and easier to use and reason with.
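
As an illustration of how composable these operations are, here is a minimal sketch that treats each index as a set of object keys. The type alias `Index`, the sample indexes and their contents are assumptions for the example; a real store would work on sorted on-disk structures rather than in-memory sets.

```typescript
type Index = Set<string>;

// Keys that appear in at least one of the given indexes.
function union(indexes: Index[]): Index {
  const result = new Set<string>();
  indexes.forEach((index) => index.forEach((key) => result.add(key)));
  return result;
}

// Keys that appear in every one of the given indexes.
function intersect(indexes: Index[]): Index {
  if (indexes.length === 0) return new Set<string>();
  // Fold from the smallest index so intermediate results stay small.
  const folded = indexes
    .slice()
    .sort((a, b) => a.size - b.size)
    .reduce((acc, index) => new Set(Array.from(acc).filter((key) => index.has(key))));
  return new Set(Array.from(folded)); // always hand back a fresh set
}

// Hypothetical indexes: everything a tenant owns, plus two folders.
const tenant: Index = new Set(["m1", "m2", "m3"]);
const inbox: Index = new Set(["m1", "m4"]);
const starred: Index = new Set(["m2", "m5"]);

// Same shape as operation 3 above: intersect([ union([...]), ... ]).
const visible = intersect([union([inbox, starred]), tenant]);
console.log(Array.from(visible).sort()); // ["m1", "m2"]
```

Because both functions take and return the same type, any nesting of the two is itself a valid query.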

In fact I think if IDB offered only a single object store and an indexing 
system described above, it would be completely perfect. That's all that's 
needed. No need for a V2. Just a focus on high-performance thereafter.




[IndexedDB] Two Real World Use-Cases

2011-03-01 Thread Joran Greef
I have been following the development behind IndexedDB with interest. Thank you 
all for your efforts.

I understand that the initial version of IndexedDB will not support indexing 
array values.

May I suggest an alternative derived from my home-brew server database evolved 
from experience using MySQL, WebSQL, LocalStorage, CouchDB, Tokyo Cabinet and 
Redis?

1. Be able to put an object and pass an array of index names which must 
reference the object. This may remove the need for a complicated indexing spec 
(perhaps the reason why this issue has been pushed into the future) and give 
developers all the flexibility they need.

2. Be able to intersect and union indexes. This covers a tremendous amount of 
ground in terms of authorization and filtering.

These two needs are critical.

Without them, I will either carry on using WebSQL for as long as possible, or 
be forced to use IndexedDB as a simple key value store and layer my own 
indexing on top.

I am writing an email application and have to deal with secondary indexes of up 
to 4,000,000 keys. It would not be ideal to do intersects and unions on these 
indexes in the application layer.

Regards

Joran Greef


FileSystem API: Avoiding Upload Forms And Temporary Downloads

2010-11-29 Thread Joran Greef
I have some questions regarding the FileSystem API: 

1. It would be great to be able to let the user choose where they want their 
sandboxed directory located for the web app, i.e. on the desktop for quick 
access. That way they can drag files directly to the directory, which could be 
used as a dropbox for synching to a server. Would this be possible (or at least 
a mechanism to link to the directory wherever the browser may choose to place 
it)? Otherwise, apps like Dropbox would not be possible in a browser.

2. It seems that dragging a file out of a web app is currently copy-on-write? 
So you drag a file out the web app into Excel but subsequent changes in Excel 
would be lost to the web app (it seems like it's already possible for the web 
app to poll the sandboxed directory for file changes)? If so, it means that the 
FileSystem API would force the following work flow: the user saves a temp file 
somewhere (probably on the desktop) then re-uploads using a web form, and then 
minimizes the browser and deletes the temp file, and then maximizes the browser 
again? That would be a bad case of Fitts' Law and quickly become a show-stopper 
if the user needs to frequently edit files using native applications.

3. It must be possible to link to files within the sandboxed directory and have 
them open in the default native application. I can understand that .exe's need 
to be neutered, but content files such as .doc and .xls must have a method for 
opening in the default application. Would this be possible? Otherwise the only 
solution would be to trigger the Download window, creating a temp file in the 
Downloads folder for a file that already exists on the filesystem?

4. In my mind, the FileSystem API has a shot at improving user experience by 
helping to avoid file upload forms, and temp file downloads. I'm not sure these 
goals are possible with the current spec?

These use-cases may prove to be vital building blocks for the next wave of 
networked applications and it would be great to see them in the new FileSystem 
API.

Regards

Joran Greef



Web Storage Mutex

2009-12-11 Thread Joran Greef
"The use of the storage mutex to avoid race conditions is currently considered 
by certain implementors to be too high a performance burden, to the point where 
allowing data corruption is considered preferable. Alternatives that do not 
require a user-agent-wide per-origin script lock are eagerly sought after."

It's not a question of mutex versus data corruption, but of implementation:

Database storage is served by SQLite. LocalStorage would be better served by 
Tokyo Cabinet: http://1978th.net/tokyocabinet/. I doubt the current 
localStorage implementation is better than the current Tokyo Cabinet 
implementation.

Joran Greef