Hi everyone!

I recently joined the Chrome team at Google and have been tasked with making window.localStorage work within Chromium. I've spent a good deal of time trying to understand how Chromium and WebKit interact and how the current WebKit localStorage implementation works, and thinking about how to make it all work together.

Unfortunately, I'm having a really hard time coming up with an elegant solution to the problem. In WebKit, localStorage works by maintaining a string-to-string map for each origin and periodically syncing it to a SQLite database. This is great when render windows are all in the same memory space, but there's no simple and efficient way to make this work in a multi-process environment. (Note that even 2 windows from the same origin are usually in 2 processes within Chromium.)

If you have any ideas or advice, I'd really appreciate it! Below is a dump of my ideas so far.

Jeremy

=============================

(1) The easiest solution is to re-sync the in-memory version of the data with what's on disk. We'd pull in the data on the first access, push the data to disk when finished, and wrap the whole thing in a transaction to block other processes from accessing localStorage until we're done. This is obviously going to be very slow and defeats the entire purpose of having an in-memory version. Doing events right and avoiding deadlocks may also be difficult.

(2) A similar idea is to ditch the in-memory version and just work 100% from the database. In other words, every localStorage operation would result in one or more SQLite calls. We'd still need to wrap the entire thing in transactions, and I think there might still be some potential issues with events and deadlocks. It's certainly cleaner and more efficient from Chromium's perspective, but it'll be much slower than the current implementation in the single-process case. (I assume this is unacceptable?)

(3) Another idea I had was to add a replayable log to the localStorage database. This would allow each process to efficiently "catch up" with the SQLite localStorage state. One complication is garbage collecting the log: we can't delete log entries until every applicable process has consumed them, but we also can't leave them lying around forever.
We can't just wait until the window closes because of long-running web apps (think Gmail), and we have to handle corner cases like a process crashing.

All three options seem viable until we add events into the equation. After that, a replay system seems necessary, and thus we're left with only the third option I mentioned. (We can't just compare what's in the database with what's in memory to generate events, since a value might have changed multiple times and there's no sense of order to the events occurring.)

It's also worth mentioning that, in order to make the implementation self-contained (i.e. not have hooks that Chromium or other multi-process environments must implement), WebKit would need to monitor the database file for changes. This can be tricky to do in a cross-platform, scalable (i.e. not polling-based) manner.

It's also worth doing a back-of-the-napkin calculation for performance: I don't know the actual number of seeks for a SQLite write, but let's go with a very conservative (i.e. optimistic) guess and say 3. A normal user has a 7200 rpm hard drive, which has an average seek time of 7 ms. Let's say we have n processes all accessing the same origin. So the number of times a process can modify localStorage per second before completely saturating the hard disk with seeks is something like 1000/(3*7*n), or about 50/n. So, if we have 2 windows open, the app could make the browser IO-bound with just 25 updates per second per window. And that's with a very optimistic estimate of the number of seeks, assuming nothing else is touching the disk, and assuming only one process per window. For example, if iframes were in their own process, n would be much greater for some applications. Also, if we're not careful about how "garbage collection" is implemented for replay log entries, even windows doing read-only access or implementing window.onstorage event handlers could slow things down.
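
The napkin estimate above can be written out as a small Python sketch. The function name and its default parameters are purely illustrative; the defaults (3 seeks per write, 7 ms per seek) are the guesses from the paragraph above, not measured values.

```python
def max_updates_per_second(n_processes, seeks_per_write=3, seek_ms=7.0):
    """Rough upper bound on localStorage writes per second, per process,
    before disk seeks alone saturate the drive.

    Assumes every write costs `seeks_per_write` seeks of `seek_ms` each,
    that nothing else touches the disk, and that all n processes write
    at the same rate. These are the assumptions from the estimate above.
    """
    if n_processes < 1:
        raise ValueError("need at least one process")
    ms_per_write = seeks_per_write * seek_ms
    return 1000.0 / (ms_per_write * n_processes)


for n in (1, 2, 4, 8):
    print(f"{n} process(es): ~{max_updates_per_second(n):.1f} updates/sec each")
```

With the default guesses, two processes come out at roughly 24 updates per second each, in line with the "about 25" figure above.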

Unless I'm missing something here, it seems to me that doing interprocess communication via a SQLite database (or any file backing) just isn't practical. So what other options are there?

(4) We do the IPC and sharing of data within a shared memory segment, and just have one process take care of the disk serialization. In order to do this, the mappings and the locks would need to be allocated inside this memory space. We'd also need to have some way for the memory space to be resized (for example, if the user increases its quota), which means we'd need 2 shared memory spaces per origin: a fixed-size chunk for control (at the very least containing a global lock and info about the other chunk) and a variable-size chunk for the map.

There are of course security implications when using shared memory (we're punching holes in our walls). How to handle a WebKit process crashing is also an interesting question; realistically, we need to treat all the localStorage memory as corrupt if that process held the origin's lock when it crashed. One way to handle this is to have the process work on a copy of the data and then flip a bit at the end to change which copy is live. Other options include clearing the localStorage (the spec allows this, but frowns upon it) or reloading the last version serialized from disk (which could cause unpredictable behavior), but neither of these seems very good. Also note that there's no way for this to be 100% self-contained within WebCore, so there'll have to be some external hooks for things like getting a shared memory space.

(5) Since all of the solutions so far (that I've thought of, anyhow) have either been very slow or required hooks into the process that contains WebKit (a Chromium render process in this case), perhaps this points to a real need. Maybe localStorage needs some way to separate a "front end" (WebCore) and a "back end" implementation?
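
A front-end/back-end split of this kind might look roughly like the Python sketch below. Everything in it is hypothetical: the class names, the request format, and the use of JSON as a stand-in for a real IPC channel are illustration only, not WebKit or Chromium code. It also ignores events, locking, and persistence entirely.

```python
import json


class StorageAreaBackend:
    """Hypothetical back end: owns the single key/value map for one origin."""

    def __init__(self):
        self._data = {}

    def handle(self, request):
        op = request["op"]
        if op == "get":
            return {"value": self._data.get(request["key"])}
        if op == "set":
            self._data[request["key"]] = request["value"]
            return {"value": None}
        if op == "remove":
            self._data.pop(request["key"], None)
            return {"value": None}
        raise ValueError(f"unknown op: {op}")


class DirectChannel:
    """Single-process case: the front end calls the back end directly."""

    def __init__(self, backend):
        self._backend = backend

    def send(self, request):
        return self._backend.handle(request)


class SerializingChannel:
    """Stand-in for the multi-process case: requests and replies are
    round-tripped through JSON, as they would be through bytes on a real
    IPC channel. No actual process boundary is crossed here."""

    def __init__(self, backend):
        self._backend = backend

    def send(self, request):
        wire_request = json.dumps(request)
        reply = self._backend.handle(json.loads(wire_request))
        return json.loads(json.dumps(reply))


class StorageAreaFrontend:
    """Hypothetical front end: the same code regardless of the channel."""

    def __init__(self, channel):
        self._channel = channel

    def get_item(self, key):
        return self._channel.send({"op": "get", "key": key})["value"]

    def set_item(self, key, value):
        self._channel.send({"op": "set", "key": key, "value": value})

    def remove_item(self, key):
        self._channel.send({"op": "remove", "key": key})


# Two "windows" sharing one back end for the same origin.
shared_backend = StorageAreaBackend()
window_a = StorageAreaFrontend(SerializingChannel(shared_backend))
window_b = StorageAreaFrontend(DirectChannel(shared_backend))

window_a.set_item("draft", "hello")
print(window_b.get_item("draft"))
```

The point of the sketch is only that the front end does not need to know which channel it was given; a single-process port would pay for nothing more than a function call.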

Of course, localStorage is not unique; it's one of many parts of HTML 5 which are not necessarily bound to a single event loop. As I said, I'm new here, so I'm trying really hard to rein in my ideas, but perhaps a more generic front-end/back-end separation for these interfaces makes sense? That way, multi-process implementations can optionally implement a proxy between the two halves, and single-process implementations can implement those directly as function calls.

For example, for localStorage you could create a localStorageArea proxy that would serialize data, send it across the process boundary, then have a host on the other end that actually calls localStorageArea, and then repeat the process for the response. All the processes could then share one localStorageArea per origin. Obviously I'm oversimplifying, but I think it could work, and maybe it can be done in a way that translates to other APIs that (potentially) need to cross the process boundary.

If you've read all the way to this point, I want you to know I really appreciate it. I'm hopeful there is an elegant solution to be had here. :-)

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev