Re: DataCache API - editor's draft available

Nikunj R. Mehta Mon, 20 Jul 2009 09:26:29 -0700

Hi Mark,

I am happy to see your feedback on DataCache. Forgive me for the delay in responding.


On Jul 17, 2009, at 4:50 PM, Mark Nottingham wrote:

> I think this work is in an interesting space but, unfortunately, it's doing it without reference to the existing HTTP caching model, resulting in a lot of duplicated work, potential conflicts and ambiguities, as well as opportunity cost.

I don't understand this fully; can you please explain? From what I know, the Gears implementation can easily be extended to support DataCache. Of course, one doesn't need all of Gears; only LocalServer and browser integration are required. I don't see that as a lot of duplicated work.

> Furthermore, it's specifying an API as the primary method of controlling caches. While that's understandable if you look at the world as a collection of APIs, it's also quite limiting; it precludes reuse of information, unintended uses, and caching by anything except the browser.

FWIW, DataCache is not the first attempt at providing an API to control a browser's HTTP cache. ApplicationCache in HTML5 already does that.

I don't quite understand what problems you foresee with DataCache's approach. It does not ask the implementor to violate any HTTP caching semantics. If anything, it suggests that the implementation can offer an off-line response should an on-line response be infeasible.

This is based on my reading of the following passages from RFC 2616.


From §13:
[[

Requirements for performance, availability, and disconnected operation require us to be able to relax the goal of semantic transparency.

...

Protocol features that allow a cache to attach warnings to responses that do not preserve the requested approximation of semantic transparency.

]]

From §13.1.6:
[[

A client MAY also specify that it will accept stale responses, up to some maximum amount of staleness. This loosens the constraints on the caches, and so might violate the origin server's specified constraints on semantic transparency, but might be necessary to support disconnected operation, or high availability in the face of poor connectivity.

]]
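The §13.1.6 provision can be made concrete with a small sketch. Under RFC 2616, a response is fresh while its current age is below its freshness lifetime; a request carrying `max-stale` widens that window, and a cache that returns a stale response must attach Warning 110 ("Response is stale"). The Python below assumes the age and lifetime have already been computed as §13.2 describes; the function name and its argument conventions are illustrative assumptions, not taken from any specification or implementation.

```python
import math
from typing import Optional, Tuple

# Warning value a cache attaches to a stale response (RFC 2616 §14.46, code 110).
# The "-" warn-agent is a placeholder pseudonym chosen for this sketch.
STALE_WARNING = '110 - "Response is stale"'


def may_serve_from_cache(current_age: float, freshness_lifetime: float,
                         max_stale: Optional[float] = None) -> Tuple[bool, Optional[str]]:
    """Decide whether a cached response may satisfy a request without revalidation.

    max_stale is None when the request carries no max-stale directive,
    math.inf for a bare "max-stale", or the directive's value in seconds.
    Returns (acceptable, warning_to_attach).
    """
    if current_age < freshness_lifetime:
        return True, None  # still fresh: serve as-is
    staleness = current_age - freshness_lifetime
    if max_stale is not None and staleness <= max_stale:
        return True, STALE_WARNING  # stale, but within what the client accepts
    return False, None  # must revalidate with the origin server


# A response cached 90 s ago with a 60 s freshness lifetime:
print(may_serve_from_cache(90, 60))                      # -> (False, None)
print(may_serve_from_cache(90, 60, max_stale=60))        # -> (True, '110 - "Response is stale"')
print(may_serve_from_cache(90, 60, max_stale=math.inf))  # -> (True, '110 - "Response is stale"')
```

In a disconnected setting, then, a client that sends `Cache-Control: max-stale` can still be answered from the cache, with the Warning signalling that semantic transparency was relaxed.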

Can you please correct me if I have misinterpreted or misapplied these provisions of HTTP? Alternatively, can you point me to a valid interpretation of these portions in the context of an open implementation/application?

> A much better solution would be to declaratively define what URIs delineate an application (e.g., in response headers and/or a separate representation), and then allow clients to request an entire application to be fetched before they go offline (for example). I'm aware that there are other use cases and capabilities here, this is just one example.
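In its simplest form, the declarative alternative described above might be a list of URIs published by the application, which a client resolves and fetches before going offline. The Python sketch below parses such a list. The format (a `CACHE MANIFEST` first line, one URI reference per line, `#` comments) is a hypothetical simplification loosely modeled on HTML5 ApplicationCache manifests without their section headers, and `parse_app_manifest` is an illustrative name.

```python
from typing import List
from urllib.parse import urljoin


def parse_app_manifest(text: str, base_uri: str) -> List[str]:
    """Return the absolute URIs listed in a hypothetical application manifest.

    Assumed format: the first line is "CACHE MANIFEST", followed by one URI
    reference per line; blank lines and lines starting with '#' are ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "CACHE MANIFEST":
        raise ValueError("not an application manifest")
    uris: List[str] = []
    for line in lines[1:]:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        absolute = urljoin(base_uri, entry)  # resolve relative to the manifest's URI
        if absolute not in uris:  # keep only the first occurrence
            uris.append(absolute)
    return uris


manifest = """CACHE MANIFEST
# order-entry application
/app/index.html
styles.css
/app/index.html
"""
for uri in parse_app_manifest(manifest, "http://example.com/app/manifest"):
    print(uri)  # each URI would be fetched and stored before going offline
```

A static list like this works when the set of URIs is known in advance; it does not by itself cover data whose URIs depend on what the user is doing at the time.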

Am I correct in understanding that you consider pre-fetching the entire application to be better than pre-fetching parts of it? In any case, are you also suggesting a data format for specifying a collection of such URIs that the user agent should pin in its cache? How does a data format make a better solution than an API?

Additionally, it is not always possible to statically define the collection of URIs that are of interest to an application. Let me take an example:


*Sales force automation*

My sales reps work in parts of the world where a reliable network connection cannot be assumed. Still, I would like to deploy order entry applications that work reliably in the face of poor network connectivity on a small mobile computer with a Web browser. Today I am visiting my customers in Fallujah and need information about customers in that area, including their names, addresses, and order history (and status). This information changes regularly, and my sales reps benefit from up-to-the-minute order history if I can connect to the server while I am at the customer's office. If I don't have network access, I at least have reasonably up-to-date information. Finally, I want to enable the sales reps to take orders when they are out in the field and, provided they don't lose the device, I want to assure them that their orders will make it to the company's servers. If connectivity is available at that instant, the order will be confirmed immediately and processing will begin. If not, it will be kept pending.
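The order-taking behavior in this scenario (confirm immediately when connected, otherwise keep the order pending and send it later) amounts to a local outbox. The following is a minimal in-memory Python sketch under stated assumptions: `Order`, `OrderOutbox`, and the injected `send` callable are hypothetical; `send` is assumed to transmit one order and return True on success; and durable storage, which the scenario would need to survive a restart, is left out. It is not the DataCache API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Order:
    customer: str
    item: str
    quantity: int


class OrderOutbox:
    """Holds orders locally and submits them whenever the network allows.

    `send` is a hypothetical callable that transmits one order to the server
    and returns True on success; it returns False or raises OSError when the
    network is unavailable.
    """

    def __init__(self, send: Callable[[Order], bool]) -> None:
        self._send = send
        self._pending: Deque[Order] = deque()

    def submit(self, order: Order) -> str:
        """Queue an order, try to send everything queued, and report its status."""
        self._pending.append(order)
        self.flush()
        still_queued = any(p is order for p in self._pending)
        return "pending" if still_queued else "confirmed"

    def flush(self) -> int:
        """Send queued orders oldest-first, stopping at the first failure."""
        sent = 0
        while self._pending:
            try:
                ok = self._send(self._pending[0])
            except OSError:
                ok = False
            if not ok:
                break
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> List[Order]:
        return list(self._pending)
```

Calling `flush()` when connectivity returns drains the queue in submission order, so a later order is never confirmed ahead of an earlier one.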

Until now, developers have built and deployed such off-line applications outside the context of the Web architecture, i.e., with no URIs, no uniform methods, etc. They will continue to do the same with SQL databases inside Web browsers: still no URIs, a single method (POST), and an off-line-only solution (meaning it cannot take opportunistic advantage of available networks). Is this a more desirable approach than providing an API to a subset of the browser's HTTP cache?

> Doubtless there's still a need for some new APIs here, but I think they should be minimal (e.g., about querying the state of the cache, in terms of offline/online, etc.), not re-defining the cache itself.

Can you elaborate a little more? What do you mean by re-defining the cache? Can you give specific reasons why the DataCache API seems to redefine the cache?

> FWIW, I'd be very interested in helping develop protocols and APIs along what's outlined above.

Sorry, but I didn't see any outline. Maybe I missed something; I would appreciate it if you could provide one specifically.

In any case, I welcome your counsel on how to better address the requirements of DataCache that I stated previously [1]. It would be best if these requirements could be addressed through the correct use of HTTP as opposed to API magic.

> Cheers,
> P.S. This draft alludes to automatic prefetching without user intervention. However, there is a long history of experimentation with pre-fetching on the Web, and the general consensus is that it's of doubtful utility at best, and dangerous at worst (particularly in bandwidth-limited deployments, where bandwidth is charged for, as well as when servers are taken down because of storms of prefetch requests).

There is now also a fairly large amount of experience with prefetching outside the regular HTTP ambit. Siebel CRM (one of the most popular enterprise non-productivity off-line applications), MySpace, and GMail all pre-fetch thousands of pieces of data, if not more, and store them locally. Have you considered this experience relevant?

I think it is fair to say that the general observation you are making is not relevant in DataCache's case.

While pre-fetching is not a good idea in the general case, why kill the messenger? Let programmers make the right choice for their applications and learn from their own experience. IMHO, not doing DataCache-like things turns people away from using (and I mean not abusing) the Web and toward more brittle, less widely deployable, and far more laboriously crafted architectures.


[1] http://lists.w3.org/Archives/Public/public-webapps/2008OctDec/0104.html
