Hi Mark,
I am happy to see your feedback on DataCache. Forgive me for the delay
in responding.
On Jul 17, 2009, at 4:50 PM, Mark Nottingham wrote:
I think this work is in an interesting space but, unfortunately,
it's doing it without reference to the existing HTTP caching model,
resulting in a lot of duplicated work, potential conflicts and
ambiguities, as well as opportunity cost.
I don't fully understand this; could you please explain? From what I
know, the Gears implementation can easily be extended to support
DataCache. Of course, one doesn't need all of Gears; only LocalServer
and browser integration are required. I don't see that as a lot of
duplicated work.
Furthermore, it's specifying an API as the primary method of
controlling caches. While that's understandable if you look at the
world as a collection of APIs, it's also quite limiting; it
precludes reuse of information, unintended uses, and caching by
anything except the browser.
FWIW, DataCache is not the first attempt at obtaining an API to
control a browser's HTTP cache. That was already the case with
ApplicationCache in HTML5.
I don't quite understand what problems you foresee with DataCache's
approach. It does not ask the implementor to violate any HTTP caching
semantics. If anything, it suggests that the implementation can offer
an off-line response should an on-line response be infeasible.
This is based on my reading of the following pieces of text from
RFC2616.
From §13,
[[
Requirements for performance, availability, and disconnected operation
require us to be able to relax the goal of semantic transparency.
...
Protocol features that allow a cache to attach warnings to responses
that do not preserve the requested approximation of semantic
transparency.
]]
From §13.1.6
[[
A client MAY also specify that it will accept stale responses, up to
some maximum amount of staleness. This loosens the constraints on the
caches, and so might violate the origin server's specified constraints
on semantic transparency, but might be necessary to support
disconnected operation, or high availability in the face of poor
connectivity.
]]
Can you please correct me if I have misinterpreted or misapplied these
provisions of HTTP? Alternatively, can you point me to a valid
interpretation of these portions in the context of an open
implementation/application?
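
To make my reading of those provisions concrete, here is a rough
sketch in Python. It is not the DataCache API; the Response and
OfflineFallbackCache names, the fetch callback, and the
"datacache-sketch" warn-agent are all made up for illustration. It
assumes the stored copy is treated as stale once the server cannot be
reached, and therefore attaches the RFC 2616 warn-code 111
("Revalidation failed") to the off-line response.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    status: int
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


class OfflineFallbackCache:
    """Hypothetical cache: on-line response first, stored copy as a fallback."""

    def __init__(self, fetch: Callable[[str], Response]) -> None:
        # fetch is an assumed network call that raises ConnectionError
        # when the server cannot be reached.
        self._fetch = fetch
        self._store: Dict[str, Response] = {}

    def get(self, uri: str) -> Response:
        try:
            fresh = self._fetch(uri)
        except ConnectionError:
            stored = self._store.get(uri)
            if stored is None:
                raise  # nothing stored locally, so the failure is surfaced
            # Mark the off-line response so the application can tell that
            # semantic transparency was relaxed.
            headers = dict(stored.headers)
            headers["Warning"] = '111 datacache-sketch "Revalidation failed"'
            return Response(stored.status, stored.body, headers)
        self._store[uri] = fresh  # remember the latest on-line response
        return fresh
```

The point of the sketch is only that the fallback is visible to the
caller through the Warning header, not hidden from it.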
A much better solution would be to declaratively define what URIs
delineate an application (e.g., in response headers and/or a
separate representation), and then allow clients to request an
entire application to be fetched before they go offline (for
example). I'm aware that there are other use cases and capabilities
here, this is just one example.
Am I correct in understanding that you find pre-fetching the entire
application to be better than pre-fetching parts of it? In any case,
are you also suggesting a data format for specifying a collection of
such URIs that the user agent should pin down in cache? How does a
data format form a better solution than an API?
Additionally, it is not always possible to statically define the
collection of URIs that are of interest to an application. Let me take
an example -
*Sales force automation*
My sales reps work in parts of the world where a reliable network
connection cannot be assumed. Still, I would like to deploy order entry
applications that work reliably, despite poor connectivity, on a small
mobile computer with a Web browser. Say a sales rep is making the
rounds of customers in Fallujah today and needs information about
customers in that area, including their names, addresses, and order
history (and status). This information changes regularly, and the rep
benefits from up-to-the-minute order history if the device can connect
to the server while at the customer's office. Without network access,
the rep at least has up-to-the-day information. Finally, I want to
enable sales reps to take orders when they are out in the field and,
provided they don't lose the device, I want to assure them that their
orders will make it to the company's servers. If connectivity is
available at that instant, the order is confirmed immediately and
processing begins. If not, the order is kept pending.
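
The order-taking part of this scenario can be sketched roughly as
follows, again in Python. This is only an illustration under my own
assumptions: Order, OrderOutbox, and the send callback are hypothetical
names, and the pending list lives in memory here, whereas a real
deployment would have to store it durably on the device.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Order:
    customer: str
    item: str
    status: str = "pending"


class OrderOutbox:
    """Hypothetical outbox: confirm at once when on-line, else keep pending."""

    def __init__(self, send: Callable[[Order], None]) -> None:
        # send is an assumed upload call that raises ConnectionError
        # when the company's server cannot be reached.
        self._send = send
        self._pending: List[Order] = []

    def place(self, order: Order) -> Order:
        self._pending.append(order)
        self.flush()  # opportunistic: use the network if it is there
        return order

    def flush(self) -> int:
        """Send pending orders in order; stop at the first failure."""
        sent = 0
        while self._pending:
            order = self._pending[0]
            try:
                self._send(order)
            except ConnectionError:
                break  # still off-line; keep this and later orders pending
            order.status = "confirmed"
            self._pending.pop(0)
            sent += 1
        return sent

    def pending_count(self) -> int:
        return len(self._pending)
```

Calling flush() again whenever connectivity returns is what lets the
same application take advantage of the network opportunistically
instead of being an off-line only solution.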
Until now, developers have built and deployed such off-line
applications outside the context of the Web architecture - i.e., no
URIs, no uniform methods, etc. They will continue to do the same with
SQL databases inside Web browsers - still no URIs, a single method -
POST - and an off-line only solution (meaning it cannot take
opportunistic advantage of available networks). Is this a more
desirable approach than providing an API to a subset of the browser's
HTTP cache?
Doubtless there's still a need for some new APIs here, but I think
they should be minimal (e.g., about querying the state of the cache,
in terms of offline/online, etc.), not re-defining the cache itself.
Can you elaborate a little more? What do you mean by re-defining the
cache? Can you provide specific reasons why the DataCache API seems
like redefining the cache?
FWIW, I'd be very interested in helping develop protocols and APIs
along what's outlined above.
Sorry, but I didn't see any outline. Maybe I missed something; I
would appreciate it if you could point me to the outline specifically.
In any case, I welcome you to offer your counsel on better addressing
the requirements of DataCache that I have previously stated [1]. It
would be best if these requirements can be addressed through the
correct use of HTTP as opposed to API magic.
Cheers,
P.S. This draft alludes to automatic prefetching without user
intervention. However, there is a long history of experimentation
with pre-fetching on the Web, and the general consensus is that it's
of doubtful utility at best, and dangerous at worst (particularly in
bandwidth-limited deployments, where bandwidth is charged for, as
well as when servers are taken down because of storms of prefetch
requests).
There is now also a fairly large amount of experience with prefetching
outside of the regular HTTP ambit. Siebel CRM (one of the most popular
enterprise non-productivity off-line applications), MySpace, and GMail
all pre-fetch thousands of pieces of data, if not more, and store them
locally. Have you considered this experience as relevant? I don't
think I am wrong in saying that the general observation you are making
is not relevant in DataCache's case.
Granted, pre-fetching is not a good idea in the general case, but why
kill the messenger? Let programmers make the right choice for their
applications and learn from their own experience. IMHO, not doing
DataCache-like things turns people away from using (and I mean using,
not abusing) the Web, toward more brittle, less widely deployable, and
far more laboriously crafted architectures.
[1] http://lists.w3.org/Archives/Public/public-webapps/2008OctDec/0104.html