I've been told caching proxies do not normally cache URLs with a query string. Sometimes it's configurable, but that only works if you control the intermediate. And an intermediate may be remapping from resource to query string.

That gets us to Adrian's point about being up-to-date and that the ideal cache setup is going to involve application semantics.

Whatever cache control Fuseki ships with must work for everyone by default. Conditional GET would be better, using epochs to note updates (that is, ETags). That isn't there ATM, and anyway it assumes a more sophisticated client.
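
To make the epoch idea concrete, here is a rough sketch of a conditional GET. It is not Fuseki code: `Dataset`, `etag_for` and `conditional_get` are hypothetical names invented for illustration, and the sketch assumes the server keeps one update counter per dataset and that the client sends back a single ETag in If-None-Match.

```python
from typing import Callable, Dict, Optional, Tuple


class Dataset:
    """Hypothetical stand-in for a dataset that counts its updates (the 'epoch')."""

    def __init__(self) -> None:
        self.epoch = 0

    def update(self) -> None:
        # Any write bumps the epoch, which in turn changes the ETag.
        self.epoch += 1


def etag_for(dataset: Dataset) -> str:
    # ETag header values are quoted strings.
    return f'"epoch-{dataset.epoch}"'


def conditional_get(
    dataset: Dataset,
    request_headers: Dict[str, str],
    run_query: Callable[[], str],
) -> Tuple[int, Dict[str, str], Optional[str]]:
    """Return (status, response headers, body) for a GET on a query URL."""
    etag = etag_for(dataset)
    # "no-cache" lets a cache store the response but forces it to
    # revalidate with the server before reusing it.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    # Simplification: exact match against one ETag only; a real server
    # would also handle "*" and comma-separated lists.
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, None  # unchanged: the query is not run again
    return 200, headers, run_query()
```

A client or proxy that remembers the ETag gets a 304 with no body until the dataset is updated, so results are never stale, but every request still costs a round trip to the server.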

Also, knowing that a data service is read-only could influence the caching, for example when the application use case can tolerate stale data should the backing data be updated out-of-band.
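
One way that could look, purely as a sketch: `cache_control_for` and its parameters are made up for illustration and are not an existing Fuseki setting. It assumes the operator declares the service read-only and states how many seconds of staleness the application can live with; anything else falls back to the header value quoted below.

```python
from typing import Optional

# The Cache-Control value quoted further down in this thread.
DEFAULT_CACHE_CONTROL = "must-revalidate,no-cache,no-store"


def cache_control_for(read_only: bool, max_stale_seconds: Optional[int] = None) -> str:
    """Pick a Cache-Control value for responses from one data service."""
    # Only a read-only service with an explicit staleness budget is
    # allowed to be cached by intermediates.
    if read_only and max_stale_seconds is not None and max_stale_seconds > 0:
        return f"public, max-age={max_stale_seconds}"
    # Safe default that works for everyone: never cache.
    return DEFAULT_CACHE_CONTROL
```

For example, `cache_control_for(True, 300)` gives `public, max-age=300`, while an updatable service keeps the no-cache default whatever staleness is configured.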

    Andy

On 19/03/2020 14:08, Martynas Jusevičius wrote:
Adrian,

indeed, I'm asking because I'm looking at using Varnish as a proxy
cache in front of Fuseki.

However, best practices [1] say:

6.2 Cache policy

Define a cache policy
A cache / expiration policy is the rationale behind cache control for
every resource served by HTTP/1.1 servers. Content managers should
decide, globally and/or locally, what can or can not be cached, how
long caches should keep the document before trying to get a new
version, etc. These decisions may be made depending on the frequency
at which the documents may be updated.

Allow the Content Manager to set up cache control according to a Cache Policy
The content manager should be able to set the max-age parameter for
any resource served according to a cache policy.

6.3: Caching generated content

Provide actual caching information for content generated dynamically
Most dynamic content generation systems act as if the documents they
generate and serve were "fresh" (i.e. as if the resource was last
modified at the date it is served), whether the information itself is,
or not.
This is a harmful lie for caching engines and should be avoided.
Regardless of the technology used, it should be possible to provide
age information by retrieving the actual information from whatever
source is used to generate the dynamic content: file, database, etc.

[1] https://www.w3.org/TR/chips/#gl6

On Thu, Mar 19, 2020 at 2:50 PM Adrian Gschwend wrote:

On 19.03.20 12:05, Martynas Jusevičius wrote:

Cache-Control: must-revalidate,no-cache,no-store
Pragma: no-cache

YMMV, but my take here is:

- a SPARQL endpoint should always return the latest results
- If caching is needed, it should be transparent for the user, as in the
SPARQL endpoint can have its caching indexes internally
- it is the middleware/developer's job to add HTTP caching layers where
appropriate

We do that with our SPARQL proxy for example, the middleware there sets
caching headers that are configurable.

And as usual, cache invalidation is the hard part :)

regards

Adrian
