On 26/07/12 17:22, Jeroen De Dauw wrote:
> Hey,
>
>> On the other hand, it would be even more useful to cache all results per (sub)query, ignoring the limit
>
> This can reduce computing overlapping results, but on the other hand is likely to compute results we'll never actually use. And it makes the implementation more complex. Since I'm not convinced the actual result would be better (I suspect that in fact it'd be worse), I prefer to keep it simple for now. And if you have a case where the "store everything" approach really makes sense, you can always use a concept, right?
>
>> I was thinking of caching the query result only, not the printouts. One could cache a list of results, instead of caching all data needed to display the query result.
>
> Similar arguments apply here. Any query obtaining a single property would automatically fetch all properties for all matching objects. Again, I don't think it's that much of an improvement. Especially considering the following:
>
>> Having the lists of query results that are displayed in one query now could be useful for updating (if you have a data blob, you cannot check quickly for which queries a page occurs as a displayed result).
>
> Sure, it'd make it easier to figure this out. At least, if you invalidate it whenever a single property changes. So now our query obtaining a single property does not only result in all properties getting obtained, but it'll also have its cache invalidated whenever one of those other properties is changed.
Not necessarily. One can still store the printout properties and look at the diff to see if any of them was affected.

> This seems like something we really should avoid, so we'll have to take into account the affected properties anyway, making the "just store all properties" approach not simpler to implement.

Not sure what "just store all properties" means. I was arguing for the opposite: not to store the properties again, since the printouts can easily be fetched from the DB in the (relatively rare) cases where the parser cache needs to be rebuilt. Mirroring all printout properties in the query cache would require more frequent updates to it and make it more specific to one single page.

But it does not matter much for now. The big issue with all of the query result caching is to limit the amount of cache invalidation that happens on updates. We need to think about how to get more specific information about queries than the properties that they refer to. Some wikis have thousands of pages with very similar queries, always using the same property (from a template), where each query has only a few results (Referata gives a good example). A property-based cache invalidation would kill most of the query caches on almost every property edit (there are often just a handful of properties).

Storing results for (sub)conditions as an exhaustive list could allow much more fine-grained control of cache invalidation. The challenge is to keep these sets small. Maybe there are other approaches as well, such as singling out certain "selective" subqueries for this purpose.

Markus
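
To make the (sub)condition result-set idea above a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the invalidation bookkeeping, not SMW code: the class, the matches(condition, page_id) callback, the property bookkeeping and the size cap are all hypothetical; the cap stands in for the "keep these sets small" constraint.

# Hypothetical sketch of caching exhaustive (sub)condition result sets with
# fine-grained invalidation. Names and structure are illustrative only and
# do not correspond to actual SMW classes or hooks.

import hashlib
from collections import defaultdict


class SubconditionCache:
    def __init__(self, max_set_size=500):
        self.max_set_size = max_set_size      # "keep these sets small"
        self.entries = {}                     # key -> (condition, result set, referenced properties)
        self.by_property = defaultdict(set)   # property name -> keys of conditions that use it

    @staticmethod
    def _key(condition):
        return hashlib.sha1(condition.encode("utf-8")).hexdigest()

    def store(self, condition, result_page_ids, referenced_properties):
        """Cache the exhaustive result list of one (sub)condition."""
        if len(result_page_ids) > self.max_set_size:
            return False                      # too large: leave it to the normal query path
        key = self._key(condition)
        self.entries[key] = (condition, set(result_page_ids), set(referenced_properties))
        for prop in referenced_properties:
            self.by_property[prop].add(key)
        return True

    def lookup(self, condition):
        entry = self.entries.get(self._key(condition))
        return None if entry is None else entry[1]

    def on_page_edit(self, page_id, changed_properties, matches):
        """Update only the entries that the edit can actually affect.

        matches(condition, page_id) is a hypothetical callback that checks a
        single page against one (sub)condition -- a cheap per-page test, not a
        full query. Property-based invalidation would drop every entry whose
        condition mentions a changed property; here an entry is only touched
        when the edited page really moves into or out of its result set.
        """
        candidates = set()
        for prop in changed_properties:
            candidates |= self.by_property.get(prop, set())
        for key in candidates:
            condition, result_set, _props = self.entries[key]
            if matches(condition, page_id):
                result_set.add(page_id)
                if len(result_set) > self.max_set_size:
                    self._drop(key)           # the set grew too large, stop caching it
            else:
                result_set.discard(page_id)

    def _drop(self, key):
        _condition, _results, props = self.entries.pop(key)
        for prop in props:
            self.by_property[prop].discard(key)

The point of this layout is that the cached sets double as the invalidation index: instead of dropping every cached query that merely mentions an edited property, only the few small sets that the edited page actually enters or leaves change, and even those entries are repaired in place rather than discarded.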