On 26/07/12 15:39, Jeroen De Dauw wrote:
> Hey,
>
> Nischay, Markus and I were discussing how to implement caching for ask
> queries and inadvertently ended up discussing the whole query
> invalidation project again. Since this fits in with Nischay's project,
> and is something I also want to poke at since we'll have to implement
> something similar in Wikidata, I decided to write up my current
> thoughts on how to implement this.
>
>
> I propose having a table "queries" where each row holds an identifier
> for a query (for instance a hash of the conditions, printouts and
> relevant params). Entries would be added on page save if they are not
> there yet. Each entry can contain the computed results for the query.
> There would also be a table mapping each query to the pages on which
> it's used. The flow would look like this:
>
>
> * People use a single query on multiple pages; the first usage inserts
> a new entry in queries with the freshly obtained result
>
> * Successive usages just get the result from the cache in the query table
>
> * When someone changes data, we figure out which queries could be
> affected and remove their cache, plus invalidate the parser, HTML and
> other caches of all pages containing any of the queries that had their
> cache removed
>
> * On the next view of such a page, SMW finds an empty cache for the
> query and recomputes it
>
> Note: we would not necessarily need to wait for people to view a page
> to have the caches rebuilt (both the query cache and the page-specific
> caches). We could create jobs to do this, so that on the next view of
> the page, it's there immediately. This only makes sense for wikis where
> most pages get visited often though, since otherwise you might be doing
> a lot of work for nothing.
>
>
> The only difficult problem left to solve here is how to best figure out
> which queries have changed (or could have changed), but this does not
> appear to affect the rest of the design.
>
> Any objections to such an approach, or suggestions of any sort?
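
For concreteness, the flow proposed above could be sketched roughly like
this (the in-memory table stand-ins, the hashing scheme and the function
names are illustrative assumptions, not actual SMW code):

```python
import hashlib
import json

query_cache = {}   # stands in for the "queries" table: query_id -> result
query_pages = {}   # maps query_id -> set of pages using that query

def query_id(conditions, printouts, params):
    """Identify a query by hashing its conditions, printouts and params."""
    key = json.dumps([conditions, printouts, params], sort_keys=True)
    return hashlib.sha1(key.encode()).hexdigest()

def run_query(page, conditions, printouts, params, compute):
    """First usage computes and caches; later usages hit the cache."""
    qid = query_id(conditions, printouts, params)
    query_pages.setdefault(qid, set()).add(page)
    if qid not in query_cache:
        query_cache[qid] = compute()  # fresh result on first usage
    return query_cache[qid]

def on_data_change(affected_query_ids):
    """Drop the cache of affected queries; return pages whose parser/HTML
    caches would need invalidating."""
    pages_to_purge = set()
    for qid in affected_query_ids:
        query_cache.pop(qid, None)
        pages_to_purge |= query_pages.get(qid, set())
    return pages_to_purge
```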

I think we should cache results per query condition rather than per 
query. Various queries can have the same condition but different 
parameters (e.g., showing it in a table vs. showing it in a list). We 
should have only one cache entry for two queries that have the same 
results. We can still have a query table (it would be interesting to 
store information about individual queries beyond the condition they 
have).
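
A minimal sketch of keying the cache on the condition alone (the names
and hashing choice are assumptions for illustration); two #ask queries
over the same condition with format=table and format=list would then
resolve to the same cache entry:

```python
import hashlib

result_cache = {}  # condition hash -> result set

def condition_key(condition):
    """Hash only the condition string, not the display parameters."""
    return hashlib.sha1(condition.encode()).hexdigest()

def get_results(condition, compute):
    """Return cached results for a condition, computing them once."""
    key = condition_key(condition)
    if key not in result_cache:
        result_cache[key] = compute()
    return result_cache[key]
```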

For change propagation and cache updates, it might help to decompose the 
query condition (along its PHP object structure as an SMWDescription). 
Every single description part is relatively simple, yet precise enough 
to check whether it was affected by a change. In this way, changes could 
be pushed through the description objects. This would also allow much 
stronger reuse of subconditions that occur in many query descriptions. 
The details need to be worked out ...
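
One way such a decomposition might look, loosely mirroring the
SMWDescription tree (the class and method names here are invented for
illustration, not SMW's actual API):

```python
class SomeProperty:
    """Leaf condition: restricts a single property."""
    def __init__(self, prop):
        self.prop = prop
    def affected_by(self, changed_props):
        # A leaf is affected only if its own property changed.
        return self.prop in changed_props

class Conjunction:
    """Inner node: all subconditions must hold."""
    def __init__(self, parts):
        self.parts = parts
    def affected_by(self, changed_props):
        # A change to any part may change the conjunction's results.
        return any(p.affected_by(changed_props) for p in self.parts)

# A shared subcondition reused by two query descriptions:
in_europe = SomeProperty("Located in")
q1 = Conjunction([in_europe, SomeProperty("Population")])
q2 = Conjunction([in_europe, SomeProperty("Area")])
```

Since both descriptions reuse the same `in_europe` object, a change
pushed through the tree only needs to be checked against that
subcondition once.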

Markus

_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
