On 26/07/12 15:39, Jeroen De Dauw wrote:
> Hey,
>
> Nischay, Markus and I were discussing how to implement caching for ask
> queries and inadvertently ended up discussing the whole query
> invalidation project again. Since this fits in with Nischay's project
> and is something I also want to poke at, since we'll have to implement
> something similar in Wikidata, I decided to write up my current thoughts
> on how to implement this.
>
> I propose having a table "queries" where each row holds an identifier for
> a query (for instance a hash of the conditions, printouts and relevant
> params). Entries would be added on page save in case they are not there
> yet. Each entry can contain the computed result for the query. There
> would also be a table mapping each query to the pages on which it is used.
> The flow would look like this:
>
> * People use a single query on multiple pages; the first usage inserts a
>   new entry in queries with the freshly obtained result.
>
> * Successive usages just get the result from the cache in the query table.
>
> * When someone changes data, we figure out which queries can be affected
>   and remove their caches, and also invalidate the parser, HTML, and
>   other caches of all pages containing any of the queries whose cache
>   was removed.
>
> * On the next view of such a page, SMW finds an empty cache for the query
>   and recomputes it.
>
> Note: we would not necessarily need to wait for people to view a page to
> have the caches rebuilt (both the query cache and the page-specific
> caches). We could create jobs to do this, so that on the next view of the
> page, the result is there immediately. This only makes sense for wikis
> where most pages are visited often, though, since otherwise you might be
> doing a lot of work for nothing.
>
> The only difficult problem left to solve here is how best to figure out
> which queries have changed (or could have changed), but this does not
> appear to affect the rest of the design.
>
> Any objections to such an approach, or suggestions of any sort?
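The quoted flow above can be sketched in a few lines. This is a minimal illustration, not SMW code: the class, method and field names are all hypothetical, and plain dicts stand in for the proposed "queries" and query-to-pages database tables.

```python
import hashlib
import json


class QueryCache:
    """Sketch of the proposed design: one cache entry per distinct query,
    plus a mapping from each query to the pages that embed it."""

    def __init__(self):
        self.results = {}      # query_id -> cached result (None = invalidated)
        self.query_pages = {}  # query_id -> set of pages using the query

    @staticmethod
    def query_id(conditions, printouts, params):
        # Hash the parts of the query that identify it, as suggested above.
        raw = json.dumps([conditions, printouts, params], sort_keys=True)
        return hashlib.sha1(raw.encode()).hexdigest()

    def on_page_save(self, page, conditions, printouts, params, compute):
        """First usage computes and caches the result; later usages reuse it."""
        qid = self.query_id(conditions, printouts, params)
        self.query_pages.setdefault(qid, set()).add(page)
        if self.results.get(qid) is None:
            self.results[qid] = compute()
        return self.results[qid]

    def on_data_change(self, affected_query_ids):
        """Drop the caches of affected queries and return the pages whose
        parser/HTML caches must be invalidated in turn."""
        stale_pages = set()
        for qid in affected_query_ids:
            if self.results.get(qid) is not None:
                self.results[qid] = None
                stale_pages |= self.query_pages.get(qid, set())
        return stale_pages
```

The hard part the mail points out, computing `affected_query_ids` from a data change, is deliberately left as an input here.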
I think we should cache results per query condition rather than per query. Various queries can have the same condition but different parameters (e.g., showing the result in a table vs. showing it in a list), and two queries that have the same results should share a single cache. We can still have a query table (it would be interesting to store information about individual queries beyond the condition they have).

For change propagation and cache updates, it might help to decompose the query condition along its PHP object structure as an SMWDescription. Every single description part is relatively simple, yet precise enough to check whether it was affected by a change. In this way, changes could be pushed through the description objects. This would also allow much stronger reuse of subconditions that occur in many query descriptions.

The details need to be worked out ...

Markus

_______________________________________________
Semediawiki-devel mailing list
Semediawiki-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
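[Editor's note] Markus's decomposition idea can be sketched as a small tree of description objects, loosely mirroring SMW's SMWDescription hierarchy. All class names and the shape of the `change` record below are illustrative assumptions, not SMW's actual API.

```python
class Description:
    """Base class for a decomposed query condition part."""

    def affected_by(self, change):
        raise NotImplementedError


class PropertyValue(Description):
    """A simple leaf condition on one property, e.g. [[Population::+]]."""

    def __init__(self, prop):
        self.prop = prop

    def affected_by(self, change):
        # A data change touching this property may change the query result.
        return change["property"] == self.prop


class Conjunction(Description):
    """An AND of subdescriptions; affected if any part is affected.

    Because parts are shared objects, a subcondition occurring in many
    query descriptions is checked once and reused."""

    def __init__(self, parts):
        self.parts = parts

    def affected_by(self, change):
        return any(p.affected_by(change) for p in self.parts)
```

A change is then "pushed through" the tree: if `affected_by` returns true for a query's root description, that query's cache is dropped and its pages invalidated.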