> - I find the name dataguide misleading because it's a guide on the query and
> not on the data. Maybe QueryPruneGuide would be more meaningful

The query itself is not pruned; the data is. I think "dataguide" is the 
established term -- see for example this paper: 
http://ilpubs.stanford.edu:8090/264/1/1997-50.pdf . 

> - Can the user also use the zann_explores_json annotation?

Yes, users can use it as well, but I doubt it makes sense for them to do so. 
An external function is automatically handled as if it had the annotation, 
and for a UDF it doesn't really make sense to add it. 

> - Why is the dataguide parameter on the Store's getCollection() function?
> Shouldn't it be on the function that returns the iterator? The problem is that
> a Collection object within the simplestore exists only once per collection.
> What's the semantics if multiple queries access the collection (possibly in
> parallel)?

It very much depends on how the collections are handled. Currently for Zorba 
collections it doesn't make sense to have any dataguides at all, because 
they're in-memory collections. I have not taken a look at the Sausalito code 
and have not seen how e.g. the MongoDB "collections" are managed. 
getCollection() seemed the most logical place where it should be passed, but 
the dataguide parameter could be easily propagated to any Store class, 
including the function that returns the iterator.

Currently each and every db:collection() call has its own dataguide, even if 
they might refer to the same collection. If the collection manager currently 
"caches" or reuses the collection iterators, then it might make sense to forbid 
that so that the dataguide for each individual db:collection call could be 

Alternatively, a "union" of the dataguides that refer to the same 
collection could be performed. But I think it is not always possible to 
determine whether that is the case. 
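
To make the "union" idea concrete, here is a rough sketch. It assumes a 
dataguide can be modelled as a tree of field names; the DataGuide struct, 
child() and unite() are hypothetical names for illustration and are not the 
actual dataguide_cb code. 

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical model (an assumption, not Zorba's dataguide_cb):
// a tree of JSON field names that some query needs from a collection.
struct DataGuide {
  // true: the whole value at this path is needed, so nothing below is pruned
  bool all = false;
  std::map<std::string, std::unique_ptr<DataGuide>> fields;

  // Returns the child guide for `name`, creating an empty one if absent.
  DataGuide& child(const std::string& name) {
    auto& slot = fields[name];
    if (!slot) slot = std::make_unique<DataGuide>();
    return *slot;
  }
};

// Merges `other` into `target`: afterwards `target` keeps every path that
// either of the two guides needed.
void unite(DataGuide& target, const DataGuide& other) {
  if (target.all) return;  // already keeps everything at this path
  if (other.all) {
    target.all = true;     // the wider requirement wins
    target.fields.clear();
    return;
  }
  for (const auto& entry : other.fields) {
    assert(entry.second);  // child pointers are never null in this model
    unite(target.child(entry.first), *entry.second);
  }
}
```

With this model, two db:collection() calls on the same collection, one 
needing only address.city and the other needing all of address, would end up 
sharing a guide that keeps all of address. 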

I think this can be investigated and decided when the dataguide push-down 
into MongoDB is implemented, or once I have taken a closer look at 
Sausalito's collection manager code.

> - Did you measure the performance impact of the optimizer on some larger
> queries?

The expression tree is traversed exactly once, visiting each node, so the 
cost should not differ much from that of any other dataflow computation, 
e.g. the ignores sorts/order/etc. analysis. If there are no "sources", 
i.e. db:collection() or jn:parse() calls, then the dataguide computation just 
propagates NULLs, doing no calculations and almost no memory allocations (at 
most one dataguide_cb allocation per fo_expr and several others). If there are 
"sources" in the tree, some union operations will be performed for some of 
the nodes. 
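
The shape of that pass, as a simplified sketch: Expr, Guide and compute() 
are hypothetical stand-ins rather than the real expr or dataguide_cb 
classes, and a "guide" is reduced here to the set of source names found 
below a node. 

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expression node (an assumption, not Zorba's expr class).
struct Expr {
  std::string source;          // non-empty only for a "source" call
  std::vector<Expr> children;
};

// Null means "no source anywhere in this subtree".
using Guide = std::unique_ptr<std::set<std::string>>;

// Single post-order pass; `visited` counts how many nodes were seen.
Guide compute(const Expr& e, std::size_t& visited) {
  ++visited;
  Guide result;  // stays null unless a source shows up
  if (!e.source.empty()) {
    result = std::make_unique<std::set<std::string>>();
    result->insert(e.source);
  }
  for (const Expr& child : e.children) {
    Guide sub = compute(child, visited);
    if (!sub) continue;           // null propagates: no work, no allocation
    assert(!sub->empty());        // a non-null guide names at least one source
    if (!result) {
      result = std::move(sub);    // reuse the child's guide as-is
    } else {
      result->insert(sub->begin(), sub->end());  // union of the two guides
    }
  }
  return result;
}
```

In this sketch a tree without sources returns null from every node and 
allocates nothing, while a tree with sources pays only for the unions on 
the paths leading up from those sources. 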

I will check if any of our larger queries have longer compilation times, but 
because none of them have db:collection() or jn:parse() calls, I do not expect 
any differences. 

It would make sense to have a specially constructed query that would do a 
stress-test of the dataguide code -- e.g. a 
db:collection().navigation.navigation. ... .navigation several thousand times 
or something similar. I will try that out and see if it manages to slow down 
the compilation.
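
A small generator for such a query could look like the sketch below. The 
function name make_stress_query and its parameters are my own invention for 
illustration; the generated text only follows the db:collection().navigation 
pattern described above and would still need whatever prolog and collection 
setup the test requires. 

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds: db:collection("<collection>").<field>.<field>. ... (depth lookups)
// Hypothetical helper for a stress test; not part of Zorba.
std::string make_stress_query(const std::string& collection,
                              const std::string& field,
                              std::size_t depth) {
  assert(!field.empty());  // an empty field name would produce ".." in the query
  std::string query = "db:collection(\"" + collection + "\")";
  query.reserve(query.size() + depth * (field.size() + 1));
  for (std::size_t i = 0; i < depth; ++i) {
    query += '.';
    query += field;
  }
  return query;
}
```

Calling it with a depth of several thousand and timing only the compilation 
step would show whether the dataguide pass scales with the chain length. 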
