> - I find the name dataguide misleading because it's a guide on the query and
> not on the data. Maybe QueryPruneGuide would be more meaningful
The query itself is not pruned; the data is. I think "dataguide" is the
established term; see, for example, this paper:
> - Can the user also use the zann_explores_json annotation?
Yes, users can use it as well, but does it make sense for them to? An external
function is automatically handled as if it had the annotation, and for a UDF
it doesn't really make sense to add it.
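The rule for external functions can be sketched like this. It is a minimal illustration only. FunctionDecl and exploresJson are hypothetical names, not Zorba's classes. The meaning given to zann_explores_json here (the function may navigate its JSON input arbitrarily, so that input cannot be pruned) is an assumption based on the annotation's name.

```cpp
#include <set>
#include <string>

// Hypothetical, simplified function declaration; not Zorba's API.
struct FunctionDecl {
  std::string name;
  bool isExternal = false;
  std::set<std::string> annotations;
};

// Assumed semantics: a function that "explores" its JSON input may touch any
// part of it, so the dataguide must keep that input unpruned.
bool exploresJson(const FunctionDecl& f) {
  // External functions are opaque to the optimizer, so they are treated as
  // if they carried the annotation, whether or not the user wrote it.
  if (f.isExternal) return true;
  // For a UDF the optimizer can see the body, so only an explicit
  // annotation changes the outcome.
  return f.annotations.count("zann_explores_json") > 0;
}
```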
> - Why is the dataguide parameter on the Store's getCollection() function?
> Shouldn't it be on the function that returns the iterator? The problem is that
> a Collection object within the simplestore exists only once per collection.
> What's the semantics if multiple queries access the collection (possibly in
It very much depends on how the collections are handled. For Zorba's own
collections it currently makes no sense to have dataguides at all, because
they are in-memory collections. I have not yet looked at the Sausalito code,
so I have not seen how, e.g., the MongoDB "collections" are managed.
getCollection() seemed the most logical place to pass it, but the dataguide
parameter could easily be propagated to any Store class or method, including
the function that returns the iterator.
Currently every db:collection() call has its own dataguide, even when several
calls refer to the same collection. If the collection manager "caches" or
reuses collection iterators, it might make sense to forbid that, so that the
dataguide for each individual db:collection() call could be
Alternatively, a union of the dataguides that refer to the same collection
could be computed, although I think it is not always possible to determine
whether two calls refer to the same collection.
I think this can be investigated and decided when implementing the dataguide
push-down into MongoDB, or once I have taken a closer look at Sausalito's
collection manager code.
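If the dataguides of several db:collection() calls on the same collection were combined, the union could look roughly like the following. This is a minimal sketch under a simplified, hypothetical model: a dataguide is a tree of object keys, and DataGuide, addPath, unite and needs are invented for illustration. It is not Zorba's dataguide implementation.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical dataguide: the tree of object keys a query may navigate.
struct DataGuide {
  bool all = false;  // true: the whole value here is needed, no pruning
  std::map<std::string, std::unique_ptr<DataGuide>> children;

  DataGuide& child(const std::string& key) {
    auto& slot = children[key];
    if (!slot) slot = std::make_unique<DataGuide>();
    return *slot;
  }

  void markAll() { all = true; children.clear(); }

  // Record a navigation such as db:collection(...).a.b as the path {"a", "b"};
  // the value reached at the end of the path is needed whole.
  void addPath(const std::vector<std::string>& path) {
    DataGuide* node = this;
    for (const auto& key : path) {
      if (node->all) return;  // an ancestor is already needed whole
      node = &node->child(key);
    }
    node->markAll();
  }

  // Union: afterwards this guide keeps every key that either guide needs.
  void unite(const DataGuide& other) {
    if (all) return;
    if (other.all) { markAll(); return; }
    for (const auto& [key, sub] : other.children) child(key).unite(*sub);
  }

  // True if the value at 'path' must be kept (at least partially).
  bool needs(const std::vector<std::string>& path) const {
    const DataGuide* node = this;
    for (const auto& key : path) {
      if (node->all) return true;
      auto it = node->children.find(key);
      if (it == node->children.end()) return false;
      node = it->second.get();
    }
    return true;
  }
};
```

The union is conservative: anything either call might read survives, so sharing one iterator between the calls stays correct at the cost of pruning less.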
> - Did you measure the performance impact of the optimizer on some larger
The expression tree is traversed exactly once, visiting each node, so the cost
should not differ much from any other dataflow computation, e.g. the analysis
of which expressions ignore sorting/order, etc. If there are no "sources",
i.e. db:collection() or jn:parse() calls, the dataguide computation just
propagates NULLs, doing no calculations and almost no memory allocations (at
most one dataguide_cb allocation per fo_expr, plus a few others). If there are
sources in the tree, union operations are performed at some of the nodes.
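The single-pass behaviour described above can be illustrated with a toy post-order walk. This is a minimal sketch under stated assumptions: Expr, Guide, Stats and computeGuide are hypothetical. The "guide" here is just the set of sources reachable below a node, standing in for a real dataguide of navigation paths. The point is the cost profile: subtrees without sources return nullptr, and unions happen only where two non-null guides meet.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified expression node (not Zorba's expr class).
struct Expr {
  bool isSource = false;   // e.g. a db:collection() or jn:parse() call
  std::string sourceName;  // which source, if isSource
  std::vector<std::unique_ptr<Expr>> children;
};

// Stand-in for a dataguide: the set of sources reachable below a node.
using Guide = std::set<std::string>;

struct Stats {
  int allocations = 0;
  int unions = 0;
};

// One post-order pass; each node is visited exactly once. Subtrees without
// sources yield nullptr, so they cost neither allocations nor unions.
std::shared_ptr<Guide> computeGuide(const Expr& e, Stats& stats) {
  std::shared_ptr<Guide> result;
  if (e.isSource) {
    result = std::make_shared<Guide>(Guide{e.sourceName});
    ++stats.allocations;
  }
  for (const auto& child : e.children) {
    std::shared_ptr<Guide> g = computeGuide(*child, stats);
    if (!g) continue;                      // nothing to propagate
    if (!result) { result = g; continue; } // take over the child's guide
    result->insert(g->begin(), g->end());  // union of two non-null guides
    ++stats.unions;
  }
  return result;
}
```

Because this is a single walk, the number of visits grows linearly with the number of expression nodes. In a real dataguide, each union would also cost time proportional to the size of the guides being merged.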
I will check whether any of our larger queries take longer to compile, but
because none of them contain db:collection() or jn:parse() calls, I do not expect
It would make sense to construct a query specifically to stress-test the
dataguide code, e.g. db:collection().navigation.navigation. ... .navigation
with several thousand navigation steps, or something similar. I will try that
and see whether it manages to slow down
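A query of that shape is easy to generate. The helper below is a hypothetical sketch: makeNavigationStressQuery is not part of Zorba. The default source expression and step name are placeholders copied from the example above, so substitute a real collection and key for an actual test.

```cpp
#include <cstddef>
#include <string>

// Builds a stress-test query: one source followed by 'depth' object
// navigation steps, e.g. db:collection().navigation.navigation...
// 'source' and 'step' are placeholders, not a real collection or key.
std::string makeNavigationStressQuery(std::size_t depth,
                                      const std::string& source = "db:collection()",
                                      const std::string& step = "navigation") {
  std::string query = source;
  query.reserve(source.size() + depth * (step.size() + 1));
  for (std::size_t i = 0; i < depth; ++i) {
    query += '.';
    query += step;
  }
  return query;
}
```

Timing the compilation of queries with increasing depth (say 1,000, 2,000, 4,000 steps) would show whether compile time grows roughly linearly, as the single traversal suggests.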
Your team Zorba Coders is subscribed to branch lp:zorba.