Re: DocumentMK: improving the query API

Marcel Reutegger Tue, 17 Feb 2015 00:50:23 -0800

Hi,

On 03/02/15 14:55, "Julian Reschke" <[email protected]> wrote:
>Marcel and I chatted about this, and here are two API improvements we
>could do; these are independent, and add some complexity - in the
>optimal case we'll find out that doing one of these two would be
>sufficient.
>
>Proposal #1: improve declarative constraints
>
>Add a variant of query() such as:
>
>   <T extends Document> List<T> query(Collection<T> collection,
>                                      List<Constraint> constraints,
>                                      int limit);
>
>This would return all documents where all of the listed constraints are
>true (we currently do not seem to have a use case for a disjunction). A
>constraint would apply to an indexed property (such as "_id") and would
>allow the common comparisons, plus an "in" clause.
>
>This would be straightforward to support both in the Mongo- and
>RDBDocumentStore.


Even though it is not strictly required, the method signature above
does not have a start id for paging through a bigger set of matching
documents. The start id for the next batch would need to be added as a
constraint, just like any other regular constraint. From a client POV,
I would probably prefer an explicit parameter.
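
To make the paging point concrete, here is a rough sketch of a client
loop with an explicit start id. Everything in it is a stand-in made up
for illustration (Doc, Constraint, PagingSketch, the in-memory TreeMap);
it is not the DocumentStore API, and it assumes non-empty ids and a
batch size of at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-ins for illustration; not the actual Oak types.
final class Doc {
    final String id;
    final long modified;
    Doc(String id, long modified) { this.id = id; this.modified = modified; }
}

interface Constraint {
    boolean matches(Doc doc);
}

final class PagingSketch {
    // Toy store: documents kept sorted by id, like a range scan on "_id".
    private final TreeMap<String, Doc> docs = new TreeMap<>();

    void add(Doc doc) { docs.put(doc.id, doc); }

    // Returns up to 'limit' documents with id > fromId (exclusive)
    // for which all constraints are true.
    List<Doc> query(String fromId, List<Constraint> constraints, int limit) {
        List<Doc> result = new ArrayList<>();
        for (Doc doc : docs.tailMap(fromId, false).values()) {
            if (result.size() >= limit) {
                break;
            }
            boolean ok = true;
            for (Constraint c : constraints) {
                if (!c.matches(doc)) { ok = false; break; }
            }
            if (ok) {
                result.add(doc);
            }
        }
        return result;
    }

    // Client side: page through all matches, batchSize documents at a time.
    List<String> collectAllIds(List<Constraint> constraints, int batchSize) {
        List<String> ids = new ArrayList<>();
        String fromId = ""; // sorts before any non-empty id
        while (true) {
            List<Doc> batch = query(fromId, constraints, batchSize);
            for (Doc d : batch) {
                ids.add(d.id);
            }
            if (batch.size() < batchSize) {
                break; // short batch: nothing left to fetch
            }
            // the last id of this batch is the start id of the next one
            fromId = batch.get(batch.size() - 1).id;
        }
        return ids;
    }
}
```

Without the fromId parameter, the client would have to append an
"id greater than last id" constraint to the list before each call.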

I also don't particularly like the List as return value. We used it
for other methods, and it always turns out to be problematic to find
a reasonable number for the limit (aka batch size). The right number
depends very much on the size of the returned documents.

>Proposal #2: add Java-based filtering and "sparse" documents
>
>This would add a "QueryFilter" parameter to queries. A filter would have
>
>- an optional way of selecting certain properties, and
>- an accept(Document) method
>
>Advantages:
>
>- if the filter only selects certain properties (say "_id",
>"_deletedOnce", and "_modified"), the persistence may not need to fetch
>the complete document representation from storage (in RDB, this would be
>true for any system property that has its own column)
>
>- the accept method could have "arbitrary" complexity and would be
>responsible for generating the result set; for instance, it might only
>build a list of Strings containing the identifiers of matching documents
>(which would be sufficient for a subsequent delete operation).
>
>
>Note: Proposal #2 is more flexible, but as it's only partly declarative
>it makes it impossible to pass the selection constraints down to the
>persistence.

I think this is a major drawback of this approach. Depending on the
selectivity of the filter, we may have to read a lot of documents
from the store just to find out they don't match.
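
A small sketch of that cost, under made-up assumptions: StoredDoc,
QueryFilter and FilteringStore are hypothetical names, the store is a
plain in-memory list, and accept() simply returns a boolean (the
proposal leaves its exact shape open). The store cannot look inside
accept(), so it has to hand over every document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not the actual Oak types.
final class StoredDoc {
    final String id;
    final boolean deletedOnce;
    StoredDoc(String id, boolean deletedOnce) {
        this.id = id;
        this.deletedOnce = deletedOnce;
    }
}

interface QueryFilter {
    boolean accept(StoredDoc doc);
}

final class FilteringStore {
    private final List<StoredDoc> docs = new ArrayList<>();
    private int fetched; // documents read from "storage" so far

    void add(StoredDoc doc) { docs.add(doc); }

    int fetchedCount() { return fetched; }

    // The filter is opaque Java code, so every document is read and
    // passed to accept(), no matter how few of them match.
    List<String> queryIds(QueryFilter filter) {
        List<String> ids = new ArrayList<>();
        for (StoredDoc doc : docs) {
            fetched++;
            if (filter.accept(doc)) {
                ids.add(doc.id);
            }
        }
        return ids;
    }
}
```

With 1000 documents of which one has deletedOnce set, queryIds()
returns a single id while fetchedCount() reports 1000 reads.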


We could also implement a combination of both, something like this:

<T extends Document> void query(Collection<T> collection,
                                List<Constraint> constraints,
                                ResultCollector<T> collector);

interface ResultCollector<T extends Document> {
    public boolean collect(T document);
}


Advantages:

- no need for limit and closeable. A client either collects
all results or interrupts by returning false in collect(). This
indicates to the DocumentStore that resources can be freed.

Drawback:

- does not work well with clients exposing results through
an iterator (pull vs. push).
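
For illustration, a rough sketch of how the push-style variant could
behave. Only the ResultCollector shape comes from the proposal above;
Document, Constraint and InMemoryStore here are simplified stand-ins
made up for this example, and the Collection<T> parameter is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration; not the actual Oak types.
class Document {
    private final String id;
    Document(String id) { this.id = id; }
    String getId() { return id; }
}

interface Constraint {
    boolean matches(Document doc);
}

interface ResultCollector<T extends Document> {
    // Return false to tell the store to stop delivering results.
    boolean collect(T document);
}

final class InMemoryStore<T extends Document> {
    private final List<T> docs = new ArrayList<>();
    private int scanned; // documents read during the last query

    void add(T doc) { docs.add(doc); }

    int scannedCount() { return scanned; }

    void query(List<Constraint> constraints, ResultCollector<T> collector) {
        scanned = 0;
        for (T doc : docs) {
            scanned++;
            if (matchesAll(doc, constraints) && !collector.collect(doc)) {
                // The client interrupted; a real store would free its
                // cursor or result set at this point.
                break;
            }
        }
    }

    private boolean matchesAll(T doc, List<Constraint> constraints) {
        for (Constraint c : constraints) {
            if (!c.matches(doc)) {
                return false;
            }
        }
        return true;
    }
}
```

A client that only wants the first three ids could pass a collector
like

    doc -> { ids.add(doc.getId()); return ids.size() < 3; }

and the store stops reading after the third match.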


Regards

 Marcel
