On 2015-02-17 09:49, Marcel Reutegger wrote:
Hi,
On 03/02/15 14:55, "Julian Reschke" <[email protected]> wrote:
Marcel and I chatted about this, and here are two API improvements we
could do; these are independent, and add some complexity - in the
optimal case we'll find out that doing one of these two would be
sufficient.
Proposal #1: improve declarative constraints
Add a variant of query() such as:
<T extends Document> List<T> query(Collection<T> collection,
List<Constraint> constraints,
int limit);
This would return all documents where all of the listed constraints are
true (we currently do not seem to have a use case for a disjunction). A
constraint would apply to an indexed property (such as "_id") and would
allow the common comparisons, plus an "in" clause.
This would be straightforward to support both in the Mongo- and
RDBDocumentStore.
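For illustration, a constraint could be modeled roughly as follows. This is
only a sketch to make the proposal concrete; the class and operator names
are made up and not part of the proposal:

import java.util.List;

// hypothetical shape of a declarative constraint on an indexed property
public final class Constraint {

    public enum Operator { EQUALS, LESS_THAN, LESS_OR_EQUALS,
                           GREATER_THAN, GREATER_OR_EQUALS, IN }

    private final String property;     // e.g. "_id", "_modified", "_deletedOnce"
    private final Operator operator;
    private final List<Object> values; // one value, or several for IN

    public Constraint(String property, Operator operator, List<Object> values) {
        this.property = property;
        this.operator = operator;
        this.values = values;
    }

    public String getProperty() { return property; }
    public Operator getOperator() { return operator; }
    public List<Object> getValues() { return values; }
}

Presumably both backends could translate such constraints directly (Mongo
into a query document, RDB into a WHERE clause on the corresponding columns).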
Even though not strictly required, the above method signature does not
have a start id for paging through a bigger set of matching documents.
The start id for the next batch would need to be added as a constraint, just
like any other regular constraint. From a client's point of view, I would
probably prefer an explicit parameter.
OK.
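For example, a variant with an explicit paging parameter could look like
this (sketch only; the parameter name is illustrative):

<T extends Document> List<T> query(Collection<T> collection,
                                   String fromId,   // exclusive lower bound for paging
                                   List<Constraint> constraints,
                                   int limit);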
I also don't particularly like the List as return value. We used it
for other methods, and it always turns out to be problematic to find
a reasonable number for the limit (a.k.a. batch size). The number depends
very much on the size of the returned documents.
Well, there are two questions here: List vs Iterator, and what type to
actually use.
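A lazily evaluated variant could look like the sketch below (again purely
illustrative); it would probably have to be closeable so the store can
release cursors when the client stops iterating early:

<T extends Document> CloseableIterable<T> query(Collection<T> collection,
                                                List<Constraint> constraints);

interface CloseableIterable<T> extends Iterable<T>, java.io.Closeable {
}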
Proposal #2: add Java-based filtering and "sparse" documents
This would add a "QueryFilter" parameter to queries. A filter would have
- an optional way of selecting certain properties, and
- an accept(Document) method (sketched below)
Advantages:
- if the filter only selects certain properties (say "_id",
"_deletedOnce", and "_modified"), the persistence may not need to fetch
the complete document representation from storage (in RDB, this would be
true for any system property that has its own column)
- the accept method could have "arbitrary" complexity and would be
responsible for generating the result set; for instance, it might only
build a list of Strings containing the identifiers of matching documents
(which would be sufficient for a subsequent delete operation).
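To make this concrete, a QueryFilter could look roughly like this
(illustrative only; the method names are made up, and T extends the
DocumentStore's Document type):

import java.util.Set;

public interface QueryFilter<T extends Document> {

    // properties the persistence should fetch; an empty set could mean "all"
    Set<String> getProjectedProperties();

    // Java-side predicate applied to each (possibly sparse) document
    boolean accept(T document);
}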
Note: Proposal #2 is more flexible, but because it is only partly declarative,
the selection constraints cannot be pushed down to the persistence.
I think this is a major drawback of this approach. Depending on the
selectivity of the filter, we may have to read a lot of documents
from the store just to find out that they don't match.
Indeed.
We could also implement a combination of both, something like this:
<T extends Document> void query(Collection<T> collection,
List<Constraint> constraints,
ResultCollector<T> collector);
interface ResultCollector<T extends Document> {
public boolean collect(T document);
}
Advantages:
- no need for a limit or a closeable result. A client either collects
all results or interrupts by returning false from collect(). This
indicates to the DocumentStore that resources can be freed.
But it lacks the declarative part of "limit" (it's useful to be able to
tell the DB upfront how many results we want to see).
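For example (purely illustrative, assuming the combined query() above and
the existing NodeDocument/Collection.NODES types), a client could emulate a
limit by stopping the collector, although the store would not know about the
limit upfront:

final List<String> ids = new ArrayList<String>();
store.query(Collection.NODES, constraints, new ResultCollector<NodeDocument>() {
    public boolean collect(NodeDocument doc) {
        ids.add(doc.getId());
        return ids.size() < 1000; // stop after 1000 matching ids
    }
});
// ids now holds at most 1000 identifiers, e.g. for a subsequent delete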
Drawback:
- does not work well with clients exposing results through
an iterator (pull vs. push).
Indeed.
Best regards, Julian