DocumentMK: improving the query API

Julian Reschke Tue, 03 Feb 2015 05:57:13 -0800

The DocumentMK (formerly "MongoMK") uses the DocumentStore API (org.apache.jackrabbit.oak.plugins.document) for persistence. We currently have three implementations of this API:


1) MemoryDocumentStore (mainly for testing),
2) MongoDocumentStore, and
3) RDBDocumentStore (only in trunk for now).

In theory, the DocumentMK code should be persistence-agnostic; in practice it has a few hardwired optimizations for Mongo. These are used for recovery and maintenance tasks.

Mongo-specific optimizations are mainly there because of the way the DocumentStore API handles queries:

  /**
   * Get a list of documents where the key is greater than a start value and
   * less than an end value <em>and</em> the given "indexed property" is greater
   * or equals the specified value.
   * <p>
   * The indexed property can either be a {@link Long} value, in which case numeric
   * comparison applies, or a {@link Boolean} value, in which case "false" is mapped
   * to "0" and "true" is mapped to "1".
   * <p>
   * The returned documents are sorted by key and are immutable.
   *
   * @param <T> the document type
   * @param collection the collection
   * @param fromKey the start value (excluding)
   * @param toKey the end value (excluding)
   * @param indexedProperty the name of the indexed property (optional)
   * @param startValue the minimum value of the indexed property
   * @param limit the maximum number of entries to return
   * @return the list (possibly empty)
   */
  @Nonnull
  <T extends Document> List<T> query(Collection<T> collection,
                                     String fromKey,
                                     String toKey,
                                     String indexedProperty,
                                     long startValue,
                                     int limit);

So the following criteria can be used to constrain a query:

a) range of IDs
b) a single greater-or-equals condition
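
To make the contract concrete, here is a toy in-memory model of the semantics described in the Javadoc above. It is only a sketch, not Oak code: the class RangeQuerySketch, the map-based document representation, and the decision to skip documents that lack the indexed property are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Toy model of the query() contract quoted above; hypothetical, not Oak code.
class RangeQuerySketch {

    // store: document key -> (property name -> value); a NavigableMap keeps
    // the keys sorted, which matches "returned documents are sorted by key".
    static List<String> query(NavigableMap<String, Map<String, Object>> store,
                              String fromKey, String toKey,
                              String indexedProperty, long startValue, int limit) {
        List<String> keys = new ArrayList<>();
        // a) range of IDs, both bounds excluded
        for (Map.Entry<String, Map<String, Object>> e
                : store.subMap(fromKey, false, toKey, false).entrySet()) {
            if (keys.size() >= limit) {
                break;
            }
            // b) the single greater-or-equals condition (the property is optional)
            if (indexedProperty != null) {
                Object v = e.getValue().get(indexedProperty);
                long n;
                if (v instanceof Long) {
                    n = (Long) v;
                } else if (v instanceof Boolean) {
                    n = ((Boolean) v) ? 1L : 0L; // "false" -> 0, "true" -> 1
                } else {
                    continue; // assumption: a missing property never matches
                }
                if (n < startValue) {
                    continue;
                }
            }
            keys.add(e.getKey());
        }
        return keys;
    }
}
```

Anything beyond a) and b) has to be checked by the caller after the documents have been fetched.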

In the maintenance tasks, however, we need additional constraints, such as:

- a condition other than greater-or-equals
- a conjunction of multiple constraints

Also, for big result sets the response type (a list) is sub-optimal because a store might contain large NodeDocuments. Finally, there are filter criteria that are hard/impossible to express declaratively.

Marcel and I chatted about this, and here are two API improvements we could do; these are independent, and add some complexity - in the optimal case we'll find out that doing one of these two would be sufficient.



Proposal #1: improve declarative constraints

Add a variant of query() such as:

  <T extends Document> List<T> query(Collection<T> collection,
                                     List<Constraint> constraints,
                                     int limit);

This would return all documents where all of the listed constraints are true (we currently do not seem to have a use case for a disjunction). A constraint would apply to an indexed property (such as "_id") and would allow the common comparisons, plus an "in" clause.

This would be straightforward to support in both the Mongo- and RDBDocumentStore.
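
As a rough sketch of what such a constraint could look like (nothing here is settled API: the Op enum, the Constraint class, and the map-based documents are hypothetical names chosen for illustration), evaluated in memory:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Proposal #1; names and shapes are assumptions.
class ConstraintSketch {

    // The common comparisons plus an "in" clause.
    enum Op { EQ, NE, LT, LE, GT, GE, IN }

    static final class Constraint {
        final String property;
        final Op op;
        final List<Object> values; // exactly one value, or several for IN

        Constraint(String property, Op op, Object... values) {
            if (op != Op.IN && values.length != 1) {
                throw new IllegalArgumentException(op + " needs exactly one value");
            }
            this.property = property;
            this.op = op;
            this.values = Arrays.asList(values);
        }

        // Assumes the stored value and the constraint value have the same type.
        @SuppressWarnings("unchecked")
        boolean matches(Map<String, Object> doc) {
            Object actual = doc.get(property);
            if (actual == null) {
                return false; // assumption: a missing property never matches
            }
            if (op == Op.IN) {
                return values.contains(actual);
            }
            int c = ((Comparable<Object>) actual).compareTo(values.get(0));
            switch (op) {
                case EQ: return c == 0;
                case NE: return c != 0;
                case LT: return c < 0;
                case LE: return c <= 0;
                case GT: return c > 0;
                case GE: return c >= 0;
                default: throw new IllegalStateException("unhandled operator " + op);
            }
        }
    }

    // Conjunction only: a document is returned if every constraint matches.
    static List<Map<String, Object>> query(List<Map<String, Object>> docs,
                                           List<Constraint> constraints, int limit) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (result.size() >= limit) {
                break;
            }
            boolean all = true;
            for (Constraint c : constraints) {
                if (!c.matches(doc)) {
                    all = false;
                    break;
                }
            }
            if (all) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

A real store would translate the constraint list into a Mongo query or an SQL WHERE clause rather than evaluate it in Java; the in-memory loop only pins down the intended semantics.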



Proposal #2: add Java-based filtering and "sparse" documents

This would add a "QueryFilter" parameter to queries. A filter would have

- an optional way of selecting certain properties, and
- an accept(Document) method

Advantages:

- if the filter only selects certain properties (say "_id", "_deletedOnce", and "_modified"), the persistence may not need to fetch the complete document representation from storage (in RDB, this would be true for any system property that has its own column)

- the accept method could have "arbitrary" complexity and would be responsible for generating the result set; for instance, it might only build a list of Strings containing the identifiers of matching documents (which would be sufficient for a subsequent delete operation).

Note: Proposal #2 is more flexible, but as it's only partly declarative it makes it impossible to pass the selection constraints down to the persistence.
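
For illustration, Proposal #2 might look roughly like the following. This is a sketch under assumptions, not a proposed signature: the interface shape, the void return type of accept, the scan helper, and the example filter DeletedOnceIds are all hypothetical, and documents are modelled as plain maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of Proposal #2; names and shapes are assumptions.
class QueryFilterSketch {

    interface QueryFilter {
        // Properties the filter needs, or null for the complete document.
        Set<String> selectedProperties();

        // Called once per candidate; the filter builds its own result set.
        void accept(Map<String, Object> sparseDocument);
    }

    // Stand-in for a store: hands each document to the filter, reduced to
    // the selected properties ("sparse") when a selection is given.
    static void scan(List<Map<String, Object>> stored, QueryFilter filter) {
        Set<String> wanted = filter.selectedProperties();
        for (Map<String, Object> full : stored) {
            Map<String, Object> sparse = full;
            if (wanted != null) {
                sparse = new HashMap<>();
                for (String name : wanted) {
                    if (full.containsKey(name)) {
                        sparse.put(name, full.get(name));
                    }
                }
            }
            filter.accept(sparse);
        }
    }

    // Example filter: collect only the identifiers of documents that were
    // deleted once and last modified before a given threshold.
    static final class DeletedOnceIds implements QueryFilter {
        final List<String> ids = new ArrayList<>();
        private final long olderThan;

        DeletedOnceIds(long olderThan) {
            this.olderThan = olderThan;
        }

        @Override
        public Set<String> selectedProperties() {
            return new TreeSet<>(Arrays.asList("_id", "_deletedOnce", "_modified"));
        }

        @Override
        public void accept(Map<String, Object> doc) {
            Object modified = doc.get("_modified");
            if (Boolean.TRUE.equals(doc.get("_deletedOnce"))
                    && modified instanceof Long && (Long) modified < olderThan) {
                ids.add((String) doc.get("_id"));
            }
        }
    }
}
```

The ids list of such a filter would be enough input for a subsequent delete operation, and the store never has to materialize more than the three selected properties per document.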


Feedback appreciated...
