Re: DocumentNodeStore collection paging

Marcel Reutegger Wed, 29 Oct 2014 03:24:08 -0700

Hi,

On 25/10/14 12:24, "Julian Reschke" <[email protected]> wrote:
>Hi there.
>
>I was looking at the performance of code that creates large collections
>(CreateManyChildNodesTest, XmlImportTest), and found:
>
>>     Iterable<NodeDocument> readChildDocs(@Nonnull final String path,
>>                                          @Nullable String name,
>>                                          int limit) {
>>         String to = Utils.getKeyUpperLimit(checkNotNull(path));
>>         String from;
>>         if (name != null) {
>>             from = Utils.getIdFromPath(PathUtils.concat(path, name));
>>         } else {
>>             from = Utils.getKeyLowerLimit(path);
>>         }
>>         if (name != null || limit > NUM_CHILDREN_CACHE_LIMIT) {
>>             // do not use cache when there is a lower bound name
>>             // or more than 16k child docs are requested
>>             return store.query(Collection.NODES, from, to, limit);
>>         }
>
>So *if* we use paging, only the first page uses the cache.
>
>Paging appears to fetch at most 1600 entries at once (DocumentNodeState):
>
>>     /**
>>      * The number of child nodes to fetch initially.
>>      */
>>     static final int INITIAL_FETCH_SIZE = 100;
>>
>>     /**
>>      * The maximum number of child nodes to fetch in one call. (1600).
>>      */
>>     static final int MAX_FETCH_SIZE = INITIAL_FETCH_SIZE << 4;
>
>The maximum number of entries that could be cached however seems to be
>bigger:
>
>>     /**
>>      * Do not cache more than this number of children for a document.
>>      */
>>     static final int NUM_CHILDREN_CACHE_LIMIT =
>>             Integer.getInteger("oak.documentMK.childrenCacheLimit", 16 * 1024);
>
>Maybe it's just me, but something seems to be weird here:
>
>1) why the different limits that do not seem to be consistent?


The constants are used for different purposes. The fetch size is used when
paging through the list of children, whereas NUM_CHILDREN_CACHE_LIMIT is
an upper limit that applies when the number of children is counted.
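
To make the paging side more concrete, here is a minimal sketch of how the
page sizes could develop between INITIAL_FETCH_SIZE and MAX_FETCH_SIZE. It
assumes the fetch size doubles after each page until it reaches the maximum;
that growth rule, the class name and the pageSizes helper are assumptions
for illustration only, not code taken from DocumentNodeState.

```java
import java.util.ArrayList;
import java.util.List;

class FetchSizeSketch {
    // Values as quoted from DocumentNodeState above.
    static final int INITIAL_FETCH_SIZE = 100;
    static final int MAX_FETCH_SIZE = INITIAL_FETCH_SIZE << 4; // 1600

    // Hypothetical helper: lists the page sizes used to read totalChildren
    // entries, ASSUMING the fetch size doubles after every page.
    static List<Integer> pageSizes(int totalChildren) {
        List<Integer> pages = new ArrayList<>();
        int fetchSize = INITIAL_FETCH_SIZE;
        int remaining = totalChildren;
        while (remaining > 0) {
            int page = Math.min(fetchSize, remaining);
            pages.add(page);
            remaining -= page;
            // Grow the next page, but never beyond the maximum.
            fetchSize = Math.min(fetchSize * 2, MAX_FETCH_SIZE);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(pageSizes(5000));
    }
}
```

Under that assumption, reading 5000 children takes seven calls, and every
page after the fifth is capped at 1600 entries.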

>2) why disabling caching for all but the first page?

The idea behind this is that we don't want to put pressure on the cache when
large child node lists are read. But we can definitely revisit this
if it turns out to have been a bad decision.
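
For reference, the condition in the quoted readChildDocs() boils down to the
check sketched below. The class and method names are made up for
illustration, and the limit is hard-coded to the 16 * 1024 default, whereas
the real constant can be overridden with the
oak.documentMK.childrenCacheLimit system property.

```java
class ChildrenCacheSketch {
    // Default from the quoted code; the real constant reads a system property.
    static final int NUM_CHILDREN_CACHE_LIMIT = 16 * 1024;

    // Hypothetical helper mirroring the quoted condition: the cache is
    // bypassed when a lower bound name is given (any page after the first)
    // or when more child docs are requested than the cache limit.
    static boolean bypassesCache(String name, int limit) {
        return name != null || limit > NUM_CHILDREN_CACHE_LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(bypassesCache(null, 100));        // first page
        System.out.println(bypassesCache("child-100", 200)); // later page
    }
}
```

So only a first page (no lower bound name) that requests no more than the
limit goes through the cache; every later page goes straight to
store.query().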

Regards

 Marcel
