[ 
https://issues.apache.org/jira/browse/OAK-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-1726:
---------------------------------

    Attachment: OAK-1726-onheap-cache.patch

[patch|^OAK-1726-onheap-cache.patch] which uses an on-heap LIRS cache to store 
the byte content of files. Only files smaller than 16 KB (the Lucene blob 
size) are cached.

* Requires the blob length to be encoded (OAK-1667)
* Only blobs smaller than 16 KB are stored in the cache
* Default max limit is 32
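
The caching rule in the list above can be sketched roughly as follows. This is not the code from the attached patch: the class name {{SmallBlobCache}}, the {{loader}} callback and the byte-based capacity are assumptions made for illustration, and a plain access-ordered LinkedHashMap (LRU) stands in for the LIRS cache. The blob length is passed in up front because it is assumed to be known without reading the blob.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of an on-heap cache for small blobs.
 * Plain LRU eviction is used here as a simple stand-in for LIRS.
 */
final class SmallBlobCache {
    /** Blobs of this size or larger bypass the cache (16 KB, as in the comment). */
    static final int MAX_BLOB_SIZE = 16 * 1024;

    private final long maxBytes;   // assumed capacity, in bytes
    private long currentBytes = 0;
    // accessOrder = true: iteration starts at the least recently used entry
    private final LinkedHashMap<String, byte[]> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    SmallBlobCache(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /**
     * Returns the blob content, reading through the (hypothetical) loader
     * on a cache miss. The length must be known before the blob is read.
     */
    synchronized byte[] get(String blobId, long blobLength,
                            Function<String, byte[]> loader) {
        if (blobLength >= MAX_BLOB_SIZE) {
            return loader.apply(blobId);   // too large: never cached
        }
        byte[] cached = entries.get(blobId);
        if (cached != null) {
            return cached;                 // hit: no call to the underlying store
        }
        byte[] data = loader.apply(blobId);
        entries.put(blobId, data);
        currentBytes += data.length;
        evictIfNeeded();
        return data;
    }

    /** Drops least recently used entries until the cache fits its capacity. */
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, byte[]>> it = entries.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            currentBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With such a cache in front of the store, repeated reads of the same small blob reach the underlying store only once, while larger blobs are always read directly.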

With this, the numbers improve quite a bit:

{noformat}
Oak-Mongo-FDS                      1       1       1       2       2    1247   30168
Oak-Mongo-FDS (*)                  1       0       0       0       1       8  129772
{noformat}

[~tmueller] Can you review the patch?

> Improve support for local caching in BlobStore
> ----------------------------------------------
>
>                 Key: OAK-1726
>                 URL: https://issues.apache.org/jira/browse/OAK-1726
>             Project: Jackrabbit Oak
>          Issue Type: Wish
>          Components: blob
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>         Attachments: OAK-1726-onheap-cache.patch
>
>
> As noted in OAK-1702, BlobStore and FileDataStore currently do not perform 
> well when a large number of small blobs are accessed frequently. 
> * FileDataStore - It creates a new instance of LazyFileInputStream [1], which 
> has a finalize method (by extending AutoCloseInputStream). This causes 
> slow GC [2] when a large number of such streams are created. Further, reading 
> many such small blobs frequently causes many OS calls for IO, which are 
> slow
> * BlobStore - When binary content is stored remotely, accessing it 
> frequently is costly if it is not cached locally
> To better support such access patterns, we should have a caching BlobStore for 
> reads. At a minimum, blobs can be cached on heap. However, a better approach 
> would be to save such blob content in a bigger file and memory map it, 
> possibly using the Segment TarFile. In this mode the blobs would be stored off 
> heap and would not put pressure on GC
> [1] 
> https://github.com/apache/jackrabbit/blob/trunk/jackrabbit-data/src/main/java/org/apache/jackrabbit/core/data/LazyFileInputStream.java
> [2] 
> http://stackoverflow.com/questions/2954948/performance-implications-of-finalizers-on-jvm
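
The memory-mapping idea in the description could look roughly like the sketch below. It is only an illustration under stated assumptions: the class {{MappedBlobFile}}, its append/read methods and the fixed capacity are hypothetical, and it uses a plain file mapped through java.nio rather than the Segment TarFile. Blob bytes live in the mapped region, outside the Java heap, and are addressed by offset and length.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch: many small blobs appended to one larger,
 * memory-mapped file and read back by offset and length.
 */
final class MappedBlobFile {
    private final MappedByteBuffer buffer;
    private int writePosition = 0;

    MappedBlobFile(Path file, int capacity) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Appends a blob and returns the offset at which it was stored. */
    synchronized int append(byte[] data) {
        if (writePosition + data.length > buffer.capacity()) {
            throw new IllegalStateException("mapped file is full");
        }
        int offset = writePosition;
        ByteBuffer view = buffer.duplicate();  // independent position, shared content
        view.position(offset);
        view.put(data);
        writePosition += data.length;
        return offset;
    }

    /** Copies a previously stored blob out of the mapped region. */
    byte[] read(int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        view.get(out);
        return out;
    }
}
```

A real implementation would also need an index from blob id to offset and length, plus a strategy for growing or rotating the file, both of which are left out here.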



--
This message was sent by Atlassian JIRA
(v6.2#6252)