[ 
https://issues.apache.org/jira/browse/OAK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-1925:
---------------------------------

    Attachment: OAK-1925.patch

Patch for using buffered stream IO for writes. A couple of observations:

* With this patch, no significant difference is seen in the WikipediaImport 
benchmark. This is somewhat expected, as TarWriter does not write very small 
chunks of byte data, so performance would be similar.
* Running an AEM 6 instance with this change shows a minor improvement of 40 
sec, from ~360 sec to ~320 sec.
* As reads of written tar entries would not interfere with writes, it might 
give a minor improvement.
* However, there is a downside: the in-memory TarEntry cache would consume at 
most 256 MB of RAM.

So the numbers give no strong reason to change the existing impl, but my hunch 
is that streamed IO might provide better throughput. Recently Lucene replaced 
all usage of RandomAccessFile with streams (LUCENE-5678).
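
For illustration only (this is not the code in the attached patch), here is a 
minimal sketch of the idea: sequential writes go through a BufferedOutputStream, 
and written entries are kept in a size-bounded in-memory cache that serves 
reads. The class name BufferedEntryWriter, the String entry ids, and the 
oldest-first eviction policy are all assumptions made for the example.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the TarWriter from the patch.
class BufferedEntryWriter implements AutoCloseable {
    private final OutputStream out;
    private final long maxCacheBytes;
    private long cachedBytes;
    // Insertion-ordered map, so iteration starts at the oldest entry.
    private final Map<String, byte[]> cache = new LinkedHashMap<>();

    BufferedEntryWriter(OutputStream target, long maxCacheBytes) {
        this.out = new BufferedOutputStream(target);
        this.maxCacheBytes = maxCacheBytes;
    }

    // Appends the entry sequentially and remembers it for later reads.
    void writeEntry(String id, byte[] data) throws IOException {
        out.write(data);
        byte[] previous = cache.put(id, data.clone());
        if (previous != null) {
            cachedBytes -= previous.length;
        }
        cachedBytes += data.length;
        evictOldest();
    }

    // Returns the cached bytes, or null if the entry is unknown or was
    // evicted. A real implementation would fall back to reading the file.
    byte[] readEntry(String id) {
        byte[] data = cache.get(id);
        return data == null ? null : data.clone();
    }

    // Drops the oldest entries until the cache fits the byte budget.
    private void evictOldest() {
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (cachedBytes > maxCacheBytes && it.hasNext()) {
            cachedBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    @Override
    public void close() throws IOException {
        out.close(); // flushes the buffer, then closes the target stream
    }
}
```

With a budget of 256 MB this would match the upper bound mentioned above. 
Reads never touch the file while entries are still cached, which is why they 
would not interfere with the buffered writes.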

[~jukkaz] [~alex.parvulescu] Can you have a look?


> Use streamed io instead of RandomAccessFile in TarWriter
> --------------------------------------------------------
>
>                 Key: OAK-1925
>                 URL: https://issues.apache.org/jira/browse/OAK-1925
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segmentmk
>            Reporter: Chetan Mehrotra
>            Priority: Minor
>         Attachments: OAK-1925.patch
>
>
> TarWriter currently uses RandomAccessFile to 
> * Write the tar entries
> * Read written entries
> The writes, however, are currently sequential. It might be better to use 
> streamed buffered IO for the writes and maintain an in-memory cache of written 
> tar entries to serve the reads.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
