[
https://issues.apache.org/jira/browse/OAK-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chetan Mehrotra updated OAK-1925:
---------------------------------
Attachment: OAK-1925.patch
Patch for using buffered stream IO for writes. A couple of observations:
* With this patch no significant difference is seen in the WikipediaImport
benchmark. This is more or less expected, as TarWriter does not write very
small chunks of byte data, so perf would be similar
* Running an AEM 6 instance with this change shows a minor improvement of 40
sec, from ~360 sec to ~320 sec
* As reads for written tar entries would not interfere with writes, it might
give a minor improvement
* However, there is a downside: the in-memory TarEntry cache would use up to
256 MB of RAM
So the numbers give no strong reason to change the existing impl, but my hunch
is that streamed IO might provide better throughput! Recently Lucene replaced
all usage of RandomAccessFile with streams (LUCENE-5678)
[~jukkaz] [~alex.parvulescu] Can you have a look?
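
For illustration only, the idea could be sketched roughly as below. This is not
the code in OAK-1925.patch: the class name BufferedEntryWriter, its methods and
the string entry ids are all hypothetical, and tar headers and padding are left
out. It only shows sequential writes going through a BufferedOutputStream while
reads of already written entries are served from a size-bounded in-memory cache.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not the class from OAK-1925.patch. */
class BufferedEntryWriter implements Closeable {

    private final OutputStream out;
    private final long maxCacheBytes;
    private long cachedBytes;

    // Insertion-ordered map, so the oldest cached entry is evicted first.
    private final Map<String, byte[]> cache = new LinkedHashMap<>();

    BufferedEntryWriter(File file, long maxCacheBytes) throws IOException {
        // Sequential writes go through a buffered stream, not RandomAccessFile.
        this.out = new BufferedOutputStream(new FileOutputStream(file));
        this.maxCacheBytes = maxCacheBytes;
    }

    /** Appends the entry to the file and remembers a copy for later reads. */
    synchronized void writeEntry(String id, byte[] data) throws IOException {
        out.write(data);
        byte[] copy = data.clone();
        byte[] previous = cache.put(id, copy);
        if (previous != null) {
            cachedBytes -= previous.length;
        }
        cachedBytes += copy.length;
        // Evict the oldest entries until the cache is within its byte budget.
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (cachedBytes > maxCacheBytes && it.hasNext()) {
            cachedBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    /** Returns a copy of a cached entry, or null if it is not in the cache. */
    synchronized byte[] readEntry(String id) {
        byte[] data = cache.get(id);
        return data == null ? null : data.clone();
    }

    synchronized void flush() throws IOException {
        out.flush();
    }

    @Override
    public synchronized void close() throws IOException {
        out.close();
    }
}
```

In this sketch a cache miss simply returns null, so a caller would still need
some way to read evicted entries back from the file. The byte budget passed to
the constructor plays the role of the 256 MB cap mentioned above.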
> Use streamed io instead of RandomAccessFile in TarWriter
> --------------------------------------------------------
>
> Key: OAK-1925
> URL: https://issues.apache.org/jira/browse/OAK-1925
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segmentmk
> Reporter: Chetan Mehrotra
> Priority: Minor
> Attachments: OAK-1925.patch
>
>
> TarWriter currently uses RandomAccessFile to
> * Write the tar entries
> * Read written entries
> The writes, however, are currently sequential. It might be better to use
> streamed buffered io for the writes and maintain an in-memory cache of written
> tar entries to serve the reads.
--
This message was sent by Atlassian JIRA
(v6.2#6252)