[ https://issues.apache.org/jira/browse/OAK-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663186#comment-15663186 ]
Takahito Kikuchi commented on OAK-4903:
---------------------------------------
Let me add a few comments.
Regarding Plans A and B for 1.6, I have some concerns about the speed of
uploading binaries and writing the index. Oak's index merge policy sets
maxMergedSegmentMB to 5120, so a single merged segment can reach about 5 GB.
Uploading a 5 GB file takes a long time (typically several minutes), so I
suspect it would also take a long time before both cluster nodes work
correctly again.
The following debug message was logged during an index merge:
mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10, maxMergeAtOnceExplicit=30,
maxMergedSegmentMB=5120.0, floorSegmentMB=2.0,
forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0,
maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.1
indexerThreadPool=org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool@6973da3c
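To put "several minutes" in perspective, here is a rough estimate of how long
uploading one segment at the 5120 MB cap from the log above might take. The
class name and the throughput figures (10, 20 and 50 MB/s) are assumptions for
illustration only, not measurements from S3 or Oak.

```java
// Rough estimate only: the throughput figures below are assumptions, not measurements.
public final class SegmentUploadEstimate {

    /** Seconds needed to upload sizeMb megabytes at throughputMbPerSec MB/s. */
    static double uploadSeconds(double sizeMb, double throughputMbPerSec) {
        if (throughputMbPerSec <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return sizeMb / throughputMbPerSec;
    }

    public static void main(String[] args) {
        // Cap reported in the TieredMergePolicy debug message.
        double maxMergedSegmentMB = 5120.0;
        // Hypothetical sustained upload rates, in MB/s.
        double[] assumedThroughputs = {10.0, 20.0, 50.0};
        for (double t : assumedThroughputs) {
            double seconds = uploadSeconds(maxMergedSegmentMB, t);
            System.out.printf("%.0f MB at %.0f MB/s: %.0f s (about %.1f min)%n",
                    maxMergedSegmentMB, t, seconds, seconds / 60.0);
        }
    }
}
```

At an assumed 20 MB/s, a 5120 MB segment takes 256 s, roughly 4.3 minutes,
which is in line with the "several minutes" mentioned above.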
In fact, I have seen Jackrabbit 2 generate index files larger than 5 GB, which
caused performance issues even on a local file system. So large index files
could occur on Oak as well, which is why I'm concerned. If there is no
mitigation for S3, I think the index merge policy should be changed.
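Lowering the cap trades fewer, larger uploads for more, smaller ones. Lucene's
TieredMergePolicy exposes the cap through setMaxMergedSegmentMB, and the limit
is approximate and applies to normal merging; how Oak would let users change it
is not assumed here. The sketch below only models the trade-off: the 20 GB index
size, the 20 MB/s throughput, and the candidate caps are hypothetical values.

```java
// Illustrates the trade-off of lowering maxMergedSegmentMB; index size and throughput are hypothetical.
public final class MergeCapTradeoff {

    /** Lower bound on the number of segments if no segment may exceed capMb. */
    static long minSegments(double indexMb, double capMb) {
        if (capMb <= 0) {
            throw new IllegalArgumentException("cap must be positive");
        }
        return (long) Math.ceil(indexMb / capMb);
    }

    public static void main(String[] args) {
        double indexMb = 20_480.0;        // hypothetical 20 GB index
        double throughputMbPerSec = 20.0; // assumed upload rate, MB/s
        for (double capMb : new double[] {5120.0, 1024.0, 256.0}) {
            System.out.printf("cap %.0f MB: at least %d segments, largest upload about %.0f s%n",
                    capMb, minSegments(indexMb, capMb), capMb / throughputMbPerSec);
        }
    }
}
```

A smaller cap bounds the worst-case delay for a single binary, but more segments
also mean more files to upload and more per-segment overhead at query time, so
the right value would need measurement.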
I'm also curious how much the "Open index asynchronously" option of
LuceneIndexProvider, together with CopyOnRead, CopyOnWrite and "Persisting
indexes to FileSystem" described in [1], affects this S3 issue in a cluster.
My guess is that a cluster with S3 might work fine if both "Persisting indexes
to FileSystem" and a shared file system used only for the index are configured.
However, without some exclusive control ensuring that only one instance performs
index merges or updates, conflicts might occur on the shared file system. I'm
not sure whether such control is needed, though. What do you think?
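If such exclusive control turns out to be needed, one minimal way to express it
is a lock file in the shared index directory, so that only the instance holding
the lock performs merges or updates. The sketch below uses java.nio
FileChannel.tryLock; the class name, the lock-file name and the idea of guarding
merges this way are hypothetical, not something Oak is known to provide. Java
file locks are held on behalf of the whole JVM, may be only advisory, and behave
differently on network file systems depending on the OS and mount options, so
this illustrates the idea rather than a reliable cluster-wide mechanism.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical "single writer" guard for an index directory on a shared file system.
 * Not part of Oak; it only illustrates the kind of exclusive control discussed above.
 */
public final class SharedIndexWriterLock implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private SharedIndexWriterLock(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns the held lock, or null if another owner already holds it. */
    static SharedIndexWriterLock tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            // null means another process holds the lock.
            FileLock l = ch.tryLock();
            if (l == null) {
                ch.close();
                return null;
            }
            return new SharedIndexWriterLock(ch, l);
        } catch (OverlappingFileLockException e) {
            // Thrown when this JVM already holds the lock.
            ch.close();
            return null;
        } catch (IOException | RuntimeException e) {
            ch.close();
            throw e;
        }
    }

    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical lock-file location; in practice it would sit in the shared index directory.
        Path lockFile = Paths.get(System.getProperty("java.io.tmpdir"), "oak-index-writer.lock");
        try (SharedIndexWriterLock held = tryAcquire(lockFile)) {
            if (held == null) {
                System.out.println("Another instance owns index updates; skipping merge.");
                return;
            }
            System.out.println("This instance owns index updates.");
            // Index update or merge would run here while the lock is held.
        }
    }
}
```

Only the instance that obtains the lock proceeds; the others skip the merge and
retry later, which avoids two instances rewriting the same segment files.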
[1] http://jackrabbit.apache.org/oak/docs/query/lucene.html
> Async uploads in S3 causes issues in a cluster
> ----------------------------------------------
>
> Key: OAK-4903
> URL: https://issues.apache.org/jira/browse/OAK-4903
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: blob
> Reporter: Amit Jain
> Assignee: Amit Jain
> Priority: Critical
> Fix For: 1.6
>
>
> S3DataStore and CachingFDS through the CachingDataStore enable async uploads.
> This causes problems in clustered setups where uploads can sometimes be
> visible after a delay. During this time any request for the corresponding
> asset/file would return errors.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)