[ 
https://issues.apache.org/jira/browse/OAK-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15656542#comment-15656542
 ] 

Thomas Mueller edited comment on OAK-4903 at 11/11/16 9:24 AM:
---------------------------------------------------------------

What happens now:

* Queries don't see new data. The problem is that at startup, queries don't see 
any Lucene index, so they use traversal. It seems hard to make this work 
reliably.

Options for Oak 1.6:

* Plan A: Wait until the binary is available (retry loop), in the thread that 
prepares opening index readers.
* Plan B: Delay writing the index nodes for some time, until the binaries are 
on S3. We could use a "CallbackInputStream" that can get notified when the 
binary is on S3.
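
Plan A could be sketched roughly as below. This is an illustration only, not Oak code: the availability check is passed in as a plain Supplier, where the real implementation would look the binary up in the (S3) data store, and the retry count and delay are placeholder values.

```java
import java.util.function.Supplier;

public class BinaryRetry {

    /**
     * Waits until the binary is reported as available, polling with a
     * fixed delay. Returns true if it became available within
     * maxRetries checks, false otherwise (or if interrupted).
     */
    public static boolean waitForBinary(Supplier<Boolean> available,
                                        int maxRetries, long delayMillis) {
        for (int i = 0; i < maxRetries; i++) {
            if (available.get()) {
                return true;
            }
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

The thread that prepares the index readers would call this before opening the Lucene directory, and fall back (or log) if the binary never shows up.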
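
The "CallbackInputStream" from Plan B might look like the following sketch. It is a hypothetical class, not existing Oak API: a FilterInputStream that fires a callback once the wrapped stream has been fully consumed, i.e. once the uploader has read the last byte of the binary.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Wraps an InputStream and invokes a callback exactly once when the
 * end of the stream is reached, i.e. when the consumer (here: the
 * S3 upload) has read the whole binary.
 */
public class CallbackInputStream extends FilterInputStream {

    private final Runnable onFullyRead;
    private boolean notified;

    public CallbackInputStream(InputStream in, Runnable onFullyRead) {
        super(in);
        this.onFullyRead = onFullyRead;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b < 0) {
            notifyOnce();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n < 0) {
            notifyOnce();
        }
        return n;
    }

    private void notifyOnce() {
        if (!notified) {
            notified = true;
            onFullyRead.run();
        }
    }
}
```

Writing the index nodes would then be deferred until the callback fires, instead of using a fixed delay.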

Options for Oak 1.8:

* Directly stream to S3 (in addition to, or instead of, writing to the local 
file system), then "put copy" the S3 object to its final key; if possible use 
a channel instead of a stream; maybe do this only for Lucene binaries.
* Don't use a content hash, but instead use a UUID.
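
The hash-versus-UUID trade-off can be illustrated as below (method names are illustrative, not Oak API): a content-hash id can only be computed after the whole binary has been read, whereas a UUID is available up front, so the record can reference the blob before or while the upload is still streaming. The price of UUIDs is losing content-based de-duplication.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class BlobIds {

    /**
     * Content-addressed id: needs the full content, so it can only be
     * computed after the binary has been read completely. Identical
     * content always maps to the same id (de-duplication).
     */
    public static String contentHashId(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Random id: available before any content is read, so the blob
     * record can be written while the upload is still in flight.
     */
    public static String randomId() {
        return UUID.randomUUID().toString();
    }
}
```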

Rejected:

* Store those binaries in MongoDB (until they are on S3): this can cause 
secondaries to get detached, which can cause big problems.
* Chunk binaries so they are not stored in the datastore. That would be even 
worse than storing the binaries in MongoDB, because it would need even more 
space.
* Use the broadcast cache to distribute binaries. This would need changes in 
the broadcast cache, and we would be relying on an optional feature.



was (Author: tmueller):
Options:

* Don't use a content hash, but instead use a UUID
* Directly stream to S3 (in addition to, or instead of, writing to the local 
file system), and "put copy" the S3 entry afterwards to the right file name; if 
possible using a channel instead of a stream; maybe do this only for Lucene 
binaries
* On the client side, wait until the binary is available
* Store those binaries in MongoDB (until they are on S3)
* Chunk binaries so they are not stored in the datastore
* Delay writing the index nodes for some time, until the binaries are on S3
* Use the broadcast cache to distribute binaries


> Async uploads in S3 causes issues in a cluster
> ----------------------------------------------
>
>                 Key: OAK-4903
>                 URL: https://issues.apache.org/jira/browse/OAK-4903
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: blob
>            Reporter: Amit Jain
>            Assignee: Amit Jain
>            Priority: Critical
>             Fix For: 1.6
>
>
> S3DataStore and CachingFDS (via the CachingDataStore) enable async uploads. 
> This causes problems in clustered setups, where uploads may only become 
> visible on other cluster nodes after a delay. During this time, any request 
> for the corresponding asset/file returns errors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
