[ 
https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746919#comment-16746919
 ] 

Anurag Khandelwal commented on ARROW-4294:
------------------------------------------

cc [~pcmoritz]

> [Plasma] Add support for evicting objects to external store
> -----------------------------------------------------------
>
>                 Key: ARROW-4294
>                 URL: https://issues.apache.org/jira/browse/ARROW-4294
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, Plasma (C++)
>    Affects Versions: 0.11.1
>            Reporter: Anurag Khandelwal
>            Priority: Minor
>              Labels: features, pull-request-available
>             Fix For: 0.13.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when Plasma needs storage space for additional objects, it evicts 
> objects by deleting them from the Plasma store. This is a problem when it 
> isn't possible to reconstruct the object or reconstructing it is expensive. 
> Adding support for a pluggable external store that Plasma can evict objects 
> to will address this issue. 
> My proposal is described below.
> *Requirements*
>  * Objects in Plasma should be evicted to an external store rather than being 
> removed altogether
>  * Communication with the external storage service should go through a very 
> thin shim interface. At the same time, the interface should be general 
> enough to support arbitrary remote services (e.g., S3, DynamoDB, Redis, etc.)
>  * Should be pluggable (e.g., it should be simple to add in or remove the 
> external storage service for eviction, switch between different remote 
> services, etc.) and easy to implement
> *Assumptions/Non-Requirements*
>  * The external store has practically infinite storage
>  * The external store's write operation is idempotent and atomic; this is 
> needed to ensure there are no race conditions due to multiple concurrent 
> evictions of the same object.
> *Proposed Implementation*
>  * Define an ExternalStore interface with a Connect call. The call returns an 
> ExternalStoreHandle that exposes Put and Get calls. Any external store that 
> needs to be supported has to have this interface implemented.
>  * In order to read or write data to the external store in a thread-safe 
> manner, one ExternalStoreHandle should be created per-thread. While the 
> ExternalStoreHandle itself is not required to be thread-safe, multiple 
> ExternalStoreHandles across multiple threads should be able to modify the 
> external store in a thread-safe manner. These handles are most likely going 
> to be wrappers around the external store client interfaces.
>  * Replace the DeleteObjects method in the Plasma Store with an EvictObjects 
> method. If an external store is specified for the Plasma store, the 
> EvictObjects method would mark the object state as PLASMA_EVICTED, write the 
> object data to the external store (via the ExternalStoreHandle) and reclaim 
> the memory associated with the object data/metadata rather than remove the 
> entry from the Object Table altogether. In case there is no valid external 
> store, the eviction path would remain the same (i.e., the object entry is 
> still deleted from the Object Table).
>  * The Get method in Plasma Store now tries to fetch the object from external 
> store if it is not found locally and there is an external store associated 
> with the Plasma Store. The method tries to offload this to an external worker 
> thread pool with a fire-and-forget model, but may need to do this 
> synchronously if there are too many requests already enqueued.
>  * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES, 
> which can be appended to with implementations of the ExternalStore and 
> ExternalStoreHandle interfaces, which will then be compiled into the 
> plasma_store_server executable.
>  
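The proposed implementation above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Plasma code: the method signatures, the InMemoryStore backend, the ToyPlasmaStore class and the PLASMA_SEALED state name are all hypothetical. Only the names ExternalStore, ExternalStoreHandle, Connect, Put, Get and PLASMA_EVICTED come from the proposal. The fetch in Get is synchronous here; the worker thread pool is left out.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical signatures; the proposal only names the calls.
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  // Put is assumed to be idempotent and atomic (see Assumptions).
  virtual bool Put(const std::string& id, const std::string& data) = 0;
  // Returns false if the object is not in the external store.
  virtual bool Get(const std::string& id, std::string* data) = 0;
};

class ExternalStore {
 public:
  virtual ~ExternalStore() = default;
  // Intended to be called once per thread; handles need not be thread-safe.
  virtual std::unique_ptr<ExternalStoreHandle> Connect(const std::string& endpoint) = 0;
};

// Toy in-memory backend standing in for a remote service such as S3 or Redis.
class InMemoryStore : public ExternalStore {
 public:
  std::unique_ptr<ExternalStoreHandle> Connect(const std::string&) override {
    return std::make_unique<Handle>(shared_);
  }

 private:
  struct Shared {
    std::mutex mu;
    std::unordered_map<std::string, std::string> blobs;
  };
  class Handle : public ExternalStoreHandle {
   public:
    explicit Handle(std::shared_ptr<Shared> s) : s_(std::move(s)) {}
    bool Put(const std::string& id, const std::string& data) override {
      std::lock_guard<std::mutex> lock(s_->mu);
      s_->blobs[id] = data;  // overwriting makes repeated Puts idempotent
      return true;
    }
    bool Get(const std::string& id, std::string* data) override {
      std::lock_guard<std::mutex> lock(s_->mu);
      auto it = s_->blobs.find(id);
      if (it == s_->blobs.end()) return false;
      *data = it->second;
      return true;
    }
   private:
    std::shared_ptr<Shared> s_;
  };
  std::shared_ptr<Shared> shared_ = std::make_shared<Shared>();
};

enum class ObjectState { PLASMA_SEALED, PLASMA_EVICTED };

struct ObjectEntry {
  ObjectState state;
  std::string data;  // empty while the object is evicted
};

// Hypothetical stand-in for the Plasma store's Object Table and eviction path.
class ToyPlasmaStore {
 public:
  explicit ToyPlasmaStore(std::unique_ptr<ExternalStoreHandle> handle)
      : handle_(std::move(handle)) {}

  void Seal(const std::string& id, std::string data) {
    table_[id] = ObjectEntry{ObjectState::PLASMA_SEALED, std::move(data)};
  }

  bool EvictObject(const std::string& id) {
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (!handle_) {  // no external store: the entry is deleted as before
      table_.erase(it);
      return true;
    }
    // Assumption: on a failed Put the object simply stays in memory.
    if (!handle_->Put(id, it->second.data)) return false;
    it->second.state = ObjectState::PLASMA_EVICTED;
    it->second.data.clear();
    it->second.data.shrink_to_fit();  // reclaim memory, keep the table entry
    return true;
  }

  bool Get(const std::string& id, std::string* out) {
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (it->second.state == ObjectState::PLASMA_EVICTED) {
      assert(handle_ != nullptr);  // evicted entries imply an external store
      if (!handle_->Get(id, &it->second.data)) return false;
      it->second.state = ObjectState::PLASMA_SEALED;
    }
    *out = it->second.data;
    return true;
  }

  bool IsEvicted(const std::string& id) const {
    auto it = table_.find(id);
    return it != table_.end() && it->second.state == ObjectState::PLASMA_EVICTED;
  }

 private:
  std::unique_ptr<ExternalStoreHandle> handle_;
  std::unordered_map<std::string, ObjectEntry> table_;
};
```

Keeping the Object Table entry in the PLASMA_EVICTED state is what lets a later Get know that the object still exists externally. Without an external store handle, the sketch falls back to the existing delete behaviour.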



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
