Francois Saint-Jacques created ARROW-6854:
---------------------------------------------

             Summary: [Dataset] RecordBatchProjector is not thread safe
                 Key: ARROW-6854
                 URL: https://issues.apache.org/jira/browse/ARROW-6854
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Francois Saint-Jacques


While working on ARROW-6769 I noticed that RecordBatchProjector is not thread
safe. My goal is to use this class to wrap a ScanTaskIterator in another
ScanTaskIterator that projects, so that producers (fragments) don't have to
know about the target schema. The issue is that ScanTasks are expected to run
on concurrent threads, so the projector will be invoked by multiple threads.

The lack of thread safety comes from the adaptivity to input schemas:
`SetInputSchema` writes to a local cache. I suggest we refactor into 2 classes:
 # `RecordBatchProjector`, which works with a static `from` schema, i.e. no
adaptivity. The schema is defined at construction time. This class is thread
safe to invoke after construction since no local state is modified.
 # `AdaptiveRecordBatchProjector`, which holds a cache map[schema_hash,
std::shared_ptr<RecordBatchProjector>] protected by a mutex.
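
A rough sketch of what the split could look like, not the actual Arrow code.
`Schema` and `Batch` are hypothetical stand-ins for arrow::Schema and
arrow::RecordBatch, the `to` schema parameter, `Project` and `cache_size` are
illustrative names, and missing fields are filled with 0 as a placeholder for
whatever default the real projector would use:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a schema is an ordered list of field names,
// a batch holds one int value per field.
using Schema = std::vector<std::string>;
using Batch = std::vector<int>;

// Static projector: the `from` schema is fixed at construction, so Project()
// only reads immutable members and may be called from several threads.
class RecordBatchProjector {
 public:
  RecordBatchProjector(const Schema& from, const Schema& to)
      : from_size_(from.size()) {
    for (const auto& name : to) {
      int index = -1;  // -1 marks a field that is absent from `from`
      for (std::size_t i = 0; i < from.size(); ++i) {
        if (from[i] == name) index = static_cast<int>(i);
      }
      column_index_.push_back(index);
    }
  }

  Batch Project(const Batch& in) const {
    assert(in.size() == from_size_);
    Batch out;
    out.reserve(column_index_.size());
    for (int index : column_index_) {
      // Placeholder default (0) for fields missing from the input.
      out.push_back(index < 0 ? 0 : in[static_cast<std::size_t>(index)]);
    }
    return out;
  }

 private:
  std::size_t from_size_;
  std::vector<int> column_index_;
};

// Hash over the field names of a schema.
struct SchemaHash {
  std::size_t operator()(const Schema& schema) const {
    std::size_t seed = schema.size();
    for (const auto& name : schema) {
      seed ^= std::hash<std::string>{}(name) + 0x9e3779b9 + (seed << 6) +
              (seed >> 2);
    }
    return seed;
  }
};

// Adaptive projector: one static projector per input schema, cached behind
// a mutex. Only the cache lookup is serialized, not the projection itself.
class AdaptiveRecordBatchProjector {
 public:
  explicit AdaptiveRecordBatchProjector(Schema to) : to_(std::move(to)) {}

  Batch Project(const Schema& from, const Batch& in) {
    return GetProjector(from)->Project(in);
  }

  std::size_t cache_size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  std::shared_ptr<const RecordBatchProjector> GetProjector(const Schema& from) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(from);
    if (it == cache_.end()) {
      auto projector = std::make_shared<const RecordBatchProjector>(from, to_);
      it = cache_.emplace(from, std::move(projector)).first;
    }
    return it->second;
  }

  Schema to_;
  std::mutex mutex_;
  std::unordered_map<Schema, std::shared_ptr<const RecordBatchProjector>,
                     SchemaHash>
      cache_;
};
```

Note that this sketch keys the cache on the schema itself (hashed with
`SchemaHash`) rather than on a bare schema_hash, so two schemas that happen to
collide on the hash do not share a projector. Returning a shared_ptr lets the
lock be released before the projection runs.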



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
