Hi,

I have a question about using the Acero push model to write streaming
data as Hive-partitioned Parquet files in a single-threaded program. Can
anyone advise on the best practice here, and whether my understanding
below is correct:

   - I receive streaming data via a callback function that gives me data
   row by row. To the best of my knowledge, subclassing RecordBatchReader
   is the preferred approach?
   - Should I buffer a fixed number of rows in some in-memory data
   structure first, then flush them to Acero? If so, how would Acero know
   it's time to push data in the ReadNext function
   <https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow17RecordBatchReader8ReadNextEPNSt10shared_ptrI11RecordBatchEE>?
   (See the rough sketch after this list for what I have in mind.)
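To make it concrete, here is a rough sketch of the kind of reader I have
in mind. The class name (BufferingReader), the OnRow method, and the
two-column (int64, double) schema are all just placeholders of mine, not
taken from any Arrow example:

// Rough sketch of what I have in mind -- names and schema are mine,
// for illustration only.

#include <arrow/api.h>

class BufferingReader : public arrow::RecordBatchReader {
 public:
  explicit BufferingReader(std::shared_ptr<arrow::Schema> schema)
      : schema_(std::move(schema)) {}

  std::shared_ptr<arrow::Schema> schema() const override { return schema_; }

  // Called from my streaming callback, one row at a time.
  arrow::Status OnRow(int64_t id, double value) {
    ARROW_RETURN_NOT_OK(id_builder_.Append(id));
    ARROW_RETURN_NOT_OK(value_builder_.Append(value));
    return arrow::Status::OK();
  }

  // Called by whoever consumes the reader. This is exactly where I'm
  // unsure: what should ReadNext do when nothing has been buffered yet
  // because the callback hasn't fired?
  arrow::Status ReadNext(std::shared_ptr<arrow::RecordBatch>* batch) override {
    if (finished_ && id_builder_.length() == 0) {
      *batch = nullptr;  // signal end of stream
      return arrow::Status::OK();
    }
    // Flush everything buffered so far into one batch.
    std::shared_ptr<arrow::Array> ids, values;
    ARROW_RETURN_NOT_OK(id_builder_.Finish(&ids));
    ARROW_RETURN_NOT_OK(value_builder_.Finish(&values));
    *batch = arrow::RecordBatch::Make(schema_, ids->length(), {ids, values});
    return arrow::Status::OK();
  }

  // Called when the stream ends, so ReadNext can return nullptr.
  void Finish() { finished_ = true; }

 private:
  std::shared_ptr<arrow::Schema> schema_;
  arrow::Int64Builder id_builder_;
  arrow::DoubleBuilder value_builder_;
  bool finished_ = false;
};

The schema would be built with something like
arrow::schema({arrow::field("id", arrow::int64()),
arrow::field("value", arrow::float64())}), and the streaming callback
would forward each row into OnRow().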

I'm not clear on how to connect a callback function from streaming data
with the Acero push model. Any suggestions would be appreciated.


Thanks.

Best,
Haocheng
