Hi, I have a question about how to use the Acero push model to write streaming data as Hive-partitioned Parquet in a single-threaded program. Can anyone advise on the best practice here, and whether my understanding below is correct:
- I receive streaming data via a callback function that hands me data row by row. To my best knowledge, subclassing RecordBatchReader is the preferred approach?
- Should I first buffer a fixed number of rows in some in-memory data structure, then flush them to Acero? If so, how does Acero know it's time to push data in the ReadNext <https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4N5arrow17RecordBatchReader8ReadNextEPNSt10shared_ptrI11RecordBatchEE> function? I'm not clear on how to connect a callback function from streaming data with the Acero push model.

Any suggestions would be appreciated. A rough sketch of what I currently have in mind is below; corrections welcome.
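To make the question concrete, here is a minimal sketch of my current understanding. It actually sidesteps the push model by inverting the callback into a blocking pull: the Row struct, NextRow adapter, schema, batch size, and output path are all hypothetical stand-ins for my real source, not anything from the Arrow docs. ReadNext buffers rows until a fixed batch size is reached, so the "when to flush" decision lives entirely inside the reader, and the dataset writer then pulls batches and writes them out Hive-partitioned:

#include <arrow/api.h>
#include <arrow/dataset/api.h>
#include <arrow/filesystem/localfs.h>

#include <iostream>
#include <memory>
#include <optional>
#include <string>
#include <utility>

struct Row {
  int32_t year;         // partition column
  std::string payload;  // everything else, collapsed for the sketch
};

// Hypothetical adapter over the row-by-row callback source; std::nullopt
// signals end of stream. Stubbed out here so the sketch compiles.
std::optional<Row> NextRow() { return std::nullopt; }

// RecordBatchReader that accumulates a fixed number of rows per batch.
// The consumer pulls via ReadNext, so the flush policy lives here: emit
// a batch once batch_size rows are buffered, or when the stream ends.
class StreamingReader : public arrow::RecordBatchReader {
 public:
  StreamingReader(std::shared_ptr<arrow::Schema> schema, int64_t batch_size)
      : schema_(std::move(schema)), batch_size_(batch_size) {}

  std::shared_ptr<arrow::Schema> schema() const override { return schema_; }

  arrow::Status ReadNext(std::shared_ptr<arrow::RecordBatch>* out) override {
    arrow::Int32Builder year_builder;
    arrow::StringBuilder payload_builder;
    int64_t n = 0;
    while (n < batch_size_) {
      std::optional<Row> row = NextRow();
      if (!row) break;  // end of stream
      ARROW_RETURN_NOT_OK(year_builder.Append(row->year));
      ARROW_RETURN_NOT_OK(payload_builder.Append(row->payload));
      ++n;
    }
    if (n == 0) {
      *out = nullptr;  // null batch signals end of stream to the consumer
      return arrow::Status::OK();
    }
    std::shared_ptr<arrow::Array> year, payload;
    ARROW_RETURN_NOT_OK(year_builder.Finish(&year));
    ARROW_RETURN_NOT_OK(payload_builder.Finish(&payload));
    *out = arrow::RecordBatch::Make(schema_, n, {year, payload});
    return arrow::Status::OK();
  }

 private:
  std::shared_ptr<arrow::Schema> schema_;
  int64_t batch_size_;
};

arrow::Status WriteHivePartitioned() {
  auto schema = arrow::schema({arrow::field("year", arrow::int32()),
                               arrow::field("payload", arrow::utf8())});
  auto reader = std::make_shared<StreamingReader>(schema, /*batch_size=*/1024);

  // Wrap the reader in a scanner and write it out Hive-partitioned on "year".
  auto scanner_builder =
      arrow::dataset::ScannerBuilder::FromRecordBatchReader(reader);
  ARROW_ASSIGN_OR_RAISE(auto scanner, scanner_builder->Finish());

  auto format = std::make_shared<arrow::dataset::ParquetFileFormat>();
  arrow::dataset::FileSystemDatasetWriteOptions write_options;
  write_options.file_write_options = format->DefaultWriteOptions();
  write_options.filesystem = std::make_shared<arrow::fs::LocalFileSystem>();
  write_options.base_dir = "/tmp/out";  // made-up output directory
  write_options.partitioning = std::make_shared<arrow::dataset::HivePartitioning>(
      arrow::schema({arrow::field("year", arrow::int32())}));
  write_options.basename_template = "part-{i}.parquet";

  return arrow::dataset::FileSystemDataset::Write(write_options, scanner);
}

int main() {
  arrow::Status st = WriteHivePartitioned();
  if (!st.ok()) std::cerr << st.ToString() << std::endl;
  return st.ok() ? 0 : 1;
}

This only works if I can make the callback source look like a blocking NextRow call. If there is a more idiomatic way to hook a true push-style callback directly into an Acero source node instead of wrapping it as a reader, I'd love to hear it.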
Thanks.

Best,
Haocheng