Also, if you haven't seen it yet, the 13.0.0 release adds considerably more
documentation around Acero, including the scheduler:
https://arrow.apache.org/docs/dev/cpp/acero/developer_guide.html#scheduling-and-parallelism
On Wed, Jul 26, 2023 at 10:13 AM Li Jin wrote:
Thanks Weston! Very helpful explanation.
On Tue, Jul 25, 2023 at 6:41 PM Weston Pace wrote:
> 1) As a rule of thumb I would probably prefer `async_scheduler`. It's more
> feature-rich and simpler to use, and is meant to handle "long-running" tasks
> (e.g. 10s-100s of ms or more).
>
> The
> Replacing ... with ... works as expected
This is, I think, because the RecordBatchSourceNode defaults to implicit
ordering (note the RecordBatchSourceNode is a SchemaSourceNode):
```
struct SchemaSourceNode : public SourceNode {
  SchemaSourceNode(ExecPlan* plan, std::shared_ptr<Schema> schema,
                   // ...
```
> I think the key problem is that the input stream is unordered. The
> input stream is an ArrowArrayStream imported from the Python side and then
> declared as a "record_batch_reader_source", which is an unordered
> source node. So the behavior is expected.
> I think the
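
To make the ordered vs. unordered distinction above concrete, here is a toy model in plain C++. It is not Acero code: ToyBatch, EmitBatches and Resequence are hypothetical names invented for this sketch. The assumption it illustrates is that a source with implicit ordering stamps each batch with its position, so a downstream step can restore the order even if batches arrive shuffled, while an unordered source provides no such index and the order cannot be recovered.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for a batch; NOT an Acero type (hypothetical).
struct ToyBatch {
  std::optional<int64_t> index;  // set only when the source is implicitly ordered
  std::string payload;
};

// Hypothetical source: stamps a position on each batch only when
// `implicit_ordering` is true; an unordered source leaves it empty.
std::vector<ToyBatch> EmitBatches(const std::vector<std::string>& payloads,
                                  bool implicit_ordering) {
  std::vector<ToyBatch> out;
  for (std::size_t i = 0; i < payloads.size(); ++i) {
    ToyBatch batch;
    if (implicit_ordering) batch.index = static_cast<int64_t>(i);
    batch.payload = payloads[i];
    out.push_back(batch);
  }
  return out;
}

// Hypothetical sequencer: returns the payloads in index order, or
// std::nullopt if any batch lacks an index (i.e. the source was unordered).
std::optional<std::vector<std::string>> Resequence(
    const std::vector<ToyBatch>& arrived) {
  std::map<int64_t, std::string> by_index;  // std::map iterates in key order
  for (const auto& batch : arrived) {
    if (!batch.index) return std::nullopt;
    by_index[*batch.index] = batch.payload;
  }
  std::vector<std::string> ordered;
  for (const auto& entry : by_index) ordered.push_back(entry.second);
  return ordered;
}
```

With batches from EmitBatches(payloads, true), Resequence recovers the original order regardless of arrival order; with EmitBatches(payloads, false) it returns std::nullopt, which mirrors why an order-dependent step downstream of an unordered source cannot behave as hoped. This needs C++17 for std::optional.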