Re: scheduler() and async_scheduler() on QueryContext

2023-07-26 Thread Weston Pace
Also, if you haven't seen it yet, the 13.0.0 release adds considerably more documentation around Acero, including the scheduler: https://arrow.apache.org/docs/dev/cpp/acero/developer_guide.html#scheduling-and-parallelism On Wed, Jul 26, 2023 at 10:13 AM Li Jin wrote: > Thanks Weston! Very

Re: scheduler() and async_scheduler() on QueryContext

2023-07-26 Thread Li Jin
Thanks Weston! Very helpful explanation. On Tue, Jul 25, 2023 at 6:41 PM Weston Pace wrote: > 1) As a rule of thumb I would probably prefer `async_scheduler`. It's more > feature rich and simpler to use and is meant to handle "long running" tasks > (e.g. 10s-100s of ms or more). > > The

Re: how to make acero output order by batch index

2023-07-26 Thread Weston Pace
> Replacing ... with ... works as expected This is, I think, because the RecordBatchSourceNode defaults to implicit ordering (note the RecordBatchSourceNode is a SchemaSourceNode): ``` struct SchemaSourceNode : public SourceNode { SchemaSourceNode(ExecPlan* plan, std::shared_ptr<Schema> schema,

Re: how to make acero output order by batch index

2023-07-26 Thread Weston Pace
> I think the key problem is that the input stream is unordered. The > input stream is an ArrowArrayStream imported from python side, and then > declared to a "record_batch_reader_source", which is an unordered > source node. So the behavior is expected. > I think the
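The ordering discussion above comes down to one idea: batches can only be emitted "in order" if each one carries an index to sort on, and an unordered source provides none. The sketch below is a standalone illustration of that general technique (a reorder buffer), not Acero's API or its actual implementation. `Batch`, `ReorderBuffer`, and `Demo` are hypothetical names, and `Batch` is a simplified stand-in for a real batch type. It assumes indices start at 0 and are dense.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a batch: only the fields this sketch needs.
struct Batch {
  int64_t index;        // position in the source's (implicit) order
  std::string payload;  // placeholder for the actual batch data
};

// Hypothetical reorder buffer: holds batches that arrive out of order and
// releases them strictly by index, starting from 0 with no gaps.
class ReorderBuffer {
 public:
  // Accepts one batch and returns every batch that is now ready, in order.
  std::vector<Batch> Push(Batch batch) {
    int64_t idx = batch.index;
    pending_.emplace(idx, std::move(batch));
    std::vector<Batch> ready;
    // Drain the contiguous run starting at the next expected index.
    auto it = pending_.find(next_index_);
    while (it != pending_.end()) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_index_;
      it = pending_.find(next_index_);
    }
    return ready;
  }

  // True when no batch is still waiting for an earlier index.
  bool Empty() const { return pending_.empty(); }

 private:
  int64_t next_index_ = 0;
  std::map<int64_t, Batch> pending_;
};

// Feeds batches in a scrambled arrival order and concatenates the output.
std::string Demo() {
  ReorderBuffer buffer;
  std::vector<Batch> arrivals = {{2, "c"}, {0, "a"}, {3, "d"}, {1, "b"}};
  std::string output;
  for (auto& b : arrivals) {
    for (auto& ready : buffer.Push(std::move(b))) {
      output += ready.payload;
    }
  }
  return output;
}
```

Here `Demo()` returns `"abcd"` even though the batches arrive as 2, 0, 3, 1. The design choice is the usual trade-off: downstream consumers see a deterministic order, at the cost of buffering any batch that arrives ahead of a missing earlier one.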