Re: [C++] MakeReaderGenerator Behavior using GetCPUThreadPool

2022-07-25 Weston Pace
1) Yes, that sounds correct. The file readers will read from files in parallel (even if there is one file, it can read from row groups in parallel). There is no guarantee these reads will finish sequentially. 2) Hmm, this one will work for now, because the executor==nullptr behavior is to borrow

Re: Proposal: renaming the 'master' branch to 'main'

2022-07-25 Remzi Yang
Should we also make this change in arrow-rs? Remzi On Tue, 26 Jul 2022 at 11:25, Neal Richardson wrote: > Many of the subtasks on https://issues.apache.org/jira/browse/ARROW-15689 > have already been done. What's left is to update archery and crossbow, then > we can ask Infra to make the switch.

Re: Proposal: renaming the 'master' branch to 'main'

2022-07-25 Neal Richardson
Many of the subtasks on https://issues.apache.org/jira/browse/ARROW-15689 have already been done. What's left is to update archery and crossbow, then we can ask Infra to make the switch. Is anyone able to take those subtasks on? Neal On Mon, Jul 25, 2022 at 4:58 PM Matthew Topol wrote: > I'm in

[C++] MakeReaderGenerator Behavior using GetCPUThreadPool

2022-07-25 Ivan Chau
Hey all, while investigating the in-order behavior of the SourceNode, we made some interesting observations: 1) The ExecContext should use nullptr for its executor to guarantee any sequential behavior (as discussed previously). We found cases where our File BatchReader was reading out of order w

Re: [C++] Control flow and scheduling in C++ Engine operators / exec nodes

2022-07-25 Weston Pace
I'll hijack this thread for a bit of road mapping. There are a number of significant infrastructure changes that are on my mind regarding Acero. I'll list them here in no particular order. * [1] The scanner needs to be updated to properly support cancellation. I mainly mention this here as it is a

Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-07-25 David Li
Ah, thanks James. Maybe we can disable most/all optional components to start with on x86? Including Parquet, etc. (so we shouldn't need Boost at first). If you rebase again, the MinGW Flight issues should be fixed. [1] [1]: https://github.com/apache/arrow/pull/13696 On Mon, Jul 25, 2022, at 17:48

Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-07-25 James Duong
Arkadiy's PR can take the CI changes from my PR to get it past the build. The build process revealed more necessary changes, though. In the library code there are a few places where we use ARROW_POPCOUNT64 directly, and they fail to build on 32-bit. Aside from this, some of the tests are fail

Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-07-25 David Li
The PRs are: https://github.com/apache/arrow/pull/13659 https://github.com/apache/arrow/pull/13532 The JIRA is https://issues.apache.org/jira/browse/ARROW-16778 I suppose I agree with Arkadiy: we could merge the initial PR, suppressing warnings for the x86 build. But then we can merge James's f

Re: Preparing for version 9.0.0 release

2022-07-25 Krisztián Szűcs
On Fri, Jul 22, 2022 at 1:08 PM Jacob Wujciak wrote: > > Hello Everyone, > > Currently there have been 452 JIRA tickets resolved for 9.0.0 with 23 > tickets (ex. blockers) still open, good job everyone! > > Thanks to everyone's work we were able to reduce the number of blockers and > have PRs [1][

Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-07-25 Wes McKinney
Suppressing the warnings on 32-bit MSVC sounds like a reasonable compromise. Is there an open PR for this (and what is the corresponding Jira issue so we don't lose track of it)? On Fri, Jul 22, 2022 at 1:23 PM Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) wrote: > > Or live with the warnings. Or cast

Re: [Rust] IPC Format / Feather support in Datafusion

2022-07-25 Aldrin
oh, perfect. I'll just link the JIRAs. Thanks Kou! Aldrin Montana Computer Science PhD Student UC Santa Cruz On Mon, Jul 25, 2022 at 1:53 PM Sutou Kouhei wrote: > Hi, > > https://issues.apache.org/jira/browse/ARROW-17092 may be > related. > > Thanks, > -- > kou > > In > "Re: [Rust] IPC Form

Re: Proposal: renaming the 'master' branch to 'main'

2022-07-25 Matthew Topol
I'm in favor of it, for what it's worth. --Matt On Mon, Jul 25 2022 at 02:56:31 PM -0600, Wes McKinney wrote: hi all, Do you think we could make a push to make this happen after the 9.0.0 release goes out? Thanks Wes On Tue, Feb 15, 2022 at 2:32 PM Fiona La >

Re: Proposal: renaming the 'master' branch to 'main'

2022-07-25 Wes McKinney
hi all, Do you think we could make a push to make this happen after the 9.0.0 release goes out? Thanks Wes On Tue, Feb 15, 2022 at 2:32 PM Fiona La wrote: > > Thank you Antoine for bringing up the engineering work that is required to > enable this. And thank you Neal for sharing the link to th

Re: [Rust] IPC Format / Feather support in Datafusion

2022-07-25 Sutou Kouhei
Hi, https://issues.apache.org/jira/browse/ARROW-17092 may be related. Thanks, -- kou In "Re: [Rust] IPC Format / Feather support in Datafusion" on Mon, 25 Jul 2022 13:39:54 -0700, Aldrin wrote: > It seems unfortunate to me that the feather file format doc page [1] > appears to have been

Re: [FlightSql] Spark Flight SQL

2022-07-25 David Li
Got it. It seems there are potential benefits to having this work together with Spark Connect (at least, there are two very similar wire protocols going on that could possibly be consolidated). Have you shown this to the Spark community as well? You caught me in the middle of things, the PR to add th

Re: [Rust] IPC Format / Feather support in Datafusion

2022-07-25 Aldrin
It seems unfortunate to me that the feather file format doc page [1] appears to have been forgotten when those IPC file format docs were written [2][3]. I will find or make a JIRA to make this info consistent in the docs. [1]: https://arrow.apache.org/docs/python/feather.html [2]: https://arrow.ap

Re: [C++] Clarifying the behavior of source node and executor

2022-07-25 Li Jin
Hold on, Yaron: I think Ivan and I got something working with the existing code; Ivan will post details in a bit. On Mon, Jul 25, 2022 at 3:25 PM Yaron Gvili wrote: > Yes, I think you mean this post by Weston <https://lists.apache.org/thread/llfm5dfh2988w2w4j6off417w9szp1tg>. I'll > look into addi

Re: [C++] Clarifying the behavior of source node and executor

2022-07-25 Yaron Gvili
Yes, I think you mean this post by Weston. I'll look into adding this sequential-option to source-node and report back. Yaron. From: Li Jin Sent: Monday, July 25, 2022 11:39 AM To: dev@arrow.apac

Re: [FlightSql] Spark Flight SQL

2022-07-25 Tornike Gurgenidze
1) David, thanks for mentioning that. tbh, this is the first time I'm reading about it. If we are talking only about SparkFlightSql, It does seem similar in the sense that both use Arrow for streaming data back to the client. The difference is the Spark Connect client still has a dependency on Spar

Re: [VOTE][RUST] Release Apache Arrow Rust 19.0.0 RC1

2022-07-25 Chao Sun
+1 (non-binding). Verified on Intel Mac. Thanks, Andrew. Chao On Mon, Jul 25, 2022 at 12:52 AM Martin Grigorov wrote: > > +1 (non-binding) > > Tested on Ubuntu 20.04.4 x86_64 and openEuler 20.03 aarch64. > > Regards, > Martin > > On Fri, Jul 22, 2022 at 7:56 PM Andrew Lamb wrote: > > > Hi, > >

Re: [C++] Clarifying the behavior of source node and executor

2022-07-25 Li Jin
Now that I think about it more: Weston has probably answered this in another mailing thread, saying that this is not guaranteed and that the observation of batches coming out of the file reader + source node happened by chance. Perhaps we can look into adding an option to the source node to ensure "sequential" ordering. Li On M

Re: [C++] Clarifying the behavior of source node and executor

2022-07-25 Yaron Gvili
I've also been using source node with a generator, but observed batches in random order (in a 1-to-2-month-old version of Arrow). So, I'd be surprised if ordering is guaranteed, and I'm also interested in how to obtain such a guarantee. Yaron. From: Li Jin Se

Re: [C++] Clarifying the behavior of source node and executor

2022-07-25 Li Jin
Sorry, the link to the generator above is wrong. We traced into the code and found it uses BackgroundGenerator: https://github.com/apache/arrow/blob/78fb2edd30b602bd54702896fa78d36ec6fefc8c/cpp/src/arrow/util/async_generator.h#L1581 On Mon, Jul 25, 2022 at 11:07 AM Li Jin wrote: > Hi, > > Ivan a

[C++] Clarifying the behavior of source node and executor

2022-07-25 Li Jin
Hi, Ivan and I are debugging some behavior of the source node this morning and I was hoping to clarify that our understanding is correct. We observed that when using source node with a generator: https://github.com/apache/arrow/blob/66c66d040bbf81a4819b276aee306625dc02837c/cpp/src/arrow/compute/e

Re: [FlightSql] Spark Flight SQL

2022-07-25 David Li
So this is now both a Flight SQL producer and consumer for Spark? That is very cool. A couple of things I was wondering about: - How do you think this compares to the Spark Connect proposal? [1] - Have you considered ADBC [2] instead of Flight SQL for the DataSourceV2 implementation? While still u

Re: [VOTE][RUST] Release Apache Arrow Rust 19.0.0 RC1

2022-07-25 Martin Grigorov
+1 (non-binding) Tested on Ubuntu 20.04.4 x86_64 and openEuler 20.03 aarch64. Regards, Martin On Fri, Jul 22, 2022 at 7:56 PM Andrew Lamb wrote: > Hi, > > I would like to propose a release of Apache Arrow Rust Implementation, > version 19.0.0. > > This release candidate is based on commit: > c