Re: nullptr for mutable data in pyarrow table from pandas

2021-04-20 Thread Weston Pace
If it comes from pandas (and is eligible for zero-copy) then the buffer implementation will be `NumPyBuffer`. Printing one in GDB yields... ``` $12 = {_vptr.Buffer = 0x7f0b66e147f8 , is_mutable_ = true, is_cpu_ = true, data_ = 0x55b71f901a70 "\001", mutable_data_ = 0x0, size_ = 16, capacity_ =

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread Sutou Kouhei
Hi Weston, Sorry, I didn't realize that you weren't using our verification script. If you don't want to use our verification script, you need to adjust the Yum repository URL manually, because the apache-arrow-release RPM registers the Yum repository URL for the released version (not the RC). We need to use Yum

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread Jonathan Keane
I've also got a PR up for https://issues.apache.org/jira/browse/ARROW-12485 which sets the mimalloc default we thought we had already set. On Tue, Apr 20, 2021 at 7:41 PM Weston Pace wrote: > Hmm, I wasn't actually running any script, just trying to access > things manually. When I ran the

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread Weston Pace
Hmm, I wasn't actually running any script, just trying to access things manually. When I ran the script I got errors because dnf wasn't installed, and then when I installed dnf I found that dnf does not like the AWS mirrors... Repository u'amzn2-core': Error parsing config: Error parsing

nullptr for mutable data in pyarrow table from pandas

2021-04-20 Thread Niranda Perera
Hi all, We have been using Arrow v2.0.0 and we encountered the following issue. I was reading a table with numeric data using pandas.read_csv and then converting it into a pyarrow table. In our application (Cylon), we are accessing this pyarrow table from C++.

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread Sutou Kouhei
Hi Weston, It seems that you are using an old verification script. Could you confirm that you are using the verification script on master? Thanks, -- kou In "Re: [VOTE] Release Apache Arrow 4.0.0 - RC1" on Tue, 20 Apr 2021 11:23:25 -1000, Weston Pace wrote: > I'm not sure if it is blocking (and it

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread Krisztián Szűcs
Thanks David! It is somewhat hard to judge, but we shouldn't release regressions in general. I can cut another RC tomorrow including your fix! Thanks, Krisztian On Wed, Apr 21, 2021 at 12:42 AM David Li wrote: > > Sorry, after some more testing I think ARROW-12487 should be a blocker > after

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread David Li
Sorry, after some more testing I think ARROW-12487 should be a blocker after all as it is a regression from 3.0 to 4.0. There's a PR up to fix it that can be backported to the 4.0 branch. [1]: https://issues.apache.org/jira/browse/ARROW-12487 Best, David On 2021/04/20 21:33:28, David Li

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread David Li
Similarly, I'm not sure if this is blocking per se, but I found an issue in a new API (that was unfortunately plumbed into some existing APIs): https://issues.apache.org/jira/browse/ARROW-12487 This should not affect anything in verification, but if we do end up with an RC2, it would be nice to

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread Weston Pace
I'm not sure if it is blocking (and it might even be expected given the current status of jfrog) but I attempted to install the CentOS 7 RPM and got the following error when I ran `sudo yum update` after installing the arrow repo rpm.

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread Jonathan Keane
I'm still working on my verification, but as part of that noticed that https://issues.apache.org/jira/browse/ARROW-12316 which we thought changed the default memory allocator didn't fully accomplish that. Nothing is broken per se, but jemalloc is still the default on macOS. I've made

Re: [Gandiva] Replacing the LRU cache in gandiva

2021-04-20 Thread Julian Hyde
We would love to use Gandiva in Apache Calcite [1] but we are blocked because the JAR on Maven Central doesn't work on macOS, Linux or Windows [2] and there seems to be no interest in fixing the problem. So I doubt whether anyone is using Gandiva in production (unless they have built the

Re: Arrow JS Meetup (02/13)

2021-04-20 Thread Dominik Moritz
Hi Naveen, I don’t know whether Brian took separate notes but here is the outline with some links: https://docs.google.com/document/d/1IPd9AS_0RxYOH8WiKWcqqaMoGW96xnC17cHxGwl6Ryc/edit?usp=sharing. After the meetup, a few people gathered in the arrow-js channel on the ASF Slack where we Paul has

Re: Arrow JS Meetup (02/13)

2021-04-20 Thread Naveen Michaud-Agrawal
Hi Brian, Unfortunately I am just seeing this meetup notice now. Were you able to capture any notes from the meeting? Thanks, Naveen On Tue, Feb 9, 2021 at 11:38 AM Brian Hulette wrote: > Hi all, > > +Dominik Moritz recently reached out to +Paul Taylor > and myself to set up an Arrow JS meetup

Re: [C++][Python] ORC in pyarrow wheels?

2021-04-20 Thread Krisztián Szűcs
Hi! ORC should be enabled in the manylinux and macOS wheels since we switched to vcpkg as the dependency source. See the relevant sections in the build scripts: [1], [2]. Note that the ORC write support PR was merged after the 4.0.0RC1 tag. Regards, Krisztian [1]:

Re: [ANNOUNCE] Copying Rust components to new repositories

2021-04-20 Thread Andrew Lamb
An update: we have had success filtering out git history (using git-filter-repo) in the arrow-rs and arrow-datafusion repos, and have made most of the required changes to start accepting PRs in the new repo. Stay tuned. On Tue, Apr 20, 2021 at 2:30 AM Jorge Cardoso Leitão <

[NIGHTLY] Arrow Build Report for Job nightly-2021-04-20-0

2021-04-20 Thread Crossbow
Arrow Build Report for Job nightly-2021-04-20-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-04-20-0 Failed Tasks: - centos-7-amd64: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-04-20-0-github-centos-7-amd64 -

Re: [Gandiva] Replacing the LRU cache in gandiva

2021-04-20 Thread Vivekanand Vellanki
We are considering using an on-disk cache - this is planned for later. Even with an on-disk cache, we still need an eviction policy to ensure that Gandiva doesn't use up the entire disk. For now, we are assuming that we can measure the cost accurately - the assumption is that the query engine would use
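
To make the idea of a cost-based eviction policy concrete, here is a minimal Python sketch. It is not Gandiva's cache and uses no Gandiva API: the class name `CostAwareCache`, the `cost` argument, and the example keys are all hypothetical, and it assumes the caller can supply a reasonably accurate rebuild cost for each entry, which is exactly the open question in this thread.

```python
from collections import OrderedDict


class CostAwareCache:
    """Toy cache that evicts the entry that is cheapest to rebuild.

    Hypothetical illustration only; this is not Gandiva's implementation.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # key -> (value, cost); iteration order doubles as recency order,
        # with the least recently used entry first.
        self._entries = OrderedDict()

    def __contains__(self, key):
        return key in self._entries

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key, value, cost):
        if key in self._entries:
            del self._entries[key]  # replace in place, no eviction needed
        elif len(self._entries) >= self.capacity:
            self._evict()
        self._entries[key] = (value, cost)

    def _evict(self):
        # min() returns the first minimum in iteration order, so among
        # equally cheap entries the least recently used one is evicted.
        victim = min(self._entries, key=lambda k: self._entries[k][1])
        del self._entries[victim]


cache = CostAwareCache(capacity=2)
cache.put("cheap_expr", "module_a", cost=1.0)    # assumed cost, e.g. build time
cache.put("costly_expr", "module_b", cost=50.0)
cache.get("cheap_expr")  # a plain LRU would now treat costly_expr as the victim
cache.put("new_expr", "module_c", cost=5.0)      # evicts cheap_expr instead
```

Here the expensive entry survives even though it was used least recently, which is the behaviour a cost-based policy aims for. The linear scan in `_evict` is only for brevity; a real cache would likely use a heap or a similar structure, and would need to decide how cost and recency are weighed against each other.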

Re: [Gandiva] Replacing the LRU cache in gandiva

2021-04-20 Thread Antoine Pitrou
Hi Projjal, The main issue here is to compute the cost accurately (is it computation runtime? memory footprint? can you measure the computation time accurately, regardless of system noise - e.g. other threads and processes?). Intuitively, if the LRU cache shows too many misses, a simple

Re: [VOTE] Release Apache Arrow 4.0.0 - RC1

2021-04-20 Thread Yibo Cai
'gandiva-decimal-test' hangs on my machine, not sure if it's a blocker issue. Details at https://issues.apache.org/jira/browse/ARROW-12476 Test command "TEST_DEFAULT=0 TEST_SOURCE=1 TEST_CPP=1 dev/release/verify-release-candidate.sh source 4.0.0 1" On 4/19/21 10:50 PM, Krisztián Szűcs wrote:

Re: [ANNOUNCE] Copying Rust components to new repositories

2021-04-20 Thread Jorge Cardoso Leitão
Perfect, thanks Krisztian. There is no hurry in any of this; I am just covering high risk tasks to evaluate and mitigate these risks ahead of time. There is now a PR that adds integration tests on the rust side: https://github.com/apache/arrow-rs/pull/10 . All green. Best, Jorge On Mon, Apr