Re: "2.0.1" and "3.0.1" versions on JIRA

2021-02-17 Thread Wes McKinney
I think 2.0.1 can be removed. I doubt that a 3.0.1 patch release is going to happen either but it can be removed later. On Wed, Feb 17, 2021 at 9:41 AM Antoine Pitrou wrote: > > Hi, > > There are versions named "2.0.1" and "3.0.1" on JIRA, they are tagged > with a number of issues: >

Re: Push force to master by mistake

2021-02-16 Thread Wes McKinney
This has been enabled — if things appear to be working correctly, could someone comment on the INFRA Jira so it can be closed? Thanks! On Sun, Feb 14, 2021 at 3:12 PM Wes McKinney wrote: > > https://issues.apache.org/jira/browse/INFRA-21421 > > On Sun, Feb 14, 2021 at 3:10 PM J

Re: Threading Improvements Proposal

2021-02-15 Thread Wes McKinney
hi Weston, Thanks for putting this comprehensive and informative document together. There are several layers of problems to consider, just thinking out loud: * I hypothesize that the bottom of the stack is a thread pool with a queue-per-thread that implements work stealing. Some code paths

Re: Push force to master by mistake

2021-02-14 Thread Wes McKinney
https://issues.apache.org/jira/browse/INFRA-21421 On Sun, Feb 14, 2021 at 3:10 PM Jorge Cardoso Leitão wrote: > > Yes please. :) > > On Sun, Feb 14, 2021, 22:08 Wes McKinney wrote: > > > Should we make master a "protected branch" now that we've resolved to

Re: Push force to master by mistake

2021-02-14 Thread Wes McKinney
Should we make master a "protected branch" now that we've resolved to not rebase master ever again? On Sun, Feb 14, 2021 at 6:23 AM Jorge Cardoso Leitão wrote: > > I found the commit, 8547c616dcc7c3ee51f174d118c81b38847974af, and I pushed > the changes up to there, so I think that everything is

Re: [Python] Python based Query Engine for Arrow

2021-02-12 Thread Wes McKinney
I'm actively building an engineering team to work on this -- so anyone who would like to work on this as part of their day job can reach out to me to discuss. We are doing some research about what aspects of prior art in columnar database systems we can pull into the Arrow C++ project (to make

Re: [C++] Conventions around C++ shared_ptr in the code base?

2021-02-08 Thread Wes McKinney
Agreed. We should probably document this in the C++ developer docs. On Mon, Feb 8, 2021 at 12:04 PM Antoine Pitrou wrote: > > > Hi Micah, > > That's roughly my mental model as well. > > However, for 4) I would say that returning a const ref to shared_ptr is > preferable because the caller will

Re: Arrow papers

2021-02-07 Thread Wes McKinney
Thanks for sharing these. I was aware of the Microsoft Magpie paper but not the TU Dresden paper. It would be great to see some academic groups engage in adding in-memory compression / encodings to the Arrow format properly in collaboration with the Apache community. On Sun, Feb 7, 2021 at 12:14

Bintray sunsetting

2021-02-06 Thread Wes McKinney
Appears that JFrog is sunsetting Bintray, so we will need to sort out alternative hosting for Linux packages for the 4.0.0 release: https://jfrog.com/blog/into-the-sunset-bintray-jcenter-gocenter-and-chartcenter/

Re: Computational Kernels: the project overview

2021-02-05 Thread Wes McKinney
Sure, feel free to open a Jira issue and / or submit a PR. On Fri, Feb 5, 2021 at 12:48 PM Ying Zhou wrote: > > Hi, > > Speaking of the computational kernels I found that Cast needs significant > improvement. Right now it can not cast a FixedSizeBinary array to a Binary > one which caused my

Re: JIRA grooming

2021-02-05 Thread Wes McKinney
It occurs to me we could (relatively) easily program a bot to apply these "title tags" automatically based on what's in the Component field. What do you think? On Fri, Feb 5, 2021 at 10:09 AM Neal Richardson wrote: > > Hi folks, > Just a reminder to please make sure your JIRA issue titles start
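A rough sketch of what the core of such a bot might look like, in Python. Everything named here is an assumption for illustration: the function name, the component-to-tag mapping, and the bracketed prefix format (which simply mirrors subject tags such as "[C++]" and "[Rust]" seen on this list). Fetching and updating issues through the Jira API is left out entirely.

```python
import re

# Hypothetical mapping from Jira Component values to title tags.
COMPONENT_TAGS = {
    "C++": "[C++]",
    "Python": "[Python]",
    "Rust": "[Rust]",
    "R": "[R]",
}

# Matches a title that already begins with a bracketed tag.
_TAG_PREFIX = re.compile(r"^\s*\[[^\]]+\]")


def add_title_tags(title, components):
    """Prefix `title` with tags for its components.

    Titles that already start with a bracketed tag are returned unchanged,
    as are titles whose components have no known tag.
    """
    if _TAG_PREFIX.match(title):
        return title
    tags = [COMPONENT_TAGS[c] for c in components if c in COMPONENT_TAGS]
    if not tags:
        return title
    return "".join(tags) + " " + title
```

Leaving already-tagged titles alone keeps the bot idempotent, so it could be run repeatedly over the same issues without stacking prefixes.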

Re: [RUST] Arrow guide

2021-01-31 Thread Wes McKinney
To state the obvious, it would be great to have some community maintained documentation (beyond generated API docs) for the Rust library. Writing documentation almost always causes the quality of a code base to improve because the process brings up rough edges, inconsistencies, or missing

Re: [Rust] Proposed PR Merge Guidelines

2021-01-29 Thread Wes McKinney
When it comes to downstream projects, it may make sense to implement some integration tests that can be triggered via Crossbow if you aren't sure whether a change will cause breakage. On Fri, Jan 29, 2021 at 1:25 PM Andrew Lamb wrote: > > Micah, it is a great question. > > I often find myself

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Wes McKinney
It still seems notable that our generic LZ4-compressed output stream cannot be read by Java (independent of Arrow and the Arrow IPC format). On Thu, Jan 28, 2021 at 12:30 PM Antoine Pitrou wrote: > > On Thu, 28 Jan 2021 18:19:00 + > Joris Peeters wrote: > > > To be fair, I'm happy to apply

Re: Pandas Block Manager

2021-01-28 Thread Wes McKinney
My position on this is that we should work with the pandas community to work toward elimination of the BlockManager data structure as this will solve a multitude of problems and also make things better for Arrow. I am not supportive of the IPC format changes in the PR. On Wed, Jan 27, 2021 at

Re: lz4 compressed arrow between Python & Java

2021-01-28 Thread Wes McKinney
hi Joris -- this isn't a use case that we intend for most users (we intend for users to instead use the LZ4 compression option that is part of the IPC format itself, rather than something that is layered on externally), but it would be good to make sure that our LZ4 streams are interoperable

Re: [RESULT] [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-27 Thread Wes McKinney
Right, that’s what we discussed before this release, so I want to confirm that we aren’t going to rebase master anymore. I’ll bring it up the next release to make sure we don’t do it again. On Wed, Jan 27, 2021 at 7:45 AM Krisztián Szűcs wrote: > On Wed, Jan 27, 2021 at 2:35 PM Wes McKin

Re: [RESULT] [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-27 Thread Wes McKinney
] update maven artifacts > > > >> > 14. [kou] update msys2 > > > >> > 15. [nealrichardson] update R packages > > > >> > 16. [ ] update docs > > > >> > 17. [ ] rebase the pull requests > > > >> > > > >

Re: [MATLAB] Developing a MATLAB Interface for Apache Arrow

2021-01-25 Thread Wes McKinney
hi Kevin -- I read through the document. It seems plenty reasonable to me. Look forward to seeing the buildout. Thanks Wes On Mon, Jan 25, 2021 at 3:10 PM Kevin Gurney wrote: > > Hi Antoine, > > Thanks very much for taking a first pass over the document! I'll start > working through the

[RESULT] [VOTE] Release Apache Arrow 3.0.0 - RC2

2021-01-25 Thread Wes McKinney
Copying the vote result with the usual subject line On Mon, Jan 25, 2021 at 3:23 PM Krisztián Szűcs wrote: > > The VOTE carries with > - 4 binding +1 > - 3 non-binding +1 > - 1 non-binding +0 > votes. > > Thanks everyone! > > I'm starting the post release tasks and keep you posted about the >

Re: Please Review: Application for a Media Type

2021-01-22 Thread Wes McKinney
Thank you for taking the lead on this. I gave a brief read through and I think it makes sense using Thrift or Protocol Buffers as a guideline. Would be good for some others to review who might be familiar with IANA media formats On Wed, Jan 20, 2021 at 6:17 PM Weston Pace wrote: > > Per a

Re: [Proposal] Modify release process to vote only on source release

2021-01-19 Thread Wes McKinney
I'm OK with moving to source only releases, but we need to take a step back and consider how our CI/CD is failing to notify us in a suitably timely and automated way about the packages being broken. For example, the fact that we had 2 failed RCs as the result of packaging issues points to a broken

Re: Is it ok to merge PRs to apache/master now given the 3.0.0 release?

2021-01-19 Thread Wes McKinney
I don't see any need to stop merging PRs. I recommend you go ahead and if Krisztian needs to cut another RC he can do it out of a branch. I've said this on some other e-mail threads and there haven't been objections so I think it's okay. We can formalize this policy after 3.0.0 so there isn't this

Re: [VOTE] Release Apache Arrow 3.0.0 - RC0

2021-01-18 Thread Wes McKinney
The plasma executable is failing to start for some reason, but that function should not fail in that way so please open a Jira. I don't think this is a blocking bug; if you'd like to verify without Plasma you can disable it in the verification script. On Mon, Jan 18, 2021 at 8:46 PM Ying Zhou

Re: Add teams inside Apache's organization on GitHub for Arrow

2021-01-18 Thread Wes McKinney
In principle this sounds okay to me (you can create an Infra JIRA ticket to create such a GitHub team -- note that they will only be able to add committers, I think). It might be just as easy to put a document somewhere (on Confluence or in a text file -- like [1]) so it's clear who specifically

Re: Release 3.0 timeline?

2021-01-17 Thread Wes McKinney
ues > and merge new features to master. I don't know who (if anyone) would do > such a thing, but I would be happy to help -- I just don't know what to do. > > On Sat, Jan 16, 2021 at 2:24 PM Wes McKinney wrote: > >> I would move that we should make any needed follow-up RC's

Re: Release 3.0 timeline?

2021-01-16 Thread Wes McKinney
I would move that we should make any needed follow-up RC's out of a maintenance branch and let master evolve freely. On Sat, Jan 16, 2021 at 12:47 PM Neville Dipale wrote: > > Hi Arrow devs, > > There's some bugs in the Parquet implementation which affect reading of > data: > > -

Re: Release 3.0 timeline?

2021-01-15 Thread Wes McKinney
I think we should make sure to switch to releasing from a branch in 4.0.0 so that patches can flow uninterrupted into master regardless of whether it’s close to a release or not. We will have to make some changes to the release tools but this seems consistent with past discussions. On Fri, Jan

Re: ORC writer [Re: When will my PR be available in a release?]

2021-01-14 Thread Wes McKinney
Note that a good way to speed up the process is to help with other development tasks in the project (since this will help maintainers dedicate more time to code review). On Thu, Jan 14, 2021 at 8:09 AM Antoine Pitrou wrote: > > > Hi Ying, > > Sorry for the delay. We're quite strained these days

Re: Arrow January board report -- help needed

2021-01-13 Thread Wes McKinney
I'll pull this together today and post. Any last comments? On Wed, Jan 6, 2021 at 4:16 AM Antoine Pitrou wrote: > > > Le 05/01/2021 à 21:57, Wes McKinney a écrit : > > It's time for our quarterly board report (due next Wednesday). Can > > everyone help with completing

Re: ursa-labs/crossbow on travis-ci.com is disabled

2021-01-11 Thread Wes McKinney
understand that ursa-labs/crossbow was recently disabled by GitHub due to the large number of release artifacts, so a new repo may need to be created there. I will leave it to Krisztian or others to sort out. Thanks, Wes On Sat, Jan 9, 2021 at 5:44 PM Wes McKinney wrote: > > Hi Kou — yes, we

Re: ursa-labs/crossbow on travis-ci.com is disabled

2021-01-09 Thread Wes McKinney
Hi Kou — yes, we can do that. I’m not able to do it right this minute but I can look in a few hours or Neal may beat me to it. On Sat, Jan 9, 2021 at 5:36 PM Sutou Kouhei wrote: > Hi, > > Could Ursa Labs buy the Travis CI's "5 Concurrent plan" > ($249/month) only for this month? It's just for

Re: Github Actions feedback time

2021-01-07 Thread Wes McKinney
is more difficult to run our own > docker images. We can probably make it work via docker on docker, > but it is likely more evolved (I haven't tried that yet). We could also > build an image on top of "buildkite/agent:3" for this, but well... > > Best, > Jorge > >

Re: [Rust][DataFusion] Target SQL Dialect Proposal

2021-01-07 Thread Wes McKinney
Just a drive-by comment from me, but since Materialize (source-available, but not open source) also implements Postgres dialect in Rust, I wonder if there's a collaboration possibility across SQL-related Rust projects. On Thu, Jan 7, 2021 at 10:03 AM Andrew Lamb wrote: > > There was discussion

Re: Github Actions feedback time

2021-01-07 Thread Wes McKinney
Jorge -- if you want to test a Linux agent, you could run the buildkite-agent in a Docker container. We (Ursa) could possibly look into adding Dockerized agents on some of our physical machines, particularly if we set up a well-documented procedure for setting this up on a new machine. On Wed,

Re: NumPyBuffer does not set mutable_data_. Bug?

2021-01-07 Thread Wes McKinney
hi Arthur -- yes, this is a bug (I just quickly confirmed looking in the C++ code). Please do create a Jira issue and if you submit a PR for it, that's great. Thanks, Wes On Thu, Jan 7, 2021 at 11:17 AM Arthur Peters wrote: > > NumPyBuffer sets is_mutable_, but does not set mutable_data_ in

Re: Github Actions feedback time

2021-01-05 Thread Wes McKinney
repo, or is there > > any PMC-specific activity that is blocking us from working on this? > > > > Best, > > Jorge > > > > > > > > > > On Tue, Jan 5, 2021 at 5:10 PM Wes McKinney wrote: > > > > > At the risk of sounding like a brok

Re: Github Actions feedback time

2021-01-05 Thread Wes McKinney
to reporting to triggers and reporting back to > GitHub? I.e. can we just place a `pipeline.yml` on the repo, or is there > any PMC-specific activity that is blocking us from working on this? > > Best, > Jorge > > > > > On Tue, Jan 5, 2021 at 5:10 PM Wes McKinney wrote:

Arrow January board report -- help needed

2021-01-05 Thread Wes McKinney
It's time for our quarterly board report (due next Wednesday). Can everyone help with completing the various sections? ## Description: The mission of Apache Arrow is the creation and maintenance of software related to columnar in-memory processing and data interchange ## Issues: [Insert your own

Re: Github Actions feedback time

2021-01-05 Thread Wes McKinney
At the risk of sounding like a broken record -- we are almost certainly going to have to move our builds to dedicated infrastructure that this community has complete agency over sometime between now and 2025. Maybe it will be this year, maybe next year, but to me it is an inevitability. I spent

Re: arrow::compute::ExecContext default

2020-12-31 Thread Wes McKinney
Right now it's an application-dependent decision. I think until we see more fully-formed query engines or other data processing applications that use this code, it's hard to say what will emerge as the best practice. For example, a mutable ExecContext* is passed to many functions, but it might be

Re: Fix for bug in parquet stream writer

2020-12-28 Thread Wes McKinney
hi Anders, would you like to open a Jira issue and submit a PR (with unit test)? On Mon, Dec 28, 2020 at 9:51 AM anders johansson wrote: > > Hi, > > When writing to a primitive node of a logical type not supported by > converted_type (such as parquet::LogicalType::TimeUnit::NANOS), the error >

Re: upper() / lower() for utf8 strings

2020-12-23 Thread Wes McKinney
It might be worthwhile to see if some reusable templates can be assembled that can be employed in both places On Tue, Dec 22, 2020 at 5:47 PM Neal Richardson wrote: > > FWIW the C++ compute library now uses > https://github.com/JuliaStrings/utf8proc, so assuming it does all of the > things you

Re: building and debugging on Mac without rpath

2020-12-22 Thread Wes McKinney
What does it mean that DYLD_LIBRARY_PATH is "flashed"? It seems like there are some issues here which may affect other developers, in which case we should try to document them in our docs for future reference. On Mon, Dec 21, 2020 at 1:19 PM Neal Richardson wrote: > > Building with

Re: [C++] Includes and failing checks in Python and C Glib & Ruby

2020-12-18 Thread Wes McKinney
As a matter of development policy, we do not permit Arrow's public / non-internal headers to transitively include the header files of any build or runtime dependencies. So I would suggest creating a self-contained way to specify the ORC write options when using from Arrow-land. On Fri, Dec 18,

Re: [C++] Minimum CMake version

2020-12-16 Thread Wes McKinney
I support raising the minimum version. On Wed, Dec 16, 2020 at 3:59 PM Sutou Kouhei wrote: > Hi, > > We require CMake 3.2 or later. Can we require a newer > CMake? I don't want to add workarounds for CMake 3.2 like > this: > > >

Re: Empty RecordBatch in Java Flight client

2020-12-16 Thread Wes McKinney
If the manual protobuf parsing in Java is not compliant with the Protobuf spec, then I think we should fix that. On Wed, Dec 16, 2020 at 10:46 AM Eric Erhardt wrote: > > An incompatibility between the .NET and Java flight implementations was > raised with

Re: Should we default to write parquet format version 2.0? (not data page version 2.0)

2020-12-15 Thread Wes McKinney
I'm in favor of the confusingly-named version='2.0' default. I note that such decisions are hampered by our lack of integration / compatibility testing with other Parquet consumers to know whether they will understand all of the data that we write. On Tue, Dec 15, 2020 at 10:50 AM Antoine Pitrou

Re: arrow for game engine / graphics workloads?

2020-12-14 Thread Wes McKinney
Arrow only uses Flatbuffers to serialize metadata, *not* data. On Mon, Dec 14, 2020 at 1:39 PM Robert Bigelow wrote: > > This is an excellent point. I could use Flatbuffers directly to define any > custom format needed by the engine. The engine itself would need to use the > same principles

Re: pass input args directly to kernel

2020-12-14 Thread Wes McKinney
Also, do not feel the need to be constrained by the structures that are currently defined. On Mon, Dec 14, 2020 at 4:33 AM Antoine Pitrou wrote: > > > Hi, > > If you set `can_execute_chunkwise = false` on the kernel options, you > should see the whole chunked array. > > Regards > > Antoine. > >

Re: [C++] Are stream adapters necessary for the Arrow2ORC adapter?

2020-12-13 Thread Wes McKinney
It would be more flexible to use the Arrow IO interfaces. That would enable you to read and write to remote filesystems as well. I would recommend that over e.g. passing in a file path. On Sat, Dec 12, 2020 at 3:51 AM Ying Zhou wrote: > > Hi, > > As the developer who is testing the APIs in the
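The design point can be illustrated with a small Python analogy. This is not Arrow's C++ IO API; the function names and the length-prefixed record format are made up for the example. The idea is only that code written against a stream abstraction works with any sink or source, whereas code that takes a file path is tied to the local filesystem.

```python
import io


def write_records(sink, records):
    """Write length-prefixed byte records to any binary file-like object."""
    for rec in records:
        sink.write(len(rec).to_bytes(4, "little"))  # 4-byte little-endian length
        sink.write(rec)


def read_records(source):
    """Read back records written by write_records from any binary stream."""
    records = []
    while True:
        header = source.read(4)
        if len(header) < 4:  # end of stream
            break
        size = int.from_bytes(header, "little")
        records.append(source.read(size))
    return records


# The same functions work on an in-memory buffer, a local file opened in
# binary mode, or a wrapper around a remote object store.
buf = io.BytesIO()
write_records(buf, [b"abc", b"", b"hello"])
buf.seek(0)
roundtrip = read_records(buf)
```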

Re: pyarrow no arm builds, and errors when building

2020-12-11 Thread Wes McKinney
I would recommend going the conda-forge route for obtaining aarch64 binary packages. Otherwise, you will want to follow the Python build-from-source instructions in our documentation. If you want "pip install pyarrow" to work without a wheel available (i.e. build from source), you will have to

Re: Incompatibility of all existing pyarrow releases with the next NumPy release

2020-12-07 Thread Wes McKinney
I believe we can do a release that is just focused on the Python artifacts, yes. On Mon, Dec 7, 2020 at 6:52 AM Joris Van den Bossche wrote: > > On Fri, 4 Dec 2020 at 21:11, Uwe L. Korn wrote: > > > Hello all, > > > > Today the Karotothek CI turned quite red in > >

Demise of Ursabot CI jobs

2020-12-07 Thread Wes McKinney
hi folks -- I just wanted to confirm that the Ursabot CI jobs that went down a few weeks ago won't be coming back, at least not in the Buildbot form factor. The Buildbot master was hosted on a physical machine which suffered some kind of network configuration problem during a Linux update that I

Re: C Data interface landed on Rust

2020-12-06 Thread Wes McKinney
Congrats to everyone on the milestone and teamwork! On Sat, Dec 5, 2020 at 10:04 PM Jorge Cardoso Leitão wrote: > > Hi, > > Just to let you know that with #8401 > merged, Rust's implementation > has now basic support for the c data interface >

Re: Removing Python 3.5 support

2020-11-30 Thread Wes McKinney
Note that the latest pip release deprecates Python 3.5 support, which will be removed altogether in pip 21.0 https://mail.python.org/archives/list/pypi-annou...@python.org/thread/DIWIYIMGAOHDWQXXUZW44YRSW7UYQ4CA/ On Sun, Nov 29, 2020 at 7:57 PM Wes McKinney wrote: > > On Fri, Nov 27, 2020 a

Re: Computational Kernels: the project overview

2020-11-30 Thread Wes McKinney
One objective of the precompiled kernels project is to have meaningful computational functionality in a package that does not need to include the LLVM runtime -- to require the LLVM dependency even for simple functions would more than double the size of our Python packages, for example. There is

Re: [C++] Sparse Unions and CICD tests

2020-11-30 Thread Wes McKinney
Regarding our CI: these builds should be consistently green (which they are, only the Python 3.5 CI entry is failing for known reasons, see e.g. https://github.com/apache/arrow/commit/64f9b3fbe9ef4c718449a735435b53ab992ca852). We have a couple of flaky tests, but if you are seeing other failures

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-29 Thread Wes McKinney
It sounds like releasing from branches going forward would solve our problems: * No more force-pushing master * No need to rebase after releases If there are no objections, I would say we should go ahead and do this. I don't think there is a need for a vote but if anyone wants one please chime

Re: Removing Python 3.5 support

2020-11-29 Thread Wes McKinney
On Fri, Nov 27, 2020 at 2:48 AM Antoine Pitrou wrote: > > > Le 27/11/2020 à 06:07, Micah Kornfield a écrit : > > Could we hold off for at least a few days, I'd like to run this by some > > colleagues. > > Certainly. > > > Are there any options for maintaining a functioning CI for > > Python 3.5?

Re: python module pyarrow.compute is not found

2020-11-27 Thread Wes McKinney
On Fri, Nov 27, 2020 at 10:42 AM Kirill Lykov wrote: > > ARROW_COMPUTE was enabled because ARROW_PYTHON was enabled (I double > checked). > And sanity check helped -- I should be in python folder, not in the root > folder of arrow repo, thanks! > > I wonder if there is a way to detect if compute

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Wes McKinney
avoid both of those patterns). > > I'd also clarify, while I'm against merge commits, I think that a release > branch that forks slightly off the main branch should be fine. Do people > disagree with this? > > > > On Wed, Nov 25, 2020 at 5:52 AM Wes McKinney wrote:

Re: [Discuss] Bearer Token refresh design with retry mechanism

2020-11-25 Thread Wes McKinney
In principle adding a new error code for this seems reasonable to me. What do you think about calling it something more generic like "AUTH_EXPIRED"? I haven't looked at the details of the implementation -- David Li or others may be able to provide better comments? On Fri, Nov 20, 2020 at 6:23 PM
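As a hedged illustration of how a client might react to such an error code, here is a minimal Python sketch of a refresh-and-retry wrapper. The `AuthExpired` exception, the `call_with_refresh` helper, and its parameters are hypothetical names invented for this example; they are not part of Arrow Flight or of the proposal under discussion.

```python
class AuthExpired(Exception):
    """Hypothetical error raised when the server reports an expired token."""


def call_with_refresh(call, refresh_token, max_retries=1):
    """Run call(token), refreshing the token and retrying on AuthExpired.

    `call` takes a token and performs the request; `refresh_token` returns a
    fresh token. After `max_retries` refreshes the error is re-raised.
    """
    token = refresh_token()
    attempts = 0
    while True:
        try:
            return call(token)
        except AuthExpired:
            if attempts >= max_retries:
                raise
            attempts += 1
            token = refresh_token()
```

A dedicated, generic error type lets the client distinguish "refresh and try again" from other authentication failures, which should not be retried.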

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Wes McKinney
Note that at any time you can change your GitHub settings to disallow your branches from being edited by maintainers. On Wed, Nov 25, 2020 at 7:51 AM Wes McKinney wrote: > > > The first two sound logical, but why couldn't those version bumps be a > merge commit into master?

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Wes McKinney
> The first two sound logical, but why couldn't those version bumps be a merge commit into master? We've made the commitment to maintaining a linear commit history in this project. Auto-rebasing the PRs at this point is best described as "harm reduction". The root cause is GitHub's UI which

Re: Requesting December release of Arrow Flight

2020-11-23 Thread Wes McKinney
m and see if there is anything that I have > particular difficulties with. > -- > Charlene Solonynka > > On Nov 23, 2020, at 10:06 AM

Re: Requesting December release of Arrow Flight

2020-11-23 Thread Wes McKinney
hi Charlene, I think if you can resolve the issues we have raised with the Java release process and better automate the production of all the release artifacts (so that there is less time commitment required for the RM), then we may be able to release in December. Otherwise, I am not sure it's

Re: [Discuss] Arrow Release Schedule

2020-11-18 Thread Wes McKinney
not, who is the > > best person to create those tickets? > > > > Regards, > > Keerat > > > > > > On Tue, Nov 10, 2020 at 7:53 PM Wes McKinney wrote: > > > > > +1 to everything that Kou said. > > > > > > On Tue, Nov 10, 202

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-18 Thread Wes McKinney
he intermediate state data across multiple > >>> processes. Unfortunately, KernelState struct does not expose the data > >>> pointer to the outside. If say SumState is exposed, we could have accessed > >>> that data, isn't it? WDYT? > >>> 2. Polymorphism an

Re: C++: Cache RecordBatch

2020-11-17 Thread Wes McKinney
On Tue, Nov 17, 2020 at 5:41 PM Rares Vernica wrote: > > Hi Antoine, > > On Tue, Nov 17, 2020 at 2:34 AM Antoine Pitrou wrote: > > > > Le 17/11/2020 à 03:34, Rares Vernica a écrit : > > > > > > I'm using an arrow::io::BufferReader and > > > arrow::ipc::RecordBatchStreamReader to read an

Re: [Discuss] Should dense union offsets be always increasing?

2020-11-17 Thread Wes McKinney
In principle I'm in favor of #2 -- the only question is what kinds of problems it might pose for forward compatibility. Note * This is completely backward compatible (any data conforming to the spec to the letter will continue to be conforming) * It is also forward compatible at a protocol

Re: [DISCUSS] Extend specification with the definition of equality?

2020-11-13 Thread Wes McKinney
" and slot j from array "b" are equal. > > Best, > Jorge > > > > > On Fri, Nov 13, 2020 at 3:27 PM Wes McKinney wrote: > > > On Fri, Nov 13, 2020 at 1:19 AM Micah Kornfield > > wrote: > > > > > > Hi Jorge, > > > I thin

Re: [DISCUSS] Extend specification with the definition of equality?

2020-11-13 Thread Wes McKinney
> > > This logic is also tricky for any type with children, where we need to > > compare the slot of the child through recursion. > > These things are not really implementation specific, yet they are really > > important when implementations inter-operate. > > > >

Re: [Discuss] Arrow Release Schedule

2020-11-10 Thread Wes McKinney
ghtly builds green, we will be > able to release a new version soon when we want to release. > > > Thanks, > -- > kou > > In > "Re: [Discuss] Arrow Release Schedule" on Tue, 10 Nov 2020 18:23:16 > -0600, > Wes McKinney wrote: > > > We do

Re: [Discuss] Arrow Release Schedule

2020-11-10 Thread Wes McKinney
p with, given they > are not able to satisfy certain release requirements? > > Regards, > Keerat > > On Tue, Nov 3, 2020 at 5:27 AM Wes McKinney wrote: >> >> I think to release more often, a few things are necessary: >> >> - Other organizations / PMC me

Re: [ANNOUNCE] New Arrow committer: Andrew Lamb

2020-11-10 Thread Wes McKinney
Congrats Andrew! On Tue, Nov 10, 2020 at 9:42 AM Andy Grove wrote: > On behalf of the Arrow PMC, I'm happy to announce that Andrew Lamb has > accepted an invitation to become a committer on Apache Arrow. > > Welcome, and thank you for your contributions! >

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-10 Thread Wes McKinney
Yes, open a Jira and propose a PR implementing the changes you need On Mon, Nov 9, 2020 at 8:31 PM Niranda Perera wrote: > > @wes How should I proceed with this nevertheless? should I open a JIRA? > > On Mon, Nov 9, 2020 at 11:09 AM Wes McKinney wrote: > > > On Mon, N

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-09 Thread Wes McKinney
> > @Wes > Yes, that would be great. How about adding a CMake compilation flag for > such dev use cases? > This seems like it could cause more problems -- I think it would be sufficient to use an "internal::" C++ namespace and always install the relevant header file > >

Re: Using arrow/compute/kernels/*internal.h headers

2020-11-08 Thread Wes McKinney
I'm not opposed to installing headers that provide access to some of the kernel implementation internals (with the caveat that changes won't go through a deprecation cycle, so caveat emptor). It might be more sustainable to think about what kind of stable-ish public API could be exported to

Re: PyArrow Compute API

2020-11-05 Thread Wes McKinney
You can help by opening Jira issues for adding new functions or adding new type cases to functions. Since Arrow is a volunteer-based project there's no guarantee when or if something will be implemented but you are free to submit PRs of course. On Thu, Nov 5, 2020 at 4:35 PM Vibhatha Abeykoon

Re: [DISCUSS] Extend specification with the definition of equality?

2020-11-05 Thread Wes McKinney
hi Jorge, The intent when authoring the specification was as follows * If two array slots being compared are both null, then they are equal * If one is null and the other is not, they are not equal * If they are both not null, then they are equal if the data represented in the slot is equal (and
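The three rules listed above can be sketched in a few lines of Python. This is an illustration only: `None` stands in for a null slot, the function name is made up, and the third rule is cut off in the excerpt, so the sketch compares values alone and ignores whatever extra condition followed.

```python
def slots_equal(a, b):
    """Compare two array slots, where None represents a null slot."""
    if a is None and b is None:
        return True   # both null: equal
    if a is None or b is None:
        return False  # exactly one null: not equal
    return a == b     # both non-null: compare the represented data


def arrays_equal(xs, ys):
    """Element-wise equality of two arrays given as Python lists."""
    return len(xs) == len(ys) and all(slots_equal(x, y) for x, y in zip(xs, ys))
```

Note that this differs from SQL-style three-valued logic, where comparing two nulls yields null rather than true.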

Re: Closing of Server Resources

2020-11-03 Thread Wes McKinney
leanup > > operation, while also automating some of the clean-up > > as well. It seems error-prone. > > > > On Fri, Oct 30, 2020 at 3:56 PM Wes McKinney wrote: > > > >> Do you need this beyond FlightSQL? Since any user/client interface to > >> a serv

Re: [Discuss] Arrow Release Schedule

2020-11-03 Thread Wes McKinney
I think to release more often, a few things are necessary: - Other organizations / PMC members must volunteer more time to drive releases and the process around them. My team (and Krisztian in particular) together with Kou and Uwe have done the majority of this work the last couple of years. -

Re: Using FlightClientOption `disable_server_verification` with TLS throws exception with pyarrow 2.0.0

2020-11-02 Thread Wes McKinney
> Channel > pyarrow 2.0.0 py38h07f3135_5_cpu conda-forge > > Regards, > Keerat > > On Mon, Nov 2, 2020 at 1:14 PM Wes McKinney wrote: >> >> Supposedly we are building with grpc 1.29.1 in the manylinux wheels >> (in conda-forge,

Re: Arrow 2.0.0 versioning

2020-11-02 Thread Wes McKinney
The parts of the release process involving Maven are brittle and error prone, so it probably just didn't flow through properly, or the commit updating the Java POM versions didn't get pushed to the release branch or something similar On Mon, Nov 2, 2020 at 6:12 PM James Duong wrote: > > Hi, > >

Re: null buffers in primitive arrays

2020-11-02 Thread Wes McKinney
as handled inconsistently in C++ -- now it is consistent. On Mon, Nov 2, 2020 at 3:47 PM Niranda Perera wrote: > > I see. So, what are the backward compatibility guarantees Arrow has moving > forward? > > On Mon, Nov 2, 2020 at 9:52 AM Wes McKinney wrote: > > > No, you'd

Re: Using FlightClientOption `disable_server_verification` with TLS throws exception with pyarrow 2.0.0

2020-11-02 Thread Wes McKinney
Supposedly we are building with grpc 1.29.1 in the manylinux wheels (in conda-forge, we're at >= 1.33) https://github.com/apache/arrow/blob/master/python/manylinux201x/scripts/build_grpc.sh So someone will need to take a closer look at the wheel (or conda) builds to see what might be going

Re: null buffers in primitive arrays

2020-11-02 Thread Wes McKinney
s like > this? (apart from a release change log, that is) > > On Mon, Nov 2, 2020 at 9:26 AM Wes McKinney wrote: > > > Indeed, we made a change to cause buffers[0] to always be null when > > the null count is 0, which has always been permitted by the columnar > >

Re: null buffers in primitive arrays

2020-11-02 Thread Wes McKinney
Indeed, we made a change to cause buffers[0] to always be null when the null count is 0, which has always been permitted by the columnar format specification (and in 0.16.0 and prior it was inconsistently null depending on how the array was created). On Mon, Nov 2, 2020 at 8:22 AM Niranda Perera
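For code that reads the validity bitmap directly, the practical consequence is that a missing bitmap has to be treated as "all slots valid". A minimal pure-Python sketch of that check follows; the function name is made up, and `validity` stands for the contents of buffers[0] as bytes, or `None` when the buffer is absent. The bit order (least-significant bit first within each byte) follows the Arrow columnar format.

```python
def is_valid(validity, i):
    """Return True if slot i is non-null.

    validity: bytes of the validity bitmap, or None when the array has no
    nulls and the bitmap buffer was omitted.
    """
    if validity is None:
        return True  # no bitmap means every slot is valid
    # Bit i lives in byte i // 8, at position i % 8 counting from the LSB.
    return bool((validity[i // 8] >> (i % 8)) & 1)
```

Consumers that unconditionally dereference the bitmap pointer are the ones affected by the change described above.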

Re: Closing of Server Resources

2020-10-30 Thread Wes McKinney
Do you need this beyond FlightSQL? Since any user/client interface to a server is going to require some custom development anyway (just like the server requires custom development), if there is a need to close resources, it seems like this could be implemented by an action that is specific to the

Re: [C++] Arrow debug with ORC & unittest can not be built

2020-10-25 Thread Wes McKinney
The Arrow build system is configured to build Apache ORC without libhdfspp: https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L2613 If you'd like to change this or make it configurable, some development will be needed, so I would suggest opening a Jira issue

Re: [ANNOUNCE] New Arrow PMC chair: Wes McKinney

2020-10-25 Thread Wes McKinney
Thanks all! On Sun, Oct 25, 2020 at 6:29 AM Krisztián Szűcs wrote: > > Congrats Wes! > > On Sun, Oct 25, 2020 at 2:40 AM David Li wrote: > > > > Congratulations Wes! > > > > Best, > > David > > > > On 10/24/20, Li Jin wrote: > > > Congrats Wes! > > > > > > On Sat, Oct 24, 2020 at 10:05 AM Ying

Re: Arrow on PyPy3 patch

2020-10-22 Thread Wes McKinney
Either way, having a Dockerfile in the project to test with PyPy sounds like a good idea. On Thu, Oct 22, 2020 at 6:37 AM Antoine Pitrou wrote: > > We can, but we cannot be expected to act if something breaks. So this > would be wasting CPU resources for little use. > > Regards > > Antoine. > >

Re: [Discuss] Provide pluggable APIs to support user customized compression codec

2020-10-21 Thread Wes McKinney
Yes, I think he's asking about the motivation for the project. My understanding is that Snappy is used more often than Gzip with Parquet. On Wed, Oct 21, 2020 at 8:53 PM Xie, Qi wrote: > > Hi, Antoine > > Do you mean the performance data HW-GZIP compared with LZ4/ZSTD? > > Thanks, > XieQi > >

Re: hadoop file system connect problem with pyarrow

2020-10-21 Thread Wes McKinney
Do either of these machines have a current Hadoop installation (and is that installation in the system path)? On Tue, Oct 20, 2020 at 9:53 AM 황세규 wrote: > > Dear Maintainer. My name is Joseph Hwang in South Korea. I need some advice > about PyArrow. > > I try to develop Hadoop File System
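The question above can be checked quickly on each machine. The following is a small sketch using only the Python standard library; the helper name `hadoop_environment_report` is hypothetical, and the assumption that `HADOOP_HOME` is the relevant variable should be confirmed against the pyarrow HDFS documentation for the version in use.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def hadoop_environment_report(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Report whether a `hadoop` executable and HADOOP_HOME are visible.

    Hypothetical diagnostic helper; it does not call into pyarrow.
    """
    env = os.environ if env is None else env
    return {
        # Full path of the `hadoop` launcher if it is on PATH, else None.
        "hadoop_on_path": shutil.which("hadoop", path=env.get("PATH", "")),
        # Value of HADOOP_HOME if set, else None (assumed to be relevant).
        "HADOOP_HOME": env.get("HADOOP_HOME"),
    }
```

Calling `hadoop_environment_report()` with no arguments inspects the current process environment; a `None` in either field suggests the installation is missing or not on the path.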

Re: Experiment with DataFusion + Pyarrow

2020-10-20 Thread Wes McKinney
Exciting to see; this is exactly the kind of interop we've been working diligently toward since the start of the project! On Tue, Oct 20, 2020 at 11:54 AM Micah Kornfield wrote: > > Really cool work. Very nice to see this type of integration! > > On Tue, Oct 20, 2020 at 9:35 AM Jorge Cardoso

Re: [Discuss] Provide pluggable APIs to support user customized compression codec

2020-10-19 Thread Wes McKinney
What is the purpose of the key-value metadata aside from automatically loading the plugin library if it's available (which seems like a security risk if reading a data file can cause a shared library to be loaded dynamically)? Is it necessary to have that metadata for it to be safe to use the

Re: Arrow C Data Interface

2020-10-19 Thread Wes McKinney
hi Pasha, Copying dev@. You can see how DuckDB interacts with the pyarrow data structures through the C interface here, which may be helpful: https://github.com/cwida/duckdb/blob/master/tools/pythonpkg/duckdb_python.cpp We haven't defined a Python API (either C API level or Python API level) so that

Re: [C++] AppendValues for numeric types with invalid slots omitted from source

2020-10-18 Thread Wes McKinney
hi Ying, the code in adapter_util.cc doesn't look right to me unless the data in liborc::ColumnVectorBatch is spaced (has placeholder bytes where there is a null). We have quite a bit of code in Parquet that deals specifically with this issue -- I'm not sure if we have a ready-made function that
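The distinction between "spaced" data (one slot per row, with placeholders at nulls) and data that stores only the valid values can be sketched in plain Python. This is an illustration of the two layouts only; the function name `to_spaced` and the placeholder value are hypothetical, and it makes no claim about which layout liborc or any Arrow builder actually uses.

```python
from typing import List, Sequence


def to_spaced(
    dense_values: Sequence[int],
    validity: Sequence[bool],
    placeholder: int = 0,
) -> List[int]:
    """Expand values stored only for valid slots into one slot per row.

    Hypothetical helper for illustration; not an Arrow or ORC function.
    """
    if sum(validity) != len(dense_values):
        raise ValueError("number of valid slots must match number of values")
    remaining = iter(dense_values)
    # Take the next stored value for a valid row, a placeholder for a null row.
    return [next(remaining) if is_valid else placeholder for is_valid in validity]
```

Code that indexes values by row number is only correct for the spaced layout; applied to the dense layout it would read the wrong value for every row after the first null.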

Re: [VOTE] Release Apache Arrow 2.0.0 - RC2

2020-10-13 Thread Wes McKinney
* Do you have LLVM installed? * Can you turn ARROW_GANDIVA=OFF? On Tue, Oct 13, 2020 at 12:22 PM Antoine Pitrou wrote: > > > C++ source fails building for me on Ubuntu 20.04: > > -- Could NOT find LLVM (missing: LLVM_DIR) > -- Could NOT find LLVM (missing: LLVM_DIR) > -- Could NOT find LLVM
