[jira] [Created] (ARROW-8325) [R][CI] Stop including boost in R windows bundle

2020-04-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8325: -- Summary: [R][CI] Stop including boost in R windows bundle Key: ARROW-8325 URL: https://issues.apache.org/jira/browse/ARROW-8325 Project: Apache Arrow

[jira] [Created] (ARROW-8324) [R] Add read/write_ipc_file separate from _feather

2020-04-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8324: -- Summary: [R] Add read/write_ipc_file separate from _feather Key: ARROW-8324 URL: https://issues.apache.org/jira/browse/ARROW-8324 Project: Apache Arrow

Re: Proposal to use Black for automatic formatting of Python code

2020-04-02 Thread Wes McKinney
On Thu, Apr 2, 2020 at 2:19 PM Antoine Pitrou wrote: > > > On 02/04/2020 at 20:58, Joris Van den Bossche wrote: > > > > Yes, both autopep8 and black can fix up linting issues to ensure your code > > passes the PEP8 checks (although autopep8 cannot fix all issues > > automatically). > > But

Re: Proposal to use Black for automatic formatting of Python code

2020-04-02 Thread Antoine Pitrou
On 02/04/2020 at 20:58, Joris Van den Bossche wrote: > > Yes, both autopep8 and black can fix up linting issues to ensure your code > passes the PEP8 checks (although autopep8 cannot fix all issues > automatically). > But with autopep8 you *still* need to think about how to format your code,

Re: Proposal to use Black for automatic formatting of Python code

2020-04-02 Thread Joris Van den Bossche
Personally, I don't think autopep8 being less aggressive / more conservative is that relevant. This is only for the single PR that does the reformatting where black gives a much bigger number of changed lines. But once that one-time cost is paid, using black will not give larger diffs or make more

[jira] [Created] (ARROW-8323) [C++] Pin gRPC at v1.27 to avoid compilation error in its headers

2020-04-02 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8323: --- Summary: [C++] Pin gRPC at v1.27 to avoid compilation error in its headers Key: ARROW-8323 URL: https://issues.apache.org/jira/browse/ARROW-8323 Project: Apache Arrow

[jira] [Created] (ARROW-8322) [CI] Fix C# workflow file syntax

2020-04-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8322: -- Summary: [CI] Fix C# workflow file syntax Key: ARROW-8322 URL: https://issues.apache.org/jira/browse/ARROW-8322 Project: Apache Arrow Issue Type: Task

[jira] [Created] (ARROW-8321) [CI] Use bundled thrift in Fedora 30 build

2020-04-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8321: -- Summary: [CI] Use bundled thrift in Fedora 30 build Key: ARROW-8321 URL: https://issues.apache.org/jira/browse/ARROW-8321 Project: Apache Arrow Issue

[jira] [Created] (ARROW-8320) [Documentation][Format] Clarify (lack of) alignment requirements in C data interface

2020-04-02 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8320: --- Summary: [Documentation][Format] Clarify (lack of) alignment requirements in C data interface Key: ARROW-8320 URL: https://issues.apache.org/jira/browse/ARROW-8320

[jira] [Created] (ARROW-8319) [CI] Install thrift compiler in the debian build

2020-04-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8319: -- Summary: [CI] Install thrift compiler in the debian build Key: ARROW-8319 URL: https://issues.apache.org/jira/browse/ARROW-8319 Project: Apache Arrow

[jira] [Created] (ARROW-8318) [C++][Dataset] Dataset should instantiate Fragment

2020-04-02 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8318: - Summary: [C++][Dataset] Dataset should instantiate Fragment Key: ARROW-8318 URL: https://issues.apache.org/jira/browse/ARROW-8318 Project: Apache

[jira] [Created] (ARROW-8317) [C++] grpc-cpp 1.28.0 from conda-forge causing Appveyor build to fail

2020-04-02 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8317: --- Summary: [C++] grpc-cpp 1.28.0 from conda-forge causing Appveyor build to fail Key: ARROW-8317 URL: https://issues.apache.org/jira/browse/ARROW-8317 Project: Apache

Re: CPP : arrow symbols.map issue

2020-04-02 Thread Wes McKinney
On Thu, Apr 2, 2020 at 12:06 PM Antoine Pitrou wrote: > > > Hi, > > On Thu, 2 Apr 2020 16:56:06 + > Brian Bowman wrote: > > A new high-performance file system we are working with returns an error > > while writing a .parquet file. The following arrow symbol does not > > resolve properly

Re: CPP : arrow symbols.map issue

2020-04-02 Thread Antoine Pitrou
Hi, On Thu, 2 Apr 2020 16:56:06 + Brian Bowman wrote: > A new high-performance file system we are working with returns an error while > writing a .parquet file. The following arrow symbol does not resolve > properly and the error is masked. > > libparquet.so: undefined symbol:

CPP : arrow symbols.map issue

2020-04-02 Thread Brian Bowman
A new high-performance file system we are working with returns an error while writing a .parquet file. The following arrow symbol does not resolve properly and the error is masked. libparquet.so: undefined symbol: _ZNK5arrow6Status8ToStringB5cxx11Ev > nm libarrow.so* | grep -i

[jira] [Created] (ARROW-8316) [CI] Set docker-compose to use docker-cli instead of docker-py for building images

2020-04-02 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8316: -- Summary: [CI] Set docker-compose to use docker-cli instead of docker-py for building images Key: ARROW-8316 URL: https://issues.apache.org/jira/browse/ARROW-8316

Re: Support of more manipulation for Record Batch

2020-04-02 Thread Wes McKinney
hi Chengxin, Yes, if you look at the JIRA tracker and look for past discussions on the mailing list, there are plans to develop comprehensive data manipulation and query processing capabilities in this project for use in Python, R, and any other language that binds to C++, including C/GLib and

Re: Proposal to use Black for automatic formatting of Python code

2020-04-02 Thread Wes McKinney
I admit that the status quo does not bother me that much, so `autopep8` as the more conservative / less aggressive option seems fine to me, and also makes it simple for people to fix up common linting issues in their PRs. On Thu, Apr 2, 2020 at 5:16 AM Antoine Pitrou wrote: > > > I have looked

Re: [Python] black vs. autopep8

2020-04-02 Thread Wes McKinney
I'm personally fine with the Black changes. After the one-time cost of reformatting the codebase, it will take any personal preferences out of code formatting (I admit that I have several myself, but I don't mind the normalization provided by Black). I hope that Cython support comes soon since a

[jira] [Created] (ARROW-8315) [Python]

2020-04-02 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8315: --- Summary: [Python] Key: ARROW-8315 URL: https://issues.apache.org/jira/browse/ARROW-8315 Project: Apache Arrow Issue Type: Bug Reporter: Ben

Re: [Python] black vs. autopep8

2020-04-02 Thread Jacek Pliszka
Hi! I believe the amount of changes is not that important. In my opinion, what matters is which format will allow reviewers to be more efficient. The committer can always reformat as they like. It is harder for the reviewer. BR, Jacek On Thu, 2 Apr 2020 at 15:32 Antoine Pitrou wrote: > > >

Re: [Python] black vs. autopep8

2020-04-02 Thread Antoine Pitrou
PS: in both cases, Cython files are not processed. autopep8 is actually able to process them, but the comparison wouldn't be apples-to-apples. (that said, autopep8 gives suboptimal results on Cython files, for example it changes "&c_variable" to "& c_variable" and "void* ptr" to "void * ptr")

[Python] black vs. autopep8

2020-04-02 Thread Antoine Pitrou
Hello, I've put up two PRs to compare the effect of running black vs. autopep8 on the Python codebase. * black: https://github.com/apache/arrow/pull/6810 65 files changed, 7855 insertions(+), 5215 deletions(-) * autopep8: https://github.com/apache/arrow/pull/6811 20 files changed, 137

[jira] [Created] (ARROW-8314) [Python] Provide a method to select a subset of columns of a Table

2020-04-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8314: Summary: [Python] Provide a method to select a subset of columns of a Table Key: ARROW-8314 URL: https://issues.apache.org/jira/browse/ARROW-8314

Re: Join operation on attributes from arrow structs

2020-04-02 Thread Francois Saint-Jacques
They're mapped with the StructType/StructArray, which is also columnar representation, e.g. one buffer per field in the sub-object. If you have varying/incompatible types, a field will be promoted to a UnionType. François On Thu, Apr 2, 2020 at 12:54 AM Micah Kornfield wrote: > > Hi Hasara, >
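The "one buffer per field" layout described above can be pictured with a plain-Python sketch. This is an illustration under stated assumptions, not the Arrow API: the function name `rows_to_struct_columns` is hypothetical, Python lists stand in for Arrow buffers, and union promotion for incompatible types is not modelled.

```python
def rows_to_struct_columns(rows):
    """Split row-oriented records into a validity list plus one value list per field.

    `rows` is a list of dicts (or None for a missing struct). The result mimics
    a struct column: a parent validity list and one child list per field.
    """
    # Collect field names in first-seen order.
    field_names = []
    for row in rows:
        if row is not None:
            for name in row:
                if name not in field_names:
                    field_names.append(name)

    # Parent-level validity: is the struct itself present in this slot?
    validity = [row is not None for row in rows]

    # One child "buffer" per field, aligned slot by slot with the parent.
    children = {
        name: [None if row is None else row.get(name) for row in rows]
        for name in field_names
    }
    return validity, children


validity, children = rows_to_struct_columns(
    [{"x": 1, "y": "a"}, None, {"x": 3, "y": "c"}]
)
print(validity)
print(children)
```

Because every field lives in its own contiguous list, an operation that touches only `x` never has to read the values of `y`.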

Support of more manipulation for Record Batch

2020-04-02 Thread Chengxin Ma
Hi all, I am working on a distributed sorting program which runs on multiple computation nodes. In this sorting program, data is represented as pandas DataFrames and key operations are groupby, concat, and sort_values. For shuffling data among the computation nodes, the DataFrames are

Re: Proposal to use Black for automatic formatting of Python code

2020-04-02 Thread Antoine Pitrou
I have looked at the kind of reformatting used by black and I've become -1 on this. `black` is much too aggressive and actually makes the code less readable. `autopep8` seems much better and less aggressive. Let's use that instead. Regards Antoine. On Thu, 26 Mar 2020 20:37:01 +0100 Joris

Re: Clarification regarding the `CDataInterface.rst`

2020-04-02 Thread Anish Biswas
Upgrading the pip installer worked perfectly. Thanks! Regards, Anish Biswas On 2020/04/02 09:35:50, Antoine Pitrou wrote: > > Hi Anish, > > It looks like a bug with old pip versions. You can first upgrade pip using: > > $ pip install -U pip > > Then redo the "pip install" command for

Re: Clarification regarding the `CDataInterface.rst`

2020-04-02 Thread Antoine Pitrou
Hi Anish, It looks like a bug with old pip versions. You can first upgrade pip using: $ pip install -U pip Then redo the "pip install" command for pyarrow. If you can't upgrade pip, you can install NumPy separately first (using "pip install numpy"). Regards Antoine. On 02/04/2020 at

[jira] [Created] (ARROW-8313) [Gandiva][UDF] Solutions to register new UDFs dynamically without checking it into arrow repo.

2020-04-02 Thread ZMZ91 (Jira)
ZMZ91 created ARROW-8313: Summary: [Gandiva][UDF] Solutions to register new UDFs dynamically without checking it into arrow repo. Key: ARROW-8313 URL: https://issues.apache.org/jira/browse/ARROW-8313

[jira] [Created] (ARROW-8312) improve IN expression support

2020-04-02 Thread Yuan Zhou (Jira)
Yuan Zhou created ARROW-8312: Summary: improve IN expression support Key: ARROW-8312 URL: https://issues.apache.org/jira/browse/ARROW-8312 Project: Apache Arrow Issue Type: Improvement