Java: DefaultVectorComparators - invalid implementation

2020-04-09 Thread Martin Janda
I took a first look at the Apache Arrow Java sources and found an incorrect implementation of DefaultVectorComparators.LongComparator. I suspect that other comparators may be wrong too. Simple test: long l1=Long.MIN_VALUE +1L; long l2=Long.MAX_VALUE; System.out.println("Arrow: " + Long.signum(l1 - l2));
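The simple test above can be expanded into a small self-contained sketch. It assumes only what the message shows, namely a comparison computed as Long.signum(l1 - l2); the class and method names below are hypothetical and are not taken from the Arrow code base. The subtraction wraps around for operands that are far apart, so the sign comes out wrong, whereas the JDK's Long.compare does not subtract and is not affected.

```java
// Sketch only: ComparatorOverflowDemo and its methods are hypothetical names,
// not part of Apache Arrow.
final class ComparatorOverflowDemo {

    // Subtraction-based comparison: l1 - l2 can wrap around when the
    // operands are far apart, which flips the sign of the result.
    static int subtractionCompare(long l1, long l2) {
        return Long.signum(l1 - l2);
    }

    // Overflow-safe comparison: Long.compare never subtracts its arguments.
    // Integer.signum normalizes the result to -1, 0, or 1.
    static int safeCompare(long l1, long l2) {
        return Integer.signum(Long.compare(l1, l2));
    }

    public static void main(String[] args) {
        long l1 = Long.MIN_VALUE + 1L;
        long l2 = Long.MAX_VALUE;
        // l1 - l2 wraps around to 2, so the reported sign is positive.
        System.out.println("Subtraction: " + subtractionCompare(l1, l2));  // prints 1
        System.out.println("Long.compare: " + safeCompare(l1, l2));        // prints -1
    }
}
```

Here l1 is clearly smaller than l2, so a correct comparator should return a negative value; only the Long.compare variant does.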

[jira] [Created] (ARROW-8388) [C++] GCC 4.8 fails to move on return

2020-04-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8388: Summary: [C++] GCC 4.8 fails to move on return Key: ARROW-8388 URL: https://issues.apache.org/jira/browse/ARROW-8388 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-8387) [rust] Make schema_to_fb public because it is very useful!

2020-04-09 Thread Max Burke (Jira)
Max Burke created ARROW-8387: Summary: [rust] Make schema_to_fb public because it is very useful! Key: ARROW-8387 URL: https://issues.apache.org/jira/browse/ARROW-8387 Project: Apache Arrow

Re: [Python] black vs. autopep8

2020-04-09 Thread Joris Van den Bossche
> > So autopep8 doesn't fix everything? Sounds inferior to me. That said, I'm
> > in favor of any resolution that increases our automation of this and
> > decreases the energy we expend debating it.
> >
> It does fix everything, where "everything" is compliance with PEP8,
> which I think is the

Re: mutual TLS auth support with arrow flight

2020-04-09 Thread David Li
Hey David, This isn't exposed right now. You'd have to expose the gRPC option on the client and server sides; right now while Flight does set up SSL credentials when TLS is enabled, it's only to allow you to set the root certificate on the client [1] and the server certificate [2]. There is

[jira] [Created] (ARROW-8391) [C++] Implement row range read API for IPC file (and Feather)

2020-04-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8391: Summary: [C++] Implement row range read API for IPC file (and Feather) Key: ARROW-8391 URL: https://issues.apache.org/jira/browse/ARROW-8391 Project: Apache Arrow

[jira] [Created] (ARROW-8389) [Integration] Run tests in parallel

2020-04-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8389: Summary: [Integration] Run tests in parallel Key: ARROW-8389 URL: https://issues.apache.org/jira/browse/ARROW-8389 Project: Apache Arrow Issue Type:

Re: [Rust] Dictionary encoding and Flight

2020-04-09 Thread Paul Dix
I'd be happy to pitch in on getting the integration tests developed. It would certainly beat my current method of building and running my test project and switching over to a Jupyter notebook to manually check it. Is there any prior work in the Rust project that I could basically copy from? Or

Re: [Rust] Dictionary encoding and Flight

2020-04-09 Thread Wes McKinney
Well, luckily we have some newly spruced up documentation about how integration testing works (thanks Neal!) https://github.com/apache/arrow/blob/master/docs/source/format/Integration.rst The main task is writing a parser for the JSON format used for integration testing. The JSON is used to

Re: [Rust] Dictionary encoding and Flight

2020-04-09 Thread Wes McKinney
hi Paul, Dictionary-encoded is not a nested type, so there shouldn't be any children -- the IPC layout of a dictionary encoded field is the same as the type of the indices (probably want to change the terminology in the Rust library from "keys" to "indices" which is what's used in the

mutual TLS auth support with arrow flight

2020-04-09 Thread David Seapy
gRPC supports connections using mutual TLS with client and server certificates. Is there an example of how to do this with the Arrow Flight libraries, or does one need to step down to the gRPC level when making requests? Specifically we are working on having data scientists establish a

[jira] [Created] (ARROW-8390) [R] Expose schema unification features

2020-04-09 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8390: Summary: [R] Expose schema unification features Key: ARROW-8390 URL: https://issues.apache.org/jira/browse/ARROW-8390 Project: Apache Arrow Issue Type:

Re: [Rust] Dictionary encoding and Flight

2020-04-09 Thread Paul Dix
I managed to get something up and running. I ended up creating a dictionary_batch.rs and adding that to convert.rs to translate dictionary fields in a schema over to the correct fb thing. I also added a method to writer.rs to convert that to bytes so it can be sent via ipc. However, when writing

Re: Java: DefaultVectorComparators - invalid implementation

2020-04-09 Thread Fan Liya
Hi Martin, Thank you so much for reporting this problem. In the current implementation, we do not consider corner cases related to integer overflow, and this problem should be fixed. I have opened an issue to track this problem [1]. Do you want to provide a patch for it? Best, Liya Fan [1]

[jira] [Created] (ARROW-8392) [Java] Fix overflow related corner cases for vector value comparison

2020-04-09 Thread Liya Fan (Jira)
Liya Fan created ARROW-8392: Summary: [Java] Fix overflow related corner cases for vector value comparison Key: ARROW-8392 URL: https://issues.apache.org/jira/browse/ARROW-8392 Project: Apache Arrow

[jira] [Created] (ARROW-8380) [RUST] StringDictionaryBuilder not publicly exported from arrow::array

2020-04-09 Thread Jörn Horstmann (Jira)
Jörn Horstmann created ARROW-8380: Summary: [RUST] StringDictionaryBuilder not publicly exported from arrow::array Key: ARROW-8380 URL: https://issues.apache.org/jira/browse/ARROW-8380 Project:

Re: [Python] black vs. autopep8

2020-04-09 Thread Uwe L. Korn
The non-configurability of black is one of the strongest arguments I see for black. The codestyle will always be subjective. From previous discussions I know that my personal preference of readability conflicts with that of Antoine and Wes, so will probably others. We have the same issue with

Re: [C++] Compute: Datum and "ChunkedArray&" inputs

2020-04-09 Thread Antoine Pitrou
It seems there are two different concerns here:
- the kernels' public API
- the ease with which kernels can be implemented.
If we need a different public API, then IMO we should change it sooner rather than later. As for the implementation, perhaps we should start by drawing the main

[jira] [Created] (ARROW-8381) [C++][Dataset] Dataset writing should require a writer schema

2020-04-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8381: Summary: [C++][Dataset] Dataset writing should require a writer schema Key: ARROW-8381 URL: https://issues.apache.org/jira/browse/ARROW-8381

[jira] [Created] (ARROW-8386) [Python] pyarrow.jvm raises error for empty Arrays

2020-04-09 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-8386: Summary: [Python] pyarrow.jvm raises error for empty Arrays Key: ARROW-8386 URL: https://issues.apache.org/jira/browse/ARROW-8386 Project: Apache Arrow Issue

Re: [Python] black vs. autopep8

2020-04-09 Thread Rok Mihevc
+1 for autopep8
On Thu, Apr 9, 2020 at 4:45 PM Wes McKinney wrote:
> So to summarize, it seems that what we are agreeing in this thread is
> to not debate readability about otherwise PEP8-compliant Python code
> in code reviews, is that right? Absent a consensus about a change, the
> outcome is

Re: [C++] Compute: Datum and "ChunkedArray&" inputs

2020-04-09 Thread Wes McKinney
On Thu, Apr 9, 2020, 5:25 AM Antoine Pitrou wrote:
>
> It seems there are two different concerns here:
> - the kernels' public API
> - the ease with which kernels can be implemented.
>
> If we need a different public API, then IMO we should change it sooner
> rather than later.
>
Yes, based on

[jira] [Created] (ARROW-8382) [C++][Dataset] Refactor WritePlan to decouple from Fragment/Scan/Partition classes

2020-04-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8382: - Summary: [C++][Dataset] Refactor WritePlan to decouple from Fragment/Scan/Partition classes Key: ARROW-8382 URL:

[jira] [Created] (ARROW-8385) Crash on parquet.read_table on windows python 3.82

2020-04-09 Thread Geoff Quested-Joens (Jira)
Geoff Quested-Joens created ARROW-8385: Summary: Crash on parquet.read_table on windows python 3.82 Key: ARROW-8385 URL: https://issues.apache.org/jira/browse/ARROW-8385 Project: Apache Arrow

[jira] [Created] (ARROW-8383) [RUST] Easier random access to DictionaryArray keys and values

2020-04-09 Thread Jörn Horstmann (Jira)
Jörn Horstmann created ARROW-8383: Summary: [RUST] Easier random access to DictionaryArray keys and values Key: ARROW-8383 URL: https://issues.apache.org/jira/browse/ARROW-8383 Project: Apache Arrow

Re: [Python] black vs. autopep8

2020-04-09 Thread Wes McKinney
So to summarize, it seems that what we are agreeing in this thread is to not debate readability about otherwise PEP8-compliant Python code in code reviews, is that right? Absent a consensus about a change, the outcome is basically "no change". The addition of autopep8 as a tool for automated PEP8

[jira] [Created] (ARROW-8384) [C++][Python] arrow/filesystem/hdfs.h and Python wrapper does not have an option for setting a path to a Kerberos ticket

2020-04-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8384: Summary: [C++][Python] arrow/filesystem/hdfs.h and Python wrapper does not have an option for setting a path to a Kerberos ticket Key: ARROW-8384 URL: