[jira] [Created] (ARROW-7362) [Python] ListArray.flatten() should take care of slicing offsets

2019-12-09 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-7362:


 Summary: [Python] ListArray.flatten() should take care of slicing 
offsets
 Key: ARROW-7362
 URL: https://issues.apache.org/jira/browse/ARROW-7362
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Zhuo Peng
Assignee: Zhuo Peng


Currently ListArray.flatten() simply returns the child array. If a ListArray is 
a slice of another ListArray, the two share the same child array. However, the 
expected behavior of flatten() (I think) is to return an Array that is the 
concatenation of all the sub-lists in the ListArray, so the slicing offset 
should be taken into account.

 

For example:

import pyarrow as pa

a = pa.array([[1], [2], [3]])

assert a.flatten().equals(pa.array([1, 2, 3]))

# expected (does not hold currently):

assert a.slice(1).flatten().equals(pa.array([2, 3]))
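
The expected behavior can be sketched in plain Python, without pyarrow. This is
only a toy model: ToyListArray, its fields and both flatten methods are
hypothetical names, not the Arrow implementation. A list array is modelled as a
shared child sequence plus an offsets sequence, and a slice only narrows the
window over the offsets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyListArray:
    """Toy model of a list array (hypothetical, not the pyarrow class)."""
    values: List[int]    # child array, shared between an array and its slices
    offsets: List[int]   # offsets[i]..offsets[i + 1] delimit sub-list i
    offset: int = 0      # index of the first logical sub-list (slicing offset)
    length: int = 0      # number of logical sub-lists

    def slice(self, start: int) -> "ToyListArray":
        # Slicing is zero-copy: only offset and length change.
        return ToyListArray(self.values, self.offsets,
                            self.offset + start, self.length - start)

    def flatten_naive(self) -> List[int]:
        # Behavior described in the report: return the child array as-is.
        return self.values

    def flatten(self) -> List[int]:
        # Expected behavior: only the values covered by this slice.
        begin = self.offsets[self.offset]
        end = self.offsets[self.offset + self.length]
        return self.values[begin:end]


a = ToyListArray(values=[1, 2, 3], offsets=[0, 1, 2, 3], length=3)
sliced = a.slice(1)
print(sliced.flatten_naive())  # [1, 2, 3], the slice is ignored
print(sliced.flatten())        # [2, 3]
```

The point of the sketch is that the fix only needs the first and last offsets
of the slice; the child values themselves do not have to be copied one by one.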



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7361) [Rust] Build directory is not passed to ci/scripts/rust_test.sh

2019-12-09 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7361:

 Summary: [Rust] Build directory is not passed to 
ci/scripts/rust_test.sh
 Key: ARROW-7361
 URL: https://issues.apache.org/jira/browse/ARROW-7361
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


See build https://github.com/apache/arrow/runs/340751277





[jira] [Created] (ARROW-7360) [R] Can't use dataset's filter with non-literal expression

2019-12-09 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7360:

 Summary: [R] Can't use dataset's filter with non-literal expression
 Key: ARROW-7360
 URL: https://issues.apache.org/jira/browse/ARROW-7360
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Francois Saint-Jacques


The following will generate an error


{code:r}
test_that("filtering with expression", {
  char_sym <- "b"
  expect_dplyr_equal(
    input %>%
      filter(chr == char_sym) %>%
      select(string = chr, int) %>%
      collect(),
    tbl
  )
})
{code}






Re: [ANNOUNCE] New Arrow committer: Joris van den Bossche

2019-12-09 Thread Krisztián Szűcs
Congrats Joris!

On Mon, Dec 9, 2019 at 8:36 PM David Li  wrote:
>
> Congrats Joris!
>
> Best,
> David
>
> On 12/9/19, Francois Saint-Jacques  wrote:
> > Bravo!
> >
> > On Mon, Dec 9, 2019 at 6:55 AM Wes McKinney  wrote:
> >>
> >> On behalf of the Arrow PMC, I'm happy to announce that Joris has
> >> accepted an invitation to become a committer on Apache Arrow.
> >>
> >> Welcome, and thank you for your contributions!
> >


Re: [ANNOUNCE] New Arrow committer: Joris van den Bossche

2019-12-09 Thread David Li
Congrats Joris!

Best,
David

On 12/9/19, Francois Saint-Jacques  wrote:
> Bravo!
>
> On Mon, Dec 9, 2019 at 6:55 AM Wes McKinney  wrote:
>>
>> On behalf of the Arrow PMC, I'm happy to announce that Joris has
>> accepted an invitation to become a committer on Apache Arrow.
>>
>> Welcome, and thank you for your contributions!
>


[jira] [Created] (ARROW-7359) [C++][Gandiva] Don't throw error for locate function with start position exceeding string length, return 0 instead

2019-12-09 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-7359:

 Summary: [C++][Gandiva] Don't throw error for locate function with 
start position exceeding string length, return 0 instead
 Key: ARROW-7359
 URL: https://issues.apache.org/jira/browse/ARROW-7359
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva
Reporter: Projjal Chanda
Assignee: Projjal Chanda








[jira] [Created] (ARROW-7358) [CI] [Dev] [C++] ccache disabled on conda-python-hdfs

2019-12-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7358:

 Summary: [CI] [Dev] [C++] ccache disabled on conda-python-hdfs
 Key: ARROW-7358
 URL: https://issues.apache.org/jira/browse/ARROW-7358
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration, Developer Tools
Reporter: Antoine Pitrou
Assignee: Krisztian Szucs








[jira] [Created] (ARROW-7357) [Go] migrate from pkg/errors to x/xerrors

2019-12-09 Thread Sebastien Binet (Jira)
Sebastien Binet created ARROW-7357:

 Summary: [Go] migrate from pkg/errors to x/xerrors
 Key: ARROW-7357
 URL: https://issues.apache.org/jira/browse/ARROW-7357
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Sebastien Binet


We should migrate away from `pkg/errors` to `golang.org/x/xerrors` to ensure 
better error handling (and one that is Go 1.13 compatible).





Re: [ANNOUNCE] New Arrow committer: Joris van den Bossche

2019-12-09 Thread Francois Saint-Jacques
Bravo!

On Mon, Dec 9, 2019 at 6:55 AM Wes McKinney  wrote:
>
> On behalf of the Arrow PMC, I'm happy to announce that Joris has
> accepted an invitation to become a committer on Apache Arrow.
>
> Welcome, and thank you for your contributions!


Re: [ANNOUNCE] New Arrow committer: Joris van den Bossche

2019-12-09 Thread Neal Richardson
Congratulations!


> On Dec 9, 2019, at 4:38 AM, Fan Liya  wrote:
> 
> Congratulations, Joris!
> 
> Best,
> Liya Fan
> 
>> On Mon, Dec 9, 2019 at 7:55 PM Wes McKinney  wrote:
>> 
>> On behalf of the Arrow PMC, I'm happy to announce that Joris has
>> accepted an invitation to become a committer on Apache Arrow.
>> 
>> Welcome, and thank you for your contributions!
>> 


[jira] [Created] (ARROW-7356) [C++] Refactor Parquet Code Samples to use Result APIs

2019-12-09 Thread Gal Lushi (Jira)
Gal Lushi created ARROW-7356:


 Summary: [C++] Refactor Parquet Code Samples to use Result APIs
 Key: ARROW-7356
 URL: https://issues.apache.org/jira/browse/ARROW-7356
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Gal Lushi


Currently, the Parquet code samples use the (now deprecated by ARROW-7235) 
`Status`-returning functions.

See [https://github.com/apache/arrow/pull/5994]

This also closes ARROW-7352, which was opened in the wrong JIRA by mistake.





Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-12-09-0

2019-12-09 Thread Krisztián Szűcs
The HDFS issue is a regression; I created a Jira for it:
https://issues.apache.org/jira/browse/ARROW-7354
I broke the fuzzit builds with a recent docker-compose update:
https://issues.apache.org/jira/browse/ARROW-7355


On Mon, Dec 9, 2019 at 2:31 PM Crossbow  wrote:
>
>
> Arrow Build Report for Job nightly-2019-12-09-0
>
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0
>
> Failed Tasks:
> - centos-8:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-centos-8
> - gandiva-jar-osx:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-travis-gandiva-jar-osx
> - test-conda-python-3.7-hdfs-2.9.2:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-hdfs-2.9.2
> - test-ubuntu-fuzzit-fuzzing:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-ubuntu-fuzzit-fuzzing
> - test-ubuntu-fuzzit-regression:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-ubuntu-fuzzit-regression
> - ubuntu-bionic:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-ubuntu-bionic
>
> Succeeded Tasks:
> - centos-6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-centos-6
> - centos-7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-centos-7
> - conda-linux-gcc-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-linux-gcc-py27
> - conda-linux-gcc-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-linux-gcc-py37
> - conda-osx-clang-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-osx-clang-py27
> - conda-osx-clang-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-osx-clang-py37
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-win-vs2015-py37
> - debian-buster:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-debian-buster
> - debian-stretch:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-debian-stretch
> - gandiva-jar-trusty:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-travis-gandiva-jar-trusty
> - homebrew-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-travis-homebrew-cpp
> - macos-r-autobrew:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-travis-macos-r-autobrew
> - test-conda-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-cpp
> - test-conda-python-2.7-pandas-latest:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-2.7-pandas-latest
> - test-conda-python-2.7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-2.7
> - test-conda-python-3.6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.6
> - test-conda-python-3.7-dask-latest:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-dask-latest
> - test-conda-python-3.7-pandas-latest:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-pandas-latest
> - test-conda-python-3.7-pandas-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-pandas-master
> - test-conda-python-3.7-spark-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-spark-master
> - test-conda-python-3.7-turbodbc-latest:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-turbodbc-latest
> - test-conda-python-3.7-turbodbc-master:
>   URL: 
> 

[jira] [Created] (ARROW-7355) [CI] Environment variables are defined twice for the fuzzit builds

2019-12-09 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7355:

 Summary: [CI] Environment variables are defined twice for the 
fuzzit builds
 Key: ARROW-7355
 URL: https://issues.apache.org/jira/browse/ARROW-7355
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


This change broke the fuzzit builds:
https://github.com/apache/arrow/commit/7102d7eeef60fd0fb4fb7b06ed092d63db961b15#diff-4e5e90c6228fd48698d074241c2ba760R916

The environment variables should be merged.
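
As a rough illustration of what "merged" means here, the sketch below uses
plain Python dicts. It assumes one plausible failure mode that the report does
not spell out: when the same key is defined twice, the second definition
replaces the first instead of extending it. The variable names and the
merge_env helper are hypothetical and are not taken from the Arrow CI
configuration.

```python
def merge_env(*blocks):
    """Merge several environment mappings; later blocks win on conflicts."""
    merged = {}
    for block in blocks:
        merged.update(block)
    return merged


# Hypothetical variable names, for illustration only.
base_env = {"BUILD_TYPE": "debug", "USE_ASAN": "ON"}
fuzzit_env = {"FUZZIT_TARGET": "regression"}

# Assumed failure mode: the second definition replaces the first one,
# so the base settings are silently lost.
replaced = fuzzit_env

# Merging keeps the settings from both definitions.
merged = merge_env(base_env, fuzzit_env)
print(sorted(merged))  # ['BUILD_TYPE', 'FUZZIT_TARGET', 'USE_ASAN']
```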





[jira] [Created] (ARROW-7354) [C++] TestHadoopFileSystem::ThreadSafety fails with sigabort

2019-12-09 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7354:

 Summary: [C++] TestHadoopFileSystem::ThreadSafety fails with 
sigabort
 Key: ARROW-7354
 URL: https://issues.apache.org/jira/browse/ARROW-7354
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Krisztian Szucs
Assignee: Antoine Pitrou


The regression has been introduced recently:
https://github.com/ursa-labs/crossbow/branches/all?utf8=✓=hdfs
Most certainly with commit:
https://github.com/apache/arrow/commit/6758b24fdd4525dda0f9b2760d016753015a948d

{code}
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f9988f69801 in __GI_abort () at abort.c:79
#2  0x7f998654abf5 in os::abort(bool) () from 
/opt/conda/envs/arrow/jre/lib/amd64/server/libjvm.so
#3  0x7f99866dce03 in VMError::report_and_die() () from 
/opt/conda/envs/arrow/jre/lib/amd64/server/libjvm.so
#4  0x7f9986551622 in JVM_handle_linux_signal () from 
/opt/conda/envs/arrow/jre/lib/amd64/server/libjvm.so
#5  0x7f9986546c93 in signalHandler(int, siginfo*, void*) () from 
/opt/conda/envs/arrow/jre/lib/amd64/server/libjvm.so
#6  
#7  0x55f59f107870 in arrow::Buffer::data (this=0x726162) at 
/arrow/cpp/src/arrow/buffer.h:181
#8  0x55f59f121bbe in 
arrow::io::TestHadoopFileSystem_ThreadSafety_Test::TestBody()::{lambda()#1}::operator()()
 const (__closure=0x55f5a03a1ba8) at /arrow/cpp/src/arrow/io/hdfs_test.cc:470
#9  0x55f59f130d76 in std::__invoke_impl::TestBody()::{lambda()#1}>(std::__invoke_other,
 arrow::io::TestHadoopFileSystem_ThreadSafety_Test::TestBody()::{lambda()#1}&&) (__f=...) at 
/opt/conda/envs/arrow/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/invoke.h:60
#10 0x55f59f12f9ee in 
std::__invoke::TestBody()::{lambda()#1}>(std::__invoke_result&&,
 (arrow::io::TestHadoopFileSystem_ThreadSafety_Test::T
estBody()::{lambda()#1}&&)...) (__fn=...) at 
/opt/conda/envs/arrow/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/bits/invoke.h:95
#11 0x55f59f134148 in 
std::thread::_Invoker::TestBody()::{lambda()#1}>
 >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x55f5a03a1ba8)
at 
/opt/conda/envs/arrow/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/thread:234
#12 0x55f59f1340f5 in 
std::thread::_Invoker::TestBody()::{lambda()#1}>
 >::operator()() (this=0x55f5a03a1ba8)
at 
/opt/conda/envs/arrow/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/thread:243
#13 0x55f59f1340b4 in 
std::thread::_State_impl::TestBody()::{lambda()#1}>
 > >::_M_run() (this=0x55f5a03a1ba0)
at 
/opt/conda/envs/arrow/x86_64-conda_cos6-linux-gnu/include/c++/7.3.0/thread:186
#14 0x7f99893e2163 in std::execute_native_thread_routine 
(__p=0x55f5a03a1ba0)
at 
/home/conda/feedstock_root/build_artifacts/ctng-compilers_1574978377740/work/.build/x86_64-conda_cos6-linux-gnu/src/gcc/libstdc++-v3/src/c++11/thread.cc:80
#15 0x7f99894956db in start_thread (arg=0x7f99039ff700) at 
pthread_create.c:463
#16 0x7f998904a88f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
{code}





[NIGHTLY] Arrow Build Report for Job nightly-2019-12-09-0

2019-12-09 Thread Crossbow


Arrow Build Report for Job nightly-2019-12-09-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0

Failed Tasks:
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-centos-8
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-travis-gandiva-jar-osx
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-ubuntu-fuzzit-fuzzing:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-ubuntu-fuzzit-fuzzing
- test-ubuntu-fuzzit-regression:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-ubuntu-fuzzit-regression
- ubuntu-bionic:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-ubuntu-bionic

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-centos-7
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-linux-gcc-py37
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-osx-clang-py37
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-conda-win-vs2015-py37
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-09-0-circle-test-conda-python-3.8-dask-master
- test-conda-python-3.8-pandas-latest:
  URL: 

[jira] [Created] (ARROW-7353) [C++] Disable -Wmissing-braces when building with clang

2019-12-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7353:

 Summary: [C++] Disable -Wmissing-braces when building with clang
 Key: ARROW-7353
 URL: https://issues.apache.org/jira/browse/ARROW-7353
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


I found that this warning fails the build with Xcode 8.3.3. The advice seems to 
be to ignore it:

https://stackoverflow.com/questions/13905200/is-it-wise-to-ignore-gcc-clangs-wmissing-braces-warning





[jira] [Created] (ARROW-7352) Refactor Parquet Code Samples to use Result APIs

2019-12-09 Thread Gal Lushi (Jira)
Gal Lushi created ARROW-7352:


 Summary: Refactor Parquet Code Samples to use Result APIs
 Key: ARROW-7352
 URL: https://issues.apache.org/jira/browse/ARROW-7352
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Gal Lushi


Currently, the Parquet code samples use the (now deprecated by ARROW-7235) 
`Status`-returning functions.

A PR will be opened shortly.





Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2019-12-09 Thread Antoine Pitrou


Right, I'll give it a try in a few days.

Best regards

Antoine.


Le 09/12/2019 à 12:46, Wes McKinney a écrit :
> While it's unfortunate to have to re-examine some basic design issues
> at this stage, I agree with Jacques's point that it would be nice if
> we can accommodate (without great hardship) the use case where a
> stream/pipeline of record batches are passed in C that does not
> require the called function to have to parse or validate the schema
> each time. Gandiva uses its own data structure [1] for passing a
> schemaless record batch across JNI and in theory this could be
> replaced by the C data structure
> 
> [1]: https://github.com/apache/arrow/blob/master/cpp/src/gandiva/eval_batch.h
> 
> On Sun, Dec 8, 2019 at 8:09 PM Fan Liya  wrote:
>>
>> +1, as this is useful IMO.
>>
>> Best,
>> Liya Fan
>>
>> On Sat, Dec 7, 2019 at 12:21 PM Jacques Nadeau  wrote:
>>
>>> -1 (binding)
>>>
>>> I'm voting -1 on this. I posted the thinking why on the PR. The high-level
>>> is that I think it needs to better address the pipelined use case as right
>>> now it fails to support that at all and has too much weight to ignore that
>>> use case.
>>>
>>> I actually would have posted it here but totally missed this vote thread
>>> until just now (I'm traveling atm). My -1 is not an indefinite -1, I'm
>>> simply asking for some small changes to the approach to also support the
>>> pipelined usage pattern.
>>>
>>> On Sat, Dec 7, 2019 at 3:09 AM Wes McKinney  wrote:
>>>
>>>> Hello,
>>>>
>>>> Could more PMC members take a look at this work?
>>>>
>>>> Thank you
>>>>
>>>> On Tue, Dec 3, 2019 at 1:50 PM Neal Richardson
>>>> wrote:
>>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> On Tue, Dec 3, 2019 at 10:56 AM Wes McKinney
>>>> wrote:
>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>> On Tue, Dec 3, 2019 at 12:54 PM Wes McKinney
>>>> wrote:
>>>>>>>
>>>>>>> hello,
>>>>>>>
>>>>>>> We have been discussing the creation of a minimalist C-based data
>>>>>>> interface for applications to exchange Arrow columnar data
>>>>>>> structures
>>>>>>> with each other. Some notable features of this interface include:
>>>>>>>
>>>>>>> * A small amount of header-only C code can be copied into
>>>>>>> downstream
>>>>>>> applications, no external dependencies are needed (notable, it is
>>>>>>> not
>>>>>>> required to use Flatbuffers, though there are trade-offs resulting
>>>>>>> from this)
>>>>>>> * Low development investment (in other words: limited-scope use
>>>>>>> cases
>>>>>>> can be accomplished with little code). Enable C libraries to export
>>>>>>> Arrow columnar data at C call sites with minimal code
>>>>>>>
>>>>>>> This "C Data Interface" serves different use cases from the
>>>>>>> language-independent IPC protocol and trades away a number of
>>>> features
>>>>>>> (such as forward/backward compatibility) in the interest of
>>>> minimalism
>>>>>>> / simplicity. It is not a replacement for the IPC protocol and will
>>>>>>> only be used to interchange in-process data at C call sites.
>>>>>>>
>>>>>>> The PR providing the specification is here
>>>>>>>
>>>>>>> https://github.com/apache/arrow/pull/5442
>>>>>>>
>>>>>>> A fairly comprehensive C++ implementation of this demonstrating its
>>>>>>> use is found here
>>>>>>>
>>>>>>> https://github.com/apache/arrow/pull/5608
>>>>>>>
>>>>>>> (note that other applications implementing the interface may choose
>>>> to
>>>>>>> only support a few features and thus have far less code to write)
>>>>>>>
>>>>>>> Please vote to adopt the SPECIFICATION (GitHub PR #5442).
>>>>>>>
>>>>>>> This vote will be open for at least 72 hours
>>>>>>>
>>>>>>> [ ] +1 Adopt C Data Interface specification
>>>>>>> [ ] +0
>>>>>>> [ ] -1 Do not adopt because...
>>>>>>>
>>>>>>> Thank you
>>>>>>
>>>>
>>>


Re: Human-readable version of Arrow Schema?

2019-12-09 Thread Wes McKinney
The only "canonical" representation of schemas at the moment is the
Flatbuffers data structure [1]

Having a human-readable/parseable text representation I think only
makes sense if it is offered without any backward/forward
compatibility guarantees.

Note I had previously opened
https://issues.apache.org/jira/browse/ARROW-3730 where I noted that
there's no way (aside from generating the Flatbuffers messages) to
generate a schema representation that can be used later to reconstruct
a schema in a program. If such a representation were human
readable/editable that seems beneficial.
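
To make the idea of a text form that "can be used later to reconstruct a
schema" concrete, here is a minimal sketch using only the Python standard
library. Everything in it is hypothetical: the (name, type, nullable) tuple,
the JSON layout and both function names are made up for illustration and are
neither an Arrow API nor a proposed format. The type strings are plain labels.

```python
import json
from typing import List, Tuple

# Hypothetical in-memory field description: (name, type label, nullable).
Field = Tuple[str, str, bool]


def schema_to_text(fields: List[Field]) -> str:
    """Serialize the fields to an editable JSON document."""
    return json.dumps(
        [{"name": n, "type": t, "nullable": nullable}
         for n, t, nullable in fields],
        indent=2,
    )


def schema_from_text(text: str) -> List[Field]:
    """Rebuild the field list from the text produced above."""
    return [(f["name"], f["type"], f["nullable"]) for f in json.loads(text)]


fields = [("id", "int64", False), ("label", "utf8", True)]
text = schema_to_text(fields)
print(text)

# The text form round-trips back to the same field list.
assert schema_from_text(text) == fields
```

A real representation would also have to cover nested types, metadata and
dictionaries, which is where the compatibility question above starts to
matter.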



[1]: https://github.com/apache/arrow/blob/master/format/Schema.fbs

On Sat, Dec 7, 2019 at 11:56 AM Maarten Ballintijn  wrote:
>
>
> Is there a syntax specified for schemas?
>
> Cheers,
> Maarten.
>
>
> > On Dec 6, 2019, at 5:01 PM, Micah Kornfield  wrote:
> >
> > Hi Christian,
> > As far as I know no-one is working on a canonical text representation for
> > schemas.  A JSON serializer exists for integration test purposes, but
> > IMO it shouldn't be relied upon as canonical.
> >
> > It looks like Flatbuffers supports serialization to/from JSON [1],
> > using that functionality might be a promising avenue to pursue for a human
> > readable schema. I could see adding a helper method someplace under IPC for
> > this.  Would that meet your needs?  I think if there are other
> > requirements, then a proposal would be welcome.  Ideally, a solution would
> > not require additional build/runtime dependencies.
> >
> >
> > Thanks,
> > Micah
> >
> > [1] See Text & schema parsing
> > https://google.github.io/flatbuffers/flatbuffers_guide_use_cpp.html
> >
> > On Fri, Dec 6, 2019 at 1:26 PM Christian Hudon  wrote:
> >
> >> Hi,
> >>
> >> For the uses I would like to make of Arrow, I would need a human-readable
> >> and -writable version of an Arrow Schema, that could be converted to and
> >> from the Arrow Schema C++ object. Going through the doc for 0.15.1, I don't
> >> see anything to that effect, with the closest being the ToString() method
> >> on DataType instances, but which is meant for debugging only. (I need an
> >> expression of an Arrow Schema that people can read, and that can live
> >> outside of the code for a particular operation.)
> >>
> >> Is a text representation of an Arrow Schema something that is being worked
> >> on now? If not, would you folks be interested in me putting up an initial
> >> proposal for discussion? Any design constraints I should pay attention to,
> >> then?
> >>
> >> Thanks,
> >>
> >>  Christian
> >> --
> >>
> >>
> >> │ Christian Hudon
> >>
> >> │ Applied Research Scientist
> >>
> >>   Element AI, 6650 Saint-Urbain #500
> >>
> >>   Montréal, QC, H2S 3G9, Canada
> >>   Elementai.com
> >>
>


[ANNOUNCE] New Arrow committer: Joris van den Bossche

2019-12-09 Thread Wes McKinney
On behalf of the Arrow PMC, I'm happy to announce that Joris has
accepted an invitation to become a committer on Apache Arrow.

Welcome, and thank you for your contributions!


Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2019-12-09 Thread Wes McKinney
While it's unfortunate to have to re-examine some basic design issues
at this stage, I agree with Jacques's point that it would be nice if
we can accommodate (without great hardship) the use case where a
stream/pipeline of record batches are passed in C that does not
require the called function to have to parse or validate the schema
each time. Gandiva uses its own data structure [1] for passing a
schemaless record batch across JNI and in theory this could be
replaced by the C data structure

[1]: https://github.com/apache/arrow/blob/master/cpp/src/gandiva/eval_batch.h
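
The pipelined use case can be pictured with a small Python sketch. It is not
the C Data Interface and not Gandiva's eval_batch structure; BatchConsumer and
its methods are hypothetical names. The only point it makes is that the schema
is parsed and validated once per stream, and every later batch arrives without
a schema and gets only a cheap shape check.

```python
from typing import List, Sequence


class BatchConsumer:
    """Hypothetical consumer that validates the schema once per stream."""

    def __init__(self, schema: Sequence[str]) -> None:
        self.validations = 0
        self.rows_seen = 0
        self.column_names = self._validate(schema)

    def _validate(self, schema: Sequence[str]) -> List[str]:
        # Stand-in for the comparatively expensive schema parsing step.
        self.validations += 1
        names = list(schema)
        if len(set(names)) != len(names):
            raise ValueError("duplicate column names")
        return names

    def consume(self, columns: Sequence[Sequence[int]]) -> None:
        # Schemaless batch: only compare its shape with the stored schema.
        if len(columns) != len(self.column_names):
            raise ValueError("batch does not match the stream schema")
        self.rows_seen += len(columns[0]) if columns else 0


consumer = BatchConsumer(["x", "y"])
for batch in ([[1, 2], [3, 4]], [[5], [6]]):
    consumer.consume(batch)

print(consumer.validations, consumer.rows_seen)  # 1 3
```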

On Sun, Dec 8, 2019 at 8:09 PM Fan Liya  wrote:
>
> +1, as this is useful IMO.
>
> Best,
> Liya Fan
>
> On Sat, Dec 7, 2019 at 12:21 PM Jacques Nadeau  wrote:
>
> > -1 (binding)
> >
> > I'm voting -1 on this. I posted the thinking why on the PR. The high-level
> > is that I think it needs to better address the pipelined use case as right
> > now it fails to support that at all and has too much weight to ignore that
> > use case.
> >
> > I actually would have posted it here but totally missed this vote thread
> > until just now (I'm traveling atm). My -1 is not an indefinite -1, I'm
> > simply asking for some small changes to the approach to also support the
> > pipelined usage pattern.
> >
> > On Sat, Dec 7, 2019 at 3:09 AM Wes McKinney  wrote:
> >
> > > Hello,
> > >
> > > Could more PMC members take a look at this work?
> > >
> > > Thank you
> > >
> > > On Tue, Dec 3, 2019 at 1:50 PM Neal Richardson
> > >  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > On Tue, Dec 3, 2019 at 10:56 AM Wes McKinney 
> > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Tue, Dec 3, 2019 at 12:54 PM Wes McKinney 
> > > wrote:
> > > > > >
> > > > > > hello,
> > > > > >
> > > > > > We have been discussing the creation of a minimalist C-based data
> > > > > > interface for applications to exchange Arrow columnar data
> > structures
> > > > > > with each other. Some notable features of this interface include:
> > > > > >
> > > > > > * A small amount of header-only C code can be copied into
> > downstream
> > > > > > applications, no external dependencies are needed (notable, it is
> > not
> > > > > > required to use Flatbuffers, though there are trade-offs resulting
> > > > > > from this)
> > > > > > * Low development investment (in other words: limited-scope use
> > cases
> > > > > > can be accomplished with little code). Enable C libraries to export
> > > > > > Arrow columnar data at C call sites with minimal code
> > > > > >
> > > > > > This "C Data Interface" serves different use cases from the
> > > > > > language-independent IPC protocol and trades away a number of
> > > features
> > > > > > (such as forward/backward compatibility) in the interest of
> > > minimalism
> > > > > > / simplicity. It is not a replacement for the IPC protocol and will
> > > > > > only be used to interchange in-process data at C call sites.
> > > > > >
> > > > > > The PR providing the specification is here
> > > > > >
> > > > > > https://github.com/apache/arrow/pull/5442
> > > > > >
> > > > > > A fairly comprehensive C++ implementation of this demonstrating its
> > > > > > use is found here
> > > > > >
> > > > > > https://github.com/apache/arrow/pull/5608
> > > > > >
> > > > > > (note that other applications implementing the interface may choose
> > > to
> > > > > > only support a few features and thus have far less code to write)
> > > > > >
> > > > > > Please vote to adopt the SPECIFICATION (GitHub PR #5442).
> > > > > >
> > > > > > This vote will be open for at least 72 hours
> > > > > >
> > > > > > [ ] +1 Adopt C Data Interface specification
> > > > > > [ ] +0
> > > > > > [ ] -1 Do not adopt because...
> > > > > >
> > > > > > Thank you
> > > > >
> > >
> >


[jira] [Created] (ARROW-7351) [Developer] Only suggest cpp-* fix versions when merging Parquet patches

2019-12-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7351:

 Summary: [Developer] Only suggest cpp-* fix versions when merging 
Parquet patches
 Key: ARROW-7351
 URL: https://issues.apache.org/jira/browse/ARROW-7351
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Wes McKinney
 Fix For: 1.0.0


The default fix version (1.11.0) for Parquet issues is resulting in the wrong 
fix version being set sometimes when committers merge PARQUET-* patches. Since 
we only have C++ Parquet issues, I think we can safely only suggest the cpp-* 
fix versions





[jira] [Created] (ARROW-7350) [Python] Parquet file metadata min and max statistics not decoded from bytes for Decimal data types

2019-12-09 Thread Max Firman (Jira)
Max Firman created ARROW-7350:

 Summary: [Python] Parquet file metadata min and max statistics not 
decoded from bytes for Decimal data types
 Key: ARROW-7350
 URL: https://issues.apache.org/jira/browse/ARROW-7350
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.15.1
Reporter: Max Firman


Parquet file metadata for Decimal type columns contains min and max values that 
are not decoded from bytes into Decimals. This causes issues in dependent 
libraries like Dask (see [https://github.com/dask/dask/issues/5647]).

 
{code:python|title=Reproducible example|borderStyle=solid}
from decimal import Decimal
import random

import pandas as pd
import pyarrow.parquet as pq
import pyarrow as pa

NUM_DATA_POINTS_PER_PARTITION = 25

random.seed(0)
data1 = [{"col1": Decimal(f"{random.randint(0, 999)}.{random.randint(0, 99)}")} 
for i in range(NUM_DATA_POINTS_PER_PARTITION)]

df = pd.DataFrame(data1)
table = pa.Table.from_pandas(df)
pq.write_table(table, 'my_data.parquet')

parquet_file = pq.ParquetFile('my_data.parquet')

# AssertionError on the next line because min has type bytes rather than Decimal
assert isinstance(parquet_file.metadata.row_group(0).column(0).statistics.min, Decimal)
assert isinstance(parquet_file.metadata.row_group(0).column(0).statistics.max, Decimal)

{code}
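
For reference, a possible way to decode such bytes by hand is sketched below.
It relies on the Parquet convention that a DECIMAL stored as binary or
fixed-length bytes holds the unscaled value as a big-endian two's-complement
integer. That the column in this report uses that physical type is an
assumption, the scale has to come from the column's type metadata, and
decode_decimal is a hypothetical helper, not a pyarrow function.

```python
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode a big-endian two's-complement unscaled value into a Decimal."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a (sign, digits, exponent) tuple is exact and does not
    # depend on the precision of the current decimal context.
    return Decimal((sign, digits, -scale))


print(decode_decimal(b"\x30\x39", scale=2))  # 0x3039 == 12345, prints 123.45
```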
 

 

 





[jira] [Created] (ARROW-7349) [C++] Fix the bug of parsing string hex values

2019-12-09 Thread Liya Fan (Jira)
Liya Fan created ARROW-7349:

 Summary: [C++] Fix the bug of parsing string hex values
 Key: ARROW-7349
 URL: https://issues.apache.org/jira/browse/ARROW-7349
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Liya Fan
Assignee: Liya Fan


std::lower_bound can return the end of the search range when it fails to find 
a match. 

The end of the search range is one position past the last valid position, so 
the value at this position is undefined, and we should not dereference it to 
compare it with the target value. 
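
The same pattern can be shown with Python's bisect.bisect_left, which plays the
role of std::lower_bound here. This is an analogy only, not the Arrow C++ code:
HEX_DIGITS and hex_digit_value are hypothetical names. The returned position is
checked against the length of the range before the element is read.

```python
from bisect import bisect_left

# Sorted search range (ASCII order: digits come before upper-case letters).
HEX_DIGITS = list("0123456789ABCDEF")


def hex_digit_value(ch: str) -> int:
    """Return the numeric value of an upper-case hex digit."""
    pos = bisect_left(HEX_DIGITS, ch)
    # pos == len(HEX_DIGITS) is the "end of range" result: there is nothing
    # to read at that position, so test it before touching HEX_DIGITS[pos].
    if pos == len(HEX_DIGITS) or HEX_DIGITS[pos] != ch:
        raise ValueError(f"not a hex digit: {ch!r}")
    return pos


print(hex_digit_value("A"))  # 10
```

In Python an unchecked HEX_DIGITS[pos] would raise IndexError for an input such
as "G"; in C++ the equivalent read through the end iterator is undefined
behavior, which is why the order of the two checks matters.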


