[jira] [Updated] (ARROW-8932) [C++] symbol resolution failures with liborc.a

2020-05-24 Thread Kazuaki Ishizaki (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated ARROW-8932:

Description: 
This is failing in the Travis CI s390x build. I am not sure whether this is 
related to ARROW-8930.

[https://travis-ci.org/github/apache/arrow/jobs/690006107] was successful.

[https://travis-ci.org/github/apache/arrow/jobs/690634108#L1023|https://travis-ci.org/github/apache/arrow/jobs/690634108]
 fails with the following errors:
{code:java}
[435/548] Linking CXX executable debug/arrow-orc-adapter-test
1024 FAILED: debug/arrow-orc-adapter-test
1025 : && /usr/bin/ccache /usr/bin/c++  -Wno-noexcept-type  
-fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
-Wno-sign-conversion -Wno-unused-variable -Werror  -g  -rdynamic 
src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o  
-o debug/arrow-orc-adapter-test  -Wl,-rpath,/build/cpp/debug  
debug/libarrow_testing.a  debug/libarrow.a  debug//libgtest_maind.so  
debug//libgtestd.so  /usr/lib/s390x-linux-gnu/libsnappy.so.1.1.8  
/usr/lib/s390x-linux-gnu/liblz4.so  /usr/lib/s390x-linux-gnu/libz.so  -lpthread 
 -ldl  orc_ep-install/lib/liborc.a  /usr/lib/s390x-linux-gnu/libssl.so  
/usr/lib/s390x-linux-gnu/libcrypto.so  /usr/lib/s390x-linux-gnu/libbrotlienc.so 
 /usr/lib/s390x-linux-gnu/libbrotlidec.so  
/usr/lib/s390x-linux-gnu/libbrotlicommon.so  /usr/lib/s390x-linux-gnu/libbz2.so 
 /usr/lib/s390x-linux-gnu/libzstd.so  protobuf_ep-install/lib/libprotobuf.a  
/usr/lib/s390x-linux-gnu/libglog.so  
jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a  -pthread  -lrt 
&& :
1026 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibCompressionStream::doStreamingCompression()':
1027 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:244: undefined 
reference to `deflateReset'
1028 /usr/bin/ld: 
/build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:266: undefined 
reference to `deflate'
1029 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibCompressionStream::init()':
1030 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:296: undefined 
reference to `deflateInit2_'
1031 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibCompressionStream::end()':
1032 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:303: undefined 
reference to `deflateEnd'
1033 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibDecompressionStream::ZlibDecompressionStream(std::unique_ptr >, unsigned long, 
orc::MemoryPool&)':
1034 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:417: undefined 
reference to `inflateInit2_'
1035 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibDecompressionStream::~ZlibDecompressionStream()':
1036 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:442: undefined 
reference to `inflateEnd'
1037 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibDecompressionStream::Next(void const**, int*)':
1038 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:483: undefined 
reference to `inflateReset'
1039 /usr/bin/ld: 
/build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:489: undefined 
reference to `inflate'
1040 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::SnappyDecompressionStream::decompress(char const*, unsigned long, char*, 
unsigned long)':
1041 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:848: undefined 
reference to `snappy::GetUncompressedLength(char const*, unsigned long, 
unsigned long*)'
1042 /usr/bin/ld: 
/build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:856: undefined 
reference to `snappy::RawUncompress(char const*, unsigned long, char*)'
1043 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::Lz4DecompressionStream::decompress(char const*, unsigned long, char*, 
unsigned long)':
1044 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:922: undefined 
reference to `LZ4_decompress_safe'
1045 collect2: error: ld returned 1 exit status
{code}
 


[jira] [Commented] (ARROW-8930) [C++] libz.so linking error with liborc.a

2020-05-24 Thread Kazuaki Ishizaki (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115672#comment-17115672
 ] 

Kazuaki Ishizaki commented on ARROW-8930:
-

According to 
[https://stackoverflow.com/questions/19901934/libpthread-so-0-error-adding-symbols-dso-missing-from-command-line],
 IMHO, the proposed order makes sense.

The question is why this failure did not occur three days ago.
[https://travis-ci.org/github/apache/arrow/jobs/690006106]

> [C++] libz.so linking error with liborc.a
> -
>
> Key: ARROW-8930
> URL: https://issues.apache.org/jira/browse/ARROW-8930
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> This is failing in the Travis CI ARM build
> https://travis-ci.org/github/apache/arrow/jobs/690722203
> {code}
> : && /usr/bin/ccache /usr/bin/c++  -Wno-noexcept-type  
> -fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
> -Wno-sign-conversion -Wno-unused-variable -Werror -march=armv8-a  -g  
> -rdynamic 
> src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o
>   -o debug/arrow-orc-adapter-test  -Wl,-rpath,/build/cpp/debug  
> debug/libarrow_testing.a  debug/libarrow.a  debug//libgtest_maind.so  
> debug//libgtestd.so  /usr/lib/aarch64-linux-gnu/libsnappy.so.1.1.8  
> /usr/lib/aarch64-linux-gnu/liblz4.so  /usr/lib/aarch64-linux-gnu/libz.so  
> -lpthread  -ldl  orc_ep-install/lib/liborc.a  
> /usr/lib/aarch64-linux-gnu/libssl.so  /usr/lib/aarch64-linux-gnu/libcrypto.so 
>  /usr/lib/aarch64-linux-gnu/libbrotlienc.so  
> /usr/lib/aarch64-linux-gnu/libbrotlidec.so  
> /usr/lib/aarch64-linux-gnu/libbrotlicommon.so  
> /usr/lib/aarch64-linux-gnu/libbz2.so  /usr/lib/aarch64-linux-gnu/libzstd.so  
> /usr/lib/aarch64-linux-gnu/libprotobuf.so  
> /usr/lib/aarch64-linux-gnu/libglog.so  
> jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a  -pthread  
> -lrt && :
> /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): undefined 
> reference to symbol 'inflateEnd'
> /usr/bin/ld: /usr/lib/aarch64-linux-gnu/libz.so: error adding symbols: DSO 
> missing from command line
> collect2: error: ld returned 1 exit status
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8932) [C++] symbol resolution failures with liborc.a

2020-05-24 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-8932:
---

 Summary: [C++] symbol resolution failures with liborc.a
 Key: ARROW-8932
 URL: https://issues.apache.org/jira/browse/ARROW-8932
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Kazuaki Ishizaki


This is failing in the Travis CI s390x build. I am not sure whether this is 
related to ARROW-8930.

[https://travis-ci.org/github/apache/arrow/jobs/690006107] was successful.

[https://travis-ci.org/github/apache/arrow/jobs/690634108#L1023|https://travis-ci.org/github/apache/arrow/jobs/690634108]
 fails with the following errors:


{code:java}
[435/548] Linking CXX executable debug/arrow-orc-adapter-test
1024 FAILED: debug/arrow-orc-adapter-test
1025 : && /usr/bin/ccache /usr/bin/c++  -Wno-noexcept-type  
-fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
-Wno-sign-conversion -Wno-unused-variable -Werror  -g  -rdynamic 
src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o  
-o debug/arrow-orc-adapter-test  -Wl,-rpath,/build/cpp/debug  
debug/libarrow_testing.a  debug/libarrow.a  debug//libgtest_maind.so  
debug//libgtestd.so  /usr/lib/s390x-linux-gnu/libsnappy.so.1.1.8  
/usr/lib/s390x-linux-gnu/liblz4.so  /usr/lib/s390x-linux-gnu/libz.so  -lpthread 
 -ldl  orc_ep-install/lib/liborc.a  /usr/lib/s390x-linux-gnu/libssl.so  
/usr/lib/s390x-linux-gnu/libcrypto.so  /usr/lib/s390x-linux-gnu/libbrotlienc.so 
 /usr/lib/s390x-linux-gnu/libbrotlidec.so  
/usr/lib/s390x-linux-gnu/libbrotlicommon.so  /usr/lib/s390x-linux-gnu/libbz2.so 
 /usr/lib/s390x-linux-gnu/libzstd.so  protobuf_ep-install/lib/libprotobuf.a  
/usr/lib/s390x-linux-gnu/libglog.so  
jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a  -pthread  -lrt 
&& :
1026 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibCompressionStream::doStreamingCompression()':
1027 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:244: undefined 
reference to `deflateReset'
1028 /usr/bin/ld: 
/build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:266: undefined 
reference to `deflate'
1029 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibCompressionStream::init()':
1030 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:296: undefined 
reference to `deflateInit2_'
1031 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibCompressionStream::end()':
1032 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:303: undefined 
reference to `deflateEnd'
1033 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibDecompressionStream::ZlibDecompressionStream(std::unique_ptr >, unsigned long, 
orc::MemoryPool&)':
1034 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:417: undefined 
reference to `inflateInit2_'
1035 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibDecompressionStream::~ZlibDecompressionStream()':
1036 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:442: undefined 
reference to `inflateEnd'
1037 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::ZlibDecompressionStream::Next(void const**, int*)':
1038 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:483: undefined 
reference to `inflateReset'
1039 /usr/bin/ld: 
/build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:489: undefined 
reference to `inflate'
1040 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::SnappyDecompressionStream::decompress(char const*, unsigned long, char*, 
unsigned long)':
1041 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:848: undefined 
reference to `snappy::GetUncompressedLength(char const*, unsigned long, 
unsigned long*)'
1042 /usr/bin/ld: 
/build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:856: undefined 
reference to `snappy::RawUncompress(char const*, unsigned long, char*)'
1043 /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): in function 
`orc::Lz4DecompressionStream::decompress(char const*, unsigned long, char*, 
unsigned long)':
1044 /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Compression.cc:922: undefined 
reference to `LZ4_decompress_safe'
1045 collect2: error: ld returned 1 exit status
{code}





[jira] [Updated] (ARROW-8931) [Rust] Support lexical sort in arrow compute kernel

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8931:
--
Labels: pull-request-available  (was: )

> [Rust] Support lexical sort in arrow compute kernel
> ---
>
> Key: ARROW-8931
> URL: https://issues.apache.org/jira/browse/ARROW-8931
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: QP Hou
>Assignee: QP Hou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-8931) [Rust] Support lexical sort in arrow compute kernel

2020-05-24 Thread QP Hou (Jira)
QP Hou created ARROW-8931:
-

 Summary: [Rust] Support lexical sort in arrow compute kernel
 Key: ARROW-8931
 URL: https://issues.apache.org/jira/browse/ARROW-8931
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: QP Hou
Assignee: QP Hou








[jira] [Resolved] (ARROW-8633) [C++] Add ValidateAscii function

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-8633.
-
Resolution: Fixed

Issue resolved by pull request 7121
[https://github.com/apache/arrow/pull/7121]

> [C++] Add ValidateAscii function
> 
>
> Key: ARROW-8633
> URL: https://issues.apache.org/jira/browse/ARROW-8633
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Yuqi Gu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In some cases, we want to be able to check whether it's safe to use functions 
> that assume ASCII (like {{std::tolower}} or {{std::string::substr}}). This was 
> implemented in a PR for ARROW-6131 that was not merged.
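
A minimal sketch of the kind of check described above, written in Python purely for illustration. It is not the Arrow C++ implementation, and the name `validate_ascii` is hypothetical. It relies on the fact that a buffer is pure ASCII exactly when no byte has its high bit set, so OR-ing all bytes together and testing bit 7 is sufficient.

```python
from functools import reduce
from operator import or_


def validate_ascii(data: bytes) -> bool:
    """Return True if every byte in `data` is 7-bit ASCII (hypothetical helper)."""
    # OR all bytes together; any byte >= 0x80 leaves the high bit set.
    combined = reduce(or_, data, 0)
    return (combined & 0x80) == 0


print(validate_ascii(b"hello"))                  # True
print(validate_ascii("héllo".encode("utf-8")))   # False
```

Accumulating with OR and testing once at the end avoids a branch per byte, which is presumably why this style of check is attractive before calling ASCII-only routines.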





[jira] [Commented] (ARROW-8930) [C++] libz.so linking error with liborc.a

2020-05-24 Thread Yibo Cai (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115632#comment-17115632
 ] 

Yibo Cai commented on ARROW-8930:
-

I remember seeing the same error when building Flight RPC some months ago; the 
workaround was to append "-lz". In the current link command, libz.so is listed 
before liborc.a; I think it should come after it.
I will look into this bug.
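
To illustrate why the position of libz.so relative to liborc.a can matter, here is a toy Python model of a single left-to-right pass over a link line. It is only a sketch under stated assumptions: it treats each archive as one unit, it assumes shared libraries are dropped when nothing seen so far needs them (the behaviour of GNU ld's `--as-needed`, which may or may not be in effect in the CI toolchain), and the symbol name `orc_symbol` is a placeholder. It does not try to reproduce the exact diagnostics printed by ld.

```python
def unresolved_symbols(inputs):
    """Simulate one left-to-right pass over the link line.

    Each input is (name, kind, defines, references), where kind is one of
    "object", "archive", or "shared". Returns the symbols left undefined.
    """
    undefined = set()
    defined = set()
    for name, kind, defines, references in inputs:
        defines, references = set(defines), set(references)
        if kind != "object" and not (defines & undefined):
            # Archives and (as-needed) shared libraries are skipped when
            # nothing seen so far needs a symbol they provide.
            continue
        defined |= defines
        undefined -= defines
        if kind != "shared":
            # Code linked into the executable brings its own references.
            undefined |= references - defined
    return undefined


# Placeholder inputs: "orc_symbol" stands for any ORC function the test uses.
test_o = ("adapter_test.cc.o", "object", [], ["orc_symbol"])
liborc = ("liborc.a", "archive", ["orc_symbol"], ["inflateEnd"])
libz = ("libz.so", "shared", ["inflateEnd"], [])

print(sorted(unresolved_symbols([test_o, libz, liborc])))  # ['inflateEnd']
print(sorted(unresolved_symbols([test_o, liborc, libz])))  # []
```

In this model, listing libz.so after liborc.a resolves `inflateEnd`, which matches the ordering suggested in the comment above.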

> [C++] libz.so linking error with liborc.a
> -
>
> Key: ARROW-8930
> URL: https://issues.apache.org/jira/browse/ARROW-8930
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> This is failing in the Travis CI ARM build
> https://travis-ci.org/github/apache/arrow/jobs/690722203
> {code}
> : && /usr/bin/ccache /usr/bin/c++  -Wno-noexcept-type  
> -fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
> -Wno-sign-conversion -Wno-unused-variable -Werror -march=armv8-a  -g  
> -rdynamic 
> src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o
>   -o debug/arrow-orc-adapter-test  -Wl,-rpath,/build/cpp/debug  
> debug/libarrow_testing.a  debug/libarrow.a  debug//libgtest_maind.so  
> debug//libgtestd.so  /usr/lib/aarch64-linux-gnu/libsnappy.so.1.1.8  
> /usr/lib/aarch64-linux-gnu/liblz4.so  /usr/lib/aarch64-linux-gnu/libz.so  
> -lpthread  -ldl  orc_ep-install/lib/liborc.a  
> /usr/lib/aarch64-linux-gnu/libssl.so  /usr/lib/aarch64-linux-gnu/libcrypto.so 
>  /usr/lib/aarch64-linux-gnu/libbrotlienc.so  
> /usr/lib/aarch64-linux-gnu/libbrotlidec.so  
> /usr/lib/aarch64-linux-gnu/libbrotlicommon.so  
> /usr/lib/aarch64-linux-gnu/libbz2.so  /usr/lib/aarch64-linux-gnu/libzstd.so  
> /usr/lib/aarch64-linux-gnu/libprotobuf.so  
> /usr/lib/aarch64-linux-gnu/libglog.so  
> jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a  -pthread  
> -lrt && :
> /usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): undefined 
> reference to symbol 'inflateEnd'
> /usr/bin/ld: /usr/lib/aarch64-linux-gnu/libz.so: error adding symbols: DSO 
> missing from command line
> collect2: error: ld returned 1 exit status
> {code}





[jira] [Created] (ARROW-8930) [C++] libz.so linking error with liborc.a

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8930:
---

 Summary: [C++] libz.so linking error with liborc.a
 Key: ARROW-8930
 URL: https://issues.apache.org/jira/browse/ARROW-8930
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Continuous Integration
Reporter: Wes McKinney
 Fix For: 1.0.0


This is failing in the Travis CI ARM build

https://travis-ci.org/github/apache/arrow/jobs/690722203

{code}
: && /usr/bin/ccache /usr/bin/c++  -Wno-noexcept-type  
-fdiagnostics-color=always -ggdb -O0  -Wall -Wno-conversion 
-Wno-sign-conversion -Wno-unused-variable -Werror -march=armv8-a  -g  -rdynamic 
src/arrow/adapters/orc/CMakeFiles/arrow-orc-adapter-test.dir/adapter_test.cc.o  
-o debug/arrow-orc-adapter-test  -Wl,-rpath,/build/cpp/debug  
debug/libarrow_testing.a  debug/libarrow.a  debug//libgtest_maind.so  
debug//libgtestd.so  /usr/lib/aarch64-linux-gnu/libsnappy.so.1.1.8  
/usr/lib/aarch64-linux-gnu/liblz4.so  /usr/lib/aarch64-linux-gnu/libz.so  
-lpthread  -ldl  orc_ep-install/lib/liborc.a  
/usr/lib/aarch64-linux-gnu/libssl.so  /usr/lib/aarch64-linux-gnu/libcrypto.so  
/usr/lib/aarch64-linux-gnu/libbrotlienc.so  
/usr/lib/aarch64-linux-gnu/libbrotlidec.so  
/usr/lib/aarch64-linux-gnu/libbrotlicommon.so  
/usr/lib/aarch64-linux-gnu/libbz2.so  /usr/lib/aarch64-linux-gnu/libzstd.so  
/usr/lib/aarch64-linux-gnu/libprotobuf.so  
/usr/lib/aarch64-linux-gnu/libglog.so  
jemalloc_ep-prefix/src/jemalloc_ep/dist//lib/libjemalloc_pic.a  -pthread  -lrt 
&& :
/usr/bin/ld: orc_ep-install/lib/liborc.a(Compression.cc.o): undefined reference 
to symbol 'inflateEnd'
/usr/bin/ld: /usr/lib/aarch64-linux-gnu/libz.so: error adding symbols: DSO 
missing from command line
collect2: error: ld returned 1 exit status
{code}





[jira] [Updated] (ARROW-8926) [C++] Improve docstrings in new public APIs in arrow/compute and fix miscellaneous typos

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8926:
--
Labels: pull-request-available  (was: )

> [C++] Improve docstrings in new public APIs in arrow/compute and fix 
> miscellaneous typos
> 
>
> Key: ARROW-8926
> URL: https://issues.apache.org/jira/browse/ARROW-8926
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've noticed some imprecise language while reading the headers and some other 
> opportunities for improvement





[jira] [Created] (ARROW-8929) [C++] Change compute::Arity::VarArgs min_args default to 0

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8929:
---

 Summary: [C++] Change compute::Arity::VarArgs min_args default to 0
 Key: ARROW-8929
 URL: https://issues.apache.org/jira/browse/ARROW-8929
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


The issue of minimum number of arguments is separate from providing an 
{{InputType}} for input type checking. 





[jira] [Created] (ARROW-8928) [C++] Measure microperformance associated with data structure access interactions with arrow::compute::ExecBatch

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8928:
---

 Summary: [C++] Measure microperformance associated with data 
structure access interactions with arrow::compute::ExecBatch
 Key: ARROW-8928
 URL: https://issues.apache.org/jira/browse/ARROW-8928
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


{{arrow::compute::ExecBatch}} uses a vector of {{arrow::Datum}} to contain a 
collection of ArrayData and Scalar objects for kernel execution. It would be 
helpful to know how many nanoseconds of overhead are associated with basic 
interactions with this data structure, so that we know the cost of using our 
vendored variant, among other such issues. 





[jira] [Updated] (ARROW-8927) [C++] Support dictionary memos when reading/writing record batches using cuda IPC

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8927:
--
Labels: pull-request-available  (was: )

> [C++] Support dictionary memos when reading/writing record batches using cuda 
> IPC
> -
>
> Key: ARROW-8927
> URL: https://issues.apache.org/jira/browse/ARROW-8927
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Alex Baden
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the cuda IPC calls for `ReadRecordBatch` do not accept a 
> dictionary memo as a parameter, building an empty memo before calling 
> `ipc::ReadRecordBatch`. As such, adding a test which duplicates 
> `TestCudaArrowIpc, BasicWriteRead` but calls `MakeDictionary` to create the 
> record batch fails because the dictionaries for the original record batch are 
> not found. The cuda IPC calls should be modified to take a dictionary memo, 
> just like the standard IPC API.





[jira] [Created] (ARROW-8927) [C++] Support dictionary memos when reading/writing record batches using cuda IPC

2020-05-24 Thread Alex Baden (Jira)
Alex Baden created ARROW-8927:
-

 Summary: [C++] Support dictionary memos when reading/writing 
record batches using cuda IPC
 Key: ARROW-8927
 URL: https://issues.apache.org/jira/browse/ARROW-8927
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Alex Baden


Currently, the cuda IPC calls for `ReadRecordBatch` do not accept a dictionary 
memo as a parameter, building an empty memo before calling 
`ipc::ReadRecordBatch`. As such, adding a test which duplicates 
`TestCudaArrowIpc, BasicWriteRead` but calls `MakeDictionary` to create the 
record batch fails because the dictionaries for the original record batch are 
not found. The cuda IPC calls should be modified to take a dictionary memo, 
just like the standard IPC API.
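
The failure mode described above can be sketched with a toy model in Python. Everything here is hypothetical and only borrows the term "dictionary memo" from the description: the `DictionaryMemo` class, `read_batch`, and `read_batch_without_memo` are illustrative stand-ins, not the Arrow C++ or CUDA IPC signatures. The sketch assumes a memo is simply a lookup from dictionary id to dictionary values.

```python
class DictionaryMemo:
    """Toy stand-in: maps a dictionary id to its list of values."""

    def __init__(self):
        self._dictionaries = {}

    def add(self, dict_id, values):
        self._dictionaries[dict_id] = list(values)

    def get(self, dict_id):
        if dict_id not in self._dictionaries:
            raise KeyError(f"dictionary {dict_id} not found in memo")
        return self._dictionaries[dict_id]


def read_batch(encoded_batch, memo):
    """Decode {column: (dict_id, indices)} using dictionaries from `memo`."""
    return {
        column: [memo.get(dict_id)[i] for i in indices]
        for column, (dict_id, indices) in encoded_batch.items()
    }


def read_batch_without_memo(encoded_batch):
    """Mimics the behaviour described above: always start from an empty memo."""
    return read_batch(encoded_batch, DictionaryMemo())


memo = DictionaryMemo()
memo.add(0, ["a", "b"])
batch = {"col": (0, [1, 0, 1])}

print(read_batch(batch, memo))  # {'col': ['b', 'a', 'b']}
try:
    read_batch_without_memo(batch)
except KeyError as exc:
    print("failed:", exc)
```

Passing the caller's memo through, as `read_batch` does, is the shape of the change the description asks for.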





[jira] [Created] (ARROW-8926) [C++] Improve docstrings in new public APIs in arrow/compute and fix miscellaneous typos

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8926:
---

 Summary: [C++] Improve docstrings in new public APIs in 
arrow/compute and fix miscellaneous typos
 Key: ARROW-8926
 URL: https://issues.apache.org/jira/browse/ARROW-8926
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 1.0.0


I've noticed some imprecise language while reading the headers and some other 
opportunities for improvement





[jira] [Updated] (ARROW-8911) [C++] Slicing a ChunkedArray with zero chunks segfaults

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8911:
--
Labels: pull-request-available  (was: )

> [C++] Slicing a ChunkedArray with zero chunks segfaults
> ---
>
> Key: ARROW-8911
> URL: https://issues.apache.org/jira/browse/ARROW-8911
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.17.1
> Environment: macOS, ubuntu
>Reporter: A. Coady
>Assignee: Wes McKinney
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code:python}
> import pyarrow as pa
> arr = pa.chunked_array([[1]])
> empty = arr.filter(pa.array([False]))
> print(empty)
> print(empty[:]) # <- crash
> {code}
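
The issue title attributes the crash to slicing a ChunkedArray that has zero chunks. As a purely illustrative model, and not the Arrow implementation, the following Python sketch slices a list of chunks as if it were one concatenated sequence and handles the zero-chunk case without touching a first chunk. The function name `slice_chunks` and the list-of-lists representation are assumptions made for this example.

```python
def slice_chunks(chunks, offset=0, length=None):
    """Slice a list of chunks as if they were one concatenated sequence."""
    total = sum(len(chunk) for chunk in chunks)
    if offset < 0 or offset > total:
        raise IndexError("slice offset out of bounds")
    available = total - offset
    remaining = available if length is None else min(length, available)
    result = []
    for chunk in chunks:  # with zero chunks the loop body never runs
        if remaining <= 0:
            break
        if offset >= len(chunk):
            # The slice starts after this chunk; skip it entirely.
            offset -= len(chunk)
            continue
        piece = chunk[offset:offset + remaining]
        result.append(piece)
        remaining -= len(piece)
        offset = 0
    return result


print(slice_chunks([]))                        # []
print(slice_chunks([[1, 2, 3], [4, 5]], 2, 2))  # [[3], [4]]
```

The point of the sketch is only that the empty case needs no special access to `chunks[0]`; how the real fix handles the element type of an empty result is not covered here.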





[jira] [Updated] (ARROW-8911) [C++] Slicing a ChunkedArray with zero chunks segfaults

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8911:

Component/s: C++

> [C++] Slicing a ChunkedArray with zero chunks segfaults
> ---
>
> Key: ARROW-8911
> URL: https://issues.apache.org/jira/browse/ARROW-8911
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Affects Versions: 0.17.1
> Environment: macOS, ubuntu
>Reporter: A. Coady
>Assignee: Wes McKinney
>Priority: Critical
> Fix For: 1.0.0
>
>
> {code:python}
> import pyarrow as pa
> arr = pa.chunked_array([[1]])
> empty = arr.filter(pa.array([False]))
> print(empty)
> print(empty[:]) # <- crash
> {code}





[jira] [Updated] (ARROW-8911) [C++] Slicing a ChunkedArray with zero chunks segfaults

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8911:

Summary: [C++] Slicing a ChunkedArray with zero chunks segfaults  (was: 
[Python] An empty ChunkedArray created by `filter` can crash.)

> [C++] Slicing a ChunkedArray with zero chunks segfaults
> ---
>
> Key: ARROW-8911
> URL: https://issues.apache.org/jira/browse/ARROW-8911
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.17.1
> Environment: macOS, ubuntu
>Reporter: A. Coady
>Assignee: Wes McKinney
>Priority: Critical
> Fix For: 1.0.0
>
>
> {code:python}
> import pyarrow as pa
> arr = pa.chunked_array([[1]])
> empty = arr.filter(pa.array([False]))
> print(empty)
> print(empty[:]) # <- crash
> {code}





[jira] [Commented] (ARROW-8911) [Python] An empty ChunkedArray created by `filter` can crash.

2020-05-24 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115473#comment-17115473
 ] 

Wes McKinney commented on ARROW-8911:
-

Thanks for the bug report. I'm opening a PR presently.

> [Python] An empty ChunkedArray created by `filter` can crash.
> -
>
> Key: ARROW-8911
> URL: https://issues.apache.org/jira/browse/ARROW-8911
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.17.1
> Environment: macOS, ubuntu
>Reporter: A. Coady
>Assignee: Wes McKinney
>Priority: Critical
> Fix For: 1.0.0
>
>
> {code:python}
> import pyarrow as pa
> arr = pa.chunked_array([[1]])
> empty = arr.filter(pa.array([False]))
> print(empty)
> print(empty[:]) # <- crash
> {code}





[jira] [Assigned] (ARROW-8911) [Python] An empty ChunkedArray created by `filter` can crash.

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-8911:
---

Assignee: Wes McKinney

> [Python] An empty ChunkedArray created by `filter` can crash.
> -
>
> Key: ARROW-8911
> URL: https://issues.apache.org/jira/browse/ARROW-8911
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.17.1
> Environment: macOS, ubuntu
>Reporter: A. Coady
>Assignee: Wes McKinney
>Priority: Critical
> Fix For: 1.0.0
>
>
> {code:python}
> import pyarrow as pa
> arr = pa.chunked_array([[1]])
> empty = arr.filter(pa.array([False]))
> print(empty)
> print(empty[:]) # <- crash
> {code}





[jira] [Commented] (ARROW-8916) [Python] Add relevant glue for implementing each kind of FunctionOptions

2020-05-24 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115469#comment-17115469
 ] 

Wes McKinney commented on ARROW-8916:
-

I implemented a couple of wrappers already. Given that we're dealing with 
Cython it might be difficult for it to be much more than a manual wrapping 
affair, but in any case someone can take a closer look at what I've done and 
see if we want to do something different

> [Python] Add relevant glue for implementing each kind of FunctionOptions
> 
>
> Key: ARROW-8916
> URL: https://issues.apache.org/jira/browse/ARROW-8916
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Updated] (ARROW-8907) [Rust] implement scalar comparison operations

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8907:
--
Labels: pull-request-available  (was: )

> [Rust] implement scalar comparison operations
> -
>
> Key: ARROW-8907
> URL: https://issues.apache.org/jira/browse/ARROW-8907
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Reporter: Yordan Pavlov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently comparing an array to a scalar / literal value using the comparison 
> operations defined in the comparison kernel here:
> https://github.com/apache/arrow/blob/master/rust/arrow/src/compute/kernels/comparison.rs
> is very inefficient because:
> (1) an array with the scalar value repeated has to be created, taking time 
> and wasting memory
> (2) time is spent during comparison to load the same literal values over and 
> over
> Initial benchmarking of a specialized scalar comparison function indicates 
> good performance gains:
> eq Float32 time: [938.54 us 950.28 us 962.65 us]
> eq scalar Float32 time: [836.47 us 838.47 us 840.78 us]
> eq Float32 simd time: [75.836 us 76.389 us 77.185 us]
> eq scalar Float32 simd time: [61.551 us 61.605 us 61.671 us]
> The benchmark results above show that the scalar comparison function is about 
> 12% faster for non-SIMD and about 20% faster for SIMD comparison operations.
> And this is before accounting for creating the literal array. 
> In a more complex benchmark, the scalar comparison version is about 40% 
> faster overall when we account for not having to create arrays of scalar / 
> literal values.
> Here are the benchmark results:
> filter/filter with arrow SIMD (array) time: [647.77 us 675.12 us 706.69 us]
> filter/filter with arrow SIMD (scalar) time: [402.19 us 404.23 us 407.22 us]
> And here is the code for the benchmark:
> https://github.com/yordan-pavlov/arrow-benchmark/blob/master/rust/arrow_benchmark/src/main.rs#L230
> My only concern is that I can't see an easy way to use scalar comparison 
> operations in DataFusion, as it is currently designed to work only on arrays.
> [~paddyhoran] [~andygrove]  let me know what you think, would there be value 
> in implementing scalar comparison operations?
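
The percentages quoted in the description can be re-derived from the timings. This sketch assumes the three numbers on each benchmark line are lower bound, point estimate and upper bound (the usual layout of Criterion output; the issue does not name the tool), and it uses only the middle value. `percent_faster` is a hypothetical helper.

```python
def percent_faster(baseline_us: float, candidate_us: float) -> float:
    """Relative reduction in run time, as a percentage of the baseline."""
    return (baseline_us - candidate_us) / baseline_us * 100.0


# Middle values copied from the benchmark lines in the issue description:
# (array version, scalar version), both in microseconds.
results = {
    "eq Float32 (non-SIMD)": (950.28, 838.47),
    "eq Float32 (SIMD)": (76.389, 61.605),
    "filter with arrow SIMD": (675.12, 404.23),
}

for name, (array_us, scalar_us) in results.items():
    print(f"{name}: {percent_faster(array_us, scalar_us):.1f}% faster")
```

On these figures the non-SIMD and filter cases match the roughly 12% and 40% stated in the issue, while the SIMD case comes out slightly above 19%, a little under the quoted 20%.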





[jira] [Resolved] (ARROW-8924) [C++][Gandiva] castDATE_date32() may cause overflow

2020-05-24 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-8924.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7260
[https://github.com/apache/arrow/pull/7260]

> [C++][Gandiva] castDATE_date32() may cause overflow
> ---
>
> Key: ARROW-8924
> URL: https://issues.apache.org/jira/browse/ARROW-8924
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following code in `cpp/src/gandiva/precompiled/time.cc` may cause 
> overflow since `int32` * `int32` is `int32`, then it is converted to `int64`. 
> The `int32` result may lose part of the result of the multiplication.
>  
> {code:java}
> gdv_date64 castDATE_date32(gdv_date32 days) { return days * MILLIS_IN_DAY; } 
> {code}





[jira] [Assigned] (ARROW-8924) [C++][Gandiva] castDATE_date32() may cause overflow

2020-05-24 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou reassigned ARROW-8924:
---

Assignee: Kazuaki Ishizaki

> [C++][Gandiva] castDATE_date32() may cause overflow
> ---
>
> Key: ARROW-8924
> URL: https://issues.apache.org/jira/browse/ARROW-8924
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following code in `cpp/src/gandiva/precompiled/time.cc` may cause 
> overflow since `int32` * `int32` is `int32`, then it is converted to `int64`. 
> The `int32` result may lose part of the result of the multiplication.
>  
> {code:java}
> gdv_date64 castDATE_date32(gdv_date32 days) { return days * MILLIS_IN_DAY; } 
> {code}





[jira] [Resolved] (ARROW-7778) [C++] Support nested dictionaries in JSON integration format

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-7778.
-
Resolution: Fixed

Issue resolved by pull request 7216
[https://github.com/apache/arrow/pull/7216]

> [C++] Support nested dictionaries in JSON integration format
> 
>
> Key: ARROW-7778
> URL: https://issues.apache.org/jira/browse/ARROW-7778
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The {{generate_nested_dictionary_case}} is disabled for all library 
> implementations. We support dictionaries-within-dictionaries in IPC (I 
> believe) and so need to integration test this





[jira] [Resolved] (ARROW-8913) [Ruby] Use "field" instead of "child"

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-8913.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7255
[https://github.com/apache/arrow/pull/7255]

> [Ruby] Use "field" instead of "child"
> -
>
> Key: ARROW-8913
> URL: https://issues.apache.org/jira/browse/ARROW-8913
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Ruby
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-7843) [Ruby] MSYS2 packages needed for Gandiva

2020-05-24 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115438#comment-17115438
 ] 

Wes McKinney commented on ARROW-7843:
-

FYI some of us are working actively on "interpreted" execution of expressions 
without an LLVM dependency (see recent PR 
https://github.com/apache/arrow/commit/7ad49eeca5215d9b2a56b6439f1bd6ea3ea9)

> [Ruby] MSYS2 packages needed for Gandiva
> 
>
> Key: ARROW-7843
> URL: https://issues.apache.org/jira/browse/ARROW-7843
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
>Affects Versions: 0.16.0
> Environment: windows with rubyinstaller
>Reporter: Dominic Sisneros
>Assignee: Kouhei Sutou
>Priority: Major
>
> {noformat}
> require "gandiva"
> table = Arrow::Table.new(:field1 => Arrow::Int32Array.new([1, 2, 3, 4]),
>  :field2 => Arrow::Int32Array.new([11, 13, 15, 17]))
> schema = table.schema
> expression1 = schema.build_expression do |record|
>   record.field1 + record.field2
> end
> expression2 = schema.build_expression do |record, context|
>   context.if(record.field1 > record.field2)
> .then(record.field1 / record.field2)
> .else(record.field1)
> end
> projector = Gandiva::Projector.new(schema, [expression1, expression2])
> table.each_record_batch do |record_batch|
>   outputs = projector.evaluate(record_batch)
>   puts outputs.collect(&:values)
> end
> C:\Users\Dominic E Sisneros\source\repos\ruby\try_arrow>ruby gandiva_test2.rb
> Traceback (most recent call last):
> 2: from gandiva_test2.rb:1:in `'
> 1: from 
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in 
> `require'
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in 
> `require': cannot load such file -- gandiva (LoadError)
> 9: from gandiva_test2.rb:1:in `'
> 8: from 
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:156:in 
> `require'
> 7: from 
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:168:in 
> `rescue in require'
> 6: from 
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:168:in 
> `require'
> 5: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva.rb:24:in
>  `'
> 4: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva.rb:28:in
>  `'
> 3: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva/loader.rb:22:in
>  `load'
> 2: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:25:in
>  `load'
> 1: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:37:in
>  `load'
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:37:in
>  `require': Typelib file for namespace 'Gandiva' (any version) not found 
> (GObjectIntrospection::RepositoryError::TypelibNotFound)
> {noformat}





[jira] [Resolved] (ARROW-8923) [C++] Improve usability of arrow::compute::CallFunction by moving ExecContext* argument to end and adding default

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-8923.
-
Resolution: Fixed

Issue resolved by pull request 7259
[https://github.com/apache/arrow/pull/7259]

> [C++] Improve usability of arrow::compute::CallFunction by moving 
> ExecContext* argument to end and adding default
> -
>
> Key: ARROW-8923
> URL: https://issues.apache.org/jira/browse/ARROW-8923
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-7843) [Ruby] MSYS2 packages needed for Gandiva

2020-05-24 Thread Dominic Sisneros (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-7843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115434#comment-17115434
 ] 

Dominic Sisneros commented on ARROW-7843:
-

Any update on Gandiva support? I want to try to create a Sequel-like dataset 
API for Arrow and would like to use Gandiva to compile the expressions.

> [Ruby] MSYS2 packages needed for Gandiva
> 
>
> Key: ARROW-7843
> URL: https://issues.apache.org/jira/browse/ARROW-7843
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
>Affects Versions: 0.16.0
> Environment: windows with rubyinstaller
>Reporter: Dominic Sisneros
>Assignee: Kouhei Sutou
>Priority: Major
>
> {noformat}
> require "gandiva"
> table = Arrow::Table.new(:field1 => Arrow::Int32Array.new([1, 2, 3, 4]),
>  :field2 => Arrow::Int32Array.new([11, 13, 15, 17]))
> schema = table.schema
> expression1 = schema.build_expression do |record|
>   record.field1 + record.field2
> end
> expression2 = schema.build_expression do |record, context|
>   context.if(record.field1 > record.field2)
> .then(record.field1 / record.field2)
> .else(record.field1)
> end
> projector = Gandiva::Projector.new(schema, [expression1, expression2])
> table.each_record_batch do |record_batch|
>   outputs = projector.evaluate(record_batch)
>   puts outputs.collect(&:values)
> end
> C:\Users\Dominic E Sisneros\source\repos\ruby\try_arrow>ruby gandiva_test2.rb
> Traceback (most recent call last):
> 2: from gandiva_test2.rb:1:in `'
> 1: from 
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in 
> `require'
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:92:in 
> `require': cannot load such file -- gandiva (LoadError)
> 9: from gandiva_test2.rb:1:in `'
> 8: from 
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:156:in 
> `require'
> 7: from 
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:168:in 
> `rescue in require'
> 6: from 
> c:/Ruby27-x64/lib/ruby/2.7.0/rubygems/core_ext/kernel_require.rb:168:in 
> `require'
> 5: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva.rb:24:in
>  `'
> 4: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva.rb:28:in
>  `'
> 3: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/red-gandiva-0.16.0/lib/gandiva/loader.rb:22:in
>  `load'
> 2: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:25:in
>  `load'
> 1: from 
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:37:in
>  `load'
> c:/Ruby27-x64/lib/ruby/gems/2.7.0/gems/gobject-introspection-3.4.1/lib/gobject-introspection/loader.rb:37:in
>  `require': Typelib file for namespace 'Gandiva' (any version) not found 
> (GObjectIntrospection::RepositoryError::TypelibNotFound)
> {noformat}





[jira] [Updated] (ARROW-8925) [Rust] [DataFusion] CsvExec::schema() returns incorrect results

2020-05-24 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated ARROW-8925:
--
Description: CsvExec::schema() returns the underlying CSV schema and not 
the projected schema. Also, the documentation for the CsvExec schema field is 
incorrect.  (was: CsvExec::schema() returns then underlying CSV schema and not 
the projected schema. Also, the documentation for the CsvExec schema field is 
incorrect.)

> [Rust] [DataFusion] CsvExec::schema() returns incorrect results
> ---
>
> Key: ARROW-8925
> URL: https://issues.apache.org/jira/browse/ARROW-8925
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Rust, Rust - DataFusion
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.0.0
>
>
> CsvExec::schema() returns the underlying CSV schema and not the projected 
> schema. Also, the documentation for the CsvExec schema field is incorrect.





[jira] [Created] (ARROW-8925) [Rust] [DataFusion] CsvExec::schema() returns incorrect results

2020-05-24 Thread Andy Grove (Jira)
Andy Grove created ARROW-8925:
-

 Summary: [Rust] [DataFusion] CsvExec::schema() returns incorrect 
results
 Key: ARROW-8925
 URL: https://issues.apache.org/jira/browse/ARROW-8925
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust, Rust - DataFusion
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


CsvExec::schema() returns then underlying CSV schema and not the projected 
schema. Also, the documentation for the CsvExec schema field is incorrect.





[jira] [Updated] (ARROW-8924) [C++][Gandiva] castDATE_date32() may cause overflow

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8924:
--
Labels: pull-request-available  (was: )

> [C++][Gandiva] castDATE_date32() may cause overflow
> ---
>
> Key: ARROW-8924
> URL: https://issues.apache.org/jira/browse/ARROW-8924
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following code in `cpp/src/gandiva/precompiled/time.cc` may cause 
> overflow since `int32` * `int32` is `int32`, then it is converted to `int64`. 
> The `int32` result may lose part of the result of the multiplication.
>  
> {code:java}
> gdv_date64 castDATE_date32(gdv_date32 days) { return days * MILLIS_IN_DAY; } 
> {code}





[jira] [Updated] (ARROW-8923) [C++] Improve usability of arrow::compute::CallFunction by moving ExecContext* argument to end and adding default

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8923:
--
Labels: pull-request-available  (was: )

> [C++] Improve usability of arrow::compute::CallFunction by moving 
> ExecContext* argument to end and adding default
> -
>
> Key: ARROW-8923
> URL: https://issues.apache.org/jira/browse/ARROW-8923
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-8924) [C++][Gandiva] castDATE_date32() may cause overflow

2020-05-24 Thread Kazuaki Ishizaki (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated ARROW-8924:

Description: 
The following code in `cpp/src/gandiva/precompiled/time.cc` may cause overflow 
since `int32` * `int32` is `int32`, then it is converted to `int64`. The `int32` 
result may lose part of the result of the multiplication.

 
{code:java}
gdv_date64 castDATE_date32(gdv_date32 days) { return days * MILLIS_IN_DAY; } 
{code}

  was:
The following code in `cpp/src/gandiva/precompiled/time.cc` may cause overflow 
since `int32` * `int32` is `int32`, then it is converted to `int64`.

 
{code:java}
gdv_date64 castDATE_date32(gdv_date32 days) { return days * MILLIS_IN_DAY; } 
{code}


> [C++][Gandiva] castDATE_date32() may cause overflow
> ---
>
> Key: ARROW-8924
> URL: https://issues.apache.org/jira/browse/ARROW-8924
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Gandiva
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> The following code in `cpp/src/gandiva/precompiled/time.cc` may cause 
> overflow since `int32` * `int32` is `int32`, then it is converted to `int64`. 
> The `int32` result may lose part of the result of the multiplication.
>  
> {code:java}
> gdv_date64 castDATE_date32(gdv_date32 days) { return days * MILLIS_IN_DAY; } 
> {code}





[jira] [Assigned] (ARROW-8914) [C++][Gandiva] Decimal128 related test failed on big-endian platforms

2020-05-24 Thread Kazuaki Ishizaki (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki reassigned ARROW-8914:
---

Assignee: Kazuaki Ishizaki

> [C++][Gandiva] Decimal128 related test failed on big-endian platforms
> -
>
> Key: ARROW-8914
> URL: https://issues.apache.org/jira/browse/ARROW-8914
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> These test failures in gandiva tests occur on big-endian platforms. An 
> example from https://travis-ci.org/github/apache/arrow/jobs/690006107#L2306
> {code}
> ...
> [==] 17 tests from 1 test case ran. (2334 ms total)
> [  PASSED  ] 7 tests.
> [  FAILED  ] 10 tests, listed below:
> [  FAILED  ] TestDecimal.TestSimple
> [  FAILED  ] TestDecimal.TestLiteral
> [  FAILED  ] TestDecimal.TestCompare
> [  FAILED  ] TestDecimal.TestRoundFunctions
> [  FAILED  ] TestDecimal.TestCastFunctions
> [  FAILED  ] TestDecimal.TestIsDistinct
> [  FAILED  ] TestDecimal.TestCastVarCharDecimal
> [  FAILED  ] TestDecimal.TestCastDecimalVarChar
> [  FAILED  ] TestDecimal.TestVarCharDecimalNestedCast
> [  FAILED  ] TestDecimal.TestCastDecimalOverflow
> ...
> {code}





[jira] [Created] (ARROW-8924) [C++][Gandiva] castDATE_date32() may cause overflow

2020-05-24 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-8924:
---

 Summary: [C++][Gandiva] castDATE_date32() may cause overflow
 Key: ARROW-8924
 URL: https://issues.apache.org/jira/browse/ARROW-8924
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Gandiva
Reporter: Kazuaki Ishizaki


The following code in `cpp/src/gandiva/precompiled/time.cc` may cause overflow 
since `int32` * `int32` is `int32`, then it is converted to `int64`.

 
{code:java}
gdv_date64 castDATE_date32(gdv_date32 days) { return days * MILLIS_IN_DAY; } 
{code}
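
A small sketch of the arithmetic behind this report. It assumes `gdv_date32` is a 32-bit signed integer and that `MILLIS_IN_DAY` is 86,400,000 with `int` type; neither definition is shown in the issue. Python integers never overflow, so `wrap_int32` is a hypothetical helper that emulates two's-complement wraparound. In C++ signed overflow is undefined behaviour, so wraparound is only the typical outcome, not a guaranteed one. Widening before the multiplication is one conventional remedy and is not necessarily the change made in the eventual fix.

```python
MILLIS_IN_DAY = 86_400_000  # assumed value: 24 * 60 * 60 * 1000


def wrap_int32(value: int) -> int:
    """Emulate two's-complement wraparound of a 32-bit signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


def cast_date_narrow(days: int) -> int:
    """Multiply in 32 bits first, then widen (the pattern in the report)."""
    return wrap_int32(days * MILLIS_IN_DAY)


def cast_date_wide(days: int) -> int:
    """Widen before multiplying, so the product is exact."""
    return days * MILLIS_IN_DAY  # Python ints are arbitrary precision


# Largest day count whose product still fits in a signed 32-bit integer.
MAX_SAFE_DAYS = (2**31 - 1) // MILLIS_IN_DAY

for days in (MAX_SAFE_DAYS, MAX_SAFE_DAYS + 1):
    print(days, cast_date_narrow(days), cast_date_wide(days))
```

Under these assumptions the narrow version is only correct for dates within 24 days of the epoch, which is why the widening has to happen before the multiplication rather than after it.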





[jira] [Updated] (ARROW-8918) [C++] Add cast "metafunction" to FunctionRegistry that addresses dispatching to appropriate type-specific CastFunction

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8918:
--
Labels: pull-request-available  (was: )

> [C++] Add cast "metafunction" to FunctionRegistry that addresses dispatching 
> to appropriate type-specific CastFunction
> --
>
> Key: ARROW-8918
> URL: https://issues.apache.org/jira/browse/ARROW-8918
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> By setting the output type in {{CastOptions}}, we can write
> {code}
> call_function("cast", [arg], cast_options)
> {code}
> This simplifies use of casting for binding developers. This mimics the 
> standard SQL
> {code}
> CAST(expr AS target_type)
> {code}
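
The dispatch idea can be sketched in a few lines. This is an illustration only and not Arrow's implementation: `CastOptions.to_type`, `CAST_FUNCTIONS` and `REGISTRY` are hypothetical names, types are represented as plain strings, and arrays as Python lists. Only the `call_function("cast", [arg], cast_options)` call shape comes from the issue.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class CastOptions:
    to_type: str  # hypothetical field naming the requested output type


# Hypothetical type-specific cast functions, keyed by target type.
CAST_FUNCTIONS: Dict[str, Callable[[List[Any]], List[Any]]] = {
    "int64": lambda values: [int(v) for v in values],
    "float64": lambda values: [float(v) for v in values],
    "utf8": lambda values: [str(v) for v in values],
}


def cast_metafunction(args: List[Any], options: CastOptions) -> List[Any]:
    """Pick the type-specific cast from the options, then delegate to it."""
    (values,) = args
    try:
        cast = CAST_FUNCTIONS[options.to_type]
    except KeyError:
        raise NotImplementedError(f"no cast to {options.to_type}") from None
    return cast(values)


# The registry exposes the metafunction under a single name.
REGISTRY: Dict[str, Callable[..., Any]] = {"cast": cast_metafunction}


def call_function(name: str, args: List[Any], options: Optional[Any] = None) -> Any:
    return REGISTRY[name](args, options)


print(call_function("cast", [["1", "2"]], CastOptions("int64")))
```

The caller needs to know only the name "cast" and the target type; which concrete function runs is decided inside the registry entry.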





[jira] [Assigned] (ARROW-8923) [C++] Improve usability of arrow::compute::CallFunction by moving ExecContext* argument to end and adding default

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-8923:
---

Assignee: Wes McKinney

> [C++] Improve usability of arrow::compute::CallFunction by moving 
> ExecContext* argument to end and adding default
> -
>
> Key: ARROW-8923
> URL: https://issues.apache.org/jira/browse/ARROW-8923
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>






[jira] [Created] (ARROW-8923) [C++] Improve usability of arrow::compute::CallFunction by moving ExecContext* argument to end and adding default

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8923:
---

 Summary: [C++] Improve usability of arrow::compute::CallFunction 
by moving ExecContext* argument to end and adding default
 Key: ARROW-8923
 URL: https://issues.apache.org/jira/browse/ARROW-8923
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0








[jira] [Assigned] (ARROW-8918) [C++] Add cast "metafunction" to FunctionRegistry that addresses dispatching to appropriate type-specific CastFunction

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-8918:
---

Assignee: Wes McKinney

> [C++] Add cast "metafunction" to FunctionRegistry that addresses dispatching 
> to appropriate type-specific CastFunction
> --
>
> Key: ARROW-8918
> URL: https://issues.apache.org/jira/browse/ARROW-8918
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> By setting the output type in {{CastOptions}}, we can write
> {code}
> call_function("cast", [arg], cast_options)
> {code}
> This simplifies use of casting for binding developers. This mimics the 
> standard SQL
> {code}
> CAST(expr AS target_type)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8922) [C++] Implement example string scalar kernel function to assist with string kernels buildout per ARROW-555

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8922:
---

 Summary: [C++] Implement example string scalar kernel function to 
assist with string kernels buildout per ARROW-555
 Key: ARROW-8922
 URL: https://issues.apache.org/jira/browse/ARROW-8922
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


I will write a patch to provide an example of creating a string-input 
string-output kernel for executing scalar-valued string functions





[jira] [Created] (ARROW-8921) [C++] Add "TypeResolver" class interface to replace current OutputType::Resolver pattern

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8921:
---

 Summary: [C++] Add "TypeResolver" class interface to replace 
current OutputType::Resolver pattern
 Key: ARROW-8921
 URL: https://issues.apache.org/jira/browse/ARROW-8921
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


Like the {{TypeMatcher}} for extensible input type checking, TypeResolver will 
allow more flexibility with respect to the output type resolution rule. 
Currently the resolver function is defined as

{code}
using Resolver =
  std::function<Result<ValueDescr>(KernelContext*, const std::vector<ValueDescr>&)>;
{code}

By changing to a {{TypeResolver}} interface with a virtual Resolve function, we 
also can provide for better human-readability when printing kernel signatures 
(by having {{TypeResolver::ToString}}) and permitting TypeResolvers to be 
compared
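
A rough sketch of what such an interface buys, written in Python for brevity even though the proposal concerns a C++ class. The method names `resolve`, `to_string` and `equals`, the two concrete resolvers, and the use of strings for types are all assumptions made for illustration; the issue names only `TypeResolver`, a virtual Resolve function and `TypeResolver::ToString`.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class TypeResolver(ABC):
    """Output-type rule as an object instead of an opaque callable."""

    @abstractmethod
    def resolve(self, input_types: Sequence[str]) -> str:
        """Return the output type for the given input types."""

    @abstractmethod
    def to_string(self) -> str:
        """Human-readable form, usable when printing a kernel signature."""

    def equals(self, other: object) -> bool:
        # Same concrete class and same configuration means the same rule.
        return type(self) is type(other) and vars(self) == vars(other)


class FixedType(TypeResolver):
    """Always resolves to one predetermined output type."""

    def __init__(self, out_type: str) -> None:
        self.out_type = out_type

    def resolve(self, input_types: Sequence[str]) -> str:
        return self.out_type

    def to_string(self) -> str:
        return self.out_type


class FirstArgType(TypeResolver):
    """Resolves to the type of the first argument."""

    def resolve(self, input_types: Sequence[str]) -> str:
        if not input_types:
            raise ValueError("at least one input type is required")
        return input_types[0]

    def to_string(self) -> str:
        return "first-arg-type"


print(FixedType("bool").to_string(), FirstArgType().resolve(["utf8"]))
```

A bare function object offers neither a printable description nor a meaningful equality test, which is the gap the class-based form closes.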





[jira] [Resolved] (ARROW-8915) [Dev][Archery] Require Click 7

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-8915.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

Issue resolved by pull request 7257
[https://github.com/apache/arrow/pull/7257]

> [Dev][Archery] Require Click 7
> --
>
> Key: ARROW-8915
> URL: https://issues.apache.org/jira/browse/ARROW-8915
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Closed] (ARROW-8920) [CI] ARM Travis CI build is failing with archery "case_sensitive" error

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-8920.
---
Fix Version/s: (was: 1.0.0)
   Resolution: Duplicate

dup of ARROW-8915

> [CI] ARM Travis CI build is failing with archery "case_sensitive" error
> ---
>
> Key: ARROW-8920
> URL: https://issues.apache.org/jira/browse/ARROW-8920
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: CI
>Reporter: Wes McKinney
>Priority: Major
>
> See https://travis-ci.org/github/apache/arrow/jobs/690602409
> {code}
> Traceback (most recent call last):
>   File "/home/travis/.local/bin/archery", line 11, in <module>
>     load_entry_point('archery', 'console_scripts', 'archery')()
>   File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 490, in load_entry_point
>     return get_distribution(dist).load_entry_point(group, name)
>   File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 2853, in load_entry_point
>     return ep.load()
>   File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 2453, in load
>     return self.resolve()
>   File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 2459, in resolve
>     module = __import__(self.module_name, fromlist=['__name__'], level=0)
>   File "/home/travis/build/apache/arrow/dev/archery/archery/cli.py", line 100, in <module>
>     case_sensitive=False)
> TypeError: __init__() got an unexpected keyword argument 'case_sensitive'
> {code}





[jira] [Resolved] (ARROW-3134) [C++] Implement n-ary iterator for a collection of chunked arrays with possibly different chunking layouts

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3134.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

This was done in 
https://github.com/apache/arrow/commit/7ad49eeca5215d9b2a56b6439f1bd6ea3ea9

> [C++] Implement n-ary iterator for a collection of chunked arrays with 
> possibly different chunking layouts
> --
>
> Key: ARROW-3134
> URL: https://issues.apache.org/jira/browse/ARROW-3134
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: dataframe
> Fix For: 1.0.0
>
>
> This is a common pattern that will result in kernel invocation on chunked 
> arrays
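
For illustration, the n-ary iteration described above can be sketched in plain Python. This is only a sketch under stated assumptions: a chunked array is modelled as a list of Python lists, and the name aligned_slices is hypothetical and is not the API added by the commit.

```python
from typing import Iterator, List, Sequence, Tuple

Chunked = List[list]  # hypothetical stand-in: a chunked array as a list of chunks


def aligned_slices(columns: Sequence[Chunked]) -> Iterator[Tuple[list, ...]]:
    """Yield tuples of equal-length slices, one slice per input column."""
    if not columns:
        return
    total = sum(len(chunk) for chunk in columns[0])
    if any(sum(len(chunk) for chunk in col) != total for col in columns):
        raise ValueError("all columns must have the same total length")

    # Per-column cursor: index of the current chunk and offset inside it.
    chunk_idx = [0] * len(columns)
    offset = [0] * len(columns)
    done = 0
    while done < total:
        # Move past chunks that are exhausted (or empty).
        for i, col in enumerate(columns):
            while offset[i] == len(col[chunk_idx[i]]):
                chunk_idx[i] += 1
                offset[i] = 0
        # The next slice is as long as the shortest remaining run.
        step = min(
            len(col[chunk_idx[i]]) - offset[i] for i, col in enumerate(columns)
        )
        yield tuple(
            col[chunk_idx[i]][offset[i]:offset[i] + step]
            for i, col in enumerate(columns)
        )
        for i in range(len(columns)):
            offset[i] += step
        done += step
```

Each yielded tuple holds contiguous slices of identical length, so a kernel written for plain arrays could be applied to every tuple in turn regardless of how the inputs were chunked.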





[jira] [Resolved] (ARROW-4022) [C++] Promote Datum variant out of compute namespace

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4022.
-
Fix Version/s: 1.0.0
   Resolution: Fixed

This was done in 
https://github.com/apache/arrow/commit/7ad49eeca5215d9b2a56b6439f1bd6ea3ea9

> [C++] Promote Datum variant out of compute namespace
> 
>
> Key: ARROW-4022
> URL: https://issues.apache.org/jira/browse/ARROW-4022
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> In working on ARROW-3762, I've found it's useful to be able to have functions 
> return either {{Array}} or {{ChunkedArray}}. We might consider promoting the 
> {{arrow::compute::Datum}} variant out of {{arrow/compute/kernel.h}} so it can 
> be used in other places where it's helpful
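
A rough Python analogue of the idea, for readers less familiar with C++ variants: one function accepts and returns either kind of value. The classes below are hypothetical stand-ins and do not mirror the real arrow::Datum interface.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Array:
    values: List[int]


@dataclass
class ChunkedArray:
    chunks: List[Array]


# Hypothetical analogue of a Datum variant: "either an Array or a ChunkedArray".
Datum = Union[Array, ChunkedArray]


def negate(datum: Datum) -> Datum:
    """Return the same kind of value that was passed in."""
    if isinstance(datum, ChunkedArray):
        # Apply the array-level implementation chunk by chunk.
        return ChunkedArray([Array([-v for v in c.values]) for c in datum.chunks])
    return Array([-v for v in datum.values])
```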





[jira] [Updated] (ARROW-4022) [C++] Promote Datum variant out of compute namespace

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-4022:

Summary: [C++] Promote Datum variant out of compute namespace  (was: [C++] 
RFC: promote Datum variant out of compute namespace)

> [C++] Promote Datum variant out of compute namespace
> 
>
> Key: ARROW-4022
> URL: https://issues.apache.org/jira/browse/ARROW-4022
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>
> In working on ARROW-3762, I've found it's useful to be able to have functions 
> return either {{Array}} or {{ChunkedArray}}. We might consider promoting the 
> {{arrow::compute::Datum}} variant out of {{arrow/compute/kernel.h}} so it can 
> be used in other places where it's helpful





[jira] [Resolved] (ARROW-8792) [C++] Improved declarative compute function / kernel development framework, normalize calling conventions

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-8792.
-
Resolution: Fixed

Issue resolved by pull request 7240
[https://github.com/apache/arrow/pull/7240]

> [C++] Improved declarative compute function / kernel development framework, 
> normalize calling conventions
> -
>
> Key: ARROW-8792
> URL: https://issues.apache.org/jira/browse/ARROW-8792
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> I'm working on a significant revamp of the way that kernels are implemented 
> in the project as discussed on the mailing list. PR to follow within the next 
> week or sooner
> A brief list of features:
> * Kernel selection that takes into account the shape of inputs (whether 
> Scalar or Array, so you can provide an implementation just for Arrays and a 
> separate one just for Scalars if you want)
> * More customizable / less monolithic type-to-kernel dispatch
> * Standardized C++ function signature for kernel implementations (rather than 
> every one being a little bit special)
> * Multiple implementations of the same function can coexist (e.g. with / 
> without SIMD optimizations) so that you can choose the one you want at runtime
> * Browsable function registry (see all available kernels and their input type 
> signatures)
> * Central code path for type-checking and argument validation
> * Central code path for kernel execution on ChunkedArray inputs
> There are many JIRAs in the backlog that will follow from this work, so I 
> will attach them to this issue for visibility. This issue itself covers only 
> the initial refactoring work to port the existing code to the new framework 
> without altering existing features.





[jira] [Created] (ARROW-8920) [CI] ARM Travis CI build is failing with archery "case_sensitive" error

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8920:
---

 Summary: [CI] ARM Travis CI build is failing with archery 
"case_sensitive" error
 Key: ARROW-8920
 URL: https://issues.apache.org/jira/browse/ARROW-8920
 Project: Apache Arrow
  Issue Type: Bug
  Components: CI
Reporter: Wes McKinney
 Fix For: 1.0.0


See https://travis-ci.org/github/apache/arrow/jobs/690602409

{code}
Traceback (most recent call last):
  File "/home/travis/.local/bin/archery", line 11, in <module>
    load_entry_point('archery', 'console_scripts', 'archery')()
  File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 490, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 2853, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 2453, in load
    return self.resolve()
  File "/usr/local/lib/python3.6/dist-packages/pkg_resources/__init__.py", line 2459, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/travis/build/apache/arrow/dev/archery/archery/cli.py", line 100, in <module>
    case_sensitive=False)
TypeError: __init__() got an unexpected keyword argument 'case_sensitive'
{code}
{code}
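
The error pattern is what one would expect from an older Click: as far as I know, the case_sensitive keyword of click.Choice was added in Click 7.0, so the same call fails on Click 6.x. Below is a small sketch of a version guard, assuming the installed version string is available (for example from click.__version__). The helper names are hypothetical and are not part of archery.

```python
import itertools
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.1.2' into (7, 1, 2); stop at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def supports_case_sensitive(click_version: str) -> bool:
    # Assumption: the keyword is available from Click 7.0 onwards.
    return parse_version(click_version) >= (7,)
```

Declaring the minimum version in the package requirements (for instance "click>=7") is the more robust fix, since it makes the installer report the problem instead of failing at import time.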





[jira] [Updated] (ARROW-8909) [Java] Out of order writes using setSafe

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8909:

Summary: [Java] Out of order writes using setSafe  (was: Out of order 
writes using setSafe)

> [Java] Out of order writes using setSafe
> 
>
> Key: ARROW-8909
> URL: https://issues.apache.org/jira/browse/ARROW-8909
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Saurabh
>Priority: Major
>
> I noticed that calling setSafe on a VarCharVector with indices that are not 
> in increasing order causes lastIndex to be set to the index used in the most 
> recent call to setSafe.
> Is this documented and expected behavior?
> Sample code:
> {code:java}
> import java.util.Collections;
> import lombok.extern.slf4j.Slf4j;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> import org.apache.arrow.vector.util.Text;
>
> @Slf4j
> public class ATest {
>   public static void main(String[] args) {
>     Schema schema = new Schema(
>         Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
>     try (VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, new RootAllocator())) {
>       VarCharVector vec = (VarCharVector) vroot.getVector("Data");
>       // Fill indices 0..9 in increasing order.
>       for (int i = 0; i < 10; i++) {
>         vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
>       }
>       // Out-of-order write: overwrite an index that was already set.
>       // vec.setSafe(0, new Text(Integer.toString(0) + "_new"));
>       vec.setSafe(7, new Text(Integer.toString(7) + "_new"));
>       vroot.setRowCount(10);
>       log.info(vroot.contentToTSVString());
>     }
>   }
> }
> {code}
>  
> If I don't set the 0 or 7 after the loop, I get all the 0_mtest, 1_mtest, 
> ..., 9_mtest entries.
> If I set index 0 after the loop, I only see 0_new entry; other entries are ""
> If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 7_new; other 
> entries are ""
>  





[jira] [Updated] (ARROW-8911) [Python] An empty ChunkedArray created by `filter` can crash.

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8911:

Summary: [Python] An empty ChunkedArray created by `filter` can crash.  
(was: An empty ChunkedArray created by `filter` can crash.)

> [Python] An empty ChunkedArray created by `filter` can crash.
> -
>
> Key: ARROW-8911
> URL: https://issues.apache.org/jira/browse/ARROW-8911
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.17.1
> Environment: macOS, ubuntu
>Reporter: A. Coady
>Priority: Critical
>
> {code:python}
> import pyarrow as pa
> arr = pa.chunked_array([[1]])
> empty = arr.filter(pa.array([False]))
> print(empty)
> print(empty[:]) # <- crash
> {code}





[jira] [Updated] (ARROW-8911) [Python] An empty ChunkedArray created by `filter` can crash.

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8911:

Fix Version/s: 1.0.0

> [Python] An empty ChunkedArray created by `filter` can crash.
> -
>
> Key: ARROW-8911
> URL: https://issues.apache.org/jira/browse/ARROW-8911
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.17.1
> Environment: macOS, ubuntu
>Reporter: A. Coady
>Priority: Critical
> Fix For: 1.0.0
>
>
> {code:python}
> import pyarrow as pa
> arr = pa.chunked_array([[1]])
> empty = arr.filter(pa.array([False]))
> print(empty)
> print(empty[:]) # <- crash
> {code}





[jira] [Created] (ARROW-8919) [C++] Add "DispatchBest" APIs to compute::Function that selects a kernel that may require implicit casts to invoke

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8919:
---

 Summary: [C++] Add "DispatchBest" APIs to compute::Function that 
selects a kernel that may require implicit casts to invoke
 Key: ARROW-8919
 URL: https://issues.apache.org/jira/browse/ARROW-8919
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


Currently we have "DispatchExact" which requires an exact match of input types. 
"DispatchBest" would permit kernel selection with implicit casts required. 
Since multiple kernels may be valid when allowing implicit casts, we will need 
to break ties by estimating the "cost" of the implicit casts. For example, 
casting int8 to int32 is "less expensive" than implicitly casting to int64
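
A minimal sketch of the tie-breaking idea, in Python for brevity. The cost model here (number of widening steps between integer types) and the names cast_cost and dispatch_best are assumptions made for this example. The issue does not specify how cost should be estimated.

```python
from typing import List, Optional, Tuple

# Hypothetical widening order; the distance between ranks is the cast "cost".
INT_RANK = {"int8": 0, "int16": 1, "int32": 2, "int64": 3}


def cast_cost(src: str, dst: str) -> Optional[int]:
    """Cost of implicitly casting src to dst, or None if not allowed."""
    if src == dst:
        return 0
    if src in INT_RANK and dst in INT_RANK and INT_RANK[dst] > INT_RANK[src]:
        return INT_RANK[dst] - INT_RANK[src]
    return None  # narrowing or unrelated types: no implicit cast


def dispatch_best(kernels: List[Tuple[str, ...]],
                  arg_types: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    """Return the kernel signature with the lowest total cast cost."""
    best, best_cost = None, None
    for signature in kernels:
        if len(signature) != len(arg_types):
            continue
        costs = [cast_cost(s, d) for s, d in zip(arg_types, signature)]
        if any(c is None for c in costs):
            continue
        total = sum(costs)
        if best_cost is None or total < best_cost:
            best, best_cost = signature, total
    return best
```

With kernels registered for (int32, int32) and (int64, int64), arguments of type (int8, int32) select the int32 kernel, because widening int8 to int32 costs less than widening both arguments to int64. An exact match always wins with cost 0, which is the "DispatchExact" case.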





[jira] [Updated] (ARROW-8918) [C++] Add cast "metafunction" to FunctionRegistry that addresses dispatching to appropriate type-specific CastFunction

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8918:

Description: 
By setting the output type in {{CastOptions}}, we can write

{code}
call_function("cast", [arg], cast_options)
{code}

This simplifies use of casting for binding developers. This mimics the standard 
SQL

{code}
CAST(expr AS target_type)
{code}

  was:
By setting the output type in {{CastOptions}}, we can write

{code}
call_function("cast", [arg], cast_options)
{code}

This simplifies use of casting for binding developers


> [C++] Add cast "metafunction" to FunctionRegistry that addresses dispatching 
> to appropriate type-specific CastFunction
> --
>
> Key: ARROW-8918
> URL: https://issues.apache.org/jira/browse/ARROW-8918
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> By setting the output type in {{CastOptions}}, we can write
> {code}
> call_function("cast", [arg], cast_options)
> {code}
> This simplifies use of casting for binding developers. This mimics the 
> standard SQL
> {code}
> CAST(expr AS target_type)
> {code}





[jira] [Created] (ARROW-8918) [C++] Add cast "metafunction" to FunctionRegistry that addresses dispatching to appropriate type-specific CastFunction

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8918:
---

 Summary: [C++] Add cast "metafunction" to FunctionRegistry that 
addresses dispatching to appropriate type-specific CastFunction
 Key: ARROW-8918
 URL: https://issues.apache.org/jira/browse/ARROW-8918
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


By setting the output type in {{CastOptions}}, we can write

{code}
call_function("cast", [arg], cast_options)
{code}

This simplifies use of casting for binding developers
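
To illustrate the dispatch being proposed, here is a toy model in Python: a single "cast" entry looks at the target type carried by the options and forwards to a type-specific cast function. Every name below (the to_type field, CAST_FUNCTIONS, the registry dict) is hypothetical and only mirrors the shape of the call shown above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CastOptions:
    to_type: str  # hypothetical field naming the output type


# One hypothetical type-specific cast function per target type.
CAST_FUNCTIONS: Dict[str, Callable[[list], list]] = {
    "int64": lambda values: [int(v) for v in values],
    "float64": lambda values: [float(v) for v in values],
    "utf8": lambda values: [str(v) for v in values],
}


def cast_metafunction(args: List[list], options: CastOptions) -> list:
    """Pick the cast function for options.to_type and apply it."""
    if options.to_type not in CAST_FUNCTIONS:
        raise TypeError(f"no cast function for target type {options.to_type!r}")
    (values,) = args  # cast takes exactly one argument
    return CAST_FUNCTIONS[options.to_type](values)


FUNCTIONS = {"cast": cast_metafunction}


def call_function(name: str, args: List[list], options: CastOptions) -> list:
    return FUNCTIONS[name](args, options)
```

The caller only needs the name "cast" and an options object. Which concrete cast function runs is decided inside the metafunction.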





[jira] [Created] (ARROW-8917) [C++] Add compute::Function subclass for invoking certain kernels on RecordBatch/Table-valued inputs

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8917:
---

 Summary: [C++] Add compute::Function subclass for invoking certain 
kernels on RecordBatch/Table-valued inputs
 Key: ARROW-8917
 URL: https://issues.apache.org/jira/browse/ARROW-8917
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


This will enable bindings to invoke such functions (for example take and filter) as follows:

{code}
call_function('take', [table, indices])
{code}
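
As a rough picture of what such a subclass would do, the sketch below applies an array-level take to every column of a table-like input. It is written in Python with a dict of lists standing in for a Table, and all names are assumptions made for the example.

```python
from typing import Dict, List, Union

Table = Dict[str, list]  # hypothetical stand-in: column name -> values


def take_array(values: list, indices: List[int]) -> list:
    """Array-level kernel: gather values at the given positions."""
    return [values[i] for i in indices]


def take(datum: Union[list, Table], indices: List[int]) -> Union[list, Table]:
    """Table-aware wrapper: apply the array kernel to each column."""
    if isinstance(datum, dict):
        return {name: take_array(col, indices) for name, col in datum.items()}
    return take_array(datum, indices)


def call_function(name: str, args: list):
    functions = {"take": take}
    return functions[name](*args)
```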





[jira] [Created] (ARROW-8916) [Python] Add relevant glue for implementing each kind of FunctionOptions

2020-05-24 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8916:
---

 Summary: [Python] Add relevant glue for implementing each kind of 
FunctionOptions
 Key: ARROW-8916
 URL: https://issues.apache.org/jira/browse/ARROW-8916
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 1.0.0








[jira] [Comment Edited] (ARROW-5760) [C++] Optimize Take and Filter

2020-05-24 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115287#comment-17115287
 ] 

Wes McKinney edited comment on ARROW-5760 at 5/24/20, 12:36 PM:


Another problem I noticed with the current implementation of Take and Filter: 
different x86 code is generated for applying these operations to arrays with 
the same underlying C type. For example, instructions for moving 8-byte-wide 
values are being generated separately for Int64Type, UInt64Type, Date64Type, 
Time64Type, and TimestampType, when only one underlying "data movement 
function" is needed. As part of improving the performance of Take and Filter, 
we should also eliminate this unneeded binary bloat in the shared library.


was (Author: wesmckinn):
Another problem I noticed with the current implementation of Take and Filter: 
different x86 is generated for applying these operations on arrays with the 
same underlying C type. For example, instructions for moving {{int64_t}} values 
are being generated for Int64Type, Date64Type, Time64Type, and TimestampType, 
when only one  underlying "data movement function" is needed. As part of 
improving the performance of Take and Filter we should also ensure that we 
eliminate this unneeded binary bloat in the shared library

> [C++] Optimize Take and Filter
> --
>
> Key: ARROW-5760
> URL: https://issues.apache.org/jira/browse/ARROW-5760
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
> Fix For: 2.0.0
>
>
> There is some question of whether these kernels allocate optimally. For 
> example, when Filtering or Taking strings it might be more efficient to pass 
> over the filter/indices twice, first to determine how much character storage 
> will be needed and then again to write into the allocated memory: 
> https://github.com/apache/arrow/pull/4531#discussion_r297160457
> Additionally, these kernels could probably make good use of scatter/gather 
> SIMD instructions.
> Furthermore, Filter's bitmap is currently lazily expanded into the indices of 
> elements to be appended to the output array. It would probably be more 
> efficient to expand to indices in batches, then gather using an index batch.





[jira] [Commented] (ARROW-5760) [C++] Optimize Take and Filter

2020-05-24 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17115287#comment-17115287
 ] 

Wes McKinney commented on ARROW-5760:
-

Another problem I noticed with the current implementation of Take and Filter: 
different x86 code is generated for applying these operations to arrays with 
the same underlying C type. For example, instructions for moving {{int64_t}} 
values are being generated separately for Int64Type, Date64Type, Time64Type, 
and TimestampType, when only one underlying "data movement function" is 
needed. As part of improving the performance of Take and Filter, we should 
also eliminate this unneeded binary bloat in the shared library.
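
The point about a single "data movement function" can be shown with a small Python sketch that keys the implementation on byte width rather than on logical type. The BYTE_WIDTH table and function names are assumptions for illustration. In C++ the equivalent would be instantiating the template once per width instead of once per type.

```python
from typing import Dict, List

# Logical types that share an 8-byte physical representation.
BYTE_WIDTH: Dict[str, int] = {
    "Int64Type": 8,
    "Date64Type": 8,
    "Time64Type": 8,
    "TimestampType": 8,
}


def take_fixed_width(data: bytes, width: int, indices: List[int]) -> bytes:
    """Single data movement routine: copy width-byte values by position."""
    return b"".join(data[i * width:(i + 1) * width] for i in indices)


def take(type_name: str, data: bytes, indices: List[int]) -> bytes:
    # Every logical type with the same width reuses the same routine.
    return take_fixed_width(data, BYTE_WIDTH[type_name], indices)
```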

> [C++] Optimize Take and Filter
> --
>
> Key: ARROW-5760
> URL: https://issues.apache.org/jira/browse/ARROW-5760
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
> Fix For: 2.0.0
>
>
> There is some question of whether these kernels allocate optimally. For 
> example, when Filtering or Taking strings it might be more efficient to pass 
> over the filter/indices twice, first to determine how much character storage 
> will be needed and then again to write into the allocated memory: 
> https://github.com/apache/arrow/pull/4531#discussion_r297160457
> Additionally, these kernels could probably make good use of scatter/gather 
> SIMD instructions.
> Furthermore, Filter's bitmap is currently lazily expanded into the indices of 
> elements to be appended to the output array. It would probably be more 
> efficient to expand to indices in batches, then gather using an index batch.





[jira] [Updated] (ARROW-8900) Respect HTTP(S)_PROXY for S3 Filesystems and/or expose proxy options as parameters

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8900:

Component/s: C++

> Respect HTTP(S)_PROXY for S3 Filesystems and/or expose proxy options as 
> parameters
> --
>
> Key: ARROW-8900
> URL: https://issues.apache.org/jira/browse/ARROW-8900
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.17.0
>Reporter: Daniel Nugent
>Priority: Minor
>
> HTTP_PROXY and HTTPS_PROXY are not automatically respected by the 
> Aws::Client::ClientConfiguration (see: 
> https://github.com/aws/aws-sdk-cpp/issues/1049)
> Either Arrow should respect them or make them available as parameters when 
> connecting to S3





[jira] [Updated] (ARROW-8900) [C++] Respect HTTP(S)_PROXY for S3 Filesystems and/or expose proxy options as parameters

2020-05-24 Thread Wes McKinney (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8900:

Summary: [C++] Respect HTTP(S)_PROXY for S3 Filesystems and/or expose proxy 
options as parameters  (was: Respect HTTP(S)_PROXY for S3 Filesystems and/or 
expose proxy options as parameters)

> [C++] Respect HTTP(S)_PROXY for S3 Filesystems and/or expose proxy options as 
> parameters
> 
>
> Key: ARROW-8900
> URL: https://issues.apache.org/jira/browse/ARROW-8900
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.17.0
>Reporter: Daniel Nugent
>Priority: Minor
>
> HTTP_PROXY and HTTPS_PROXY are not automatically respected by the 
> Aws::Client::ClientConfiguration (see: 
> https://github.com/aws/aws-sdk-cpp/issues/1049)
> Either Arrow should respect them or make them available as parameters when 
> connecting to S3





[jira] [Updated] (ARROW-8915) [Dev][Archery] Require Click 7

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8915:
--
Labels: pull-request-available  (was: )

> [Dev][Archery] Require Click 7
> --
>
> Key: ARROW-8915
> URL: https://issues.apache.org/jira/browse/ARROW-8915
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-8915) [Dev][Archery] Require Click 7

2020-05-24 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-8915:
---

 Summary: [Dev][Archery] Require Click 7
 Key: ARROW-8915
 URL: https://issues.apache.org/jira/browse/ARROW-8915
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Updated] (ARROW-8909) Out of order writes using setSafe

2020-05-24 Thread Saurabh (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saurabh updated ARROW-8909:
---
Priority: Major  (was: Minor)

> Out of order writes using setSafe
> -
>
> Key: ARROW-8909
> URL: https://issues.apache.org/jira/browse/ARROW-8909
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Saurabh
>Priority: Major
>
> I noticed that calling setSafe on a VarCharVector with indices that are not 
> in increasing order causes lastIndex to be set to the index used in the most 
> recent call to setSafe.
> Is this documented and expected behavior?
> Sample code:
> {code:java}
> import java.util.Collections;
> import lombok.extern.slf4j.Slf4j;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> import org.apache.arrow.vector.util.Text;
>
> @Slf4j
> public class ATest {
>   public static void main(String[] args) {
>     Schema schema = new Schema(
>         Collections.singletonList(Field.nullable("Data", new ArrowType.Utf8())));
>     try (VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, new RootAllocator())) {
>       VarCharVector vec = (VarCharVector) vroot.getVector("Data");
>       // Fill indices 0..9 in increasing order.
>       for (int i = 0; i < 10; i++) {
>         vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
>       }
>       // Out-of-order write: overwrite an index that was already set.
>       // vec.setSafe(0, new Text(Integer.toString(0) + "_new"));
>       vec.setSafe(7, new Text(Integer.toString(7) + "_new"));
>       vroot.setRowCount(10);
>       log.info(vroot.contentToTSVString());
>     }
>   }
> }
> {code}
>  
> If I don't set the 0 or 7 after the loop, I get all the 0_mtest, 1_mtest, 
> ..., 9_mtest entries.
> If I set index 0 after the loop, I only see 0_new entry; other entries are ""
> If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 7_new; other 
> entries are ""
>  





[jira] [Updated] (ARROW-8914) [C++][Gandiva] Decimal128 related test failed on big-endian platforms

2020-05-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-8914:
--
Labels: pull-request-available  (was: )

> [C++][Gandiva] Decimal128 related test failed on big-endian platforms
> -
>
> Key: ARROW-8914
> URL: https://issues.apache.org/jira/browse/ARROW-8914
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> These test failures in gandiva tests occur on big-endian platforms. An 
> example from https://travis-ci.org/github/apache/arrow/jobs/690006107#L2306
> {code}
> ...
> [==] 17 tests from 1 test case ran. (2334 ms total)
> [  PASSED  ] 7 tests.
> [  FAILED  ] 10 tests, listed below:
> [  FAILED  ] TestDecimal.TestSimple
> [  FAILED  ] TestDecimal.TestLiteral
> [  FAILED  ] TestDecimal.TestCompare
> [  FAILED  ] TestDecimal.TestRoundFunctions
> [  FAILED  ] TestDecimal.TestCastFunctions
> [  FAILED  ] TestDecimal.TestIsDistinct
> [  FAILED  ] TestDecimal.TestCastVarCharDecimal
> [  FAILED  ] TestDecimal.TestCastDecimalVarChar
> [  FAILED  ] TestDecimal.TestVarCharDecimalNestedCast
> [  FAILED  ] TestDecimal.TestCastDecimalOverflow
> ...
> {code}


