[jira] [Commented] (ARROW-5507) [Plasma] [CUDA] Compile error

2019-06-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855531#comment-16855531
 ] 

Antoine Pitrou commented on ARROW-5507:
---

Probably introduced in ARROW-5365.

> [Plasma] [CUDA] Compile error
> -
>
> Key: ARROW-5507
> URL: https://issues.apache.org/jira/browse/ARROW-5507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, GPU
>Reporter: Antoine Pitrou
>Priority: Critical
>
> I've started getting this today:
> {code}
> ../src/plasma/protocol.cc:546:55: error: no matching member function for call 
> to 'CreateVector'
>   handles.push_back(fb::CreateCudaHandle(fbb, fbb.CreateVector(handle)));
>   ^~~~
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1484:27:
>  note: candidate function not viable: no known conversion from 
> 'std::shared_ptr' to 'const std::vector' for 1st argument
>   Offset> CreateVector(const std::vector ) {
>   ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1477:42:
>  note: candidate template ignored: could not match 'vector' against 
> 'shared_ptr'
>   template Offset> CreateVector(const std::vector 
> ) {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1443:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(const T *v, size_t len) 
> {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1465:29:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   Offset>> CreateVector(const Offset *v, size_t len) {
> ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1501:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(size_t vector_size,
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1520:21:
>  note: candidate function template not viable: requires 3 arguments, but 1 
> was provided
>   Offset> CreateVector(size_t vector_size, F f, S *state) {
> ^
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5485) [Gandiva][Crossbow] OSx builds failing

2019-06-04 Thread Praveen Kumar Desabandu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855631#comment-16855631
 ] 

Praveen Kumar Desabandu commented on ARROW-5485:


[~wesmckinn] I think this started when we switched to using the shared gtest 
library.

We get the following error (it happens both locally and in Travis when building 
gtest from source):

dyld: Library not loaded: libgtest_main.dylib
 Referenced from: 
/Users/travis/build/[secure]/arrow-build/arrow/cpp/build/./release/gandiva-decimal_test
 Reason: image not found
dev/tasks/gandiva-jars/build-cpp-osx.sh: line 45:  5626 Abort trap: 6

All tests failed with this error.

Maybe the rpath for the library built from source is not being set correctly? I 
ran it with the CMake rpath flag turned on, but it did not help.

I am planning to turn tests off in OSX Crossbow if this is not a quick fix.
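To check whether the rpath theory holds, commands along these lines (illustrative; the binary path is taken from the log above, and the gtest build directory is a hypothetical placeholder) show how the binary references libgtest_main.dylib and which LC_RPATH entries it carries:

```shell
# How does the test binary reference its libraries? A bare
# "libgtest_main.dylib" install name means dyld must resolve it
# via an rpath or the current directory.
otool -L release/gandiva-decimal_test

# List the LC_RPATH entries actually embedded in the binary.
otool -l release/gandiva-decimal_test | grep -A2 LC_RPATH

# One possible workaround: add the directory holding the freshly
# built dylib as an rpath entry (path below is a guess, adjust it).
install_name_tool -add_rpath "$PWD/googletest-build/lib" \
    release/gandiva-decimal_test
```

These are standard macOS toolchain commands, not something specific to the Arrow build; they only help confirm or rule out the missing-rpath hypothesis.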

> [Gandiva][Crossbow] OSx builds failing
> --
>
> Key: ARROW-5485
> URL: https://issues.apache.org/jira/browse/ARROW-5485
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.14.0
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
> Fix For: 0.14.0
>
>
> OSX builds have been failing for the last 3 days.





[jira] [Updated] (ARROW-5334) [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for consistency

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5334:
--
Labels: pull-request-available  (was: )

> [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for 
> consistency
> -
>
> Key: ARROW-5334
> URL: https://issues.apache.org/jira/browse/ARROW-5334
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> These intermediate classes used for template metaprogramming (in particular, 
> {{std::is_base_of}}) have names that are inconsistent with the rest of the 
> data types. For clarity, I think we should add "Type" to these class names 
> and others like them.
> Please do this after ARROW-3144.





[jira] [Comment Edited] (ARROW-5236) [Python] hdfs.connect() is trying to load libjvm in windows

2019-06-04 Thread Urmila (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855674#comment-16855674
 ] 

Urmila edited comment on ARROW-5236 at 6/4/19 1:06 PM:
---

Hi, I am also facing the same issue. I have conda and Spark installed on my 
local machine and am trying to connect to HDFS as shown below:

import pyarrow as pa
fs = pa.hdfs.connect('hostname.xx.xx.com', port_number, user='a...@xyx.com', 
kerb_ticket='local machine path')
Traceback (most recent call last):
File "", line 1, in 
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
183, in connect
extra_conf=extra_conf)
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
37, in init
self._connect(host, port, user, kerb_ticket, driver, extra_conf)
File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm


was (Author: urmilarv):
Hi, I am also facing the same issue, but could not find fix details in any of 
JIRA ARROW-5236 or 4215.
Please help. I have conda and Spark installed on my local machine and am trying 
to connect to HDFS as shown below:

import pyarrow as pa
fs = pa.hdfs.connect('hostname.xx.xx.com', port_number, user='a...@xyx.com', 
kerb_ticket='local machine path')
Traceback (most recent call last):
File "", line 1, in 
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
183, in connect
extra_conf=extra_conf)
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
37, in init
self._connect(host, port, user, kerb_ticket, driver, extra_conf)
File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm

> [Python] hdfs.connect() is trying to load libjvm in windows
> ---
>
> Key: ARROW-5236
> URL: https://issues.apache.org/jira/browse/ARROW-5236
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: Windows 7 Enterprise, pyarrow 0.13.0
>Reporter: Kamaraju
>Priority: Major
>  Labels: hdfs
>
> This issue was originally reported at 
> [https://github.com/apache/arrow/issues/4215] . Raising a Jira as per Wes 
> McKinney's request.
> Summary:
>  The following script
> {code}
> $ cat expt2.py
> import pyarrow as pa
> fs = pa.hdfs.connect()
> {code}
> tries to load libjvm in windows 7 which is not expected.
> {noformat}
> $ python ./expt2.py
> Traceback (most recent call last):
>   File "./expt2.py", line 3, in 
> fs = pa.hdfs.connect()
>   File 
> "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py",
>  line 183, in connect
> extra_conf=extra_conf)
>   File 
> "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py",
>  line 37, in __init__
> self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow\io-hdfs.pxi", line 89, in 
> pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Unable to load libjvm
> {noformat}
> There is no libjvm file in Windows Java installation.
> {noformat}
> $ echo $JAVA_HOME
> C:\Progra~1\Java\jdk1.8.0_141
> $ find $JAVA_HOME -iname '*libjvm*'
> 
> {noformat}
> I see the libjvm error with both 0.11.1 and 0.13.0 versions of pyarrow.
> Steps to reproduce the issue (with more details):
> Create the environment
> {noformat}
> $ cat scratch_py36_pyarrow.yml
> name: scratch_py36_pyarrow
> channels:
>   - defaults
> dependencies:
>   - python=3.6.8
>   - pyarrow
> {noformat}
> {noformat}
> $ conda env create -f scratch_py36_pyarrow.yml
> {noformat}
> Apply the following patch to lib/site-packages/pyarrow/hdfs.py . I had to do 
> this since the Hadoop installation that comes with MapR <[https://mapr.com/]> 
> windows client only has $HADOOP_HOME/bin/hadoop.cmd . There is no file named 
> $HADOOP_HOME/bin/hadoop and so the subsequent subprocess.check_output call 
> fails with FileNotFoundError if this patch is not applied.
> {noformat}
> $ cat ~/x/patch.txt
> 131c131
> < hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
> ---
> > hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
> $ patch 
> /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
>  ~/x/patch.txt
> patching file 
> /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
> {noformat}
> Activate the environment
> {noformat}
> $ source activate scratch_py36_pyarrow
> {noformat}
> Sample script
> {noformat}
> $ 
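On the point above that `find $JAVA_HOME -iname '*libjvm*'` returns nothing: on Windows the JVM shared library is named `jvm.dll`, not `libjvm.so`, so a loader that only searches for libjvm-style names under a Windows JDK will come up empty. A hedged sketch of a cross-platform lookup (an illustrative helper, not pyarrow's actual code; the candidate paths are common JDK layouts):

```python
from pathlib import Path

# Typical JVM shared-library locations; the file name differs per platform.
_JVM_CANDIDATES = [
    "bin/server/jvm.dll",             # Windows JDK
    "jre/bin/server/jvm.dll",         # Windows JRE layout
    "lib/server/libjvm.so",           # modern Linux JDK
    "jre/lib/amd64/server/libjvm.so", # older Linux JDK 8 layout
]

def find_jvm_library(java_home):
    """Return the first JVM shared library found under java_home, or None."""
    for rel in _JVM_CANDIDATES:
        candidate = Path(java_home) / rel
        if candidate.is_file():
            return str(candidate)
    return None
```

The point of the sketch is that Windows support requires looking for `jvm.dll` explicitly; searching for `libjvm` alone cannot succeed there.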

[jira] [Commented] (ARROW-5236) [Python] hdfs.connect() is trying to load libjvm in windows

2019-06-04 Thread Urmila (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855674#comment-16855674
 ] 

Urmila commented on ARROW-5236:
---

Hi, I am also facing the same issue, but could not find fix details in any of 
JIRA ARROW-5236 or 4215.
Please help. I have conda and Spark installed on my local machine and am trying 
to connect to HDFS as shown below:

import pyarrow as pa
fs = pa.hdfs.connect('hostname.xx.xx.com', port_number, user='a...@xyx.com', 
kerb_ticket='local machine path')
Traceback (most recent call last):
File "", line 1, in 
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
183, in connect
extra_conf=extra_conf)
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
37, in init
self._connect(host, port, user, kerb_ticket, driver, extra_conf)
File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm

> [Python] hdfs.connect() is trying to load libjvm in windows
> ---
>
> Key: ARROW-5236
> URL: https://issues.apache.org/jira/browse/ARROW-5236
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: Windows 7 Enterprise, pyarrow 0.13.0
>Reporter: Kamaraju
>Priority: Major
>  Labels: hdfs
>
> This issue was originally reported at 
> [https://github.com/apache/arrow/issues/4215] . Raising a Jira as per Wes 
> McKinney's request.
> Summary:
>  The following script
> {code}
> $ cat expt2.py
> import pyarrow as pa
> fs = pa.hdfs.connect()
> {code}
> tries to load libjvm in windows 7 which is not expected.
> {noformat}
> $ python ./expt2.py
> Traceback (most recent call last):
>   File "./expt2.py", line 3, in 
> fs = pa.hdfs.connect()
>   File 
> "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py",
>  line 183, in connect
> extra_conf=extra_conf)
>   File 
> "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py",
>  line 37, in __init__
> self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow\io-hdfs.pxi", line 89, in 
> pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Unable to load libjvm
> {noformat}
> There is no libjvm file in Windows Java installation.
> {noformat}
> $ echo $JAVA_HOME
> C:\Progra~1\Java\jdk1.8.0_141
> $ find $JAVA_HOME -iname '*libjvm*'
> 
> {noformat}
> I see the libjvm error with both 0.11.1 and 0.13.0 versions of pyarrow.
> Steps to reproduce the issue (with more details):
> Create the environment
> {noformat}
> $ cat scratch_py36_pyarrow.yml
> name: scratch_py36_pyarrow
> channels:
>   - defaults
> dependencies:
>   - python=3.6.8
>   - pyarrow
> {noformat}
> {noformat}
> $ conda env create -f scratch_py36_pyarrow.yml
> {noformat}
> Apply the following patch to lib/site-packages/pyarrow/hdfs.py . I had to do 
> this since the Hadoop installation that comes with MapR <[https://mapr.com/]> 
> windows client only has $HADOOP_HOME/bin/hadoop.cmd . There is no file named 
> $HADOOP_HOME/bin/hadoop and so the subsequent subprocess.check_output call 
> fails with FileNotFoundError if this patch is not applied.
> {noformat}
> $ cat ~/x/patch.txt
> 131c131
> < hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
> ---
> > hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
> $ patch 
> /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
>  ~/x/patch.txt
> patching file 
> /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
> {noformat}
> Activate the environment
> {noformat}
> $ source activate scratch_py36_pyarrow
> {noformat}
> Sample script
> {noformat}
> $ cat expt2.py
> import pyarrow as pa
> fs = pa.hdfs.connect()
> {noformat}
> Execute the script
> {noformat}
> $ python ./expt2.py
> Traceback (most recent call last):
>   File "./expt2.py", line 3, in 
> fs = pa.hdfs.connect()
>   File 
> "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py",
>  line 183, in connect
> extra_conf=extra_conf)
>   File 
> "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py",
>  line 37, in __init__
> self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow\io-hdfs.pxi", line 89, in 
> pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Unable to load libjvm
> {noformat}




[jira] [Assigned] (ARROW-5020) [C++][Gandiva] Split Gandiva-related conda packages for builds into separate .yml conda env file

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-5020:
-

Assignee: Antoine Pitrou

> [C++][Gandiva] Split Gandiva-related conda packages for builds into separate 
> .yml conda env file
> 
>
> Key: ARROW-5020
> URL: https://issues.apache.org/jira/browse/ARROW-5020
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> These installs are large and should not be required unconditionally in CI and 
> elsewhere.





[jira] [Commented] (ARROW-5491) [C++] Remove unnecessary semicolons following MACRO definitions

2019-06-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855693#comment-16855693
 ] 

Antoine Pitrou commented on ARROW-5491:
---

This is fixed, no?

> [C++] Remove unnecessary semicolons following MACRO definitions
> --
>
> Key: ARROW-5491
> URL: https://issues.apache.org/jira/browse/ARROW-5491
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-5507) [Plasma] [C++] Compile error

2019-06-04 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5507:
-

 Summary: [Plasma] [C++] Compile error
 Key: ARROW-5507
 URL: https://issues.apache.org/jira/browse/ARROW-5507
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++ - Plasma
Reporter: Antoine Pitrou


I've started getting this today:
{code}
../src/plasma/protocol.cc:546:55: error: no matching member function for call 
to 'CreateVector'
  handles.push_back(fb::CreateCudaHandle(fbb, fbb.CreateVector(handle)));
  ^~~~
/home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1484:27:
 note: candidate function not viable: no known conversion from 
'std::shared_ptr' to 'const std::vector' for 1st argument
  Offset> CreateVector(const std::vector ) {
  ^
/home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1477:42:
 note: candidate template ignored: could not match 'vector' against 'shared_ptr'
  template Offset> CreateVector(const std::vector ) {
 ^
/home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1443:42:
 note: candidate function template not viable: requires 2 arguments, but 1 was 
provided
  template Offset> CreateVector(const T *v, size_t len) {
 ^
/home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1465:29:
 note: candidate function template not viable: requires 2 arguments, but 1 was 
provided
  Offset>> CreateVector(const Offset *v, size_t len) {
^
/home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1501:42:
 note: candidate function template not viable: requires 2 arguments, but 1 was 
provided
  template Offset> CreateVector(size_t vector_size,
 ^
/home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1520:21:
 note: candidate function template not viable: requires 3 arguments, but 1 was 
provided
  Offset> CreateVector(size_t vector_size, F f, S *state) {
^
{code}





[jira] [Updated] (ARROW-5507) [Plasma] [CUDA] Compile error

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5507:
--
Labels: pull-request-available  (was: )

> [Plasma] [CUDA] Compile error
> -
>
> Key: ARROW-5507
> URL: https://issues.apache.org/jira/browse/ARROW-5507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, GPU
>Reporter: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
>
> I've started getting this today:
> {code}
> ../src/plasma/protocol.cc:546:55: error: no matching member function for call 
> to 'CreateVector'
>   handles.push_back(fb::CreateCudaHandle(fbb, fbb.CreateVector(handle)));
>   ^~~~
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1484:27:
>  note: candidate function not viable: no known conversion from 
> 'std::shared_ptr' to 'const std::vector' for 1st argument
>   Offset> CreateVector(const std::vector ) {
>   ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1477:42:
>  note: candidate template ignored: could not match 'vector' against 
> 'shared_ptr'
>   template Offset> CreateVector(const std::vector 
> ) {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1443:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(const T *v, size_t len) 
> {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1465:29:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   Offset>> CreateVector(const Offset *v, size_t len) {
> ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1501:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(size_t vector_size,
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1520:21:
>  note: candidate function template not viable: requires 3 arguments, but 1 
> was provided
>   Offset> CreateVector(size_t vector_size, F f, S *state) {
> ^
> {code}





[jira] [Updated] (ARROW-5507) [Plasma] [CUDA] Compile error

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5507:
--
Component/s: GPU

> [Plasma] [CUDA] Compile error
> -
>
> Key: ARROW-5507
> URL: https://issues.apache.org/jira/browse/ARROW-5507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, GPU
>Reporter: Antoine Pitrou
>Priority: Critical
>
> I've started getting this today:
> {code}
> ../src/plasma/protocol.cc:546:55: error: no matching member function for call 
> to 'CreateVector'
>   handles.push_back(fb::CreateCudaHandle(fbb, fbb.CreateVector(handle)));
>   ^~~~
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1484:27:
>  note: candidate function not viable: no known conversion from 
> 'std::shared_ptr' to 'const std::vector' for 1st argument
>   Offset> CreateVector(const std::vector ) {
>   ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1477:42:
>  note: candidate template ignored: could not match 'vector' against 
> 'shared_ptr'
>   template Offset> CreateVector(const std::vector 
> ) {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1443:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(const T *v, size_t len) 
> {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1465:29:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   Offset>> CreateVector(const Offset *v, size_t len) {
> ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1501:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(size_t vector_size,
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1520:21:
>  note: candidate function template not viable: requires 3 arguments, but 1 
> was provided
>   Offset> CreateVector(size_t vector_size, F f, S *state) {
> ^
> {code}





[jira] [Assigned] (ARROW-5334) [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for consistency

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-5334:
-

Assignee: Antoine Pitrou  (was: Wes McKinney)

> [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for 
> consistency
> -
>
> Key: ARROW-5334
> URL: https://issues.apache.org/jira/browse/ARROW-5334
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> These intermediate classes used for template metaprogramming (in particular, 
> {{std::is_base_of}}) have names that are inconsistent with the rest of the 
> data types. For clarity, I think we should add "Type" to these class names 
> and others like them.
> Please do this after ARROW-3144.





[jira] [Comment Edited] (ARROW-5236) [Python] hdfs.connect() is trying to load libjvm in windows

2019-06-04 Thread Urmila (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855674#comment-16855674
 ] 

Urmila edited comment on ARROW-5236 at 6/4/19 1:07 PM:
---

Hi, I am also facing the same issue. I have conda and Spark installed on my 
local Windows machine and am trying to connect to HDFS (Unix) as shown below:

import pyarrow as pa
fs = pa.hdfs.connect('hostname.xx.xx.com', port_number, user='a...@xyx.com', 
kerb_ticket='local machine path')
Traceback (most recent call last):
File "", line 1, in 
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
183, in connect
extra_conf=extra_conf)
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
37, in init
self._connect(host, port, user, kerb_ticket, driver, extra_conf)
File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm


was (Author: urmilarv):
Hi, I am also facing the same issue. I have conda and Spark installed on my 
local machine and am trying to connect to HDFS as shown below:

import pyarrow as pa
fs = pa.hdfs.connect('hostname.xx.xx.com', port_number, user='a...@xyx.com', 
kerb_ticket='local machine path')
Traceback (most recent call last):
File "", line 1, in 
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
183, in connect
extra_conf=extra_conf)
File "C:\Users\vishurm\opt\miniconda3\lib\site-packages\pyarrow\hdfs.py", line
37, in init
self._connect(host, port, user, kerb_ticket, driver, extra_conf)
File "pyarrow\io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: Unable to load libjvm

> [Python] hdfs.connect() is trying to load libjvm in windows
> ---
>
> Key: ARROW-5236
> URL: https://issues.apache.org/jira/browse/ARROW-5236
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: Windows 7 Enterprise, pyarrow 0.13.0
>Reporter: Kamaraju
>Priority: Major
>  Labels: hdfs
>
> This issue was originally reported at 
> [https://github.com/apache/arrow/issues/4215] . Raising a Jira as per Wes 
> McKinney's request.
> Summary:
>  The following script
> {code}
> $ cat expt2.py
> import pyarrow as pa
> fs = pa.hdfs.connect()
> {code}
> tries to load libjvm in windows 7 which is not expected.
> {noformat}
> $ python ./expt2.py
> Traceback (most recent call last):
>   File "./expt2.py", line 3, in 
> fs = pa.hdfs.connect()
>   File 
> "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py",
>  line 183, in connect
> extra_conf=extra_conf)
>   File 
> "C:\ProgramData\Continuum\Anaconda\envs\scratch_py36_pyarrow\lib\site-packages\pyarrow\hdfs.py",
>  line 37, in __init__
> self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow\io-hdfs.pxi", line 89, in 
> pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow\error.pxi", line 83, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Unable to load libjvm
> {noformat}
> There is no libjvm file in Windows Java installation.
> {noformat}
> $ echo $JAVA_HOME
> C:\Progra~1\Java\jdk1.8.0_141
> $ find $JAVA_HOME -iname '*libjvm*'
> 
> {noformat}
> I see the libjvm error with both 0.11.1 and 0.13.0 versions of pyarrow.
> Steps to reproduce the issue (with more details):
> Create the environment
> {noformat}
> $ cat scratch_py36_pyarrow.yml
> name: scratch_py36_pyarrow
> channels:
>   - defaults
> dependencies:
>   - python=3.6.8
>   - pyarrow
> {noformat}
> {noformat}
> $ conda env create -f scratch_py36_pyarrow.yml
> {noformat}
> Apply the following patch to lib/site-packages/pyarrow/hdfs.py . I had to do 
> this since the Hadoop installation that comes with MapR <[https://mapr.com/]> 
> windows client only has $HADOOP_HOME/bin/hadoop.cmd . There is no file named 
> $HADOOP_HOME/bin/hadoop and so the subsequent subprocess.check_output call 
> fails with FileNotFoundError if this patch is not applied.
> {noformat}
> $ cat ~/x/patch.txt
> 131c131
> < hadoop_bin = '{0}/bin/hadoop'.format(os.environ['HADOOP_HOME'])
> ---
> > hadoop_bin = '{0}/bin/hadoop.cmd'.format(os.environ['HADOOP_HOME'])
> $ patch 
> /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
>  ~/x/patch.txt
> patching file 
> /c/ProgramData/Continuum/Anaconda/envs/scratch_py36_pyarrow/lib/site-packages/pyarrow/hdfs.py
> {noformat}
> Activate the environment
> {noformat}
> $ source activate scratch_py36_pyarrow
> {noformat}
> Sample script
> {noformat}
> $ cat expt2.py
> import pyarrow as pa
> fs = pa.hdfs.connect()
> {noformat}

[jira] [Commented] (ARROW-5334) [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for consistency

2019-06-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855679#comment-16855679
 ] 

Antoine Pitrou commented on ARROW-5334:
---

Applies to {{Number}} as well.

> [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for 
> consistency
> -
>
> Key: ARROW-5334
> URL: https://issues.apache.org/jira/browse/ARROW-5334
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> These intermediate classes used for template metaprogramming (in particular, 
> {{std::is_base_of}}) have names that are inconsistent with the rest of the 
> data types. For clarity, I think we should add "Type" to these class names 
> and others like them.
> Please do this after ARROW-3144.





[jira] [Commented] (ARROW-3779) [C++/Python] Validate timezone passed to pa.timestamp

2019-06-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855723#comment-16855723
 ] 

Antoine Pitrou commented on ARROW-3779:
---

Validating the timezone implies we have access to the Olson database or 
something similar. I'm not sure this is a priority for us, given the amount of 
scaffolding that will probably be required in the build chain.
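For illustration of what such validation entails, Python 3.9+'s standard-library `zoneinfo` module performs exactly this kind of lookup against the installed IANA (Olson) database. This is a sketch of the check, not a proposal for Arrow's C++ implementation, and it depends on tz data being present on the system:

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def is_valid_timezone(name):
    """Check a tz name against the installed IANA (Olson) database."""
    try:
        ZoneInfo(name)  # raises if the key is unknown or malformed
        return True
    except (ZoneInfoNotFoundError, ValueError):
        return False
```

The C++ side has no equivalent standard facility at this vintage, which is the scaffolding concern: the database (or a bundled copy) has to come from somewhere in the build chain.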

> [C++/Python] Validate timezone passed to pa.timestamp
> -
>
> Key: ARROW-3779
> URL: https://issues.apache.org/jira/browse/ARROW-3779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-3779) [C++/Python] Validate timezone passed to pa.timestamp

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3779:
--
Priority: Minor  (was: Major)

> [C++/Python] Validate timezone passed to pa.timestamp
> -
>
> Key: ARROW-3779
> URL: https://issues.apache.org/jira/browse/ARROW-3779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Krisztian Szucs
>Priority: Minor
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-3779) [C++/Python] Validate timezone passed to pa.timestamp

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3779:
--
Fix Version/s: (was: 0.14.0)

> [C++/Python] Validate timezone passed to pa.timestamp
> -
>
> Key: ARROW-3779
> URL: https://issues.apache.org/jira/browse/ARROW-3779
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Krisztian Szucs
>Priority: Minor
>






[jira] [Resolved] (ARROW-5491) [C++] Remove unecessary semicolons following MACRO definitions

2019-06-04 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-5491.
--
Resolution: Fixed

> [C++] Remove unecessary semicolons following MACRO definitions
> --
>
> Key: ARROW-5491
> URL: https://issues.apache.org/jira/browse/ARROW-5491
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-3877) [C++] Provide access to "maximum decompressed size" functions in compression libraries (if they exist)

2019-06-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855722#comment-16855722
 ] 

Antoine Pitrou commented on ARROW-3877:
---

Do we have a use case currently for one-shot decompression without knowing the 
decompressed length?
Compressed files are read using streaming decompression (which is more 
reasonable for huge data anyway).
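For context on why such a hint can be cheap: some formats (snappy, for instance) embed the uncompressed length at the front of the stream. A toy sketch of the idea, using a hypothetical length-prefixed format rather than any real codec:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Toy length-prefixed format: a 4-byte native-endian uncompressed size,
// then the payload verbatim (no real compression). Formats such as snappy
// store a similar length up front, which is what makes a "maximum
// decompressed size" hint answerable without decompressing anything.
std::string ToyCompress(const std::string& input) {
  uint32_t len = static_cast<uint32_t>(input.size());
  std::string header(sizeof(len), '\0');
  std::memcpy(&header[0], &len, sizeof(len));
  return header + input;
}

// The sizing hint: read only the header.
uint32_t ToyMaxDecompressedSize(const std::string& compressed) {
  uint32_t len = 0;
  std::memcpy(&len, compressed.data(), sizeof(len));
  return len;
}

std::string ToyDecompress(const std::string& compressed) {
  return compressed.substr(sizeof(uint32_t));
}
```

A caller can then size its output allocation from the hint before doing the one-shot decompression.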

> [C++] Provide access to "maximum decompressed size" functions in compression 
> libraries (if they exist)
> --
>
> Key: ARROW-3877
> URL: https://issues.apache.org/jira/browse/ARROW-3877
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> As follow up to ARROW-3831, some compression libraries have a function to 
> provide a hint for sizing the output buffer (if it is not known already) for 
> one-shot decompression. This would be helpful for sizing allocations in such 
> cases





[jira] [Updated] (ARROW-5285) [C++][Plasma] GpuProcessHandle is not released when GPU object deleted

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5285:
--
Component/s: GPU
 C++ - Plasma

> [C++][Plasma] GpuProcessHandle is not released when GPU object deleted
> --
>
> Key: ARROW-5285
> URL: https://issues.apache.org/jira/browse/ARROW-5285
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Plasma, GPU
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Assignee: shengjun.li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" 
> ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> In the plasma client, GpuProcessHandle is never released even though the GPU 
> object is deleted.
> Thus, cuIpcCloseMemHandle is never called.
> When I repeatedly create and delete GPU memory, the following error may occur.
> IOError: Cuda Driver API call in 
> /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 155 failed with 
> code 208: cuIpcOpenMemHandle(, *handle, 
> CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
> Note: CUDA_ERROR_ALREADY_MAPPED = 208





[jira] [Commented] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-04 Thread JIRA


[ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855730#comment-16855730
 ] 

Romain François commented on ARROW-5488:


That sounds easier than what I am currently trying to do :)

 

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Priority: Major
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although doing nothing useful) even when the C++ lib is not 
> available. 





[jira] [Commented] (ARROW-5331) [C++] FlightDataStream should be higher-level

2019-06-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855776#comment-16855776
 ] 

Antoine Pitrou commented on ARROW-5331:
---

I had overlooked that the {{FlightDescriptor}} is unused where 
{{FlightDataStream}} is concerned. {{FlightDataStream}} is used for the 
server's {{DoGet}} implementation only. So {{RecordBatchStream}} should already 
be sufficient in all cases where a record-batch-only data stream is desired.

This leaves the question of heterogeneous Flight streams. They should be handled 
at the IPC layer first before adapting Flight to work with them. We probably 
need some kind of IPC {{Datum}} that can represent several different kinds of 
data (record batch, tensor...).

> [C++] FlightDataStream should be higher-level
> -
>
> Key: ARROW-5331
> URL: https://issues.apache.org/jira/browse/ARROW-5331
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Affects Versions: 0.13.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> Currently, {{FlightDataStream}} is expected to provide {{FlightPayload}} 
> objects. This requires the user to handle IPC serialization themselves.
> Instead, it could provide higher-level {{FlightData}} objects (perhaps a 
> simple struct containing a {{FlightDescriptor}} and a {{RecordBatch}}), 
> letting Flight handle IPC encoding.





[jira] [Updated] (ARROW-3877) [C++] Provide access to "maximum decompressed size" functions in compression libraries (if they exist)

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3877:
--
Fix Version/s: (was: 0.14.0)

> [C++] Provide access to "maximum decompressed size" functions in compression 
> libraries (if they exist)
> --
>
> Key: ARROW-3877
> URL: https://issues.apache.org/jira/browse/ARROW-3877
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
>
> As follow up to ARROW-3831, some compression libraries have a function to 
> provide a hint for sizing the output buffer (if it is not known already) for 
> one-shot decompression. This would be helpful for sizing allocations in such 
> cases





[jira] [Resolved] (ARROW-5334) [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for consistency

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5334.
---
Resolution: Fixed

Issue resolved by pull request 4470
[https://github.com/apache/arrow/pull/4470]

> [C++] Add "Type" to names of arrow::Integer, arrow::FloatingPoint classes for 
> consistency
> -
>
> Key: ARROW-5334
> URL: https://issues.apache.org/jira/browse/ARROW-5334
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> These intermediate classes used for template metaprogramming (in particular, 
> {{std::is_base_of}}) have inconsistent names with the rest of data types. For 
> clarity, I think we should add "Type" to these class names and others like 
> them
> Please do after ARROW-3144





[jira] [Commented] (ARROW-3877) [C++] Provide access to "maximum decompressed size" functions in compression libraries (if they exist)

2019-06-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855783#comment-16855783
 ] 

Antoine Pitrou commented on ARROW-3877:
---

If it's only to provide a compression toolbox to Python users then I think this 
is low priority.

> [C++] Provide access to "maximum decompressed size" functions in compression 
> libraries (if they exist)
> --
>
> Key: ARROW-3877
> URL: https://issues.apache.org/jira/browse/ARROW-3877
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> As follow up to ARROW-3831, some compression libraries have a function to 
> provide a hint for sizing the output buffer (if it is not known already) for 
> one-shot decompression. This would be helpful for sizing allocations in such 
> cases





[jira] [Commented] (ARROW-3877) [C++] Provide access to "maximum decompressed size" functions in compression libraries (if they exist)

2019-06-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855784#comment-16855784
 ] 

Wes McKinney commented on ARROW-3877:
-

Agreed

> [C++] Provide access to "maximum decompressed size" functions in compression 
> libraries (if they exist)
> --
>
> Key: ARROW-3877
> URL: https://issues.apache.org/jira/browse/ARROW-3877
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
>
> As follow up to ARROW-3831, some compression libraries have a function to 
> provide a hint for sizing the output buffer (if it is not known already) for 
> one-shot decompression. This would be helpful for sizing allocations in such 
> cases





[jira] [Updated] (ARROW-3877) [C++] Provide access to "maximum decompressed size" functions in compression libraries (if they exist)

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3877:
--
Priority: Minor  (was: Major)

> [C++] Provide access to "maximum decompressed size" functions in compression 
> libraries (if they exist)
> --
>
> Key: ARROW-3877
> URL: https://issues.apache.org/jira/browse/ARROW-3877
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
> Fix For: 0.14.0
>
>
> As follow up to ARROW-3831, some compression libraries have a function to 
> provide a hint for sizing the output buffer (if it is not known already) for 
> one-shot decompression. This would be helpful for sizing allocations in such 
> cases





[jira] [Assigned] (ARROW-5507) [Plasma] [CUDA] Compile error

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-5507:
-

Assignee: Antoine Pitrou

> [Plasma] [CUDA] Compile error
> -
>
> Key: ARROW-5507
> URL: https://issues.apache.org/jira/browse/ARROW-5507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, GPU
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I'm starting to get this today:
> {code}
> ../src/plasma/protocol.cc:546:55: error: no matching member function for call 
> to 'CreateVector'
>   handles.push_back(fb::CreateCudaHandle(fbb, fbb.CreateVector(handle)));
>   ^~~~
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1484:27:
>  note: candidate function not viable: no known conversion from 
> 'std::shared_ptr' to 'const std::vector' for 1st argument
>   Offset> CreateVector(const std::vector ) {
>   ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1477:42:
>  note: candidate template ignored: could not match 'vector' against 
> 'shared_ptr'
>   template Offset> CreateVector(const std::vector 
> ) {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1443:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(const T *v, size_t len) 
> {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1465:29:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   Offset>> CreateVector(const Offset *v, size_t len) {
> ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1501:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(size_t vector_size,
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1520:21:
>  note: candidate function template not viable: requires 3 arguments, but 1 
> was provided
>   Offset> CreateVector(size_t vector_size, F f, S *state) {
> ^
> {code}





[jira] [Resolved] (ARROW-5507) [Plasma] [CUDA] Compile error

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5507.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4468
[https://github.com/apache/arrow/pull/4468]

> [Plasma] [CUDA] Compile error
> -
>
> Key: ARROW-5507
> URL: https://issues.apache.org/jira/browse/ARROW-5507
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma, GPU
>Reporter: Antoine Pitrou
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I'm starting to get this today:
> {code}
> ../src/plasma/protocol.cc:546:55: error: no matching member function for call 
> to 'CreateVector'
>   handles.push_back(fb::CreateCudaHandle(fbb, fbb.CreateVector(handle)));
>   ^~~~
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1484:27:
>  note: candidate function not viable: no known conversion from 
> 'std::shared_ptr' to 'const std::vector' for 1st argument
>   Offset> CreateVector(const std::vector ) {
>   ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1477:42:
>  note: candidate template ignored: could not match 'vector' against 
> 'shared_ptr'
>   template Offset> CreateVector(const std::vector 
> ) {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1443:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(const T *v, size_t len) 
> {
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1465:29:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   Offset>> CreateVector(const Offset *v, size_t len) {
> ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1501:42:
>  note: candidate function template not viable: requires 2 arguments, but 1 
> was provided
>   template Offset> CreateVector(size_t vector_size,
>  ^
> /home/antoine/miniconda3/envs/pyarrow/include/flatbuffers/flatbuffers.h:1520:21:
>  note: candidate function template not viable: requires 3 arguments, but 1 
> was provided
>   Offset> CreateVector(size_t vector_size, F f, S *state) {
> ^
> {code}





[jira] [Resolved] (ARROW-5285) [C++][Plasma] GpuProcessHandle is not released when GPU object deleted

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5285.
---
Resolution: Fixed

Issue resolved by pull request 4277
[https://github.com/apache/arrow/pull/4277]

> [C++][Plasma] GpuProcessHandle is not released when GPU object deleted
> --
>
> Key: ARROW-5285
> URL: https://issues.apache.org/jira/browse/ARROW-5285
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" 
> ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> In the plasma client, GpuProcessHandle is never released even though the GPU 
> object is deleted.
> Thus, cuIpcCloseMemHandle is never called.
> When I repeatedly create and delete GPU memory, the following error may occur.
> IOError: Cuda Driver API call in 
> /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 155 failed with 
> code 208: cuIpcOpenMemHandle(, *handle, 
> CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
> Note: CUDA_ERROR_ALREADY_MAPPED = 208





[jira] [Assigned] (ARROW-5285) [C++][Plasma] GpuProcessHandle is not released when GPU object deleted

2019-06-04 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-5285:
-

Assignee: shengjun.li

> [C++][Plasma] GpuProcessHandle is not released when GPU object deleted
> --
>
> Key: ARROW-5285
> URL: https://issues.apache.org/jira/browse/ARROW-5285
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Assignee: shengjun.li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> cpp/CMakeLists.txt
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" 
> ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> In the plasma client, GpuProcessHandle is never released even though the GPU 
> object is deleted.
> Thus, cuIpcCloseMemHandle is never called.
> When I repeatedly create and delete GPU memory, the following error may occur.
> IOError: Cuda Driver API call in 
> /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 155 failed with 
> code 208: cuIpcOpenMemHandle(, *handle, 
> CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
> Note: CUDA_ERROR_ALREADY_MAPPED = 208





[jira] [Commented] (ARROW-3298) [C++] Move murmur3 hash implementation to arrow/util

2019-06-04 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855725#comment-16855725
 ] 

Antoine Pitrou commented on ARROW-3298:
---

What complicates things a bit is that there are several different versions of 
Murmur (murmur2, murmur3, 32-bit-hash-producing, 64-bit-hash-producing) and 
also potentially several different implementations of each (with different 
performance characteristics).

So some review of current usage across the codebase (Arrow, Plasma, Parquet, 
Gandiva) is needed. [~fsaintjacques]

> [C++] Move murmur3 hash implementation to arrow/util
> 
>
> Key: ARROW-3298
> URL: https://issues.apache.org/jira/browse/ARROW-3298
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> It would be good to consolidate hashing utility code in a central place (this 
> is currently in src/parquet)





[jira] [Commented] (ARROW-3877) [C++] Provide access to "maximum decompressed size" functions in compression libraries (if they exist)

2019-06-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855782#comment-16855782
 ] 

Wes McKinney commented on ARROW-3877:
-

If we wanted our {{pyarrow.compress}} and {{pyarrow.decompress}} functions to 
be interchangeable with their counterparts in libraries like python-snappy, it 
would be helpful to be able to invoke decompress without knowing the exact 
uncompressed length. Some compressors require the uncompressed length, so in 
that case NotImplemented would be returned

> [C++] Provide access to "maximum decompressed size" functions in compression 
> libraries (if they exist)
> --
>
> Key: ARROW-3877
> URL: https://issues.apache.org/jira/browse/ARROW-3877
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> As follow up to ARROW-3831, some compression libraries have a function to 
> provide a hint for sizing the output buffer (if it is not known already) for 
> one-shot decompression. This would be helpful for sizing allocations in such 
> cases





[jira] [Updated] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5488:
--
Labels: pull-request-available  (was: )

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Priority: Major
>  Labels: pull-request-available
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although doing nothing useful) even when the C++ lib is not 
> available. 





[jira] [Commented] (ARROW-5485) [Gandiva][Crossbow] OSx builds failing

2019-06-04 Thread Praveen Kumar Desabandu (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855982#comment-16855982
 ] 

Praveen Kumar Desabandu commented on ARROW-5485:


[~wesmckinn] - any pointers will be highly appreciated and useful :)

> [Gandiva][Crossbow] OSx builds failing
> --
>
> Key: ARROW-5485
> URL: https://issues.apache.org/jira/browse/ARROW-5485
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.14.0
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
> Fix For: 0.14.0
>
>
> OSX builds have been failing for the last 3 days.





[jira] [Resolved] (ARROW-5077) [Rust] Release process should change Cargo.toml to use release versions

2019-06-04 Thread Sutou Kouhei (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sutou Kouhei resolved ARROW-5077.
-
Resolution: Fixed

Issue resolved by pull request 4460
[https://github.com/apache/arrow/pull/4460]

> [Rust] Release process should change Cargo.toml to use release versions
> ---
>
> Key: ARROW-5077
> URL: https://issues.apache.org/jira/browse/ARROW-5077
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.13.0
>Reporter: Andy Grove
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In the dev tree we use relative path dependencies between arrow, parquet, and 
> datafusion, which means we can't just run cargo publish for each crate from 
> the release source tarball.
> It would be good to have the release packaging change the Cargo.toml for 
> parquet and datafusion to have dependencies on a versioned release instead of 
> a relative path to remove this manual step when publishing.
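The rewrite being proposed can be illustrated with a hypothetical manifest fragment (crate name and version are examples only):

```toml
# Dev tree: relative path dependency; only works inside the repo checkout,
# so `cargo publish` cannot be run directly from the release tarball.
[dependencies]
arrow = { path = "../arrow" }

# What release packaging would rewrite this to: a versioned dependency
# resolvable from crates.io, removing the manual edit before publishing.
# [dependencies]
# arrow = "0.14.0"
```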





[jira] [Created] (ARROW-5508) [C++] Create reusable Iterator interface

2019-06-04 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5508:
---

 Summary: [C++] Create reusable Iterator interface 
 Key: ARROW-5508
 URL: https://issues.apache.org/jira/browse/ARROW-5508
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.14.0


We have various iterator-like classes. I envision a reusable interface like

{code}
template <typename T>
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual Status Next(T* out) = 0;
};
{code}
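A minimal standalone sketch of how a concrete iterator could implement such an interface; {{Status}} here is a simplified stand-in for {{arrow::Status}}, and all names are illustrative rather than Arrow's actual API:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified stand-in for arrow::Status, for illustration only.
enum class Status { OK, IterationEnd };

// The proposed reusable interface.
template <typename T>
class Iterator {
 public:
  virtual ~Iterator() = default;
  virtual Status Next(T* out) = 0;
};

// One possible concrete iterator, wrapping a std::vector<int>.
class VectorIterator : public Iterator<int> {
 public:
  explicit VectorIterator(std::vector<int> values)
      : values_(std::move(values)) {}

  Status Next(int* out) override {
    if (pos_ >= values_.size()) {
      return Status::IterationEnd;  // exhaustion signaled via the status
    }
    *out = values_[pos_++];
    return Status::OK;
  }

 private:
  std::vector<int> values_;
  std::size_t pos_ = 0;
};

// Typical consumption loop: call Next() until it stops returning OK.
int SumAll(Iterator<int>* it) {
  int sum = 0;
  int value = 0;
  while (it->Next(&value) == Status::OK) {
    sum += value;
  }
  return sum;
}
```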





[jira] [Created] (ARROW-5510) [Format] Feather V2

2019-06-04 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5510:
--

 Summary: [Format] Feather V2
 Key: ARROW-5510
 URL: https://issues.apache.org/jira/browse/ARROW-5510
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Reporter: Neal Richardson
Assignee: Wes McKinney
 Fix For: 0.14.0


The initial Feather file format is a minimal subset of the Arrow IPC format. It 
has a number of limitations (see 
[https://wesmckinney.com/blog/feather-arrow-future/]). 

We want to retain "feather" as the name of the on-disk representation of Arrow 
memory, so in order to support everything that Arrow supports, we need a 
"feather 2.0" format.

IIUC, defining the file format is "done" (dump the memory to disk). Remaining 
issues include upgrading "feather" readers and writers in all languages to 
support both feather 1.0 and feather 2.0. (e.g. 
https://issues.apache.org/jira/browse/ARROW-5501)





[jira] [Commented] (ARROW-5501) [R] read/write_feather/arrow?

2019-06-04 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856132#comment-16856132
 ] 

Neal Richardson commented on ARROW-5501:


Created here: https://issues.apache.org/jira/browse/ARROW-5510

> [R] read/write_feather/arrow?
> -
>
> Key: ARROW-5501
> URL: https://issues.apache.org/jira/browse/ARROW-5501
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> read_feather and write_feather exist, and there is also write_arrow. But no 
> read_arrow.
> Some questions (which go beyond just R): There's talk of a "feather 2.0", 
> i.e. "just" serializing the IPC format (which IIUC is what write_arrow does). 
> Are we going to continue to call the file format "Feather", and possibly 
> continue supporting the "feather 1.0" format as a subset/special case? Or 
> will "feather" mean this limited format and "arrow" be the name of the 
> full-featured file?
> In terms of this issue, should write_arrow be folded into write_feather and 
> there be an argument for indicating which version to write? Or should the 
> distinction be maintained, and we need to add a read_arrow() function?





[jira] [Updated] (ARROW-2447) [C++] Create a device abstraction

2019-06-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2447:

Fix Version/s: (was: 0.14.0)

> [C++] Create a device abstraction
> -
>
> Key: ARROW-2447
> URL: https://issues.apache.org/jira/browse/ARROW-2447
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, GPU
>Affects Versions: 0.9.0
>Reporter: Antoine Pitrou
>Assignee: Pearu Peterson
>Priority: Major
>
> Right now, a plain Buffer doesn't carry information about where it actually 
> lies. That information also cannot be passed around, so you get APIs like 
> {{PlasmaClient}} which take or return device number integers, and have 
> implementations which hardcode operations on CUDA buffers. Also, unsuspecting 
> receivers of a {{Buffer}} pointer may try to act on the underlying memory 
> without knowing whether it's CPU-reachable or not.
> Here is a sketch for a proposed Device abstraction:
> {code}
> class Device {
>   enum DeviceKind { KIND_CPU, KIND_CUDA };
>   virtual DeviceKind kind() const;
>   // MemoryPool* default_memory_pool() const;
>   // std::shared_ptr<Buffer> Allocate(...);
> };
> class CpuDevice : public Device {};
> class CudaDevice : public Device {
>   int device_num() const;
> };
> class Buffer {
>   virtual DeviceKind device_kind() const;
>   virtual std::shared_ptr<Device> device() const;
>   virtual bool on_cpu() const {
>     return true;
>   }
>   const uint8_t* cpu_data() const {
>     return on_cpu() ? data() : nullptr;
>   }
>   uint8_t* cpu_mutable_data() {
>     return on_cpu() ? mutable_data() : nullptr;
>   }
>   virtual CopyToCpu(std::shared_ptr<Buffer> dest) const;
>   virtual CopyFromCpu(std::shared_ptr<Buffer> src);
> };
> class CudaBuffer : public Buffer {
>   virtual bool on_cpu() const {
>     return false;
>   }
> };
> CopyBuffer(std::shared_ptr<Buffer> dest, const std::shared_ptr<Buffer>& src);
> {code}
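The on_cpu() guard at the heart of this proposal can be exercised with a minimal standalone version. Names mirror the sketch in the issue; this is an illustration of the dispatch pattern, not Arrow's actual API:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// CPU-backed buffer: memory is directly reachable, so cpu_data() works.
class Buffer {
 public:
  explicit Buffer(std::vector<uint8_t> data) : data_(std::move(data)) {}
  virtual ~Buffer() = default;

  virtual bool on_cpu() const { return true; }
  const uint8_t* data() const { return data_.data(); }

  // Safe accessor: returns nullptr when the memory is not CPU-reachable,
  // so unsuspecting receivers cannot dereference device memory by mistake.
  const uint8_t* cpu_data() const { return on_cpu() ? data() : nullptr; }

 protected:
  std::vector<uint8_t> data_;
};

// Pretend GPU buffer: same interface, but flagged as not CPU-reachable.
class CudaBuffer : public Buffer {
 public:
  using Buffer::Buffer;
  bool on_cpu() const override { return false; }
};
```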





[jira] [Updated] (ARROW-2801) [Python] Implement splt_row_groups for ParquetDataset

2019-06-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2801:

Fix Version/s: (was: 0.14.0)
   0.15.0

> [Python] Implement splt_row_groups for ParquetDataset
> -
>
> Key: ARROW-2801
> URL: https://issues.apache.org/jira/browse/ARROW-2801
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Robbie Gruener
>Assignee: Robbie Gruener
>Priority: Minor
>  Labels: datasets, parquet, pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently the split_row_groups argument in ParquetDataset yields a not 
> implemented error. An easy and efficient way to implement this is by using 
> the summary metadata file instead of opening every footer file





[jira] [Commented] (ARROW-5508) [C++] Create reusable Iterator interface

2019-06-04 Thread Liya Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856229#comment-16856229
 ] 

Liya Fan commented on ARROW-5508:
-

[~wesmckinn], thanks for the good point.

What is the standard way to know if there is a next element in the iterator?

> [C++] Create reusable Iterator interface 
> 
>
> Key: ARROW-5508
> URL: https://issues.apache.org/jira/browse/ARROW-5508
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> We have various iterator-like classes. I envision a reusable interface like
> {code}
> template <typename T>
> class Iterator {
>  public:
>   virtual ~Iterator() = default;
>   virtual Status Next(T* out) = 0;
> };
> {code}





[jira] [Created] (ARROW-5511) [Packaging] Enable Flight in Conda packages

2019-06-04 Thread David Li (JIRA)
David Li created ARROW-5511:
---

 Summary: [Packaging] Enable Flight in Conda packages
 Key: ARROW-5511
 URL: https://issues.apache.org/jira/browse/ARROW-5511
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Packaging, Python
Reporter: David Li
Assignee: David Li
 Fix For: 0.14.0


We should build Conda packages with Flight enabled.





[jira] [Updated] (ARROW-5511) [Packaging] Enable Flight in Conda packages

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5511:
--
Labels: pull-request-available  (was: )

> [Packaging] Enable Flight in Conda packages
> ---
>
> Key: ARROW-5511
> URL: https://issues.apache.org/jira/browse/ARROW-5511
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging, Python
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> We should build Conda packages with Flight enabled.





[jira] [Closed] (ARROW-5285) [C++][Plasma] GpuProcessHandle is not released when GPU object deleted

2019-06-04 Thread shengjun.li (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shengjun.li closed ARROW-5285.
--

It is fixed.

> [C++][Plasma] GpuProcessHandle is not released when GPU object deleted
> --
>
> Key: ARROW-5285
> URL: https://issues.apache.org/jira/browse/ARROW-5285
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, C++ - Plasma, GPU
>Affects Versions: 0.13.0
>Reporter: shengjun.li
>Assignee: shengjun.li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In cpp/CMakeLists.txt:
>   option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" ON)
>   option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)
> In the plasma client, GpuProcessHandle is never released even though the GPU 
> object is deleted.
> Thus, cuIpcCloseMemHandle is never called.
> When I repeatedly create and delete GPU memory, the following error may occur:
> IOError: Cuda Driver API call in 
> /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 155 failed with 
> code 208: cuIpcOpenMemHandle(, *handle, 
> CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS)
> Note: CUDA_ERROR_ALREADY_MAPPED = 208





[jira] [Created] (ARROW-5509) [R] write_parquet()

2019-06-04 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5509:
--

 Summary: [R] write_parquet()
 Key: ARROW-5509
 URL: https://issues.apache.org/jira/browse/ARROW-5509
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 0.14.0


We can read but not yet write. The C++ library supports this and pyarrow does 
it.





[jira] [Resolved] (ARROW-1837) [Java] Unable to read unsigned integers outside signed range for bit width in integration tests

2019-06-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1837.
-
Resolution: Fixed

Issue resolved by pull request 4432
[https://github.com/apache/arrow/pull/4432]

> [Java] Unable to read unsigned integers outside signed range for bit width in 
> integration tests
> ---
>
> Key: ARROW-1837
> URL: https://issues.apache.org/jira/browse/ARROW-1837
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Wes McKinney
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: columnar-format-1.0, pull-request-available
> Fix For: 0.14.0
>
> Attachments: generated_primitive.json
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> I believe this was introduced recently (perhaps in the refactors), but a 
> problem where the integration tests weren't being run properly hid the 
> error from us.
> see https://github.com/apache/arrow/pull/1294#issuecomment-345553066





[jira] [Updated] (ARROW-5485) [Gandiva][Crossbow] OSx builds failing

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5485:
--
Labels: pull-request-available  (was: )

> [Gandiva][Crossbow] OSx builds failing
> --
>
> Key: ARROW-5485
> URL: https://issues.apache.org/jira/browse/ARROW-5485
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.14.0
>Reporter: Praveen Kumar Desabandu
>Assignee: Praveen Kumar Desabandu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> OSX builds have been failing for the last 3 days.





[jira] [Updated] (ARROW-5242) [C++] Arrow doesn't compile cleanly with Visual Studio 2017 Update 9 or later due to narrowing

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5242:
--
Labels: pull-request-available  (was: )

> [C++] Arrow doesn't compile cleanly with Visual Studio 2017 Update 9 or later 
> due to narrowing
> --
>
> Key: ARROW-5242
> URL: https://issues.apache.org/jira/browse/ARROW-5242
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Billy Robert O'Neal III
>Assignee: Billy Robert O'Neal III
>Priority: Major
>  Labels: pull-request-available
>
> The std::string constructor call here is narrowing wchar_t to char, which 
> emits warning C4244 on current releases of Visual Studio: 
> [https://github.com/apache/arrow/blob/master/cpp/src/arrow/vendored/datetime/tz.cpp#L205]





[jira] [Updated] (ARROW-5513) [Java] Refactor method name for getstartOffset to use camel case

2019-06-04 Thread Liya Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liya Fan updated ARROW-5513:

Description: 
The method getstartOffset in class 
org.apache.arrow.vector.BaseVariableWidthVector should be refactored to 
getStartOffset, to comply with the camel case convention.

Fortunately, this method is not public, so the changes are internal to Arrow.

> [Java] Refactor method name for getstartOffset to use camel case
> 
>
> Key: ARROW-5513
> URL: https://issues.apache.org/jira/browse/ARROW-5513
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Trivial
>
> The method getstartOffset in class 
> org.apache.arrow.vector.BaseVariableWidthVector should be refactored to 
> getStartOffset, to comply with the camel case convention. 
> Fortunately, this method is not public, so the changes are internal to Arrow.





[jira] [Updated] (ARROW-5335) [Python] Support for converting variable dictionaries to pandas

2019-06-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-5335:

Priority: Blocker  (was: Major)

> [Python] Support for converting variable dictionaries to pandas
> ---
>
> Key: ARROW-5335
> URL: https://issues.apache.org/jira/browse/ARROW-5335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Address after ARROW-3144. The current code presumes the dictionary is the 
> same for all chunks. We should check if the dictionary is the same for all 
> chunks, and if not, perform a {{DictionaryType::Unify}} operation and then 
> write out into the resulting {{CategoricalBlock}}





[jira] [Commented] (ARROW-5335) [Python] Support for converting variable dictionaries to pandas

2019-06-04 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856293#comment-16856293
 ] 

Wes McKinney commented on ARROW-5335:
-

I think it would be irresponsible to release without at least adding a check 
for all dictionaries being the same

> [Python] Support for converting variable dictionaries to pandas
> ---
>
> Key: ARROW-5335
> URL: https://issues.apache.org/jira/browse/ARROW-5335
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Blocker
> Fix For: 0.14.0
>
>
> Address after ARROW-3144. The current code presumes the dictionary is the 
> same for all chunks. We should check if the dictionary is the same for all 
> chunks, and if not, perform a {{DictionaryType::Unify}} operation and then 
> write out into the resulting {{CategoricalBlock}}





[jira] [Updated] (ARROW-5513) [Java] Refactor method name for getstartOffset to use camel case

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5513:
--
Labels: pull-request-available  (was: )

> [Java] Refactor method name for getstartOffset to use camel case
> 
>
> Key: ARROW-5513
> URL: https://issues.apache.org/jira/browse/ARROW-5513
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Trivial
>  Labels: pull-request-available
>
> The method getstartOffset in class 
> org.apache.arrow.vector.BaseVariableWidthVector should be refactored to 
> getStartOffset, to comply with the camel case convention.
> Fortunately, this method is not public, so the changes are internal to Arrow.





[jira] [Updated] (ARROW-5115) [JS] Implement the Vector Builders

2019-06-04 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5115:
--
Labels: pull-request-available  (was: )

> [JS] Implement the Vector Builders
> --
>
> Key: ARROW-5115
> URL: https://issues.apache.org/jira/browse/ARROW-5115
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Affects Versions: 0.13.0
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
>  Labels: pull-request-available
>
> We should implement the streaming Vector Builders in JS.





[jira] [Resolved] (ARROW-5020) [C++][Gandiva] Split Gandiva-related conda packages for builds into separate .yml conda env file

2019-06-04 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5020.
-
Resolution: Fixed

Issue resolved by pull request 4459
[https://github.com/apache/arrow/pull/4459]

> [C++][Gandiva] Split Gandiva-related conda packages for builds into separate 
> .yml conda env file
> 
>
> Key: ARROW-5020
> URL: https://issues.apache.org/jira/browse/ARROW-5020
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> These installs are large and should not be required unconditionally in CI 
> and elsewhere.





[jira] [Created] (ARROW-5512) [C++] Draft initial public APIs for Datasets project

2019-06-04 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5512:
---

 Summary: [C++] Draft initial public APIs for Datasets project
 Key: ARROW-5512
 URL: https://issues.apache.org/jira/browse/ARROW-5512
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.14.0


The objective is to ensure general alignment with the discussion document

https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=sharing

so that an initial working implementation can begin.





[jira] [Created] (ARROW-5513) [Java] Refactor method name for getstartOffset to use camel case

2019-06-04 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5513:
---

 Summary: [Java] Refactor method name for getstartOffset to use 
camel case
 Key: ARROW-5513
 URL: https://issues.apache.org/jira/browse/ARROW-5513
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan








[jira] [Created] (ARROW-5506) [C++] Generic columnar format functionality

2019-06-04 Thread Andrei Gudkov (JIRA)
Andrei Gudkov created ARROW-5506:


 Summary: [C++] Generic columnar format functionality
 Key: ARROW-5506
 URL: https://issues.apache.org/jira/browse/ARROW-5506
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Andrei Gudkov


Discussion is here: [https://github.com/apache/arrow/pull/4066]

 

 





[jira] [Resolved] (ARROW-5463) [Rust] Implement AsRef for Buffer

2019-06-04 Thread Chao Sun (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved ARROW-5463.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4450
[https://github.com/apache/arrow/pull/4450]

> [Rust] Implement AsRef for Buffer
> -
>
> Key: ARROW-5463
> URL: https://issues.apache.org/jira/browse/ARROW-5463
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Rust
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Implement AsRef ArrowNativeType for Buffer


