[jira] [Commented] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings in Plasma Python unit tests

2018-08-16 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583408#comment-16583408
 ] 

Wes McKinney commented on ARROW-1380:
-

The valgrind output for the Python unit tests where this occurs is being 
suppressed; see https://github.com/apache/arrow/pull/1883

> [C++] Fix "still reachable" valgrind warnings in Plasma Python unit tests
> -
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
> Attachments: LastTest.log
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-2797) [JS] comparison predicates don't work on 64-bit integers

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned ARROW-2797:


Assignee: Brian Hulette

> [JS] comparison predicates don't work on 64-bit integers
> 
>
> Key: ARROW-2797
> URL: https://issues.apache.org/jira/browse/ARROW-2797
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> The 64-bit integer vector {{get}} function returns a 2-element array, which 
> doesn't compare properly in the comparison predicates. We should special-case 
> the comparisons for 64-bit integers and timestamps.
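
The problem can be sketched outside of JavaScript as well. The snippet below is a minimal Python illustration, not the Arrow JS code: it assumes the 2-element array holds the low 32-bit word first and the high 32-bit word second (an assumption, the issue does not state the order), and the function names are hypothetical. Comparing the pairs element by element gives the wrong ordering, so the words are combined into one signed 64-bit value first.

```python
def int64_from_words(low, high):
    """Combine two 32-bit words into one signed 64-bit integer.

    Assumes (low, high) word order; this order is an assumption of the sketch.
    """
    value = ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
    # Reinterpret the unsigned 64-bit pattern as a signed two's-complement value.
    if value >= 1 << 63:
        value -= 1 << 64
    return value


def compare_int64(a, b):
    """Return -1, 0 or 1 for two [low, high] pairs, compared as 64-bit integers."""
    x = int64_from_words(a[0], a[1])
    y = int64_from_words(b[0], b[1])
    return (x > y) - (x < y)
```

For example, `[0, 1]` (2**32) is larger than `[0xFFFFFFFF, 0]` (2**32 - 1), even though a naive element-wise comparison starting at the low word would order them the other way round.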





[jira] [Updated] (ARROW-2797) [JS] comparison predicates don't work on 64-bit integers

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2797:
-
Fix Version/s: JS-0.5.0

> [JS] comparison predicates don't work on 64-bit integers
> 
>
> Key: ARROW-2797
> URL: https://issues.apache.org/jira/browse/ARROW-2797
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> The 64-bit integer vector {{get}} function returns a 2-element array, which 
> doesn't compare properly in the comparison predicates. We should special-case 
> the comparisons for 64-bit integers and timestamps.





[jira] [Updated] (ARROW-2235) [JS] Add tests for IPC messages split across multiple buffers

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2235:
-
Fix Version/s: JS-0.5.0

> [JS] Add tests for IPC messages split across multiple buffers
> -
>
> Key: ARROW-2235
> URL: https://issues.apache.org/jira/browse/ARROW-2235
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> See https://github.com/apache/arrow/pull/1670
> This is probably easiest to do after the JS IPC writer is finished 
> (ARROW-2116)





[jira] [Updated] (ARROW-2410) [JS] Add DataFrame.scanAsync

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2410:
-
Fix Version/s: JS-0.5.0

> [JS] Add DataFrame.scanAsync
> 
>
> Key: ARROW-2410
> URL: https://issues.apache.org/jira/browse/ARROW-2410
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> Add `scanAsync`, a version of `DataFrame.scan` that yields periodically. The 
> yield frequency could be specified either as a number of record batches or as a 
> number of records.
> This scan should also be cancellable.
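
A rough sketch of what such a scan could look like, written with Python's asyncio rather than JavaScript. Everything here is hypothetical: the function name, the `yield_every_batches` parameter and the `should_cancel` callback are assumptions made for illustration and are not part of any Arrow API. It yields to the event loop after a configurable number of batches and checks for cancellation before each batch.

```python
import asyncio


async def scan_async(batches, visit, yield_every_batches=1,
                     should_cancel=lambda: False):
    """Visit every record in `batches`, yielding to the event loop periodically.

    Returns the number of records visited before finishing or being cancelled.
    """
    visited = 0
    for batch_index, batch in enumerate(batches, start=1):
        # Cancellation is checked once per batch, before any of its records.
        if should_cancel():
            break
        for record in batch:
            visit(record)
            visited += 1
        if batch_index % yield_every_batches == 0:
            # Hand control back to the event loop so other tasks can run.
            await asyncio.sleep(0)
    return visited
```

Counting batches keeps the check cheap. A per-record yield frequency would need a counter inside the inner loop instead.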





[jira] [Updated] (ARROW-1700) [JS] Implement Node.js client for Plasma store

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-1700:
-
Fix Version/s: JS-0.5.0

> [JS] Implement Node.js client for Plasma store
> --
>
> Key: ARROW-1700
> URL: https://issues.apache.org/jira/browse/ARROW-1700
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript, Plasma (C++)
>Reporter: Robert Nishihara
>Priority: Major
> Fix For: JS-0.5.0
>
>






[jira] [Updated] (ARROW-1700) [JS] Implement Node.js client for Plasma store

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-1700:
-
Summary: [JS] Implement Node.js client for Plasma store  (was: Implement 
Node.js client for Plasma store)

> [JS] Implement Node.js client for Plasma store
> --
>
> Key: ARROW-1700
> URL: https://issues.apache.org/jira/browse/ARROW-1700
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript, Plasma (C++)
>Reporter: Robert Nishihara
>Priority: Major
> Fix For: JS-0.5.0
>
>






[jira] [Updated] (ARROW-2766) [JS] Add ability to construct a Table from a list of Arrays/TypedArrays

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2766:
-
Fix Version/s: JS-0.5.0

> [JS] Add ability to construct a Table from a list of Arrays/TypedArrays
> ---
>
> Key: ARROW-2766
> URL: https://issues.apache.org/jira/browse/ARROW-2766
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> Something like 
> {code:javascript}
> Table.from({'col1': [...], 'col2': [...], 'col3': [...]})
> {code}





[jira] [Commented] (ARROW-1744) [Plasma] Provide TensorFlow operator to read tensors from plasma

2018-08-16 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583295#comment-16583295
 ] 

Brian Hulette commented on ARROW-1744:
--

I think the same thing happened in ARROW-2940, ARROW-2451, ARROW-2437, 
ARROW-2458, and ARROW-2397 - I went ahead and updated them all. It looks like 
these mistakes prevented them from being added to CHANGELOG.md for v0.10.0

> [Plasma] Provide TensorFlow operator to read tensors from plasma
> 
>
> Key: ARROW-1744
> URL: https://issues.apache.org/jira/browse/ARROW-1744
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> see https://www.tensorflow.org/extend/adding_an_op





[jira] [Updated] (ARROW-1744) [Plasma] Provide TensorFlow operator to read tensors from plasma

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-1744:
-
Fix Version/s: (was: JS-0.4.0)
   0.10.0

> [Plasma] Provide TensorFlow operator to read tensors from plasma
> 
>
> Key: ARROW-1744
> URL: https://issues.apache.org/jira/browse/ARROW-1744
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> see https://www.tensorflow.org/extend/adding_an_op





[jira] [Updated] (ARROW-2397) Document changes in Tensor encoding in IPC.md.

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2397:
-
Fix Version/s: (was: JS-0.4.0)
   0.10.0

> Document changes in Tensor encoding in IPC.md.
> --
>
> Key: ARROW-2397
> URL: https://issues.apache.org/jira/browse/ARROW-2397
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Update IPC.md to reflect the changes in 
> https://github.com/apache/arrow/pull/1802.





[jira] [Updated] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2458:
-
Fix Version/s: (was: JS-0.4.0)
   0.10.0

> [Plasma] PlasmaClient uses global variable
> --
>
> Key: ARROW-2458
> URL: https://issues.apache.org/jira/browse/ARROW-2458
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Affects Versions: 0.9.0
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The threadpool threadpool_ that PlasmaClient is using is global at the 
> moment. This prevents us from using multiple PlasmaClients in the same 
> process (one per thread).





[jira] [Updated] (ARROW-2451) Handle more dtypes efficiently in custom numpy array serializer.

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2451:
-
Fix Version/s: (was: JS-0.4.0)
   0.10.0

> Handle more dtypes efficiently in custom numpy array serializer.
> 
>
> Key: ARROW-2451
> URL: https://issues.apache.org/jira/browse/ARROW-2451
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Robert Nishihara
>Assignee: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Right now certain dtypes like bool or fixed length strings are serialized as 
> lists, which is inefficient. We can handle these more efficiently by casting 
> them to uint8 and saving the original dtype as additional data.
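
The idea of storing the values as raw bytes and keeping the original type as side data can be shown in plain Python, used here instead of NumPy only to keep the sketch dependency-free. The function names and the dict layout are made up for this illustration and are not pyarrow's serializer.

```python
def pack_bools(values):
    """Pack booleans as one unsigned byte each and record the original type name."""
    if not all(isinstance(v, bool) for v in values):
        raise TypeError("pack_bools expects only bool values")
    # bytes() accepts integers in range(256); True and False map to 1 and 0.
    return {"dtype": "bool", "data": bytes(values)}


def unpack(packed):
    """Restore the original values using the recorded type name."""
    if packed["dtype"] != "bool":
        raise ValueError("unsupported dtype: " + packed["dtype"])
    return [byte != 0 for byte in packed["data"]]
```

The payload is one contiguous byte buffer rather than a list of individually serialized objects, and the recorded type name is what lets the reader restore the original values.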





[jira] [Updated] (ARROW-2437) [C++] Change of arrow::ipc::ReadMessage signature breaks ABI compatibility

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2437:
-
Fix Version/s: (was: JS-0.4.0)
   0.10.0

> [C++] Change of arrow::ipc::ReadMessage signature breaks ABI compatibility
> 
>
> Key: ARROW-2437
> URL: https://issues.apache.org/jira/browse/ARROW-2437
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We changed the signature of the method from
> {code}
> ReadMessage ( arrow::io::InputStream* file, std::unique_ptr<arrow::ipc::Message, std::default_delete<arrow::ipc::Message> >* message ) 
> {code}
> to
> {code}
> ReadMessage ( arrow::io::InputStream* file, std::unique_ptr<arrow::ipc::Message, std::default_delete<arrow::ipc::Message> >* message, bool aligned ) 
> {code}
> We should add the old signature back so that the 0.9.1 release is ABI compatible 
> with 0.9.0.





[jira] [Updated] (ARROW-2940) [Python] Import error with pytorch 0.3

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2940:
-
Fix Version/s: (was: JS-0.4.0)
   0.10.0

> [Python] Import error with pytorch 0.3
> --
>
> Key: ARROW-2940
> URL: https://issues.apache.org/jira/browse/ARROW-2940
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The fix in ARROW-2920 doesn't work in versions strictly before pytorch 0.4:
> {code:java}
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/ubuntu/arrow/python/pyarrow/__init__.py", line 57, in <module>
>     compat.import_pytorch_extension()
>   File "/home/ubuntu/arrow/python/pyarrow/compat.py", line 249, in 
> import_pytorch_extension
>     ctypes.CDLL(os.path.join(path, "lib/libcaffe2.so"))
>   File 
> "/home/ubuntu/anaconda3/envs/breaking-env2/lib/python3.5/ctypes/__init__.py", 
> line 351, in __init__
>     self._handle = _dlopen(self._name, mode)
> OSError: 
> /home/ubuntu/anaconda3/envs/breaking-env2/lib/python3.5/site-packages/torch/lib/libcaffe2.so:
>  cannot open shared object file: No such file or directory{code}
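
The traceback shows `ctypes.CDLL` raising `OSError` because the shared library is not present in older pytorch builds. One defensive pattern, sketched below, is to treat the library as optional. This is not the fix that was applied in pyarrow; `load_optional_library` is a hypothetical helper written only for illustration.

```python
import ctypes
import os


def load_optional_library(path):
    """Return a ctypes handle for `path`, or None if it cannot be loaded."""
    if not os.path.exists(path):
        # The file is simply absent, as with libcaffe2.so in the report above.
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        # The file exists but the dynamic loader rejected it.
        return None
```

A caller would then skip the extension setup when the helper returns `None` instead of failing at `import pyarrow`.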





[jira] [Updated] (ARROW-2975) [Plasma] TensorFlow op: Compilation only working if arrow found by pkg-config

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2975:
-
Fix Version/s: (was: JS-0.4.0)
   0.11.0

> [Plasma] TensorFlow op: Compilation only working if arrow found by pkg-config
> -
>
> Key: ARROW-2975
> URL: https://issues.apache.org/jira/browse/ARROW-2975
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the pyarrow/tensorflow/build.sh script uses pyarrow to discover the 
> arrow libraries to link against. However, this is not working on the pip 
> package of pyarrow (since the .pc files are not shipped with it).





[jira] [Commented] (ARROW-1744) [Plasma] Provide TensorFlow operator to read tensors from plasma

2018-08-16 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583285#comment-16583285
 ] 

Brian Hulette commented on ARROW-1744:
--

It looks like this was actually merged for 0.10 (and certainly not JS-0.4.0) - 
is it too late to update the fix version?

> [Plasma] Provide TensorFlow operator to read tensors from plasma
> 
>
> Key: ARROW-1744
> URL: https://issues.apache.org/jira/browse/ARROW-1744
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: JS-0.4.0
>
>  Time Spent: 17h 10m
>  Remaining Estimate: 0h
>
> see https://www.tensorflow.org/extend/adding_an_op





[jira] [Updated] (ARROW-2764) [JS] Easy way to create a new Table with an additional column

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2764:
-
Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] Easy way to create a new Table with an additional column
> -
>
> Key: ARROW-2764
> URL: https://issues.apache.org/jira/browse/ARROW-2764
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Brian Hulette
>Priority: Major
> Fix For: JS-0.5.0
>
>
> It should be easier to add a new column to a table. API could be either 
> `table.addColumn(vector)` or `table.merge(..tables or vectors)`





[jira] [Resolved] (ARROW-2819) [JS] Fails to build with TS 2.8.3

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette resolved ARROW-2819.
--
Resolution: Fixed

Fixed in [#2201](https://github.com/apache/arrow/pull/2201)

> [JS] Fails to build with TS 2.8.3
> -
>
> Key: ARROW-2819
> URL: https://issues.apache.org/jira/browse/ARROW-2819
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Affects Versions: JS-0.3.1
>Reporter: Brian Hulette
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.4.0
>
>
> See the [GitHub 
> issue|https://github.com/apache/arrow/issues/2115#issuecomment-403612925]





[jira] [Updated] (ARROW-2828) [JS] Refactor Vector Data classes

2018-08-16 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-2828:
-
Fix Version/s: (was: JS-0.4.0)
   JS-0.5.0

> [JS] Refactor Vector Data classes
> -
>
> Key: ARROW-2828
> URL: https://issues.apache.org/jira/browse/ARROW-2828
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Paul Taylor
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.5.0
>
>
> In order to make it easier to build some of the higher-level APIs, we need to 
> slim the Vector Data classes down to just one base implementation.
> Initial WIP commit here, and work will continue in this branch: 
> https://github.com/trxcllnt/arrow/commit/dfad9023583bef4f8d2a50ea25f643e4bccbc805#diff-2512057432c4ebf55c6308cb06b43b08



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings in Plasma Python unit tests

2018-08-16 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583276#comment-16583276
 ] 

Lukasz Bartnik edited comment on ARROW-1380 at 8/17/18 2:22 AM:


I took a quick look at a recent build 
(https://travis-ci.org/apache/arrow/builds/417014924). Neither of its C++ 
jobs (https://travis-ci.org/apache/arrow/jobs/417014931 and 
https://travis-ci.org/apache/arrow/jobs/417014927) seems to use valgrind. The 
only job that seems to use valgrind is the openjdk8/gcc one 
(https://travis-ci.org/apache/arrow/jobs/417014925), but there are no reports 
from valgrind in the log; in fact, valgrind doesn't seem to be used there at 
all.

Looking at the job descriptions: the original job where "still reachable" 
blocks were reported was a "gcc C++" one, but there were two such jobs back 
then (3786.1 and 3786.8), whereas there's only one now (9492.7).

It seems that the error was fixed between builds 3786 and 9492.

I'm attaching the LastTest.log, which does not contain any valgrind alarms: 
every "HEAP SUMMARY" line is followed by an "in use at exit: 0 bytes in 0 
blocks" line.
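
The check described in the last paragraph can be automated. The following is a small sketch, assuming the log uses valgrind's usual `HEAP SUMMARY:` and `in use at exit: N bytes in M blocks` wording; the function name is hypothetical and the script was not part of the original comment.

```python
import re

HEAP_SUMMARY = re.compile(r"HEAP SUMMARY:")
IN_USE = re.compile(r"in use at exit: ([\d,]+) bytes in ([\d,]+) blocks")


def leaky_heap_summaries(log_lines):
    """Return 1-based line numbers of HEAP SUMMARY lines that are not
    immediately followed by an 'in use at exit: 0 bytes in 0 blocks' line."""
    lines = list(log_lines)
    suspicious = []
    for index, line in enumerate(lines):
        if not HEAP_SUMMARY.search(line):
            continue
        following = lines[index + 1] if index + 1 < len(lines) else ""
        match = IN_USE.search(following)
        # valgrind prints large numbers with thousands separators, e.g. 72,704.
        clean = (
            match is not None
            and match.group(1).replace(",", "") == "0"
            and match.group(2).replace(",", "") == "0"
        )
        if not clean:
            suspicious.append(index + 1)
    return suspicious
```

An empty result for LastTest.log would correspond to the observation above that no memory was still in use at exit.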



> [C++] Fix "still reachable" valgrind warnings in Plasma Python unit tests
> -
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
> Attachments: LastTest.log
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220





[jira] [Updated] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings in Plasma Python unit tests

2018-08-16 Thread Lukasz Bartnik (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Bartnik updated ARROW-1380:
--
Attachment: LastTest.log

> [C++] Fix "still reachable" valgrind warnings in Plasma Python unit tests
> -
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
> Attachments: LastTest.log
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1380) [C++] Fix "still reachable" valgrind warnings in Plasma Python unit tests

2018-08-16 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583276#comment-16583276
 ] 

Lukasz Bartnik commented on ARROW-1380:
---

I took a quick look at a recent build 
(https://travis-ci.org/apache/arrow/builds/417014924). Neither of its C++ 
jobs (https://travis-ci.org/apache/arrow/jobs/417014931 and 
https://travis-ci.org/apache/arrow/jobs/417014927) seems to use valgrind. The 
only job that seems to use valgrind is the openjdk8/gcc one 
(https://travis-ci.org/apache/arrow/jobs/417014925), but there are no reports 
from valgrind in the log; in fact, valgrind doesn't seem to be used there at 
all.

Looking at the job descriptions: the original job where "still reachable" 
blocks were reported was a "gcc C++" one, but there were two such jobs back 
then (3786.1 and 3786.8), whereas there's only one now (9492.7).

It seems that the error was fixed between builds 3786 and 9492.

I'm attaching the LastTest.log, which does not contain any valgrind alarms: 
every "HEAP SUMMARY" line is followed by an "in use at exit: 0 bytes in 0 
blocks" line.

> [C++] Fix "still reachable" valgrind warnings in Plasma Python unit tests
> -
>
> Key: ARROW-1380
> URL: https://issues.apache.org/jira/browse/ARROW-1380
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.11.0
>
>
> I thought I fixed this, but they seem to have recurred:
> https://travis-ci.org/apache/arrow/jobs/266421430#L5220





[jira] [Updated] (ARROW-3065) concat_tables() failing from bad Pandas Metadata

2018-08-16 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-3065:

Fix Version/s: (was: 0.9.0)
   0.11.0

> concat_tables() failing from bad Pandas Metadata
> 
>
> Key: ARROW-3065
> URL: https://issues.apache.org/jira/browse/ARROW-3065
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.10.0
>Reporter: David Lee
>Priority: Major
> Fix For: 0.11.0
>
>
> Looks like the major bug from 
> https://issues.apache.org/jira/browse/ARROW-1941 is back...
> After I downgraded from 0.10.0 to 0.9.0, the error disappeared.
> {code:python}
> new_arrow_table = pa.concat_tables(my_arrow_tables)
>  File "pyarrow/table.pxi", line 1562, in pyarrow.lib.concat_tables
>   File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Schema at index 2 was different:
> {code}
> In order to debug this I saved the first 4 arrow tables to 4 parquet files 
> and inspected the parquet files. The parquet schema is identical, but the 
> Pandas Metadata is different.
> {code:python}
> for i in range(5):
>  pq.write_table(my_arrow_tables[i], "test" + str(i) + ".parquet")
> {code}
> It looks like a column which contains empty strings is getting typed as 
> float64.
> {code:python}
> >>> test1.schema
> HoldingDetail_Id: string
> metadata
> 
> {b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [
> {"name": "HoldingDetail_Id", "field_name": "HoldingDetail_Id", "pandas_type": 
> "unicode", "numpy_type": "object", "metadata": null},
> >>> test1[0]
> 
> [
>   [
> "Z4",
> "SF",
> "J7",
> "W6",
> "L7",
> "Q9",
> "NE",
> "F7",
> >>> test2.schema
> HoldingDetail_Id: string
> metadata
> 
> {b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [
> {"name": "HoldingDetail_Id", "field_name": "HoldingDetail_Id", "pandas_type": 
> "unicode", "numpy_type": "float64", "metadata": null},
> >>> test2[0]
> 
> [
>   [
> "",
> "",
> "",
> "",
> "",
> "",
> "",
> "",
> {code}





[jira] [Created] (ARROW-3065) concat_tables() failing from bad Pandas Metadata

2018-08-16 Thread David Lee (JIRA)
David Lee created ARROW-3065:


 Summary: concat_tables() failing from bad Pandas Metadata
 Key: ARROW-3065
 URL: https://issues.apache.org/jira/browse/ARROW-3065
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.10.0
Reporter: David Lee
 Fix For: 0.9.0


Looks like the major bug from https://issues.apache.org/jira/browse/ARROW-1941 
is back...

After I downgraded from 0.10.0 to 0.9.0, the error disappeared.

{code:python}
new_arrow_table = pa.concat_tables(my_arrow_tables)

 File "pyarrow/table.pxi", line 1562, in pyarrow.lib.concat_tables
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Schema at index 2 was different:
{code}

In order to debug this I saved the first 4 arrow tables to 4 parquet files and 
inspected the parquet files. The parquet schema is identical, but the Pandas 
Metadata is different.

{code:python}
for i in range(5):
 pq.write_table(my_arrow_tables[i], "test" + str(i) + ".parquet")
{code}

It looks like a column which contains empty strings is getting typed as float64.

{code:python}
>>> test1.schema
HoldingDetail_Id: string
metadata

{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [
{"name": "HoldingDetail_Id", "field_name": "HoldingDetail_Id", "pandas_type": 
"unicode", "numpy_type": "object", "metadata": null},

>>> test1[0]

[
  [
"Z4",
"SF",
"J7",
"W6",
"L7",
"Q9",
"NE",
"F7",


>>> test2.schema
HoldingDetail_Id: string
metadata

{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [
{"name": "HoldingDetail_Id", "field_name": "HoldingDetail_Id", "pandas_type": 
"unicode", "numpy_type": "float64", "metadata": null},

>>> test2[0]

[
  [
"",
"",
"",
"",
"",
"",
"",
"",
{code}
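
A small helper can make the difference between the two schemas explicit. The sketch below assumes the schema metadata is a dict with a `b'pandas'` key holding JSON, as in the output above; the function names are hypothetical and only the standard library is used.

```python
import json


def pandas_numpy_types(schema_metadata):
    """Map column name -> numpy_type from a schema's b'pandas' metadata entry."""
    if not schema_metadata or b"pandas" not in schema_metadata:
        return {}
    pandas_meta = json.loads(schema_metadata[b"pandas"].decode("utf-8"))
    return {col["name"]: col["numpy_type"] for col in pandas_meta.get("columns", [])}


def differing_numpy_types(meta_a, meta_b):
    """Return {column: (numpy_type_a, numpy_type_b)} for columns that disagree."""
    types_a = pandas_numpy_types(meta_a)
    types_b = pandas_numpy_types(meta_b)
    shared = types_a.keys() & types_b.keys()
    return {
        name: (types_a[name], types_b[name])
        for name in shared
        if types_a[name] != types_b[name]
    }
```

Called as `differing_numpy_types(test1.schema.metadata, test2.schema.metadata)`, this would be expected to report `HoldingDetail_Id` with `object` versus `float64` for the two tables shown, assuming `schema.metadata` has the shape printed above.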






[jira] [Resolved] (ARROW-1799) [Plasma C++] Make unittest does not create plasma store executable

2018-08-16 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1799.
-
Resolution: Fixed

Issue resolved by pull request 2440
[https://github.com/apache/arrow/pull/2440]

> [Plasma C++] Make unittest does not create plasma store executable
> --
>
> Key: ARROW-1799
> URL: https://issues.apache.org/jira/browse/ARROW-1799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: William Paul
>Assignee: Lukasz Bartnik
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce from a fresh clone of Arrow:
> mkdir cpp/debug
> cd cpp/debug
> cmake .. -DARROW_PLASMA=on
> make -j8 unittest
> client_tests may then fail due to the store executable not being created. The 
> first time I reproduced the issue the test did fail, but the test passed on 
> subsequent reproductions of this issue. Regardless, if you look in 
> cpp/debug/debug, there is no plasma store executable. If you then call make, 
> the store executable is generated in that directory.





[jira] [Assigned] (ARROW-1799) [Plasma C++] Make unittest does not create plasma store executable

2018-08-16 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1799:
---

Assignee: Lukasz Bartnik

> [Plasma C++] Make unittest does not create plasma store executable
> --
>
> Key: ARROW-1799
> URL: https://issues.apache.org/jira/browse/ARROW-1799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: William Paul
>Assignee: Lukasz Bartnik
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Steps to reproduce from a fresh clone of Arrow:
> mkdir cpp/debug
> cd cpp/debug
> cmake .. -DARROW_PLASMA=on
> make -j8 unittest
> client_tests may then fail due to the store executable not being created. The 
> first time I reproduced the issue the test did fail, but the test passed on 
> subsequent reproductions of this issue. Regardless, if you look in 
> cpp/debug/debug, there is no plasma store executable. If you then call make, 
> the store executable is generated in that directory.





[jira] [Updated] (ARROW-3064) [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for particular unit test executables

2018-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3064:
--
Labels: pull-request-available  (was: )

> [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for 
> particular unit test executables
> ---
>
> Key: ARROW-3064
> URL: https://issues.apache.org/jira/browse/ARROW-3064
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> See ARROW-1799





[jira] [Assigned] (ARROW-3064) [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for particular unit test executables

2018-08-16 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3064:
---

Assignee: Lukasz Bartnik

> [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for 
> particular unit test executables
> ---
>
> Key: ARROW-3064
> URL: https://issues.apache.org/jira/browse/ARROW-3064
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Lukasz Bartnik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> See ARROW-1799





[jira] [Resolved] (ARROW-3064) [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for particular unit test executables

2018-08-16 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3064.
-
Resolution: Fixed

Issue resolved by pull request 2439
[https://github.com/apache/arrow/pull/2439]

> [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for 
> particular unit test executables
> ---
>
> Key: ARROW-3064
> URL: https://issues.apache.org/jira/browse/ARROW-3064
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> See ARROW-1799





[jira] [Updated] (ARROW-1799) [Plasma C++] Make unittest does not create plasma store executable

2018-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1799:
--
Labels: pull-request-available  (was: )

> [Plasma C++] Make unittest does not create plasma store executable
> --
>
> Key: ARROW-1799
> URL: https://issues.apache.org/jira/browse/ARROW-1799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: William Paul
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Steps to reproduce from a fresh clone of Arrow:
> mkdir cpp/debug
> cd cpp/debug
> cmake .. -DARROW_PLASMA=on
> make -j8 unittest
> client_tests may then fail due to the store executable not being created. The 
> first time I reproduced the issue the test did fail, but the test passed on 
> subsequent reproductions of this issue. Regardless, if you look in 
> cpp/debug/debug, there is no plasma store executable. If you then call make, 
> the store executable is generated in that directory.





[jira] [Created] (ARROW-3064) [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for particular unit test executables

2018-08-16 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-3064:
---

 Summary: [C++] Add option to ADD_ARROW_TEST to indicate additional 
dependencies for particular unit test executables
 Key: ARROW-3064
 URL: https://issues.apache.org/jira/browse/ARROW-3064
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 0.11.0


See ARROW-1799





[jira] [Comment Edited] (ARROW-1799) [Plasma C++] Make unittest does not create plasma store executable

2018-08-16 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582778#comment-16582778
 ] 

Wes McKinney edited comment on ARROW-1799 at 8/16/18 4:25 PM:
--

I missed this issue when it was reported. This can be resolved by adding 
dependencies to the tests on the plasma_store executable at

https://github.com/apache/arrow/blob/master/cpp/src/plasma/CMakeLists.txt#L199

we should probably add a {{DEPENDENCIES}} arg to {{ADD_ARROW_TEST}} to make 
this simpler


was (Author: wesmckinn):
I missed this issue when it was reported. This can be resolved by adding 
dependencies to the tests on the plasma_store executable at

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L355

we should probably add a {{DEPENDENCIES}} arg to {{ADD_ARROW_TEST}} to make 
this simpler

> [Plasma C++] Make unittest does not create plasma store executable
> --
>
> Key: ARROW-1799
> URL: https://issues.apache.org/jira/browse/ARROW-1799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: William Paul
>Priority: Minor
> Fix For: 0.11.0
>
>
> Steps to reproduce from a fresh clone of Arrow:
> mkdir cpp/debug
> cd cpp/debug
> cmake .. -DARROW_PLASMA=on
> make -j8 unittest
> client_tests may then fail due to the store executable not being created. The 
> first time I reproduced the issue the test did fail, but the test passed on 
> subsequent reproductions of this issue. Regardless, if you look in 
> cpp/debug/debug, there is no plasma store executable. If you then call make, 
> the store executable is generated in that directory.





[jira] [Commented] (ARROW-1799) [Plasma C++] Make unittest does not create plasma store executable

2018-08-16 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582778#comment-16582778
 ] 

Wes McKinney commented on ARROW-1799:
-

I missed this issue when it was reported. This can be resolved by adding 
dependencies to the tests on the plasma_store executable at

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L355

we should probably add a {{DEPENDENCIES}} arg to {{ADD_ARROW_TEST}} to make 
this simpler
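
To illustrate why a missing dependency produces the behaviour described in this issue, here is a toy model in Python, not the actual CMake change. The target names and the edges between them are hypothetical and chosen only to mirror the report: `unittest` builds the test executable, but nothing tells the build system that the test also needs the store executable.

```python
from typing import Dict, List, Set


def targets_built(goal: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return every target that building `goal` would also build."""
    built: Set[str] = set()
    stack = [goal]
    while stack:
        target = stack.pop()
        if target in built:
            continue
        built.add(target)
        # Follow declared dependencies only; undeclared ones are invisible.
        stack.extend(deps.get(target, []))
    return built


# Hypothetical graph before the fix: the test does not declare the store.
before = {
    "all": ["arrow", "plasma", "plasma_store", "client_tests"],
    "unittest": ["client_tests"],
    "client_tests": ["arrow", "plasma"],
}

# Same graph with the store declared as an extra dependency of the test.
after = dict(before, client_tests=["arrow", "plasma", "plasma_store"])

print("plasma_store" in targets_built("unittest", before))  # False
print("plasma_store" in targets_built("unittest", after))   # True
```

In this model, building `all` reaches the store in either graph, which matches the observation that a plain `make` generates the executable.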

> [Plasma C++] Make unittest does not create plasma store executable
> --
>
> Key: ARROW-1799
> URL: https://issues.apache.org/jira/browse/ARROW-1799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: William Paul
>Priority: Minor
> Fix For: 0.11.0
>
>
> Steps to reproduce from a fresh clone of Arrow:
> mkdir cpp/debug
> cd cpp/debug
> cmake .. -DARROW_PLASMA=on
> make -j8 unittest
> client_tests may then fail due to the store executable not being created. The 
> first time I reproduced the issue the test did fail, but the test passed on 
> subsequent reproductions of this issue. Regardless, if you look in 
> cpp/debug/debug, there is no plasma store executable. If you then call make, 
> the store executable is generated in that directory.





[jira] [Updated] (ARROW-1799) [Plasma C++] Make unittest does not create plasma store executable

2018-08-16 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1799:

Fix Version/s: 0.11.0

> [Plasma C++] Make unittest does not create plasma store executable
> --
>
> Key: ARROW-1799
> URL: https://issues.apache.org/jira/browse/ARROW-1799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: William Paul
>Priority: Minor
> Fix For: 0.11.0
>
>
> Steps to reproduce from a fresh clone of Arrow:
> mkdir cpp/debug
> cd cpp/debug
> cmake .. -DARROW_PLASMA=on
> make -j8 unittest
> client_tests may then fail due to the store executable not being created. The 
> first time I reproduced the issue the test did fail, but the test passed on 
> subsequent reproductions of this issue. Regardless, if you look in 
> cpp/debug/debug, there is no plasma store executable. If you then call make, 
> the store executable is generated in that directory.





[jira] [Comment Edited] (ARROW-1799) [Plasma C++] Make unittest does not create plasma store executable

2018-08-16 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582765#comment-16582765
 ] 

Lukasz Bartnik edited comment on ARROW-1799 at 8/16/18 4:18 PM:


{{make unittest}} fails repeatedly unless {{make all}}, which creates 
{{libarrow.\*}} and {{libplasma.\*}} libraries, is run beforehand. Quite 
possibly, the {{unittest}} target needs additional dependencies.


was (Author: lbartnik):
{{make unittest}} fails repeatedly unless {{make all}}, which creates 
{{libarrow.*}} and {{libplasma.*}} libraries, is run beforehand. Quite 
possibly, the {{unittest}} target needs additional dependencies.

> [Plasma C++] Make unittest does not create plasma store executable
> --
>
> Key: ARROW-1799
> URL: https://issues.apache.org/jira/browse/ARROW-1799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: William Paul
>Priority: Minor
>
> Steps to reproduce from a fresh clone of Arrow:
> mkdir cpp/debug
> cd cpp/debug
> cmake .. -DARROW_PLASMA=on
> make -j8 unittest
> client_tests may then fail due to the store executable not being created. The 
> first time I reproduced the issue the test did fail, but the test passed on 
> subsequent reproductions of this issue. Regardless, if you look in 
> cpp/debug/debug, there is no plasma store executable. If you then call make, 
> the store executable is generated in that directory.





[jira] [Commented] (ARROW-1799) [Plasma C++] Make unittest does not create plasma store executable

2018-08-16 Thread Lukasz Bartnik (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582765#comment-16582765
 ] 

Lukasz Bartnik commented on ARROW-1799:
---

{{make unittest}} fails repeatedly unless {{make all}}, which creates 
{{libarrow.*}} and {{libplasma.*}} libraries, is run beforehand. Quite 
possibly, the {{unittest}} target needs additional dependencies.

> [Plasma C++] Make unittest does not create plasma store executable
> --
>
> Key: ARROW-1799
> URL: https://issues.apache.org/jira/browse/ARROW-1799
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Plasma (C++)
>Reporter: William Paul
>Priority: Minor
>
> Steps to reproduce from a fresh clone of Arrow:
> mkdir cpp/debug
> cd cpp/debug
> cmake .. -DARROW_PLASMA=on
> make -j8 unittest
> client_tests may then fail due to the store executable not being created. The 
> first time I reproduced the issue the test did fail, but the test passed on 
> subsequent reproductions of this issue. Regardless, if you look in 
> cpp/debug/debug, there is no plasma store executable. If you then call make, 
> the store executable is generated in that directory.





[jira] [Commented] (ARROW-3050) [C++] Adopt HiveServer2 client C++ codebase

2018-08-16 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582681#comment-16582681
 ] 

Wes McKinney commented on ARROW-3050:
-

In progress: https://github.com/wesm/arrow/tree/hs2client-fork

> [C++] Adopt HiveServer2 client C++ codebase
> ---
>
> Key: ARROW-3050
> URL: https://issues.apache.org/jira/browse/ARROW-3050
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> I helped develop a small C++/Python library for interacting with databases 
> like Hive and Impala via the HiveServer2 Thrift protocol and making them 
> accessible to Python / pandas:
> https://github.com/cloudera/hs2client
> Internally this interfaces with HS2's own columnar representation. Arrow is a 
> natural partner for this project, much of which could be discarded. I think 
> Arrow would make as much sense as any place to develop this codebase further. 
> It could be later split off into a new project if a large enough community 
> develops
> cc [~twmarshall] [~mjacobs] for thoughts
> If we did this, do we need to do a software grant (essentially what I'm 
> proposing is to fork)? Can we just attribute the original Cloudera authors in 
> LICENSE.txt?





[jira] [Commented] (ARROW-3050) [C++] Adopt HiveServer2 client C++ codebase

2018-08-16 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582679#comment-16582679
 ] 

Wes McKinney commented on ARROW-3050:
-

Makes sense. I think this supports the argument to bring hs2client and Arrow 
closer together. 

I'm not proposing to include this in Arrow's CI since the testing procedure 
(with Hive or Impala) is more complicated. 

I started a branch to do the integration work; when I have something worth 
looking at I will put up a PR.

> [C++] Adopt HiveServer2 client C++ codebase
> ---
>
> Key: ARROW-3050
> URL: https://issues.apache.org/jira/browse/ARROW-3050
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> I helped develop a small C++/Python library for interacting with databases 
> like Hive and Impala via the HiveServer2 Thrift protocol and making them 
> accessible to Python / pandas:
> https://github.com/cloudera/hs2client
> Internally this interfaces with HS2's own columnar representation. Arrow is a 
> natural partner for this project, much of which could be discarded. I think 
> Arrow would make as much sense as any place to develop this codebase further. 
> It could be later split off into a new project if a large enough community 
> develops
> cc [~twmarshall] [~mjacobs] for thoughts
> If we did this, do we need to do a software grant (essentially what I'm 
> proposing is to fork)? Can we just attribute the original Cloudera authors in 
> LICENSE.txt?





[jira] [Resolved] (ARROW-3059) [C++] Streamline namespace array::test

2018-08-16 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3059.
-
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2436
[https://github.com/apache/arrow/pull/2436]

> [C++] Streamline namespace array::test
> --
>
> Key: ARROW-3059
> URL: https://issues.apache.org/jira/browse/ARROW-3059
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently we have some test helpers that live in the {{arrow::test}} 
> namespace, some in {{arrow}} (or topic subnamespaces such as {{arrow::io}}). 
> I see no reason for the discrepancy.
> I propose the simple solution of removing the {{arrow::test}} namespace 
> altogether. If not desirable, then we should make sure we put all helpers in 
> that namespace.





[jira] [Resolved] (ARROW-3062) [Python] Extend fast libtensorflow_framework.so compatibility workaround to Python 2.7

2018-08-16 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3062.
-
   Resolution: Fixed
Fix Version/s: 0.11.0

Issue resolved by pull request 2435
[https://github.com/apache/arrow/pull/2435]

> [Python] Extend fast libtensorflow_framework.so compatibility workaround to 
> Python 2.7
> --
>
> Key: ARROW-3062
> URL: https://issues.apache.org/jira/browse/ARROW-3062
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.10.0
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The workaround ARROW-2657 should be optimized a little bit and use the 
> loading of libtensorflow_framework.so (instead of doing a full "import 
> tensorflow") also for Python 2.7.
> We are running into this, since doing "import tensorflow" spawns a number of 
> threads, so without this optimization, using many python processes with 
> pyarrow will hit OS limits for threads.
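
A minimal sketch of the idea, assuming the shared library sits directly inside the installed `tensorflow` package directory under the name `libtensorflow_framework.so` (the file name may differ between TensorFlow releases). This is not the code from the pull request, and the helper names are hypothetical. `importlib.util.find_spec` locates a top-level package without executing it, so no TensorFlow threads are started.

```python
import ctypes
import importlib.util
import os
from typing import Optional


def find_framework_lib(package: str = "tensorflow",
                       libname: str = "libtensorflow_framework.so") -> Optional[str]:
    """Locate `libname` inside an installed package without importing the package."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for directory in spec.submodule_search_locations:
        candidate = os.path.join(directory, libname)
        if os.path.exists(candidate):
            return candidate
    return None


def preload_framework_lib() -> bool:
    """Load the shared library if it can be found; report whether it was loaded."""
    path = find_framework_lib()
    if path is None:
        return False
    try:
        ctypes.CDLL(path)
    except OSError:
        return False
    return True
```

Calling `preload_framework_lib()` before `import pyarrow` is the intended usage in this sketch; it returns `False` when TensorFlow is not installed, so it is safe to call unconditionally.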





[jira] [Commented] (ARROW-3048) Import pyarrow fails if scikit-learn is installed from conda

2018-08-16 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582326#comment-16582326
 ] 

Uwe L. Korn commented on ARROW-3048:


This is because boost-cpp from defaults is installed. Please use C++ 
packages from either defaults or conda-forge only; don't mix the two repositories.
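
One way to spot such a mix is to group the relevant packages by the channel they were installed from. The sketch below is only an illustration: it assumes each record carries `name` and `channel` keys, as the output of `conda list --json` is believed to, and the sample records are made up. The channel strings conda actually reports may differ (for example `pkgs/main` rather than `defaults`).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def packages_by_channel(records: Iterable[Mapping[str, str]],
                        names: Iterable[str]) -> Dict[str, List[str]]:
    """Group the selected package names by the channel they came from."""
    wanted = set(names)
    grouped = defaultdict(list)
    for record in records:
        if record["name"] in wanted:
            grouped[record.get("channel", "unknown")].append(record["name"])
    return {channel: sorted(pkgs) for channel, pkgs in grouped.items()}


def is_mixed(grouped: Mapping[str, List[str]]) -> bool:
    """True when the selected packages come from more than one channel."""
    return len(grouped) > 1


# Hypothetical records, shaped like the parsed JSON described above.
records = [
    {"name": "boost-cpp", "channel": "defaults"},
    {"name": "pyarrow", "channel": "conda-forge"},
    {"name": "parquet-cpp", "channel": "conda-forge"},
    {"name": "python", "channel": "defaults"},
]

grouped = packages_by_channel(records, ["boost-cpp", "pyarrow", "parquet-cpp"])
print(grouped)
print(is_mixed(grouped))  # True
```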

> Import pyarrow fails if scikit-learn is installed from conda
> 
>
> Key: ARROW-3048
> URL: https://issues.apache.org/jira/browse/ARROW-3048
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.10.0
> Environment: Ubuntu 16.04
>Reporter: Jarno Seppanen
>Priority: Major
>
> Hi, installing both pyarrow 0.10.0 and scikit-learn 0.19.2 causes pyarrow 
> import to break.
> Steps to reproduce
>  # cat >environment.yml <<EOF
> {code:java}
> name: asdf
> channels:
> - defaults
> - conda-forge
> dependencies:
> - python=3.6
> - pyarrow=0.10.0
> - scikit-learn=0.19.2{code}
> EOF
>  # conda env create
>  # source activate asdf
>  # python -c 'import pyarrow'
> {code:java}
> Traceback (most recent call last):
> File "<string>", line 1, in <module>
> File 
> "/home/jarno/miniconda3/envs/asdf/lib/python3.6/site-packages/pyarrow/__init__.py",
>  line 60, in <module>
> from pyarrow.lib import cpu_count, set_cpu_count
> ImportError: 
> /home/jarno/miniconda3/envs/asdf/lib/python3.6/site-packages/pyarrow/../../../libparquet.so.1:
>  undefined symbol: 
> _ZN5boost13match_resultsIN9__gnu_cxx17__normal_iteratorIPKcSsEESaINS_9sub_matchIS5_12maybe_assignERKS9_{code}





[jira] [Commented] (ARROW-3050) [C++] Adopt HiveServer2 client C++ codebase

2018-08-16 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1658#comment-1658
 ] 

Uwe L. Korn commented on ARROW-3050:


Hive is working on Arrow support for their connectors. Once that is in a state to 
be developed against, this could then be used in hs2client. I think before that 
we should try to keep them separate to not overload the Arrow build.

> [C++] Adopt HiveServer2 client C++ codebase
> ---
>
> Key: ARROW-3050
> URL: https://issues.apache.org/jira/browse/ARROW-3050
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> I helped develop a small C++/Python library for interacting with databases 
> like Hive and Impala via the HiveServer2 Thrift protocol and making them 
> accessible to Python / pandas:
> https://github.com/cloudera/hs2client
> Internally this interfaces with HS2's own columnar representation. Arrow is a 
> natural partner for this project, much of which could be discarded. I think 
> Arrow would make as much sense as any place to develop this codebase further. 
> It could be later split off into a new project if a large enough community 
> develops
> cc [~twmarshall] [~mjacobs] for thoughts
> If we did this, do we need to do a software grant (essentially what I'm 
> proposing is to fork)? Can we just attribute the original Cloudera authors in 
> LICENSE.txt?





[jira] [Updated] (ARROW-3059) [C++] Streamline namespace array::test

2018-08-16 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3059:
--
Labels: pull-request-available  (was: )

> [C++] Streamline namespace array::test
> --
>
> Key: ARROW-3059
> URL: https://issues.apache.org/jira/browse/ARROW-3059
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> Currently we have some test helpers that live in the {{arrow::test}} 
> namespace, some in {{arrow}} (or topic subnamespaces such as {{arrow::io}}). 
> I see no reason for the discrepancy.
> I propose the simple solution of removing the {{arrow::test}} namespace 
> altogether. If not desirable, then we should make sure we put all helpers in 
> that namespace.





[jira] [Assigned] (ARROW-3059) [C++] Streamline namespace array::test

2018-08-16 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-3059:
-

Assignee: Antoine Pitrou

> [C++] Streamline namespace array::test
> --
>
> Key: ARROW-3059
> URL: https://issues.apache.org/jira/browse/ARROW-3059
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.10.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>
> Currently we have some test helpers that live in the {{arrow::test}} 
> namespace, some in {{arrow}} (or topic subnamespaces such as {{arrow::io}}). 
> I see no reason for the discrepancy.
> I propose the simple solution of removing the {{arrow::test}} namespace 
> altogether. If not desirable, then we should make sure we put all helpers in 
> that namespace.





[jira] [Created] (ARROW-3063) [Go] move list of supported/TODO features to confluence

2018-08-16 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-3063:
--

 Summary: [Go] move list of supported/TODO features to confluence
 Key: ARROW-3063
 URL: https://issues.apache.org/jira/browse/ARROW-3063
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Sebastien Binet


as mentioned in https://github.com/apache/arrow/pull/2421#discussion_r210033779 
we should move the list of supported features (and those that still need to be 
implemented) to confluence.

filing this so we don't forget about it.


