[jira] [Closed] (ARROW-4521) Improve performance of row proxy object

2019-02-09 Thread Dominik Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-4521.
-
Resolution: Duplicate

> Improve performance of row proxy object
> ---
>
> Key: ARROW-4521
> URL: https://issues.apache.org/jira/browse/ARROW-4521
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> As noted in 
> https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784,
>  there may be some inefficiencies with the row proxy that could be mitigated 
> by defining properties on a prototype object. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-4523) [JS] Add row proxy generation benchmark

2019-02-09 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4523:


 Summary: [JS] Add row proxy generation benchmark
 Key: ARROW-4523
 URL: https://issues.apache.org/jira/browse/ARROW-4523
 Project: Apache Arrow
  Issue Type: Test
  Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette








[jira] [Created] (ARROW-4524) [JS] only invoke `Object.defineProperty` once per table

2019-02-09 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-4524:


 Summary: [JS] only invoke `Object.defineProperty` once per table
 Key: ARROW-4524
 URL: https://issues.apache.org/jira/browse/ARROW-4524
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: 0.4.1


See 
https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784





[jira] [Updated] (ARROW-4522) [JS] Remove Webpack dependency

2019-02-09 Thread Dominik Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz updated ARROW-4522:
--
Description: 
Webpack is only used in minify-task but I think the same could be done with 
just terser. The API is documented at 
https://github.com/terser-js/terser#api-reference. 

To bundle files, we could switch to the lighter rollup 
(https://github.com/rollup/rollup), which also supports tree shaking. 

  was:Webpack is only used in minify-task but I think the same could be done 
with just terser. The API is documented at 
https://github.com/terser-js/terser#api-reference. 


> [JS] Remove Webpack dependency
> --
>
> Key: ARROW-4522
> URL: https://issues.apache.org/jira/browse/ARROW-4522
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Minor
>
> Webpack is only used in minify-task but I think the same could be done with 
> just terser. The API is documented at 
> https://github.com/terser-js/terser#api-reference. 
> To bundle files, we could switch to the lighter rollup 
> (https://github.com/rollup/rollup), which also supports tree shaking. 





[jira] [Created] (ARROW-4522) [JS] Remove Webpack dependency

2019-02-09 Thread Dominik Moritz (JIRA)
Dominik Moritz created ARROW-4522:
-

 Summary: [JS] Remove Webpack dependency
 Key: ARROW-4522
 URL: https://issues.apache.org/jira/browse/ARROW-4522
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: JavaScript
Reporter: Dominik Moritz


Webpack is only used in minify-task but I think the same could be done with 
just terser. The API is documented at 
https://github.com/terser-js/terser#api-reference. 





[jira] [Created] (ARROW-4521) Improve performance of row proxy object

2019-02-09 Thread Dominik Moritz (JIRA)
Dominik Moritz created ARROW-4521:
-

 Summary: Improve performance of row proxy object
 Key: ARROW-4521
 URL: https://issues.apache.org/jira/browse/ARROW-4521
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Dominik Moritz


As noted in 
https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784,
 there may be some inefficiencies with the row proxy that could be mitigated by 
defining properties on a prototype object. 
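
The idea can be illustrated outside of Arrow's JavaScript code. The sketch below is a Python analogue, not Arrow code: `make_row_class` and the sample table are hypothetical names introduced here. It builds the column accessors once, on a shared class that plays the role of a JavaScript prototype, so that each row object only has to store its own index.

```python
from typing import Dict, Sequence


def make_row_class(columns: Dict[str, Sequence]) -> type:
    """Build one row class per table; accessors are defined once, on the class."""

    def make_getter(column: Sequence) -> property:
        # The property reads the shared column at the row's own index.
        return property(lambda self: column[self._index])

    def init(self, index: int) -> None:
        self._index = index

    namespace = {name: make_getter(col) for name, col in columns.items()}
    namespace["__slots__"] = ("_index",)  # rows carry an index and nothing else
    namespace["__init__"] = init
    return type("Row", (object,), namespace)


# Hypothetical two-column table used only for illustration.
table = {"x": [1, 2, 3], "label": ["a", "b", "c"]}
Row = make_row_class(table)           # accessors are defined here, once per table
rows = [Row(i) for i in range(3)]     # creating a row defines no properties
print(rows[1].x, rows[2].label)
```

Creating a row is then a single small allocation. Whether the equivalent change pays off in the JavaScript implementation would still need to be measured.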





[jira] [Closed] (ARROW-4517) [JS] remove version number as it is not used

2019-02-09 Thread Dominik Moritz (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominik Moritz closed ARROW-4517.
-

Resolved in 
https://github.com/apache/arrow/commit/f7656b6dc8bf69e63476cfe88ed8f20d85a4df7b

> [JS] remove version number as it is not used
> 
>
> Key: ARROW-4517
> URL: https://issues.apache.org/jira/browse/ARROW-4517
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-4388) [Go] add DimNames() method to tensor Interface?

2019-02-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4388:
--
Labels: easyfix pull-request-available  (was: easyfix)

> [Go] add DimNames() method to tensor Interface?
> ---
>
> Key: ARROW-4388
> URL: https://issues.apache.org/jira/browse/ARROW-4388
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Go
>Affects Versions: 0.12.0
>Reporter: Randall O'Reilly
>Priority: Major
>  Labels: easyfix, pull-request-available
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> It would be convenient to get access to the entire slice of dimension names 
> via a DimNames() []string method, in addition to the existing DimName(i int) 
> string method.  Here is a patch:
>  
> --- a/go/arrow/tensor/tensor.go
> +++ b/go/arrow/tensor/tensor.go
> @@ -52,6 +52,9 @@ type Interface interface {
>  // DimName returns the name of the i-th dimension.
>  DimName(i int) string
> + // DimNames returns the full slice of dimension names.
> + DimNames() []string
> +
>  DataType() arrow.DataType
>  Data() *array.Data
> @@ -102,6 +105,7 @@ func (tb *tensorBase) Shape() []int64 { return tb.shape }
>  func (tb *tensorBase) Strides() []int64 { return tb.strides }
>  func (tb *tensorBase) NumDims() int { return len(tb.shape) }
>  func (tb *tensorBase) DimName(i int) string { return tb.names[i] }
> +func (tb *tensorBase) DimNames() []string { return tb.names }
>  func (tb *tensorBase) DataType() arrow.DataType { return tb.dtype }
>  func (tb *tensorBase) Data() *array.Data { return tb.data }
>  





[jira] [Commented] (ARROW-585) [C++] Define public API for user-defined data types

2019-02-09 Thread Leif Walsh (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764273#comment-16764273
 ] 

Leif Walsh commented on ARROW-585:
--

Could you describe the planned interface for defining and consuming custom data 
types in the C++ and Java APIs? For example, how would one define a new type 
and associate a name with the physical type and a custom serializer/deserializer, 
and how, if at all, would one recognize such a field and dispatch to the 
appropriate serializer/deserializer in client code? 

> [C++] Define public API for user-defined data types
> ---
>
> Key: ARROW-585
> URL: https://issues.apache.org/jira/browse/ARROW-585
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> This will include:
> * Implementing a subclass of DataType
> * A "fallback" mechanism for receivers that do not understand our custom 
> metadata
> * Implementing a serializer interface for custom metadata (to be sent and 
> received in an IPC setting)





[jira] [Resolved] (ARROW-4124) [C++] Abstract aggregation kernel API

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4124.
-
Resolution: Fixed

Issue resolved by pull request 3407
[https://github.com/apache/arrow/pull/3407]

> [C++] Abstract aggregation kernel API
> -
>
> Key: ARROW-4124
> URL: https://issues.apache.org/jira/browse/ARROW-4124
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Related to the particular details of implementing various aggregation types, 
> we should first put a bit of energy into the abstract API for aggregating 
> data in a multi-threaded setting.
> Aggregators must support both hash/group (e.g. "group by" in SQL or data 
> frame libraries) modes and non-group modes. 
> Aggregations ideally should also support filter pushdown. For example:
> {code}
> select $AGG($EXPR)
> from $TABLE
> where $PREDICATE
> {code}
> Some systems might materialize the post-predicate (filtered) version of 
> {{$EXPR}} and then aggregate that; pandas does this, for example. Vectorized 
> performance can be much improved by filtering inside the aggregation kernel. 
> How the predicate's true/false values are handled may depend on the 
> implementation details of the kernel (e.g. SUM or MEAN will be a bit 
> different from PRODUCT).
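
The difference between materializing the filtered values and filtering inside the kernel can be sketched in plain Python. This is an illustrative sketch, not Arrow's kernel API: all function names are hypothetical. It also assumes one possible reading of the SUM versus PRODUCT remark, namely that a rejected value must contribute the operation's identity element (0 for SUM, 1 for PRODUCT).

```python
from typing import Optional, Sequence


def sum_materialized(values: Sequence[float], mask: Sequence[bool]) -> float:
    """Filter first (allocating an intermediate list), then aggregate."""
    filtered = [v for v, keep in zip(values, mask) if keep]
    return float(sum(filtered))


def masked_sum(values: Sequence[float], mask: Sequence[bool]) -> float:
    """Filter inside the aggregation loop; no intermediate list is built."""
    total = 0.0
    for value, keep in zip(values, mask):
        # A False mask entry acts as 0, the additive identity.
        # (Assumes finite values; NaN or inf times 0 would not be 0.)
        total += value * keep
    return total


def masked_product(values: Sequence[float], mask: Sequence[bool]) -> float:
    """Same idea for PRODUCT, where a rejected value must contribute 1."""
    result = 1.0
    for value, keep in zip(values, mask):
        result *= value if keep else 1.0
    return result


def masked_mean(values: Sequence[float], mask: Sequence[bool]) -> Optional[float]:
    """MEAN needs the count of selected values as well as their sum."""
    total = 0.0
    count = 0
    for value, keep in zip(values, mask):
        total += value * keep
        count += keep
    return total / count if count else None
```

The one-pass versions read each input once and allocate nothing proportional to the input size, which is the property the description attributes to filtering inside the kernel.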





[jira] [Resolved] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4024.
-
Resolution: Fixed

Issue resolved by pull request 3590
[https://github.com/apache/arrow/pull/3590]

> [Python] Cython compilation error on cython==0.27.3
> ---
>
> Key: ARROW-4024
> URL: https://issues.apache.org/jira/browse/ARROW-4024
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On the latest master, I'm getting the following error:
> {code:java}
> [ 11%] Compiling Cython CXX source for lib...
> Error compiling Cython file:
> 
> ...
>     out.init(type)
>     return out
> cdef object pyarrow_wrap_metadata(
>     ^
> 
> pyarrow/public-api.pxi:95:5: Function signature does not match previous 
> declaration
> CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' 
> failed{code}
> With Cython 0.29.0 it works. This might have been introduced in 
> [https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b],
>  but I'm not sure.





[jira] [Resolved] (ARROW-4370) [Python] Table to pandas conversion fails for list of bool

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4370.
-
Resolution: Fixed

Issue resolved by pull request 3593
[https://github.com/apache/arrow/pull/3593]

> [Python] Table to pandas conversion fails for list of bool
> --
>
> Key: ARROW-4370
> URL: https://issues.apache.org/jira/browse/ARROW-4370
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
>Reporter: Francisco Sanchez
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Converting a Table that contains a list-of-bool column to pandas fails with 
> an error. The same conversion works correctly for a list of int.
> {code:java}
>  In [94]: x1 = [1,2,3,4]
>  In [95]: f1 = pa.field('test', pa.list_(pa.int32()))
>  In [96]: array1 = pa.array([x1], type=f1.type)
>  In [97]: pa.Table.from_arrays([array1], names=['col1']).to_pandas()
>  Out[97]: 
>  col1
>  0 [1, 2, 3, 4]
>  In [98]: x2 = [True, True, False, False]
>  In [99]: f2 = pa.field('test', pa.list_(pa.bool_()))
>  In [100]: array2 = pa.array([x2], type=f2.type)
>  In [101]: pa.Table.from_arrays([array2], names=['col1']).to_pandas()
>  ---
>  ArrowNotImplementedError Traceback (most recent call last)
>   in ()
>  > 1 pa.Table.from_arrays([array2], names=['col1']).to_pandas()
>  ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/array.pxi in 
> pyarrow.lib._PandasConvertible.to_pandas()
>  ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/table.pxi in 
> pyarrow.lib.Table._to_pandas()
>  ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/pandas_compat.py 
> in table_to_blockmanager(options, table, categories, ignore_metadata)
>  630 
>  631 blocks = _table_to_blocks(options, block_table, pa.default_memory_pool(),
>  --> 632 categories)
>  633 
>  634 # Construct the row index
>  ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/pandas_compat.py 
> in _table_to_blocks(options, block_table, memory_pool, categories)
>  803 # Convert an arrow table to Block from the internal pandas API
>  804 result = pa.lib.table_to_blocks(options, block_table, memory_pool,
>  --> 805 categories)
>  806 
>  807 # Defined above
>  ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/table.pxi in 
> pyarrow.lib.table_to_blocks()
>  ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/error.pxi in 
> pyarrow.lib.check_status()
>  ArrowNotImplementedError: Not implemented type for list in DataFrameBlock: 
> bool
> {code}





[jira] [Resolved] (ARROW-4264) [C++] Document why DCHECKs are used in kernels

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4264.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3588
[https://github.com/apache/arrow/pull/3588]

> [C++] Document why DCHECKs are used in kernels
> --
>
> Key: ARROW-4264
> URL: https://issues.apache.org/jira/browse/ARROW-4264
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> DCHECKs seem to be used where Status::Invalid might be considered more 
> appropriate (so programs don't crash).  See the conversation on 
> [https://github.com/apache/arrow/pull/3287/files].
> Based on the conversation on this Jira and on the CL, it seems DCHECKs are in 
> fact desired, but we should document their appropriate use.





[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing

2019-02-09 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764215#comment-16764215
 ] 

Wes McKinney commented on ARROW-4413:
-

Sorry, unfortunately the build instructions that use conda / conda-forge have 
been out of date since the compiler migration that occurred on January 15. 

There is at least https://issues.apache.org/jira/browse/ARROW-3096 about fixing

> [Python] pyarrow.hdfs.connect() failing
> ---
>
> Key: ARROW-4413
> URL: https://issues.apache.org/jira/browse/ARROW-4413
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.12.0
> Environment: Python 2.7
> Hadoop distribution: Amazon 2.7.3
> Hive 2.1.1 
> Spark 2.1.1
> Tez 0.8.4
> Linux 4.4.35-33.55.amzn1.x86_64
>Reporter: Bradley Grantham
>Priority: Major
> Fix For: 0.13.0
>
>
> Trying to connect to hdfs using the below snippet. Using {{hadoop-libhdfs}}.
> This error appears in {{v0.12.0}}. It doesn't appear in {{v0.11.1}}. (I used 
> the same environment when testing that it still worked on {{v0.11.1}})
>  
> {code:java}
> In [1]: import pyarrow as pa
> In [2]: fs = pa.hdfs.connect()
> ---
> TypeError Traceback (most recent call last)
>  in ()
> > 1 fs = pa.hdfs.connect()
> /usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in connect(host, 
> port, user, kerb_ticket, driver, extra_conf)
> 205 fs = HadoopFileSystem(host=host, port=port, user=user,
> 206   kerb_ticket=kerb_ticket, driver=driver,
> --> 207   extra_conf=extra_conf)
> 208 return fs
> /usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in __init__(self, 
> host, port, user, kerb_ticket, driver, extra_conf)
>  36 _maybe_set_hadoop_classpath()
>  37 
> ---> 38 self._connect(host, port, user, kerb_ticket, driver, 
> extra_conf)
>  39 
>  40 def __reduce__(self):
> /usr/local/lib64/python2.7/site-packages/pyarrow/io-hdfs.pxi in 
> pyarrow.lib.HadoopFileSystem._connect()
>  72 if host is not None:
>  73 conf.host = tobytes(host)
> ---> 74 self.host = host
>  75 
>  76 conf.port = port
> TypeError: Expected unicode, got str
> {code}





[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing

2019-02-09 Thread Bradley Grantham (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764219#comment-16764219
 ] 

Bradley Grantham commented on ARROW-4413:
-

Ok cool. I would really like to help so I will give this another stab tomorrow 
and let you know how it goes!






[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing

2019-02-09 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764216#comment-16764216
 ] 

Wes McKinney commented on ARROW-4413:
-

If you install the {{gcc_linux-64}} and {{gxx_linux-64}} packages before 
activating the conda environment, it should work. We'll hopefully get those 
instructions updated in the coming weeks.






[jira] [Assigned] (ARROW-4462) [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4462:
---

Assignee: Areg Melik-Adamyan

> [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
> -
>
> Key: ARROW-4462
> URL: https://issues.apache.org/jira/browse/ARROW-4462
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Developer Tools
>Reporter: Areg Melik-Adamyan
>Assignee: Areg Melik-Adamyan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Upgrading to LZ4 v1.8.3 lets us remove the patch and the patching step, 
> because the fix is incorporated into the newer version of the VS2010 
> solution. A VS2017 solution is also provided, which eases use with newer 
> versions. Is there a reason for, or a fixed dependency on, v1.7.5?
> There is still an issue with MS Build Tools newer than v8.1, which require 
> manual retargeting. This could be fixed in CMake by introducing complex logic 
> that reads the registry tree 
> HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSBuild\ToolsVersions\4.0\1*.0, 
> determines which versions of the tools are installed, and then patches the 
> solution and projects. But as this is an external dependency, it is better to 
> submit a patch there.





[jira] [Assigned] (ARROW-4517) [JS] remove version number as it is not used

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-4517:
---

Assignee: Dominik Moritz

> [JS] remove version number as it is not used
> 
>
> Key: ARROW-4517
> URL: https://issues.apache.org/jira/browse/ARROW-4517
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Assignee: Dominik Moritz
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-4462) [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4462.
-
Resolution: Fixed

Issue resolved by pull request 3585
[https://github.com/apache/arrow/pull/3585]

> [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
> -
>
> Key: ARROW-4462
> URL: https://issues.apache.org/jira/browse/ARROW-4462
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Developer Tools
>Reporter: Areg Melik-Adamyan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h





[jira] [Resolved] (ARROW-4517) [JS] remove version number as it is not used

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4517.
-
   Resolution: Fixed
Fix Version/s: 0.13.0

Issue resolved by pull request 3595
[https://github.com/apache/arrow/pull/3595]

> [JS] remove version number as it is not used
> 
>
> Key: ARROW-4517
> URL: https://issues.apache.org/jira/browse/ARROW-4517
> Project: Apache Arrow
>  Issue Type: Task
>  Components: JavaScript
>Reporter: Dominik Moritz
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>






[jira] [Resolved] (ARROW-4014) [C++] Fix "LIBCMT" warnings on MSVC

2019-02-09 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4014.
-
Resolution: Fixed

Resolved in 
https://github.com/apache/arrow/commit/2b9155a3f0ab5c8b15986aa683dcdcbeeec967ff

> [C++] Fix "LIBCMT" warnings on MSVC
> ---
>
> Key: ARROW-4014
> URL: https://issues.apache.org/jira/browse/ARROW-4014
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.13.0
>
>
> These warnings have been present for a while, and they are a nuisance
> {code}
> "C:\Users\wesm\code\arrow\cpp\build\INSTALL.vcxproj" (default target) (1) ->
> "C:\Users\wesm\code\arrow\cpp\build\ALL_BUILD.vcxproj" (default target) (3) ->
> "C:\Users\wesm\code\arrow\cpp\build\src\arrow\arrow-allocator-test.vcxproj" 
> (default target) (4) ->
> "C:\Users\wesm\code\arrow\cpp\build\src\arrow\arrow_shared.vcxproj" (default 
> target) (5) ->
> (Link target) ->
>   LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other 
> libs; use /NODEFAULTLIB:library 
> [C:\Users\wesm\code\arrow\cpp\build\src\arrow\arrow_shared.vcxproj]
> "C:\Users\wesm\code\arrow\cpp\build\INSTALL.vcxproj" (default target) (1) ->
> "C:\Users\wesm\code\arrow\cpp\build\ALL_BUILD.vcxproj" (default target) (3) ->
> "C:\Users\wesm\code\arrow\cpp\build\src\parquet\parquet-file-deserialize-test.vcxproj"
>  (default target) (70) ->
>   MSVCRT.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' 
> conflicts with use of other libs; use /NODEFAULTLIB:library 
> [C:\Users\wesm\code\arrow\cpp\build\src\parquet\parquet-file-deserialize-test.vcxproj]
> "C:\Users\wesm\code\arrow\cpp\build\INSTALL.vcxproj" (default target) (1) ->
> "C:\Users\wesm\code\arrow\cpp\build\ALL_BUILD.vcxproj" (default target) (3) ->
> "C:\Users\wesm\code\arrow\cpp\build\src\parquet\parquet-schema-test.vcxproj" 
> (default target) (78) ->
>   MSVCRT.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' 
> conflicts with use of other libs; use /NODEFAULTLIB:library 
> [C:\Users\wesm\code\arrow\cpp\build\src\parquet\parquet-schema-test.vcxproj]
> 3 Warning(s)
> {code}





[jira] [Updated] (ARROW-4520) [C++] DCHECK custom messages are unreachable in release

2019-02-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-4520:
--
Labels: pull-request-available  (was: )

> [C++] DCHECK custom messages are unreachable in release
> ---
>
> Key: ARROW-4520
> URL: https://issues.apache.org/jira/browse/ARROW-4520
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
>  Labels: pull-request-available
>
> For release builds, {{DCHECK( x ) << y << z;}} currently expands to
> {code}
> ((void)(x)); 
> while (false) ::arrow::util::ArrowLogBase() << y << z;
> {code}
> The streamed statement is unreachable, which is an error when using clang-7.





[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing

2019-02-09 Thread Bradley Grantham (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764166#comment-16764166
 ] 

Bradley Grantham commented on ARROW-4413:
-

[~wesmckinn] Unfortunately I can't build the package. I've spent all day trying 
to get it working (I also spent some time last weekend). I can build it on my 
Mac, but can't get hdfs to work with it so that's obviously useless for this 
problem. And on Amazon EMR, which is where I encountered the problem 
originally, I can't build the package at all. I keep getting an error when 
running
{code:java}
make -j4
{code}
...

 
{code:java}
/usr/bin/ld: 
/home/hadoop/miniconda3/envs/pyarrow-dev/lib/libglog.a(libglog_la-signalhandler.o):
 unrecognized relocation (0x29) in section `.text'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [release/arrow-file-to-stream] Error 1
make[1]: *** [src/arrow/ipc/CMakeFiles/arrow-file-to-stream.dir/all] Error 2
/usr/bin/ld: 
/home/hadoop/miniconda3/envs/pyarrow-dev/lib/libglog.a(libglog_la-signalhandler.o):
 unrecognized relocation (0x29) in section `.text'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [release/arrow-stream-to-file] Error 1
make[1]: *** [src/arrow/ipc/CMakeFiles/arrow-stream-to-file.dir/all] Error 2
make: *** [all] Error 2
{code}
Sorry!

 



