[jira] [Closed] (ARROW-4521) Improve performance of row proxy object
[ https://issues.apache.org/jira/browse/ARROW-4521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominik Moritz closed ARROW-4521.
Resolution: Duplicate

> Improve performance of row proxy object
>
> Key: ARROW-4521
> URL: https://issues.apache.org/jira/browse/ARROW-4521
> Project: Apache Arrow
> Issue Type: Improvement
> Components: JavaScript
> Reporter: Dominik Moritz
> Priority: Major
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> As noted in https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784, there may be some inefficiencies with the row proxy that could be mitigated by defining properties on a prototype object.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4523) [JS] Add row proxy generation benchmark
Brian Hulette created ARROW-4523:

Summary: [JS] Add row proxy generation benchmark
Key: ARROW-4523
URL: https://issues.apache.org/jira/browse/ARROW-4523
Project: Apache Arrow
Issue Type: Test
Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette
[jira] [Created] (ARROW-4524) [JS] only invoke `Object.defineProperty` once per table
Brian Hulette created ARROW-4524:

Summary: [JS] only invoke `Object.defineProperty` once per table
Key: ARROW-4524
URL: https://issues.apache.org/jira/browse/ARROW-4524
Project: Apache Arrow
Issue Type: Improvement
Components: JavaScript
Reporter: Brian Hulette
Assignee: Brian Hulette
Fix For: 0.4.1

See https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784
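The optimization described here (and in ARROW-4521) can be sketched in plain JavaScript: define one getter per column on a shared prototype so that `Object.defineProperty` runs once per column per table, rather than once per field per row. This is an illustrative sketch only, not the actual apache-arrow implementation; `makeRowProto`, `makeRow`, and the toy column layout are invented for the example.

```javascript
// Sketch: column accessors live on one shared prototype per table, so
// Object.defineProperty is invoked per column, not per row. Illustrative
// only -- not the real apache-arrow row proxy.
function makeRowProto(columns) {
  const proto = {};
  columns.forEach((col, i) => {
    Object.defineProperty(proto, col.name, {
      get() { return columns[i].values[this._rowIndex]; },
      enumerable: true,
    });
  });
  return proto;
}

function makeRow(proto, rowIndex) {
  // Creating a row is now a cheap Object.create plus one own property.
  return Object.create(proto, { _rowIndex: { value: rowIndex } });
}

// Toy "table" with two columns:
const columns = [
  { name: 'id', values: [1, 2, 3] },
  { name: 'flag', values: [true, false, true] },
];
const proto = makeRowProto(columns);
const row = makeRow(proto, 1);
console.log(row.id, row.flag); // → 2 false
```

Each row then costs only an `Object.create` call, which is the kind of difference the row proxy generation benchmark proposed in ARROW-4523 would measure.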
[jira] [Updated] (ARROW-4522) [JS] Remove Webpack dependency
[ https://issues.apache.org/jira/browse/ARROW-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominik Moritz updated ARROW-4522:
Description: Webpack is only used in minify-task but I think the same could be done with just terser. The API is documented at https://github.com/terser-js/terser#api-reference. To bundle files, we could switch to the lighter rollup (https://github.com/rollup/rollup), which also supports tree shaking.
(was: Webpack is only used in minify-task but I think the same could be done with just terser. The API is documented at https://github.com/terser-js/terser#api-reference.)

> [JS] Remove Webpack dependency
>
> Key: ARROW-4522
> URL: https://issues.apache.org/jira/browse/ARROW-4522
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: JavaScript
> Reporter: Dominik Moritz
> Priority: Minor
>
> Webpack is only used in minify-task but I think the same could be done with just terser. The API is documented at https://github.com/terser-js/terser#api-reference.
> To bundle files, we could switch to the lighter rollup (https://github.com/rollup/rollup), which also supports tree shaking.
[jira] [Created] (ARROW-4522) [JS] Remove Webpack dependency
Dominik Moritz created ARROW-4522:

Summary: [JS] Remove Webpack dependency
Key: ARROW-4522
URL: https://issues.apache.org/jira/browse/ARROW-4522
Project: Apache Arrow
Issue Type: Sub-task
Components: JavaScript
Reporter: Dominik Moritz

Webpack is only used in minify-task but I think the same could be done with just terser. The API is documented at https://github.com/terser-js/terser#api-reference.
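As a rough sketch of what a webpack-free minify step might look like, the following build-task fragment uses only terser's documented API. The file names are placeholders, and the synchronous `minify` signature shown here is the one documented for terser v3/v4 (terser v5 later made `minify` return a Promise); this is an assumption-laden illustration, not the actual arrow/js gulp task.

```javascript
// Hypothetical minify step using only terser (placeholder file names; the
// synchronous result object {error, code, map} is the terser v3/v4 API).
const { readFileSync, writeFileSync } = require('fs');
const terser = require('terser');

const source = readFileSync('Arrow.js', 'utf8');
const result = terser.minify(source, {
  mangle: true,
  compress: true,
  sourceMap: { filename: 'Arrow.min.js', url: 'Arrow.min.js.map' },
});
if (result.error) throw result.error;
writeFileSync('Arrow.min.js', result.code);
writeFileSync('Arrow.min.js.map', result.map);
```

Bundling itself would then be handled separately, e.g. by rollup as the description suggests.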
[jira] [Created] (ARROW-4521) Improve performance of row proxy object
Dominik Moritz created ARROW-4521:

Summary: Improve performance of row proxy object
Key: ARROW-4521
URL: https://issues.apache.org/jira/browse/ARROW-4521
Project: Apache Arrow
Issue Type: Improvement
Components: JavaScript
Reporter: Dominik Moritz

As noted in https://github.com/vega/vega-loader-arrow/commit/19c88e130aaeeae9d0166360db467121e5724352#r32253784, there may be some inefficiencies with the row proxy that could be mitigated by defining properties on a prototype object.
[jira] [Closed] (ARROW-4517) [JS] remove version number as it is not used
[ https://issues.apache.org/jira/browse/ARROW-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominik Moritz closed ARROW-4517.
Resolved in https://github.com/apache/arrow/commit/f7656b6dc8bf69e63476cfe88ed8f20d85a4df7b

> [JS] remove version number as it is not used
>
> Key: ARROW-4517
> URL: https://issues.apache.org/jira/browse/ARROW-4517
> Project: Apache Arrow
> Issue Type: Task
> Components: JavaScript
> Reporter: Dominik Moritz
> Assignee: Dominik Moritz
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (ARROW-4388) [Go] add DimNames() method to tensor Interface?
[ https://issues.apache.org/jira/browse/ARROW-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-4388:
Labels: easyfix pull-request-available (was: easyfix)

> [Go] add DimNames() method to tensor Interface?
>
> Key: ARROW-4388
> URL: https://issues.apache.org/jira/browse/ARROW-4388
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Go
> Affects Versions: 0.12.0
> Reporter: Randall O'Reilly
> Priority: Major
> Labels: easyfix, pull-request-available
> Original Estimate: 5m
> Remaining Estimate: 5m
>
> It would be convenient to get access to the entire slice of dimension names via a DimNames() []string method, in addition to the existing DimName(i int) string method. Here is a patch:
>
> {code}
> --- a/go/arrow/tensor/tensor.go
> +++ b/go/arrow/tensor/tensor.go
> @@ -52,6 +52,9 @@ type Interface interface {
>  	// DimName returns the name of the i-th dimension.
>  	DimName(i int) string
>
> +	// DimNames returns the full slice of dimension names.
> +	DimNames() []string
> +
>  	DataType() arrow.DataType
>  	Data() *array.Data
>
> @@ -102,6 +105,7 @@ func (tb *tensorBase) Shape() []int64 { return tb.shape }
>  func (tb *tensorBase) Strides() []int64 { return tb.strides }
>  func (tb *tensorBase) NumDims() int { return len(tb.shape) }
>  func (tb *tensorBase) DimName(i int) string { return tb.names[i] }
> +func (tb *tensorBase) DimNames() []string { return tb.names }
>  func (tb *tensorBase) DataType() arrow.DataType { return tb.dtype }
>  func (tb *tensorBase) Data() *array.Data { return tb.data }
> {code}
[jira] [Commented] (ARROW-585) [C++] Define public API for user-defined data types
[ https://issues.apache.org/jira/browse/ARROW-585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764273#comment-16764273 ]

Leif Walsh commented on ARROW-585:
Could you describe the planned interface for defining and consuming custom data types in the C++ and Java APIs? For example, how would one define a new type and associate a name with the physical type and a custom serializer/deserializer, and how would client code recognize such a field, if any, and dispatch to the appropriate serializer/deserializer?

> [C++] Define public API for user-defined data types
>
> Key: ARROW-585
> URL: https://issues.apache.org/jira/browse/ARROW-585
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.13.0
>
> This will include:
> * Implementing a subclass of DataType
> * A "fallback" mechanism for receivers that do not understand our custom metadata
> * Implementing a serializer interface for custom metadata (to be sent and received in an IPC setting)
[jira] [Resolved] (ARROW-4124) [C++] Abstract aggregation kernel API
[ https://issues.apache.org/jira/browse/ARROW-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-4124.
Resolution: Fixed
Issue resolved by pull request 3407: https://github.com/apache/arrow/pull/3407

> [C++] Abstract aggregation kernel API
>
> Key: ARROW-4124
> URL: https://issues.apache.org/jira/browse/ARROW-4124
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Francois Saint-Jacques
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Time Spent: 10.5h
> Remaining Estimate: 0h
>
> Related to the particular details of implementing various aggregation types, we should first put some energy into the abstract API for aggregating data in a multi-threaded setting.
> Aggregators must support both hash/group modes (e.g. "group by" in SQL or data frame libraries) and non-group modes.
> Aggregations should ideally also support filter pushdown. For example:
> {code}
> select $AGG($EXPR)
> from $TABLE
> where $PREDICATE
> {code}
> Some systems materialize the post-predicate / filtered version of {{$EXPR}}, then aggregate that; pandas does this, for example. Vectorized performance can be much improved by filtering inside the aggregation kernel. How the predicate true/false values are handled may depend on the implementation details of the kernel (e.g. SUM or MEAN will be a bit different from PRODUCT).
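The trade-off described above, materializing the filtered expression and then aggregating versus filtering inside the aggregation kernel, can be illustrated with a toy sum kernel. This is a sketch of the idea only, not Arrow's C++ kernel API; the function names are invented for illustration.

```javascript
// Toy illustration of filter pushdown into an aggregation kernel
// (not Arrow's kernel API).

// Variant 1: materialize the post-predicate values, then aggregate --
// what pandas effectively does. Allocates an intermediate array.
function sumMaterialized(values, predicate) {
  const filtered = values.filter(predicate);
  return filtered.reduce((acc, v) => acc + v, 0);
}

// Variant 2: push the predicate into the kernel loop -- no intermediate
// allocation; in a vectorized kernel the branch can become a select/mask.
function sumFused(values, predicate) {
  let acc = 0;
  for (let i = 0; i < values.length; i++) {
    if (predicate(values[i])) acc += values[i];
  }
  return acc;
}

const values = [1, 2, 3, 4, 5];
const isOdd = (v) => v % 2 === 1;
console.log(sumMaterialized(values, isOdd)); // 9
console.log(sumFused(values, isOdd));        // 9
```

Both variants compute the same result; the issue's point is that the fused form avoids the intermediate materialization, and that how the predicate interacts with the accumulator differs by aggregate (SUM can simply skip, while PRODUCT must be careful about identity values).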
[jira] [Resolved] (ARROW-4024) [Python] Cython compilation error on cython==0.27.3
[ https://issues.apache.org/jira/browse/ARROW-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-4024.
Resolution: Fixed
Issue resolved by pull request 3590: https://github.com/apache/arrow/pull/3590

> [Python] Cython compilation error on cython==0.27.3
>
> Key: ARROW-4024
> URL: https://issues.apache.org/jira/browse/ARROW-4024
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Philipp Moritz
> Assignee: Uwe L. Korn
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> On the latest master, I'm getting the following error:
> {code:java}
> [ 11%] Compiling Cython CXX source for lib...
> Error compiling Cython file:
> ...
>     out.init(type)
>     return out
> cdef object pyarrow_wrap_metadata(
> ^
> pyarrow/public-api.pxi:95:5: Function signature does not match previous declaration
> CMakeFiles/lib_pyx.dir/build.make:57: recipe for target 'CMakeFiles/lib_pyx' failed
> {code}
> With Cython 0.29.0 it works. This might have been introduced in https://github.com/apache/arrow/commit/12201841212967c78e31b2d2840b55b1707c4e7b but I'm not sure.
[jira] [Resolved] (ARROW-4370) [Python] Table to pandas conversion fails for list of bool
[ https://issues.apache.org/jira/browse/ARROW-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-4370.
Resolution: Fixed
Issue resolved by pull request 3593: https://github.com/apache/arrow/pull/3593

> [Python] Table to pandas conversion fails for list of bool
>
> Key: ARROW-4370
> URL: https://issues.apache.org/jira/browse/ARROW-4370
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0
> Reporter: Francisco Sanchez
> Assignee: Uwe L. Korn
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Converting a Table containing a list of bool to pandas throws an error; a list of int converts correctly.
> {code:java}
> In [94]: x1 = [1,2,3,4]
> In [95]: f1 = pa.field('test', pa.list_(pa.int32()))
> In [96]: array1 = pa.array([x1], type=f1.type)
> In [97]: pa.Table.from_arrays([array1], names=['col1']).to_pandas()
> Out[97]:
>            col1
> 0  [1, 2, 3, 4]
> In [98]: x2 = [True, True, False, False]
> In [99]: f2 = pa.field('test', pa.list_(pa.bool_()))
> In [100]: array2 = pa.array([x2], type=f2.type)
> In [101]: pa.Table.from_arrays([array2], names=['col1']).to_pandas()
> ---
> ArrowNotImplementedError Traceback (most recent call last)
> in ()
> ----> 1 pa.Table.from_arrays([array2], names=['col1']).to_pandas()
> ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
> ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
> ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata)
>     630
>     631 blocks = _table_to_blocks(options, block_table, pa.default_memory_pool(),
> --> 632                           categories)
>     633
>     634 # Construct the row index
> ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, memory_pool, categories)
>     803 # Convert an arrow table to Block from the internal pandas API
>     804 result = pa.lib.table_to_blocks(options, block_table, memory_pool,
> --> 805                                 categories)
>     806
>     807 # Defined above
> ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
> ~/.virtualenvs/py3-gpu/lib/python3.5/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: Not implemented type for list in DataFrameBlock: bool
> {code}
[jira] [Resolved] (ARROW-4264) [C++] Document why DCHECKs are used in kernels
[ https://issues.apache.org/jira/browse/ARROW-4264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-4264.
Resolution: Fixed
Fix Version/s: 0.13.0
Issue resolved by pull request 3588: https://github.com/apache/arrow/pull/3588

> [C++] Document why DCHECKs are used in kernels
>
> Key: ARROW-4264
> URL: https://issues.apache.org/jira/browse/ARROW-4264
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Micah Kornfield
> Assignee: Micah Kornfield
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> DCHECKs seem to be used where Status::Invalid might be considered more appropriate (so programs don't crash). See the conversation on https://github.com/apache/arrow/pull/3287/files.
> Based on the conversation on this Jira and on the CL, it seems DCHECKs are in fact desired, but we should document appropriate use for them.
[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing
[ https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764215#comment-16764215 ]

Wes McKinney commented on ARROW-4413:
Sorry, unfortunately the build instructions that use conda / conda-forge are out of date since the compiler migration that occurred on January 15. There is at least https://issues.apache.org/jira/browse/ARROW-3096 about fixing this.

> [Python] pyarrow.hdfs.connect() failing
>
> Key: ARROW-4413
> URL: https://issues.apache.org/jira/browse/ARROW-4413
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.0
> Environment: Python 2.7
> Hadoop distribution: Amazon 2.7.3
> Hive 2.1.1
> Spark 2.1.1
> Tez 0.8.4
> Linux 4.4.35-33.55.amzn1.x86_64
> Reporter: Bradley Grantham
> Priority: Major
> Fix For: 0.13.0
>
> Trying to connect to HDFS using the snippet below, with {{hadoop-libhdfs}}. This error appears in {{v0.12.0}}; it doesn't appear in {{v0.11.1}} (I used the same environment when testing that it still worked on {{v0.11.1}}).
>
> {code:java}
> In [1]: import pyarrow as pa
> In [2]: fs = pa.hdfs.connect()
> ---
> TypeError Traceback (most recent call last)
> in ()
> ----> 1 fs = pa.hdfs.connect()
> /usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in connect(host, port, user, kerb_ticket, driver, extra_conf)
>     205 fs = HadoopFileSystem(host=host, port=port, user=user,
>     206                       kerb_ticket=kerb_ticket, driver=driver,
> --> 207                       extra_conf=extra_conf)
>     208 return fs
> /usr/local/lib64/python2.7/site-packages/pyarrow/hdfs.pyc in __init__(self, host, port, user, kerb_ticket, driver, extra_conf)
>      36 _maybe_set_hadoop_classpath()
>      37
> ---> 38 self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>      39
>      40 def __reduce__(self):
> /usr/local/lib64/python2.7/site-packages/pyarrow/io-hdfs.pxi in pyarrow.lib.HadoopFileSystem._connect()
>      72 if host is not None:
>      73     conf.host = tobytes(host)
> ---> 74 self.host = host
>      75
>      76 conf.port = port
> TypeError: Expected unicode, got str
> {code}
[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing
[ https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764219#comment-16764219 ]

Bradley Grantham commented on ARROW-4413:
Ok cool. I would really like to help, so I will give this another stab tomorrow and let you know how it goes!
[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing
[ https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764216#comment-16764216 ]

Wes McKinney commented on ARROW-4413:
If you install the {{gcc_linux-64}} and {{gxx_linux-64}} packages before activating the conda environment it should work, but we'll get those instructions updated, hopefully in the coming weeks.
[jira] [Assigned] (ARROW-4462) [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
[ https://issues.apache.org/jira/browse/ARROW-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-4462:
Assignee: Areg Melik-Adamyan

> [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
>
> Key: ARROW-4462
> URL: https://issues.apache.org/jira/browse/ARROW-4462
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Developer Tools
> Reporter: Areg Melik-Adamyan
> Assignee: Areg Melik-Adamyan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.13.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> By upgrading to LZ4 v1.8.3, the patch and patching step can be removed: the fix is incorporated into a newer version of the VS2010 solution, and a VS2017 solution is also provided, which eases use with newer versions. Is there a reason for the fixed dependency on v1.7.5?
> There is still an issue with MS Build Tools newer than v8.1, which requires manual retargeting. This could be fixed in CMake by introducing complex logic that reads the registry tree HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSBuild\ToolsVersions\4.0\1*.0, analyzes which versions of the tools are installed, and then patches the solution and projects. But as this is an external dependency, it is better to submit the patch upstream.
[jira] [Assigned] (ARROW-4517) [JS] remove version number as it is not used
[ https://issues.apache.org/jira/browse/ARROW-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-4517:
Assignee: Dominik Moritz
[jira] [Resolved] (ARROW-4462) [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
[ https://issues.apache.org/jira/browse/ARROW-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-4462.
Resolution: Fixed
Issue resolved by pull request 3585: https://github.com/apache/arrow/pull/3585
[jira] [Resolved] (ARROW-4517) [JS] remove version number as it is not used
[ https://issues.apache.org/jira/browse/ARROW-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-4517.
Resolution: Fixed
Fix Version/s: 0.13.0
Issue resolved by pull request 3595: https://github.com/apache/arrow/pull/3595
[jira] [Resolved] (ARROW-4014) [C++] Fix "LIBCMT" warnings on MSVC
[ https://issues.apache.org/jira/browse/ARROW-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-4014.
Resolution: Fixed
Resolved in https://github.com/apache/arrow/commit/2b9155a3f0ab5c8b15986aa683dcdcbeeec967ff

> [C++] Fix "LIBCMT" warnings on MSVC
>
> Key: ARROW-4014
> URL: https://issues.apache.org/jira/browse/ARROW-4014
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Fix For: 0.13.0
>
> These warnings have been present for a while, and they are a nuisance:
> {code}
> "C:\Users\wesm\code\arrow\cpp\build\INSTALL.vcxproj" (default target) (1) ->
> "C:\Users\wesm\code\arrow\cpp\build\ALL_BUILD.vcxproj" (default target) (3) ->
> "C:\Users\wesm\code\arrow\cpp\build\src\arrow\arrow-allocator-test.vcxproj" (default target) (4) ->
> "C:\Users\wesm\code\arrow\cpp\build\src\arrow\arrow_shared.vcxproj" (default target) (5) ->
> (Link target) ->
>   LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs; use /NODEFAULTLIB:library [C:\Users\wesm\code\arrow\cpp\build\src\arrow\arrow_shared.vcxproj]
>
> "C:\Users\wesm\code\arrow\cpp\build\INSTALL.vcxproj" (default target) (1) ->
> "C:\Users\wesm\code\arrow\cpp\build\ALL_BUILD.vcxproj" (default target) (3) ->
> "C:\Users\wesm\code\arrow\cpp\build\src\parquet\parquet-file-deserialize-test.vcxproj" (default target) (70) ->
>   MSVCRT.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' conflicts with use of other libs; use /NODEFAULTLIB:library [C:\Users\wesm\code\arrow\cpp\build\src\parquet\parquet-file-deserialize-test.vcxproj]
>
> "C:\Users\wesm\code\arrow\cpp\build\INSTALL.vcxproj" (default target) (1) ->
> "C:\Users\wesm\code\arrow\cpp\build\ALL_BUILD.vcxproj" (default target) (3) ->
> "C:\Users\wesm\code\arrow\cpp\build\src\parquet\parquet-schema-test.vcxproj" (default target) (78) ->
>   MSVCRT.lib(initializers.obj) : warning LNK4098: defaultlib 'libcmt.lib' conflicts with use of other libs; use /NODEFAULTLIB:library [C:\Users\wesm\code\arrow\cpp\build\src\parquet\parquet-schema-test.vcxproj]
>
>     3 Warning(s)
> {code}
[jira] [Updated] (ARROW-4520) [C++] DCHECK custom messages are unreachable in release
[ https://issues.apache.org/jira/browse/ARROW-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-4520:
Labels: pull-request-available (was: )

> [C++] DCHECK custom messages are unreachable in release
>
> Key: ARROW-4520
> URL: https://issues.apache.org/jira/browse/ARROW-4520
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Benjamin Kietzman
> Assignee: Benjamin Kietzman
> Priority: Major
> Labels: pull-request-available
>
> For release builds, {{DCHECK(x) << y << z;}} currently expands to
> {code}
> ((void)(x));
> while (false) ::arrow::util::ArrowLogBase() << y << z;
> {code}
> This makes the streamed message unreachable, which is an error with clang-7.
[jira] [Commented] (ARROW-4413) [Python] pyarrow.hdfs.connect() failing
[ https://issues.apache.org/jira/browse/ARROW-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764166#comment-16764166 ]

Bradley Grantham commented on ARROW-4413:
[~wesmckinn] Unfortunately I can't build the package. I've spent all day trying to get it working (I also spent some time last weekend). I can build it on my Mac, but can't get HDFS to work with it, so that's obviously useless for this problem. And on Amazon EMR, which is where I encountered the problem originally, I can't build the package at all. I keep getting an error when running

{code:java}
make -j4
{code}

...

{code:java}
/usr/bin/ld: /home/hadoop/miniconda3/envs/pyarrow-dev/lib/libglog.a(libglog_la-signalhandler.o): unrecognized relocation (0x29) in section `.text'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [release/arrow-file-to-stream] Error 1
make[1]: *** [src/arrow/ipc/CMakeFiles/arrow-file-to-stream.dir/all] Error 2
/usr/bin/ld: /home/hadoop/miniconda3/envs/pyarrow-dev/lib/libglog.a(libglog_la-signalhandler.o): unrecognized relocation (0x29) in section `.text'
/usr/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [release/arrow-stream-to-file] Error 1
make[1]: *** [src/arrow/ipc/CMakeFiles/arrow-stream-to-file.dir/all] Error 2
make: *** [all] Error 2
{code}

Sorry!