[jira] [Updated] (ARROW-3246) [Python][Parquet] direct reading/writing of pandas categoricals in parquet
[ https://issues.apache.org/jira/browse/ARROW-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3246: Summary: [Python][Parquet] direct reading/writing of pandas categoricals in parquet (was: [Python] direct reading/writing of pandas categoricals in parquet) > [Python][Parquet] direct reading/writing of pandas categoricals in parquet > -- > > Key: ARROW-3246 > URL: https://issues.apache.org/jira/browse/ARROW-3246 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Martin Durant >Priority: Minor > Labels: parquet > Fix For: 0.14.0 > > > Parquet supports "dictionary encoding" of column data in a manner very > similar to the concept of Categoricals in pandas. It is natural to use this > encoding for a column which originated as a categorical. Conversely, when > loading, if the file metadata says that a given column came from a pandas (or > arrow) categorical, then we can trust that the whole of the column is > dictionary-encoded and load the data directly into a categorical column, > rather than expanding the labels upon load and recategorising later. > If the data does not have the pandas metadata, then the guarantee cannot > hold, and we cannot assume either that the whole column is dictionary encoded > or that the labels are the same throughout. In this case, the current > behaviour is fine. > > (please forgive that some of this has already been mentioned elsewhere; this > is one of the entries in the list at > [https://github.com/dask/fastparquet/issues/374] as a feature that is useful > in fastparquet) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-3066) [Wiki] Add "How to contribute" to developer wiki
[ https://issues.apache.org/jira/browse/ARROW-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-3066. --- Resolution: Fixed Assignee: Wes McKinney Fix Version/s: (was: 0.14.0) 0.13.0 This is now part of the main documentation site. I updated https://cwiki.apache.org/confluence/display/ARROW/Contributing+to+Apache+Arrow > [Wiki] Add "How to contribute" to developer wiki > > > Key: ARROW-3066 > URL: https://issues.apache.org/jira/browse/ARROW-3066 > Project: Apache Arrow > Issue Type: Improvement > Components: Wiki >Reporter: okkez >Assignee: Wes McKinney >Priority: Major > Fix For: 0.13.0 > > > The [website|https://arrow.apache.org/] describes: > > Interested in contributing? Join the mailing list or check out the > > developer wiki. > But I could not find "How to contribute" on [the > Wiki|https://cwiki.apache.org/confluence/display/ARROW]. > Though I can find it in the repository: > * https://github.com/apache/arrow#how-to-contribute > * > https://github.com/apache/arrow/blob/master/.github/CONTRIBUTING.md#how-to-contribute-patches > We can add the contents to find "How to contribute" more easily. > Or, we can unify duplicated contents to [the > Wiki|https://cwiki.apache.org/confluence/display/ARROW]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3080) [Python] Unify Arrow to Python object conversion paths
[ https://issues.apache.org/jira/browse/ARROW-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3080: Fix Version/s: (was: 0.14.0) 0.15.0 > [Python] Unify Arrow to Python object conversion paths > -- > > Key: ARROW-3080 > URL: https://issues.apache.org/jira/browse/ARROW-3080 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > Similar to ARROW-2814, we have inconsistent support for converting Arrow > nested types back to object sequences. For example, a list of structs fails > when calling {{to_pandas}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3066) [Wiki] Add "How to contribute" to developer wiki
[ https://issues.apache.org/jira/browse/ARROW-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3066: Fix Version/s: 0.14.0 > [Wiki] Add "How to contribute" to developer wiki > > > Key: ARROW-3066 > URL: https://issues.apache.org/jira/browse/ARROW-3066 > Project: Apache Arrow > Issue Type: Improvement > Components: Wiki >Reporter: okkez >Priority: Major > Fix For: 0.14.0 > > > The [website|https://arrow.apache.org/] describes: > > Interested in contributing? Join the mailing list or check out the > > developer wiki. > But I could not find "How to contribute" on [the > Wiki|https://cwiki.apache.org/confluence/display/ARROW]. > Though I can find it in the repository: > * https://github.com/apache/arrow#how-to-contribute > * > https://github.com/apache/arrow/blob/master/.github/CONTRIBUTING.md#how-to-contribute-patches > We can add the contents to find "How to contribute" more easily. > Or, we can unify duplicated contents to [the > Wiki|https://cwiki.apache.org/confluence/display/ARROW]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3052) [C++] Detect ORC system packages
[ https://issues.apache.org/jira/browse/ARROW-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806012#comment-16806012 ] Wes McKinney commented on ARROW-3052: - I just repurposed this issue to be about fixing ORC to pull from the system (or conda) toolchain since that's the only thing left > [C++] Detect ORC system packages > > > Key: ARROW-3052 > URL: https://issues.apache.org/jira/browse/ARROW-3052 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > See > https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L155. > After the CMake refactor it is possible to use built ORC packages with > {{$ORC_HOME}} but not detected like the other toolchain dependencies -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3052) [C++] Detect ORC system packages
[ https://issues.apache.org/jira/browse/ARROW-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3052: Fix Version/s: 0.14.0 > [C++] Detect ORC system packages > > > Key: ARROW-3052 > URL: https://issues.apache.org/jira/browse/ARROW-3052 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > See > https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L155. > After the CMake refactor it is possible to use built ORC packages with > {{$ORC_HOME}} but not detected like the other toolchain dependencies -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3052) [C++] Detect ORC system packages
[ https://issues.apache.org/jira/browse/ARROW-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3052: Summary: [C++] Detect ORC system packages (was: [C++] Support ORC, GRPC, Thrift, and Protobuf when using $ARROW_BUILD_TOOLCHAIN) > [C++] Detect ORC system packages > > > Key: ARROW-3052 > URL: https://issues.apache.org/jira/browse/ARROW-3052 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > It would be good to support these additional toolchain components without > having to set extra environment variables -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3052) [C++] Detect ORC system packages
[ https://issues.apache.org/jira/browse/ARROW-3052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3052: Description: See https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L155. After the CMake refactor it is possible to use built ORC packages with {{$ORC_HOME}} but not detected like the other toolchain dependencies (was: It would be good to support these additional toolchain components without having to set extra environment variables) > [C++] Detect ORC system packages > > > Key: ARROW-3052 > URL: https://issues.apache.org/jira/browse/ARROW-3052 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > See > https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L155. > After the CMake refactor it is possible to use built ORC packages with > {{$ORC_HOME}} but not detected like the other toolchain dependencies -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3434) [Packaging] Add Apache ORC C++ library to conda-forge
[ https://issues.apache.org/jira/browse/ARROW-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3434. - Resolution: Fixed Assignee: Uwe L. Korn Fix Version/s: (was: 0.14.0) 0.13.0 This has been completed > [Packaging] Add Apache ORC C++ library to conda-forge > - > > Key: ARROW-3434 > URL: https://issues.apache.org/jira/browse/ARROW-3434 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Reporter: Wes McKinney >Assignee: Uwe L. Korn >Priority: Major > Labels: toolchain > Fix For: 0.13.0 > > > In the vein of "toolchain all the things", it would be useful to be able to > obtain the ORC static libraries from a conda package rather than building > from source every time -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3032) [Python] Clean up NumPy-related C++ headers
[ https://issues.apache.org/jira/browse/ARROW-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3032: Fix Version/s: 0.15.0 > [Python] Clean up NumPy-related C++ headers > --- > > Key: ARROW-3032 > URL: https://issues.apache.org/jira/browse/ARROW-3032 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > There are 4 different headers. After ARROW-2814, we can probably eliminate > numpy_convert.h and combine with numpy_to_arrow.h -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3016) [C++] Add ability to enable call stack logging for each memory allocation
[ https://issues.apache.org/jira/browse/ARROW-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3016: Fix Version/s: 0.14.0 > [C++] Add ability to enable call stack logging for each memory allocation > - > > Key: ARROW-3016 > URL: https://issues.apache.org/jira/browse/ARROW-3016 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > It is possible to gain programmatic access to the call stack in C/C++, e.g. > https://eli.thegreenplace.net/2015/programmatic-access-to-the-call-stack-in-c/ > It would be valuable to have a debugging option to log the sizes of memory > allocations as well as showing the call stack where that allocation is > performed. In complex programs, this could help determine the origin of a > memory leak -- This message was sent by Atlassian JIRA (v7.6.3#76005)
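Python's standard library has a close analogue of the proposed debugging option; the sketch below uses tracemalloc purely to illustrate the idea of pairing each allocation with the call stack that performed it (it is not the requested C++ feature):

```python
import tracemalloc

tracemalloc.start(10)  # record up to 10 stack frames per allocation
blocks = [bytes(1000) for _ in range(100)]  # some allocations to observe
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Each statistic carries the bytes allocated and the call stack where
# the allocation happened -- the pairing proposed above for Arrow's
# memory pools.
top = snapshot.statistics("traceback")[0]
stack_lines = top.traceback.format()
for line in stack_lines:
    print(line)
```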
[jira] [Updated] (ARROW-3399) [Python] Cannot serialize numpy matrix object
[ https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3399: Fix Version/s: 0.14.0 > [Python] Cannot serialize numpy matrix object > - > > Key: ARROW-3399 > URL: https://issues.apache.org/jira/browse/ARROW-3399 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Mitar >Priority: Major > Fix For: 0.14.0 > > > This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on > Linux. > {code:python} > from pyarrow import plasma > import numpy > import time > import subprocess > import os > import signal > m = numpy.matrix(numpy.array([[1, 2], [3, 4]])) > process = subprocess.Popen(['plasma_store', '-m', '100', '-s', > '/tmp/plasma', '-d', '/dev/shm'], stdout=subprocess.DEVNULL, > stderr=subprocess.DEVNULL, encoding='utf8', preexec_fn=os.setpgrp) > time.sleep(5) > client = plasma.connect('/tmp/plasma', '', 0) > try: > client.put(m) > finally: > client.disconnect() > os.killpg(os.getpgid(process.pid), signal.SIGTERM) > {code} > Error: > {noformat} > File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put > File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize > File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum > recursion depth. It may contain itself recursively.{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2984) [JS] Refactor release verification script to share code with main source release verification script
[ https://issues.apache.org/jira/browse/ARROW-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2984: Fix Version/s: (was: JS-0.5.0) 0.14.0 > [JS] Refactor release verification script to share code with main source > release verification script > > > Key: ARROW-2984 > URL: https://issues.apache.org/jira/browse/ARROW-2984 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > There is some possible code duplication. See discussion in ARROW-2977 > https://github.com/apache/arrow/pull/2369 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2938) [Packaging] Make the source release via crossbow
[ https://issues.apache.org/jira/browse/ARROW-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806005#comment-16806005 ] Wes McKinney commented on ARROW-2938: - I'm not sure this is desirable from a security standpoint > [Packaging] Make the source release via crossbow > > > Key: ARROW-2938 > URL: https://issues.apache.org/jira/browse/ARROW-2938 > Project: Apache Arrow > Issue Type: Task > Components: Packaging >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > > And make it possible to upload source distribution (signature and checksums > as well) to github releases. This will make ARROW-2910 testable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3399) [Python] Cannot serialize numpy matrix object
[ https://issues.apache.org/jira/browse/ARROW-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806006#comment-16806006 ] Mitar commented on ARROW-3399: -- This is still happening in 0.12.1. I think this should be fixed, because it will be quite some time before the matrix class falls out of use, even though it is deprecated. > [Python] Cannot serialize numpy matrix object > - > > Key: ARROW-3399 > URL: https://issues.apache.org/jira/browse/ARROW-3399 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Mitar >Priority: Major > > This is a regression from 0.9.0 and happens with 0.10.0 with Python 3.6.5 on > Linux. > {code:python} > from pyarrow import plasma > import numpy > import time > import subprocess > import os > import signal > m = numpy.matrix(numpy.array([[1, 2], [3, 4]])) > process = subprocess.Popen(['plasma_store', '-m', '100', '-s', > '/tmp/plasma', '-d', '/dev/shm'], stdout=subprocess.DEVNULL, > stderr=subprocess.DEVNULL, encoding='utf8', preexec_fn=os.setpgrp) > time.sleep(5) > client = plasma.connect('/tmp/plasma', '', 0) > try: > client.put(m) > finally: > client.disconnect() > os.killpg(os.getpgid(process.pid), signal.SIGTERM) > {code} > Error: > {noformat} > File "pyarrow/_plasma.pyx", line 397, in pyarrow._plasma.PlasmaClient.put > File "pyarrow/serialization.pxi", line 338, in pyarrow.lib.serialize > File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum > recursion depth. It may contain itself recursively.{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
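A possible workaround (a suggestion, not from the ticket): np.matrix is an ndarray subclass that the serializer does not recognize, which appears to trigger the recursion fallback, so stripping the subclass before `client.put()` avoids the error.

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])
a = np.asarray(m)  # np.asarray always returns the ndarray base class
# `a` can then be passed to client.put() in place of `m`
```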
[jira] [Updated] (ARROW-2967) [Python] Add option to treat invalid PyObject* values as null in pyarrow.array
[ https://issues.apache.org/jira/browse/ARROW-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2967: Fix Version/s: 0.14.0 > [Python] Add option to treat invalid PyObject* values as null in pyarrow.array > -- > > Key: ARROW-2967 > URL: https://issues.apache.org/jira/browse/ARROW-2967 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > See discussion in ARROW-2966 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2939) [Python] API documentation version doesn't match latest on PyPI
[ https://issues.apache.org/jira/browse/ARROW-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2939: Fix Version/s: 0.14.0 > [Python] API documentation version doesn't match latest on PyPI > --- > > Key: ARROW-2939 > URL: https://issues.apache.org/jira/browse/ARROW-2939 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Ian Robertson >Priority: Minor > Labels: documentation > Fix For: 0.14.0 > > > Hey folks, apologies if this isn't the right place to raise this. In poking > around the web documentation (for pyarrow specifically), it looks like the > auto-generated API docs contain commits past the release of 0.9.0. For > example: > * > [https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.column] > * Contains differences merged here: > [https://github.com/apache/arrow/pull/1923] > * But latest pypi/conda versions of pyarrow are 0.9.0, which don't include > that change. > Not sure if the docs are auto-built off master somewhere, I couldn't find > anything about building docs in the docs itself. I would guess that you may > want some of the usage docs to be published in between releases if they're > not about new functionality, but the API reference being out of date can be > confusing. Is it possible to anchor the API docs to the latest released > version? Or even something like how Pandas has a whole bunch of old versions > still available? (e.g. [https://pandas.pydata.org/pandas-docs/stable/] vs. > old versions like [http://pandas.pydata.org/pandas-docs/version/0.17.0/]) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2882: Summary: [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets (was: [Python] Support AWS Firehose partition_scheme implementation for Parquet datasets) > [C++][Python] Support AWS Firehose partition_scheme implementation for > Parquet datasets > --- > > Key: ARROW-2882 > URL: https://issues.apache.org/jira/browse/ARROW-2882 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Pablo Javier Takara >Priority: Major > Labels: parquet > > I'd like to be able to read a ParquetDataset generated by AWS Firehose. > The only implementation at the time of writing was the partition scheme > created by Hive (year=2018/month=01/day=11). > The AWS Firehose partition scheme is a little different (2018/01/11). > > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806004#comment-16806004 ] Wes McKinney commented on ARROW-2882: - I added the C++ component since this will be handled as part of the Datasets project > [C++][Python] Support AWS Firehose partition_scheme implementation for > Parquet datasets > --- > > Key: ARROW-2882 > URL: https://issues.apache.org/jira/browse/ARROW-2882 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Pablo Javier Takara >Priority: Major > Labels: dataset, parquet > > I'd like to be able to read a ParquetDataset generated by AWS Firehose. > The only implementation at the time of writing was the partition scheme > created by Hive (year=2018/month=01/day=11). > The AWS Firehose partition scheme is a little different (2018/01/11). > > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2882: Labels: dataset parquet (was: parquet) > [C++][Python] Support AWS Firehose partition_scheme implementation for > Parquet datasets > --- > > Key: ARROW-2882 > URL: https://issues.apache.org/jira/browse/ARROW-2882 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Pablo Javier Takara >Priority: Major > Labels: dataset, parquet > > I'd like to be able to read a ParquetDataset generated by AWS Firehose. > The only implementation at the time of writing was the partition scheme > created by Hive (year=2018/month=01/day=11). > The AWS Firehose partition scheme is a little different (2018/01/11). > > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
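To make the difference between the two layouts concrete (the helpers below are hypothetical, not pyarrow API): Hive-style paths carry the key in every segment, while Firehose paths carry only the values, so a reader must be told the key order.

```python
def parse_hive(path):
    """Hive layout: 'year=2018/month=01/day=11' -> dict of partition values."""
    return dict(segment.split("=", 1) for segment in path.split("/"))

def parse_firehose(path, keys=("year", "month", "day")):
    """Firehose layout: '2018/01/11' -> dict, given an assumed key order."""
    return dict(zip(keys, path.split("/")))
```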
[jira] [Updated] (ARROW-2860) [Python][Parquet] Null values in a single partition of Parquet dataset, results in invalid schema on read
[ https://issues.apache.org/jira/browse/ARROW-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2860: Summary: [Python][Parquet] Null values in a single partition of Parquet dataset, results in invalid schema on read (was: [Python] Null values in a single partition of Parquet dataset, results in invalid schema on read) > [Python][Parquet] Null values in a single partition of Parquet dataset, > results in invalid schema on read > - > > Key: ARROW-2860 > URL: https://issues.apache.org/jira/browse/ARROW-2860 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Sam Oluwalana >Assignee: Wes McKinney >Priority: Major > Labels: parquet > Fix For: 0.14.0 > > > {code:python} > import pyarrow as pa > import pyarrow.parquet as pq > import pandas as pd > from datetime import datetime, timedelta > def generate_data(event_type, event_id, offset=0): > """Generate data.""" > now = datetime.utcnow() + timedelta(seconds=offset) > obj = { > 'event_type': event_type, > 'event_id': event_id, > 'event_date': now.date(), > 'foo': None, > 'bar': u'hello', > } > if event_type == 2: > obj['foo'] = 1 > obj['bar'] = u'world' > if event_type == 3: > obj['different'] = u'data' > obj['bar'] = u'event type 3' > else: > obj['different'] = None > return obj > data = [ > generate_data(1, 1, 1), > generate_data(1, 1, 3600 * 72), > generate_data(2, 1, 1), > generate_data(2, 1, 3600 * 72), > generate_data(3, 1, 1), > generate_data(3, 1, 3600 * 72), > ] > df = pd.DataFrame.from_records(data, index='event_id') > table = pa.Table.from_pandas(df) > pq.write_to_dataset(table, root_path='/tmp/events', > partition_cols=['event_type', 'event_date']) > dataset = pq.ParquetDataset('/tmp/events') > table = dataset.read() > print(table.num_rows) > {code} > Expected output: > {code:python} > 6 > {code} > Actual: > {code:python} > python example_failure.py > Traceback (most recent call last): > File "example_failure.py", line 43, in <module> > dataset = pq.ParquetDataset('/tmp/events') > File > "/Users/sam/.virtualenvs/test-parquet/lib/python2.7/site-packages/pyarrow/parquet.py", > line 745, in __init__ > self.validate_schemas() > File > "/Users/sam/.virtualenvs/test-parquet/lib/python2.7/site-packages/pyarrow/parquet.py", > line 775, in validate_schemas > dataset_schema)) > ValueError: Schema in partition[event_type=2, event_date=0] > /tmp/events/event_type=3/event_date=2018-07-16 > 00:00:00/be001bf576674d09825539f20e99ebe5.parquet was different. > bar: string > different: string > foo: double > event_id: int64 > metadata > > {'pandas': '{"pandas_version": "0.23.3", "index_columns": ["event_id"], > "columns": [{"metadata": null, "field_name": "bar", "name": "bar", > "numpy_type": "object", "pandas_type": "unicode"}, {"metadata": null, > "field_name": "different", "name": "different", "numpy_type": "object", > "pandas_type": "unicode"}, {"metadata": null, "field_name": "foo", "name": > "foo", "numpy_type": "float64", "pandas_type": "float64"}, {"metadata": null, > "field_name": "event_id", "name": "event_id", "numpy_type": "int64", > "pandas_type": "int64"}], "column_indexes": [{"metadata": null, "field_name": > null, "name": null, "numpy_type": "object", "pandas_type": "bytes"}]}'} > vs > bar: string > different: null > foo: double > event_id: int64 > metadata > > {'pandas': '{"pandas_version": "0.23.3", "index_columns": ["event_id"], > "columns": [{"metadata": null, "field_name": "bar", "name": "bar", > "numpy_type": "object", "pandas_type": "unicode"}, {"metadata": null, > "field_name": "different", "name": "different", "numpy_type": "object", > "pandas_type": "empty"}, {"metadata": null, "field_name": "foo", "name": > "foo", "numpy_type": "float64", "pandas_type": "float64"}, {"metadata": null, > "field_name": "event_id", "name": "event_id", "numpy_type": "int64", > "pandas_type": "int64"}], "column_indexes": [{"metadata": null, "field_name": > null, "name": null, "numpy_type": "object", "pandas_type": "bytes"}]}'} > {code} > Apparently what is happening is that pyarrow is interpreting the schema from > each of the partitions individually, and the partitions for `event_type=3 / > event_date=*` both have values for the column `different` whereas the other > partitions do not. The discrepancy causes the `None` values of the other > partitions to be labeled as `pandas_type` `empty` instead of `unicode`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-465) [C++] Investigate usage of madvise
[ https://issues.apache.org/jira/browse/ARROW-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806002#comment-16806002 ] Wes McKinney commented on ARROW-465: We would need to have a benchmark that exhibits the page faulting behavior. Since this issue was filed over 2 years ago, it is a bit stale > [C++] Investigate usage of madvise > --- > > Key: ARROW-465 > URL: https://issues.apache.org/jira/browse/ARROW-465 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Uwe L. Korn >Priority: Major > Fix For: 0.14.0 > > > In some use cases (e.g. Pandas->Arrow conversion) our main constraint is page > faulting on not-yet-accessed pages. > With {{madvise}} we can indicate our planned actions to the OS and may > improve the performance a bit in these cases. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
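For experimentation from Python, the stdlib mmap module (Python 3.8+) exposes the same hint; a POSIX-only sketch (MADV_WILLNEED is guarded because not every platform defines it):

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * mmap.PAGESIZE * 4)

mm = mmap.mmap(fd, 0)
if hasattr(mmap, "MADV_WILLNEED"):
    # Hint that we are about to read these pages, so the OS can prefault
    # them up front instead of taking a fault per page on first access.
    mm.madvise(mmap.MADV_WILLNEED)
data = mm[:16]

mm.close()
os.close(fd)
os.remove(path)
```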
[jira] [Closed] (ARROW-2743) [Java] Travis CI test scripts did not catch POM file bug fixed in ARROW-2727
[ https://issues.apache.org/jira/browse/ARROW-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-2743. --- Resolution: Cannot Reproduce Closing for now until the issue recurs > [Java] Travis CI test scripts did not catch POM file bug fixed in ARROW-2727 > > > Key: ARROW-2743 > URL: https://issues.apache.org/jira/browse/ARROW-2743 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Wes McKinney >Assignee: Bryan Cutler >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > This bug was introduced in ARROW-1780. It is unclear why the bug was not > triggered in Travis CI; we should see about fixing that -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule
[ https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805996#comment-16805996 ] Wes McKinney commented on ARROW-4301: - [~kou] FYI -- master build will be broken after you rebase once the release vote closes. See https://github.com/apache/arrow/pull/3435 for the past fix > [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva > submodule > --- > > Key: ARROW-4301 > URL: https://issues.apache.org/jira/browse/ARROW-4301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva, Java >Reporter: Wes McKinney >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > See > https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550. > This is breaking the build so I'm going to patch manually -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule
[ https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reopened ARROW-4301: - > [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva > submodule > --- > > Key: ARROW-4301 > URL: https://issues.apache.org/jira/browse/ARROW-4301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva, Java >Reporter: Wes McKinney >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > See > https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550. > This is breaking the build so I'm going to patch manually -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule
[ https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805995#comment-16805995 ] Wes McKinney commented on ARROW-4301: - Reopening, this did not work for the 0.13 release either https://github.com/apache/arrow/tree/apache-arrow-0.13.0/java https://github.com/apache/arrow/commit/dfb9e7af3cd92722893a3819b6676dfdef08f896 > [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva > submodule > --- > > Key: ARROW-4301 > URL: https://issues.apache.org/jira/browse/ARROW-4301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva, Java >Reporter: Wes McKinney >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.13.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > See > https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550. > This is breaking the build so I'm going to patch manually -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4301) [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva submodule
[ https://issues.apache.org/jira/browse/ARROW-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-4301: Fix Version/s: (was: 0.13.0) 0.14.0 > [Java][Gandiva] Maven snapshot version update does not seem to update Gandiva > submodule > --- > > Key: ARROW-4301 > URL: https://issues.apache.org/jira/browse/ARROW-4301 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva, Java >Reporter: Wes McKinney >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > See > https://github.com/apache/arrow/commit/a486db8c1476be1165981c4fe22996639da8e550. > This is breaking the build so I'm going to patch manually -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2572) [Python] Add factory function to create a Table from Columns and Schema.
[ https://issues.apache.org/jira/browse/ARROW-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2572: Fix Version/s: 0.14.0 > [Python] Add factory function to create a Table from Columns and Schema. > > > Key: ARROW-2572 > URL: https://issues.apache.org/jira/browse/ARROW-2572 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.9.0 >Reporter: Thomas Buhrmann >Priority: Minor > Labels: beginner > Fix For: 0.14.0 > > > At the moment it seems to be impossible in Python to add custom metadata to a > Table or Column. The closest I've come is to create a list of new Fields (by > "appending" metadata to existing Fields), and then creating a new Schema from > these Fields using the Schema factory function. But I can't see how to create > a new table from the existing Columns and my new Schema, which I understand > would be the way to do it in C++? > Essentially, wrappers for the Table's Make(...) functions seem to be missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2572) [Python] Add factory function to create a Table from Columns and Schema.
[ https://issues.apache.org/jira/browse/ARROW-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805981#comment-16805981 ] Wes McKinney commented on ARROW-2572: - Do you want to try contributing a patch? > [Python] Add factory function to create a Table from Columns and Schema. > > > Key: ARROW-2572 > URL: https://issues.apache.org/jira/browse/ARROW-2572 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.9.0 >Reporter: Thomas Buhrmann >Priority: Minor > Labels: beginner > Fix For: 0.14.0 > > > At the moment it seems to be impossible in Python to add custom metadata to a > Table or Column. The closest I've come is to create a list of new Fields (by > "appending" metadata to existing Fields), and then creating a new Schema from > these Fields using the Schema factory function. But I can't see how to create > a new table from the existing Columns and my new Schema, which I understand > would be the way to do it in C++? > Essentially, wrappers for the Table's Make(...) functions seem to be missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
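ARROW-2572 asks for Python wrappers over the C++ `Table::Make` factories so that an existing table's columns can be paired with a new schema (for example, one whose fields carry custom metadata). A minimal stdlib sketch of the requested shape follows; the class and method names here are hypothetical stand-ins, not the pyarrow API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field with attached metadata."""
    name: str
    type: str
    metadata: dict = field(default_factory=dict)

    def with_metadata(self, metadata):
        # Fields are treated as immutable; "appending" metadata returns a new Field.
        return Field(self.name, self.type, {**self.metadata, **metadata})

@dataclass(frozen=True)
class Table:
    columns: tuple
    schema: tuple  # tuple of Field

    @classmethod
    def make(cls, columns, schema):
        """The requested factory: pair existing columns with a new schema."""
        if len(columns) != len(schema):
            raise ValueError("schema must have one field per column")
        return cls(tuple(columns), tuple(schema))

# Rebuild a table with metadata added to one field, reusing the columns.
old_schema = (Field("x", "int64"), Field("y", "float64"))
table = Table.make([[1, 2], [0.5, 1.5]], old_schema)
new_schema = (old_schema[0].with_metadata({"unit": "cm"}), old_schema[1])
table2 = Table.make(table.columns, new_schema)
```

The point of the factory is the last line: no column data is copied; only the schema object changes.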
[jira] [Updated] (ARROW-2512) [Python] Enable direct interaction of GPU Objects in Python
[ https://issues.apache.org/jira/browse/ARROW-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2512: Summary: [Python] Enable direct interaction of GPU Objects in Python (was: [Python ]Enable direct interaction of GPU Objects in Python) > [Python] Enable direct interaction of GPU Objects in Python > --- > > Key: ARROW-2512 > URL: https://issues.apache.org/jira/browse/ARROW-2512 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma, GPU, Python >Reporter: William Paul >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Plasma can now manage objects on the GPU, but in order to use this > functionality in Python, there needs to be some way to represent these GPU > objects in Python that allows computation on the GPU. > The easiest way to enable this is to rely on a third party library, such as > Pytorch, which will allow us to use all of its existing functionality. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2358) [C++][Python] API for Writing to Multiple Feather Files
[ https://issues.apache.org/jira/browse/ARROW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2358: Summary: [C++][Python] API for Writing to Multiple Feather Files (was: API for Writing to Multiple Feather Files) > [C++][Python] API for Writing to Multiple Feather Files > --- > > Key: ARROW-2358 > URL: https://issues.apache.org/jira/browse/ARROW-2358 > Project: Apache Arrow > Issue Type: New Feature > Components: C, C++, Python >Affects Versions: 0.9.0 >Reporter: Dhruv Madeka >Priority: Minor > > It would be really great to have an API which can write a Table to a > `FeatherDataset`. Essentially, taking a name for a file - it would split the > table into N-equal parts (which could be determined by the user or the code) > and then write the data to N files with a suffix (which is `_part` by default > but could be user specified). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2186) [C++] Clean up architecture specific compiler flags
[ https://issues.apache.org/jira/browse/ARROW-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2186: Fix Version/s: 0.14.0 > [C++] Clean up architecture specific compiler flags > --- > > Key: ARROW-2186 > URL: https://issues.apache.org/jira/browse/ARROW-2186 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > I noticed that {{-maltivec}} is being passed to the compiler on Linux, with > an x86_64 processor. That seemed odd to me. It prompted me to look more > generally at our compiler flags related to hardware optimizations. We have > the ability to pass {{-msse3}}, but there is a {{ARROW_USE_SSE}} which is > only used as a define in some headers. There is {{ARROW_ALTIVEC}}, but no > option to pass {{-march}}. Nothing related to AVX/AVX2/AVX512. I think this > could do for an overhaul -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2164) [C++] Clean up unnecessary decimal module refs
[ https://issues.apache.org/jira/browse/ARROW-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-2164: Fix Version/s: 0.14.0 > [C++] Clean up unnecessary decimal module refs > -- > > Key: ARROW-2164 > URL: https://issues.apache.org/jira/browse/ARROW-2164 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.8.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > Fix For: 0.14.0 > > > See this comment: > https://github.com/apache/arrow/pull/1610#discussion_r168533239 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2186) [C++] Clean up architecture specific compiler flags
[ https://issues.apache.org/jira/browse/ARROW-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805979#comment-16805979 ] Wes McKinney commented on ARROW-2186: - Is this fixed now maybe? > [C++] Clean up architecture specific compiler flags > --- > > Key: ARROW-2186 > URL: https://issues.apache.org/jira/browse/ARROW-2186 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > I noticed that {{-maltivec}} is being passed to the compiler on Linux, with > an x86_64 processor. That seemed odd to me. It prompted me to look more > generally at our compiler flags related to hardware optimizations. We have > the ability to pass {{-msse3}}, but there is a {{ARROW_USE_SSE}} which is > only used as a define in some headers. There is {{ARROW_ALTIVEC}}, but no > option to pass {{-march}}. Nothing related to AVX/AVX2/AVX512. I think this > could do for an overhaul -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-1880) [Python] Plasma test flakiness in Travis CI
[ https://issues.apache.org/jira/browse/ARROW-1880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-1880. --- Resolution: Cannot Reproduce These issues do not seem to have occurred lately > [Python] Plasma test flakiness in Travis CI > --- > > Key: ARROW-1880 > URL: https://issues.apache.org/jira/browse/ARROW-1880 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > > We've been seeing intermittent flakiness of the variety: > {code} > ERRORS > > __ ERROR at setup of TestPlasmaClient.test_use_one_memory_mapped_file > __ > self = > test_method = of > > def setup_method(self, test_method): > use_one_memory_mapped_file = (test_method == > self.test_use_one_memory_mapped_file) > > import pyarrow.plasma as plasma > # Start Plasma store. > plasma_store_name, self.p = start_plasma_store( > use_valgrind=os.getenv("PLASMA_VALGRIND") == "1", > use_one_memory_mapped_file=use_one_memory_mapped_file) > # Connect to Plasma. > > self.plasma_client = plasma.connect(plasma_store_name, "", 64) > pyarrow-test-3.6/lib/python3.6/site-packages/pyarrow/tests/test_plasma.py:164: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > plasma.pyx:672: in pyarrow.plasma.connect > ??? > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > > ??? > E pyarrow.lib.ArrowIOError: Could not connect to socket > /tmp/plasma_store43998835 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1846) [C++] Implement "any" reduction kernel for boolean data, with the ability to short circuit when applying on chunked data
[ https://issues.apache.org/jira/browse/ARROW-1846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1846: Labels: analytics (was: ) > [C++] Implement "any" reduction kernel for boolean data, with the ability to > short circuit when applying on chunked data > > > Key: ARROW-1846 > URL: https://issues.apache.org/jira/browse/ARROW-1846 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: analytics > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1798) [C++] Implement x86 SIMD-accelerated binary arithmetic kernels
[ https://issues.apache.org/jira/browse/ARROW-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1798: Labels: analytics (was: ) > [C++] Implement x86 SIMD-accelerated binary arithmetic kernels > -- > > Key: ARROW-1798 > URL: https://issues.apache.org/jira/browse/ARROW-1798 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: analytics > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1797) [C++] Implement binary arithmetic kernels for numeric arrays
[ https://issues.apache.org/jira/browse/ARROW-1797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1797: Labels: analytics (was: ) > [C++] Implement binary arithmetic kernels for numeric arrays > > > Key: ARROW-1797 > URL: https://issues.apache.org/jira/browse/ARROW-1797 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: analytics > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
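ARROW-1797 calls for binary arithmetic kernels over numeric arrays. In the Arrow format such a kernel combines both values and validity (null) information; below is a plain-Python sketch of the intended semantics only, not the C++ implementation (the function name is a placeholder):

```python
def add_kernel(left, left_valid, right, right_valid):
    """Elementwise add with Arrow-style null propagation:
    an output slot is valid only if both input slots are valid."""
    assert len(left) == len(right)
    out_valid = [lv and rv for lv, rv in zip(left_valid, right_valid)]
    # Values under invalid slots are unspecified in Arrow; emit 0 for clarity.
    out = [l + r if v else 0 for l, r, v in zip(left, right, out_valid)]
    return out, out_valid

values, valid = add_kernel([1, 2, 3], [True, False, True],
                           [10, 20, 30], [True, True, False])
# values == [11, 0, 0], valid == [True, False, False]
```

A real kernel would operate on packed validity bitmaps and buffers rather than Python lists, which is where the SIMD acceleration of ARROW-1798 comes in.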
[jira] [Closed] (ARROW-1843) Merge tool occasionally leaves JIRAs in an invalid state
[ https://issues.apache.org/jira/browse/ARROW-1843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-1843. --- Resolution: Not A Problem Haven't seen this in a long time. I'm going to assume it was a hiccup with the ASF JIRA instance > Merge tool occasionally leaves JIRAs in an invalid state > > > Key: ARROW-1843 > URL: https://issues.apache.org/jira/browse/ARROW-1843 > Project: Apache Arrow > Issue Type: Bug >Reporter: Wes McKinney >Priority: Major > > I have been noticing some patches are getting left in "In Progress" or "Open" > state but in the web UI, the JIRA appears to be resolved. I have been having > to reopen these issues, then press "Resolve" in the web UI. This isn't > happening 100% of the time, but has happened several times today -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1761) [C++] Multi argument operator kernel behavior for decimal columns
[ https://issues.apache.org/jira/browse/ARROW-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1761: Summary: [C++] Multi argument operator kernel behavior for decimal columns (was: Multi argument operator kernel behavior for decimal columns) > [C++] Multi argument operator kernel behavior for decimal columns > - > > Key: ARROW-1761 > URL: https://issues.apache.org/jira/browse/ARROW-1761 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Java >Affects Versions: 0.7.1 >Reporter: Phillip Cloud >Assignee: Phillip Cloud >Priority: Major > > This is a JIRA to discuss the behavior of operator kernels that require more > than one decimal column input where the column types have a different > {{scale}} parameter. > For example: > {code} > a: decimal(12, 2) > b: decimal(10, 3) > c = a + b > {code} > Arithmetic is the primary use case, but anything that needs to efficiently > operate on decimal columns with different scales would require this > functionality. > I imagine that [~jnadeau] and folks at Dremio have thought about and solved > the problem in Java. If so, we should consider implementing this behavior in > C++. Otherwise, I'll do a bit of reading and digging to see how existing > systems efficiently handle this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
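For ARROW-1761, the approach common in SQL engines is to rescale both operands to a shared scale before adding, with the result scale being max(s1, s2). A stdlib sketch of those semantics, assuming decimals are stored as unscaled integers plus a scale; this illustrates the rule, not the eventual Arrow C++ or Dremio design:

```python
from decimal import Decimal

def add_decimals(a, a_scale, b, b_scale):
    """Add two scaled integers representing decimal(p, s) values.
    Returns (unscaled_result, result_scale), where
    result_scale = max(a_scale, b_scale), as in SQL addition rules."""
    out_scale = max(a_scale, b_scale)
    # Rescale each unscaled integer up to the common scale.
    a_rescaled = a * 10 ** (out_scale - a_scale)
    b_rescaled = b * 10 ** (out_scale - b_scale)
    return a_rescaled + b_rescaled, out_scale

# a = 12.34 stored as decimal(12, 2); b = 1.005 stored as decimal(10, 3)
unscaled, scale = add_decimals(1234, 2, 1005, 3)
result = Decimal(unscaled).scaleb(-scale)  # 13.345
```

Multiplication differs (result scale is s1 + s2 and no rescaling is needed), which is part of why the per-operator behavior warrants discussion.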
[jira] [Updated] (ARROW-1699) [C++] Forward, backward fill kernel functions
[ https://issues.apache.org/jira/browse/ARROW-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1699: Labels: analytics (was: ) > [C++] Forward, backward fill kernel functions > - > > Key: ARROW-1699 > URL: https://issues.apache.org/jira/browse/ARROW-1699 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: analytics > > Like ffill / bfill in pandas (with limit) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
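The ffill/bfill behavior requested in ARROW-1699 (pandas-style, with a fill limit) can be stated precisely in a few lines of plain Python; this is a sketch of the semantics, with None standing in for a null slot:

```python
def ffill(values, limit=None):
    """Forward-fill None entries, like pandas ffill; fill at most
    `limit` consecutive entries per gap when limit is given."""
    out, last, filled = [], None, 0
    for v in values:
        if v is None and last is not None and (limit is None or filled < limit):
            out.append(last)
            filled += 1
        else:
            out.append(v)
            if v is not None:
                last, filled = v, 0  # new observed value resets the gap counter
    return out

def bfill(values, limit=None):
    """Backward-fill is forward-fill on the reversed sequence."""
    return ffill(values[::-1], limit)[::-1]
```

Leading nulls stay null under ffill (there is nothing to carry forward), matching the pandas behavior.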
[jira] [Updated] (ARROW-1644) [C++][Parquet] Read and write nested Parquet data with a mix of struct and list nesting levels
[ https://issues.apache.org/jira/browse/ARROW-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1644: Component/s: C++ > [C++][Parquet] Read and write nested Parquet data with a mix of struct and > list nesting levels > -- > > Key: ARROW-1644 > URL: https://issues.apache.org/jira/browse/ARROW-1644 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Python >Affects Versions: 0.8.0 >Reporter: DB Tsai >Assignee: Joshua Storck >Priority: Major > Labels: parquet, pull-request-available > Fix For: 0.14.0 > > > We have many nested parquet files generated from Apache Spark for ranking > problems, and we would like to load them in python for other programs to > consume. > The schema looks like > {code:java} > root > |-- profile_id: long (nullable = true) > |-- country_iso_code: string (nullable = true) > |-- items: array (nullable = false) > ||-- element: struct (containsNull = false) > |||-- show_title_id: integer (nullable = true) > |||-- duration: double (nullable = true) > {code} > And when I tried to load it with nightly build pyarrow on Oct 4, 2017, I got > the following error. > {code:python} > Python 3.6.2 |Anaconda, Inc.| (default, Sep 30 2017, 18:42:57) > [GCC 7.2.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np > >>> import pandas as pd > >>> import pyarrow as pa > >>> import pyarrow.parquet as pq > >>> table2 = pq.read_table('part-0') > Traceback (most recent call last): > File "", line 1, in > File "/home/dbt/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py", > line 823, in read_table > use_pandas_metadata=use_pandas_metadata) > File "/home/dbt/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py", > line 119, in read > nthreads=nthreads) > File "_parquet.pyx", line 466, in pyarrow._parquet.ParquetReader.read_all > File "error.pxi", line 85, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: lists with structs are not supported. > {code} > I somehow get the impression that after > https://issues.apache.org/jira/browse/PARQUET-911 is merged, we should be > able to load the nested parquet in pyarrow. > Any insight about this? > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1644) [C++][Parquet] Read and write nested Parquet data with a mix of struct and list nesting levels
[ https://issues.apache.org/jira/browse/ARROW-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1644: Summary: [C++][Parquet] Read and write nested Parquet data with a mix of struct and list nesting levels (was: [Python] Read and write nested Parquet data with a mix of struct and list nesting levels) > [C++][Parquet] Read and write nested Parquet data with a mix of struct and > list nesting levels > -- > > Key: ARROW-1644 > URL: https://issues.apache.org/jira/browse/ARROW-1644 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Affects Versions: 0.8.0 >Reporter: DB Tsai >Assignee: Joshua Storck >Priority: Major > Labels: parquet, pull-request-available > Fix For: 0.14.0 > > > We have many nested parquet files generated from Apache Spark for ranking > problems, and we would like to load them in python for other programs to > consume. > The schema looks like > {code:java} > root > |-- profile_id: long (nullable = true) > |-- country_iso_code: string (nullable = true) > |-- items: array (nullable = false) > ||-- element: struct (containsNull = false) > |||-- show_title_id: integer (nullable = true) > |||-- duration: double (nullable = true) > {code} > And when I tried to load it with nightly build pyarrow on Oct 4, 2017, I got > the following error. > {code:python} > Python 3.6.2 |Anaconda, Inc.| (default, Sep 30 2017, 18:42:57) > [GCC 7.2.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> import numpy as np > >>> import pandas as pd > >>> import pyarrow as pa > >>> import pyarrow.parquet as pq > >>> table2 = pq.read_table('part-0') > Traceback (most recent call last): > File "", line 1, in > File "/home/dbt/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py", > line 823, in read_table > use_pandas_metadata=use_pandas_metadata) > File "/home/dbt/miniconda3/lib/python3.6/site-packages/pyarrow/parquet.py", > line 119, in read > nthreads=nthreads) > File "_parquet.pyx", line 466, in pyarrow._parquet.ParquetReader.read_all > File "error.pxi", line 85, in pyarrow.lib.check_status > pyarrow.lib.ArrowNotImplementedError: lists with structs are not supported. > {code} > I somehow get the impression that after > https://issues.apache.org/jira/browse/PARQUET-911 is merged, we should be > able to load the nested parquet in pyarrow. > Any insight about this? > Thanks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1599) [C++][Parquet] Unable to read Parquet files with list inside struct
[ https://issues.apache.org/jira/browse/ARROW-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1599: Summary: [C++][Parquet] Unable to read Parquet files with list inside struct (was: [Python] Unable to read Parquet files with list inside struct) > [C++][Parquet] Unable to read Parquet files with list inside struct > --- > > Key: ARROW-1599 > URL: https://issues.apache.org/jira/browse/ARROW-1599 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 > Environment: Ubuntu >Reporter: Jovann Kung >Assignee: Joshua Storck >Priority: Major > Labels: parquet > Fix For: 0.14.0 > > > Is PyArrow currently unable to read in Parquet files with a vector as a > column? For example, the schema of such a file is below: > {{ > mbc: FLOAT > deltae: FLOAT > labels: FLOAT > features.type: INT32 INT_8 > features.size: INT32 > features.indices.list.element: INT32 > features.values.list.element: DOUBLE}} > Using either pq.read_table() or pq.ParquetDataset('/path/to/parquet').read() > yields the following error: ArrowNotImplementedError: Currently only nesting > with Lists is supported. > From the error I assume that this may be implemented in further releases? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1599) [C++][Parquet] Unable to read Parquet files with list inside struct
[ https://issues.apache.org/jira/browse/ARROW-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1599: Component/s: C++ > [C++][Parquet] Unable to read Parquet files with list inside struct > --- > > Key: ARROW-1599 > URL: https://issues.apache.org/jira/browse/ARROW-1599 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Affects Versions: 0.7.0 > Environment: Ubuntu >Reporter: Jovann Kung >Assignee: Joshua Storck >Priority: Major > Labels: parquet > Fix For: 0.14.0 > > > Is PyArrow currently unable to read in Parquet files with a vector as a > column? For example, the schema of such a file is below: > {{ > mbc: FLOAT > deltae: FLOAT > labels: FLOAT > features.type: INT32 INT_8 > features.size: INT32 > features.indices.list.element: INT32 > features.values.list.element: DOUBLE}} > Using either pq.read_table() or pq.ParquetDataset('/path/to/parquet').read() > yields the following error: ArrowNotImplementedError: Currently only nesting > with Lists is supported. > From the error I assume that this may be implemented in further releases? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1894) [Python] Treat CPython memoryview or buffer objects equivalently to pyarrow.Buffer in pyarrow.serialize
[ https://issues.apache.org/jira/browse/ARROW-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1894: Labels: (was: beginner) > [Python] Treat CPython memoryview or buffer objects equivalently to > pyarrow.Buffer in pyarrow.serialize > --- > > Key: ARROW-1894 > URL: https://issues.apache.org/jira/browse/ARROW-1894 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > These should be treated as Buffer-like on serialize. We should consider how > to "box" the buffers as the appropriate kind of object (Buffer, memoryview, > etc.) when being deserialized -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-1552) [C++] Enable Arrow production builds on Linux / macOS without Boost dependency
[ https://issues.apache.org/jira/browse/ARROW-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-1552. --- Resolution: Won't Fix I'm not sure this is the best use of our time. If someone comes along and wants to work on this, please feel free > [C++] Enable Arrow production builds on Linux / macOS without Boost dependency > -- > > Key: ARROW-1552 > URL: https://issues.apache.org/jira/browse/ARROW-1552 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > > Currently, we use Boost on a very limited basis. We should consider making > Boost a non-dependency on POSIX-based systems (i.e. we can continue to use > boost::filesystem on Windows), and still use Boost where useful in the test > suite. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1389) [Python] Support arbitrary precision integers
[ https://issues.apache.org/jira/browse/ARROW-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1389: Component/s: Python > [Python] Support arbitrary precision integers > - > > Key: ARROW-1389 > URL: https://issues.apache.org/jira/browse/ARROW-1389 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Philipp Moritz >Priority: Minor > > For Python serialization it would be great if we had Arrow support for > arbitrary precision integers, see the comment in > https://github.com/apache/arrow/blob/de7c6715ba244e119913bfa31b8de46dbbd450bf/python/pyarrow/tests/test_serialization.py#L183 > Long integers are for example used in the uuid python module and having this > would increase serialization performance for uuids and also make the code > cleaner. > I wonder if this is more generally useful too, any thoughts? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-1389) [Python] Support arbitrary precision integers
[ https://issues.apache.org/jira/browse/ARROW-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1389: Summary: [Python] Support arbitrary precision integers (was: Support arbitrary precision integers) > [Python] Support arbitrary precision integers > - > > Key: ARROW-1389 > URL: https://issues.apache.org/jira/browse/ARROW-1389 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Minor > > For Python serialization it would be great if we had Arrow support for > arbitrary precision integers, see the comment in > https://github.com/apache/arrow/blob/de7c6715ba244e119913bfa31b8de46dbbd450bf/python/pyarrow/tests/test_serialization.py#L183 > Long integers are for example used in the uuid python module and having this > would increase serialization performance for uuids and also make the code > cleaner. > I wonder if this is more generally useful too, any thoughts? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
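One plausible encoding for ARROW-1389 is sign-plus-magnitude bytes stored in a binary column, which handles Python longs of any size (e.g. 128-bit uuid values); a stdlib sketch in which the payload layout is an assumption, not a decided format:

```python
def int_to_payload(value):
    """Encode an arbitrary-precision int as (sign, magnitude bytes),
    the kind of variable-length representation an Arrow type for
    big integers could store in a binary column."""
    sign = (value > 0) - (value < 0)
    mag = abs(value)
    # `or 1` keeps zero representable as a single zero byte.
    data = mag.to_bytes((mag.bit_length() + 7) // 8 or 1, "little")
    return sign, data

def payload_to_int(sign, data):
    """Inverse of int_to_payload."""
    return sign * int.from_bytes(data, "little")

uuid_like = 2**128 - 1  # magnitude of a full 128-bit uuid value
assert payload_to_int(*int_to_payload(uuid_like)) == uuid_like
```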
[jira] [Updated] (ARROW-1059) [C++] Define API for embedding user-defined metadata / Flatbuffer message types in Arrow IPC machinery
[ https://issues.apache.org/jira/browse/ARROW-1059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1059: Fix Version/s: 0.15.0 > [C++] Define API for embedding user-defined metadata / Flatbuffer message > types in Arrow IPC machinery > -- > > Key: ARROW-1059 > URL: https://issues.apache.org/jira/browse/ARROW-1059 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > Currently, the {{MessageHeader}} Flatbuffer union must be modified to > serialize new kinds of metadata: > https://github.com/apache/arrow/blob/master/format/Message.fbs#L85 > It would be interesting if user metadata could be embedded within a > particular application that wishes to use the Arrow C++ libraries' zero-copy > IPC machinery for serialization of other kinds of data structures. > As one approach, the message metadata could be an application-dependent > unique identifier for the user defined type, which would internally dispatch > to an implementation of an abstract deserializer interface. So in addition to > describing the serialized representation of the user type, we also will have > to create the abstract API for the user to implement so that the code in > {{arrow/ipc}} can be configured to dispatch appropriately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-843) [C++] Parquet merging unequal but equivalent schemas
[ https://issues.apache.org/jira/browse/ARROW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805967#comment-16805967 ] Wes McKinney commented on ARROW-843: I changed this to a C++ issue since much of the datasets logic, which lives in Python today, will be migrated to C++ > [C++] Parquet merging unequal but equivalent schemas > > > Key: ARROW-843 > URL: https://issues.apache.org/jira/browse/ARROW-843 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: dataset, parquet > Fix For: 0.14.0 > > > Some Parquet datasets may contain schemas with mixed REQUIRED/OPTIONAL > repetition types. While such schemas aren't strictly equal, we will need to > consider them equivalent on the read path -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-843) [C++] Parquet merging unequal but equivalent schemas
[ https://issues.apache.org/jira/browse/ARROW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-843: --- Labels: dataset parquet (was: parquet) > [C++] Parquet merging unequal but equivalent schemas > > > Key: ARROW-843 > URL: https://issues.apache.org/jira/browse/ARROW-843 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: dataset, parquet > Fix For: 0.14.0 > > > Some Parquet datasets may contain schemas with mixed REQUIRED/OPTIONAL > repetition types. While such schemas aren't strictly equal, we will need to > consider them equivalent on the read path -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-843) [C++] Parquet merging unequal but equivalent schemas
[ https://issues.apache.org/jira/browse/ARROW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-843: --- Component/s: (was: Python) C++ > [C++] Parquet merging unequal but equivalent schemas > > > Key: ARROW-843 > URL: https://issues.apache.org/jira/browse/ARROW-843 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: dataset, parquet > Fix For: 0.14.0 > > > Some Parquet datasets may contain schemas with mixed REQUIRED/OPTIONAL > repetition types. While such schemas aren't strictly equal, we will need to > consider them equivalent on the read path -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-843) [C++] Parquet merging unequal but equivalent schemas
[ https://issues.apache.org/jira/browse/ARROW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-843: --- Summary: [C++] Parquet merging unequal but equivalent schemas (was: [Python] Parquet merging unequal but equivalent schemas) > [C++] Parquet merging unequal but equivalent schemas > > > Key: ARROW-843 > URL: https://issues.apache.org/jira/browse/ARROW-843 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: parquet > Fix For: 0.14.0 > > > Some Parquet datasets may contain schemas with mixed REQUIRED/OPTIONAL > repetition types. While such schemas aren't strictly equal, we will need to > consider them equivalent on the read path -- This message was sent by Atlassian JIRA (v7.6.3#76005)
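The equivalence ARROW-843 needs ignores only the REQUIRED/OPTIONAL repetition while still requiring field names and types to match. A small sketch of that check, where the field representation is a stand-in rather than the Parquet C++ types:

```python
from collections import namedtuple

# Minimal stand-in for a Parquet field: name, type, and whether the
# repetition is OPTIONAL (nullable=True) or REQUIRED (nullable=False).
PqField = namedtuple("PqField", ["name", "type", "nullable"])

def schemas_equivalent(left, right):
    """Equal up to REQUIRED/OPTIONAL: same field names and types in the
    same order, ignoring nullability, as the read path requires."""
    return len(left) == len(right) and all(
        a.name == b.name and a.type == b.type for a, b in zip(left, right)
    )

# Strictly unequal (id differs in repetition) but equivalent for reading.
s1 = [PqField("id", "int64", False), PqField("v", "double", True)]
s2 = [PqField("id", "int64", True), PqField("v", "double", True)]
```

A merged read schema would presumably take the more permissive OPTIONAL repetition for any field where the two sides disagree.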
[jira] [Updated] (ARROW-840) [Python] Provide Python API for creating user-defined data types that can survive Arrow IPC
[ https://issues.apache.org/jira/browse/ARROW-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-840: --- Fix Version/s: 0.14.0 > [Python] Provide Python API for creating user-defined data types that can > survive Arrow IPC > --- > > Key: ARROW-840 > URL: https://issues.apache.org/jira/browse/ARROW-840 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > The user will provide: > * Data type subclass that can indicate the physical storage type > * "get state" and "set state" functions for serializing custom metadata to > bytes > * An optional function for "boxing" scalar values from the physical array > storage > Internally, this will build on an analogous C++ API for defining user data > types -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-840) [Python] Provide Python API for creating user-defined data types that can survive Arrow IPC
[ https://issues.apache.org/jira/browse/ARROW-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805966#comment-16805966 ] Wes McKinney commented on ARROW-840: This is within reach now because of the C++ extension types > [Python] Provide Python API for creating user-defined data types that can > survive Arrow IPC > --- > > Key: ARROW-840 > URL: https://issues.apache.org/jira/browse/ARROW-840 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > The user will provide: > * Data type subclass that can indicate the physical storage type > * "get state" and "set state" functions for serializing custom metadata to > bytes > * An optional function for "boxing" scalar values from the physical array > storage > Internally, this will build on an analogous C++ API for defining user data > types -- This message was sent by Atlassian JIRA (v7.6.3#76005)
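The get-state/set-state hooks ARROW-840 proposes can be sketched in plain Python; `PeriodType` and its JSON payload below are hypothetical examples of a parameterized user type, not a planned API:

```python
import json

class ExtensionType:
    """Sketch of the proposed hooks: a physical storage type plus
    get/set-state functions for round-tripping custom metadata."""
    storage_type = None

    def get_state(self):
        """Serialize the type's parameters to bytes."""
        raise NotImplementedError

    @classmethod
    def set_state(cls, data):
        """Rebuild the type from serialized bytes."""
        raise NotImplementedError

class PeriodType(ExtensionType):
    storage_type = "int64"  # ordinals stored physically as int64

    def __init__(self, freq):
        self.freq = freq

    def get_state(self):
        return json.dumps({"freq": self.freq}).encode()

    @classmethod
    def set_state(cls, data):
        return cls(json.loads(data)["freq"])

# Metadata survives a serialize/deserialize round trip.
t = PeriodType("D")
t2 = PeriodType.set_state(t.get_state())
```

An optional "boxing" hook (not shown) would turn a raw int64 ordinal into a user-facing period scalar on access.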
[jira] [Commented] (ARROW-823) [Python] Devise a means to serialize arrays of arbitrary Python objects in Arrow IPC messages
[ https://issues.apache.org/jira/browse/ARROW-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805964#comment-16805964 ] Wes McKinney commented on ARROW-823: This can be implemented now with ExtensionType > [Python] Devise a means to serialize arrays of arbitrary Python objects in > Arrow IPC messages > - > > Key: ARROW-823 > URL: https://issues.apache.org/jira/browse/ARROW-823 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > > Practically speaking, this would involve a "custom" logical type that is > "pyobject", represented physically as an array of 64-bit pointers. On > serialization, this would need to be converted to a BinaryArray containing > pickled objects as binary values > At the moment, we don't yet have the machinery to deal with "custom" types > where the in-memory representation is different from the on-wire > representation. This would be a useful use case to work through the design > issues > Interestingly, if done properly, this would enable other Arrow > implementations to manipulate (filter, etc.) serialized Python objects as > binary blobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-823) [Python] Devise a means to serialize arrays of arbitrary Python objects in Arrow IPC messages
[ https://issues.apache.org/jira/browse/ARROW-823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-823: --- Fix Version/s: 0.14.0 > [Python] Devise a means to serialize arrays of arbitrary Python objects in > Arrow IPC messages > - > > Key: ARROW-823 > URL: https://issues.apache.org/jira/browse/ARROW-823 > Project: Apache Arrow > Issue Type: New Feature > Components: Python >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > Practically speaking, this would involve a "custom" logical type that is > "pyobject", represented physically as an array of 64-bit pointers. On > serialization, this would need to be converted to a BinaryArray containing > pickled objects as binary values > At the moment, we don't yet have the machinery to deal with "custom" types > where the in-memory representation is different from the on-wire > representation. This would be a useful use case to work through the design > issues > Interestingly, if done properly, this would enable other Arrow > implementations to manipulate (filter, etc.) serialized Python objects as > binary blobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
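The conversion described above, from an in-memory array of Python object pointers to an on-wire binary array of pickled payloads, can be sketched with the standard library alone. Plain Python lists stand in for Arrow arrays here; this is a conceptual sketch, not the Arrow implementation:

```python
import pickle

# A "pyobject" column: in memory, an array of references to Python objects.
objs = [{"a": 1}, (2, 3), "four", None]

# Serialization: pointer array -> binary array of pickle blobs. Each value
# becomes an opaque binary blob that another Arrow implementation could
# filter or slice without understanding its contents.
binary_array = [pickle.dumps(o) for o in objs]
assert all(isinstance(b, bytes) for b in binary_array)

# Deserialization: binary array -> Python objects again.
roundtrip = [pickle.loads(b) for b in binary_array]
assert roundtrip == objs
```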
[jira] [Updated] (ARROW-792) [Java] Allow loading/unloading vectors without using FieldNodes
[ https://issues.apache.org/jira/browse/ARROW-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-792: --- Component/s: Java > [Java] Allow loading/unloading vectors without using FieldNodes > --- > > Key: ARROW-792 > URL: https://issues.apache.org/jira/browse/ARROW-792 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Major > > The information stored in the FieldNode structure is not strictly necessary for > serializing/deserializing vectors. We should allow loading/unloading of > vectors without it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-799) [Java] Provide guidance in documentation for using Arrow in an uberjar setting
[ https://issues.apache.org/jira/browse/ARROW-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-799: --- Component/s: Java > [Java] Provide guidance in documentation for using Arrow in an uberjar > setting > --- > > Key: ARROW-799 > URL: https://issues.apache.org/jira/browse/ARROW-799 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Jingyuan Wang >Assignee: Li Jin >Priority: Major > Labels: beginner > > Currently, the ArrowBuf class directly accesses the package-private fields of > the AbstractByteBuf class, which makes shading Apache Arrow problematic. If we > relocate the io.netty namespace while excluding io.netty.buffer.ArrowBuf, an > IllegalAccessException is thrown. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-730) [Format] Define Flatbuffers metadata for random-access compressed block memory format
[ https://issues.apache.org/jira/browse/ARROW-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-730: --- Fix Version/s: 0.15.0 > [Format] Define Flatbuffers metadata for random-access compressed block > memory format > - > > Key: ARROW-730 > URL: https://issues.apache.org/jira/browse/ARROW-730 > Project: Apache Arrow > Issue Type: New Feature > Components: Format >Reporter: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > It would be useful to be able to create a compressed buffer stream as a > series of fixed-size blocks, with metadata written at the footer of the > stream, so that random access is possible. > {code} > compressed_block[0] > ... > compressed_block[N-1] > compression metadata > metadata_size (int32) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
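The layout sketched in the issue (compressed blocks, then compression metadata, then an int32 `metadata_size` footer) can be prototyped with the standard library. This is a sketch under assumptions: zlib stands in for whatever codec the Flatbuffers metadata would name, and JSON stands in for the Flatbuffers metadata itself; the point is that a reader can seek to any block without scanning the whole stream:

```python
import io
import json
import struct
import zlib

def write_blocks(buf, blocks):
    """Write compressed blocks, then metadata, then metadata_size (int32)."""
    offsets = []
    for raw in blocks:
        offsets.append(buf.tell())
        buf.write(zlib.compress(raw))
    meta = json.dumps({"codec": "zlib", "offsets": offsets,
                       "end": buf.tell()}).encode()
    buf.write(meta)
    buf.write(struct.pack("<i", len(meta)))  # metadata_size footer

def read_block(buf, i):
    """Random access: read the footer, then seek directly to block i."""
    buf.seek(-4, io.SEEK_END)
    (meta_size,) = struct.unpack("<i", buf.read(4))
    buf.seek(-4 - meta_size, io.SEEK_END)
    meta = json.loads(buf.read(meta_size))
    start = meta["offsets"][i]
    stop = (meta["offsets"] + [meta["end"]])[i + 1]
    buf.seek(start)
    return zlib.decompress(buf.read(stop - start))

f = io.BytesIO()
write_blocks(f, [b"block-0" * 100, b"block-1" * 100])
assert read_block(f, 1) == b"block-1" * 100
```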
[jira] [Updated] (ARROW-791) [Java] Check if ArrowBuf is empty buffer in getActualConsumedMemory() and getPossibleConsumedMemory()
[ https://issues.apache.org/jira/browse/ARROW-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-791: --- Component/s: Java > [Java] Check if ArrowBuf is empty buffer in getActualConsumedMemory() and > getPossibleConsumedMemory() > - > > Key: ARROW-791 > URL: https://issues.apache.org/jira/browse/ARROW-791 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Major > > Most of the methods related to memory accounting in ArrowBuf have special > handling for the case when the buffer is the empty buffer instance. This > check is missing in these two methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-790) [Java] Fix getField() for NullableMapVector
[ https://issues.apache.org/jira/browse/ARROW-790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-790: --- Component/s: Java > [Java] Fix getField() for NullableMapVector > --- > > Key: ARROW-790 > URL: https://issues.apache.org/jira/browse/ARROW-790 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Major > > Needs to call super.getField() and return a nullable version of that field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-473) [C++/Python] Add public API for retrieving block locations for a particular HDFS file
[ https://issues.apache.org/jira/browse/ARROW-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-473: --- Labels: hdfs pull-request-available (was: pull-request-available) > [C++/Python] Add public API for retrieving block locations for a particular > HDFS file > - > > Key: ARROW-473 > URL: https://issues.apache.org/jira/browse/ARROW-473 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Python >Reporter: Wes McKinney >Priority: Major > Labels: hdfs, pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This is necessary for applications looking to schedule data-local work. > libhdfs does not have APIs to request the block locations directly, so we > need to see if the {{hdfsGetHosts}} function will do what we need. For > libhdfs3 there is a public API function -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-412) [Format] Handling of buffer padding in the IPC metadata
[ https://issues.apache.org/jira/browse/ARROW-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805962#comment-16805962 ] Wes McKinney commented on ARROW-412: My inclination on this would be that the {{Buffer}} Flatbuffers struct reflects the intent of the materialized Buffer object in the client language. So if a sender of the protocol intends for the receiver to have a 64-byte padded buffer, then this padding should be included in the Buffer struct. I can propose some language in the Format documentation to make this clear > [Format] Handling of buffer padding in the IPC metadata > --- > > Key: ARROW-412 > URL: https://issues.apache.org/jira/browse/ARROW-412 > Project: Apache Arrow > Issue Type: New Feature > Components: Format >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > See discussion in ARROW-399. Do we include padding bytes in the metadata or > set the actual used bytes? In the latter case, the padding would be a part of > the format (any buffers continue to be expected to be 64-byte padded, to > permit AVX512 instructions) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
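The 64-byte padding rule under discussion rounds each buffer's length up to the next multiple of 64 so that wide SIMD (e.g. AVX512) loads never read past an allocation. A minimal sketch of the arithmetic, whichever length the metadata ends up recording:

```python
def padded_length(nbytes: int, alignment: int = 64) -> int:
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    return (nbytes + alignment - 1) & ~(alignment - 1)

assert padded_length(0) == 0
assert padded_length(1) == 64
assert padded_length(64) == 64
assert padded_length(65) == 128
```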
[jira] [Updated] (ARROW-300) [Format] Add buffer compression option to IPC file format
[ https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-300: --- Fix Version/s: (was: 0.14.0) 0.15.0 > [Format] Add buffer compression option to IPC file format > - > > Key: ARROW-300 > URL: https://issues.apache.org/jira/browse/ARROW-300 > Project: Apache Arrow > Issue Type: New Feature > Components: Format >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.15.0 > > > It may be useful, if data is to be sent over the wire, to compress the data > buffers themselves as they're being written in the file layout. > I would propose that we keep this extremely simple with a global buffer > compression setting in the file Footer. Probably the only two compressors worth > supporting out of the box would be zlib (higher compression ratios) and lz4 > (better performance). > What does everyone think? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-114) Bring in java-unsafe-tools as utility library for Arrow
[ https://issues.apache.org/jira/browse/ARROW-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-114: --- Component/s: Java > Bring in java-unsafe-tools as utility library for Arrow > --- > > Key: ARROW-114 > URL: https://issues.apache.org/jira/browse/ARROW-114 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Jacques Nadeau >Priority: Minor > > Originally here: > https://github.com/alexkasko/unsafe-tools > SGA signed off and received by Secretary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-258) [Format] clarify definition of Buffer in context of RPC, IPC, File
[ https://issues.apache.org/jira/browse/ARROW-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-258: --- Fix Version/s: (was: 1.0.0) 0.14.0 > [Format] clarify definition of Buffer in context of RPC, IPC, File > -- > > Key: ARROW-258 > URL: https://issues.apache.org/jira/browse/ARROW-258 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Reporter: Julien Le Dem >Priority: Major > Fix For: 0.14.0 > > > currently Buffer has a loosely defined page field used for shared memory only. > https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L109 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-114) [Java] Bring in java-unsafe-tools as utility library for Arrow
[ https://issues.apache.org/jira/browse/ARROW-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-114: --- Summary: [Java] Bring in java-unsafe-tools as utility library for Arrow (was: [JBring in java-unsafe-tools as utility library for Arrow) > [Java] Bring in java-unsafe-tools as utility library for Arrow > -- > > Key: ARROW-114 > URL: https://issues.apache.org/jira/browse/ARROW-114 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Jacques Nadeau >Priority: Minor > > Originally here: > https://github.com/alexkasko/unsafe-tools > SGA signed off and received by Secretary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-241) [Java] Implement splitAndTransfer for UnionVector
[ https://issues.apache.org/jira/browse/ARROW-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-241: --- Summary: [Java] Implement splitAndTransfer for UnionVector (was: Implement splitAndTransfer for UnionVector) > [Java] Implement splitAndTransfer for UnionVector > - > > Key: ARROW-241 > URL: https://issues.apache.org/jira/browse/ARROW-241 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Priority: Major > > This method was never implemented, and currently is a no op. We should at > least do the naive "copy" version of the method. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-114) [JBring in java-unsafe-tools as utility library for Arrow
[ https://issues.apache.org/jira/browse/ARROW-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-114: --- Summary: [JBring in java-unsafe-tools as utility library for Arrow (was: Bring in java-unsafe-tools as utility library for Arrow) > [JBring in java-unsafe-tools as utility library for Arrow > - > > Key: ARROW-114 > URL: https://issues.apache.org/jira/browse/ARROW-114 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Jacques Nadeau >Priority: Minor > > Originally here: > https://github.com/alexkasko/unsafe-tools > SGA signed off and received by Secretary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-258) [Format] clarify definition of Buffer in context of RPC, IPC, File
[ https://issues.apache.org/jira/browse/ARROW-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-258: --- Summary: [Format] clarify definition of Buffer in context of RPC, IPC, File (was: clarify definition of Buffer in context of RPC, IPC, File) > [Format] clarify definition of Buffer in context of RPC, IPC, File > -- > > Key: ARROW-258 > URL: https://issues.apache.org/jira/browse/ARROW-258 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Reporter: Julien Le Dem >Priority: Major > Fix For: 1.0.0 > > > currently Buffer has a loosely defined page field used for shared memory only. > https://github.com/apache/arrow/blob/34e7f48cb71428c4d78cf00d8fdf0045532d6607/format/Message.fbs#L109 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-110) [C++] Decide on optimal growth factor when appending to buffers/arrays
[ https://issues.apache.org/jira/browse/ARROW-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-110. -- Resolution: Done It seems we'll settle on 2x for now until it can be demonstrated to be a performance problem > [C++] Decide on optimal growth factor when appending to buffers/arrays > -- > > Key: ARROW-110 > URL: https://issues.apache.org/jira/browse/ARROW-110 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Micah Kornfield >Priority: Major > > There is some evidence that powers of 2 might not be optimal (the facebook > folly library suggests this in their explanation of why they have their own > vector type). They use 1.5 (as do other implementations that don't use two). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
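The folly argument referenced above can be checked with a short simulation: with a growth factor k < 2 (e.g. 1.5), the total size of all previously freed allocations eventually exceeds the next requested size, so an allocator can reuse that memory in place; with k = 2 the new block is always larger than everything freed before it. A sketch (the function name and starting size are illustrative):

```python
def can_ever_reuse(k: float, start: int = 16, steps: int = 50) -> bool:
    """Return True if, within `steps` reallocations growing by factor k,
    the sum of previously freed blocks ever covers the next allocation."""
    freed, cur = [], start
    for _ in range(steps):
        if sum(freed) >= int(cur * k):
            return True           # freed space could hold the next block
        freed.append(cur)
        cur = int(cur * k)
    return False

assert can_ever_reuse(1.5) is True   # 1.5x permits reuse fairly quickly
assert can_ever_reuse(2.0) is False  # 2x never catches up
```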
[jira] [Closed] (ARROW-41) C++: Convert RecordBatch to StructArray, and back
[ https://issues.apache.org/jira/browse/ARROW-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-41. - Resolution: Won't Fix Closing also like ARROW-40 until there is a clearly articulated need > C++: Convert RecordBatch to StructArray, and back > - > > Key: ARROW-41 > URL: https://issues.apache.org/jira/browse/ARROW-41 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > With {{arrow::TableBatchReader}}, we can turn a Table into a sequence of one > or more RecordBatches. It would be useful to be able to easily convert > between RecordBatch and a StructArray (which can be semantically equivalent > in some contexts) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
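The semantic equivalence mentioned in the issue is easy to see in miniature: a record batch (named columns of equal length) and a struct array (one array of records) carry the same data. A conceptual sketch using dicts and lists as stand-ins for Arrow types (these helper names are hypothetical, not Arrow APIs):

```python
def batch_to_struct(batch: dict) -> list:
    """{name: [values]} -> [{name: value}, ...] (columns to rows)."""
    names = list(batch)
    return [dict(zip(names, row)) for row in zip(*batch.values())]

def struct_to_batch(struct_arr: list) -> dict:
    """[{name: value}, ...] -> {name: [values]} (rows to columns)."""
    names = list(struct_arr[0])
    return {n: [rec[n] for rec in struct_arr] for n in names}

batch = {"x": [1, 2], "y": ["a", "b"]}
assert batch_to_struct(batch) == [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
assert struct_to_batch(batch_to_struct(batch)) == batch
```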
[jira] [Commented] (ARROW-61) [Java] Method can return the value bigger than long MAX_VALUE
[ https://issues.apache.org/jira/browse/ARROW-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805954#comment-16805954 ] Wes McKinney commented on ARROW-61: --- Is this still an issue? > [Java] Method can return the value bigger than long MAX_VALUE > - > > Key: ARROW-61 > URL: https://issues.apache.org/jira/browse/ARROW-61 > Project: Apache Arrow > Issue Type: Bug > Components: Java > Environment: Apache Drill, Apache Arrow >Reporter: Vitalii Diravka >Priority: Major > Labels: adjustScale, arrow, decimal > > Method org.apache.drill.exec.util.DecimalUtility.adjustScaleMultiply(long > input, int factor) can return a value bigger than the long max value. > For example, when comparing the two decimal18 values 9223372036854775807 and 0.001: > to adjust the first value's scale, this method would return 9223372036854775807 * > 1000, which is bigger than the long max value. > Class DecimalUtility.java will be a part of org.apache.arrow after the renaming > described in [DRILL-4455 Depend on Apache Arrow for Vector and Memory| > https://issues.apache.org/jira/browse/DRILL-4455] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
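The overflow described above happens because Java's long is a signed 64-bit integer: scaling Long.MAX_VALUE (9223372036854775807) by a factor of 1000 cannot be represented. A sketch of a bounds-checked multiply, using Python's unbounded ints to model the check (this hypothetical helper mirrors the shape of DecimalUtility.adjustScaleMultiply but is not the actual implementation):

```python
LONG_MIN = -2**63
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE = 9223372036854775807

def adjust_scale_multiply(value: int, factor: int) -> int:
    """Scale `value` by 10**factor, rejecting results that would
    silently wrap around in a 64-bit signed long."""
    result = value * 10**factor  # Python ints are unbounded, so no wrap here
    if not LONG_MIN <= result <= LONG_MAX:
        raise OverflowError("scaled value does not fit in a 64-bit long")
    return result

assert adjust_scale_multiply(123, 3) == 123000
try:
    adjust_scale_multiply(9223372036854775807, 3)  # the case in the issue
    raise AssertionError("expected overflow")
except OverflowError:
    pass
```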
[jira] [Closed] (ARROW-40) C++: Reinterpret Struct arrays as tables
[ https://issues.apache.org/jira/browse/ARROW-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-40. - Resolution: Won't Fix I suspect that if this is ever needed it will be implemented as part of some other patch > C++: Reinterpret Struct arrays as tables > > > Key: ARROW-40 > URL: https://issues.apache.org/jira/browse/ARROW-40 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > This is mostly a question of layering container types, but will be provided > as an API convenience. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-61) [Java] Method can return the value bigger than long MAX_VALUE
[ https://issues.apache.org/jira/browse/ARROW-61?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-61: -- Summary: [Java] Method can return the value bigger than long MAX_VALUE (was: Method can return the value bigger than long MAX_VALUE) > [Java] Method can return the value bigger than long MAX_VALUE > - > > Key: ARROW-61 > URL: https://issues.apache.org/jira/browse/ARROW-61 > Project: Apache Arrow > Issue Type: Bug > Components: Java > Environment: Apache Drill, Apache Arrow >Reporter: Vitalii Diravka >Priority: Major > Labels: adjustScale, arrow, decimal > > Method org.apache.drill.exec.util.DecimalUtility.adjustScaleMultiply(long > input, int factor) can return a value bigger than the long max value. > For example, when comparing the two decimal18 values 9223372036854775807 and 0.001: > to adjust the first value's scale, this method would return 9223372036854775807 * > 1000, which is bigger than the long max value. > Class DecimalUtility.java will be a part of org.apache.arrow after the renaming > described in [DRILL-4455 Depend on Apache Arrow for Vector and Memory| > https://issues.apache.org/jira/browse/DRILL-4455] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-4985) [C++] arrow/testing headers are not installed
[ https://issues.apache.org/jira/browse/ARROW-4985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney closed ARROW-4985. --- Resolution: Duplicate Fix Version/s: (was: 0.14.0) this was resolved in ARROW-5012 > [C++] arrow/testing headers are not installed > - > > Key: ARROW-4985 > URL: https://issues.apache.org/jira/browse/ARROW-4985 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5068) [Gandiva][Packaging] Fix gandiva nightly builds after the CMake refactor
[ https://issues.apache.org/jira/browse/ARROW-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805862#comment-16805862 ] Praveen Kumar Desabandu commented on ARROW-5068: [~kszucs] I am fixing it as part of ARROW-4959. Could you please review and let me know if you would want me to handle anything more as part of the other JIRA. > [Gandiva][Packaging] Fix gandiva nightly builds after the CMake refactor > > > Key: ARROW-5068 > URL: https://issues.apache.org/jira/browse/ARROW-5068 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Gandiva, Packaging >Reporter: Krisztian Szucs >Priority: Major > > Currently this is the only failing nightly build: > [https://travis-ci.org/kszucs/crossbow/builds/512474452] > > cc [~Pindikura Ravindra] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion
[ https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805787#comment-16805787 ] Jeroen edited comment on ARROW-4844 at 3/30/19 12:10 PM: - For example opencv ships the vendored static libs in a special dir lib/opencv4/3rdparty. Thereby we can statically link to the library, and the vendored libs won't conflict with anything else on the system. I think that is more user friendly than restricting static builds by "refusing to vendor anything at all". {code} [MSYS2 CI] mingw-w64-opencv: Checking Binaries ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_annotation.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_interactive-calibration.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_version.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_version_win32.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_visualisation.exe ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_calib3d.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_core.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_features2d.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_flann.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_gapi.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_highgui.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_imgcodecs.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_imgproc.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_ml.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_objdetect.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_photo.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_stitching.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_video.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_videoio.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/opencv4/3rdparty/libade.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/opencv4/3rdparty/libquirc.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/pkgconfig/opencv4.pc {code} was (Author: jeroenooms): For example opencv ships the vendored 
static libs in a special dir lib/opencv4/3rdparty. I think that is more user friendly than restricting static builds by "refusing to vendor anything at all". {code} [MSYS2 CI] mingw-w64-opencv: Checking Binaries ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_annotation.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_interactive-calibration.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_version.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_version_win32.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_visualisation.exe ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_calib3d.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_core.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_features2d.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_flann.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_gapi.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_highgui.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_imgcodecs.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_imgproc.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_ml.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_objdetect.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_photo.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_stitching.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_video.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_videoio.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/opencv4/3rdparty/libade.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/opencv4/3rdparty/libquirc.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/pkgconfig/opencv4.pc {code} > Static libarrow is missing vendored libdouble-conversion > > > Key: ARROW-4844 > URL: https://issues.apache.org/jira/browse/ARROW-4844 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.12.1 >Reporter: Jeroen >Assignee: Uwe L. 
Korn >Priority: Major > > When trying to statically link the R bindings to libarrow.a, I get linking > errors which suggest that libdouble-conversion.a was not properly embedded in > libarrow.a. This problem happens on both MacOS and Windows. > Here is the arrow build log: > https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7 > {code} > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c): > undefined reference to > `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, > int*) const' > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > >
[jira] [Commented] (ARROW-4844) Static libarrow is missing vendored libdouble-conversion
[ https://issues.apache.org/jira/browse/ARROW-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16805787#comment-16805787 ] Jeroen commented on ARROW-4844: --- For example opencv ships the vendored static libs in a special dir lib/opencv4/3rdparty. I think that is more user friendly than restricting static builds by "refusing to vendor anything at all". {code} [MSYS2 CI] mingw-w64-opencv: Checking Binaries ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_annotation.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_interactive-calibration.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_version.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_version_win32.exe ./pkg/mingw-w64-i686-opencv/mingw32/bin/opencv_visualisation.exe ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_calib3d.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_core.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_features2d.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_flann.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_gapi.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_highgui.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_imgcodecs.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_imgproc.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_ml.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_objdetect.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_photo.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_stitching.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_video.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/libopencv_videoio.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/opencv4/3rdparty/libade.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/opencv4/3rdparty/libquirc.a ./pkg/mingw-w64-i686-opencv/mingw32/lib/pkgconfig/opencv4.pc {code} > Static libarrow is missing vendored libdouble-conversion > > > Key: ARROW-4844 > URL: https://issues.apache.org/jira/browse/ARROW-4844 > Project: Apache Arrow > Issue Type: Bug > Components: C++ 
>Affects Versions: 0.12.1 >Reporter: Jeroen >Assignee: Uwe L. Korn >Priority: Major > > When trying to statically link the R bindings to libarrow.a, I get linking > errors which suggest that libdouble-conversion.a was not properly embedded in > libarrow.a. This problem happens on both MacOS and Windows. > Here is the arrow build log: > https://ci.appveyor.com/project/jeroen/rtools-packages/builds/23015303/job/mtgl6rvfde502iu7 > {code} > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(cast.cc.obj):(.text+0x1c77c): > undefined reference to > `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, > int*) const' > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x5fda): > undefined reference to > `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, > int*) const' > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6097): > undefined reference to > `double_conversion::StringToDoubleConverter::StringToDouble(char const*, int, > int*) const' > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6589): > undefined reference to > `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, > int*) const' > C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../x86_64-w64-mingw32/bin/ld.exe: > > 
C:/rtools40/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/8.3.0/../../../../lib/libarrow.a(converter.cc.obj):(.text+0x6647): > undefined reference to > `double_conversion::StringToDoubleConverter::StringToFloat(char const*, int, > int*) const' > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4959) [Gandiva][Crossbow] Builds broken
[ https://issues.apache.org/jira/browse/ARROW-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-4959: -- Labels: pull-request-available (was: ) > [Gandiva][Crossbow] Builds broken > - > > Key: ARROW-4959 > URL: https://issues.apache.org/jira/browse/ARROW-4959 > Project: Apache Arrow > Issue Type: Task >Reporter: Praveen Kumar Desabandu >Assignee: Praveen Kumar Desabandu >Priority: Major > Labels: pull-request-available > > Looks like crossbow builds for Gandiva have been broken for the last few days. -- This message was sent by Atlassian JIRA (v7.6.3#76005)