[jira] [Created] (ARROW-1664) Support for xarray.DataArray and xarray.Dataset
Mitar created ARROW-1664:

Summary: Support for xarray.DataArray and xarray.Dataset
Key: ARROW-1664
URL: https://issues.apache.org/jira/browse/ARROW-1664
Project: Apache Arrow
Issue Type: Bug
Reporter: Mitar

DataArray and Dataset are efficient in-memory representations for multi-dimensional data. It would be great if one could share them between processes using Arrow.

http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray
http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (ARROW-1662) Move OSX Dependency management into brew bundle Brewfiles
[ https://issues.apache.org/jira/browse/ARROW-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-1662:

Assignee: Stephen Groat

> Move OSX Dependency management into brew bundle Brewfiles
>
> Key: ARROW-1662
> URL: https://issues.apache.org/jira/browse/ARROW-1662
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Continuous Integration
> Environment: osx
> Reporter: Stephen Groat
> Assignee: Stephen Groat
> Priority: Minor
> Labels: pull-request-available
>
> For dependency management on OSX, brew is moving towards heavy use of the
> brew bundle command and Brewfiles. Brewfiles provide a single place for OSX
> dependencies, and brew bundle doesn't error if a dependency is already
> installed. This speeds up development environment setup for OSX users.
[jira] [Resolved] (ARROW-1662) Move OSX Dependency management into brew bundle Brewfiles
[ https://issues.apache.org/jira/browse/ARROW-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-1662.

Resolution: Fixed

Issue resolved by pull request 1143
[https://github.com/apache/arrow/pull/1143]

> Move OSX Dependency management into brew bundle Brewfiles
>
> Key: ARROW-1662
> URL: https://issues.apache.org/jira/browse/ARROW-1662
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Continuous Integration
> Environment: osx
> Reporter: Stephen Groat
> Priority: Minor
> Labels: pull-request-available
>
> For dependency management on OSX, brew is moving towards heavy use of the
> brew bundle command and Brewfiles. Brewfiles provide a single place for OSX
> dependencies, and brew bundle doesn't error if a dependency is already
> installed. This speeds up development environment setup for OSX users.
[jira] [Commented] (ARROW-1662) Move OSX Dependency management into brew bundle Brewfiles
[ https://issues.apache.org/jira/browse/ARROW-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199769#comment-16199769 ]

ASF GitHub Bot commented on ARROW-1662:

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1143

> Move OSX Dependency management into brew bundle Brewfiles
>
> Key: ARROW-1662
> URL: https://issues.apache.org/jira/browse/ARROW-1662
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Continuous Integration
> Environment: osx
> Reporter: Stephen Groat
> Priority: Minor
> Labels: pull-request-available
>
> For dependency management on OSX, brew is moving towards heavy use of the
> brew bundle command and Brewfiles. Brewfiles provide a single place for OSX
> dependencies, and brew bundle doesn't error if a dependency is already
> installed. This speeds up development environment setup for OSX users.
[jira] [Updated] (ARROW-1663) Follow up on ARROW-1347 and make schema backward compatible
[ https://issues.apache.org/jira/browse/ARROW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-1663:

Labels: pull-request-available (was: )

> Follow up on ARROW-1347 and make schema backward compatible
>
> Key: ARROW-1663
> URL: https://issues.apache.org/jira/browse/ARROW-1663
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java - Vectors
> Reporter: Yuliya Feldman
> Assignee: Yuliya Feldman
> Labels: pull-request-available
>
> ARROW-1347 changed ListVector to use the field name $data$ instead of
> [DEFAULT], but FixedSizeListVector was left behind.
> The other case is backward compatibility: if a schema was created before
> ARROW-1347 was in place, the application may still suffer from side effects,
> because that schema is not updated by the new code.
[jira] [Commented] (ARROW-1663) Follow up on ARROW-1347 and make schema backward compatible
[ https://issues.apache.org/jira/browse/ARROW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199693#comment-16199693 ]

ASF GitHub Bot commented on ARROW-1663:

GitHub user yufeldman opened a pull request:

https://github.com/apache/arrow/pull/1193

ARROW-1663: use consistent name for null and not-null in FixedSizeList, add backward compatibility while deserializing schema that was generated before this JIRA checkin

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yufeldman/arrow ARROW-1663

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1193.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #1193

commit c9d36e1a3ac68e306a993bcc208670a3c3127a17
Author: Yuliya Feldman
Date: 2017-10-11T01:59:32Z

ARROW-1663: use consistent name for null and not-null in FixedSizeList, add backward compatibility while deserializing schema that was generated before this JIRA checkin

> Follow up on ARROW-1347 and make schema backward compatible
>
> Key: ARROW-1663
> URL: https://issues.apache.org/jira/browse/ARROW-1663
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java - Vectors
> Reporter: Yuliya Feldman
> Assignee: Yuliya Feldman
> Labels: pull-request-available
>
> ARROW-1347 changed ListVector to use the field name $data$ instead of
> [DEFAULT], but FixedSizeListVector was left behind.
> The other case is backward compatibility: if a schema was created before
> ARROW-1347 was in place, the application may still suffer from side effects,
> because that schema is not updated by the new code.
[jira] [Created] (ARROW-1663) Follow up on ARROW-1347 and make schema backward compatible
Yuliya Feldman created ARROW-1663:

Summary: Follow up on ARROW-1347 and make schema backward compatible
Key: ARROW-1663
URL: https://issues.apache.org/jira/browse/ARROW-1663
Project: Apache Arrow
Issue Type: Bug
Components: Java - Vectors
Reporter: Yuliya Feldman
Assignee: Yuliya Feldman

ARROW-1347 changed ListVector to use the field name $data$ instead of [DEFAULT], but FixedSizeListVector was left behind.
The other case is backward compatibility: if a schema was created before ARROW-1347 was in place, the application may still suffer from side effects, because that schema is not updated by the new code.
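The backward-compatibility half of the issue amounts to renaming the legacy child name [DEFAULT] to $data$ when an older schema is read. The real fix lives in Arrow's Java vectors (PR 1193) and may work differently; the sketch below only illustrates the idea in Python, assuming a hypothetical `Field` dataclass and hypothetical type tags `"list"` and `"fixedsizelist"`. Only the two child names come from the issue.

```python
from dataclasses import dataclass, field, replace
from typing import List

LEGACY_CHILD_NAME = "[DEFAULT]"   # child name used before ARROW-1347
CURRENT_CHILD_NAME = "$data$"     # child name used after ARROW-1347
# Hypothetical type tags for the list-like vectors named in the issue.
LIST_LIKE_TYPES = {"list", "fixedsizelist"}


@dataclass(frozen=True)
class Field:
    """Hypothetical, minimal stand-in for a schema field."""
    name: str
    type_name: str
    nullable: bool = True
    children: List["Field"] = field(default_factory=list)


def upgrade_legacy_field(f: Field) -> Field:
    """Return a copy of f with legacy list child names rewritten.

    Recurses into nested fields so lists of lists are upgraded too;
    non-list parents (e.g. structs) keep their child names unchanged.
    """
    children = []
    for child in f.children:
        child = upgrade_legacy_field(child)  # upgrade nested levels first
        if f.type_name in LIST_LIKE_TYPES and child.name == LEGACY_CHILD_NAME:
            child = replace(child, name=CURRENT_CHILD_NAME)
        children.append(child)
    return replace(f, children=children)
```

Because the function returns new objects rather than mutating its input, one could apply it right after deserialization and leave the stored schema untouched.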
[jira] [Commented] (ARROW-1630) [Serialization] Support Python datetime objects
[ https://issues.apache.org/jira/browse/ARROW-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199622#comment-16199622 ]

ASF GitHub Bot commented on ARROW-1630:

Github user pcmoritz commented on the issue:

https://github.com/apache/arrow/pull/1153

This seems to be working, thanks for the suggestion! The last remaining bit is to get rid of gmtime_r so it runs on Windows; it's a little tricky to get it right with negative seconds since the epoch, will look into it later tonight.

> [Serialization] Support Python datetime objects
>
> Key: ARROW-1630
> URL: https://issues.apache.org/jira/browse/ARROW-1630
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Philipp Moritz
> Assignee: Philipp Moritz
> Labels: pull-request-available
> Fix For: 0.8.0
>
> This was brought up in https://github.com/ray-project/ray/issues/1041.
> It is related to, but not the same as,
> https://issues.apache.org/jira/projects/ARROW/issues/ARROW-1628.
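The comment notes that gmtime_r has to go (it is POSIX and not available on Windows, where MSVC offers gmtime_s instead) and that negative seconds since the epoch are the tricky part. A minimal sketch, in Python for illustration only, of a portable epoch-to-UTC conversion: it swaps in Howard Hinnant's public-domain `civil_from_days` algorithm, which is our assumption and not necessarily what PR 1153 does, and relies on floor division so that times before 1970 land on the right day. Both function names are hypothetical.

```python
def civil_from_days(days: int) -> tuple:
    """Convert a day count relative to 1970-01-01 into a proleptic
    Gregorian (year, month, day). Works for negative day counts."""
    z = days + 719468                 # shift the epoch to 0000-03-01
    era = z // 146097                 # 400-year era; floor division handles z < 0
    doe = z - era * 146097            # day of era, in [0, 146096]
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365  # year of era, [0, 399]
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day of year counted from March 1
    mp = (5 * doy + 2) // 153         # month index with March = 0, in [0, 11]
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    # January and February belong to the following civil year.
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def utc_fields_from_epoch_seconds(seconds: int) -> tuple:
    """Portable gmtime_r-style breakdown: (year, month, day, hour, minute, second)."""
    # divmod floors, so seconds_of_day stays in [0, 86399] even for negative input.
    days, seconds_of_day = divmod(seconds, 86400)
    hour, rem = divmod(seconds_of_day, 3600)
    minute, second = divmod(rem, 60)
    return civil_from_days(days) + (hour, minute, second)
```

For example, `utc_fields_from_epoch_seconds(-1)` gives `(1969, 12, 31, 23, 59, 59)`. In C or C++, where `/` truncates toward zero, the floor divisions would need explicit adjustment for negative operands.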
[jira] [Assigned] (ARROW-1455) [Python] Add Dockerfile for validating Dask integration outside of usual CI
[ https://issues.apache.org/jira/browse/ARROW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Heimir Thor Sverrisson reassigned ARROW-1455:

Assignee: Heimir Thor Sverrisson

> [Python] Add Dockerfile for validating Dask integration outside of usual CI
>
> Key: ARROW-1455
> URL: https://issues.apache.org/jira/browse/ARROW-1455
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Heimir Thor Sverrisson
>
> Introducing the Dask stack into Arrow's CI might be a bit heavyweight at the
> moment, but we can add a testing setup in
> https://github.com/apache/arrow/tree/master/python/testing so that this can
> be validated on an ad hoc basis in a reproducible way.
> See also ARROW-1417.
[jira] [Commented] (ARROW-1658) [Python] Out of bounds dictionary indices causes segfault after converting to pandas
[ https://issues.apache.org/jira/browse/ARROW-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199321#comment-16199321 ]

Nick White commented on ARROW-1658:

I'd definitely encourage failing loudly as early as possible, as trying to work out where corrupt data came from is ...hard

> [Python] Out of bounds dictionary indices causes segfault after converting to pandas
>
> Key: ARROW-1658
> URL: https://issues.apache.org/jira/browse/ARROW-1658
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Wes McKinney
> Fix For: 0.8.0
>
> Minimal reproduction:
> {code}
> import numpy as np
> import pandas as pd  # needed by to_pandas()
> import pyarrow as pa
>
> num = 100
> arr = pa.DictionaryArray.from_arrays(
>     np.arange(0, num),           # indices 0..99
>     np.array(['a'], object),     # dictionary with a single entry
>     np.zeros(num, bool),         # mask: nothing is null
>     True)
> print(arr.to_pandas())
> {code}
> At no time in the Arrow codebase do we validate that the dictionary indices
> are in bounds. It seems that pandas is overly trusting of the validity of the
> indices. So we should add a method someplace to validate that the non-null
> dictionary indices are not out of bounds (perhaps in
> {{CategoricalBlock::WriteIndices}}).
> As an aside: there may be other times when doing analytics on categorical
> data that external data will have out of bounds index values. We should plan
> for these and decide whether to raise an exception or treat them as null.
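The issue asks for a check that every non-null dictionary index is in bounds, and leaves open whether an out-of-bounds value should raise or become null. A minimal pure-Python sketch of both policies; `validate_dictionary_indices` and its `out_of_bounds` parameter are hypothetical and not part of pyarrow, and `mask` follows the same convention as the reproduction (True marks a null slot).

```python
from typing import List, Optional, Sequence


def validate_dictionary_indices(
    indices: Sequence[int],
    dictionary_size: int,
    mask: Optional[Sequence[bool]] = None,
    out_of_bounds: str = "raise",
) -> List[Optional[int]]:
    """Check that each non-null index i satisfies 0 <= i < dictionary_size.

    mask[k] is True when slot k is null; null slots are not checked and
    come back as None. With out_of_bounds="raise" the first bad index
    raises IndexError; with out_of_bounds="null" bad indices become None.
    """
    if out_of_bounds not in ("raise", "null"):
        raise ValueError("out_of_bounds must be 'raise' or 'null'")
    result: List[Optional[int]] = []
    for k, idx in enumerate(indices):
        if mask is not None and mask[k]:
            result.append(None)  # null slot: its index value is irrelevant
        elif 0 <= idx < dictionary_size:
            result.append(idx)
        elif out_of_bounds == "raise":
            raise IndexError(
                f"dictionary index {idx} at position {k} is out of bounds "
                f"for a dictionary of size {dictionary_size}"
            )
        else:
            result.append(None)  # treat the bad index as null
    return result
```

Applied to the reproduction's data (indices `0..99` against a one-entry dictionary), the default policy raises `IndexError` at position 1 before any conversion happens, which matches the "fail loudly as early as possible" suggestion in the comment.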
[jira] [Commented] (ARROW-1503) [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
[ https://issues.apache.org/jira/browse/ARROW-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199225#comment-16199225 ]

ASF GitHub Bot commented on ARROW-1503:

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1192

> [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
>
> Key: ARROW-1503
> URL: https://issues.apache.org/jira/browse/ARROW-1503
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0
[jira] [Commented] (ARROW-1503) [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
[ https://issues.apache.org/jira/browse/ARROW-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199216#comment-16199216 ]

ASF GitHub Bot commented on ARROW-1503:

Github user pcmoritz commented on the issue:

https://github.com/apache/arrow/pull/1192

+1 LGTM

> [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
>
> Key: ARROW-1503
> URL: https://issues.apache.org/jira/browse/ARROW-1503
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0
[jira] [Commented] (ARROW-1503) [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
[ https://issues.apache.org/jira/browse/ARROW-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199056#comment-16199056 ]

ASF GitHub Bot commented on ARROW-1503:

Github user robertnishihara commented on the issue:

https://github.com/apache/arrow/pull/1192

@pcmoritz want to take a look?

> [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
>
> Key: ARROW-1503
> URL: https://issues.apache.org/jira/browse/ARROW-1503
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.8.0