[jira] [Created] (ARROW-1664) Support for xarray.DataArray and xarray.Dataset

2017-10-10 Thread Mitar (JIRA)
Mitar created ARROW-1664:


 Summary: Support for xarray.DataArray and xarray.Dataset
 Key: ARROW-1664
 URL: https://issues.apache.org/jira/browse/ARROW-1664
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Mitar


DataArray and Dataset are efficient in-memory representations for 
multidimensional data. It would be great if one could share them between 
processes using Arrow.

http://xarray.pydata.org/en/stable/generated/xarray.DataArray.html#xarray.DataArray
http://xarray.pydata.org/en/stable/generated/xarray.Dataset.html#xarray.Dataset




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (ARROW-1662) Move OSX Dependency management into brew bundle Brewfiles

2017-10-10 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1662:
---

Assignee: Stephen Groat

> Move OSX Dependency management into brew bundle Brewfiles
> -
>
> Key: ARROW-1662
> URL: https://issues.apache.org/jira/browse/ARROW-1662
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
> Environment: osx
>Reporter: Stephen Groat
>Assignee: Stephen Groat
>Priority: Minor
>  Labels: pull-request-available
>
> For dependency management on OSX, brew is moving towards heavy use of the brew 
> bundle command and Brewfiles. Brewfiles provide a single place for OSX 
> dependencies, and brew bundle doesn't error if a dependency is already 
> installed. This speeds up development environment setup for OSX users.





[jira] [Resolved] (ARROW-1662) Move OSX Dependency management into brew bundle Brewfiles

2017-10-10 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-1662.
-
Resolution: Fixed

Issue resolved by pull request 1143
[https://github.com/apache/arrow/pull/1143]

> Move OSX Dependency management into brew bundle Brewfiles
> -
>
> Key: ARROW-1662
> URL: https://issues.apache.org/jira/browse/ARROW-1662
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
> Environment: osx
>Reporter: Stephen Groat
>Priority: Minor
>  Labels: pull-request-available
>
> For dependency management on OSX, brew is moving towards heavy use of the brew 
> bundle command and Brewfiles. Brewfiles provide a single place for OSX 
> dependencies, and brew bundle doesn't error if a dependency is already 
> installed. This speeds up development environment setup for OSX users.





[jira] [Commented] (ARROW-1662) Move OSX Dependency management into brew bundle Brewfiles

2017-10-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199769#comment-16199769
 ] 

ASF GitHub Bot commented on ARROW-1662:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1143


> Move OSX Dependency management into brew bundle Brewfiles
> -
>
> Key: ARROW-1662
> URL: https://issues.apache.org/jira/browse/ARROW-1662
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
> Environment: osx
>Reporter: Stephen Groat
>Priority: Minor
>  Labels: pull-request-available
>
> For dependency management on OSX, brew is moving towards heavy use of the brew 
> bundle command and Brewfiles. Brewfiles provide a single place for OSX 
> dependencies, and brew bundle doesn't error if a dependency is already 
> installed. This speeds up development environment setup for OSX users.





[jira] [Updated] (ARROW-1663) Follow up on ARROW-1347 and make schema backward compatible

2017-10-10 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-1663:
--
Labels: pull-request-available  (was: )

> Follow up on ARROW-1347 and make schema backward compatible
> ---
>
> Key: ARROW-1663
> URL: https://issues.apache.org/jira/browse/ARROW-1663
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
>  Labels: pull-request-available
>
> ARROW-1347 changed ListVector to name its field $data$ instead of [DEFAULT], 
> but we left FixedSizeListVector behind.
> Another case is backward compatibility: if a schema was created before 
> ARROW-1347 was in place, the application may still suffer from side effects, 
> as it would not be updated based on the new code.





[jira] [Commented] (ARROW-1663) Follow up on ARROW-1347 and make schema backward compatible

2017-10-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199693#comment-16199693
 ] 

ASF GitHub Bot commented on ARROW-1663:
---

GitHub user yufeldman opened a pull request:

https://github.com/apache/arrow/pull/1193

ARROW-1663: use consistent name for null and not-null in FixedSizeLis…

…t, add backward compatibility while deserializing schema that was 
generated before this JIRA checkin

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yufeldman/arrow ARROW-1663

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/arrow/pull/1193.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1193


commit c9d36e1a3ac68e306a993bcc208670a3c3127a17
Author: Yuliya Feldman 
Date:   2017-10-11T01:59:32Z

ARROW-1663: use consistent name for null and not-null in FixedSizeList, add 
backward compatibility while deserializing schema that was generated before 
this JIRA checkin




> Follow up on ARROW-1347 and make schema backward compatible
> ---
>
> Key: ARROW-1663
> URL: https://issues.apache.org/jira/browse/ARROW-1663
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java - Vectors
>Reporter: Yuliya Feldman
>Assignee: Yuliya Feldman
>  Labels: pull-request-available
>
> ARROW-1347 changed ListVector to name its field $data$ instead of [DEFAULT], 
> but we left FixedSizeListVector behind.
> Another case is backward compatibility: if a schema was created before 
> ARROW-1347 was in place, the application may still suffer from side effects, 
> as it would not be updated based on the new code.





[jira] [Created] (ARROW-1663) Follow up on ARROW-1347 and make schema backward compatible

2017-10-10 Thread Yuliya Feldman (JIRA)
Yuliya Feldman created ARROW-1663:
-

 Summary: Follow up on ARROW-1347 and make schema backward 
compatible
 Key: ARROW-1663
 URL: https://issues.apache.org/jira/browse/ARROW-1663
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java - Vectors
Reporter: Yuliya Feldman
Assignee: Yuliya Feldman


ARROW-1347 changed ListVector to name its field $data$ instead of [DEFAULT], 
but we left FixedSizeListVector behind.

Another case is backward compatibility: if a schema was created before 
ARROW-1347 was in place, the application may still suffer from side effects, 
as it would not be updated based on the new code.
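
The backward-compatibility case can be illustrated with a minimal Python sketch. 
It is not the Java change from the pull request: the dict-based field layout, 
the type strings "list" and "fixedsizelist", and the helper name 
normalize_field are all hypothetical. Only the two field names, [DEFAULT] and 
$data$, come from this issue. The idea is to rewrite the legacy child name 
while deserializing, so that schemas written before ARROW-1347 look the same 
as new ones.

```python
LEGACY_CHILD_NAME = "[DEFAULT]"   # name used before ARROW-1347
CURRENT_CHILD_NAME = "$data$"     # name used after ARROW-1347

# Hypothetical type tags for the two list-like vectors named in the issue.
LIST_TYPES = {"list", "fixedsizelist"}


def normalize_field(field):
    """Return a copy of `field` with legacy list child names rewritten.

    `field` is a hypothetical dict: {"name": ..., "type": ..., "children": [...]}.
    The input is left unmodified.
    """
    children = [normalize_field(child) for child in field.get("children", [])]
    if field.get("type") in LIST_TYPES:
        children = [
            dict(child, name=CURRENT_CHILD_NAME)
            if child.get("name") == LEGACY_CHILD_NAME
            else child
            for child in children
        ]
    return dict(field, children=children)


old_schema_field = {
    "name": "points",
    "type": "fixedsizelist",
    "children": [{"name": "[DEFAULT]", "type": "int32", "children": []}],
}
print(normalize_field(old_schema_field)["children"][0]["name"])
```

Renaming on read rather than on write means files produced by older code never 
have to be rewritten, at the cost of one extra pass over the schema.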





[jira] [Commented] (ARROW-1630) [Serialization] Support Python datetime objects

2017-10-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199622#comment-16199622
 ] 

ASF GitHub Bot commented on ARROW-1630:
---

Github user pcmoritz commented on the issue:

https://github.com/apache/arrow/pull/1153
  
This seems to be working, thanks for the suggestion! The last remaining bit 
is to get rid of gmtime_r so it runs on Windows; it's a little tricky to get it 
right with negative seconds since the epoch, so I will look into it later tonight.
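
The difficulty with negative seconds since the epoch can be illustrated with a 
small Python sketch. This is not the C++ code from the pull request, and the 
helper names are hypothetical. The point is that a timestamp before 1970 must 
be split with floor semantics, so the sub-second part stays non-negative; 
rounding toward zero, as C integer division does, lands on the wrong second. 
Plain timedelta arithmetic avoids platform time functions entirely.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def datetime_from_epoch_seconds(seconds, microseconds=0):
    """Build a naive UTC datetime from (possibly negative) epoch seconds."""
    return EPOCH + timedelta(seconds=seconds, microseconds=microseconds)


def epoch_seconds_from_datetime(dt):
    """Split a naive UTC datetime into (whole seconds, microseconds).

    timedelta keeps `seconds` and `microseconds` non-negative and lets `days`
    go negative, so the result is floored rather than truncated toward zero.
    """
    delta = dt - EPOCH
    return delta.days * 86400 + delta.seconds, delta.microseconds


# Half a second before the epoch is (-1 s, +500000 us), not (0 s, -500000 us).
before_epoch = datetime(1969, 12, 31, 23, 59, 59, 500000)
seconds, micros = epoch_seconds_from_datetime(before_epoch)
assert datetime_from_epoch_seconds(seconds, micros) == before_epoch
```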


> [Serialization] Support Python datetime objects
> ---
>
> Key: ARROW-1630
> URL: https://issues.apache.org/jira/browse/ARROW-1630
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>
> This was brought up in https://github.com/ray-project/ray/issues/1041
> It is related but not the same as 
> https://issues.apache.org/jira/projects/ARROW/issues/ARROW-1628





[jira] [Assigned] (ARROW-1455) [Python] Add Dockerfile for validating Dask integration outside of usual CI

2017-10-10 Thread Heimir Thor Sverrisson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heimir Thor Sverrisson reassigned ARROW-1455:
-

Assignee: Heimir Thor Sverrisson

> [Python] Add Dockerfile for validating Dask integration outside of usual CI
> ---
>
> Key: ARROW-1455
> URL: https://issues.apache.org/jira/browse/ARROW-1455
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Heimir Thor Sverrisson
>
> Introducing the Dask stack into Arrow's CI might be a bit heavyweight at the 
> moment, but we can add a testing setup in 
> https://github.com/apache/arrow/tree/master/python/testing so that this can 
> be validated on an ad hoc basis in a reproducible way.
> See also ARROW-1417.





[jira] [Commented] (ARROW-1658) [Python] Out of bounds dictionary indices causes segfault after converting to pandas

2017-10-10 Thread Nick White (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199321#comment-16199321
 ] 

Nick White commented on ARROW-1658:
---

I'd definitely encourage failing loudly as early as possible, as trying to work 
out where corrupt data came from is ...hard

> [Python] Out of bounds dictionary indices causes segfault after converting to 
> pandas
> 
>
> Key: ARROW-1658
> URL: https://issues.apache.org/jira/browse/ARROW-1658
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
>Reporter: Wes McKinney
> Fix For: 0.8.0
>
>
> Minimal reproduction:
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
>  
> num = 100
> arr = pa.DictionaryArray.from_arrays(
>     np.arange(0, num),
>     np.array(['a'], dtype=object),
>     np.zeros(num, dtype=bool),
>     True)
> print(arr.to_pandas())
> {code}
> At no time in the Arrow codebase do we validate that the dictionary indices 
> are in bounds. It seems that pandas is overly trusting of the validity of the 
> indices. So we should add a method someplace to validate that the dictionary 
> non-null indices are not out of bounds (perhaps in 
> {{CategoricalBlock::WriteIndices}}).
> As an aside: there may be other times when doing analytics on categorical 
> data that external data will have out of bounds index values. We should plan 
> for these and decide whether to raise an exception or treat them as null
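
A minimal sketch of the bounds check proposed above, written in plain Python 
rather than inside CategoricalBlock::WriteIndices. The function name and 
signature are hypothetical, and it assumes that a True entry in the mask marks 
a null slot. It fails loudly on the first non-null index outside the 
dictionary instead of letting the bad index reach pandas.

```python
def validate_dictionary_indices(indices, dictionary_size, mask=None):
    """Raise IndexError for the first non-null index outside the dictionary.

    Assumption: mask[i] is True when slot i is null, so masked slots are
    skipped because their index values are never dereferenced.
    """
    for position, index in enumerate(indices):
        if mask is not None and mask[position]:
            continue
        if not 0 <= index < dictionary_size:
            raise IndexError(
                "dictionary index {} at position {} is out of bounds for a "
                "dictionary of size {}".format(index, position, dictionary_size)
            )


# Mirrors the reproduction: indices 0..99 against a one-element dictionary.
try:
    validate_dictionary_indices(range(100), 1, [False] * 100)
except IndexError as exc:
    print(exc)
```

Raising is only one of the two options the issue mentions; treating 
out-of-bounds values as null would replace the raise with a step that marks 
the slot as null.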





[jira] [Commented] (ARROW-1503) [Python] Add serialization callbacks for pandas objects in pyarrow.serialize

2017-10-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199225#comment-16199225
 ] 

ASF GitHub Bot commented on ARROW-1503:
---

Github user asfgit closed the pull request at:

https://github.com/apache/arrow/pull/1192


> [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
> 
>
> Key: ARROW-1503
> URL: https://issues.apache.org/jira/browse/ARROW-1503
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1503) [Python] Add serialization callbacks for pandas objects in pyarrow.serialize

2017-10-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199216#comment-16199216
 ] 

ASF GitHub Bot commented on ARROW-1503:
---

Github user pcmoritz commented on the issue:

https://github.com/apache/arrow/pull/1192
  
+1 LGTM


> [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
> 
>
> Key: ARROW-1503
> URL: https://issues.apache.org/jira/browse/ARROW-1503
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-1503) [Python] Add serialization callbacks for pandas objects in pyarrow.serialize

2017-10-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199056#comment-16199056
 ] 

ASF GitHub Bot commented on ARROW-1503:
---

Github user robertnishihara commented on the issue:

https://github.com/apache/arrow/pull/1192
  
@pcmoritz want to take a look?


> [Python] Add serialization callbacks for pandas objects in pyarrow.serialize
> 
>
> Key: ARROW-1503
> URL: https://issues.apache.org/jira/browse/ARROW-1503
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>



