[jira] [Commented] (ARROW-317) [C++] Implement zero-copy Slice method on arrow::Buffer that retains reference to parent

2016-10-13 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573092#comment-15573092
 ] 

Wes McKinney commented on ARROW-317:


I implemented this in pandas prototyping:

https://github.com/pandas-dev/pandas2/pull/40/files#diff-3d95cc22572f59dbfc29d31c19c9bed4R41

Per https://github.com/pandas-dev/pandas2/issues/44, my working plan is to make 
pandas depend on libarrow so we can do all this buffer wrangling / memory 
management in one place.

> [C++] Implement zero-copy Slice method on arrow::Buffer that retains 
> reference to parent
> 
>
> Key: ARROW-317
> URL: https://issues.apache.org/jira/browse/ARROW-317
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>  Reporter: Wes McKinney
>
> This will help prevent referenced memory from being garbage-collected while 
> it is referenced by other buffers (for example, in an IPC setting, where we 
> construct Arrow vectors/arrays without copying the input memory).
> Closely related to this: resizeable buffers that are referenced by other 
> buffers should return error status when calling {{Resize}} (if possible). 
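
A minimal sketch of the idea described above, assuming a simplified buffer 
type: the slice points into the parent's memory without copying and holds a 
shared_ptr to the parent so that memory cannot be freed while the slice is 
alive. The names Buffer, OwnedBuffer, SliceBuffer and 
SumSliceAfterParentReleased are hypothetical stand-ins for illustration, not 
the actual arrow::Buffer API or the pandas2 prototype linked in the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Buffer; not the actual Arrow class.
class Buffer {
 public:
  Buffer(const uint8_t* data, int64_t size) : data_(data), size_(size) {}

  // Zero-copy view over [offset, offset + size) of `parent`. Storing the
  // shared_ptr keeps the parent (and its memory) alive as long as the view.
  Buffer(std::shared_ptr<Buffer> parent, int64_t offset, int64_t size)
      : data_(parent->data() + offset), size_(size), parent_(std::move(parent)) {}

  virtual ~Buffer() = default;

  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }

 protected:
  const uint8_t* data_;
  int64_t size_;
  std::shared_ptr<Buffer> parent_;  // null for buffers that are not slices
};

// Hypothetical buffer that owns its bytes (assumption: backed by std::vector).
class OwnedBuffer : public Buffer {
 public:
  explicit OwnedBuffer(std::vector<uint8_t> bytes)
      : Buffer(nullptr, 0), storage_(std::move(bytes)) {
    data_ = storage_.data();
    size_ = static_cast<int64_t>(storage_.size());
  }

 private:
  std::vector<uint8_t> storage_;
};

// Returns a view on `parent`, or nullptr if the requested range is invalid.
std::shared_ptr<Buffer> SliceBuffer(const std::shared_ptr<Buffer>& parent,
                                    int64_t offset, int64_t length) {
  if (parent == nullptr || offset < 0 || length < 0) return nullptr;
  if (offset > parent->size() || length > parent->size() - offset) return nullptr;
  return std::make_shared<Buffer>(parent, offset, length);
}

// Usage: the slice stays readable after the caller drops its parent handle.
int64_t SumSliceAfterParentReleased() {
  std::shared_ptr<Buffer> slice;
  {
    auto parent =
        std::make_shared<OwnedBuffer>(std::vector<uint8_t>{10, 20, 30, 40, 50});
    slice = SliceBuffer(parent, 1, 3);
  }  // the caller's handle to the parent goes away here
  assert(slice != nullptr && slice->size() == 3);
  int64_t sum = 0;
  for (int64_t i = 0; i < slice->size(); ++i) sum += slice->data()[i];
  return sum;
}
```

In this sketch the parent's bytes are released only when the last slice 
referring to them is destroyed. How a real implementation would detect 
outstanding slices in order to reject {{Resize}} is left open here, as it is 
in the issue description.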



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ARROW-288) Implement Arrow adapter for Spark Datasets

2016-10-13 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573017#comment-15573017
 ] 

Wes McKinney commented on ARROW-288:


Hi [~freiss] and [~jlaskowski] -- we have made substantial progress on the C++ 
side toward full interoperability with the Arrow Java library. We still need 
to do some integration testing, but it would be great to start exploring the 
technical plan for making this happen. I was just talking with [~rxin] about 
this the other day, so there may be someone on the Spark side who could help 
with this effort, too. 

The first step is to convert a Spark Dataset into one or more Arrow record 
batches, including metadata conversion, and then to convert back. The Java <-> 
C++ data movement itself is a comparatively minor task because it just means 
sending a serialized byte buffer through the existing protocol. We can test 
this out in Python using the Arrow <-> pandas bridge, which has already been 
completed. 

Let me know if anyone will have the bandwidth to work on this and we can 
coordinate. thanks!

> Implement Arrow adapter for Spark Datasets
> --
>
> Key: ARROW-288
> URL: https://issues.apache.org/jira/browse/ARROW-288
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Java - Vectors
>  Reporter: Wes McKinney
>
> It would be valuable for applications that use Arrow to be able to 
> * Convert between Spark DataFrames/Datasets and Java Arrow vectors
> * Send / Receive Arrow record batches / Arrow file format RPCs to / from 
> Spark 
> * Allow PySpark to use Arrow for messaging in UDF evaluation





[jira] [Commented] (ARROW-336) Run Apache Rat in Travis builds

2016-10-13 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572841#comment-15572841
 ] 

Uwe L. Korn commented on ARROW-336:
---

PR: https://github.com/apache/arrow/pull/174

> Run Apache Rat in Travis builds
> ---
>
> Key: ARROW-336
> URL: https://issues.apache.org/jira/browse/ARROW-336
> Project: Apache Arrow
>  Issue Type: Improvement
>  Reporter: Uwe L. Korn
>  Assignee: Uwe L. Korn
>






[jira] [Commented] (ARROW-191) Python: Provide infrastructure for manylinux1 wheels

2016-10-13 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572778#comment-15572778
 ] 

Uwe L. Korn commented on ARROW-191:
---

PR: https://github.com/apache/arrow/pull/173

> Python: Provide infrastructure for manylinux1 wheels 
> -
>
> Key: ARROW-191
> URL: https://issues.apache.org/jira/browse/ARROW-191
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>  Reporter: Uwe L. Korn
>  Assignee: Uwe L. Korn
>
> After carefully reading the spec, we should actually be able to build 
> manylinux1-compatible pyarrow wheels (and then upload them to PyPI).


