[jira] [Commented] (ARROW-1643) [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect to HDFS

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390885#comment-16390885
 ] 

ASF GitHub Bot commented on ARROW-1643:
---

xhochy commented on issue #1668: ARROW-1643: [Python] Accept hdfs:// prefixes 
in parquet.read_table and attempt to connect to HDFS
URL: https://github.com/apache/arrow/pull/1668#issuecomment-371409149
 
 
   The error could be legitimate; someone with access to a Windows machine 
should look into it. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect 
> to HDFS
> -
>
> Key: ARROW-1643
> URL: https://issues.apache.org/jira/browse/ARROW-1643
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
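For context, the requested change amounts to scheme-based dispatch inside `parquet.read_table`: detect an `hdfs://` prefix and connect to HDFS, otherwise treat the input as a local path. A minimal stdlib sketch of that detection (the helper name `split_hdfs_uri` is hypothetical, not pyarrow API):

```python
from urllib.parse import urlparse

def split_hdfs_uri(uri):
    """Return (host, port, path) for an hdfs:// URI, or None for a local path.

    Hypothetical helper sketching the scheme dispatch this issue asks for;
    the actual pyarrow implementation may differ.
    """
    parsed = urlparse(uri)
    if parsed.scheme != 'hdfs':
        return None  # plain local path, no HDFS connection needed
    return parsed.hostname, parsed.port, parsed.path

assert split_hdfs_uri('/tmp/data.parquet') is None
assert split_hdfs_uri('hdfs://namenode:8020/data/x.parquet') == \
    ('namenode', 8020, '/data/x.parquet')
```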




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1643) [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect to HDFS

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390449#comment-16390449
 ] 

ASF GitHub Bot commented on ARROW-1643:
---

ehsantn commented on issue #1668: ARROW-1643: [Python] Accept hdfs:// prefixes 
in parquet.read_table and attempt to connect to HDFS
URL: https://github.com/apache/arrow/pull/1668#issuecomment-371325413
 
 
   Appveyor didn't pass after rebase.




> [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect 
> to HDFS
> -
>
> Key: ARROW-1643
> URL: https://issues.apache.org/jira/browse/ARROW-1643
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Assigned] (ARROW-640) [Python] Arrow scalar values should have a sensible __hash__ and comparison

2018-03-07 Thread Alex Hagerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Hagerman reassigned ARROW-640:
---

Assignee: Alex Hagerman

> [Python] Arrow scalar values should have a sensible __hash__ and comparison
> ---
>
> Key: ARROW-640
> URL: https://issues.apache.org/jira/browse/ARROW-640
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Miki Tebeka
>Assignee: Alex Hagerman
>Priority: Major
> Fix For: 0.10.0
>
>
> {noformat}
> In [86]: arr = pa.from_pylist([1, 1, 1, 2])
> In [87]: set(arr)
> Out[87]: {1, 2, 1, 1}
> In [88]: arr[0] == arr[1]
> Out[88]: False
> In [89]: arr
> Out[89]: 
> 
> [
>   1,
>   1,
>   1,
>   2
> ]
> {noformat}
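The snippet above shows the problem: equal scalar values neither compare equal nor deduplicate in a set. A toy stand-in class (hypothetical, not the real pyarrow scalar) sketches the `__hash__`/`__eq__` contract being asked for:

```python
class Int64Scalar:
    """Toy stand-in for a pyarrow scalar (not the real class).

    The contract the issue asks for: objects that compare equal must hash
    equal, so set() deduplicates and arr[0] == arr[1] holds for equal values.
    """
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, Int64Scalar):
            return self.value == other.value
        return self.value == other  # allow comparison with plain ints

    def __hash__(self):
        return hash(self.value)  # consistent with __eq__

values = [Int64Scalar(v) for v in [1, 1, 1, 2]]
assert values[0] == values[1]          # equal values compare equal
assert len(set(values)) == 2           # set() now deduplicates
```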





[jira] [Commented] (ARROW-2280) [Python] pyarrow.Array.buffers should also include the offsets

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390076#comment-16390076
 ] 

ASF GitHub Bot commented on ARROW-2280:
---

wesm closed pull request #1719: ARROW-2280: [Python] Return the offset for the 
buffers in pyarrow.Array
URL: https://github.com/apache/arrow/pull/1719
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index 7899d9dbbd..e785c0ec5c 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -483,10 +483,23 @@ cdef class Array:
         with nogil:
             check_status(ValidateArray(deref(self.ap)))
 
+    property offset:
+
+        def __get__(self):
+            """
+            A relative position into another array's data, to enable zero-copy
+            slicing. This value defaults to zero but must be applied on all
+            operations with the physical storage buffers.
+            """
+            return self.sp_array.get().offset()
+
     def buffers(self):
         """
         Return a list of Buffer objects pointing to this array's physical
         storage.
+
+        To correctly interpret these buffers, you need to also apply the offset
+        multiplied with the size of the stored data type.
         """
         res = []
         _append_array_buffers(self.sp_array.get().data().get(), res)
diff --git a/python/pyarrow/includes/libarrow.pxd 
b/python/pyarrow/includes/libarrow.pxd
index d95f01661c..456fcca360 100644
--- a/python/pyarrow/includes/libarrow.pxd
+++ b/python/pyarrow/includes/libarrow.pxd
@@ -103,6 +103,7 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil:
 
         int64_t length()
         int64_t null_count()
+        int64_t offset()
         Type type_id()
 
         int num_fields()
diff --git a/python/pyarrow/tests/test_array.py 
b/python/pyarrow/tests/test_array.py
index c1131a0023..f034d78b39 100644
--- a/python/pyarrow/tests/test_array.py
+++ b/python/pyarrow/tests/test_array.py
@@ -600,6 +600,15 @@ def test_buffers_primitive():
     assert 1 <= len(null_bitmap) <= 64  # XXX this is varying
     assert bytearray(null_bitmap)[0] == 0b1011
 
+    # Slicing does not affect the buffers but the offset
+    a_sliced = a[1:]
+    buffers = a_sliced.buffers()
+    assert a_sliced.offset == 1
+    assert len(buffers) == 2
+    null_bitmap = buffers[0].to_pybytes()
+    assert 1 <= len(null_bitmap) <= 64  # XXX this is varying
+    assert bytearray(null_bitmap)[0] == 0b1011
+
     assert struct.unpack('hhxxh', buffers[1].to_pybytes()) == (1, 2, 4)
 
     a = pa.array(np.int8([4, 5, 6]))


 




> [Python] pyarrow.Array.buffers should also include the offsets
> --
>
> Key: ARROW-2280
> URL: https://issues.apache.org/jira/browse/ARROW-2280
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only return the buffers, but they don't make sense without 
> their offsets; in particular, the validity bitmap will have a non-zero 
> offset in most cases where the array was sliced.
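To see why the buffers alone are ambiguous, here is a stdlib-only sketch of reading a sliced array's validity bitmap: slicing reuses the parent's buffer, so every bit lookup must add the slice offset (values chosen to match the bitmap asserted in the test diff above).

```python
def bit_is_set(bitmap, i):
    """Read bit i (least-significant-bit numbering) from a validity bitmap."""
    return (bitmap[i // 8] >> (i % 8)) & 1

# Parent array of 4 values where only index 2 is null: validity bits 0b1011.
validity = bytes([0b1011])

# Slicing (e.g. a[1:]) reuses the same buffer; only the offset changes, so
# logical index i of the slice maps to physical bit (offset + i).
offset = 1
assert [bit_is_set(validity, offset + i) for i in range(3)] == [1, 0, 1]
```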





[jira] [Resolved] (ARROW-2280) [Python] pyarrow.Array.buffers should also include the offsets

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2280.
-
Resolution: Fixed

Issue resolved by pull request 1719
[https://github.com/apache/arrow/pull/1719]

> [Python] pyarrow.Array.buffers should also include the offsets
> --
>
> Key: ARROW-2280
> URL: https://issues.apache.org/jira/browse/ARROW-2280
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only return the buffers, but they don't make sense without 
> their offsets; in particular, the validity bitmap will have a non-zero 
> offset in most cases where the array was sliced.





[jira] [Commented] (ARROW-2280) [Python] pyarrow.Array.buffers should also include the offsets

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390073#comment-16390073
 ] 

ASF GitHub Bot commented on ARROW-2280:
---

wesm commented on issue #1719: ARROW-2280: [Python] Return the offset for the 
buffers in pyarrow.Array
URL: https://github.com/apache/arrow/pull/1719#issuecomment-371255749
 
 
   +1. Travis CI failure is unrelated. Appveyor build looking fine: 
https://ci.appveyor.com/project/xhochy/arrow/build/1.0.623




> [Python] pyarrow.Array.buffers should also include the offsets
> --
>
> Key: ARROW-2280
> URL: https://issues.apache.org/jira/browse/ARROW-2280
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only return the buffers, but they don't make sense without 
> their offsets; in particular, the validity bitmap will have a non-zero 
> offset in most cases where the array was sliced.





[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390056#comment-16390056
 ] 

ASF GitHub Bot commented on ARROW-2262:
---

xhochy commented on a change in pull request #1702: ARROW-2262: [Python] 
Support slicing on pyarrow.ChunkedArray
URL: https://github.com/apache/arrow/pull/1702#discussion_r172953568
 
 

 ##
 File path: python/pyarrow/table.pxi
 ##
 @@ -77,6 +77,49 @@ cdef class ChunkedArray:
         self._check_nullptr()
         return self.chunked_array.null_count()
 
+    def __getitem__(self, key):
+        cdef int64_t item, i
+        self._check_nullptr()
+        if isinstance(key, slice):
+            return _normalize_slice(self, key)
+        else:
+            item = int(key)
+            if item >= self.chunked_array.length() or item < 0:
+                raise IndexError("ChunkedArray selection out of bounds")
+            for i in range(self.num_chunks):
 Review comment:
   Yes, they would benefit from this API, but it is much more complicated to 
implement there as `arrow::ChunkedArray` has no type.
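For illustration, the chunk-walking loop in the diff above can be sketched over plain Python sequences (a simplified stand-in, not the Cython implementation):

```python
def chunked_getitem(chunks, index):
    """Return the element at a flat index across a list of chunks.

    Simplified stand-in for the PR's loop; chunks are plain Python
    sequences here, not Arrow arrays.
    """
    if index < 0 or index >= sum(len(c) for c in chunks):
        raise IndexError("ChunkedArray selection out of bounds")
    for chunk in chunks:
        if index < len(chunk):
            return chunk[index]
        index -= len(chunk)  # skip past this chunk

assert chunked_getitem([[1, 2], [3], [4, 5, 6]], 3) == 4
```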




> [Python] Support slicing on pyarrow.ChunkedArray
> 
>
> Key: ARROW-2262
> URL: https://issues.apache.org/jira/browse/ARROW-2262
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390057#comment-16390057
 ] 

ASF GitHub Bot commented on ARROW-2262:
---

xhochy commented on a change in pull request #1702: ARROW-2262: [Python] 
Support slicing on pyarrow.ChunkedArray
URL: https://github.com/apache/arrow/pull/1702#discussion_r172955517
 
 

 ##
 File path: python/pyarrow/table.pxi
 ##
 @@ -77,6 +77,49 @@ cdef class ChunkedArray:
         self._check_nullptr()
         return self.chunked_array.null_count()
 
+    def __getitem__(self, key):
+        cdef int64_t item, i
+        self._check_nullptr()
+        if isinstance(key, slice):
+            return _normalize_slice(self, key)
+        else:
+            item = int(key)
 
 Review comment:
   Changed it




> [Python] Support slicing on pyarrow.ChunkedArray
> 
>
> Key: ARROW-2262
> URL: https://issues.apache.org/jira/browse/ARROW-2262
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>






[jira] [Commented] (ARROW-2280) [Python] pyarrow.Array.buffers should also include the offsets

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390027#comment-16390027
 ] 

ASF GitHub Bot commented on ARROW-2280:
---

xhochy commented on a change in pull request #1719: ARROW-2280: [Python] Return 
the offset for the buffers in pyarrow.Array
URL: https://github.com/apache/arrow/pull/1719#discussion_r172947857
 
 

 ##
 File path: python/pyarrow/array.pxi
 ##
 @@ -483,10 +483,21 @@ cdef class Array:
         with nogil:
             check_status(ValidateArray(deref(self.ap)))
 
+    def offset(self):
 
 Review comment:
   Fixed




> [Python] pyarrow.Array.buffers should also include the offsets
> --
>
> Key: ARROW-2280
> URL: https://issues.apache.org/jira/browse/ARROW-2280
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only return the buffers but they don't make sense without the 
> offsets for them, esp. the validity bitmap will have a non-zero offset in 
> most cases where the array was sliced.





[jira] [Commented] (ARROW-2282) [Python] Create StringArray from buffers

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389952#comment-16389952
 ] 

ASF GitHub Bot commented on ARROW-2282:
---

xhochy commented on a change in pull request #1720: ARROW-2282: [Python] Create 
StringArray from buffers
URL: https://github.com/apache/arrow/pull/1720#discussion_r172944392
 
 

 ##
 File path: python/pyarrow/array.pxi
 ##
 @@ -761,8 +761,39 @@ cdef class UnionArray(Array):
         return pyarrow_wrap_array(out)
 
 cdef class StringArray(Array):
-    pass
 
+    @staticmethod
+    def from_buffers(int length, Buffer value_offsets, Buffer data,
+                     Buffer null_bitmap=None, int null_count=0,
+                     int offset=0):
+        """
+        Construct a StringArray from value_offsets and data buffers.
+        If there are nulls in the data, also a null_bitmap and the matching
+        null_count must be passed.
+
+        Parameters
+        ----------
+        length : int
+        value_offsets : Buffer
+        data : Buffer
+        null_bitmap : Buffer, optional
+        null_count : int, default 0
+        offset : int, default 0
+
+        Returns
+        -------
+        string_array : StringArray
+        """
+        cdef shared_ptr[CBuffer] c_null_bitmap
+        cdef shared_ptr[CArray] out
+
+        if null_bitmap is not None:
+            c_null_bitmap = null_bitmap.buffer
 
 Review comment:
   I used the same defaults as we do in C++; we might want to adjust the 
behaviour there too. 




> [Python] Create StringArray from buffers
> 
>
> Key: ARROW-2282
> URL: https://issues.apache.org/jira/browse/ARROW-2282
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> While we will add more general-purpose functionality in 
> https://issues.apache.org/jira/browse/ARROW-2281, that interface is more 
> complicated than the constructor that explicitly states all arguments:  
> {{StringArray(int64_t length, const std::shared_ptr& value_offsets, 
> …}}
> Thus I will also expose this explicit constructor.
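For context, the buffer layout this constructor consumes can be sketched with the stdlib alone (a toy decoder, not pyarrow API): `value_offsets` holds `length + 1` little-endian int32 positions into `data`, so string `i` is the byte range between offsets `i` and `i + 1`.

```python
import struct

def decode_strings(length, value_offsets, data):
    """Decode UTF-8 strings from Arrow-style binary buffers.

    Toy decoder for illustration only: value_offsets holds length + 1
    little-endian int32 positions into data.
    """
    offsets = struct.unpack('<%di' % (length + 1), value_offsets)
    return [data[offsets[i]:offsets[i + 1]].decode('utf8')
            for i in range(length)]

data = b'foobarbaz'
value_offsets = struct.pack('<4i', 0, 3, 6, 9)
assert decode_strings(3, value_offsets, data) == ['foo', 'bar', 'baz']
```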





[jira] [Commented] (ARROW-2282) [Python] Create StringArray from buffers

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389795#comment-16389795
 ] 

ASF GitHub Bot commented on ARROW-2282:
---

wesm commented on a change in pull request #1720: ARROW-2282: [Python] Create 
StringArray from buffers
URL: https://github.com/apache/arrow/pull/1720#discussion_r172912650
 
 

 ##
 File path: python/pyarrow/array.pxi
 ##
 @@ -761,8 +761,39 @@ cdef class UnionArray(Array):
         return pyarrow_wrap_array(out)
 
 cdef class StringArray(Array):
-    pass
 
+    @staticmethod
+    def from_buffers(int length, Buffer value_offsets, Buffer data,
+                     Buffer null_bitmap=None, int null_count=0,
+                     int offset=0):
+        """
+        Construct a StringArray from value_offsets and data buffers.
+        If there are nulls in the data, also a null_bitmap and the matching
+        null_count must be passed.
+
+        Parameters
+        ----------
+        length : int
+        value_offsets : Buffer
+        data : Buffer
+        null_bitmap : Buffer, optional
+        null_count : int, default 0
+        offset : int, default 0
+
+        Returns
+        -------
+        string_array : StringArray
+        """
+        cdef shared_ptr[CBuffer] c_null_bitmap
+        cdef shared_ptr[CArray] out
+
+        if null_bitmap is not None:
+            c_null_bitmap = null_bitmap.buffer
 
 Review comment:
   Yes




> [Python] Create StringArray from buffers
> 
>
> Key: ARROW-2282
> URL: https://issues.apache.org/jira/browse/ARROW-2282
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> While we will add more general-purpose functionality in 
> https://issues.apache.org/jira/browse/ARROW-2281, that interface is more 
> complicated than the constructor that explicitly states all arguments:  
> {{StringArray(int64_t length, const std::shared_ptr& value_offsets, 
> …}}
> Thus I will also expose this explicit constructor.





[jira] [Commented] (ARROW-2239) [C++] Update build docs for Windows

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389760#comment-16389760
 ] 

ASF GitHub Bot commented on ARROW-2239:
---

MaxRis commented on issue #1722: ARROW-2239: [C++] Update Windows build docs
URL: https://github.com/apache/arrow/pull/1722#issuecomment-371196240
 
 
   @pitrou looks great! Thank you!




> [C++] Update build docs for Windows
> ---
>
> Key: ARROW-2239
> URL: https://issues.apache.org/jira/browse/ARROW-2239
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Documentation
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> We should update the C++ build docs for Windows to recommend use of Ninja and 
> clcache for faster builds.





[jira] [Commented] (ARROW-2282) [Python] Create StringArray from buffers

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389756#comment-16389756
 ] 

ASF GitHub Bot commented on ARROW-2282:
---

pitrou commented on a change in pull request #1720: ARROW-2282: [Python] Create 
StringArray from buffers
URL: https://github.com/apache/arrow/pull/1720#discussion_r172899796
 
 

 ##
 File path: python/pyarrow/array.pxi
 ##
 @@ -761,8 +761,39 @@ cdef class UnionArray(Array):
         return pyarrow_wrap_array(out)
 
 cdef class StringArray(Array):
-    pass
 
+    @staticmethod
+    def from_buffers(int length, Buffer value_offsets, Buffer data,
+                     Buffer null_bitmap=None, int null_count=0,
+                     int offset=0):
+        """
+        Construct a StringArray from value_offsets and data buffers.
+        If there are nulls in the data, also a null_bitmap and the matching
+        null_count must be passed.
+
+        Parameters
+        ----------
+        length : int
+        value_offsets : Buffer
+        data : Buffer
+        null_bitmap : Buffer, optional
+        null_count : int, default 0
+        offset : int, default 0
+
+        Returns
+        -------
+        string_array : StringArray
+        """
+        cdef shared_ptr[CBuffer] c_null_bitmap
+        cdef shared_ptr[CArray] out
+
+        if null_bitmap is not None:
+            c_null_bitmap = null_bitmap.buffer
 
 Review comment:
   Shouldn't null_count default to -1 if not passed explicitly and there's a 
null bitmap?




> [Python] Create StringArray from buffers
> 
>
> Key: ARROW-2282
> URL: https://issues.apache.org/jira/browse/ARROW-2282
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> While we will add more general-purpose functionality in 
> https://issues.apache.org/jira/browse/ARROW-2281, that interface is more 
> complicated than the constructor that explicitly states all arguments:  
> {{StringArray(int64_t length, const std::shared_ptr& value_offsets, 
> …}}
> Thus I will also expose this explicit constructor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-2208) [Python] install issues with jemalloc

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2208.
---

> [Python] install issues with jemalloc
> -
>
> Key: ARROW-2208
> URL: https://issues.apache.org/jira/browse/ARROW-2208
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Jeff Reback
>Assignee: Wes McKinney
>Priority: Blocker
> Fix For: 0.9.0
>
>
> just started seeing these on pandas builds in Travis CI: 
> https://travis-ci.org/pandas-dev/pandas/jobs/345721382
> {code:java}
> from pkg_resources import get_distribution, DistributionNotFound
> try:
> __version__ = get_distribution(__name__).version
> except DistributionNotFound:
># package is not installed
>pass
> 
> 
> >   from pyarrow.lib import cpu_count, set_cpu_count
> E   ImportError: 
> /home/travis/miniconda3/envs/pandas/lib/python2.7/site-packages/pyarrow/../../.././libjemalloc.so.2:
>  cannot allocate memory in static TLS block
> ../../../miniconda3/envs/pandas/lib/python2.7/site-packages/pyarrow/__init__.p
>  {code}





[jira] [Updated] (ARROW-2153) [C++/Python] Decimal conversion not working for exponential notation

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2153:

Fix Version/s: 0.9.0

> [C++/Python] Decimal conversion not working for exponential notation
> 
>
> Key: ARROW-2153
> URL: https://issues.apache.org/jira/browse/ARROW-2153
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antony Mayi
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> {code:java}
> import pyarrow as pa
> import pandas as pd
> import decimal
> pa.Table.from_pandas(pd.DataFrame({'a': [decimal.Decimal('1.1'), 
> decimal.Decimal('2E+1')]}))
> {code}
>  
> {code:java}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "pyarrow/table.pxi", line 875, in pyarrow.lib.Table.from_pandas 
> (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:44927)
>   File 
> "/home/skadlec/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 350, in dataframe_to_arrays
> convert_types)]
>   File 
> "/home/skadlec/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 349, in 
> for c, t in zip(columns_to_convert,
>   File 
> "/home/skadlec/.local/lib/python3.6/site-packages/pyarrow/pandas_compat.py", 
> line 345, in convert_column
> return pa.array(col, from_pandas=True, type=ty)
>   File "pyarrow/array.pxi", line 170, in pyarrow.lib.array 
> (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:29224)
>   File "pyarrow/array.pxi", line 70, in pyarrow.lib._ndarray_to_array 
> (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:28465)
>   File "pyarrow/error.pxi", line 77, in pyarrow.lib.check_status 
> (/arrow/python/build/temp.linux-x86_64-3.6/lib.cxx:8270)
> pyarrow.lib.ArrowInvalid: Expected base ten digit or decimal point but found 
> 'E' instead.
> {code}
> In manual cases we can clearly write {{decimal.Decimal('20')}} instead of 
> {{decimal.Decimal('2E+1')}}, but during arithmetic operations inside an 
> application the exponential notation can be produced beyond our control (it 
> is actually the _normalized_ form of the decimal number). Moreover, for some 
> values the exponential notation is the only form that expresses the 
> significance, so it should be accepted.
> The [documentation|https://docs.python.org/3/library/decimal.html] suggests 
> using the following transformation, but that is only possible when the 
> significance information doesn't need to be kept:
> {code:java}
> def remove_exponent(d):
> return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()
> {code}
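The workaround above is runnable as-is; a short usage sketch showing how ordinary decimal arithmetic produces the exponential form and how `remove_exponent` undoes it:

```python
from decimal import Decimal

def remove_exponent(d):
    # Workaround from the Python decimal docs quoted above: drop the
    # exponential form when the value is integral, otherwise normalize.
    return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize()

# Arithmetic readily produces the exponential form without any user input:
assert str(Decimal('1E+1') * Decimal('2')) == '2E+1'

assert str(remove_exponent(Decimal('2E+1'))) == '20'
assert remove_exponent(Decimal('1.1')) == Decimal('1.1')
```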





[jira] [Updated] (ARROW-2177) [C++] Remove support for specifying negative scale values in DecimalType

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-2177:

Fix Version/s: 0.9.0

> [C++] Remove support for specifying negative scale values in DecimalType
> 
>
> Key: ARROW-2177
> URL: https://issues.apache.org/jira/browse/ARROW-2177
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Affects Versions: 0.8.0
>Reporter: Phillip Cloud
>Assignee: Phillip Cloud
>Priority: Major
> Fix For: 0.9.0
>
>
> Allowing both negative and positive scale makes it ambiguous what the scale 
> of a number should be when it is written in exponential notation, e.g., 
> {{0.01E3}}. Should that have a precision of 4 and a scale of 2, since it's 
> specified with 2 points to the right of the decimal and it evaluates to 10? 
> Or a precision of 1 and a scale of -1?
> Currently it's the latter, but I think it should be the former.
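Python's `decimal` module makes the ambiguity concrete: if scale is taken as the negated exponent of the parsed coefficient (an assumption matching the description above, not Arrow code), `0.01E3` parses to precision 1 and scale -1.

```python
from decimal import Decimal

# Decimal strips leading zeros from the coefficient at construction, so
# 0.01E3 becomes coefficient (1,) with exponent +1, i.e. Decimal('1E+1').
t = Decimal('0.01E3').as_tuple()
precision = len(t.digits)   # number of coefficient digits
scale = -t.exponent         # assumed mapping: scale = -exponent

assert Decimal('0.01E3') == 10
assert (precision, scale) == (1, -1)   # the "latter" interpretation
```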





[jira] [Updated] (ARROW-1952) [JS] 32b dense vector coercion

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1952:

Fix Version/s: (was: 0.9.0)
   JS-0.3.0

> [JS] 32b dense vector coercion
> --
>
> Key: ARROW-1952
> URL: https://issues.apache.org/jira/browse/ARROW-1952
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Priority: Minor
> Fix For: JS-0.3.0
>
>
> JS APIs, for better or worse, are quite 32-bit centric. Currently, JS Arrow 
> does a good job of information-preserving flattening, e.g., a 64-bit int 
> vector into an array of [hi, lo] int32s. Something similar for timestamps. 
> ... However, in getting some Arrow code to load into a legacy system, I'm 
> finding myself writing a _lot_ of lossy flatteners in userland. Doing it 
> there seems brittle and error-prone and incurs friction for adoption; if 
> put in the core lib, it would enable reuse across libs.
> I can imagine at least 2 reasonable interfaces for this:
> (1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
> simple thing.
> (2) 64b Vector -> 32b Vector, and reuse whatever 32b vector -> flat array 
> logic will be available anyway. This helps stay in the symbolic abstraction 
> longer, so may be smarter.
> Thoughts?





[jira] [Closed] (ARROW-2257) [C++] Add high-level option to toggle CXX11 ABI

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-2257.
---

> [C++] Add high-level option to toggle CXX11 ABI
> ---
>
> Key: ARROW-2257
> URL: https://issues.apache.org/jira/browse/ARROW-2257
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.9.0
>
>
> Using gcc-4.8-based toolchain libraries from conda-forge I ran into the 
> following failure when building on Ubuntu 16.04 with clang-5.0
> {code}
> [48/48] Linking CXX executable debug/python-test
> FAILED: debug/python-test 
> : && /usr/bin/ccache /usr/bin/clang++-5.0  -ggdb -O0  -Weverything 
> -Wno-c++98-compat -Wno-c++98-compat-pedantic -Wno-deprecated 
> -Wno-weak-vtables -Wno-padded -Wno-comma -Wno-unused-parameter 
> -Wno-unused-template -Wno-undef -Wno-shadow -Wno-switch-enum 
> -Wno-exit-time-destructors -Wno-global-constructors 
> -Wno-weak-template-vtables -Wno-undefined-reinterpret-cast 
> -Wno-implicit-fallthrough -Wno-unreachable-code-return -Wno-float-equal 
> -Wno-missing-prototypes -Wno-old-style-cast -Wno-covered-switch-default 
> -Wno-cast-align -Wno-vla-extension -Wno-shift-sign-overflow 
> -Wno-used-but-marked-unused -Wno-missing-variable-declarations 
> -Wno-gnu-zero-variadic-macro-arguments -Wconversion -Wno-sign-conversion 
> -Wno-disabled-macro-expansion -Wno-gnu-folding-constant 
> -Wno-reserved-id-macro -Wno-range-loop-analysis -Wno-double-promotion 
> -Wno-undefined-func-template -Wno-zero-as-null-pointer-constant 
> -Wno-unknown-warning-option -Werror -std=c++11 -msse3 -maltivec -Werror 
> -D_GLIBCXX_USE_CXX11_ABI=0 -Qunused-arguments  -fsanitize=address 
> -DADDRESS_SANITIZER -fsanitize-coverage=trace-pc-guard -g  -rdynamic 
> src/arrow/python/CMakeFiles/python-test.dir/python-test.cc.o  -o 
> debug/python-test  
> -Wl,-rpath,/home/wesm/code/arrow/cpp/build/debug:/home/wesm/miniconda/envs/arrow-dev/lib:/home/wesm/cpp-toolchain/lib
>  debug/libarrow_python_test_main.a debug/libarrow_python.a 
> debug/libarrow.so.0.0.0 
> /home/wesm/miniconda/envs/arrow-dev/lib/libpython3.6m.so 
> /home/wesm/cpp-toolchain/lib/libgtest.a -lpthread -ldl 
> orc_ep-install/lib/liborc.a /home/wesm/cpp-toolchain/lib/libprotobuf.a 
> /home/wesm/cpp-toolchain/lib/libzstd.a /home/wesm/cpp-toolchain/lib/libz.a 
> /home/wesm/cpp-toolchain/lib/libsnappy.a 
> /home/wesm/cpp-toolchain/lib/liblz4.a 
> /home/wesm/cpp-toolchain/lib/libbrotlidec-static.a 
> /home/wesm/cpp-toolchain/lib/libbrotlienc-static.a 
> /home/wesm/cpp-toolchain/lib/libbrotlicommon-static.a -lpthread 
> -Wl,-rpath-link,/home/wesm/cpp-toolchain/lib && :
> debug/libarrow.so.0.0.0: undefined reference to 
> `orc::ParseError::ParseError(std::string const&)'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned char*)'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::internal::fixed_address_empty_string[abi:cxx11]'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*,
>  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::Message::GetTypeName[abi:cxx11]() const'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::Message::InitializationErrorString[abi:cxx11]() const'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::MessageLite::SerializeToString(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) const'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::internal::WireFormatLite::WriteString(int, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::MessageFactory::InternalRegisterGeneratedFile(char const*, 
> void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&))'
> debug/libarrow.so.0.0.0: undefined reference to 
> `google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, 
> std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, google::protobuf::io::CodedOutputStream*)'
> debug/libarrow.so.0.0.0: undefined reference 

[jira] [Closed] (ARROW-1960) [Python] Pre-emptively import TensorFlow if it is available

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-1960.
---

> [Python] Pre-emptively import TensorFlow if it is available
> ---
>
> Key: ARROW-1960
> URL: https://issues.apache.org/jira/browse/ARROW-1960
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>
> To work around some ABI incompatibility issues with libstdc++ (TF is using a 
> newer compiler), we should consider importing tensorflow first if it is 
> available before loading Arrow's shared libraries built with the manylinux1 
> toolchain. See discussion in
> https://github.com/apache/arrow/issues/1450
> One question is whether TensorFlow's symbols will clash with Arrow's in some 
> way that might impact its functioning.
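A minimal sketch of the proposed workaround in Python (the helper name `maybe_preload` and its silent-fallback behavior are illustrative assumptions, not Arrow's actual implementation):

```python
import importlib

def maybe_preload(module_name):
    # Try to import a module early so its shared libraries (and their
    # libstdc++ symbols) are loaded first; silently skip if absent.
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# Pre-emptively load TensorFlow (if installed) before importing pyarrow,
# so Arrow's manylinux1-built shared libraries resolve against the
# newer libstdc++ symbols TensorFlow has already pulled in.
maybe_preload("tensorflow")
```

pyarrow itself would then be imported after this call, which is the ordering the workaround depends on.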



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1370) wrong signed to unsigned conversion in js

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1370:

Fix Version/s: JS-0.3.0

> wrong signed to unsigned conversion in js
> -
>
> Key: ARROW-1370
> URL: https://issues.apache.org/jira/browse/ARROW-1370
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Saman Amraii
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.3.0
>
>
> In the JavaScript reader, signed integer vectors are initialized with unsigned 
> int vectors. This is in file types.ts, lines 158, 159, 160 and 167, 168, 169.
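The effect of such a mix-up can be shown outside JavaScript: reinterpreting the bytes of a signed 32-bit integer as unsigned changes every negative value. A hedged Python illustration of the concept (not the Arrow JS code itself):

```python
import struct

# Pack -1 as a signed little-endian int32, then read the same four
# bytes back as unsigned: the value becomes 2**32 - 1.
raw = struct.pack("<i", -1)
reinterpreted = struct.unpack("<I", raw)[0]
print(reinterpreted)  # 4294967295
```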



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ARROW-1952) [JS] 32b dense vector coercion

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-1952:
---

Assignee: Paul Taylor

> [JS] 32b dense vector coercion
> --
>
> Key: ARROW-1952
> URL: https://issues.apache.org/jira/browse/ARROW-1952
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Assignee: Paul Taylor
>Priority: Minor
> Fix For: JS-0.3.0
>
>
> JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does 
> a good job of information-preserving flattening, e.g., a 64i vector into an 
> array of [hi, lo] int32s, and something similar for timestamps. However, 
> in getting some Arrow code to load into a legacy system, I'm finding myself 
> writing a _lot_ of lossy flatteners in userland. Doing it there seems 
> brittle, error-prone, and incurs friction for adoption; putting it in the 
> core lib would enable reuse across libs.
> I can imagine at least 2 reasonable interfaces for this:
> (1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
> simple thing.
> (2) 64b Vector -> 32b Vector, and reuse whatever 32b vector -> flat array 
> logic will be available anyway. This helps stay in the symbolic abstraction 
> longer, so may be smarter.
> Thoughts?
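The information-preserving flattening described above (a 64-bit value into [hi, lo] int32s) can be sketched as follows; `split64`/`join64` are hypothetical helper names used only for illustration:

```python
def split64(value):
    # Split an unsigned 64-bit integer into (hi, lo) 32-bit words.
    value &= (1 << 64) - 1
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF

def join64(hi, lo):
    # Reassemble the original 64-bit value; no information is lost.
    return (hi << 32) | lo

hi, lo = split64(2**40 + 7)
assert join64(hi, lo) == 2**40 + 7
```

A lossy 64b -> 32b coercion, by contrast, would keep only `lo` (or saturate), discarding the high word.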



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1370) [JS] wrong signed to unsigned conversion in js

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1370:

Summary: [JS] wrong signed to unsigned conversion in js  (was: wrong signed 
to unsigned conversion in js)

> [JS] wrong signed to unsigned conversion in js
> --
>
> Key: ARROW-1370
> URL: https://issues.apache.org/jira/browse/ARROW-1370
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Saman Amraii
>Assignee: Paul Taylor
>Priority: Major
> Fix For: JS-0.3.0
>
>
> In the JavaScript reader, signed integer vectors are initialized with unsigned 
> int vectors. This is in file types.ts, lines 158, 159, 160 and 167, 168, 169.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-1660) [Python] pandas field values are messed up across rows

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney closed ARROW-1660.
---

> [Python] pandas field values are messed up across rows
> --
>
> Key: ARROW-1660
> URL: https://issues.apache.org/jira/browse/ARROW-1660
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.7.1
> Environment: 4.4.0-72-generic #93-Ubuntu SMP x86_64, python3
>Reporter: MIkhail Osckin
>Assignee: Wes McKinney
>Priority: Major
>
> I have the following Scala case class to store sparse matrix data, to read it 
> later using Python:
> {code:java}
> case class CooVector(
>     id: Int,
>     row_ids: Seq[Int],
>     rowsIdx: Seq[Int],
>     colIdx: Seq[Int],
>     data: Seq[Double])
> {code}
> I save a dataset of this type to multiple parquet files using Spark and 
> then read it using pyarrow.parquet and convert the result to a pandas 
> DataFrame.
> The problem I have is that some values end up in the wrong rows; for example, 
> row_ids might end up in the wrong CooVector row. I have no idea what the 
> reason is, but it might be related to the fact that the fields are of 
> variable sizes. Everything is correct if I read it using Spark. I also 
> checked the to_pydict method and the result is correct, so it seems the 
> problem is somewhere in the to_pandas method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-1952) [JS] 32b dense vector coercion

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-1952:

Fix Version/s: 0.9.0

> [JS] 32b dense vector coercion
> --
>
> Key: ARROW-1952
> URL: https://issues.apache.org/jira/browse/ARROW-1952
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: JavaScript
>Reporter: Leo Meyerovich
>Priority: Minor
> Fix For: 0.9.0
>
>
> JS APIs, for better or worse, are quite 32b-centric. Currently, JS Arrow does 
> a good job of information-preserving flattening, e.g., a 64i vector into an 
> array of [hi, lo] int32s, and something similar for timestamps. However, 
> in getting some Arrow code to load into a legacy system, I'm finding myself 
> writing a _lot_ of lossy flatteners in userland. Doing it there seems 
> brittle, error-prone, and incurs friction for adoption; putting it in the 
> core lib would enable reuse across libs.
> I can imagine at least 2 reasonable interfaces for this:
> (1) 64b Vector -> 32b flat array (typed or otherwise). This is the naive, 
> simple thing.
> (2) 64b Vector -> 32b Vector, and reuse whatever 32b vector -> flat array 
> logic will be available anyway. This helps stay in the symbolic abstraction 
> longer, so may be smarter.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2239) [C++] Update build docs for Windows

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389750#comment-16389750
 ] 

ASF GitHub Bot commented on ARROW-2239:
---

pitrou commented on issue #1722: ARROW-2239: [C++] Update Windows build docs
URL: https://github.com/apache/arrow/pull/1722#issuecomment-371192587
 
 
   @MaxRis does this look ok to you? I didn't want to switch the debug build 
instructions to Ninja as I haven't tested that it works :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Update build docs for Windows
> ---
>
> Key: ARROW-2239
> URL: https://issues.apache.org/jira/browse/ARROW-2239
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Documentation
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> We should update the C++ build docs for Windows to recommend use of Ninja and 
> clcache for faster builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2239) [C++] Update build docs for Windows

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389751#comment-16389751
 ] 

ASF GitHub Bot commented on ARROW-2239:
---

pitrou commented on issue #1722: ARROW-2239: [C++] Update Windows build docs
URL: https://github.com/apache/arrow/pull/1722#issuecomment-371192587
 
 
   @MaxRis does this look ok to you? I didn't want to switch the debug build 
instructions to Ninja as I haven't tested that it works :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Update build docs for Windows
> ---
>
> Key: ARROW-2239
> URL: https://issues.apache.org/jira/browse/ARROW-2239
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Documentation
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> We should update the C++ build docs for Windows to recommend use of Ninja and 
> clcache for faster builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2239) [C++] Update build docs for Windows

2018-03-07 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2239:
--
Labels: pull-request-available  (was: )

> [C++] Update build docs for Windows
> ---
>
> Key: ARROW-2239
> URL: https://issues.apache.org/jira/browse/ARROW-2239
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Documentation
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> We should update the C++ build docs for Windows to recommend use of Ninja and 
> clcache for faster builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2239) [C++] Update build docs for Windows

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389749#comment-16389749
 ] 

ASF GitHub Bot commented on ARROW-2239:
---

pitrou opened a new pull request #1722: ARROW-2239: [C++] Update Windows build 
docs
URL: https://github.com/apache/arrow/pull/1722
 
 
   Recommend Ninja and clcache.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Update build docs for Windows
> ---
>
> Key: ARROW-2239
> URL: https://issues.apache.org/jira/browse/ARROW-2239
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++, Documentation
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> We should update the C++ build docs for Windows to recommend use of Ninja and 
> clcache for faster builds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389748#comment-16389748
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

wesm commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172896221
 
 

 ##
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##
 @@ -1566,6 +1568,113 @@ Status NumPyConverter::Visit(const StringType& type) {
   return PushArray(result->data());
 }
 
+Status NumPyConverter::Visit(const StructType& type) {
+  std::vector<NumPyConverter> sub_converters;
+  std::vector<OwnedRefNoGIL> sub_arrays;
+
+  {
+PyAcquireGIL gil_lock;
+
+// Create converters for each struct type field
+if (dtype_->fields == NULL || !PyDict_Check(dtype_->fields)) {
+  return Status::TypeError("Expected struct array");
+}
+
+for (auto field : type.children()) {
+  PyObject* tup = PyDict_GetItemString(dtype_->fields, 
field->name().c_str());
+  if (tup == NULL) {
+std::stringstream ss;
+ss << "Missing field '" << field->name() << "' in struct array";
+return Status::TypeError(ss.str());
+  }
+  PyArray_Descr* sub_dtype =
+      reinterpret_cast<PyArray_Descr*>(PyTuple_GET_ITEM(tup, 0));
+  DCHECK(PyArray_DescrCheck(sub_dtype));
+  int offset = static_cast<int>(PyLong_AsLong(PyTuple_GET_ITEM(tup, 1)));
+  RETURN_IF_PYERROR();
+  Py_INCREF(sub_dtype); /* PyArray_GetField() steals ref */
+  PyObject* sub_array = PyArray_GetField(arr_, sub_dtype, offset);
+  RETURN_IF_PYERROR();
+  sub_arrays.emplace_back(sub_array);
+  sub_converters.emplace_back(pool_, sub_array, nullptr /* mask */, 
field->type(),
+  use_pandas_null_sentinels_);
+}
+  }
+
+  std::vector<ArrayVector> groups;
+  int64_t null_count = 0;
+
+  // Compute null bitmap and store it as a Boolean Array to include it
+  // in the rechunking below
+  {
+if (mask_ != nullptr) {
+  RETURN_NOT_OK(InitNullBitmap());
+  null_count = MaskToBitmap(mask_, length_, null_bitmap_data_);
+}
+groups.push_back({std::make_shared<BooleanArray>(length_, null_bitmap_)});
+  }
+
+  // Convert child data
+  for (auto& converter : sub_converters) {
+RETURN_NOT_OK(converter.Convert());
+groups.push_back(converter.result());
+const auto& group = groups.back();
+int64_t n = 0;
+for (const auto& array : group) {
+  n += array->length();
+}
+  }
+  // Ensure the different array groups are chunked consistently
+  groups = ::arrow::internal::RechunkArraysConsistently(groups);
+  for (const auto& group : groups) {
+int64_t n = 0;
+for (const auto& array : group) {
+  n += array->length();
+}
+  }
+
+  // Make struct array chunks by combining groups
+  size_t ngroups = groups.size();
+  size_t nchunks = groups[0].size();
+  for (size_t chunk = 0; chunk < nchunks; chunk++) {
+// First group has the null bitmaps as Boolean Arrays
+const auto& null_data = groups[0][chunk]->data();
+DCHECK_EQ(null_data->type->id(), Type::BOOL);
+DCHECK_EQ(null_data->buffers.size(), 2);
+const auto& null_buffer = null_data->buffers[1];
+// Careful: the rechunked null bitmap may have a non-zero offset
+// to its buffer, and it may not even start on a byte boundary
+int64_t null_offset = null_data->offset;
+std::shared_ptr<Buffer> fixed_null_buffer;
+
+if (!null_buffer) {
 
 Review comment:
   Good question. We haven't really done anything with sliced StructArray yet. 
With the way that `Array::Slice` works, the parent/struct offset should be 
added to whatever offset is in the child arrays. So here the safest thing then 
is probably to copy the bitmap. Might need to think about it some more
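The rechunking step referenced in the snippet above (`RechunkArraysConsistently`) can be sketched in Python on plain lists: given several groups of chunks with equal total length, re-split every group at the union of all chunk boundaries so corresponding chunks line up. This is a hedged illustration of the idea only; Arrow's C++ implementation slices buffers zero-copy rather than flattening.

```python
from itertools import accumulate

def rechunk_consistently(groups):
    # Each group is a list of chunks (plain lists here); all groups share
    # the same total length. Split each group at the union of all chunk
    # boundaries so chunk k has the same length in every group.
    cuts = sorted({c for g in groups for c in accumulate(len(ch) for ch in g)})
    result = []
    for g in groups:
        flat = [x for ch in g for x in ch]   # simplification: copies data
        prev, chunks = 0, []
        for cut in cuts:
            chunks.append(flat[prev:cut])
            prev = cut
        result.append(chunks)
    return result

a = [[1, 2, 3, 4]]        # one chunk of length 4
b = [[5, 6], [7, 8]]      # two chunks of length 2
ra, rb = rechunk_consistently([a, b])
assert [len(c) for c in ra] == [len(c) for c in rb] == [2, 2]
```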


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback 

[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389731#comment-16389731
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172889299
 
 

 ##
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##
 @@ -1566,6 +1568,113 @@ Status NumPyConverter::Visit(const StringType& type) {
   return PushArray(result->data());
 }
 
+Status NumPyConverter::Visit(const StructType& type) {
+  std::vector<NumPyConverter> sub_converters;
+  std::vector<OwnedRefNoGIL> sub_arrays;
+
+  {
+PyAcquireGIL gil_lock;
+
+// Create converters for each struct type field
+if (dtype_->fields == NULL || !PyDict_Check(dtype_->fields)) {
+  return Status::TypeError("Expected struct array");
+}
+
+for (auto field : type.children()) {
+  PyObject* tup = PyDict_GetItemString(dtype_->fields, 
field->name().c_str());
+  if (tup == NULL) {
+std::stringstream ss;
+ss << "Missing field '" << field->name() << "' in struct array";
+return Status::TypeError(ss.str());
+  }
+  PyArray_Descr* sub_dtype =
+      reinterpret_cast<PyArray_Descr*>(PyTuple_GET_ITEM(tup, 0));
+  DCHECK(PyArray_DescrCheck(sub_dtype));
+  int offset = static_cast<int>(PyLong_AsLong(PyTuple_GET_ITEM(tup, 1)));
+  RETURN_IF_PYERROR();
+  Py_INCREF(sub_dtype); /* PyArray_GetField() steals ref */
+  PyObject* sub_array = PyArray_GetField(arr_, sub_dtype, offset);
+  RETURN_IF_PYERROR();
+  sub_arrays.emplace_back(sub_array);
+  sub_converters.emplace_back(pool_, sub_array, nullptr /* mask */, 
field->type(),
+  use_pandas_null_sentinels_);
+}
+  }
+
+  std::vector<ArrayVector> groups;
+  int64_t null_count = 0;
+
+  // Compute null bitmap and store it as a Boolean Array to include it
+  // in the rechunking below
+  {
+if (mask_ != nullptr) {
+  RETURN_NOT_OK(InitNullBitmap());
+  null_count = MaskToBitmap(mask_, length_, null_bitmap_data_);
+}
+groups.push_back({std::make_shared<BooleanArray>(length_, null_bitmap_)});
+  }
+
+  // Convert child data
+  for (auto& converter : sub_converters) {
+RETURN_NOT_OK(converter.Convert());
+groups.push_back(converter.result());
+const auto& group = groups.back();
+int64_t n = 0;
+for (const auto& array : group) {
+  n += array->length();
+}
+  }
+  // Ensure the different array groups are chunked consistently
+  groups = ::arrow::internal::RechunkArraysConsistently(groups);
+  for (const auto& group : groups) {
+int64_t n = 0;
+for (const auto& array : group) {
+  n += array->length();
+}
+  }
+
+  // Make struct array chunks by combining groups
+  size_t ngroups = groups.size();
+  size_t nchunks = groups[0].size();
+  for (size_t chunk = 0; chunk < nchunks; chunk++) {
+// First group has the null bitmaps as Boolean Arrays
+const auto& null_data = groups[0][chunk]->data();
+DCHECK_EQ(null_data->type->id(), Type::BOOL);
+DCHECK_EQ(null_data->buffers.size(), 2);
+const auto& null_buffer = null_data->buffers[1];
+// Careful: the rechunked null bitmap may have a non-zero offset
+// to its buffer, and it may not even start on a byte boundary
+int64_t null_offset = null_data->offset;
+std::shared_ptr<Buffer> fixed_null_buffer;
+
+if (!null_buffer) {
 
 Review comment:
   Given how slicing is implemented, I'm assuming the offset is used when 
looking into the child arrays as well...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", 

[jira] [Commented] (ARROW-2280) [Python] pyarrow.Array.buffers should also include the offsets

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389726#comment-16389726
 ] 

ASF GitHub Bot commented on ARROW-2280:
---

wesm commented on a change in pull request #1719: ARROW-2280: [Python] Return 
the offset for the buffers in pyarrow.Array
URL: https://github.com/apache/arrow/pull/1719#discussion_r172887921
 
 

 ##
 File path: python/pyarrow/array.pxi
 ##
 @@ -483,10 +483,21 @@ cdef class Array:
         with nogil:
             check_status(ValidateArray(deref(self.ap)))
 
+    def offset(self):
 
 Review comment:
   Perhaps make this a property? 
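The suggestion can be sketched in plain Python (a hypothetical stand-in class; Cython properties behave the same way): exposing the value through `@property` lets callers write `arr.offset` rather than `arr.offset()`.

```python
class Array:
    # Hypothetical stand-in for the pyarrow Array wrapper.
    def __init__(self, offset=0):
        self._offset = offset

    @property
    def offset(self):
        # Attribute-style access: no parentheses needed at call sites.
        return self._offset

a = Array(offset=3)
assert a.offset == 3
```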


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] pyarrow.Array.buffers should also include the offsets
> --
>
> Key: ARROW-2280
> URL: https://issues.apache.org/jira/browse/ARROW-2280
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> Currently we only return the buffers, but they are not meaningful without 
> their offsets; in particular, the validity bitmap will have a non-zero offset 
> in most cases where the array was sliced.
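Why the offset matters can be shown with a small hedged illustration in Python, using an int as a stand-in for the shared validity-bitmap buffer: after slicing, the same buffer is reused and validity must be tested at bit `offset + i`. (A real offset need not even be byte-aligned, which is exactly why the buffer alone is insufficient.)

```python
def is_valid(bitmap, offset, i):
    # Test validity bit (offset + i) in an LSB-first bitmap.
    return (bitmap >> (offset + i)) & 1 == 1

# Validity bits for 8 parent elements, least-significant bit first:
bitmap = 0b10110101   # elements 0, 2, 4, 5, 7 valid; 1, 3, 6 null

assert is_valid(bitmap, 0, 0)       # parent element 0 is valid
assert not is_valid(bitmap, 3, 0)   # slice at offset 3: its element 0 is null
assert is_valid(bitmap, 3, 1)       # slice element 1 == parent element 4
```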



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389719#comment-16389719
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172887556
 
 

 ##
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##
 @@ -1566,6 +1568,113 @@ Status NumPyConverter::Visit(const StringType& type) {
   return PushArray(result->data());
 }
 
+Status NumPyConverter::Visit(const StructType& type) {
+  std::vector<NumPyConverter> sub_converters;
+  std::vector<OwnedRefNoGIL> sub_arrays;
+
+  {
+PyAcquireGIL gil_lock;
+
+// Create converters for each struct type field
+if (dtype_->fields == NULL || !PyDict_Check(dtype_->fields)) {
+  return Status::TypeError("Expected struct array");
+}
+
+for (auto field : type.children()) {
+  PyObject* tup = PyDict_GetItemString(dtype_->fields, 
field->name().c_str());
+  if (tup == NULL) {
+std::stringstream ss;
+ss << "Missing field '" << field->name() << "' in struct array";
+return Status::TypeError(ss.str());
+  }
+  PyArray_Descr* sub_dtype =
+      reinterpret_cast<PyArray_Descr*>(PyTuple_GET_ITEM(tup, 0));
+  DCHECK(PyArray_DescrCheck(sub_dtype));
+  int offset = static_cast<int>(PyLong_AsLong(PyTuple_GET_ITEM(tup, 1)));
+  RETURN_IF_PYERROR();
+  Py_INCREF(sub_dtype); /* PyArray_GetField() steals ref */
+  PyObject* sub_array = PyArray_GetField(arr_, sub_dtype, offset);
+  RETURN_IF_PYERROR();
+  sub_arrays.emplace_back(sub_array);
+  sub_converters.emplace_back(pool_, sub_array, nullptr /* mask */, 
field->type(),
+  use_pandas_null_sentinels_);
+}
+  }
+
+  std::vector<ArrayVector> groups;
+  int64_t null_count = 0;
+
+  // Compute null bitmap and store it as a Boolean Array to include it
+  // in the rechunking below
+  {
+if (mask_ != nullptr) {
+  RETURN_NOT_OK(InitNullBitmap());
+  null_count = MaskToBitmap(mask_, length_, null_bitmap_data_);
+}
+groups.push_back({std::make_shared<BooleanArray>(length_, null_bitmap_)});
+  }
+
+  // Convert child data
+  for (auto& converter : sub_converters) {
+RETURN_NOT_OK(converter.Convert());
+groups.push_back(converter.result());
+const auto& group = groups.back();
+int64_t n = 0;
+for (const auto& array : group) {
+  n += array->length();
+}
+  }
+  // Ensure the different array groups are chunked consistently
+  groups = ::arrow::internal::RechunkArraysConsistently(groups);
+  for (const auto& group : groups) {
+int64_t n = 0;
+for (const auto& array : group) {
+  n += array->length();
+}
+  }
+
+  // Make struct array chunks by combining groups
+  size_t ngroups = groups.size();
+  size_t nchunks = groups[0].size();
+  for (size_t chunk = 0; chunk < nchunks; chunk++) {
+// First group has the null bitmaps as Boolean Arrays
+const auto& null_data = groups[0][chunk]->data();
+DCHECK_EQ(null_data->type->id(), Type::BOOL);
+DCHECK_EQ(null_data->buffers.size(), 2);
+const auto& null_buffer = null_data->buffers[1];
+// Careful: the rechunked null bitmap may have a non-zero offset
+// to its buffer, and it may not even start on a byte boundary
+int64_t null_offset = null_data->offset;
+std::shared_ptr<Buffer> fixed_null_buffer;
+
+if (!null_buffer) {
 
 Review comment:
   Hmm... is the offset used only for the null bitmap or for looking into the 
child arrays as well?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in 

[jira] [Resolved] (ARROW-2238) [C++] Detect clcache in cmake configuration

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2238.
-
   Resolution: Fixed
Fix Version/s: 0.9.0

Issue resolved by pull request 1684
[https://github.com/apache/arrow/pull/1684]

> [C++] Detect clcache in cmake configuration
> ---
>
> Key: ARROW-2238
> URL: https://issues.apache.org/jira/browse/ARROW-2238
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> By default Windows builds should use clcache if installed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389716#comment-16389716
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

wesm commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172886127
 
 

 ##
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##
 @@ -1566,6 +1568,113 @@ Status NumPyConverter::Visit(const StringType& type) {
   return PushArray(result->data());
 }
 
+Status NumPyConverter::Visit(const StructType& type) {
+  std::vector<NumPyConverter> sub_converters;
+  std::vector<OwnedRefNoGIL> sub_arrays;
+
+  {
+PyAcquireGIL gil_lock;
+
+// Create converters for each struct type field
+if (dtype_->fields == NULL || !PyDict_Check(dtype_->fields)) {
+  return Status::TypeError("Expected struct array");
+}
+
+for (auto field : type.children()) {
+  PyObject* tup = PyDict_GetItemString(dtype_->fields, 
field->name().c_str());
+  if (tup == NULL) {
+std::stringstream ss;
+ss << "Missing field '" << field->name() << "' in struct array";
+return Status::TypeError(ss.str());
+  }
+  PyArray_Descr* sub_dtype =
+      reinterpret_cast<PyArray_Descr*>(PyTuple_GET_ITEM(tup, 0));
+  DCHECK(PyArray_DescrCheck(sub_dtype));
+  int offset = static_cast<int>(PyLong_AsLong(PyTuple_GET_ITEM(tup, 1)));
+  RETURN_IF_PYERROR();
+  Py_INCREF(sub_dtype); /* PyArray_GetField() steals ref */
+  PyObject* sub_array = PyArray_GetField(arr_, sub_dtype, offset);
+  RETURN_IF_PYERROR();
+  sub_arrays.emplace_back(sub_array);
+  sub_converters.emplace_back(pool_, sub_array, nullptr /* mask */, 
field->type(),
+  use_pandas_null_sentinels_);
+}
+  }
+
+  std::vector<ArrayVector> groups;
+  int64_t null_count = 0;
+
+  // Compute null bitmap and store it as a Boolean Array to include it
+  // in the rechunking below
+  {
+if (mask_ != nullptr) {
+  RETURN_NOT_OK(InitNullBitmap());
+  null_count = MaskToBitmap(mask_, length_, null_bitmap_data_);
+}
+groups.push_back({std::make_shared(length_, null_bitmap_)});
+  }
+
+  // Convert child data
+  for (auto& converter : sub_converters) {
+RETURN_NOT_OK(converter.Convert());
+groups.push_back(converter.result());
+const auto& group = groups.back();
+int64_t n = 0;
+for (const auto& array : group) {
+  n += array->length();
+}
+  }
+  // Ensure the different array groups are chunked consistently
+  groups = ::arrow::internal::RechunkArraysConsistently(groups);
+  for (const auto& group : groups) {
+int64_t n = 0;
+for (const auto& array : group) {
+  n += array->length();
+}
+  }
+
+  // Make struct array chunks by combining groups
+  size_t ngroups = groups.size();
+  size_t nchunks = groups[0].size();
+  for (size_t chunk = 0; chunk < nchunks; chunk++) {
+// First group has the null bitmaps as Boolean Arrays
+const auto& null_data = groups[0][chunk]->data();
+DCHECK_EQ(null_data->type->id(), Type::BOOL);
+DCHECK_EQ(null_data->buffers.size(), 2);
+const auto& null_buffer = null_data->buffers[1];
+// Careful: the rechunked null bitmap may have a non-zero offset
+// to its buffer, and it may not even start on a byte boundary
+int64_t null_offset = null_data->offset;
+std::shared_ptr fixed_null_buffer;
+
+if (!null_buffer) {
 
 Review comment:
   I'm wondering if we can use the struct's offset parameter here and simply 
share the buffer between each array without copying


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File 

[jira] [Commented] (ARROW-1894) [Python] Treat CPython memoryview or buffer objects equivalently to pyarrow.Buffer in pyarrow.serialize

2018-03-07 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389715#comment-16389715
 ] 

Antoine Pitrou commented on ARROW-1894:
---

Well, memoryviews don't support pickling, because it's generally not clear 
what the semantics should be :)  (though IIRC cloudpickle tries to support them 
anyway)
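Antoine's point is easy to verify in a plain CPython session; a minimal sketch, with no Arrow involved:

```python
import pickle

# memoryview does not implement the pickle protocol, so serializing one
# fails outright -- there is no obviously correct semantics (copy the
# bytes? keep a reference to the exporting object?).
mv = memoryview(b"some buffer contents")
try:
    pickle.dumps(mv)
except TypeError as exc:
    print("pickle refused:", exc)

# Converting to bytes first is unambiguous, at the cost of a copy.
roundtripped = pickle.loads(pickle.dumps(mv.tobytes()))
assert roundtripped == b"some buffer contents"
```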

> [Python] Treat CPython memoryview or buffer objects equivalently to 
> pyarrow.Buffer in pyarrow.serialize
> ---
>
> Key: ARROW-1894
> URL: https://issues.apache.org/jira/browse/ARROW-1894
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.10.0
>
>
> These should be treated as Buffer-like on serialize. We should consider how 
> to "box" the buffers as the appropriate kind of object (Buffer, memoryview, 
> etc.) when being deserialized



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2283) [C++] Support Arrow C++ installed in /usr detection by pkg-config

2018-03-07 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-2283.
-
Resolution: Fixed

Issue resolved by pull request 1721
[https://github.com/apache/arrow/pull/1721]

> [C++] Support Arrow C++ installed in /usr detection by pkg-config
> -
>
> Key: ARROW-2283
> URL: https://issues.apache.org/jira/browse/ARROW-2283
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2283) [C++] Support Arrow C++ installed in /usr detection by pkg-config

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389709#comment-16389709
 ] 

ASF GitHub Bot commented on ARROW-2283:
---

wesm closed pull request #1721: ARROW-2283: [C++] Support Arrow C++ installed 
in /usr detection by pkg-config
URL: https://github.com/apache/arrow/pull/1721
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/cpp/cmake_modules/FindArrow.cmake 
b/cpp/cmake_modules/FindArrow.cmake
index 70defd6525..0a1789a8f6 100644
--- a/cpp/cmake_modules/FindArrow.cmake
+++ b/cpp/cmake_modules/FindArrow.cmake
@@ -25,6 +25,7 @@
 #  ARROW_FOUND, whether arrow has been found
 
 include(FindPkgConfig)
+include(GNUInstallDirs)
 
 if ("$ENV{ARROW_HOME}" STREQUAL "")
   pkg_check_modules(ARROW arrow)
@@ -33,6 +34,16 @@ if ("$ENV{ARROW_HOME}" STREQUAL "")
     message(STATUS "Arrow ABI version: ${ARROW_ABI_VERSION}")
     pkg_get_variable(ARROW_SO_VERSION arrow so_version)
     message(STATUS "Arrow SO version: ${ARROW_SO_VERSION}")
+    if ("${ARROW_INCLUDE_DIRS}" STREQUAL "")
+      set(ARROW_INCLUDE_DIRS "/usr/${CMAKE_INSTALL_INCLUDEDIR}")
+    endif()
+    if ("${ARROW_LIBRARY_DIRS}" STREQUAL "")
+      set(ARROW_LIBRARY_DIRS "/usr/${CMAKE_INSTALL_LIBDIR}")
+      if (EXISTS "/etc/debian_version" AND CMAKE_LIBRARY_ARCHITECTURE)
+        set(ARROW_LIBRARY_DIRS
+          "${ARROW_LIBRARY_DIRS}/${CMAKE_LIBRARY_ARCHITECTURE}")
+      endif()
+    endif()
     set(ARROW_INCLUDE_DIR ${ARROW_INCLUDE_DIRS})
     set(ARROW_LIBS ${ARROW_LIBRARY_DIRS})
     set(ARROW_SEARCH_LIB_PATH ${ARROW_LIBRARY_DIRS})


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Support Arrow C++ installed in /usr detection by pkg-config
> -
>
> Key: ARROW-2283
> URL: https://issues.apache.org/jira/browse/ARROW-2283
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Affects Versions: 0.8.0
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-1894) [Python] Treat CPython memoryview or buffer objects equivalently to pyarrow.Buffer in pyarrow.serialize

2018-03-07 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389703#comment-16389703
 ] 

Wes McKinney commented on ARROW-1894:
-

The purpose is to avoid copying during deserialization. I think that a 
{{memoryview}} object in a collection would be pickled now instead of sent as a 
sidecar (like Buffer, ndarray, etc.)

> [Python] Treat CPython memoryview or buffer objects equivalently to 
> pyarrow.Buffer in pyarrow.serialize
> ---
>
> Key: ARROW-1894
> URL: https://issues.apache.org/jira/browse/ARROW-1894
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.10.0
>
>
> These should be treated as Buffer-like on serialize. We should consider how 
> to "box" the buffers as the appropriate kind of object (Buffer, memoryview, 
> etc.) when being deserialized



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389697#comment-16389697
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on issue #1635: ARROW-2142: [Python] Allow conversion from 
Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#issuecomment-371177216
 
 
   AppVeyor build at https://ci.appveyor.com/project/pitrou/arrow/build/1.0.170


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: 
> /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: 
> converter.Convert()
> NumPyConverter doesn't implement struct<x: float> conversion.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389696#comment-16389696
 ] 

ASF GitHub Bot commented on ARROW-2262:
---

pitrou commented on issue #1702: ARROW-2262: [Python] Support slicing on 
pyarrow.ChunkedArray
URL: https://github.com/apache/arrow/pull/1702#issuecomment-371177042
 
 
   @xhochy actually, it's probably because `arrow::ChunkedArray::chunk()` takes 
a C int but you are trying to pass an int64_t.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Support slicing on pyarrow.ChunkedArray
> 
>
> Key: ARROW-2262
> URL: https://issues.apache.org/jira/browse/ARROW-2262
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2238) [C++] Detect clcache in cmake configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389641#comment-16389641
 ] 

ASF GitHub Bot commented on ARROW-2238:
---

pitrou commented on issue #1684: ARROW-2238: [C++] Detect and use clcache in 
cmake configuration
URL: https://github.com/apache/arrow/pull/1684#issuecomment-371159300
 
 
   @xhochy, yes, it's ready for merging.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Detect clcache in cmake configuration
> ---
>
> Key: ARROW-2238
> URL: https://issues.apache.org/jira/browse/ARROW-2238
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
>
> By default Windows builds should use clcache if installed.
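The detection described above boils down to a PATH lookup. A sketch of the idea in Python for illustration (the actual change implements this in CMake; `pick_compiler_launcher` is a hypothetical helper, not part of the PR):

```python
import shutil

def pick_compiler_launcher():
    """Return the clcache executable path if installed, else None.

    Mirrors the behavior described above: wrap the MSVC compiler with
    clcache only when it is actually found on PATH, and otherwise fall
    back to uncached compilation.
    """
    return shutil.which("clcache")
```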



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389626#comment-16389626
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172858790
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
@@ -1463,6 +1463,124 @@ def test_structarray(self):
         series = pd.Series(arr.to_pandas())
         tm.assert_series_equal(series, expected)
 
+    def test_from_numpy(self):
+        dt = np.dtype([('x', np.int32),
+                       (('y_title', 'y'), np.bool_)])
+        ty = pa.struct([pa.field('x', pa.int32()),
+                        pa.field('y', pa.bool_())])
+
+        data = np.array([], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == []
+
+        data = np.array([(42, True), (43, False)], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == [{'x': 42, 'y': True},
+                                   {'x': 43, 'y': False}]
+
+        # With mask
+        arr = pa.array(data, mask=np.bool_([False, True]), type=ty)
+        assert arr.to_pylist() == [{'x': 42, 'y': True}, None]
+
+        # Trivial struct type
+        dt = np.dtype([])
+        ty = pa.struct([])
+
+        data = np.array([], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == []
+
+        data = np.array([(), ()], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == [{}, {}]
+
+    def test_from_numpy_nested(self):
+        dt = np.dtype([('x', np.dtype([('xx', np.int8),
+                                       ('yy', np.bool_)])),
+                       ('y', np.int16)])
+        ty = pa.struct([pa.field('x', pa.struct([pa.field('xx', pa.int8()),
+                                                 pa.field('yy', pa.bool_())])),
+                        pa.field('y', pa.int16())])
+
+        data = np.array([], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == []
+
+        data = np.array([((1, True), 2), ((3, False), 4)], dtype=dt)
+        arr = pa.array(data, type=ty)
+        assert arr.to_pylist() == [{'x': {'xx': 1, 'yy': True}, 'y': 2},
+                                   {'x': {'xx': 3, 'yy': False}, 'y': 4}]
+
+    @pytest.mark.large_memory
+    def test_from_numpy_large(self):
+        # Exercise rechunking + nulls
+        target_size = 3 * 1024**3  # 3GB
+        dt = np.dtype([('x', np.float64), ('y', 'object')])
+        bs = 65536 - dt.itemsize
+        block = b'.' * bs
+        n = target_size // (bs + dt.itemsize)
+        data = np.zeros(n, dtype=dt)
+        data['x'] = np.random.random_sample(n)
+        data['y'] = block
+        # Add implicit nulls
+        data['x'][data['x'] < 0.2] = np.nan
+
+        ty = pa.struct([pa.field('x', pa.float64()),
+                        pa.field('y', pa.binary(bs))])
+        arr = pa.array(data, type=ty, from_pandas=True)
+        assert arr.num_chunks == 2
+
+        def iter_chunked_array(arr):
+            for chunk in arr.iterchunks():
+                for item in chunk:
+                    yield item
+
+        def check(arr, data, mask=None):
 Review comment:
   Not sure whether there's a more compact form of writing this function...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: 
> 

[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389625#comment-16389625
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172858643
 
 

 ##
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##
@@ -1566,6 +1568,113 @@ Status NumPyConverter::Visit(const StringType& type) {
   return PushArray(result->data());
 }
 
+Status NumPyConverter::Visit(const StructType& type) {
+  std::vector<NumPyConverter> sub_converters;
+  std::vector<OwnedRefNoGIL> sub_arrays;
+
+  {
+    PyAcquireGIL gil_lock;
+
+    // Create converters for each struct type field
+    if (dtype_->fields == NULL || !PyDict_Check(dtype_->fields)) {
+      return Status::TypeError("Expected struct array");
+    }
+
+    for (auto field : type.children()) {
+      PyObject* tup = PyDict_GetItemString(dtype_->fields, field->name().c_str());
+      if (tup == NULL) {
+        std::stringstream ss;
+        ss << "Missing field '" << field->name() << "' in struct array";
+        return Status::TypeError(ss.str());
+      }
+      PyArray_Descr* sub_dtype =
+          reinterpret_cast<PyArray_Descr*>(PyTuple_GET_ITEM(tup, 0));
+      DCHECK(PyArray_DescrCheck(sub_dtype));
+      int offset = static_cast<int>(PyLong_AsLong(PyTuple_GET_ITEM(tup, 1)));
+      RETURN_IF_PYERROR();
+      Py_INCREF(sub_dtype); /* PyArray_GetField() steals ref */
+      PyObject* sub_array = PyArray_GetField(arr_, sub_dtype, offset);
+      RETURN_IF_PYERROR();
+      sub_arrays.emplace_back(sub_array);
+      sub_converters.emplace_back(pool_, sub_array, nullptr /* mask */, field->type(),
+                                  use_pandas_null_sentinels_);
+    }
+  }
+
+  std::vector<ArrayVector> groups;
+  int64_t null_count = 0;
+
+  // Compute null bitmap and store it as a Boolean Array to include it
+  // in the rechunking below
+  {
+    if (mask_ != nullptr) {
+      RETURN_NOT_OK(InitNullBitmap());
+      null_count = MaskToBitmap(mask_, length_, null_bitmap_data_);
+    }
+    groups.push_back({std::make_shared<BooleanArray>(length_, null_bitmap_)});
+  }
+
+  // Convert child data
+  for (auto& converter : sub_converters) {
+    RETURN_NOT_OK(converter.Convert());
+    groups.push_back(converter.result());
+    const auto& group = groups.back();
+    int64_t n = 0;
+    for (const auto& array : group) {
+      n += array->length();
+    }
+  }
+  // Ensure the different array groups are chunked consistently
+  groups = ::arrow::internal::RechunkArraysConsistently(groups);
+  for (const auto& group : groups) {
+    int64_t n = 0;
+    for (const auto& array : group) {
+      n += array->length();
+    }
+  }
+
+  // Make struct array chunks by combining groups
+  size_t ngroups = groups.size();
+  size_t nchunks = groups[0].size();
+  for (size_t chunk = 0; chunk < nchunks; chunk++) {
+    // First group has the null bitmaps as Boolean Arrays
+    const auto& null_data = groups[0][chunk]->data();
+    DCHECK_EQ(null_data->type->id(), Type::BOOL);
+    DCHECK_EQ(null_data->buffers.size(), 2);
+    const auto& null_buffer = null_data->buffers[1];
+    // Careful: the rechunked null bitmap may have a non-zero offset
+    // to its buffer, and it may not even start on a byte boundary
+    int64_t null_offset = null_data->offset;
+    std::shared_ptr<Buffer> fixed_null_buffer;
+
+    if (!null_buffer) {
 
 Review comment:
   Is there a more idiomatic way to write this fixup step? Is this a primitive 
we want to expose somewhere?
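The fixup Antoine is asking about amounts to copying a validity bitmap whose logical start is not byte-aligned into a fresh zero-offset buffer. A pure-Python sketch of that primitive (Arrow's C++ internals would do this with word-sized operations; bit order is LSB-first as in the Arrow format):

```python
def realign_bitmap(buf: bytes, bit_offset: int, length: int) -> bytes:
    """Copy `length` bits starting at `bit_offset` into a fresh,
    zero-offset, LSB-first bitmap."""
    out = bytearray((length + 7) // 8)
    for i in range(length):
        src = bit_offset + i
        # Test bit `src` of the source buffer (LSB-first within a byte)
        if (buf[src // 8] >> (src % 8)) & 1:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)
```

For example, copying the upper four bits of `0xf0` starting at bit offset 4 yields `0x0f`: the bits keep their values but now start on a byte boundary.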


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 

[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389623#comment-16389623
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on issue #1635: ARROW-2142: [Python] Allow conversion from 
Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#issuecomment-371155236
 
 
   Ok, so I fixed the null bitmap offset issue and wrote a large memory test 
exercising it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: 
> /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: 
> converter.Convert()
> NumPyConverter doesn't implement struct<x: float> conversion.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389594#comment-16389594
 ] 

ASF GitHub Bot commented on ARROW-2262:
---

pitrou commented on issue #1702: ARROW-2262: [Python] Support slicing on 
pyarrow.ChunkedArray
URL: https://github.com/apache/arrow/pull/1702#issuecomment-371149892
 
 
   @xhochy what does lib.cxx say around the lines mentioned above? That's 
assuming your local cython instance produces exactly the same output as 
AppVeyor does...


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Support slicing on pyarrow.ChunkedArray
> 
>
> Key: ARROW-2262
> URL: https://issues.apache.org/jira/browse/ARROW-2262
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389547#comment-16389547
 ] 

ASF GitHub Bot commented on ARROW-2262:
---

pitrou commented on a change in pull request #1702: ARROW-2262: [Python] 
Support slicing on pyarrow.ChunkedArray
URL: https://github.com/apache/arrow/pull/1702#discussion_r172840532
 
 

 ##
 File path: python/pyarrow/table.pxi
 ##
@@ -77,6 +77,49 @@ cdef class ChunkedArray:
         self._check_nullptr()
         return self.chunked_array.null_count()
 
+    def __getitem__(self, key):
+        cdef int64_t item, i
+        self._check_nullptr()
+        if isinstance(key, slice):
+            return _normalize_slice(self, key)
+        else:
+            item = int(key)
+            if item >= self.chunked_array.length() or item < 0:
+                raise IndexError("ChunkedArray selection out of bounds")
+            for i in range(self.num_chunks):
 
 Review comment:
   I'm curious, can't this be implemented on the C++ side instead? I suppose 
C++ users may benefit from such an API as well.
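The lookup performed by that loop, resolving a flat element index into a (chunk, offset) pair, can be sketched independently of Arrow. A minimal version assuming only a list of per-chunk lengths (`resolve_index` is an illustrative name, not an Arrow API):

```python
def resolve_index(chunk_lengths, i):
    """Map a flat element index to (chunk_number, index_within_chunk).

    Mirrors the linear scan in the __getitem__ under review; a C++
    version could live on arrow::ChunkedArray, as the review suggests.
    """
    if i < 0 or i >= sum(chunk_lengths):
        raise IndexError("ChunkedArray selection out of bounds")
    for chunk, n in enumerate(chunk_lengths):
        if i < n:
            return chunk, i
        i -= n
```

For chunks of lengths [3, 2, 4], flat index 4 resolves to the second element of the second chunk, i.e. (1, 1).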


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Support slicing on pyarrow.ChunkedArray
> 
>
> Key: ARROW-2262
> URL: https://issues.apache.org/jira/browse/ARROW-2262
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2238) [C++] Detect clcache in cmake configuration

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389548#comment-16389548
 ] 

ASF GitHub Bot commented on ARROW-2238:
---

xhochy commented on issue #1684: ARROW-2238: [C++] Detect and use clcache in 
cmake configuration
URL: https://github.com/apache/arrow/pull/1684#issuecomment-371138092
 
 
   @MaxRis @pitrou This also looks fine from my side but as you both are more 
qualified on that matter: Is this ready to be merged?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Detect clcache in cmake configuration
> ---
>
> Key: ARROW-2238
> URL: https://issues.apache.org/jira/browse/ARROW-2238
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
>
> By default Windows builds should use clcache if installed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389545#comment-16389545
 ] 

ASF GitHub Bot commented on ARROW-2262:
---

pitrou commented on a change in pull request #1702: ARROW-2262: [Python] 
Support slicing on pyarrow.ChunkedArray
URL: https://github.com/apache/arrow/pull/1702#discussion_r172840289
 
 

 ##
 File path: python/pyarrow/table.pxi
 ##
@@ -77,6 +77,49 @@ cdef class ChunkedArray:
         self._check_nullptr()
         return self.chunked_array.null_count()
 
+    def __getitem__(self, key):
+        cdef int64_t item, i
+        self._check_nullptr()
+        if isinstance(key, slice):
+            return _normalize_slice(self, key)
+        else:
+            item = int(key)
 
 Review comment:
   You don't want to allow non-integer types such as float here.
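One standard way to enforce the integer-only restriction the review asks for is `operator.index`, which calls `__index__` and therefore accepts ints (and NumPy integers) while rejecting floats. A sketch (`as_index` is a hypothetical helper, not part of the PR):

```python
import operator

def as_index(key):
    # operator.index() calls __index__, which float does not implement,
    # so 2.0 is rejected while 2 (and e.g. np.int64(2)) passes.
    try:
        return operator.index(key)
    except TypeError:
        raise TypeError(
            "key must be an integer, got %s" % type(key).__name__)
```

This is stricter than `int(key)`, which would silently truncate `2.7` to `2`.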


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Support slicing on pyarrow.ChunkedArray
> 
>
> Key: ARROW-2262
> URL: https://issues.apache.org/jira/browse/ARROW-2262
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389543#comment-16389543
 ] 

ASF GitHub Bot commented on ARROW-2262:
---

xhochy commented on issue #1702: ARROW-2262: [Python] Support slicing on 
pyarrow.ChunkedArray
URL: https://github.com/apache/arrow/pull/1702#issuecomment-371137584
 
 
   Build fails with 
   
   ```
   C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.cxx(45584): 
warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of 
data [C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.vcxproj]
   
   C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.cxx(45669): 
warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of 
data [C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.vcxproj]
   ```
   
   A review that points me to the problematic code would be appreciated.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Support slicing on pyarrow.ChunkedArray
> 
>
> Key: ARROW-2262
> URL: https://issues.apache.org/jira/browse/ARROW-2262
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2262) [Python] Support slicing on pyarrow.ChunkedArray

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389544#comment-16389544
 ] 

ASF GitHub Bot commented on ARROW-2262:
---

xhochy commented on issue #1702: ARROW-2262: [Python] Support slicing on 
pyarrow.ChunkedArray
URL: https://github.com/apache/arrow/pull/1702#issuecomment-371137584
 
 
   Build fails with 
   
   ```
   C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.cxx(45584): 
warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of 
data [C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.vcxproj]
   C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.cxx(45669): 
warning C4244: 'argument': conversion from 'int64_t' to 'int', possible loss of 
data [C:\projects\arrow\python\build\temp.win-amd64-3.6\Release\lib.vcxproj]
   ```
   
   A review that points me to the problematic code would be appreciated.
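   For context, warning C4244 means a 64-bit length is being passed where a 32-bit `int` is expected, so values above `INT_MAX` would be silently truncated. A hedged Python sketch of the bounds check that an explicit narrowing cast should perform (names are illustrative, not from the Arrow codebase):

   ```python
   INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

   def checked_cast_int32(value):
       # Refuse to narrow silently, unlike the implicit int64 -> int
       # conversion that MSVC flags with C4244.
       if not INT32_MIN <= value <= INT32_MAX:
           raise OverflowError("value does not fit in int32: %d" % value)
       return value
   ```

   The usual C++ fix is an explicit `static_cast<int>` after such a range check, or widening the receiving variable to `int64_t`.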


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Support slicing on pyarrow.ChunkedArray
> 
>
> Key: ARROW-2262
> URL: https://issues.apache.org/jira/browse/ARROW-2262
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2287) [Python] chunked array not iterable, not indexable

2018-03-07 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389542#comment-16389542
 ] 

Uwe L. Korn commented on ARROW-2287:


This is partly addressed in https://github.com/apache/arrow/pull/1702

> [Python] chunked array not iterable, not indexable
> --
>
> Key: ARROW-2287
> URL: https://issues.apache.org/jira/browse/ARROW-2287
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Priority: Major
>
> It would be useful to access individual elements of a chunked array either 
> through iteration or indexing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2287) [Python] chunked array not iterable, not indexable

2018-03-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2287:
-

 Summary: [Python] chunked array not iterable, not indexable
 Key: ARROW-2287
 URL: https://issues.apache.org/jira/browse/ARROW-2287
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


It would be useful to access individual elements of a chunked array either 
through iteration or indexing.
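Iteration and indexing over a chunked array amount to mapping a flat index onto a (chunk, offset) pair. A minimal pure-Python sketch of that mapping (a toy class, not pyarrow's API):

```python
class ChunkedSequence:
    """Toy stand-in for a chunked array: one sequence split into chunks."""

    def __init__(self, chunks):
        self.chunks = [list(c) for c in chunks]

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def __iter__(self):
        # Yield elements chunk by chunk, in order.
        for chunk in self.chunks:
            yield from chunk

    def __getitem__(self, i):
        # Map a flat index onto (chunk, offset).
        if i < 0:
            i += len(self)
        for chunk in self.chunks:
            if i < len(chunk):
                return chunk[i]
            i -= len(chunk)
        raise IndexError("index out of range")
```

With `seq = ChunkedSequence([[1, 2], [3, 4, 5]])`, both `list(seq)` and `seq[3]` work without the caller knowing the chunk layout.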



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2286) [Python] Allow subscripting pyarrow.lib.StructValue

2018-03-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2286:
-

 Summary: [Python] Allow subscripting pyarrow.lib.StructValue
 Key: ARROW-2286
 URL: https://issues.apache.org/jira/browse/ARROW-2286
 Project: Apache Arrow
  Issue Type: Wish
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> obj
{'x': 42, 'y': True}
>>> type(obj)
pyarrow.lib.StructValue
>>> obj['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    obj['x']
TypeError: 'pyarrow.lib.StructValue' object is not subscriptable
{code}
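The wished-for behaviour is plain mapping-style delegation. A toy pure-Python sketch of it (not pyarrow's actual class):

```python
class StructValueView:
    """Illustrative wrapper exposing struct fields via subscripting."""

    def __init__(self, fields):
        self._fields = dict(fields)

    def __getitem__(self, key):
        # Delegate to the underlying field mapping, as obj['x'] would.
        return self._fields[key]

    def __repr__(self):
        return repr(self._fields)

obj = StructValueView({'x': 42, 'y': True})
```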



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2285) [Python] Can't convert Numpy string arrays

2018-03-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2285:
-

 Summary: [Python] Can't convert Numpy string arrays
 Key: ARROW-2285
 URL: https://issues.apache.org/jira/browse/ARROW-2285
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.8.0
Reporter: Antoine Pitrou


{code:python}
>>> arr = np.array([b'foo', b'bar'], dtype='S3')
>>> pa.array(arr, type=pa.binary(3))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pa.array(arr, type=pa.binary(3))
  File "array.pxi", line 177, in pyarrow.lib.array
  File "array.pxi", line 77, in pyarrow.lib._ndarray_to_array
  File "error.pxi", line 85, in pyarrow.lib.check_status
ArrowNotImplementedError: 
/home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1661 code: 
converter.Convert()
NumPyConverter doesn't implement  conversion. 
{code}
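An `S3` array stores each item as exactly 3 contiguous bytes, which is the layout Arrow's fixed-size binary type expects. A stdlib-only sketch of how such a buffer splits into fixed-width values (the helper name is illustrative):

```python
def fixed_size_binary_items(buf, width):
    # Split a contiguous buffer of fixed-width values into items.
    if len(buf) % width:
        raise ValueError("buffer length is not a multiple of item width")
    return [bytes(buf[i:i + width]) for i in range(0, len(buf), width)]

# b"foobar" is the raw buffer behind the two 'S3' items in the report.
items = fixed_size_binary_items(b"foobar", 3)
```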



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389431#comment-16389431
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172814583
 
 

 ##
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##
 @@ -1590,6 +1592,85 @@ Status NumPyConverter::Visit(const StringType& type) {
   return PushArray(result->data());
 }
 
+Status NumPyConverter::Visit(const StructType& type) {
+  std::vector<NumPyConverter> sub_converters;
+  std::vector<OwnedRefNoGIL> sub_arrays;
+
+  {
+PyAcquireGIL gil_lock;
+
+// Create converters for each struct type field
+if (dtype_->fields == NULL || !PyDict_Check(dtype_->fields)) {
+  return Status::TypeError("Expected struct array");
+}
+
+for (auto field : type.children()) {
+  PyObject* tup = PyDict_GetItemString(dtype_->fields, 
field->name().c_str());
+  if (tup == NULL) {
+std::stringstream ss;
+ss << "Missing field '" << field->name() << "' in struct array";
+return Status::TypeError(ss.str());
+  }
+  PyArray_Descr* sub_dtype =
+      reinterpret_cast<PyArray_Descr*>(PyTuple_GET_ITEM(tup, 0));
+  DCHECK(PyArray_DescrCheck(sub_dtype));
+  int offset = static_cast<int>(PyLong_AsLong(PyTuple_GET_ITEM(tup, 1)));
+  RETURN_IF_PYERROR();
+  Py_INCREF(sub_dtype); /* PyArray_GetField() steals ref */
+  PyObject* sub_array = PyArray_GetField(arr_, sub_dtype, offset);
+  RETURN_IF_PYERROR();
+  sub_arrays.emplace_back(sub_array);
+  sub_converters.emplace_back(pool_, sub_array, nullptr /* mask */, 
field->type(),
+  use_pandas_null_sentinels_);
+}
+  }
+
+  std::vector<ArrayVector> groups;
+
+  // Compute null bitmap and store it as a Null Array to include it
+  // in the rechunking below
+  {
+int64_t null_count = 0;
+if (mask_ != nullptr) {
+  RETURN_NOT_OK(InitNullBitmap());
+  null_count = MaskToBitmap(mask_, length_, null_bitmap_data_);
+}
+auto null_data = ArrayData::Make(std::make_shared<NullType>(), length_,
+ {null_bitmap_}, null_count, 0);
+DCHECK_EQ(null_data->buffers.size(), 1);
+groups.push_back({std::make_shared<NullArray>(null_data)});
+  }
+
+  // Convert child data
+  for (auto& converter : sub_converters) {
+RETURN_NOT_OK(converter.Convert());
+groups.push_back(converter.result());
+  }
+  // Ensure the different array groups are chunked consistently
+  groups = ::arrow::internal::RechunkArraysConsistently(groups);
+
+  // Make struct array chunks by combining groups
+  size_t ngroups = groups.size();
+  size_t chunk, nchunks = groups[0].size();
+  for (chunk = 0; chunk < nchunks; chunk++) {
+// Create struct array chunk and populate it
+// First group has the null bitmaps as Null Arrays
+auto null_data = groups[0][chunk]->data();
+DCHECK_EQ(null_data->type->id(), Type::NA);
+DCHECK_EQ(null_data->buffers.size(), 1);
+
+auto arr_data = ArrayData::Make(type_, length_, null_data->null_count, 0);
 
 Review comment:
   Is it problematic to have `null_count == -1`? From my understanding it seems 
to be a supported condition (i.e. "I don't know the exact number of nulls, just 
use the null bitmap to compute it when necessary").
   
   Understood about the offset. Indeed, testing it may involve passing some 
large data...
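   The converter in the diff above reads each struct field out of the packed records at its dtype offset (via `PyArray_GetField`). The same idea in a stdlib-only Python sketch, with a hypothetical two-field record layout (not the dtype from the report):

   ```python
   import struct

   # Hypothetical layout: field 'x' is a little-endian float32 at offset 0,
   # field 'y' a little-endian int32 at offset 4; 8 bytes per record.
   FIELDS = {"x": ("<f", 0), "y": ("<i", 4)}
   RECORD_SIZE = 8

   def get_field(buf, name):
       # Analogue of PyArray_GetField: read one field out of every record.
       fmt, offset = FIELDS[name]
       count = len(buf) // RECORD_SIZE
       return [struct.unpack_from(fmt, buf, i * RECORD_SIZE + offset)[0]
               for i in range(count)]

   records = struct.pack("<fi", 1.5, 7) + struct.pack("<fi", 2.5, 8)
   ```

   Each field then comes back as its own column, ready for per-type conversion.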


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status

[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389417#comment-16389417
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172811919
 
 

 ##
 File path: cpp/src/arrow/array.cc
 ##
 @@ -772,6 +773,105 @@ std::shared_ptr<Array> MakeArray(const 
std::shared_ptr<ArrayData>& data) {
   return out;
 }
 
+// --
+// Misc APIs
+
+namespace internal {
+
+std::vector<ArrayVector> RechunkArraysConsistently(
+const std::vector<ArrayVector>& groups) {
+  if (groups.size() <= 1) {
+return groups;
+  }
+  // Adjacent slices defining the desired rechunking
+  std::vector<std::pair<int64_t, int64_t>> slices;
+  // Total number of elements common to all array groups
+  int64_t total_length = -1;
+
+  {
+// Compute a vector of slices such that each array spans
+// one or more *entire* slices only
+// e.g. if group #1 has bounds {0, 2, 4, 5, 10}
+// and group #2 has bounds {0, 5, 7, 10}
+// then the computed slices are
+// {(0, 2), (2, 4), (4, 5), (5, 7), (7, 10)}
+std::set<int64_t> bounds;
+for (auto& group : groups) {
+  int64_t cur = 0;
+  bounds.insert(cur);
+  for (auto& array : group) {
+cur += array->length();
+bounds.insert(cur);
 
 Review comment:
   You're right, rechunking can simply be done on the way. I've now pushed a 
change.
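   The slice computation described in the code comment above can be sketched in Python (chunks are represented by their lengths only; the function name is illustrative, not Arrow's API):

   ```python
   def rechunk_slices(groups):
       # groups: one list of chunk lengths per array group.
       # Collect every chunk boundary across all groups...
       bounds = set()
       for group in groups:
           cur = 0
           bounds.add(cur)
           for length in group:
               cur += length
               bounds.add(cur)
       # ...then pair up adjacent boundaries into slices, so every
       # original chunk spans one or more *entire* slices.
       ordered = sorted(bounds)
       return list(zip(ordered, ordered[1:]))

   # Chunk lengths reproducing the bounds {0, 2, 4, 5, 10} and {0, 5, 7, 10}
   # from the comment above.
   slices = rechunk_slices([[2, 2, 1, 5], [5, 2, 3]])
   ```

   Slicing every group's arrays at these common boundaries yields consistently chunked outputs.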


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: 
> /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: 
> converter.Convert()
> NumPyConverter doesn't implement  conversion.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2142) [Python] Conversion from Numpy struct array unimplemented

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389414#comment-16389414
 ] 

ASF GitHub Bot commented on ARROW-2142:
---

pitrou commented on a change in pull request #1635: ARROW-2142: [Python] Allow 
conversion from Numpy struct array
URL: https://github.com/apache/arrow/pull/1635#discussion_r172811688
 
 

 ##
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##
 @@ -1590,6 +1592,85 @@ Status NumPyConverter::Visit(const StringType& type) {
   return PushArray(result->data());
 }
 
+Status NumPyConverter::Visit(const StructType& type) {
+  std::vector<NumPyConverter> sub_converters;
+  std::vector<OwnedRefNoGIL> sub_arrays;
+
+  {
+PyAcquireGIL gil_lock;
+
+// Create converters for each struct type field
+if (dtype_->fields == NULL || !PyDict_Check(dtype_->fields)) {
+  return Status::TypeError("Expected struct array");
+}
+
+for (auto field : type.children()) {
+  PyObject* tup = PyDict_GetItemString(dtype_->fields, 
field->name().c_str());
 
 Review comment:
   On Python 3, yes, a unicode object is constructed assuming a UTF-8 input 
(using `PyUnicode_FromString`). On Python 2, a bytes object is constructed for 
lookup, and any non-ASCII bytes-unicode comparison would fail.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Conversion from Numpy struct array unimplemented
> -
>
> Key: ARROW-2142
> URL: https://issues.apache.org/jira/browse/ARROW-2142
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
>
> {code:python}
> >>> arr = np.array([(1.5,)], dtype=np.dtype([('x', np.float32)]))
> >>> arr
> array([(1.5,)], dtype=[('x', '<f4')])
> >>> arr[0]
> (1.5,)
> >>> arr['x']
> array([1.5], dtype=float32)
> >>> arr['x'][0]
> 1.5
> >>> pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>     pa.array(arr, type=pa.struct([pa.field('x', pa.float32())]))
>   File "array.pxi", line 177, in pyarrow.lib.array
>   File "error.pxi", line 77, in pyarrow.lib.check_status
>   File "error.pxi", line 85, in pyarrow.lib.check_status
> ArrowNotImplementedError: 
> /home/antoine/arrow/cpp/src/arrow/python/numpy_to_arrow.cc:1585 code: 
> converter.Convert()
> NumPyConverter doesn't implement  conversion.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2284) [Python] test_plasma error on plasma_store error

2018-03-07 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2284:
-

 Summary: [Python] test_plasma error on plasma_store error
 Key: ARROW-2284
 URL: https://issues.apache.org/jira/browse/ARROW-2284
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


This appears caused by my latest changes:
{code:python}
Traceback (most recent call last):
  File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 192, in 
setup_method
    plasma_store_name, self.p = self.plasma_store_ctx.__enter__()
  File "/home/antoine/miniconda3/envs/pyarrow/lib/python3.6/contextlib.py", 
line 81, in __enter__
    return next(self.gen)
  File "/home/antoine/arrow/python/pyarrow/tests/test_plasma.py", line 168, in 
start_plasma_store
    err = proc.stderr.read().decode()
AttributeError: 'NoneType' object has no attribute 'read'
{code}
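`proc.stderr` is `None` unless the child's stderr was captured when the process was spawned, so the fixture presumably needs `stderr=subprocess.PIPE` (an assumption about the fix; the test file may choose differently). A minimal self-contained sketch:

```python
import subprocess
import sys

# Spawn a child whose stderr we capture; without stderr=subprocess.PIPE,
# proc.stderr would be None, exactly as in the traceback above.
proc = subprocess.Popen(
    [sys.executable, "-c", 'import sys; sys.stderr.write("store failed")'],
    stderr=subprocess.PIPE)
_, err = proc.communicate()
message = err.decode()
```

Using `communicate()` rather than `proc.stderr.read()` also avoids deadlock when the child fills its pipe buffer.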



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2269) Cannot build bdist_wheel for Python

2018-03-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389365#comment-16389365
 ] 

ASF GitHub Bot commented on ARROW-2269:
---

xhochy commented on issue #1713: ARROW-2269: [Python] Fixing paths for libs 
when building bdist
URL: https://github.com/apache/arrow/pull/1713#issuecomment-371093384
 
 
   @mitar Definitively. We certainly lack a bit of documentation in general.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Cannot build bdist_wheel for Python
> ---
>
> Key: ARROW-2269
> URL: https://issues.apache.org/jira/browse/ARROW-2269
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Affects Versions: 0.9.0
>Reporter: Mitar
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> I am trying current master.
> I ran:
> 
> python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
> --with-plasma --bundle-arrow-cpp bdist_wheel
> 
> Output:
> 
> running build_ext
> creating build
> creating build/temp.linux-x86_64-3.6
> -- Runnning cmake for pyarrow
> cmake -DPYTHON_EXECUTABLE=.../Temp/arrow/pyarrow/bin/python  
> -DPYARROW_BUILD_PARQUET=on -DPYARROW_BOOST_USE_SHARED=on 
> -DPYARROW_BUILD_PLASMA=on -DPYARROW_BUNDLE_ARROW_CPP=ON 
> -DCMAKE_BUILD_TYPE=release .../Temp/arrow/arrow/python
> -- The C compiler identification is GNU 7.2.0
> -- The CXX compiler identification is GNU 7.2.0
> -- Check for working C compiler: /usr/bin/cc
> -- Check for working C compiler: /usr/bin/cc -- works
> -- Detecting C compiler ABI info
> -- Detecting C compiler ABI info - done
> -- Detecting C compile features
> -- Detecting C compile features - done
> -- Check for working CXX compiler: /usr/bin/c++
> -- Check for working CXX compiler: /usr/bin/c++ -- works
> -- Detecting CXX compiler ABI info
> -- Detecting CXX compiler ABI info - done
> -- Detecting CXX compile features
> -- Detecting CXX compile features - done
> INFOCompiler command: /usr/bin/c++
> INFOCompiler version: Using built-in specs.
> COLLECT_GCC=/usr/bin/c++
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
> OFFLOAD_TARGET_NAMES=nvptx-none
> OFFLOAD_TARGET_DEFAULT=1
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
> 7.2.0-8ubuntu3.2' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs 
> --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr 
> --with-gcc-major-version-only --program-suffix=-7 
> --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
> --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
> --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
> --enable-libstdcxx-debug --enable-libstdcxx-time=yes 
> --with-default-libstdcxx-abi=new --enable-gnu-unique-object 
> --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie 
> --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto 
> --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 
> --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic 
> --enable-offload-targets=nvptx-none --without-cuda-driver 
> --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu 
> --target=x86_64-linux-gnu
> Thread model: posix
> gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2) 
> INFOCompiler id: GNU
> Selected compiler gcc 7.2.0
> -- Performing Test CXX_SUPPORTS_SSE3
> -- Performing Test CXX_SUPPORTS_SSE3 - Success
> -- Performing Test CXX_SUPPORTS_ALTIVEC
> -- Performing Test CXX_SUPPORTS_ALTIVEC - Failed
> Configured for RELEASE build (set with cmake 
> -DCMAKE_BUILD_TYPE={release,debug,...})
> -- Build Type: RELEASE
> -- Build output directory: 
> .../Temp/arrow/arrow/python/build/temp.linux-x86_64-3.6/release/
> -- Found PythonInterp: .../Temp/arrow/pyarrow/bin/python (found version 
> "3.6.3") 
> -- Searching for Python libs in 
> .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
> -- Looking for python3.6m
> -- Found Python lib 
> /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
> -- Found PythonLibs: 
> /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so
> -- Found NumPy: version "1.14.1" 
> .../Temp/arrow/pyarrow/lib/python3.6/site-packages/numpy/core/include
> -- Searching for Python libs in 
> .../Temp/arrow/pyarrow/lib64;.../Temp/arrow/pyarrow/lib;/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu
> -- Looking for python3.6m
> --