[jira] [Created] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas

2018-04-13 Thread Travis Brady (JIRA)
Travis Brady created ARROW-2459:
---

 Summary: pyarrow: Segfault with pyarrow.deserialize_pandas
 Key: ARROW-2459
 URL: https://issues.apache.org/jira/browse/ARROW-2459
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
 Environment: OS X, Linux
Reporter: Travis Brady


Following up from [https://github.com/apache/arrow/issues/1884], wherein I found 
that calling deserialize_pandas in the app.py script in the repo linked below 
causes the app.py process to segfault.

I initially observed this on OS X, but have since confirmed that the behavior 
exists on Linux as well.

Repo containing example: [https://github.com/travisbrady/sanic-arrow] 

And more generally: what is the right way to get a Java-based HTTP microservice 
to talk to a Python-based HTTP microservice using Arrow as the serialization 
format? I'm exchanging DataFrame type objects (they are pandas.DataFrame's on 
the Python side) between the two services for real-time scoring in a few 
xgboost models implemented in Python.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437947#comment-16437947
 ] 

ASF GitHub Bot commented on ARROW-2101:
---

pitrou commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert 
numpy arrays of bytes to arrow arrays of strings when user specifies arrow type 
of string
URL: https://github.com/apache/arrow/pull/1886#issuecomment-381264065
 
 
   > I think it would be helpful to have iterators that look like this:
   
   Probably, though that would be another PR :-)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
> 
>
> Key: ARROW-2101
> URL: https://issues.apache.org/jira/browse/ARROW-2101
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>
> Using Python 2, converting pandas data with 'str' values to Arrow results in 
> Arrow data of binary type, even if the user supplies type information. 
> Conversion of 'unicode' values works, creating Arrow data of string type. For 
> example:
> {code}
> In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
> Out[25]: DataType(binary)
> In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
> Out[26]: DataType(binary)
> In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
> Out[27]: DataType(string)
> {code}
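The rule the linked PR implements can be modeled in pure Python (a hypothetical sketch, not the C++ code): when the caller explicitly requests a string type, bytes values must be valid UTF-8 and are decoded; otherwise bytes pass through and yield binary Arrow data.

```python
def convert_values(values, force_string=False):
    # Hypothetical pure-Python model of the conversion rule: with
    # force_string, bytes are validated/decoded as UTF-8 (invalid data is
    # an error); without it, bytes pass through untouched (-> binary type).
    out = []
    for v in values:
        if isinstance(v, bytes) and force_string:
            out.append(v.decode("utf-8"))  # raises UnicodeDecodeError if invalid
        else:
            out.append(v)
    return out
```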





[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437944#comment-16437944
 ] 

ASF GitHub Bot commented on ARROW-2101:
---

pitrou commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert 
numpy arrays of bytes to arrow arrays of strings when user specifies arrow type 
of string
URL: https://github.com/apache/arrow/pull/1886#issuecomment-381263863
 
 
   @joshuastorck The utf8 decoding check is in `BuilderAppend(StringBuilder*, 
...)`.




[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437938#comment-16437938
 ] 

ASF GitHub Bot commented on ARROW-2101:
---

joshuastorck commented on issue #1886: ARROW-2101: [Python/C++] Correctly 
convert numpy arrays of bytes to arrow arrays of strings when user specifies 
arrow type of string
URL: https://github.com/apache/arrow/pull/1886#issuecomment-381263228
 
 
   I built for Python 2 and confirmed the behavior is the same. 
   
   @pitrou, in regards to the inefficiency of utf-8 encoding, it could be moved 
below to the check of global_have_bytes. Would you prefer this?
   
   ```cpp
 if (global_have_bytes) {
   if (force_string) {
     Ndarray1DIndexer<PyObject*> objects(arr_);
     Ndarray1DIndexer<uint8_t> mask_values;

     bool have_mask = false;
     if (mask_ != nullptr) {
       mask_values.Init(mask_);
       have_mask = true;
     }

     PyObject* obj;
     for (int64_t offset = 0; offset < objects.size(); ++offset) {
       OwnedRef tmp_obj;
       obj = objects[offset];
       if ((have_mask && mask_values[offset]) || internal::PandasObjectIsNull(obj)) {
         continue;
       }
       tmp_obj.reset(PyUnicode_AsUTF8String(obj));
       RETURN_IF_PYERROR();
     }
   } else {
     for (size_t i = 0; i < out_arrays_.size(); ++i) {
       auto binary_data = out_arrays_[i]->data()->Copy();
       binary_data->type = ::arrow::binary();
       out_arrays_[i] = std::make_shared<BinaryArray>(binary_data);
     }
   }
 }
   ```
   
   I'm not fond of how much code I had to copy from AppendObjectStrings to 
write that loop. I think it would be helpful to have iterators that look like 
this:
   
   ```cpp
   NdArray1DIndexer<PyObject*> array(array_);
   auto mask = NdArray1DIndexer<uint8_t>::from_mask(mask_);
   NdArray1DMaskedIterator iterator(array.begin() + offset, array.end(), mask,
                                    true /* include masked values */);
   for (OwnedRef& obj : iterator) {
     // Maybe we use None to indicate masked values?
   }
   Or even better, we use pybind11 and these are light wrappers over them?
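The masked iterator sketched in the comment could be modeled in pure Python like this (all names and the None-for-masked convention are hypothetical, mirroring the proposal):

```python
def masked_iter(values, mask=None, offset=0, include_masked=True):
    # Hypothetical model of the proposed NdArray1DMaskedIterator: walk the
    # values from `offset`, consulting an optional boolean mask. Masked
    # entries are yielded as None ("use None to indicate masked values")
    # or skipped entirely when include_masked is False.
    for i in range(offset, len(values)):
        if mask is not None and mask[i]:
            if include_masked:
                yield None
            continue
        yield values[i]
```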




[jira] [Resolved] (ARROW-2387) negative decimal values get spurious rescaling error

2018-04-13 Thread Phillip Cloud (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phillip Cloud resolved ARROW-2387.
--
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1832
[https://github.com/apache/arrow/pull/1832]

> negative decimal values get spurious rescaling error
> 
>
> Key: ARROW-2387
> URL: https://issues.apache.org/jira/browse/ARROW-2387
> Project: Apache Arrow
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: ben w
>Assignee: Phillip Cloud
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> {code:java}
> $ python
> Python 2.7.12 (default, Nov 20 2017, 18:23:56)
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow as pa, decimal
> >>> one = decimal.Decimal('1.00')
> >>> neg_one = decimal.Decimal('-1.00')
> >>> pa.array([one], pa.decimal128(24, 12))
> <pyarrow.lib.Decimal128Array object at 0x...>
> [
> Decimal('1.000000000000')
> ]
> >>> pa.array([neg_one], pa.decimal128(24, 12))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "array.pxi", line 181, in pyarrow.lib.array
> File "array.pxi", line 36, in pyarrow.lib._sequence_to_array
> File "error.pxi", line 77, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Rescaling decimal value -100.00 from 
> original scale of 6 to new scale of 12 would cause data loss
> >>> pa.__version__
> '0.9.0'
> {code}
> Not only is the error spurious, the decimal value has been multiplied by one 
> million (i.e. 10 ** 6, where 6 is the difference in scales), which is still 
> pretty strange to me.
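Per the PR title, the fix flips a sign-sensitive comparison in the data-loss check. A pure-Python sketch of sign-safe rescaling of the unscaled integer (an illustration of the idea, not the C++ implementation):

```python
def rescale(unscaled, old_scale, new_scale):
    # Scaling up (more fractional digits) multiplies by a power of ten and
    # can never discard digits; scaling down must verify no remainder is
    # lost. divmod handles negative values uniformly (the remainder is
    # always >= 0 here), so no sign-dependent comparison is needed.
    if new_scale >= old_scale:
        return unscaled * 10 ** (new_scale - old_scale)
    quotient, remainder = divmod(unscaled, 10 ** (old_scale - new_scale))
    if remainder != 0:
        raise ValueError("rescaling would cause data loss")
    return quotient
```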





[jira] [Commented] (ARROW-2387) negative decimal values get spurious rescaling error

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437832#comment-16437832
 ] 

ASF GitHub Bot commented on ARROW-2387:
---

cpcloud commented on issue #1832: ARROW-2387: [Python] Flip test for rescale 
loss if value < 0
URL: https://github.com/apache/arrow/pull/1832#issuecomment-381246201
 
 
   Sweet! Merging. Thank you!




[jira] [Commented] (ARROW-2387) negative decimal values get spurious rescaling error

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437822#comment-16437822
 ] 

ASF GitHub Bot commented on ARROW-2387:
---

bwo commented on issue #1832: ARROW-2387: [Python] Flip test for rescale loss 
if value < 0
URL: https://github.com/apache/arrow/pull/1832#issuecomment-381244400
 
 
   hooray!




[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437782#comment-16437782
 ] 

ASF GitHub Bot commented on ARROW-2458:
---

pcmoritz opened a new pull request #1893: ARROW-2458: [Plasma] Use one thread 
pool per PlasmaClient
URL: https://github.com/apache/arrow/pull/1893
 
 
   




> [Plasma] PlasmaClient uses global variable
> --
>
> Key: ARROW-2458
> URL: https://issues.apache.org/jira/browse/ARROW-2458
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++)
>Affects Versions: 0.9.0
>Reporter: Philipp Moritz
>Assignee: Philipp Moritz
>Priority: Major
>  Labels: pull-request-available
>
> The threadpool threadpool_ that PlasmaClient is using is global at the 
> moment. This prevents us from using multiple PlasmaClients in the same 
> process (one per thread).





[jira] [Updated] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2458:
--
Labels: pull-request-available  (was: )



[jira] [Assigned] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-13 Thread Philipp Moritz (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philipp Moritz reassigned ARROW-2458:
-

Assignee: Philipp Moritz



[jira] [Created] (ARROW-2458) [Plasma] PlasmaClient uses global variable

2018-04-13 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-2458:
-

 Summary: [Plasma] PlasmaClient uses global variable
 Key: ARROW-2458
 URL: https://issues.apache.org/jira/browse/ARROW-2458
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Plasma (C++)
Affects Versions: 0.9.0
Reporter: Philipp Moritz


The thread pool threadpool_ that PlasmaClient uses is currently global. This 
prevents us from using multiple PlasmaClients in the same process (one per 
thread).
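The design change described can be sketched in Python (a hypothetical illustration of the idea, not the C++ fix): move the pool from global scope into the client, so each instance owns its own.

```python
from concurrent.futures import ThreadPoolExecutor


class PlasmaClientSketch:
    # Hypothetical illustration of ARROW-2458's design change: instead of a
    # single global thread pool shared by every client, each client owns
    # its own pool, so several clients can coexist in one process.
    def __init__(self, num_threads=4):
        self._pool = ThreadPoolExecutor(max_workers=num_threads)

    def parallel_memcopy(self, chunks):
        # Stand-in for the parallel copy work the real client's pool does:
        # copy each bytearray chunk on a worker thread.
        return list(self._pool.map(bytes, chunks))
```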





[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437757#comment-16437757
 ] 

ASF GitHub Bot commented on ARROW-2101:
---

pitrou commented on a change in pull request #1886: ARROW-2101: [Python/C++] 
Correctly convert numpy arrays of bytes to arrow arrays of strings when user 
specifies arrow type of string
URL: https://github.com/apache/arrow/pull/1886#discussion_r181482519
 
 

 ##
 File path: cpp/src/arrow/python/numpy_to_arrow.cc
 ##
 @@ -844,6 +846,13 @@ Status NumPyConverter::ConvertObjectStrings() {
   StringBuilder builder(pool_);
   RETURN_NOT_OK(builder.Resize(length_));
 
+  // If the creator of this NumPyConverter specified a type,
+  // then we want to force the output type to be utf8. If
+  // the input data is PyBytes and not PyUnicode and
+  // not convertible to utf8, the call to AppendObjectStrings
+  // below will fail because we pass force_string as the
+  // value for check_valid.
+  bool force_string = type_ != std::nullptr && type_->Equals(utf8());
 
 Review comment:
   Apparently some compilers don't like `std::nullptr`. Just use `type_ != 
nullptr`.




[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437752#comment-16437752
 ] 

ASF GitHub Bot commented on ARROW-2101:
---

pitrou commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert 
numpy arrays of bytes to arrow arrays of strings when user specifies arrow type 
of string
URL: https://github.com/apache/arrow/pull/1886#issuecomment-381231182
 
 
   By the way, the validity check is expensive since it utf8-decodes the 
bytestring.




[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437748#comment-16437748
 ] 

ASF GitHub Bot commented on ARROW-2101:
---

pitrou commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert 
numpy arrays of bytes to arrow arrays of strings when user specifies arrow type 
of string
URL: https://github.com/apache/arrow/pull/1886#issuecomment-381230244
 
 
   > Also, this doesn't change anything for Python 2 if using 'str' objects and 
the type is not specified, it will still create a BinaryArray, is this what we 
want?
   
   *Probably*. Python 2 `str` objects are bytestrings just like Python 3 
`bytes` objects.




[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing

2018-04-13 Thread Alex Hagerman (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437747#comment-16437747
 ] 

Alex Hagerman commented on ARROW-2339:
--

Good to know. I'll look at the open tickets and their priorities to see if 
there is something else to pick up. I also don't want to hold things up if I 
can't work on something for a few days.

 

> [Python] Add a fast path for int hashing
> 
>
> Key: ARROW-2339
> URL: https://issues.apache.org/jira/browse/ARROW-2339
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Alex Hagerman
>Assignee: Alex Hagerman
>Priority: Major
> Fix For: 0.10.0
>
>
> Create a __hash__ fast path for Int scalars that avoids using as_py().
>  
> https://issues.apache.org/jira/browse/ARROW-640
> [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604]
>  
>  





[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing

2018-04-13 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437736#comment-16437736
 ] 

Antoine Pitrou commented on ARROW-2339:
---

And by the way I think this is quite low-priority, unless you know of a use 
case where the performance of hashing arrow scalars is critical.



[jira] [Resolved] (ARROW-2241) [Python] Simple script for running all current ASV benchmarks at a commit or tag

2018-04-13 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2241.
---
Resolution: Fixed
  Assignee: Antoine Pitrou

> [Python] Simple script for running all current ASV benchmarks at a commit or 
> tag
> 
>
> Key: ARROW-2241
> URL: https://issues.apache.org/jira/browse/ARROW-2241
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.10.0
>
>
> The objective of this is to be able to get a graph for performance at each 
> release tag for the currently-defined benchmarks (including benchmarks that 
> did not exist in older tags)
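Such a script could be as simple as invoking `asv run` once per tag, using git's `tag^!` syntax to benchmark exactly that commit (a sketch; the tag names are hypothetical):

```python
import subprocess


def asv_commands(tags):
    # Build one `asv run` invocation per tag; "<tag>^!" is git range
    # syntax for "exactly this commit".
    return [["asv", "run", f"{tag}^!"] for tag in tags]


def run_benchmarks(tags):
    # Run each benchmark job sequentially, failing fast on errors.
    for cmd in asv_commands(tags):
        subprocess.run(cmd, check=True)
```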





[jira] [Commented] (ARROW-2241) [Python] Simple script for running all current ASV benchmarks at a commit or tag

2018-04-13 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437732#comment-16437732
 ] 

Antoine Pitrou commented on ARROW-2241:
---

In ARROW-2182 we made it so ASV is able to build the C++ Arrow libs for each 
changeset. Since parquet-cpp is in another repository, though, it's not handled 
through that mechanism.



[jira] [Commented] (ARROW-2455) [C++] The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437728#comment-16437728
 ] 

ASF GitHub Bot commented on ARROW-2455:
---

pitrou commented on issue #1892: ARROW-2455: [C++] Initialize the atomic 
bytes_allocated_ properly
URL: https://github.com/apache/arrow/pull/1892#issuecomment-381225091
 
 
   Thank you @sighingnow !




> [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
> ---
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on windows.





[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing

2018-04-13 Thread Alex Hagerman (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437729#comment-16437729
 ] 

Alex Hagerman commented on ARROW-2339:
--

That will be interesting! Got it. Thank you for the direction.



[jira] [Commented] (ARROW-2455) [C++] The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437724#comment-16437724
 ] 

ASF GitHub Bot commented on ARROW-2455:
---

pitrou closed pull request #1892: ARROW-2455: [C++] Initialize the atomic 
bytes_allocated_ properly
URL: https://github.com/apache/arrow/pull/1892
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/cpp/src/arrow/gpu/cuda_context.cc 
b/cpp/src/arrow/gpu/cuda_context.cc
index 909c98aa8..578c04a5a 100644
--- a/cpp/src/arrow/gpu/cuda_context.cc
+++ b/cpp/src/arrow/gpu/cuda_context.cc
@@ -40,7 +40,7 @@ struct CudaDevice {
 
 class CudaContext::CudaContextImpl {
  public:
-  CudaContextImpl() {}
+  CudaContextImpl() : bytes_allocated_(0) {}
 
   Status Init(const CudaDevice& device) {
 device_ = device;


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
> ---
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.
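The fix is a one-line member initializer. As a self-contained illustration of the bug class (the `ContextImpl` class below is a stand-in, not the actual Arrow GPU code): a default-constructed std::atomic holds an indeterminate value before C++20, so the counter must be value-initialized explicitly.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for CudaContextImpl: the counter member must be initialized
// in the constructor, because std::atomic<int64_t>'s default constructor
// (pre-C++20) leaves the value indeterminate.
class ContextImpl {
 public:
  ContextImpl() : bytes_allocated_(0) {}  // the fix: start the counter at zero
  void Allocate(int64_t nbytes) { bytes_allocated_ += nbytes; }
  int64_t bytes_allocated() const { return bytes_allocated_.load(); }

 private:
  std::atomic<int64_t> bytes_allocated_;
};
```

Without the initializer, reads of the counter are undefined behavior, and the value observed depends on whatever was in that memory, which is consistent with the platform-dependent cuda-test failure described above.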



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-2455) [C++] The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2455.
---
   Resolution: Fixed
Fix Version/s: 0.10.0

Issue resolved by pull request 1892
[https://github.com/apache/arrow/pull/1892]

> [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
> ---
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing

2018-04-13 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437715#comment-16437715
 ] 

Antoine Pitrou commented on ARROW-2339:
---

No, you need to scrupulously replicate Python hashing's mechanism.

Also, I don't think there's any point in using xxHash and friends for a simple 
64-bit integer.

> [Python] Add a fast path for int hashing
> 
>
> Key: ARROW-2339
> URL: https://issues.apache.org/jira/browse/ARROW-2339
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Alex Hagerman
>Assignee: Alex Hagerman
>Priority: Major
> Fix For: 0.10.0
>
>
> Create a __hash__ fast path for Int scalars that avoids using as_py().
>  
> https://issues.apache.org/jira/browse/ARROW-640
> [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing

2018-04-13 Thread Alex Hagerman (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437664#comment-16437664
 ] 

Alex Hagerman commented on ARROW-2339:
--

[~pitrou] [~wesmckinn] sorry I've been absent on this; work has had me tied up 
day and night, but I'm hoping to work some more on it over the weekend. I was 
wondering if you had any thoughts on using xxHash, MurmurHash or FNV-1a for 
this? I was going to do some timing this weekend, as well as testing for 
collisions on various ints as you mentioned on the original ticket. Do you know 
if we can use existing implementations of the hash from C or C++ with wrappers? 
I didn't know what the ASF rules might be on that with regard to licenses (only 
ASF or MIT/BSD allowed) and adding the Cython wrappers to PyArrow. If it's 
better just to do a new implementation I'll work on that too, but I didn't want 
to reinvent a wheel if I didn't need to.

> [Python] Add a fast path for int hashing
> 
>
> Key: ARROW-2339
> URL: https://issues.apache.org/jira/browse/ARROW-2339
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Alex Hagerman
>Assignee: Alex Hagerman
>Priority: Major
> Fix For: 0.10.0
>
>
> Create a __hash__ fast path for Int scalars that avoids using as_py().
>  
> https://issues.apache.org/jira/browse/ARROW-640
> [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2222) [C++] Add option to validate Flatbuffers messages

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437615#comment-16437615
 ] 

ASF GitHub Bot commented on ARROW-:
---

crepererum commented on issue #1763: ARROW-: handle untrusted inputs (POC)
URL: https://github.com/apache/arrow/pull/1763#issuecomment-381205704
 
 
   @xhochy I've fixed the formatting, but I don't understand why it is failing 
now :(


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Add option to validate Flatbuffers messages
> -
>
> Key: ARROW-
> URL: https://issues.apache.org/jira/browse/ARROW-
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Marco Neumann
>Priority: Major
>  Labels: pull-request-available
>
> This is follow-up work to ARROW-1589 and ARROW-2023, and can be validated by the 
> {{ipc-fuzzer-test}}. Users receiving untrusted input streams can prevent 
> segfaults this way.
> As part of this, we should quantify the overhead associated with message 
> validation in regular use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (ARROW-2456) garrow_array_builder_append_values does not work for large arrays

2018-04-13 Thread Haralampos Gavriilidis (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haralampos Gavriilidis closed ARROW-2456.
-
Resolution: Duplicate

> garrow_array_builder_append_values does not work for large arrays
> -
>
> Key: ARROW-2456
> URL: https://issues.apache.org/jira/browse/ARROW-2456
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, GLib
>Reporter: Haralampos Gavriilidis
>Priority: Major
>
> When calling 
> {code:java}
> garrow_array_builder_append_values(GArrowArrayBuilder *builder,
>  const VALUE *values,
>  gint64 values_length,
>  const gboolean *is_valids,
>  gint64 is_valids_length,
>  GError **error,
>  const gchar *context){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2457) garrow_array_builder_append_values() won't work for large arrays

2018-04-13 Thread Haralampos Gavriilidis (JIRA)
Haralampos Gavriilidis created ARROW-2457:
-

 Summary: garrow_array_builder_append_values() won't work for large 
arrays
 Key: ARROW-2457
 URL: https://issues.apache.org/jira/browse/ARROW-2457
 Project: Apache Arrow
  Issue Type: Bug
  Components: C, C++, GLib
Affects Versions: 0.9.0, 0.8.0
Reporter: Haralampos Gavriilidis


I am using garrow_array_builder_append_values() to transform a native C array 
to an Arrow array, without calling arrow_array_builder_append multiple times. 
When calling garrow_array_builder_append_values() in array-builder.cpp with 
following signature:
{code:java}
garrow_array_builder_append_values(GArrowArrayBuilder *builder,
const VALUE *values,
gint64 values_length,
const gboolean *is_valids,
gint64 is_valids_length,
GError **error,
const gchar *context)
{code}
it will fail for large arrays. This is probably happening because the is_valids 
array is copied to the valid_bytes array (of a different type), for which the 
memory is allocated on the stack rather than on the heap, as shown in the 
snippet below:
{code:java}
uint8_t valid_bytes[is_valids_length];
for (gint64 i = 0; i < is_valids_length; ++i){ 
  valid_bytes[i] = is_valids[i]; 
}
{code}
 A way to avoid this problem would be to allocate memory for the valid_bytes 
array using malloc() or something similar. Is this behavior intended, perhaps 
because no large arrays should be handed over to that function, or is it rather 
a bug?
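A minimal sketch of the suggested heap-allocation fix (illustrative only; the `copy_valid_bytes` helper and the plain `bool` validity input are placeholders, not the actual GLib API):

```cpp
#include <cstdint>
#include <memory>

// Copy a validity array into uint8_t bytes using a heap allocation.
// A variable-length stack array of size is_valids_length can overflow
// the stack when the input is large; std::unique_ptr allocates on the
// heap and still frees automatically when it goes out of scope.
std::unique_ptr<uint8_t[]> copy_valid_bytes(const bool* is_valids,
                                            int64_t is_valids_length) {
  std::unique_ptr<uint8_t[]> valid_bytes(new uint8_t[is_valids_length]);
  for (int64_t i = 0; i < is_valids_length; ++i) {
    valid_bytes[i] = is_valids[i] ? 1 : 0;
  }
  return valid_bytes;
}
```

A std::vector<uint8_t> would work equally well; the point is only that the buffer's lifetime and size move off the stack.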



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2456) garrow_array_builder_append_values does not work for large arrays

2018-04-13 Thread Haralampos Gavriilidis (JIRA)
Haralampos Gavriilidis created ARROW-2456:
-

 Summary: garrow_array_builder_append_values does not work for 
large arrays
 Key: ARROW-2456
 URL: https://issues.apache.org/jira/browse/ARROW-2456
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, GLib
Reporter: Haralampos Gavriilidis


When calling 
{code:java}
garrow_array_builder_append_values(GArrowArrayBuilder *builder,
 const VALUE *values,
 gint64 values_length,
 const gboolean *is_valids,
 gint64 is_valids_length,
 GError **error,
 const gchar *context){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437380#comment-16437380
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

liurenjie1024 commented on issue #1875: ARROW-2435: [Rust] Add memory pool 
abstraction.
URL: https://github.com/apache/arrow/pull/1875#issuecomment-381157762
 
 
   @crepererum We can have the memory pool as a wrapper around the allocator 
API, so that we can add more functionality, e.g. statistics about memory usage.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] Add memory pool abstraction.
> ---
>
> Key: ARROW-2435
> URL: https://issues.apache.org/jira/browse/ARROW-2435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.9.0
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add a memory pool abstraction, as in the C++ API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2397) Document changes in Tensor encoding in IPC.md.

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437351#comment-16437351
 ] 

ASF GitHub Bot commented on ARROW-2397:
---

xhochy commented on issue #1837: ARROW-2397: [Documentation] Update format 
documentation to describe tensor alignment.
URL: https://github.com/apache/arrow/pull/1837#issuecomment-381149693
 
 
   @robertnishihara I think we can go ahead and merge this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Document changes in Tensor encoding in IPC.md.
> --
>
> Key: ARROW-2397
> URL: https://issues.apache.org/jira/browse/ARROW-2397
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Robert Nishihara
>Priority: Major
>  Labels: pull-request-available
>
> Update IPC.md to reflect the changes in 
> https://github.com/apache/arrow/pull/1802.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437341#comment-16437341
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

crepererum commented on issue #1875: ARROW-2435: [Rust] Add memory pool 
abstraction.
URL: https://github.com/apache/arrow/pull/1875#issuecomment-381147104
 
 
   Or in other words: do you (@andygrove) think we should switch to the 
upstream API once it is stable?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] Add memory pool abstraction.
> ---
>
> Key: ARROW-2435
> URL: https://issues.apache.org/jira/browse/ARROW-2435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.9.0
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add a memory pool abstraction, as in the C++ API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437337#comment-16437337
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

crepererum commented on issue #1875: ARROW-2435: [Rust] Add memory pool 
abstraction.
URL: https://github.com/apache/arrow/pull/1875#issuecomment-381146548
 
 
   Could we not use something closer to the hopefully-soon-stable [Allocator 
API](https://doc.rust-lang.org/alloc/allocator/trait.Alloc.html)?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] Add memory pool abstraction.
> ---
>
> Key: ARROW-2435
> URL: https://issues.apache.org/jira/browse/ARROW-2435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.9.0
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add a memory pool abstraction, as in the C++ API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437297#comment-16437297
 ] 

ASF GitHub Bot commented on ARROW-2435:
---

andygrove commented on issue #1875: ARROW-2435: [Rust] Add memory pool 
abstraction.
URL: https://github.com/apache/arrow/pull/1875#issuecomment-381135426
 
 
   @xhochy I think this looks good now?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Rust] Add memory pool abstraction.
> ---
>
> Key: ARROW-2435
> URL: https://issues.apache.org/jira/browse/ARROW-2435
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.9.0
>Reporter: Renjie Liu
>Assignee: Renjie Liu
>Priority: Major
>  Labels: pull-request-available
>
> Add a memory pool abstraction, as in the C++ API.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437279#comment-16437279
 ] 

ASF GitHub Bot commented on ARROW-2101:
---

pitrou commented on a change in pull request #1886: Bug fix for ARROW-2101
URL: https://github.com/apache/arrow/pull/1886#discussion_r181381388
 
 

 ##
 File path: python/pyarrow/tests/test_convert_numpy.py
 ##
 @@ -0,0 +1,35 @@
+# -*- coding: utf-8 -*-
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import numpy as np
+import pyarrow as pa
+
+import pytest
+
+# Regression test for ARROW-2101
+def test_convert_numpy_array_of_bytes_to_arrow_array_of_strings():
+    converted = pa.array(np.array([b'x'], dtype=object), pa.string())
+    assert converted.type == pa.string()
+
+# Make sure that if an ndarray of bytes is passed to the array
+# constructor and the type is string, it will fail if those bytes
+# cannot be converted to utf-8
+def test_convert_numpy_array_of_bytes_to_arrow_array_of_strings_bad_data():
+    with pytest.raises(pa.lib.ArrowException,
+                       message="Unknown error: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte"):
+        pa.array(np.array([b'\x80\x81'], dtype=object), pa.string())
 
 Review comment:
   Indeed. Also I don't think we need both Python and C++ tests. Given the 
difference in verbosity and maintainability, I'd favour writing the tests on 
the Python side.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
> 
>
> Key: ARROW-2101
> URL: https://issues.apache.org/jira/browse/ARROW-2101
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.8.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
>
> Using Python 2, converting Pandas with 'str' data to Arrow results in Arrow 
> data of binary type, even if the user supplies type information. Conversion 
> of 'unicode' type works to create Arrow data of string type. For example:
> {code}
> In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
> Out[25]: DataType(binary)
> In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
> Out[26]: DataType(binary)
> In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
> Out[27]: DataType(string)
> {code}
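The intended post-fix behaviour can be sketched in plain Python (a hypothetical `convert_to_strings` helper, not PyArrow internals): when the requested Arrow type is string, bytes inputs must decode as UTF-8, and invalid bytes should raise rather than silently producing binary data.

```python
def convert_to_strings(values):
    # Decode bytes as UTF-8 so the result is genuinely string-typed;
    # invalid byte sequences raise UnicodeDecodeError instead of being
    # passed through as binary.
    out = []
    for v in values:
        if isinstance(v, bytes):
            out.append(v.decode('utf-8'))
        else:
            out.append(str(v))
    return out
```

This mirrors the rule the fix enforces: `b'a'` becomes the string `'a'`, while `b'\x80\x81'` fails loudly instead of changing the column's type to binary.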



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2455) [C++] The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2455:
--
Summary: [C++] The bytes_allocated_ in CudaContextImpl isn't initialized  
(was: [GPU] The bytes_allocated_ in CudaContextImpl isn't initialized)

> [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
> ---
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2455) [GPU] The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2455:
--
Summary: [GPU] The bytes_allocated_ in CudaContextImpl isn't initialized  
(was: The bytes_allocated_ in CudaContextImpl isn't initialized)

> [GPU] The bytes_allocated_ in CudaContextImpl isn't initialized
> ---
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437243#comment-16437243
 ] 

ASF GitHub Bot commented on ARROW-2455:
---

sighingnow commented on issue #1892: ARROW-2455: [C++] Initialize the atomic 
bytes_allocated_ properly
URL: https://github.com/apache/arrow/pull/1892#issuecomment-381118389
 
 
   Fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> The bytes_allocated_ in CudaContextImpl isn't initialized
> -
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437237#comment-16437237
 ] 

ASF GitHub Bot commented on ARROW-2455:
---

pitrou commented on issue #1892: ARROW-2455: Initialize the atomic 
bytes_allocated_ properly
URL: https://github.com/apache/arrow/pull/1892#issuecomment-381115779
 
 
   Thanks for doing this. You've got a C++ linting error here:
   https://travis-ci.org/apache/arrow/jobs/366068960#L953
   
   I suggest you run "make format" to fix it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> The bytes_allocated_ in CudaContextImpl isn't initialized
> -
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437236#comment-16437236
 ] 

ASF GitHub Bot commented on ARROW-2455:
---

pitrou commented on issue #1892: ARROW-2455: Initialize the atomic 
bytes_allocated_ properly
URL: https://github.com/apache/arrow/pull/1892#issuecomment-381115779
 
 
   Thanks for doing this. You've got a C++ linting error here:
   https://travis-ci.org/apache/arrow/jobs/366068960#L953
   
   I suggest you run "make format" to fix it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> The bytes_allocated_ in CudaContextImpl isn't initialized
> -
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-13 Thread Antoine Pitrou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2423:
--
Labels: beginner  (was: )

> [Python] PyArrow datatypes raise ValueError on equality checks against 
> non-PyArrow objects
> --
>
> Key: ARROW-2423
> URL: https://issues.apache.org/jira/browse/ARROW-2423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Minor
>  Labels: beginner
>
> Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
> causes a `ValueError` to be raised, rather than either returning a True/False 
> value, or returning 
> [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
>  if the comparison isn't implemented.
> E.g. attempting to call:
> {code:java}
> import pyarrow
> pyarrow.int32() == 'foo'
> {code}
> results in:
> {code:java}
> Traceback (most recent call last):
>   File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
> KeyError: 'foo'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "t.py", line 2, in 
> pyarrow.int32() == 'foo'
>   File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
>   File "types.pxi", line 113, in pyarrow.lib.DataType.equals
>   File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
> ValueError: No type alias for foo
> {code}
> The expected outcome for the above would be for the comparison to return 
> `False`, as that's the general behaviour for comparisons between objects of 
> different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).
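The conventional fix can be sketched in plain Python (the `DataType` class below is a stand-in, not PyArrow's actual Cython implementation): `__eq__` returns NotImplemented for foreign types, so Python falls back to its default comparison instead of propagating an exception.

```python
class DataType:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, DataType):
            # Returning NotImplemented (rather than raising) lets Python
            # fall back to the default comparison, so `dtype == 'foo'`
            # simply evaluates to False.
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)
```

With this pattern, `DataType('int32') == 'foo'` evaluates to False and `!=` works for free, which is the behaviour the report asks for.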



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects

2018-04-13 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437222#comment-16437222
 ] 

Antoine Pitrou commented on ARROW-2423:
---

Agreed with this. It should be an easy fix.

> [Python] PyArrow datatypes raise ValueError on equality checks against 
> non-PyArrow objects
> --
>
> Key: ARROW-2423
> URL: https://issues.apache.org/jira/browse/ARROW-2423
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> PyArrow 0.9.0 (py36_1)
> Python 3.6.3
>Reporter: Dave Challis
>Priority: Minor
>  Labels: beginner
>
> Checking a PyArrow datatype object for equality with non-PyArrow datatypes 
> causes a `ValueError` to be raised, rather than either returning a True/False 
> value, or returning 
> [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented]
>  if the comparison isn't implemented.
> E.g. attempting to call:
> {code:java}
> import pyarrow
> pyarrow.int32() == 'foo'
> {code}
> results in:
> {code:java}
> Traceback (most recent call last):
>   File "types.pxi", line 1221, in pyarrow.lib.type_for_alias
> KeyError: 'foo'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "t.py", line 2, in 
> pyarrow.int32() == 'foo'
>   File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__
>   File "types.pxi", line 113, in pyarrow.lib.DataType.equals
>   File "types.pxi", line 1223, in pyarrow.lib.type_for_alias
> ValueError: No type alias for foo
> {code}
> The expected outcome for the above would be for the comparison to return 
> `False`, as that's the general behaviour for comparisons between objects of 
> different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2455:
--
Labels: pull-request-available  (was: )

> The bytes_allocated_ in CudaContextImpl isn't initialized
> -
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437221#comment-16437221
 ] 

ASF GitHub Bot commented on ARROW-2455:
---

sighingnow opened a new pull request #1892: ARROW-2455: Initialize the atomic 
bytes_allocated_ properly
URL: https://github.com/apache/arrow/pull/1892
 
 
   The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't 
initialized, leading to failure of cuda-test on Windows.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> The bytes_allocated_ in CudaContextImpl isn't initialized
> -
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GPU
>Reporter: Tao He
>Priority: Major
>  Labels: pull-request-available
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
> leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized

2018-04-13 Thread Tao He (JIRA)
Tao He created ARROW-2455:
-

 Summary: The bytes_allocated_ in CudaContextImpl isn't initialized
 Key: ARROW-2455
 URL: https://issues.apache.org/jira/browse/ARROW-2455
 Project: Apache Arrow
  Issue Type: Bug
  Components: GPU
Reporter: Tao He


The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, 
leading to failure of cuda-test on Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2450) [Python] Saving to parquet fails for empty lists

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437139#comment-16437139
 ] 

ASF GitHub Bot commented on ARROW-2450:
---

pitrou opened a new pull request #1891: ARROW-2450: [Python] Test for Parquet 
roundtrip of null lists
URL: https://github.com/apache/arrow/pull/1891
 
 
   Actual fix is in PARQUET-1268.
   Also fix a crash when a column doesn't have any statistics.




> [Python] Saving to parquet fails for empty lists
> 
>
> Key: ARROW-2450
> URL: https://issues.apache.org/jira/browse/ARROW-2450
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Uwe L. Korn
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.1
>
>
> When writing a table to parquet through pandas, if any column includes an 
> empty list, it fails with a segmentation fault.
> Minimal example:
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> def save(rows):
> table1 = pa.Table.from_pandas(pd.DataFrame(rows))
> pq.write_table(table1, 'test-foo.pq')
> table2 = pq.read_table('test-foo.pq')
> print('ROWS:', rows)
> print('TABLE1:', table1.to_pandas(), sep='\n')
> print('TABLE2:', table2.to_pandas(), sep='\n')
> save([{'val': ['something']}])
> print('---')
> save([{'val': []}])  # empty
> {code}
> Output:
> {code}
> ROWS: [{'val': ['something']}]
> TABLE1:
>val
> 0  [something]
> TABLE2:
>val
> 0  [something]
> ---
> ROWS: [{'val': []}]
> TABLE1:
>   val
> 0  []
> [1]13472 segmentation fault (core dumped)  python3 test.py
> {code}
> Versions:
> {code}
> $ pip3 list | grep pyarrow
> pyarrow (0.9.0)
> $ python3 --version
> Python 3.5.2
> {code}





[jira] [Updated] (ARROW-2450) [Python] Saving to parquet fails for empty lists

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2450:
--
Labels: pull-request-available  (was: )

> [Python] Saving to parquet fails for empty lists
> 
>
> Key: ARROW-2450
> URL: https://issues.apache.org/jira/browse/ARROW-2450
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
>Reporter: Uwe L. Korn
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.1
>
>
> When writing a table to parquet through pandas, if any column includes an 
> empty list, it fails with a segmentation fault.
> Minimal example:
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
> def save(rows):
> table1 = pa.Table.from_pandas(pd.DataFrame(rows))
> pq.write_table(table1, 'test-foo.pq')
> table2 = pq.read_table('test-foo.pq')
> print('ROWS:', rows)
> print('TABLE1:', table1.to_pandas(), sep='\n')
> print('TABLE2:', table2.to_pandas(), sep='\n')
> save([{'val': ['something']}])
> print('---')
> save([{'val': []}])  # empty
> {code}
> Output:
> {code}
> ROWS: [{'val': ['something']}]
> TABLE1:
>val
> 0  [something]
> TABLE2:
>val
> 0  [something]
> ---
> ROWS: [{'val': []}]
> TABLE1:
>   val
> 0  []
> [1]13472 segmentation fault (core dumped)  python3 test.py
> {code}
> Versions:
> {code}
> $ pip3 list | grep pyarrow
> pyarrow (0.9.0)
> $ python3 --version
> Python 3.5.2
> {code}





[jira] [Created] (ARROW-2454) [Python] Empty chunked array slice crashes

2018-04-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2454:
-

 Summary: [Python] Empty chunked array slice crashes
 Key: ARROW-2454
 URL: https://issues.apache.org/jira/browse/ARROW-2454
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


{code:python}
>>> col = pa.Column.from_array('ints', pa.array([1,2,3]))
>>> col

chunk 0: 
[
  1,
  2,
  3
]
>>> col.data

>>> col.data[:1]

>>> col.data[:0]
Segmentation fault (core dumped)
{code}
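Until the crash is fixed in pyarrow, callers can guard against zero-length slices before delegating to the library. A minimal sketch of such a guard, using a plain Python list to stand in for the chunked array (`safe_slice` is a hypothetical helper, not pyarrow API):

```python
def safe_slice(data, start, stop):
    """Hypothetical guard for the crash above.

    Normalizes bounds the way Python's slice machinery does, and builds the
    empty result itself instead of delegating a zero-length slice to the
    library call that crashed in pyarrow 0.9.0.
    """
    start, stop, _ = slice(start, stop).indices(len(data))
    if stop <= start:
        # Zero-length request: return an empty container of the same type.
        return type(data)()
    return data[start:stop]
```

The same clamping logic would apply to any container whose slicing is delegated to native code.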






[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission error

2018-04-13 Thread Krisztian Szucs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-2452:
---
Summary: [TEST] Spark integration test fails with permission error  (was: 
[TEST] Spark integration test fails with permission err\or)

> [TEST] Spark integration test fails with permission error
> -
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {code}
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different installation
> directory, preferably one that is listed in your PYTHONPATH environment
> variable.
> {code}





[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission error

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2452:
--
Labels: pull-request-available  (was: )

> [TEST] Spark integration test fails with permission error
> -
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {code}
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different installation
> directory, preferably one that is listed in your PYTHONPATH environment
> variable.
> {code}





[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission err\or

2018-04-13 Thread Krisztian Szucs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-2452:
---
Summary: [TEST] Spark integration test fails with permission err\or  (was: 
[TEST] Spark integration test fails with permission eror)

> [TEST] Spark integration test fails with permission err\or
> --
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {code}
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different installation
> directory, preferably one that is listed in your PYTHONPATH environment
> variable.
> {code}





[jira] [Commented] (ARROW-2452) [TEST] Spark integration test fails with permission error

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437093#comment-16437093
 ] 

ASF GitHub Bot commented on ARROW-2452:
---

kszucs opened a new pull request #1890: ARROW-2452: [TEST] Spark integration 
test fails with permission error
URL: https://github.com/apache/arrow/pull/1890
 
 
   




> [TEST] Spark integration test fails with permission error
> -
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {code}
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different installation
> directory, preferably one that is listed in your PYTHONPATH environment
> variable.
> {code}





[jira] [Created] (ARROW-2453) [Python] Improve Table column access

2018-04-13 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2453:
-

 Summary: [Python] Improve Table column access
 Key: ARROW-2453
 URL: https://issues.apache.org/jira/browse/ARROW-2453
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Affects Versions: 0.9.0
Reporter: Antoine Pitrou


Suppose you have a table column named "nulls". Right now, to access it on a 
table, you need to do something like this:
{code:python}
>>> table.column(table.schema.get_field_index('nulls'))

chunk 0: 
[
  NA,
  NA,
  NA
]
{code}

Also, if you mistype the column name, instead of getting an error you get an 
arbitrary column:
{code}
>>> table.column(table.schema.get_field_index('z'))

chunk 0: 
[
  0,
  1,
  2
]
{code}

{{Table.column()}} should accept a string and return the column with the 
corresponding name, raising KeyError if no column has that name.
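The proposed lookup semantics are easy to sketch. Below, a plain list of {{(name, column)}} pairs stands in for a pyarrow Table, and {{column_by_name}} is a hypothetical helper, not actual pyarrow API:

```python
def column_by_name(columns, name):
    """Look up a column by field name, raising KeyError on a miss.

    `columns` is a list of (field_name, column) pairs standing in for a
    pyarrow Table; this mirrors the semantics proposed for Table.column().
    """
    for field_name, column in columns:
        if field_name == name:
            return column
    # Unlike get_field_index() (which returns -1 and silently selects an
    # arbitrary column), a mistyped name fails loudly here.
    raise KeyError(name)
```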






[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission eror

2018-04-13 Thread Krisztian Szucs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-2452:
---
Description: 
{{arrow/dev/run_docker_compose.sh spark_integration}}

{code}
Scanning dependencies of target lib
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
[100%] Linking CXX shared module release/lib.so
[100%] Built target lib
-- Finished cmake --build for pyarrow
Bundling includes: release/include
('Moving built C-extension', 'release/lib.so', 'to build path', 
'/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
release/_parquet.so
Cython module _parquet failure permitted
release/_orc.so
Cython module _orc failure permitted
release/plasma.so
Cython module plasma failure permitted
running install
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: 
'/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/

Perhaps your account does not have write access to this directory?  If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account.  If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.
{code}

  was:
{{arrow/dev/run_docker_compose.sh spark_integration}}

{{Scanning dependencies of target lib
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
[100%] Linking CXX shared module release/lib.so
[100%] Built target lib
-- Finished cmake --build for pyarrow
Bundling includes: release/include
('Moving built C-extension', 'release/lib.so', 'to build path', 
'/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
release/_parquet.so
Cython module _parquet failure permitted
release/_orc.so
Cython module _orc failure permitted
release/plasma.so
Cython module plasma failure permitted
running install
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: 
'/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/

Perhaps your account does not have write access to this directory?  If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account.  If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.}} 


> [TEST] Spark integration test fails with permission eror
> 
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {code}
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different 

[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission eror

2018-04-13 Thread Krisztian Szucs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-2452:
---
Description: 
{{arrow/dev/run_docker_compose.sh spark_integration}}

{{Scanning dependencies of target lib
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
[100%] Linking CXX shared module release/lib.so
[100%] Built target lib
-- Finished cmake --build for pyarrow
Bundling includes: release/include
('Moving built C-extension', 'release/lib.so', 'to build path', 
'/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
release/_parquet.so
Cython module _parquet failure permitted
release/_orc.so
Cython module _orc failure permitted
release/plasma.so
Cython module plasma failure permitted
running install
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: 
'/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/

Perhaps your account does not have write access to this directory?  If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account.  If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.}} 

  was:
{{arrow/dev/run_docker_compose.sh spark_integration}}

{{ 
Scanning dependencies of target lib
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
[100%] Linking CXX shared module release/lib.so
[100%] Built target lib
-- Finished cmake --build for pyarrow
Bundling includes: release/include
('Moving built C-extension', 'release/lib.so', 'to build path', 
'/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
release/_parquet.so
Cython module _parquet failure permitted
release/_orc.so
Cython module _orc failure permitted
release/plasma.so
Cython module plasma failure permitted
running install
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: 
'/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/

Perhaps your account does not have write access to this directory?  If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account.  If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.
}} 


> [TEST] Spark integration test fails with permission eror
> 
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {{Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different installation

[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission eror

2018-04-13 Thread Krisztian Szucs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs updated ARROW-2452:
---
Description: 
{{arrow/dev/run_docker_compose.sh spark_integration}}

{{ 
Scanning dependencies of target lib
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
[100%] Linking CXX shared module release/lib.so
[100%] Built target lib
-- Finished cmake --build for pyarrow
Bundling includes: release/include
('Moving built C-extension', 'release/lib.so', 'to build path', 
'/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
release/_parquet.so
Cython module _parquet failure permitted
release/_orc.so
Cython module _orc failure permitted
release/plasma.so
Cython module plasma failure permitted
running install
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: 
'/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/

Perhaps your account does not have write access to this directory?  If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account.  If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.
}} 

  was:
{{ arrow/dev/run_docker_compose.sh spark_integration }}

{{ 
Scanning dependencies of target lib
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
[100%] Linking CXX shared module release/lib.so
[100%] Built target lib
-- Finished cmake --build for pyarrow
Bundling includes: release/include
('Moving built C-extension', 'release/lib.so', 'to build path', 
'/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
release/_parquet.so
Cython module _parquet failure permitted
release/_orc.so
Cython module _orc failure permitted
release/plasma.so
Cython module plasma failure permitted
running install
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: 
'/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/

Perhaps your account does not have write access to this directory?  If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account.  If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.
}} 


> [TEST] Spark integration test fails with permission eror
> 
>
> Key: ARROW-2452
> URL: https://issues.apache.org/jira/browse/ARROW-2452
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Krisztian Szucs
>Priority: Major
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {{ 
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', 
> '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the
> installation directory:
> [Errno 13] Permission denied: 
> '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or
> the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory?  If the
> installation directory is a system-owned directory, you may need to sign in
> as the administrator or "root" account.  If you do not have administrative
> access to this machine, you may wish to choose a different 

[jira] [Created] (ARROW-2452) [TEST] Spark integration test fails with permission eror

2018-04-13 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-2452:
--

 Summary: [TEST] Spark integration test fails with permission eror
 Key: ARROW-2452
 URL: https://issues.apache.org/jira/browse/ARROW-2452
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Krisztian Szucs


{{ arrow/dev/run_docker_compose.sh spark_integration }}

{{ 
Scanning dependencies of target lib
[ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
[100%] Linking CXX shared module release/lib.so
[100%] Built target lib
-- Finished cmake --build for pyarrow
Bundling includes: release/include
('Moving built C-extension', 'release/lib.so', 'to build path', 
'/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
release/_parquet.so
Cython module _parquet failure permitted
release/_orc.so
Cython module _orc failure permitted
release/plasma.so
Cython module plasma failure permitted
running install
error: can't create or remove files in install directory

The following error occurred while trying to add or remove files in the
installation directory:

[Errno 13] Permission denied: 
'/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'

The installation directory you specified (via --install-dir, --prefix, or
the distutils default setting) was:

/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/

Perhaps your account does not have write access to this directory?  If the
installation directory is a system-owned directory, you may need to sign in
as the administrator or "root" account.  If you do not have administrative
access to this machine, you may wish to choose a different installation
directory, preferably one that is listed in your PYTHONPATH environment
variable.
}} 





[jira] [Commented] (ARROW-2448) Segfault when plasma client goes out of scope before buffer.

2018-04-13 Thread Antoine Pitrou (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437057#comment-16437057
 ] 

Antoine Pitrou commented on ARROW-2448:
---

{quote}Not sure if I understand your question correctly, but the buffer can 
still point to a valid region of memory after the client is destroyed (since 
the store is still running).{quote}

In that case, it's impossible to release that buffer's memory, right? Even when 
the client process dies, as long as the store is still running, the memory 
would still be reserved...

{quote}Similarly, each PlasmaBuffer needs a shared pointer to the PlasmaClient. 
What do you think about something like that?{quote}

That would probably work to fix the crash, indeed. Something like a "pimpl" 
pattern?
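The shared-ownership idea can be sketched in Python; the classes below are hypothetical stand-ins for PlasmaClient's internals, not actual plasma code. Each buffer holds a strong reference to the client implementation object, so the implementation (and its store connection) outlives every buffer regardless of deletion order:

```python
class ClientImpl:
    """Stand-in for the 'pimpl' object owning the store connection."""
    def __init__(self):
        self.released = []

    def release(self, object_id):
        # In real plasma this would notify the store over the connection.
        self.released.append(object_id)


class PlasmaBuffer:
    """Stand-in buffer: keeps the impl alive via a strong reference."""
    def __init__(self, impl, object_id):
        self._impl = impl          # shared ownership prevents the crash
        self._object_id = object_id

    def __del__(self):
        # Safe even if the user-facing client was deleted first,
        # because self._impl keeps the implementation alive.
        self._impl.release(self._object_id)


class PlasmaClient:
    """Thin user-facing wrapper over the shared impl."""
    def __init__(self):
        self._impl = ClientImpl()

    def get(self, object_id):
        return PlasmaBuffer(self._impl, object_id)
```

In C++ terms this corresponds to {{PlasmaBuffer}} holding a {{std::shared_ptr}} to the client implementation, so the `del client; del buf` order from the report no longer touches freed memory.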

> Segfault when plasma client goes out of scope before buffer.
> 
>
> Key: ARROW-2448
> URL: https://issues.apache.org/jira/browse/ARROW-2448
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Plasma (C++), Python
>Reporter: Robert Nishihara
>Priority: Major
>
> The following causes a segfault.
>  
> First start a plasma store with
> {code:java}
> plasma_store -s /tmp/store -m 100{code}
> Then run the following in Python.
> {code}
> import pyarrow.plasma as plasma
> import numpy as np
> client = plasma.connect('/tmp/store', '', 0)
> object_id = client.put(np.zeros(3))
> buf = client.get(object_id)
> del client
> del buf  # This segfaults.{code}
> The backtrace is 
> {code:java}
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS 
> (code=1, address=0xfffc)
>   * frame #0: 0x0001056deaee 
> libplasma.0.dylib`plasma::PlasmaClient::Release(plasma::UniqueID const&) + 142
>     frame #1: 0x0001056de9e9 
> libplasma.0.dylib`plasma::PlasmaBuffer::~PlasmaBuffer() + 41
>     frame #2: 0x0001056dec9f libplasma.0.dylib`arrow::Buffer::~Buffer() + 
> 63
>     frame #3: 0x000106206661 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() 
> [inlined] std::__1::__shared_count::__release_shared(this=0x0001019b7d20) 
> at memory:3444
>     frame #4: 0x000106206617 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() 
> [inlined] 
> std::__1::__shared_weak_count::__release_shared(this=0x0001019b7d20) at 
> memory:3486
>     frame #5: 0x000106206617 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780)
>  at memory:4412
>     frame #6: 0x000106002b35 
> lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780)
>  at memory:4410
>     frame #7: 0x0001061052c5 lib.cpython-36m-darwin.so`void 
> __Pyx_call_destructor >(x=std::__1::shared_ptr::element_type @ 0x0001019b7d38 
> strong=0 weak=1) at lib.cxx:486
>     frame #8: 0x000106104f93 
> lib.cpython-36m-darwin.so`__pyx_tp_dealloc_7pyarrow_3lib_Buffer(o=0x000100791768)
>  at lib.cxx:107704
>     frame #9: 0x0001069fcd54 
> multiarray.cpython-36m-darwin.so`array_dealloc + 292
>     frame #10: 0x0001000e8daf 
> libpython3.6m.dylib`_PyDict_DelItem_KnownHash + 463
>     frame #11: 0x000100171899 
> libpython3.6m.dylib`_PyEval_EvalFrameDefault + 13321
>     frame #12: 0x0001001791ef 
> libpython3.6m.dylib`_PyEval_EvalCodeWithName + 2447
>     frame #13: 0x00010016e3d4 libpython3.6m.dylib`PyEval_EvalCode + 100
>     frame #14: 0x0001001a3bd6 
> libpython3.6m.dylib`PyRun_InteractiveOneObject + 582
>     frame #15: 0x0001001a350e 
> libpython3.6m.dylib`PyRun_InteractiveLoopFlags + 222
>     frame #16: 0x0001001a33fc libpython3.6m.dylib`PyRun_AnyFileExFlags + 
> 60
>     frame #17: 0x0001001bc835 libpython3.6m.dylib`Py_Main + 3829
>     frame #18: 0x00010df8 python`main + 232
>     frame #19: 0x7fff6cd80015 libdyld.dylib`start + 1
>     frame #20: 0x7fff6cd80015 libdyld.dylib`start + 1{code}
> Basically, the issue is that when the buffer goes out of scope, it calls 
> {{Release}} on the plasma client, but the client has already been deallocated.





[jira] [Commented] (ARROW-2300) [Python] python/testing/test_hdfs.sh no longer works

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437044#comment-16437044
 ] 

ASF GitHub Bot commented on ARROW-2300:
---

kszucs opened a new pull request #1889: ARROW-2300: [C++/Python] Integration 
test for HDFS
URL: https://github.com/apache/arrow/pull/1889
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] python/testing/test_hdfs.sh no longer works
> 
>
> Key: ARROW-2300
> URL: https://issues.apache.org/jira/browse/ARROW-2300
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Tried this on a fresh Ubuntu 16.04 install:
> {code}
> $ ./test_hdfs.sh 
> + docker build -t arrow-hdfs-test -f hdfs/Dockerfile .
> Sending build context to Docker daemon  36.86kB
> Step 1/6 : FROM cpcloud86/impala:metastore
> manifest for cpcloud86/impala:metastore not found
> {code}





[jira] [Updated] (ARROW-2300) [Python] python/testing/test_hdfs.sh no longer works

2018-04-13 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-2300:
--
Labels: pull-request-available  (was: )

> [Python] python/testing/test_hdfs.sh no longer works
> 
>
> Key: ARROW-2300
> URL: https://issues.apache.org/jira/browse/ARROW-2300
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Tried this on a fresh Ubuntu 16.04 install:
> {code}
> $ ./test_hdfs.sh 
> + docker build -t arrow-hdfs-test -f hdfs/Dockerfile .
> Sending build context to Docker daemon  36.86kB
> Step 1/6 : FROM cpcloud86/impala:metastore
> manifest for cpcloud86/impala:metastore not found
> {code}


