[jira] [Created] (ARROW-2459) pyarrow: Segfault with pyarrow.deserialize_pandas
Travis Brady created ARROW-2459: --- Summary: pyarrow: Segfault with pyarrow.deserialize_pandas Key: ARROW-2459 URL: https://issues.apache.org/jira/browse/ARROW-2459 Project: Apache Arrow Issue Type: Bug Components: Python Environment: OS X, Linux Reporter: Travis Brady Following up from [https://github.com/apache/arrow/issues/1884], wherein I found that calling deserialize_pandas in the app.py script in the repo linked below causes the app.py process to segfault. I initially observed this on OS X, but have since confirmed that the behavior exists on Linux as well. Repo containing an example: [https://github.com/travisbrady/sanic-arrow] And more generally: what is the right way for a Java-based HTTP microservice to talk to a Python-based HTTP microservice using Arrow as the serialization format? I'm exchanging DataFrame-type objects (pandas.DataFrames on the Python side) between the two services for real-time scoring in a few xgboost models implemented in Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
[ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437947#comment-16437947 ] ASF GitHub Bot commented on ARROW-2101: --- pitrou commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert numpy arrays of bytes to arrow arrays of strings when user specifies arrow type of string URL: https://github.com/apache/arrow/pull/1886#issuecomment-381264065

> I think it would be helpful to have iterators that look like this:

Probably, though that would be another PR :-)

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
>
> Key: ARROW-2101
> URL: https://issues.apache.org/jira/browse/ARROW-2101
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.8.0
> Reporter: Bryan Cutler
> Assignee: Bryan Cutler
> Priority: Major
> Labels: pull-request-available
>
> Using Python 2, converting Pandas with 'str' data to Arrow results in Arrow
> data of binary type, even if the user supplies type information. Conversion
> of 'unicode' type works to create Arrow data of string types. For example:
> {code}
> In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
> Out[25]: DataType(binary)
> In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
> Out[26]: DataType(binary)
> In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
> Out[27]: DataType(string)
> {code}
[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
[ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437944#comment-16437944 ] ASF GitHub Bot commented on ARROW-2101: --- pitrou commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert numpy arrays of bytes to arrow arrays of strings when user specifies arrow type of string URL: https://github.com/apache/arrow/pull/1886#issuecomment-381263863

@joshuastorck The utf8 decoding check is in `BuilderAppend(StringBuilder*, ...)`.
[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
[ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437938#comment-16437938 ] ASF GitHub Bot commented on ARROW-2101: --- joshuastorck commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert numpy arrays of bytes to arrow arrays of strings when user specifies arrow type of string URL: https://github.com/apache/arrow/pull/1886#issuecomment-381263228

I built for Python 2 and confirmed the behavior is the same. @pitrou, in regards to the inefficiency of utf-8 encoding, it could be moved down into the `global_have_bytes` check. Would you prefer this?

```cpp
if (global_have_bytes) {
  if (force_string) {
    PyObject* obj;
    Ndarray1DIndexer<PyObject*> objects(arr_);
    Ndarray1DIndexer<uint8_t> mask_values;
    bool have_mask = false;
    if (mask_ != nullptr) {
      mask_values.Init(mask_);
      have_mask = true;
    }
    for (int64_t offset = 0; offset < objects.size(); ++offset) {
      obj = objects[offset];
      if ((have_mask && mask_values[offset]) || internal::PandasObjectIsNull(obj)) {
        continue;
      }
      OwnedRef tmp_obj(PyUnicode_AsUTF8String(obj));
      RETURN_IF_PYERROR();
    }
  } else {
    for (size_t i = 0; i < out_arrays_.size(); ++i) {
      auto binary_data = out_arrays_[i]->data()->Copy();
      binary_data->type = ::arrow::binary();
      out_arrays_[i] = std::make_shared<BinaryArray>(binary_data);
    }
  }
}
```

I'm not fond of how much code I had to copy from AppendObjectStrings to write that loop. I think it would be helpful to have iterators that look like this:

```cpp
NdArray1DIndexer<PyObject*> array(array_);
auto mask = NdArray1DIndexer<uint8_t>::from_mask(mask_);
NdArray1DMaskedIterator iterator(array.begin() + offset, array.end(), mask,
                                 true /* include masked values */);
for (OwnedRef& obj : iterator) {
  // Maybe we use None to indicate masked values?
}
```

Or even better, we use pybind11 and these are light wrappers over them?
[jira] [Resolved] (ARROW-2387) negative decimal values get spurious rescaling error
[ https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phillip Cloud resolved ARROW-2387. -- Resolution: Fixed Fix Version/s: 0.10.0 Issue resolved by pull request 1832 [https://github.com/apache/arrow/pull/1832]

> negative decimal values get spurious rescaling error
>
> Key: ARROW-2387
> URL: https://issues.apache.org/jira/browse/ARROW-2387
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 0.9.0
> Reporter: ben w
> Assignee: Phillip Cloud
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> {code}
> $ python
> Python 2.7.12 (default, Nov 20 2017, 18:23:56)
> [GCC 5.4.0 20160609] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow as pa, decimal
> >>> one = decimal.Decimal('1.00')
> >>> neg_one = decimal.Decimal('-1.00')
> >>> pa.array([one], pa.decimal128(24, 12))
> >
> [
> Decimal('1.')
> ]
> >>> pa.array([neg_one], pa.decimal128(24, 12))
> Traceback (most recent call last):
> File "", line 1, in
> File "array.pxi", line 181, in pyarrow.lib.array
> File "array.pxi", line 36, in pyarrow.lib._sequence_to_array
> File "error.pxi", line 77, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Rescaling decimal value -100.00 from
> original scale of 6 to new scale of 12 would cause data loss
> >>> pa.__version__
> '0.9.0'
> {code}
> Not only is the error spurious, but the decimal value has been multiplied by
> one million (i.e. 10 ** 6, where 6 is the difference in scales; this is still
> pretty strange to me).
[jira] [Commented] (ARROW-2387) negative decimal values get spurious rescaling error
[ https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437833#comment-16437833 ] ASF GitHub Bot commented on ARROW-2387: --- cpcloud closed pull request #1832: ARROW-2387: [Python] Flip test for rescale loss if value < 0 URL: https://github.com/apache/arrow/pull/1832

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

```diff
diff --git a/cpp/src/arrow/python/python-test.cc b/cpp/src/arrow/python/python-test.cc
index 293255b96..3ea814ff4 100644
--- a/cpp/src/arrow/python/python-test.cc
+++ b/cpp/src/arrow/python/python-test.cc
@@ -247,6 +247,16 @@ TEST_F(DecimalTest, FromPythonDecimalRescaleTruncateable) {
   ASSERT_EQ(100, value.low_bits());
 }
 
+TEST_F(DecimalTest, FromPythonNegativeDecimalRescale) {
+  Decimal128 value;
+  OwnedRef python_decimal(this->CreatePythonDecimal("-1.000"));
+  auto type = ::arrow::decimal(10, 9);
+  const auto& decimal_type = static_cast<const DecimalType&>(*type);
+  ASSERT_OK(
+      internal::DecimalFromPythonDecimal(python_decimal.obj(), decimal_type, &value));
+  ASSERT_EQ(-10, value);
+}
+
 TEST_F(DecimalTest, TestOverflowFails) {
   Decimal128 value;
   OwnedRef python_decimal(
diff --git a/cpp/src/arrow/util/decimal.cc b/cpp/src/arrow/util/decimal.cc
index 9e5e3ddb3..668da6f1f 100644
--- a/cpp/src/arrow/util/decimal.cc
+++ b/cpp/src/arrow/util/decimal.cc
@@ -843,7 +843,7 @@ static bool RescaleWouldCauseDataLoss(const Decimal128& value, int32_t delta_sca
   }
 
   *result = value * multiplier;
-  return *result < value;
+  return (value < 0) ? *result > value : *result < value;
 }
 
 Status Decimal128::Rescale(int32_t original_scale, int32_t new_scale,
```
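The one-line fix above can be modelled in pure Python. This is a hedged sketch (hypothetical helper names, simulating signed 128-bit wraparound) of why the old comparison misfired on negative values:

```python
# Illustrative model of the overflow check in RescaleWouldCauseDataLoss.
# mul_i128 mimics C++ signed 128-bit wraparound multiplication.
def mul_i128(a, b):
    r = (a * b) & ((1 << 128) - 1)            # wrap to 128 bits
    return r - (1 << 128) if r >= (1 << 127) else r

def rescale_would_lose_data(value, delta_scale):
    result = mul_i128(value, 10 ** delta_scale)
    # Old check: `result < value`. Scaling a negative value up always makes
    # it more negative, so every negative value was flagged as data loss.
    # The fix flips the comparison when value < 0:
    return result > value if value < 0 else result < value

# -1.00 rescaled from scale 6 to 12 is no longer (wrongly) rejected
assert not rescale_would_lose_data(-100, 6)
```

Genuine overflow is still caught in both directions, since wraparound flips the sign of the product.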
[jira] [Commented] (ARROW-2387) negative decimal values get spurious rescaling error
[ https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437832#comment-16437832 ] ASF GitHub Bot commented on ARROW-2387: --- cpcloud commented on issue #1832: ARROW-2387: [Python] Flip test for rescale loss if value < 0 URL: https://github.com/apache/arrow/pull/1832#issuecomment-381246201

Sweet! Merging. Thank you!
[jira] [Commented] (ARROW-2387) negative decimal values get spurious rescaling error
[ https://issues.apache.org/jira/browse/ARROW-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437822#comment-16437822 ] ASF GitHub Bot commented on ARROW-2387: --- bwo commented on issue #1832: ARROW-2387: [Python] Flip test for rescale loss if value < 0 URL: https://github.com/apache/arrow/pull/1832#issuecomment-381244400

hooray!
[jira] [Commented] (ARROW-2458) [Plasma] PlasmaClient uses global variable
[ https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437782#comment-16437782 ] ASF GitHub Bot commented on ARROW-2458: --- pcmoritz opened a new pull request #1893: ARROW-2458: [Plasma] Use one thread pool per PlasmaClient URL: https://github.com/apache/arrow/pull/1893

> [Plasma] PlasmaClient uses global variable
>
> Key: ARROW-2458
> URL: https://issues.apache.org/jira/browse/ARROW-2458
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Plasma (C++)
> Affects Versions: 0.9.0
> Reporter: Philipp Moritz
> Assignee: Philipp Moritz
> Priority: Major
> Labels: pull-request-available
>
> The thread pool `threadpool_` that PlasmaClient is using is global at the
> moment. This prevents us from using multiple PlasmaClients in the same
> process (one per thread).
[jira] [Updated] (ARROW-2458) [Plasma] PlasmaClient uses global variable
[ https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2458: -- Labels: pull-request-available (was: )
[jira] [Assigned] (ARROW-2458) [Plasma] PlasmaClient uses global variable
[ https://issues.apache.org/jira/browse/ARROW-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz reassigned ARROW-2458: - Assignee: Philipp Moritz
[jira] [Created] (ARROW-2458) [Plasma] PlasmaClient uses global variable
Philipp Moritz created ARROW-2458: - Summary: [Plasma] PlasmaClient uses global variable Key: ARROW-2458 URL: https://issues.apache.org/jira/browse/ARROW-2458 Project: Apache Arrow Issue Type: Improvement Components: Plasma (C++) Affects Versions: 0.9.0 Reporter: Philipp Moritz The threadpool threadpool_ that PlasmaClient is using is global at the moment. This prevents us from using multiple PlasmaClients in the same process (one per thread).
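The fix is C++, but the design point is language-independent. A hedged Python analogue (class and method names are hypothetical) of moving from a shared global pool to one pool per client:

```python
from concurrent.futures import ThreadPoolExecutor

class ClientSketch:
    """Hypothetical stand-in for PlasmaClient: the thread pool is a per-instance
    member rather than a module-level global, so multiple clients can coexist
    in one process without sharing (or racing on) a single pool."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, fn, *args):
        # each client dispatches work onto its own pool
        return self._pool.submit(fn, *args)

    def close(self):
        self._pool.shutdown()

# two clients in the same process, each with an independent pool
a, b = ClientSketch(), ClientSketch()
assert a.submit(sum, [1, 2, 3]).result() == 6
assert b.submit(sum, [4, 5]).result() == 9
a.close(); b.close()
```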
[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
[ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437757#comment-16437757 ] ASF GitHub Bot commented on ARROW-2101: --- pitrou commented on a change in pull request #1886: ARROW-2101: [Python/C++] Correctly convert numpy arrays of bytes to arrow arrays of strings when user specifies arrow type of string URL: https://github.com/apache/arrow/pull/1886#discussion_r181482519

## File path: cpp/src/arrow/python/numpy_to_arrow.cc ##

```diff
@@ -844,6 +846,13 @@ Status NumPyConverter::ConvertObjectStrings() {
   StringBuilder builder(pool_);
   RETURN_NOT_OK(builder.Resize(length_));
 
+  // If the creator of this NumPyConverter specified a type,
+  // then we want to force the output type to be utf8. If
+  // the input data is PyBytes and not PyUnicode and
+  // not convertible to utf8, the call to AppendObjectStrings
+  // below will fail because we pass force_string as the
+  // value for check_valid.
+  bool force_string = type_ != std::nullptr && type_->Equals(utf8());
```

Review comment: Apparently some compilers don't like `std::nullptr`. Just use `type_ != nullptr`.
[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
[ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437752#comment-16437752 ] ASF GitHub Bot commented on ARROW-2101: --- pitrou commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert numpy arrays of bytes to arrow arrays of strings when user specifies arrow type of string URL: https://github.com/apache/arrow/pull/1886#issuecomment-381231182

By the way, the validity check is expensive since it utf8-decodes the bytestring.
[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
[ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437748#comment-16437748 ] ASF GitHub Bot commented on ARROW-2101: --- pitrou commented on issue #1886: ARROW-2101: [Python/C++] Correctly convert numpy arrays of bytes to arrow arrays of strings when user specifies arrow type of string URL: https://github.com/apache/arrow/pull/1886#issuecomment-381230244

> Also, this doesn't change anything for Python 2 if using 'str' objects and the type is not specified, it will still create a BinaryArray, is this what we want?

*Probably*. Python 2 `str` objects are bytestrings just like Python 3 `bytes` objects.
[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing
[ https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437747#comment-16437747 ] Alex Hagerman commented on ARROW-2339: -- Good to know. I'll look at the open tickets and their priority to see if there is something else to pick up. I also don't want to hold things up if I can't work on something for a few days.

> [Python] Add a fast path for int hashing
>
> Key: ARROW-2339
> URL: https://issues.apache.org/jira/browse/ARROW-2339
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Alex Hagerman
> Assignee: Alex Hagerman
> Priority: Major
> Fix For: 0.10.0
>
> Create a __hash__ fast path for Int scalars that avoids using as_py().
>
> https://issues.apache.org/jira/browse/ARROW-640
> [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604]
[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing
[ https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437736#comment-16437736 ] Antoine Pitrou commented on ARROW-2339: --- And by the way I think this is quite low-priority, unless you know of a use case where the performance of hashing arrow scalars is critical.
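What the ticket asks for is small. A hedged pure-Python sketch (the class is hypothetical, not pyarrow's actual scalar wrapper) of a `__hash__` fast path that skips `as_py()`:

```python
class Int64ScalarSketch:
    """Hypothetical scalar wrapper illustrating the __hash__ fast path."""

    def __init__(self, value):
        self._value = value

    def as_py(self):
        # stands in for the comparatively expensive generic conversion path
        return int(self._value)

    def __eq__(self, other):
        return self.as_py() == other

    def __hash__(self):
        # fast path: hash the underlying int directly, skipping as_py();
        # must stay consistent with == against plain Python ints
        return hash(self._value)

assert hash(Int64ScalarSketch(7)) == hash(7)
assert Int64ScalarSketch(7) == 7
```

Keeping `hash(scalar) == hash(int)` for equal values is what lets such scalars interoperate with plain ints in dicts and sets.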
[jira] [Resolved] (ARROW-2241) [Python] Simple script for running all current ASV benchmarks at a commit or tag
[ https://issues.apache.org/jira/browse/ARROW-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-2241. --- Resolution: Fixed Assignee: Antoine Pitrou

> [Python] Simple script for running all current ASV benchmarks at a commit or tag
>
> Key: ARROW-2241
> URL: https://issues.apache.org/jira/browse/ARROW-2241
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Wes McKinney
> Assignee: Antoine Pitrou
> Priority: Major
> Fix For: 0.10.0
>
> The objective of this is to be able to get a graph of performance at each
> release tag for the currently-defined benchmarks (including benchmarks that
> did not exist in older tags)
[jira] [Commented] (ARROW-2241) [Python] Simple script for running all current ASV benchmarks at a commit or tag
[ https://issues.apache.org/jira/browse/ARROW-2241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437732#comment-16437732 ] Antoine Pitrou commented on ARROW-2241: --- In ARROW-2182 we made it so that ASV is able to build the C++ Arrow libs for each changeset. Since parquet-cpp is in another repository, though, it's not handled through that mechanism.
[jira] [Commented] (ARROW-2455) [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437728#comment-16437728 ] ASF GitHub Bot commented on ARROW-2455: --- pitrou commented on issue #1892: ARROW-2455: [C++] Initialize the atomic bytes_allocated_ properly URL: https://github.com/apache/arrow/pull/1892#issuecomment-381225091

Thank you @sighingnow !

> [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
>
> Key: ARROW-2455
> URL: https://issues.apache.org/jira/browse/ARROW-2455
> Project: Apache Arrow
> Issue Type: Bug
> Components: GPU
> Reporter: Tao He
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.10.0
>
> The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized,
> leading to failure of cuda-test on Windows.
[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing
[ https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437729#comment-16437729 ] Alex Hagerman commented on ARROW-2339: -- That will be interesting! Got it. Thank you for the direction. > [Python] Add a fast path for int hashing > > > Key: ARROW-2339 > URL: https://issues.apache.org/jira/browse/ARROW-2339 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Alex Hagerman >Assignee: Alex Hagerman >Priority: Major > Fix For: 0.10.0 > > > Create a __hash__ fast path for Int scalars that avoids using as_py(). > > https://issues.apache.org/jira/browse/ARROW-640 > [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2455) [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437724#comment-16437724 ] ASF GitHub Bot commented on ARROW-2455: --- pitrou closed pull request #1892: ARROW-2455: [C++] Initialize the atomic bytes_allocated_ properly URL: https://github.com/apache/arrow/pull/1892 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/cpp/src/arrow/gpu/cuda_context.cc b/cpp/src/arrow/gpu/cuda_context.cc
index 909c98aa8..578c04a5a 100644
--- a/cpp/src/arrow/gpu/cuda_context.cc
+++ b/cpp/src/arrow/gpu/cuda_context.cc
@@ -40,7 +40,7 @@ struct CudaDevice {
 class CudaContext::CudaContextImpl {
  public:
-  CudaContextImpl() {}
+  CudaContextImpl() : bytes_allocated_(0) {}
 
   Status Init(const CudaDevice& device) {
     device_ = device;

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] The bytes_allocated_ in CudaContextImpl isn't initialized > --- > > Key: ARROW-2455 > URL: https://issues.apache.org/jira/browse/ARROW-2455 > Project: Apache Arrow > Issue Type: Bug > Components: GPU >Reporter: Tao He >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, > leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-2455) [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-2455. --- Resolution: Fixed Fix Version/s: 0.10.0 Issue resolved by pull request 1892 [https://github.com/apache/arrow/pull/1892] > [C++] The bytes_allocated_ in CudaContextImpl isn't initialized > --- > > Key: ARROW-2455 > URL: https://issues.apache.org/jira/browse/ARROW-2455 > Project: Apache Arrow > Issue Type: Bug > Components: GPU >Reporter: Tao He >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, > leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing
[ https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437715#comment-16437715 ] Antoine Pitrou commented on ARROW-2339: --- No, you need to scrupulously replicate Python hashing's mechanism. Also I don't think there's any point in using xxHash and friends over a simple 64-bit integer. > [Python] Add a fast path for int hashing > > > Key: ARROW-2339 > URL: https://issues.apache.org/jira/browse/ARROW-2339 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Alex Hagerman >Assignee: Alex Hagerman >Priority: Major > Fix For: 0.10.0 > > > Create a __hash__ fast path for Int scalars that avoids using as_py(). > > https://issues.apache.org/jira/browse/ARROW-640 > [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
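To make that constraint concrete: on 64-bit builds CPython reduces an int modulo the Mersenne prime 2**61 - 1 and remaps a result of -1 to -2 (since -1 signals an error at the C level), so a fast path must reproduce exactly that. Below is a pure-Python sketch of the algorithm, reading the parameters from sys.hash_info rather than hard-coding them; the eventual fast path would do the same arithmetic in C.

```python
import sys

def int_hash(n):
    """Replicate CPython's hash() for ints (a sketch, not the C fast path)."""
    modulus = sys.hash_info.modulus  # 2**61 - 1 on 64-bit builds
    h = abs(n) % modulus
    if n < 0:
        h = -h
    # CPython reserves -1 as an error return, so hash(-1) == -2.
    return -2 if h == -1 else h

# Must agree with the builtin for every int:
for v in (0, 5, -5, -1, 2**61 - 1, -(2**61), 12345678901234567890):
    assert int_hash(v) == hash(v)
```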
[jira] [Commented] (ARROW-2339) [Python] Add a fast path for int hashing
[ https://issues.apache.org/jira/browse/ARROW-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437664#comment-16437664 ] Alex Hagerman commented on ARROW-2339: -- [~pitrou] [~wesmckinn] Sorry I've been absent on this; work has had me tied up day and night, but I'm hoping to work some more on this over the weekend. I was wondering if you had any thoughts on using xxHash, MurmurHash or FNV-1a for this? I was going to do some timing this weekend, as well as testing for collisions on various ints as you mentioned on the original ticket. Do you know if we can use existing implementations of the hash from C or C++ with wrappers? I didn't know what the ASF rules might be on that with regard to licenses (only ASF or MIT/BSD allowed) and adding the Cython wrappers to PyArrow. If it's better just to do a new implementation I'll work on that too, but I didn't want to reinvent the wheel if I didn't need to. > [Python] Add a fast path for int hashing > > > Key: ARROW-2339 > URL: https://issues.apache.org/jira/browse/ARROW-2339 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Alex Hagerman >Assignee: Alex Hagerman >Priority: Major > Fix For: 0.10.0 > > > Create a __hash__ fast path for Int scalars that avoids using as_py(). > > https://issues.apache.org/jira/browse/ARROW-640 > [https://github.com/apache/arrow/pull/1765/files/4497b69db8039cfeaa7a25f593f3a3e6c7984604] > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2222) [C++] Add option to validate Flatbuffers messages
[ https://issues.apache.org/jira/browse/ARROW-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437615#comment-16437615 ] ASF GitHub Bot commented on ARROW-2222: --- crepererum commented on issue #1763: ARROW-2222: handle untrusted inputs (POC) URL: https://github.com/apache/arrow/pull/1763#issuecomment-381205704 @xhochy I've fixed the formatting, but I don't understand why it is failing now :( This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] Add option to validate Flatbuffers messages > - > > Key: ARROW-2222 > URL: https://issues.apache.org/jira/browse/ARROW-2222 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Marco Neumann >Priority: Major > Labels: pull-request-available > > This is follow-up work to ARROW-1589, ARROW-2023, and can be validated by the > {{ipc-fuzzer-test}}. Users receiving untrusted input streams can prevent > segfaults this way. > As part of this, we should quantify the overhead associated with message > validation in regular use. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ARROW-2456) garrow_array_builder_append_values does not work for large arrays
[ https://issues.apache.org/jira/browse/ARROW-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haralampos Gavriilidis closed ARROW-2456. - Resolution: Duplicate > garrow_array_builder_append_values does not work for large arrays > - > > Key: ARROW-2456 > URL: https://issues.apache.org/jira/browse/ARROW-2456 > Project: Apache Arrow > Issue Type: Bug > Components: C++, GLib >Reporter: Haralampos Gavriilidis >Priority: Major > > When calling > {code:java} > garrow_array_builder_append_values(GArrowArrayBuilder *builder, > const VALUE *values, > gint64 values_length, > const gboolean *is_valids, > gint64 is_valids_length, > GError **error, > const gchar *context){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2457) garrow_array_builder_append_values() won't work for large arrays
Haralampos Gavriilidis created ARROW-2457: - Summary: garrow_array_builder_append_values() won't work for large arrays Key: ARROW-2457 URL: https://issues.apache.org/jira/browse/ARROW-2457 Project: Apache Arrow Issue Type: Bug Components: C, C++, GLib Affects Versions: 0.9.0, 0.8.0 Reporter: Haralampos Gavriilidis I am using garrow_array_builder_append_values() to transform a native C array into an Arrow array without calling garrow_array_builder_append() multiple times. When calling garrow_array_builder_append_values() in array-builder.cpp with the following signature:

{code:java}
garrow_array_builder_append_values(GArrowArrayBuilder *builder,
                                   const VALUE *values,
                                   gint64 values_length,
                                   const gboolean *is_valids,
                                   gint64 is_valids_length,
                                   GError **error,
                                   const gchar *context)
{code}

it will fail for large arrays. This is probably happening because the is_valids array is copied into the valid_bytes array (of a different type), for which the memory is allocated on the stack rather than on the heap, as shown in the snippet below:

{code:java}
uint8_t valid_bytes[is_valids_length];
for (gint64 i = 0; i < is_valids_length; ++i) {
  valid_bytes[i] = is_valids[i];
}
{code}

A way to avoid this problem would be to allocate memory for the valid_bytes array using malloc() or something similar. Is this behavior intended, perhaps because no large arrays should be handed to that function, or is it rather a bug? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2456) garrow_array_builder_append_values does not work for large arrays
Haralampos Gavriilidis created ARROW-2456: - Summary: garrow_array_builder_append_values does not work for large arrays Key: ARROW-2456 URL: https://issues.apache.org/jira/browse/ARROW-2456 Project: Apache Arrow Issue Type: Bug Components: C++, GLib Reporter: Haralampos Gavriilidis When calling {code:java} garrow_array_builder_append_values(GArrowArrayBuilder *builder, const VALUE *values, gint64 values_length, const gboolean *is_valids, gint64 is_valids_length, GError **error, const gchar *context){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.
[ https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437380#comment-16437380 ] ASF GitHub Bot commented on ARROW-2435: --- liurenjie1024 commented on issue #1875: ARROW-2435: [Rust] Add memory pool abstraction. URL: https://github.com/apache/arrow/pull/1875#issuecomment-381157762 @crepererum We can have memory pool as a wrapper allocator api so that we can have more functionality, e.g. statistics about memory usage This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Rust] Add memory pool abstraction. > --- > > Key: ARROW-2435 > URL: https://issues.apache.org/jira/browse/ARROW-2435 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.9.0 >Reporter: Renjie Liu >Assignee: Renjie Liu >Priority: Major > Labels: pull-request-available > > Add memory pool abstraction as the c++ api. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2397) Document changes in Tensor encoding in IPC.md.
[ https://issues.apache.org/jira/browse/ARROW-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437351#comment-16437351 ] ASF GitHub Bot commented on ARROW-2397: --- xhochy commented on issue #1837: ARROW-2397: [Documentation] Update format documentation to describe tensor alignment. URL: https://github.com/apache/arrow/pull/1837#issuecomment-381149693 @robertnishihara I think we can go ahead and merge this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Document changes in Tensor encoding in IPC.md. > -- > > Key: ARROW-2397 > URL: https://issues.apache.org/jira/browse/ARROW-2397 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Robert Nishihara >Priority: Major > Labels: pull-request-available > > Update IPC.md to reflect the changes in > https://github.com/apache/arrow/pull/1802. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.
[ https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437341#comment-16437341 ] ASF GitHub Bot commented on ARROW-2435: --- crepererum commented on issue #1875: ARROW-2435: [Rust] Add memory pool abstraction. URL: https://github.com/apache/arrow/pull/1875#issuecomment-381147104 Or in other words: do you (@andygrove) think we should switch to the upstream API once it is stable? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Rust] Add memory pool abstraction. > --- > > Key: ARROW-2435 > URL: https://issues.apache.org/jira/browse/ARROW-2435 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.9.0 >Reporter: Renjie Liu >Assignee: Renjie Liu >Priority: Major > Labels: pull-request-available > > Add memory pool abstraction as the c++ api. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.
[ https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437337#comment-16437337 ] ASF GitHub Bot commented on ARROW-2435: --- crepererum commented on issue #1875: ARROW-2435: [Rust] Add memory pool abstraction. URL: https://github.com/apache/arrow/pull/1875#issuecomment-381146548 Could we not use something closer to the hopefully-soon-stable [Allocator API](https://doc.rust-lang.org/alloc/allocator/trait.Alloc.html)? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Rust] Add memory pool abstraction. > --- > > Key: ARROW-2435 > URL: https://issues.apache.org/jira/browse/ARROW-2435 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.9.0 >Reporter: Renjie Liu >Assignee: Renjie Liu >Priority: Major > Labels: pull-request-available > > Add memory pool abstraction as the c++ api. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2435) [Rust] Add memory pool abstraction.
[ https://issues.apache.org/jira/browse/ARROW-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437297#comment-16437297 ] ASF GitHub Bot commented on ARROW-2435: --- andygrove commented on issue #1875: ARROW-2435: [Rust] Add memory pool abstraction. URL: https://github.com/apache/arrow/pull/1875#issuecomment-381135426 @xhochy I think this looks good now? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Rust] Add memory pool abstraction. > --- > > Key: ARROW-2435 > URL: https://issues.apache.org/jira/browse/ARROW-2435 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.9.0 >Reporter: Renjie Liu >Assignee: Renjie Liu >Priority: Major > Labels: pull-request-available > > Add memory pool abstraction as the c++ api. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
[ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437279#comment-16437279 ] ASF GitHub Bot commented on ARROW-2101: --- pitrou commented on a change in pull request #1886: Bug fix for ARROW-2101 URL: https://github.com/apache/arrow/pull/1886#discussion_r181381388

## File path: python/pyarrow/tests/test_convert_numpy.py ##

@@ -0,0 +1,35 @@
+# -*- coding: utf-8 -*-
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import numpy as np
+import pyarrow as pa
+
+import pytest
+
+# Regression test for ARROW-2101
+def test_convert_numpy_array_of_bytes_to_arrow_array_of_strings():
+    converted = pa.array(np.array([b'x'], dtype=object), pa.string())
+    assert converted.type == pa.string()
+
+# Make sure that if an ndarray of bytes is passed to the array
+# constructor and the type is string, it will fail if those bytes
+# cannot be converted to utf-8
+def test_convert_numpy_array_of_bytes_to_arrow_array_of_strings_bad_data():
+    with pytest.raises(pa.lib.ArrowException,
+                       message="Unknown error: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte"):
+        pa.array(np.array([b'\x80\x81'], dtype=object), pa.string())

Review comment: Indeed.
Also I don't think we need both Python and C++ tests. Given the difference in verbosity and maintainability, I'd favour writing the tests on the Python side. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] from_pandas reads 'str' type as binary Arrow data with Python 2 > > > Key: ARROW-2101 > URL: https://issues.apache.org/jira/browse/ARROW-2101 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.8.0 >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Major > Labels: pull-request-available > > Using Python 2, converting Pandas with 'str' data to Arrow results in Arrow > data of binary type, even if the user supplies type information. conversion > of 'unicode' type works to create Arrow data of string types. For example > {code} > In [25]: pa.Array.from_pandas(pd.Series(['a'])).type > Out[25]: DataType(binary) > In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type > Out[26]: DataType(binary) > In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type > Out[27]: DataType(string) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2455) [C++] The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2455: -- Summary: [C++] The bytes_allocated_ in CudaContextImpl isn't initialized (was: [GPU] The bytes_allocated_ in CudaContextImpl isn't initialized) > [C++] The bytes_allocated_ in CudaContextImpl isn't initialized > --- > > Key: ARROW-2455 > URL: https://issues.apache.org/jira/browse/ARROW-2455 > Project: Apache Arrow > Issue Type: Bug > Components: GPU >Reporter: Tao He >Priority: Major > Labels: pull-request-available > > The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, > leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2455) [GPU] The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2455: -- Summary: [GPU] The bytes_allocated_ in CudaContextImpl isn't initialized (was: The bytes_allocated_ in CudaContextImpl isn't initialized) > [GPU] The bytes_allocated_ in CudaContextImpl isn't initialized > --- > > Key: ARROW-2455 > URL: https://issues.apache.org/jira/browse/ARROW-2455 > Project: Apache Arrow > Issue Type: Bug > Components: GPU >Reporter: Tao He >Priority: Major > Labels: pull-request-available > > The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, > leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437243#comment-16437243 ] ASF GitHub Bot commented on ARROW-2455: --- sighingnow commented on issue #1892: ARROW-2455: [C++] Initialize the atomic bytes_allocated_ properly URL: https://github.com/apache/arrow/pull/1892#issuecomment-381118389 Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > The bytes_allocated_ in CudaContextImpl isn't initialized > - > > Key: ARROW-2455 > URL: https://issues.apache.org/jira/browse/ARROW-2455 > Project: Apache Arrow > Issue Type: Bug > Components: GPU >Reporter: Tao He >Priority: Major > Labels: pull-request-available > > The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, > leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437237#comment-16437237 ] ASF GitHub Bot commented on ARROW-2455: --- pitrou commented on issue #1892: ARROW-2455: Initialize the atomic bytes_allocated_ properly URL: https://github.com/apache/arrow/pull/1892#issuecomment-381115779 Thanks for doing this. You've got a C++ linting error here: https://travis-ci.org/apache/arrow/jobs/366068960#L953 I suggest you run "make format" to fix it. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > The bytes_allocated_ in CudaContextImpl isn't initialized > - > > Key: ARROW-2455 > URL: https://issues.apache.org/jira/browse/ARROW-2455 > Project: Apache Arrow > Issue Type: Bug > Components: GPU >Reporter: Tao He >Priority: Major > Labels: pull-request-available > > The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, > leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects
[ https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2423: -- Labels: beginner (was: ) > [Python] PyArrow datatypes raise ValueError on equality checks against > non-PyArrow objects > -- > > Key: ARROW-2423 > URL: https://issues.apache.org/jira/browse/ARROW-2423 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.9.0 > Environment: Mac OS High Sierra > PyArrow 0.9.0 (py36_1) > Python 3.6.3 >Reporter: Dave Challis >Priority: Minor > Labels: beginner > > Checking a PyArrow datatype object for equality with non-PyArrow datatypes > causes a `ValueError` to be raised, rather than either returning a True/False > value, or returning > [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented] > if the comparison isn't implemented. > E.g. attempting to call: > {code:java} > import pyarrow > pyarrow.int32() == 'foo' > {code} > results in: > {code:java} > Traceback (most recent call last): > File "types.pxi", line 1221, in pyarrow.lib.type_for_alias > KeyError: 'foo' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "t.py", line 2, in <module> > pyarrow.int32() == 'foo' > File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__ > File "types.pxi", line 113, in pyarrow.lib.DataType.equals > File "types.pxi", line 1223, in pyarrow.lib.type_for_alias > ValueError: No type alias for foo > {code} > The expected outcome for the above would be for the comparison to return > `False`, as that's the general behaviour for comparisons between objects of > different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2423) [Python] PyArrow datatypes raise ValueError on equality checks against non-PyArrow objects
[ https://issues.apache.org/jira/browse/ARROW-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437222#comment-16437222 ] Antoine Pitrou commented on ARROW-2423: --- Agreed with this. It should be an easy fix. > [Python] PyArrow datatypes raise ValueError on equality checks against > non-PyArrow objects > -- > > Key: ARROW-2423 > URL: https://issues.apache.org/jira/browse/ARROW-2423 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.9.0 > Environment: Mac OS High Sierra > PyArrow 0.9.0 (py36_1) > Python 3.6.3 >Reporter: Dave Challis >Priority: Minor > Labels: beginner > > Checking a PyArrow datatype object for equality with non-PyArrow datatypes > causes a `ValueError` to be raised, rather than either returning a True/False > value, or returning > [NotImplemented|https://docs.python.org/3/library/constants.html#NotImplemented] > if the comparison isn't implemented. > E.g. attempting to call: > {code:java} > import pyarrow > pyarrow.int32() == 'foo' > {code} > results in: > {code:java} > Traceback (most recent call last): > File "types.pxi", line 1221, in pyarrow.lib.type_for_alias > KeyError: 'foo' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "t.py", line 2, in <module> > pyarrow.int32() == 'foo' > File "types.pxi", line 90, in pyarrow.lib.DataType.__richcmp__ > File "types.pxi", line 113, in pyarrow.lib.DataType.equals > File "types.pxi", line 1223, in pyarrow.lib.type_for_alias > ValueError: No type alias for foo > {code} > The expected outcome for the above would be for the comparison to return > `False`, as that's the general behaviour for comparisons between objects of > different types (e.g. `1 == 'foo'` or `object() == 12.4` both return `False`). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
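The easy fix follows Python's rich-comparison protocol: when the other operand is a foreign type, return NotImplemented instead of trying to coerce it; Python then falls back to the default comparison, which yields False rather than raising. A toy stand-in illustrating the pattern (plain Python, not pyarrow's actual Cython __richcmp__):

```python
class DataType:
    """Toy stand-in for pyarrow.DataType; only the comparison pattern matters."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, DataType):
            # Decline the comparison instead of raising; if the other
            # operand also declines, Python falls back to identity
            # comparison and the expression evaluates to False.
            return NotImplemented
        return self.name == other.name

    def __hash__(self):
        return hash(self.name)


assert DataType("int32") == DataType("int32")
assert (DataType("int32") == "foo") is False  # no ValueError raised
```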
[jira] [Updated] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2455: -- Labels: pull-request-available (was: ) > The bytes_allocated_ in CudaContextImpl isn't initialized > - > > Key: ARROW-2455 > URL: https://issues.apache.org/jira/browse/ARROW-2455 > Project: Apache Arrow > Issue Type: Bug > Components: GPU >Reporter: Tao He >Priority: Major > Labels: pull-request-available > > The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, > leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized
[ https://issues.apache.org/jira/browse/ARROW-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437221#comment-16437221 ] ASF GitHub Bot commented on ARROW-2455: --- sighingnow opened a new pull request #1892: ARROW-2455: Initialize the atomic bytes_allocated_ properly URL: https://github.com/apache/arrow/pull/1892 The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, leading to failure of cuda-test on windows. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > The bytes_allocated_ in CudaContextImpl isn't initialized > - > > Key: ARROW-2455 > URL: https://issues.apache.org/jira/browse/ARROW-2455 > Project: Apache Arrow > Issue Type: Bug > Components: GPU >Reporter: Tao He >Priority: Major > Labels: pull-request-available > > The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, > leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2455) The bytes_allocated_ in CudaContextImpl isn't initialized
Tao He created ARROW-2455: - Summary: The bytes_allocated_ in CudaContextImpl isn't initialized Key: ARROW-2455 URL: https://issues.apache.org/jira/browse/ARROW-2455 Project: Apache Arrow Issue Type: Bug Components: GPU Reporter: Tao He The atomic counter `bytes_allocated_` in `CudaContextImpl` isn't initialized, leading to failure of cuda-test on windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2450) [Python] Saving to parquet fails for empty lists
[ https://issues.apache.org/jira/browse/ARROW-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437139#comment-16437139 ]

ASF GitHub Bot commented on ARROW-2450:
---------------------------------------
pitrou opened a new pull request #1891: ARROW-2450: [Python] Test for Parquet roundtrip of null lists
URL: https://github.com/apache/arrow/pull/1891

Actual fix is in PARQUET-1268. Also fix a crash when a column doesn't have any statistics.

> [Python] Saving to parquet fails for empty lists
> ------------------------------------------------
>
>              Key: ARROW-2450
>              URL: https://issues.apache.org/jira/browse/ARROW-2450
>          Project: Apache Arrow
>       Issue Type: Bug
>       Components: Python
> Affects Versions: 0.9.0
>         Reporter: Uwe L. Korn
>         Assignee: Antoine Pitrou
>         Priority: Major
>           Labels: pull-request-available
>          Fix For: 0.9.1
>
> When writing a table to parquet through pandas, if any column includes an
> empty list, it fails with a segmentation fault.
> Minimal example:
> {code}
> import pyarrow as pa
> import pyarrow.parquet as pq
> import pandas as pd
>
> def save(rows):
>     table1 = pa.Table.from_pandas(pd.DataFrame(rows))
>     pq.write_table(table1, 'test-foo.pq')
>     table2 = pq.read_table('test-foo.pq')
>     print('ROWS:', rows)
>     print('TABLE1:', table1.to_pandas(), sep='\n')
>     print('TABLE2:', table2.to_pandas(), sep='\n')
>
> save([{'val': ['something']}])
> print('---')
> save([{'val': []}])  # empty
> {code}
> Output:
> {code}
> ROWS: [{'val': ['something']}]
> TABLE1:
>            val
> 0  [something]
> TABLE2:
>            val
> 0  [something]
> ---
> ROWS: [{'val': []}]
> TABLE1:
>   val
> 0  []
> [1]    13472 segmentation fault (core dumped)  python3 test.py
> {code}
> Versions:
> {code}
> $ pip3 list | grep pyarrow
> pyarrow (0.9.0)
> $ python3 --version
> Python 3.5.2
> {code}
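The actual repair landed in the Parquet C++ layer (PARQUET-1268); until an upgraded build is available, a caller-side guard can detect the degenerate shape before writing. A plain-Python sketch (no pyarrow dependency; `rows` has the same shape as in the repro above; `has_only_empty_lists` is a hypothetical helper, not a pyarrow API):

```python
def has_only_empty_lists(rows, column):
    """True when every value of `column` is a list and all of them are
    empty -- the shape that triggered the segfault on Parquet write."""
    values = [row.get(column) for row in rows]
    return (len(values) > 0
            and all(isinstance(v, list) for v in values)
            and all(len(v) == 0 for v in values))

# The crashing case from the repro is flagged; the working case is not.
assert has_only_empty_lists([{'val': []}], 'val')
assert not has_only_empty_lists([{'val': ['something']}], 'val')
```

A caller could skip or pad such a column before `pq.write_table` until running a pyarrow build that includes the fix (0.9.1 per the Fix Version above).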
[jira] [Updated] (ARROW-2450) [Python] Saving to parquet fails for empty lists
[ https://issues.apache.org/jira/browse/ARROW-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2450:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Saving to parquet fails for empty lists
> (full issue description and repro quoted above)
[jira] [Created] (ARROW-2454) [Python] Empty chunked array slice crashes
Antoine Pitrou created ARROW-2454:
----------------------------------
             Summary: [Python] Empty chunked array slice crashes
                 Key: ARROW-2454
                 URL: https://issues.apache.org/jira/browse/ARROW-2454
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
            Reporter: Antoine Pitrou

{code:python}
>>> col = pa.Column.from_array('ints', pa.array([1,2,3]))
>>> col

chunk 0: [
  1,
  2,
  3
]
>>> col.data
>>> col.data[:1]
>>> col.data[:0]
Erreur de segmentation (core dumped)
{code}
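The expected (non-crashing) behavior is that a zero-length slice yields an empty chunked array. A plain-Python sketch of slice clamping over a list of chunks (illustrative only, not pyarrow's implementation; `slice_chunks` is a made-up helper):

```python
def slice_chunks(chunks, start, stop):
    """Slice a 'chunked array' (modeled as a list of lists) by global
    offsets. Degenerate ranges such as stop=0 simply yield an empty
    result instead of crashing."""
    out, offset = [], 0
    for chunk in chunks:
        lo = max(start - offset, 0)           # clamp into this chunk
        hi = min(stop - offset, len(chunk))
        if lo < hi:                           # keep only non-empty pieces
            out.append(chunk[lo:hi])
        offset += len(chunk)
    return out

print(slice_chunks([[1, 2, 3]], 0, 1))  # [[1]]
print(slice_chunks([[1, 2, 3]], 0, 0))  # [] -- the case that segfaulted
```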
[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission error
[ https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Szucs updated ARROW-2452:
-----------------------------------
    Summary: [TEST] Spark integration test fails with permission error  (was: [TEST] Spark integration test fails with permission err\or)

> [TEST] Spark integration test fails with permission error
> ---------------------------------------------------------
>
>          Key: ARROW-2452
>          URL: https://issues.apache.org/jira/browse/ARROW-2452
>      Project: Apache Arrow
>   Issue Type: Bug
>     Reporter: Krisztian Szucs
>     Priority: Major
>       Labels: pull-request-available
>
> {{arrow/dev/run_docker_compose.sh spark_integration}}
> {code}
> Scanning dependencies of target lib
> [ 66%] Building CXX object CMakeFiles/lib.dir/lib.cxx.o
> [100%] Linking CXX shared module release/lib.so
> [100%] Built target lib
> -- Finished cmake --build for pyarrow
> Bundling includes: release/include
> ('Moving built C-extension', 'release/lib.so', 'to build path', '/apache-arrow/arrow/python/build/lib.linux-x86_64-2.7/pyarrow/lib.so')
> release/_parquet.so
> Cython module _parquet failure permitted
> release/_orc.so
> Cython module _orc failure permitted
> release/plasma.so
> Cython module plasma failure permitted
> running install
> error: can't create or remove files in install directory
> The following error occurred while trying to add or remove files in the installation directory:
> [Errno 13] Permission denied: '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/test-easy-install-1855.write-test'
> The installation directory you specified (via --install-dir, --prefix, or the distutils default setting) was:
> /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/
> Perhaps your account does not have write access to this directory? If the installation directory is a system-owned directory, you may need to sign in as the administrator or "root" account. If you do not have administrative access to this machine, you may wish to choose a different installation directory, preferably one that is listed in your PYTHONPATH environment variable.
> {code}
[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission error
[ https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2452:
----------------------------------
    Labels: pull-request-available  (was: )

> [TEST] Spark integration test fails with permission error
> (full build log quoted above)
[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission err\or
[ https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Szucs updated ARROW-2452:
-----------------------------------
    Summary: [TEST] Spark integration test fails with permission err\or  (was: [TEST] Spark integration test fails with permission eror)

> [TEST] Spark integration test fails with permission err\or
> (full build log quoted above)
[jira] [Commented] (ARROW-2452) [TEST] Spark integration test fails with permission error
[ https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437093#comment-16437093 ]

ASF GitHub Bot commented on ARROW-2452:
---------------------------------------
kszucs opened a new pull request #1890: ARROW-2452: [TEST] Spark integration test fails with permission error
URL: https://github.com/apache/arrow/pull/1890

> [TEST] Spark integration test fails with permission error
> (full build log quoted above)
[jira] [Created] (ARROW-2453) [Python] Improve Table column access
Antoine Pitrou created ARROW-2453:
----------------------------------
             Summary: [Python] Improve Table column access
                 Key: ARROW-2453
                 URL: https://issues.apache.org/jira/browse/ARROW-2453
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
    Affects Versions: 0.9.0
            Reporter: Antoine Pitrou

Suppose you have a table column named "nulls". Right now, to access it on a table, you need to do something like this:
{code:python}
>>> table.column(table.schema.get_field_index('nulls'))

chunk 0: [
  NA,
  NA,
  NA
]
{code}
Also, if you mistype the column name, instead of getting an error you get an arbitrary column:
{code}
>>> table.column(table.schema.get_field_index('z'))

chunk 0: [
  0,
  1,
  2
]
{code}
{{Table.column()}} should accept a string object and return the column with the corresponding name. A KeyError should be raised if there is no column with such a name.
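The proposed semantics can be sketched in plain Python (a minimal stand-in, not the actual pyarrow Cython implementation; the `Table` class below is hypothetical):

```python
class Table:
    """Minimal stand-in illustrating name-based column lookup with KeyError."""

    def __init__(self, names, columns):
        self._names = list(names)
        self._columns = list(columns)

    def column(self, key):
        # Accept an integer index (the existing behavior) ...
        if isinstance(key, int):
            return self._columns[key]
        # ... or a column name; raise KeyError instead of silently
        # returning an arbitrary column for a mistyped name.
        try:
            i = self._names.index(key)
        except ValueError:
            raise KeyError(f"Column {key!r} does not exist in table")
        return self._columns[i]

t = Table(['nulls', 'ints'], [[None, None], [0, 1, 2]])
print(t.column('ints'))   # [0, 1, 2]
```

With this shape, `t.column('z')` raises `KeyError` rather than mapping the failed lookup (e.g. `get_field_index` returning -1) onto a valid column index.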
[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission eror
[ https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Szucs updated ARROW-2452:
-----------------------------------
    Description: {{arrow/dev/run_docker_compose.sh spark_integration}} followed by the build log, now wrapped in a {code} block  (was: the same log wrapped in {{...}} monospace markup; full log quoted above)

> [TEST] Spark integration test fails with permission eror
> (full build log quoted above)
[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission eror
[ https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Szucs updated ARROW-2452:
-----------------------------------
    Description: (removed the stray line break after the opening {{ marker; full build log, quoted above, otherwise unchanged)

> [TEST] Spark integration test fails with permission eror
> (full build log quoted above)
[jira] [Updated] (ARROW-2452) [TEST] Spark integration test fails with permission eror
[ https://issues.apache.org/jira/browse/ARROW-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Szucs updated ARROW-2452:
-----------------------------------
    Description: (tightened the spacing inside the {{...}} markup around the command; full build log, quoted above, otherwise unchanged)

> [TEST] Spark integration test fails with permission eror
> (full build log quoted above)
[jira] [Created] (ARROW-2452) [TEST] Spark integration test fails with permission eror
Krisztian Szucs created ARROW-2452:
-----------------------------------
             Summary: [TEST] Spark integration test fails with permission eror
                 Key: ARROW-2452
                 URL: https://issues.apache.org/jira/browse/ARROW-2452
             Project: Apache Arrow
          Issue Type: Bug
            Reporter: Krisztian Szucs

{{arrow/dev/run_docker_compose.sh spark_integration}}
(full build log quoted above: running install fails with [Errno 13] Permission denied on the miniconda site-packages directory)
[jira] [Commented] (ARROW-2448) Segfault when plasma client goes out of scope before buffer.
[ https://issues.apache.org/jira/browse/ARROW-2448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437057#comment-16437057 ]

Antoine Pitrou commented on ARROW-2448:
---------------------------------------
{quote}Not sure if I understand your question correctly, but the buffer can still point to a valid region of memory after the client is destroyed (since the store is still running).{quote}

In that case, it's impossible to release that buffer's memory, right? Even when the client process dies, as long as the store is still running, the memory would still be reserved...

{quote}Similarly, each PlasmaBuffer needs a shared pointer to the PlasmaClient. What do you think about something like that?{quote}

That would probably work to fix the crash, indeed. Something like a "pimpl" pattern?

> Segfault when plasma client goes out of scope before buffer.
> ------------------------------------------------------------
>
>          Key: ARROW-2448
>          URL: https://issues.apache.org/jira/browse/ARROW-2448
>      Project: Apache Arrow
>   Issue Type: Improvement
>   Components: Plasma (C++), Python
>     Reporter: Robert Nishihara
>     Priority: Major
>
> The following causes a segfault.
>
> First start a plasma store with
> {code:java}
> plasma_store -s /tmp/store -m 100{code}
> Then run the following in Python.
> {code} > import pyarrow.plasma as plasma > import numpy as np > client = plasma.connect('/tmp/store', '', 0) > object_id = client.put(np.zeros(3)) > buf = client.get(object_id) > del client > del buf # This segfaults.{code} > The backtrace is > {code:java} > (lldb) bt > * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS > (code=1, address=0xfffc) > * frame #0: 0x0001056deaee > libplasma.0.dylib`plasma::PlasmaClient::Release(plasma::UniqueID const&) + 142 > frame #1: 0x0001056de9e9 > libplasma.0.dylib`plasma::PlasmaBuffer::~PlasmaBuffer() + 41 > frame #2: 0x0001056dec9f libplasma.0.dylib`arrow::Buffer::~Buffer() + > 63 > frame #3: 0x000106206661 > lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() > [inlined] std::__1::__shared_count::__release_shared(this=0x0001019b7d20) > at memory:3444 > frame #4: 0x000106206617 > lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr() > [inlined] > std::__1::__shared_weak_count::__release_shared(this=0x0001019b7d20) at > memory:3486 > frame #5: 0x000106206617 > lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780) > at memory:4412 > frame #6: 0x000106002b35 > lib.cpython-36m-darwin.so`std::__1::shared_ptr::~shared_ptr(this=0x000100791780) > at memory:4410 > frame #7: 0x0001061052c5 lib.cpython-36m-darwin.so`void > __Pyx_call_destructor > >(x=std::__1::shared_ptr::element_type @ 0x0001019b7d38 > strong=0 weak=1) at lib.cxx:486 > frame #8: 0x000106104f93 > lib.cpython-36m-darwin.so`__pyx_tp_dealloc_7pyarrow_3lib_Buffer(o=0x000100791768) > at lib.cxx:107704 > frame #9: 0x0001069fcd54 > multiarray.cpython-36m-darwin.so`array_dealloc + 292 > frame #10: 0x0001000e8daf > libpython3.6m.dylib`_PyDict_DelItem_KnownHash + 463 > frame #11: 0x000100171899 > libpython3.6m.dylib`_PyEval_EvalFrameDefault + 13321 > frame #12: 0x0001001791ef > libpython3.6m.dylib`_PyEval_EvalCodeWithName + 2447 > frame #13: 0x00010016e3d4 libpython3.6m.dylib`PyEval_EvalCode + 100 > frame #14: 
0x0001001a3bd6 > libpython3.6m.dylib`PyRun_InteractiveOneObject + 582 > frame #15: 0x0001001a350e > libpython3.6m.dylib`PyRun_InteractiveLoopFlags + 222 > frame #16: 0x0001001a33fc libpython3.6m.dylib`PyRun_AnyFileExFlags + > 60 > frame #17: 0x0001001bc835 libpython3.6m.dylib`Py_Main + 3829 > frame #18: 0x00010df8 python`main + 232 > frame #19: 0x7fff6cd80015 libdyld.dylib`start + 1 > frame #20: 0x7fff6cd80015 libdyld.dylib`start + 1{code} > Basically, the issue is that when the buffer goes out of scope, it calls > {{Release}} on the plasma client, but the client has already been deallocated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2300) [Python] python/testing/test_hdfs.sh no longer works
[ https://issues.apache.org/jira/browse/ARROW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437044#comment-16437044 ] ASF GitHub Bot commented on ARROW-2300: --- kszucs opened a new pull request #1889: ARROW-2300: [C++/Python] Integration test for HDFS URL: https://github.com/apache/arrow/pull/1889 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [Python] python/testing/test_hdfs.sh no longer works > > > Key: ARROW-2300 > URL: https://issues.apache.org/jira/browse/ARROW-2300 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > Tried this on a fresh Ubuntu 16.04 install: > {code} > $ ./test_hdfs.sh > + docker build -t arrow-hdfs-test -f hdfs/Dockerfile . > Sending build context to Docker daemon 36.86kB > Step 1/6 : FROM cpcloud86/impala:metastore > manifest for cpcloud86/impala:metastore not found > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2300) [Python] python/testing/test_hdfs.sh no longer works
[ https://issues.apache.org/jira/browse/ARROW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2300: -- Labels: pull-request-available (was: ) > [Python] python/testing/test_hdfs.sh no longer works > > > Key: ARROW-2300 > URL: https://issues.apache.org/jira/browse/ARROW-2300 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > Tried this on a fresh Ubuntu 16.04 install: > {code} > $ ./test_hdfs.sh > + docker build -t arrow-hdfs-test -f hdfs/Dockerfile . > Sending build context to Docker daemon 36.86kB > Step 1/6 : FROM cpcloud86/impala:metastore > manifest for cpcloud86/impala:metastore not found > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)