[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431175#comment-16431175
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379883273
 
 
   My pleasure!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Segmentation fault from PyArrow when mapping Pandas datetime column 
> to pyarrow.date64
> --
>
> Key: ARROW-2391
> URL: https://issues.apache.org/jira/browse/ARROW-2391
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.9.0
> Environment: Mac OS High Sierra
> Python 3.6
>Reporter: Dave Challis
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When trying to call `pyarrow.Table.from_pandas` with a `pandas.DataFrame` and 
> a `pyarrow.Schema` provided, the function call results in a segmentation 
> fault if a Pandas `datetime64[ns]` column is converted to a 
> `pyarrow.date64` type.
> A minimal example which shows this is:
> {code:python}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'created': ['2018-05-10T10:24:01']})
> df['created'] = pd.to_datetime(df['created'])
> schema = pa.schema([pa.field('created', pa.date64())])
> pa.Table.from_pandas(df, schema=schema)
> {code}
> Executing the above causes the python interpreter to exit with "Segmentation 
> fault: 11".
> Attempting to convert into various other datatypes (by specifying different 
> schemas) either succeeds, or raises an exception if the conversion is invalid.
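For context, a `date64` value is a count of milliseconds since the Unix epoch that is expected to fall exactly on a day boundary. The sketch below uses only the standard library (no pyarrow) to show that the timestamp from the report carries a non-zero time-of-day part. The helper `to_epoch_ms` is a hypothetical name, and treating the naive timestamp as UTC is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86_400_000 milliseconds in one day
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(ts: datetime) -> int:
    """Milliseconds since the Unix epoch, using exact integer arithmetic."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


# The timestamp from the report, interpreted as UTC (an assumption).
created = datetime(2018, 5, 10, 10, 24, 1, tzinfo=timezone.utc)

# Time-of-day part in milliseconds: 10 h 24 min 1 s = 37_441_000 ms.
intraday_ms = to_epoch_ms(created) % MS_PER_DAY
print(intraday_ms)
```

Because the remainder is not zero, a strict conversion to `date64` would be expected to reject this value with an error rather than crash.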



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431167#comment-16431167
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379882116
 
 
   Thank you @kszucs !




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431128#comment-16431128
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou closed pull request #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859
 
 
   

This is a PR merged from a forked repository.
Because GitHub hides the original diff once such a PR is merged, the diff is
reproduced below for the sake of provenance:

diff --git a/cpp/src/arrow/compute/kernels/cast.cc b/cpp/src/arrow/compute/kernels/cast.cc
index eaebd7cef..bfd519d18 100644
--- a/cpp/src/arrow/compute/kernels/cast.cc
+++ b/cpp/src/arrow/compute/kernels/cast.cc
@@ -396,21 +396,34 @@ struct CastFunctor {
     ShiftTime(ctx, options, conversion.first, conversion.second, input,
               output);
 
-    internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-                                      input.length);
-
     // Ensure that intraday milliseconds have been zeroed out
     auto out_data = GetMutableValues(output, 1);
-    for (int64_t i = 0; i < input.length; ++i) {
-      const int64_t remainder = out_data[i] % kMillisecondsInDay;
-      if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
-                              remainder > 0)) {
-        ctx->SetStatus(
-            Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
-        break;
+
+    if (input.null_count != 0) {
+      internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+                                        input.length);
+
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
+                                remainder > 0)) {
+          ctx->SetStatus(
+              Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
+          break;
+        }
+        out_data[i] -= remainder;
+        bit_reader.Next();
+      }
+    } else {
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 0)) {
+          ctx->SetStatus(
+              Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
+          break;
+        }
+        out_data[i] -= remainder;
       }
-      out_data[i] -= remainder;
-      bit_reader.Next();
     }
   }
 };
diff --git a/python/pyarrow/tests/test_convert_pandas.py b/python/pyarrow/tests/test_convert_pandas.py
index c6e2b75be..de6120176 100644
--- a/python/pyarrow/tests/test_convert_pandas.py
+++ b/python/pyarrow/tests/test_convert_pandas.py
@@ -807,6 +807,44 @@ def test_datetime64_to_date32(self):
 
         assert arr2.equals(arr.cast('date32'))
 
+    @pytest.mark.parametrize('mask', [
+        None,
+        np.ones(3),
+        np.array([True, False, False]),
+    ])
+    def test_pandas_datetime_to_date64(self, mask):
+        s = pd.to_datetime([
+            '2018-05-10T00:00:00',
+            '2018-05-11T00:00:00',
+            '2018-05-12T00:00:00',
+        ])
+        arr = pa.Array.from_pandas(s, type=pa.date64(), mask=mask)
+
+        data = np.array([
+            date(2018, 5, 10),
+            date(2018, 5, 11),
+            date(2018, 5, 12)
+        ])
+        expected = pa.array(data, mask=mask, type=pa.date64())
+
+        assert arr.equals(expected)
+
+    @pytest.mark.parametrize('mask', [
+        None,
+        np.ones(3),
+        np.array([True, False, False])
+    ])
+    def test_pandas_datetime_to_date64_failures(self, mask):
+        s = pd.to_datetime([
+            '2018-05-10T10:24:01',
+            '2018-05-11T10:24:01',
+            '2018-05-12T10:24:01',
+        ])
+
+        expected_msg = 'Timestamp value had non-zero intraday milliseconds'
+        with pytest.raises(pa.ArrowInvalid, msg=expected_msg):
+            pa.Array.from_pandas(s, type=pa.date64(), mask=mask)
+
     def test_date_infer(self):
         df = pd.DataFrame({
             'date': [date(2000, 1, 1),
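
The thread does not spell out the root cause. One plausible reading of the C++ change is that an array without nulls may carry no validity bitmap at all, so the bitmap should only be read when `null_count` is non-zero. The pure-Python model below sketches that control flow. The names `is_valid` and `cast_to_date64` are hypothetical, `validity=None` stands in for a missing bitmap, and values are assumed to be non-negative; none of this is the Arrow API.

```python
MS_PER_DAY = 86_400_000  # milliseconds in one day


def is_valid(bitmap: bytes, index: int) -> bool:
    """Test bit `index` of a bit-packed validity bitmap (least-significant bit first)."""
    return bool(bitmap[index // 8] & (1 << (index % 8)))


def cast_to_date64(values, validity=None, offset=0, allow_time_truncate=False):
    """Toy model of the patched loop; `validity` is None when there are no nulls."""
    out = list(values)
    for i, value in enumerate(out):
        remainder = value % MS_PER_DAY
        # Only consult the bitmap when one exists (mirrors `input.null_count != 0`).
        valid = True if validity is None else is_valid(validity, offset + i)
        if not allow_time_truncate and valid and remainder > 0:
            raise ValueError("Timestamp value had non-zero intraday milliseconds")
        # The intraday part is always subtracted, for valid and null slots alike.
        out[i] = value - remainder
    return out


# No bitmap at all: day-aligned values pass through unchanged.
print(cast_to_date64([0, MS_PER_DAY]))
# Slot 0 is null (bit 0 cleared), so its non-zero remainder is not an error.
print(cast_to_date64([5, MS_PER_DAY, 2 * MS_PER_DAY], validity=bytes([0b110])))
```

In this model the pre-patch behaviour corresponds to indexing into `validity` unconditionally, which fails as soon as no bitmap is present.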


 



[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430843#comment-16430843
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180156054
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
     ShiftTime(ctx, options, conversion.first, conversion.second, input,
               output);
 
-    internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-                                      input.length);
+    if (input.null_count != 0) {
+      internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+                                        input.length);
 
-    // Ensure that intraday milliseconds have been zeroed out
-    auto out_data = GetMutableValues(output, 1);
-    for (int64_t i = 0; i < input.length; ++i) {
-      const int64_t remainder = out_data[i] % kMillisecondsInDay;
-      if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
-                              remainder > 0)) {
-        ctx->SetStatus(
-            Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
-        break;
+      // Ensure that intraday milliseconds have been zeroed out
+      auto out_data = GetMutableValues(output, 1);
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
+                                remainder > 0)) {
+          ctx->SetStatus(
+              Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
+          break;
+        }
+        out_data[i] -= remainder;
+        bit_reader.Next();
+      }
+    } else {
+      auto out_data = GetMutableValues(output, 1);
+      for (int64_t i = 0; i < input.length; ++i) {
+        const int64_t remainder = out_data[i] % kMillisecondsInDay;
+        if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 0)) {
 
 Review comment:
   No problem :) I'm still learning arrow.




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430775#comment-16430775
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on issue #1859: ARROW-2391: [C++/Python] Segmentation fault 
from PyArrow when mapping Pandas datetime column to pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#issuecomment-379803722
 
 
   Waiting for the AppVeyor build before merging this.




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430734#comment-16430734
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180136727
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   Wow. Sorry, I had completely overlooked the `out_data[i] -= remainder;` line 
:-S
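
The line mentioned here is why the loop cannot simply be skipped when truncation is allowed: the loop both validates and truncates. A minimal sketch with a hypothetical helper `floor_to_day`, assuming non-negative millisecond values:

```python
MS_PER_DAY = 86_400_000  # milliseconds in one day


def floor_to_day(values_ms):
    """Subtract the intraday remainder from each millisecond value."""
    return [v - v % MS_PER_DAY for v in values_ms]


# 2018-05-10T10:24:01 expressed as milliseconds since the epoch (UTC assumed).
shifted = [1_525_947_841_000]

skipped = list(shifted)            # loop skipped: the time of day survives
truncated = floor_to_day(shifted)  # loop run: the value lands on midnight

print(skipped[0] % MS_PER_DAY, truncated[0] % MS_PER_DAY)
```

With the loop skipped, the output would still hold a non-zero intraday part even though the caller asked for truncation.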




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430645#comment-16430645
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180122730
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   Sure, but don't we need another branch then to handle when time truncation 
is allowed?




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430623#comment-16430623
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180118162
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   What I'm suggesting is:
   ```cpp
   if (!options.allow_time_truncate) {
     // Ensure that intraday milliseconds have been zeroed out
     auto out_data = GetMutableValues(output, 1);

     if (input.null_count != 0) {
       internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
                                         input.length);

       for (int64_t i = 0; i < input.length; ++i) {
         const int64_t remainder = out_data[i] % kMillisecondsInDay;
         if (ARROW_PREDICT_FALSE(remainder > 0 && bit_reader.IsSet())) {
           ctx->SetStatus(
               Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
           break;
         }
         out_data[i] -= remainder;
         bit_reader.Next();
       }
     } else {
       for (int64_t i = 0; i < input.length; ++i) {
         const int64_t remainder = out_data[i] % kMillisecondsInDay;
         if (ARROW_PREDICT_FALSE(remainder > 0)) {
           ctx->SetStatus(
               Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
           break;
         }
         out_data[i] -= remainder;
       }
     }
   }
   ```
   
   Does it make sense?



[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430574#comment-16430574
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180106203
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   I might misunderstand, but:
   
   ```python
   # with allow_time_truncate
   [
   '2018-05-10T00:00:00',
   '2018-05-11T00:00:00',
   '2018-05-12T10:24:01',
   ]  # OK
   
   # without allow_time_truncate
   [
   '2018-05-10T00:00:00',
   '2018-05-11T00:00:00',
   '2018-05-12T10:24:01',  # <- fails here
   ]  
   
   # with allow_time_truncate
   [
   '2018-05-10T00:00:00',
   '2018-05-11T00:00:00',
   '2018-05-12T00:00:00',
   ]  # OK
   
   # without allow_time_truncate
   [
   '2018-05-10T00:00:00',
   '2018-05-11T00:00:00',
   '2018-05-12T00:00:00',
   ]  # OK - this would fail if I test outside the loop
   
   
   ```




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430566#comment-16430566
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180103071
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 @@ -396,21 +396,34 @@ struct CastFunctor {
 ShiftTime(ctx, options, conversion.first, 
conversion.second, input,
 output);
 
-internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
-  input.length);
+if (input.null_count != 0) {
+  internal::BitmapReader bit_reader(input.buffers[0]->data(), input.offset,
+input.length);
 
-// Ensure that intraday milliseconds have been zeroed out
-auto out_data = GetMutableValues(output, 1);
-for (int64_t i = 0; i < input.length; ++i) {
-  const int64_t remainder = out_data[i] % kMillisecondsInDay;
-  if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && 
bit_reader.IsSet() &&
-  remainder > 0)) {
-ctx->SetStatus(
-Status::Invalid("Timestamp value had non-zero intraday 
milliseconds"));
-break;
+  // Ensure that intraday milliseconds have been zeroed out
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && bit_reader.IsSet() &&
+remainder > 0)) {
+  ctx->SetStatus(
+  Status::Invalid("Timestamp value had non-zero intraday milliseconds"));
+  break;
+}
+out_data[i] -= remainder;
+bit_reader.Next();
+  }
+} else {
+  auto out_data = GetMutableValues(output, 1);
+  for (int64_t i = 0; i < input.length; ++i) {
+const int64_t remainder = out_data[i] % kMillisecondsInDay;
+if (ARROW_PREDICT_FALSE(!options.allow_time_truncate && remainder > 0)) {
 
 Review comment:
   What I mean is that you can skip the whole thing if 
`options.allow_time_truncate` is true (the compiler might do the optimization 
for us, but still).
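
One possible reading of this suggestion, as a standalone sketch rather than the actual Arrow code: test the flag once, outside the loop, and run the validation pass only when truncation is disallowed. `ZeroIntradayMillis`, the `std::vector` input and the `bool` return value are hypothetical names chosen for illustration. The value of `kMillisecondsInDay` and the decision to keep the truncation step in both cases are assumptions.

```cpp
#include <cstdint>
#include <vector>

// Assumed value: number of milliseconds in one day.
constexpr int64_t kMillisecondsInDay = 86400000;

// Hypothetical helper, not part of Arrow's API.
// Returns false and leaves the values untouched if truncation is disallowed
// and some value has a non-zero intraday part. Otherwise zeroes out the
// intraday part of every value and returns true.
bool ZeroIntradayMillis(std::vector<int64_t>& values, bool allow_time_truncate) {
  // The flag is constant for the whole call, so it is checked once here
  // instead of on every loop iteration.
  if (!allow_time_truncate) {
    for (const int64_t v : values) {
      // Mirrors the `remainder > 0` test from the diff above.
      if (v % kMillisecondsInDay > 0) {
        return false;
      }
    }
  }
  // Truncation pass: no per-element flag check is needed any more.
  for (int64_t& v : values) {
    v -= v % kMillisecondsInDay;
  }
  return true;
}
```

The error still comes from the first offending value found inside a loop; only the test of the flag moves out of it.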




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430564#comment-16430564
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

kszucs commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180102499
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   Doesn't the first value encountered with a time part trigger the error, 
which therefore has to be checked inside the loop?




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430548#comment-16430548
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180100389
 
 

 ##
 File path: cpp/src/arrow/compute/kernels/cast.cc
 ##
 
 Review comment:
   `options.allow_time_truncate` is constant across this whole piece of 
code, so just add a higher-level `if` statement around all this.




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430465#comment-16430465
 ] 

ASF GitHub Bot commented on ARROW-2391:
---

pitrou commented on a change in pull request #1859: ARROW-2391: [C++/Python] 
Segmentation fault from PyArrow when mapping Pandas datetime column to 
pyarrow.date64
URL: https://github.com/apache/arrow/pull/1859#discussion_r180075435
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -807,6 +807,31 @@ def test_datetime64_to_date32(self):
 
 assert arr2.equals(arr.cast('date32'))
 
+def test_pandas_datetime_to_date64(self):
 
 Review comment:
   Could you expand the test to also check the case where a non-None mask is 
passed?




[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-09 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430449#comment-16430449
 ] 

Krisztian Szucs commented on ARROW-2391:


https://github.com/apache/arrow/pull/1859



[jira] [Commented] (ARROW-2391) [Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64

2018-04-07 Thread Krisztian Szucs (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429289#comment-16429289
 ] 

Krisztian Szucs commented on ARROW-2391:


Confirmed, it segfaults with the latest master.
