[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383637#comment-16383637
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

xhochy closed pull request #1650: ARROW-2205: [Python] Option for integer 
object nulls
URL: https://github.com/apache/arrow/pull/1650
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/cpp/src/arrow/python/arrow_to_pandas.cc b/cpp/src/arrow/python/arrow_to_pandas.cc
index aefd4d76d..21e848281 100644
--- a/cpp/src/arrow/python/arrow_to_pandas.cc
+++ b/cpp/src/arrow/python/arrow_to_pandas.cc
@@ -362,6 +362,29 @@ static void ConvertBooleanNoNulls(PandasOptions options, const ChunkedArray& dat
   }
 }
 
+template <typename T>
+static Status ConvertIntegerObjects(PandasOptions options, const ChunkedArray& data,
+                                    PyObject** out_values) {
+  PyAcquireGIL lock;
+  for (int c = 0; c < data.num_chunks(); c++) {
+    const auto& arr = *data.chunk(c);
+    const T* in_values = GetPrimitiveValues<T>(arr);
+
+    for (int i = 0; i < arr.length(); ++i) {
+      if (arr.IsNull(i)) {
+        Py_INCREF(Py_None);
+        *out_values++ = Py_None;
+      } else {
+        *out_values++ = std::is_signed<T>::value
+                            ? PyLong_FromLongLong(in_values[i])
+                            : PyLong_FromUnsignedLongLong(in_values[i]);
+        RETURN_IF_PYERROR();
+      }
+    }
+  }
+  return Status::OK();
+}
+
 template 
 inline Status ConvertBinaryLike(PandasOptions options, const ChunkedArray& data,
                                 PyObject** out_values) {
@@ -684,6 +707,22 @@ class ObjectBlock : public PandasBlock {
 
 if (type == Type::BOOL) {
   RETURN_NOT_OK(ConvertBooleanWithNulls(options_, data, out_buffer));
+} else if (type == Type::UINT8) {
+  RETURN_NOT_OK(ConvertIntegerObjects<uint8_t>(options_, data, out_buffer));
+} else if (type == Type::INT8) {
+  RETURN_NOT_OK(ConvertIntegerObjects<int8_t>(options_, data, out_buffer));
+} else if (type == Type::UINT16) {
+  RETURN_NOT_OK(ConvertIntegerObjects<uint16_t>(options_, data, out_buffer));
+} else if (type == Type::INT16) {
+  RETURN_NOT_OK(ConvertIntegerObjects<int16_t>(options_, data, out_buffer));
+} else if (type == Type::UINT32) {
+  RETURN_NOT_OK(ConvertIntegerObjects<uint32_t>(options_, data, out_buffer));
+} else if (type == Type::INT32) {
+  RETURN_NOT_OK(ConvertIntegerObjects<int32_t>(options_, data, out_buffer));
+} else if (type == Type::UINT64) {
+  RETURN_NOT_OK(ConvertIntegerObjects<uint64_t>(options_, data, out_buffer));
+} else if (type == Type::INT64) {
+  RETURN_NOT_OK(ConvertIntegerObjects<int64_t>(options_, data, out_buffer));
 } else if (type == Type::BINARY) {
   RETURN_NOT_OK(ConvertBinaryLike(options_, data, out_buffer));
 } else if (type == Type::STRING) {
@@ -1202,34 +1241,33 @@ using BlockMap = std::unordered_map>;
 
 static Status GetPandasBlockType(const Column& col, const PandasOptions& options,
                                  PandasBlock::type* output_type) {
+#define INTEGER_CASE(NAME) \
+  *output_type = \
+      col.null_count() > 0 \
+          ? options.integer_object_nulls ? PandasBlock::OBJECT : PandasBlock::DOUBLE \
+          : PandasBlock::NAME; \
+  break;
+
   switch (col.type()->id()) {
 case Type::BOOL:
   *output_type = col.null_count() > 0 ? PandasBlock::OBJECT : PandasBlock::BOOL;
   break;
 case Type::UINT8:
-  *output_type = col.null_count() > 0 ? PandasBlock::DOUBLE : PandasBlock::UINT8;
-  break;
+  INTEGER_CASE(UINT8);
 case Type::INT8:
-  *output_type = col.null_count() > 0 ? PandasBlock::DOUBLE : PandasBlock::INT8;
-  break;
+  INTEGER_CASE(INT8);
 case Type::UINT16:
-  *output_type = col.null_count() > 0 ? PandasBlock::DOUBLE : PandasBlock::UINT16;
-  break;
+  INTEGER_CASE(UINT16);
 case Type::INT16:
-  *output_type = col.null_count() > 0 ? PandasBlock::DOUBLE : PandasBlock::INT16;
-  break;
+  INTEGER_CASE(INT16);
 case Type::UINT32:
-  *output_type = col.null_count() > 0 ? PandasBlock::DOUBLE : PandasBlock::UINT32;
-  break;
+  INTEGER_CASE(UINT32);
 case Type::INT32:
-  *output_type = col.null_count() > 0 ? PandasBlock::DOUBLE : PandasBlock::INT32;
-  break;
-case Type::INT64:
-  *output_type = col.null_count() > 0 ? PandasBlock::DOUBLE : PandasBlock::INT64;
-  break;
+  INTEGER_CASE(INT32);
 

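The diff above makes two changes: GetPandasBlockType selects an OBJECT block instead of DOUBLE for a nullable integer column when integer_object_nulls is set, and ConvertIntegerObjects<T> fills that block with Python ints (or None for nulls), calling PyLong_FromLongLong or PyLong_FromUnsignedLongLong depending on std::is_signed<T>. Below is a minimal pure-Python model of that decision, not pyarrow's actual code: the function convert_integer_column, its byte-buffer-plus-validity-list inputs, and the block labels it returns are hypothetical, and struct format codes stand in for the C integer types.

```python
import math
import struct


def convert_integer_column(raw_bytes, fmt, validity, integer_object_nulls=False):
    """Hypothetical model of GetPandasBlockType + ConvertIntegerObjects.

    raw_bytes: little-endian buffer of fixed-width integers (standing in for
        an Arrow values buffer).
    fmt: a struct format code for the element type, e.g. 'b' (int8) or
        'Q' (uint64); upper-case codes are unsigned.
    validity: one bool per slot; False marks a null.
    Returns (block_kind, values).
    """
    count = len(validity)
    # Decode with the column's own signedness, as the std::is_signed<T>
    # branch does when choosing which PyLong constructor to call.
    values = struct.unpack("<%d%s" % (count, fmt), raw_bytes)

    if all(validity):
        # No nulls: the column keeps its integer block type.
        return "INTEGER", list(values)
    if integer_object_nulls:
        # OBJECT block: exact Python ints, None in null slots.
        return "OBJECT", [v if ok else None for v, ok in zip(values, validity)]
    # Default DOUBLE block: nulls become NaN and large values may be rounded.
    return "DOUBLE", [float(v) if ok else math.nan for v, ok in zip(values, validity)]


big = 2**64 - 1  # largest uint64 value
buf = struct.pack("<2Q", 0, big)
print(convert_integer_column(buf, "Q", [False, True], integer_object_nulls=True))
print(convert_integer_column(buf, "Q", [False, True]))
```

Decoding the same buffer with the signed code 'q' would turn the last slot into -1; choosing the PyLong constructor from the C type's signedness appears to be what keeps uint64 values above 2**63 - 1 intact in the C++ version.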
[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-03-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382265#comment-16382265
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

wesm commented on issue #1650: ARROW-2205: [Python] Option for integer object 
nulls
URL: https://github.com/apache/arrow/pull/1650#issuecomment-369649945
 
 
   Rebasing this again


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Python] Option for integer object nulls
> 
>
> Key: ARROW-2205
> URL: https://issues.apache.org/jira/browse/ARROW-2205
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Affects Versions: 0.8.0
>Reporter: Albert Shieh
>Assignee: Albert Shieh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> I have a use case where the loss of precision in casting integers to floats 
> matters, and pandas supports storing integers with nulls without loss of 
> precision in object columns. However, a roundtrip through arrow will cast the 
> object columns to float columns, even though the object columns are stored in 
> arrow as integers with nulls.
> This is a minimal example demonstrating the behavior of a roundtrip:
> {code}
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({"a": np.array([None, 1], dtype=object)})
> df_pa = pa.Table.from_pandas(df).to_pandas()
> print(df)
> print(df_pa)
> {code}
> The output is:
> {code}
>       a
> 0  None
> 1     1
>      a
> 0  NaN
> 1  1.0
> {code}
> This seems to be the desired behavior, given test_int_object_nulls in 
> test_convert_pandas.
> I think it would be useful to add an option in the to_pandas methods to allow 
> integers with nulls to be returned as object columns. The option can default 
> to false in order to preserve the current behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
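The precision loss the reporter describes follows from the float64 format: its 53-bit significand represents every integer with magnitude up to 2**53 exactly, but not all larger ones, so an int64 value can change when a nullable integer column is converted to a DOUBLE column. A stdlib-only check of that claim (no pandas or pyarrow involved; the helper name roundtrips_through_float64 is hypothetical):

```python
def roundtrips_through_float64(n: int) -> bool:
    """Return True if integer n survives an int -> float64 -> int round trip.

    CPython's float is a C double (IEEE 754 binary64 on mainstream
    platforms), the same type as a pandas float64 column.
    """
    return int(float(n)) == n


for n in (1, 2**53, 2**53 + 1, 2**63 - 1):
    print(n, roundtrips_through_float64(n))
```

With round-half-to-even, 2**53 + 1 rounds down to 2**53 and 2**63 - 1 (the largest int64) rounds up to 2**63, so the last two lines print False.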


[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380977#comment-16380977
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

wesm commented on issue #1650: ARROW-2205: [Python] Option for integer object 
nulls
URL: https://github.com/apache/arrow/pull/1650#issuecomment-369369328
 
 
   Rebased. Going to wait for the builds to run






[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378833#comment-16378833
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

wesm commented on issue #1650: ARROW-2205: [Python] Option for integer object 
nulls
URL: https://github.com/apache/arrow/pull/1650#issuecomment-368927373
 
 
   yes, plan to review this today




[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377105#comment-16377105
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

cpcloud commented on a change in pull request #1650: ARROW-2205: [Python] 
Option for integer object nulls
URL: https://github.com/apache/arrow/pull/1650#discussion_r170646464
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -615,6 +615,36 @@ def test_int_object_nulls(self):
 _check_pandas_roundtrip(df, expected=expected,
 expected_schema=schema)
 
+def test_int_object_nulls_option(self):
 
 Review comment:
   Grouping by modules (which contain functions) is the solution to that 
particular problem.




[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377102#comment-16377102
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

pitrou commented on a change in pull request #1650: ARROW-2205: [Python] Option 
for integer object nulls
URL: https://github.com/apache/arrow/pull/1650#discussion_r170645928
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -615,6 +615,36 @@ def test_int_object_nulls(self):
 _check_pandas_roundtrip(df, expected=expected,
 expected_schema=schema)
 
+def test_int_object_nulls_option(self):
 
 Review comment:
   I would vote against the pytest-style of a forest of functions. In my 
experience the lack of organization produces difficult-to-maintain test 
modules. Organizing the test methods into several classes helped me figure out 
which features were tested and how.
   
   An alternative would be to split the tests into several modules (or perhaps 
several modules inside a subpackage).




[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377094#comment-16377094
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

cpcloud commented on a change in pull request #1650: ARROW-2205: [Python] 
Option for integer object nulls
URL: https://github.com/apache/arrow/pull/1650#discussion_r170642827
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -615,6 +615,36 @@ def test_int_object_nulls(self):
 _check_pandas_roundtrip(df, expected=expected,
 expected_schema=schema)
 
+def test_int_object_nulls_option(self):
 
 Review comment:
   Sure, but if we are going to do it eventually then we shouldn't knowingly 
add to the debt in the name of consistency.




[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377083#comment-16377083
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

adshieh commented on a change in pull request #1650: ARROW-2205: [Python] 
Option for integer object nulls
URL: https://github.com/apache/arrow/pull/1650#discussion_r170640785
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -615,6 +615,36 @@ def test_int_object_nulls(self):
 _check_pandas_roundtrip(df, expected=expected,
 expected_schema=schema)
 
+def test_int_object_nulls_option(self):
 
 Review comment:
   Sure! However, it seems like none of the test methods use `self` and the 
test classes are just for organizational purposes, so moving it to a test 
function would be a deviation?




[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377082#comment-16377082
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

wesm commented on a change in pull request #1650: ARROW-2205: [Python] Option 
for integer object nulls
URL: https://github.com/apache/arrow/pull/1650#discussion_r170640655
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -615,6 +615,36 @@ def test_int_object_nulls(self):
 _check_pandas_roundtrip(df, expected=expected,
 expected_schema=schema)
 
+def test_int_object_nulls_option(self):
 
 Review comment:
   We should probably convert this whole module to pytest-style in a separate 
patch




[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377068#comment-16377068
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

cpcloud commented on a change in pull request #1650: ARROW-2205: [Python] 
Option for integer object nulls
URL: https://github.com/apache/arrow/pull/1650#discussion_r170637799
 
 

 ##
 File path: python/pyarrow/tests/test_convert_pandas.py
 ##
 @@ -615,6 +615,36 @@ def test_int_object_nulls(self):
 _check_pandas_roundtrip(df, expected=expected,
 expected_schema=schema)
 
+def test_int_object_nulls_option(self):
 
 Review comment:
   It doesn't look like you're using `self` here. Can you make this into a test 
function and 
[`pytest.mark.parametrize`](https://docs.pytest.org/en/latest/parametrize.html) 
it on the `int_dtypes` parameter?




[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16376976#comment-16376976
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

adshieh commented on issue #1650: ARROW-2205: [Python] Option for integer 
object nulls
URL: https://github.com/apache/arrow/pull/1650#issuecomment-368526062
 
 
   I personally prefer keyword arguments because the number of calls to 
`to_pandas` in my use cases has been limited, so having the documentation in 
the method and avoiding the extra step of creating an options object has been 
convenient.




[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375756#comment-16375756
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

pitrou commented on issue #1650: ARROW-2205: [Python] Option for integer object 
nulls
URL: https://github.com/apache/arrow/pull/1650#issuecomment-368256301
 
 
   >  I am wondering out loud if there's anything we can do to help with the 
API for a growing number of pandas conversion arguments (like using an options 
object instead of keyword args)
   
   Perhaps the dialect concept used by the [csv 
module](https://docs.python.org/3/library/csv.html) can be re-used?
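One way to read the dialect suggestion: with csv.reader(f, dialect, **fmtparams), a registered Dialect supplies defaults and keyword arguments override individual attributes, so a named bundle of conversion options could coexist with the per-call keyword arguments discussed in this thread. The following is a hypothetical sketch, not a pyarrow API: ConversionDialect, register_dialect, and resolve_options are invented names, only the integer_object_nulls field corresponds to an option from this PR, and strings_to_categorical is included purely for illustration.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class ConversionDialect:
    """Hypothetical bundle of to_pandas-style options, modeled on csv.Dialect."""
    integer_object_nulls: bool = False    # the option added in this PR
    strings_to_categorical: bool = False  # illustrative second option


_DIALECTS = {"default": ConversionDialect()}


def register_dialect(name, dialect):
    """Store a named preset, analogous to csv.register_dialect."""
    _DIALECTS[name] = dialect


def resolve_options(dialect="default", **overrides):
    """Start from a named or explicit dialect, then apply keyword overrides.

    dataclasses.replace raises TypeError for an unknown field name, so a
    misspelled option fails loudly instead of being silently ignored.
    """
    base = _DIALECTS[dialect] if isinstance(dialect, str) else dialect
    return dataclasses.replace(base, **overrides)


register_dialect("exact_ints", ConversionDialect(integer_object_nulls=True))
print(resolve_options("exact_ints", strings_to_categorical=True))
```

Keyword arguments remain the primary interface in this design, so each option can still be documented in the method signature, while callers who repeat the same settings can name them once.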


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375704#comment-16375704
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

xhochy commented on issue #1650: ARROW-2205: [Python] Option for integer object 
nulls
URL: https://github.com/apache/arrow/pull/1650#issuecomment-368247892
 
 
   @wesm I think using `kwargs` is the most pythonic way to do this. 
With pandas I also wondered at first about the large number of kwargs, 
but in the end it seems like a good-enough solution.


[jira] [Commented] (ARROW-2205) [Python] Option for integer object nulls

2018-02-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375665#comment-16375665
 ] 

ASF GitHub Bot commented on ARROW-2205:
---

wesm commented on issue #1650: ARROW-2205: [Python] Option for integer object 
nulls
URL: https://github.com/apache/arrow/pull/1650#issuecomment-368241657
 
 
   Thanks for working on this @adshieh! I am wondering out loud if there's 
anything we can do to help with the API for a growing number of pandas 
conversion arguments (like using an options object instead of keyword args)

