[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True

2020-06-25 Thread GitBox


wesm commented on a change in pull request #7537:
URL: https://github.com/apache/arrow/pull/7537#discussion_r445594104



##
File path: cpp/src/arrow/python/helpers.cc
##
@@ -254,14 +255,45 @@ bool PyFloat_IsNaN(PyObject* obj) {
   return PyFloat_Check(obj) && std::isnan(PyFloat_AsDouble(obj));
 }
 
+namespace {
+
+static std::once_flag pandas_static_initialized;
+static PyTypeObject* pandas_NaTType = nullptr;
+
+void GetPandasStaticSymbols() {
+  OwnedRef pandas;
+  Status s = ImportModule("pandas", );
+  if (!s.ok()) {
+return;
+  }
+
+  OwnedRef nat_value;
+  s = ImportFromModule(pandas.obj(), "NaT", _value);
+  if (!s.ok()) {
+return;
+  }
+  PyObject* nat_type = PyObject_Type(nat_value.obj());
+  pandas_NaTType = reinterpret_cast(nat_type);
+
+  // PyObject_Type returns a new reference but we trust that pandas.NaT will
+  // outlive our use of this PyObject*
+  Py_DECREF(nat_type);
+}

Review comment:
   Sure, I'll add it





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True

2020-06-25 Thread GitBox


wesm commented on a change in pull request #7537:
URL: https://github.com/apache/arrow/pull/7537#discussion_r445593841



##
File path: cpp/src/arrow/python/helpers.cc
##
@@ -254,14 +255,45 @@ bool PyFloat_IsNaN(PyObject* obj) {
   return PyFloat_Check(obj) && std::isnan(PyFloat_AsDouble(obj));
 }
 
+namespace {
+
+static std::once_flag pandas_static_initialized;
+static PyTypeObject* pandas_NaTType = nullptr;
+
+void GetPandasStaticSymbols() {
+  OwnedRef pandas;
+  Status s = ImportModule("pandas", );
+  if (!s.ok()) {
+return;
+  }
+
+  OwnedRef nat_value;
+  s = ImportFromModule(pandas.obj(), "NaT", _value);
+  if (!s.ok()) {
+return;
+  }
+  PyObject* nat_type = PyObject_Type(nat_value.obj());
+  pandas_NaTType = reinterpret_cast(nat_type);
+
+  // PyObject_Type returns a new reference but we trust that pandas.NaT will
+  // outlive our use of this PyObject*
+  Py_DECREF(nat_type);
+}
+
+}  // namespace
+
+void InitPandasStaticData() {
+  std::call_once(pandas_static_initialized, GetPandasStaticSymbols);
+}
+
 bool PandasObjectIsNull(PyObject* obj) {
   if (!MayHaveNaN(obj)) {
 return false;
   }
   if (obj == Py_None) {
 return true;
   }
-  if (PyFloat_IsNaN(obj) ||
+  if (PyFloat_IsNaN(obj) || (pandas_NaTType && PyObject_TypeCheck(obj, 
pandas_NaTType)) ||

Review comment:
   Right unfortunately you can have multiple instances of `pandas.NaT`... 
not great





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True

2020-06-25 Thread GitBox


wesm commented on a change in pull request #7537:
URL: https://github.com/apache/arrow/pull/7537#discussion_r445593243



##
File path: cpp/src/arrow/python/python_to_arrow.cc
##
@@ -1171,6 +1171,12 @@ Status GetConverterFlat(const std::shared_ptr& 
type, bool strict_conve
 
 Status GetConverter(const std::shared_ptr& type, bool from_pandas,
 bool strict_conversions, std::unique_ptr* 
out) {
+  if (from_pandas) {
+// ARROW-842: If pandas is not installed then null checks will be less
+// comprehensive, but that is okay.
+internal::InitPandasStaticData();

Review comment:
   I see. Well, the problem with putting the pandas initialization 
elsewhere is that we've already arranged for pandas to only be imported when 
it's needed, so if we did `import pandas` always when doing `import pyarrow` 
we'd be breaking that contract. I suggest we address this as follow up as needed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True

2020-06-24 Thread GitBox


wesm commented on a change in pull request #7537:
URL: https://github.com/apache/arrow/pull/7537#discussion_r445221159



##
File path: cpp/src/arrow/python/helpers.cc
##
@@ -254,14 +255,45 @@ bool PyFloat_IsNaN(PyObject* obj) {
   return PyFloat_Check(obj) && std::isnan(PyFloat_AsDouble(obj));
 }
 
+namespace {
+
+static std::once_flag pandas_static_initialized;
+static OwnedRef pandas_NaTType;

Review comment:
   @pitrou this seemed worrisome to me so could use your advice about how 
to import this symbol once and then have it persist. Another option is that I 
could immediately decref the object returned by `PyObject_Type` and put a 
`PyObject*` here with the near certainty that this object will remain alive 
until the end of the Python session





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org