[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True
wesm commented on a change in pull request #7537: URL: https://github.com/apache/arrow/pull/7537#discussion_r445594104 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -254,14 +255,45 @@ bool PyFloat_IsNaN(PyObject* obj) { return PyFloat_Check(obj) && std::isnan(PyFloat_AsDouble(obj)); } +namespace { + +static std::once_flag pandas_static_initialized; +static PyTypeObject* pandas_NaTType = nullptr; + +void GetPandasStaticSymbols() { + OwnedRef pandas; + Status s = ImportModule("pandas", ); + if (!s.ok()) { +return; + } + + OwnedRef nat_value; + s = ImportFromModule(pandas.obj(), "NaT", _value); + if (!s.ok()) { +return; + } + PyObject* nat_type = PyObject_Type(nat_value.obj()); + pandas_NaTType = reinterpret_cast(nat_type); + + // PyObject_Type returns a new reference but we trust that pandas.NaT will + // outlive our use of this PyObject* + Py_DECREF(nat_type); +} Review comment: Sure, I'll add it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True
wesm commented on a change in pull request #7537: URL: https://github.com/apache/arrow/pull/7537#discussion_r445593841 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -254,14 +255,45 @@ bool PyFloat_IsNaN(PyObject* obj) { return PyFloat_Check(obj) && std::isnan(PyFloat_AsDouble(obj)); } +namespace { + +static std::once_flag pandas_static_initialized; +static PyTypeObject* pandas_NaTType = nullptr; + +void GetPandasStaticSymbols() { + OwnedRef pandas; + Status s = ImportModule("pandas", ); + if (!s.ok()) { +return; + } + + OwnedRef nat_value; + s = ImportFromModule(pandas.obj(), "NaT", _value); + if (!s.ok()) { +return; + } + PyObject* nat_type = PyObject_Type(nat_value.obj()); + pandas_NaTType = reinterpret_cast(nat_type); + + // PyObject_Type returns a new reference but we trust that pandas.NaT will + // outlive our use of this PyObject* + Py_DECREF(nat_type); +} + +} // namespace + +void InitPandasStaticData() { + std::call_once(pandas_static_initialized, GetPandasStaticSymbols); +} + bool PandasObjectIsNull(PyObject* obj) { if (!MayHaveNaN(obj)) { return false; } if (obj == Py_None) { return true; } - if (PyFloat_IsNaN(obj) || + if (PyFloat_IsNaN(obj) || (pandas_NaTType && PyObject_TypeCheck(obj, pandas_NaTType)) || Review comment: Right unfortunately you can have multiple instances of `pandas.NaT`... not great This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True
wesm commented on a change in pull request #7537: URL: https://github.com/apache/arrow/pull/7537#discussion_r445593243 ## File path: cpp/src/arrow/python/python_to_arrow.cc ## @@ -1171,6 +1171,12 @@ Status GetConverterFlat(const std::shared_ptr& type, bool strict_conve Status GetConverter(const std::shared_ptr& type, bool from_pandas, bool strict_conversions, std::unique_ptr* out) { + if (from_pandas) { +// ARROW-842: If pandas is not installed then null checks will be less +// comprehensive, but that is okay. +internal::InitPandasStaticData(); Review comment: I see. Well, the problem with putting the pandas initialization elsewhere is that we've already arranged for pandas to only be imported when it's needed, so if we did `import pandas` always when doing `import pyarrow` we'd be breaking that contract. I suggest we address this as follow up as needed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wesm commented on a change in pull request #7537: ARROW-842: [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True
wesm commented on a change in pull request #7537: URL: https://github.com/apache/arrow/pull/7537#discussion_r445221159 ## File path: cpp/src/arrow/python/helpers.cc ## @@ -254,14 +255,45 @@ bool PyFloat_IsNaN(PyObject* obj) { return PyFloat_Check(obj) && std::isnan(PyFloat_AsDouble(obj)); } +namespace { + +static std::once_flag pandas_static_initialized; +static OwnedRef pandas_NaTType; Review comment: @pitrou this seemed worrisome to me so could use your advice about how to import this symbol once and then have it persist. Another option is that I could immediately decref the object returned by `PyObject_Type` and put a `PyObject*` here with the near certainty that this object will remain alive until the end of the Python session This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org