[GitHub] [spark] ueshin commented on a change in pull request #32821: [SPARK-35342][PYTHON] Introduce DecimalOps and make `isnull` method data-type-based

GitBox Fri, 18 Jun 2021 14:20:26 -0700


ueshin commented on a change in pull request #32821:
URL: https://github.com/apache/spark/pull/32821#discussion_r654013813




##########
File path: python/pyspark/pandas/data_type_ops/base.py
##########
@@ -300,6 +303,13 @@ def prepare(self, col: pd.Series) -> pd.Series:
         """Prepare column when from_pandas."""
         return col.replace({np.nan: None})
 
+    def isnull(self, index_ops: Union["Index", "Series"]) -> Union["Series", "Index"]:
+        from pyspark.pandas.indexes import MultiIndex
+
+        if isinstance(index_ops, MultiIndex):
+            raise NotImplementedError("isna is not defined for MultiIndex")

Review comment:
       Shall we move this back to `base.py`?
   Checking whether it's a `MultiIndex` here feels weird, because dtype and
spark_type belong to a different layer than `MultiIndex`.

##########
File path: python/pyspark/pandas/data_type_ops/base.py
##########
@@ -300,6 +303,13 @@ def prepare(self, col: pd.Series) -> pd.Series:
         """Prepare column when from_pandas."""
         return col.replace({np.nan: None})
 
+    def isnull(self, index_ops: Union["Index", "Series"]) -> Union["Series", "Index"]:
+        from pyspark.pandas.indexes import MultiIndex
+
+        if isinstance(index_ops, MultiIndex):
+            raise NotImplementedError("isna is not defined for MultiIndex")
+        return index_ops._with_new_scol(index_ops.spark.column.isNull())

Review comment:
       Could you also take care of `field`?
   `field` must contain `dataType=BooleanType(), nullable=False`, e.g.,
   
   ```py
   return index_ops._with_new_scol(
       index_ops.spark.column.isNull(),
       field=index_ops._internal.data_fields[0].copy(
           spark_type=BooleanType(), nullable=False
       ),
   )
   ```
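
The reason the field should change can be shown with a small standalone sketch. It does not use pyspark; `ToyField` and `toy_isnull` are hypothetical stand-ins, and the only assumption carried over from the comment above is that a null check yields a boolean for every row and never a null, so the resulting metadata can be marked as a non-nullable boolean.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Tuple


@dataclass(frozen=True)
class ToyField:
    """Hypothetical stand-in for the field metadata attached to a column."""

    name: str
    spark_type: str
    nullable: bool

    def copy(self, **changes: Any) -> "ToyField":
        # Return a new field with only the given attributes overridden.
        return replace(self, **changes)


def toy_isnull(values: List[Any], field: ToyField) -> Tuple[List[bool], ToyField]:
    # Every input row maps to True or False, so the output has no nulls.
    flags = [v is None for v in values]
    # Keep the name, but record the new type and nullability.
    return flags, field.copy(spark_type="boolean", nullable=False)


flags, new_field = toy_isnull([1.5, None, 3.0], ToyField("x", "double", True))
```

Without the override, the result would keep describing itself as a nullable `double`, which no longer matches the data it holds.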

##########
File path: python/pyspark/pandas/data_type_ops/base.py
##########
@@ -300,6 +303,13 @@ def prepare(self, col: pd.Series) -> pd.Series:
         """Prepare column when from_pandas."""
         return col.replace({np.nan: None})
 
+    def isnull(self, index_ops: Union["Index", "Series"]) -> Union["Series", "Index"]:
+        from pyspark.pandas.indexes import MultiIndex
+
+        if isinstance(index_ops, MultiIndex):
+            raise NotImplementedError("isna is not defined for MultiIndex")

Review comment:
       Shall we move this back to `base.py`?
   Checking whether it's a `MultiIndex` here feels weird, because dtype and
spark_type belong to a different layer than `MultiIndex`.
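
One way to picture the layering concern is the toy sketch below. All class names are hypothetical and none of them come from pyspark; it only assumes that the container layer (Series, Index, MultiIndex) delegates to a dtype-specific ops object, so the container-kind check stays with the caller and the ops object only reasons about values.

```python
from typing import Any, List


class ToyDataTypeOps:
    """Hypothetical dtype-level layer: knows about values, not containers."""

    def isnull(self, values: List[Any]) -> List[bool]:
        return [v is None for v in values]


class ToyIndexOps:
    """Hypothetical container-level layer shared by Series-like and Index-like objects."""

    def __init__(self, values: List[Any]) -> None:
        self.values = list(values)
        self._dtype_ops = ToyDataTypeOps()

    def isnull(self) -> List[bool]:
        # The container-kind check lives here, before delegating by dtype.
        if isinstance(self, ToyMultiIndex):
            raise NotImplementedError("isna is not defined for MultiIndex")
        return self._dtype_ops.isnull(self.values)


class ToySeries(ToyIndexOps):
    pass


class ToyMultiIndex(ToyIndexOps):
    pass


series_flags = ToySeries([1, None]).isnull()
```

With this split, `ToyDataTypeOps` never needs to import or inspect the container classes.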

##########
File path: python/pyspark/pandas/data_type_ops/base.py
##########
@@ -300,6 +303,13 @@ def prepare(self, col: pd.Series) -> pd.Series:
         """Prepare column when from_pandas."""
         return col.replace({np.nan: None})
 
+    def isnull(self, index_ops: Union["Index", "Series"]) -> Union["Series", "Index"]:
+        from pyspark.pandas.indexes import MultiIndex
+
+        if isinstance(index_ops, MultiIndex):
+            raise NotImplementedError("isna is not defined for MultiIndex")
+        return index_ops._with_new_scol(index_ops.spark.column.isNull())

Review comment:
       Ah, I see. I'll take a look later. Thanks.




