[GitHub] [spark] bzhaoopenstack commented on a diff in pull request #37235: [SPARK-39824][PYTHON][PS] Introduce index where and putmask func in pyspark

GitBox Wed, 20 Jul 2022 20:36:34 -0700


bzhaoopenstack commented on code in PR #37235:
URL: https://github.com/apache/spark/pull/37235#discussion_r926222847



##########
python/pyspark/pandas/tests/indexes/test_base.py:
##########
@@ -2511,6 +2511,36 @@ def test_drop_level(self):
         ):
             psmidx.droplevel(-3)
 
+    def test_where_putmask(self):
+        pidx = pd.Index([1, 2, 3, 4])
+        psidx = ps.from_pandas(pidx)
+    
+        # where and putmask with default inserted value np.nan
+        self.assert_eq(
+            pidx.where(pidx > 2),
+            psidx.where(psidx > 2)
+        )
+        self.assert_eq(
+            pidx.putmask(pidx > 2, 99),
+            psidx.putmask(psidx > 2, 99)
+        )
+    
+        # where and putmask with isin func
+        self.assert_eq(
+            pidx.where(pidx.isin([1, 2])),
+            psidx.where(psidx.isin([1, 2]))
+        )
+        self.assert_eq(
+            pidx.putmask(pidx.isin([1, 2]), 99),
+            psidx.putmask(psidx.isin([1, 2]), 99)
+        )

Review Comment:
   OK, I will add this case to the unit tests. Both pandas and pandas-on-Spark return the same result:
   
   ```
   >>> (pidx + 1).putmask((pidx+1).isin([1,2]), 99)
   Int64Index([99, 3, 4, 5], dtype='int64')
   >>> (psidx + 1).putmask((psidx + 1).isin([1, 2]), 99)
   /home/spark/spark/python/pyspark/pandas/utils.py:976: PandasAPIOnSparkAdviceWarning: `to_list` loads all data into the driver's memory. It should only be used if the resulting list is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   /home/spark/spark/python/pyspark/pandas/indexes/base.py:636: UserWarning: We recommend using `Int64Index.to_numpy()` instead.
     warnings.warn("We recommend using `{}.to_numpy()` instead.".format(type(self).__name__))
   /home/spark/spark/python/pyspark/pandas/utils.py:976: PandasAPIOnSparkAdviceWarning: `to_numpy` loads all data into the driver's memory. It should only be used if the resulting NumPy ndarray is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   /home/spark/spark/python/pyspark/pandas/indexes/base.py:636: UserWarning: We recommend using `Int64Index.to_numpy()` instead.
     warnings.warn("We recommend using `{}.to_numpy()` instead.".format(type(self).__name__))
   /home/spark/spark/python/pyspark/pandas/utils.py:976: PandasAPIOnSparkAdviceWarning: `to_numpy` loads all data into the driver's memory. It should only be used if the resulting NumPy ndarray is expected to be small.
     warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   /home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
     fields = [
   /home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
     for column, series in pdf.iteritems():
   Int64Index([99, 3, 4, 5], dtype='int64')
   
   ```
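
For reference, the semantics being compared in the session above can be sketched in plain Python. This is only an illustration under stated assumptions: `index_where` and `index_putmask` are hypothetical helper names, not pandas or pyspark API, and plain lists stand in for the index objects. The point is that `where` keeps values where the mask is True, while `putmask` replaces values where the mask is True.

```python
import math


def index_where(values, mask, other=math.nan):
    """Hypothetical helper: keep values where mask is True, use `other` elsewhere."""
    return [v if keep else other for v, keep in zip(values, mask)]


def index_putmask(values, mask, value):
    """Hypothetical helper: replace values where mask is True, keep the rest."""
    return [value if hit else v for v, hit in zip(values, mask)]


# Plain-list stand-ins for (pidx + 1) and (pidx + 1).isin([1, 2]).
shifted = [v + 1 for v in [1, 2, 3, 4]]
mask = [v in (1, 2) for v in shifted]

print(index_putmask(shifted, mask, 99))  # [99, 3, 4, 5]
print(index_where(shifted, mask))        # [2, nan, nan, nan]
```

The `putmask` line reproduces the values shown in the session above. Unlike this list-based sketch, a real pandas integer index is upcast to a float dtype when `where` fills in NaN.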




