bzhaoopenstack commented on code in PR #37235:
URL: https://github.com/apache/spark/pull/37235#discussion_r926222847
##########
python/pyspark/pandas/tests/indexes/test_base.py:
##########
@@ -2511,6 +2511,36 @@ def test_drop_level(self):
         ):
             psmidx.droplevel(-3)
+    def test_where_putmask(self):
+        pidx = pd.Index([1, 2, 3, 4])
+        psidx = ps.from_pandas(pidx)
+
+        # where and putmask with default inserted value np.nan
+        self.assert_eq(
+            pidx.where(pidx > 2),
+            psidx.where(psidx > 2)
+        )
+        self.assert_eq(
+            pidx.putmask(pidx > 2, 99),
+            psidx.putmask(psidx > 2, 99)
+        )
+
+        # where and putmask with isin func
+        self.assert_eq(
+            pidx.where(pidx.isin([1, 2])),
+            psidx.where(psidx.isin([1, 2]))
+        )
+        self.assert_eq(
+            pidx.putmask(pidx.isin([1, 2]), 99),
+            psidx.putmask(psidx.isin([1, 2]), 99)
+        )
Review Comment:
OK. I will add this case to the UTs.
```
>>> (pidx + 1).putmask((pidx+1).isin([1,2]), 99)
Int64Index([99, 3, 4, 5], dtype='int64')
>>> (psidx + 1).putmask((psidx + 1).isin([1, 2]), 99)
/home/spark/spark/python/pyspark/pandas/utils.py:976: PandasAPIOnSparkAdviceWarning: `to_list` loads all data into the driver's memory. It should only be used if the resulting list is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/home/spark/spark/python/pyspark/pandas/indexes/base.py:636: UserWarning: We recommend using `Int64Index.to_numpy()` instead.
  warnings.warn("We recommend using `{}.to_numpy()` instead.".format(type(self).__name__))
/home/spark/spark/python/pyspark/pandas/utils.py:976: PandasAPIOnSparkAdviceWarning: `to_numpy` loads all data into the driver's memory. It should only be used if the resulting NumPy ndarray is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/home/spark/spark/python/pyspark/pandas/indexes/base.py:636: UserWarning: We recommend using `Int64Index.to_numpy()` instead.
  warnings.warn("We recommend using `{}.to_numpy()` instead.".format(type(self).__name__))
/home/spark/spark/python/pyspark/pandas/utils.py:976: PandasAPIOnSparkAdviceWarning: `to_numpy` loads all data into the driver's memory. It should only be used if the resulting NumPy ndarray is expected to be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
/home/spark/spark/python/pyspark/pandas/internal.py:1573: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  fields = [
/home/spark/spark/python/pyspark/sql/pandas/conversion.py:486: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
  for column, series in pdf.iteritems():
Int64Index([99, 3, 4, 5], dtype='int64')
```
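The session above masks a derived index, `(pidx + 1)`, which is the extra case for the UTs. As a minimal sketch, assuming plain pandas only (no Spark session), the assertions below reproduce the `putmask` result from the session and check a relationship between the two methods that is worth covering: `where` replaces values where the condition is False, `putmask` replaces values where the mask is True, so `where(cond, v)` should match `putmask(~cond, v)`. The variable names are illustrative and this is not the test from the PR; a pandas-on-Spark version would wrap the same data with `ps.from_pandas` and compare with `self.assert_eq`.

```python
import math

import pandas as pd

# Same data as the test above; plain pandas, no Spark session needed.
pidx = pd.Index([1, 2, 3, 4])
shifted = pidx + 1  # values [2, 3, 4, 5], as in the REPL session

# putmask replaces the values where the mask is True.
put = shifted.putmask(shifted.isin([1, 2]), 99)
print(put.tolist())  # [99, 3, 4, 5]

# where keeps the values where the condition is True and replaces the rest,
# so where(cond, v) and putmask(~cond, v) should produce the same values.
cond = pidx > 2
assert pidx.where(cond, 99).equals(pidx.putmask(~cond, 99))

# With no `other`, where fills with NaN, so the integer index is upcast to float.
filled = pidx.where(cond)
assert math.isnan(filled[0]) and math.isnan(filled[1])
print(filled.tolist())
```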
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]