pralabhkumar commented on a change in pull request #35191:
URL: https://github.com/apache/spark/pull/35191#discussion_r823967986
##########
File path: python/pyspark/pandas/series.py
##########
@@ -5228,9 +5229,20 @@ def asof(self, where: Union[Any, List]) -> Union[Scalar, "Series"]:
where = [where]
index_scol = self._internal.index_spark_columns[0]
index_type = self._internal.spark_type_for(index_scol)
+
+ if np.nan in where:
+ # When `where` is np.nan, pandas returns the last index value.
+ last_index = self._internal.spark_frame.select(F.last(index_scol)).take(1)[0][0]
+ modified_where = [last_index if x is np.nan else x for x in where]
+ else:
+ modified_where = where
+
cond = [
- F.max(F.when(index_scol <= SF.lit(index).cast(index_type), self.spark.column))
- for index in where
+ F.last(
Review comment:
Hi @ueshin. Thanks for the suggestion. However, IMHO F.max_by may not work.
1) It will not ignore nulls, and it may also not work on multiple columns.
For example, suppose we have an input DF like the one below:
+------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
|CASE WHEN (__index_level_0__ <= CAST(5 AS BIGINT)) THEN Koalas END|CASE WHEN (__index_level_0__ <= CAST(20 AS BIGINT)) THEN Koalas END|__natural_order__|
+------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
|                                                              null|                                                                2.0|                0|
|                                                              null|                                                                1.0|       8589934592|
|                                                              null|                                                               null|      17179869184|
|                                                              null|                                                               null|      25769803776|
+------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
What we need is the last non-null value in each column, or null if the column
is entirely null. So the expected output (which is what F.last provides) is:
+------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
|CASE WHEN (__index_level_0__ <= CAST(5 AS BIGINT)) THEN Koalas END|CASE WHEN (__index_level_0__ <= CAST(20 AS BIGINT)) THEN Koalas END|__natural_order__|
+------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
|                                                              null|                                                                1.0|       8589934592|
+------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
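To make the required semantics concrete, here is a minimal plain-Python sketch (not Spark code, and not the PR's implementation). The names `rows`, `last_non_null`, and `value_at_max_order` are hypothetical, and `value_at_max_order` only emulates my assumption about how a max_by-style pick behaves, i.e. it returns the value on the row with the largest ordering key without skipping nulls.

```python
from typing import Any, List, Optional, Tuple

# Hypothetical rows mirroring the input DF above:
# (value where index <= 5, value where index <= 20, __natural_order__)
rows: List[Tuple[Optional[float], Optional[float], int]] = [
    (None, 2.0, 0),
    (None, 1.0, 8589934592),
    (None, None, 17179869184),
    (None, None, 25769803776),
]


def last_non_null(values: List[Optional[Any]]) -> Optional[Any]:
    """Return the last non-null value, or None if every value is null."""
    result = None
    for v in values:
        if v is not None:
            result = v
    return result


def value_at_max_order(values: List[Optional[Any]], order: List[int]) -> Optional[Any]:
    """Emulate a max_by-style pick: the value on the row with the largest
    ordering key, with nulls NOT skipped (assumed behavior)."""
    idx = max(range(len(order)), key=order.__getitem__)
    return values[idx]


# Process rows in natural order, as the aggregation is expected to.
ordered = sorted(rows, key=lambda r: r[2])
col_le_5 = [r[0] for r in ordered]
col_le_20 = [r[1] for r in ordered]
order = [r[2] for r in ordered]

print(last_non_null(col_le_5), last_non_null(col_le_20))  # None 1.0
print(value_at_max_order(col_le_20, order))               # None
```

In this sketch the second column yields 1.0 only when nulls are skipped; picking by the largest `__natural_order__` lands on a null row.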
Please let me know if I have misunderstood. In my earlier commits, I was
able to achieve the same result with F.explode.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]