[GitHub] [spark] pralabhkumar commented on a change in pull request #35191: [SPARK-37491][PYTHON]Fix Series.asof for unsorted values

GitBox Thu, 10 Mar 2022 09:21:45 -0800


pralabhkumar commented on a change in pull request #35191:
URL: https://github.com/apache/spark/pull/35191#discussion_r823967986




##########
File path: python/pyspark/pandas/series.py
##########
@@ -5228,9 +5229,20 @@ def asof(self, where: Union[Any, List]) -> Union[Scalar, "Series"]:
             where = [where]
         index_scol = self._internal.index_spark_columns[0]
         index_type = self._internal.spark_type_for(index_scol)
+
+        if np.nan in where:
+            # When `where` is np.nan, pandas returns the last index value.
+            last_index = self._internal.spark_frame.select(F.last(index_scol)).take(1)[0][0]
+            modified_where = [last_index if x is np.nan else x for x in where]
+        else:
+            modified_where = where
+
         cond = [
-            F.max(F.when(index_scol <= SF.lit(index).cast(index_type), self.spark.column))
-            for index in where
+            F.last(

Review comment:
   Hi @ueshin. Thanks for the suggestion. However, IMHO F.max_by may not
   work here.
   
   1) It will not ignore nulls, and it may not work across multiple
   columns. For example, suppose we have an input DF like the one below:
   
   
   +------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
   |CASE WHEN (__index_level_0__ <= CAST(5 AS BIGINT)) THEN Koalas END|CASE WHEN (__index_level_0__ <= CAST(20 AS BIGINT)) THEN Koalas END|__natural_order__|
   +------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
   |                                                              null|                                                                2.0|                0|
   |                                                              null|                                                                1.0|       8589934592|
   |                                                              null|                                                               null|      17179869184|
   |                                                              null|                                                               null|      25769803776|
   +------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
   
   What we need is the last non-null value in each column, or null if every
   value in the column is null. So the expected output (which is what F.last
   provides) is:
   
   
   
   +------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
   |CASE WHEN (__index_level_0__ <= CAST(5 AS BIGINT)) THEN Koalas END|CASE WHEN (__index_level_0__ <= CAST(20 AS BIGINT)) THEN Koalas END|__natural_order__|
   +------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
   |                                                              null|                                                                1.0|       8589934592|
   +------------------------------------------------------------------+-------------------------------------------------------------------+-----------------+
   
   Please let me know if I have misunderstood. In my earlier commits, I was
   able to achieve the same result with F.explode.
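
   To make the intended semantics concrete, here is a minimal pure-Python
   sketch of "last non-null value per column, else null". It only models
   the behaviour and is not the PR's code. The index values (10, 20, 30, 40),
   the values after the second row, and the helper names
   `case_when_column` and `last_ignoring_nulls` are assumptions made up for
   illustration; in PySpark the aggregation corresponds roughly to
   `F.last(col, ignorenulls=True)` over rows in natural order.

```python
from typing import Dict, List, Optional, Tuple

# (index value, series value) pairs, already in natural order.
# Hypothetical data chosen so that the first two rows match the tables above.
Row = Tuple[int, Optional[float]]
rows: List[Row] = [(10, 2.0), (20, 1.0), (30, 3.0), (40, 4.0)]


def case_when_column(data: List[Row], where: int) -> List[Optional[float]]:
    """Model of `CASE WHEN index <= where THEN value END` for each row."""
    return [value if index <= where else None for index, value in data]


def last_ignoring_nulls(column: List[Optional[float]]) -> Optional[float]:
    """Return the last non-null entry, or None if every entry is null."""
    for value in reversed(column):
        if value is not None:
            return value
    return None


# One conditional column per `where` value, as in the input DF above.
columns: Dict[int, List[Optional[float]]] = {
    w: case_when_column(rows, w) for w in (5, 20)
}
result = {w: last_ignoring_nulls(col) for w, col in columns.items()}
print(result)  # {5: None, 20: 1.0}
```

   With this data, the column for 5 is entirely null and stays null, while
   the column for 20 yields 1.0, the value from the last row whose index is
   at most 20.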
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


