Hi, I have a DataFrame which looks like this:

+--------+---+------+----+
|group_id| id|  text|type|
+--------+---+------+----+
|       1|  1|   one|   a|
|       1|  1|   two|   t|
|       1|  2| three|   a|
|       1|  2|  four|   t|
|       1|  5|  five|   a|
|       1|  6|   six|   t|
|       1|  7| seven|   a|
|       1|  9| eight|   t|
|       1|  9|  nine|   a|
|       1| 10|   ten|   t|
|       1| 11|eleven|   a|
+--------+---+------+----+

If I apply a Window operation that partitions by group_id and orders by id, will orderBy guarantee that rows which are already sorted retain their existing order? For example:

    from pyspark.sql import Window
    from pyspark.sql.functions import row_number

    window_spec = Window.partitionBy(df.group_id).orderBy(df.id)
    df = df.withColumn("row_number", row_number().over(window_spec))

Will the result always be as below?

+--------+---+------+----+----------+
|group_id| id|  text|type|row_number|
+--------+---+------+----+----------+
|       1|  1|   one|   a|         1|
|       1|  1|   two|   t|         2|
|       1|  2| three|   a|         3|
|       1|  2|  four|   t|         4|
|       1|  5|  five|   a|         5|
|       1|  6|   six|   t|         6|
|       1|  7| seven|   a|         7|
|       1|  9| eight|   t|         8|
|       1|  9|  nine|   a|         9|
|       1| 10|   ten|   t|        10|
|       1| 11|eleven|   a|        11|
+--------+---+------+----+----------+

In a nutshell, my question is: how does Spark's Window orderBy handle rows that are already sorted? My assumption is that it is stable, i.e. it doesn't change the relative order of rows that are already ordered, but I couldn't find anything about this in the documentation. How can I make sure that my assumption is correct?

I am using Python 3.5 and PySpark 2.3.1.

Thanks.
Muddasser
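
P.S. To make clear what I mean by "stable", here is a small plain-Python sketch (not Spark) using the same rows. Python's sorted() is documented as stable, so rows with an equal id keep their input order. Whether Spark's Window orderBy gives the same guarantee is exactly what I am unsure about. The tuple layout and variable names are only for illustration.

```python
# Plain Python, NOT Spark: illustrates what a "stable" ordering would mean here.
# Each tuple is (group_id, id, text, type), in the same order as my DataFrame.
rows = [
    (1, 1, "one", "a"),
    (1, 1, "two", "t"),
    (1, 2, "three", "a"),
    (1, 2, "four", "t"),
    (1, 5, "five", "a"),
    (1, 6, "six", "t"),
    (1, 7, "seven", "a"),
    (1, 9, "eight", "t"),
    (1, 9, "nine", "a"),
    (1, 10, "ten", "t"),
    (1, 11, "eleven", "a"),
]

# sorted() is stable: rows with the same id stay in their input order.
ordered = sorted(rows, key=lambda r: r[1])

# Append a 1-based position, the way row_number() would within one group.
numbered = [row + (n,) for n, row in enumerate(ordered, start=1)]

for row in numbered:
    print(row)
```

With a stable sort, "one" is always numbered before "two" and "eight" before "nine", which is the behaviour I am hoping Spark also guarantees.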