Pyspark Window orderBy

mhussain Tue, 16 Oct 2018 02:11:16 -0700

Hi,

I have a dataframe which looks like


+--------+---+------+----+
|group_id| id|  text|type|
+--------+---+------+----+
|       1|  1|   one|   a|
|       1|  1|   two|   t|
|       1|  2| three|   a|
|       1|  2|  four|   t|
|       1|  5|  five|   a|
|       1|  6|   six|   t|
|       1|  7| seven|   a|
|       1|  9| eight|   t|
|       1|  9|  nine|   a|
|       1| 10|   ten|   t|
|       1| 11|eleven|   a|
+--------+---+------+----+
If I apply a Window operation, partitioning by group_id and ordering by
id, will orderBy guarantee that rows which are already in sorted order keep
their relative order?

e.g.

from pyspark.sql import Window
from pyspark.sql.functions import row_number
window_spec = Window.partitionBy(df.group_id).orderBy(df.id)
df = df.withColumn("row_number", row_number().over(window_spec))
Will the result always be as below?

+--------+---+------+----+----------+
|group_id| id|  text|type|row_number|
+--------+---+------+----+----------+
|       1|  1|   one|   a|         1|
|       1|  1|   two|   t|         2|
|       1|  2| three|   a|         3|
|       1|  2|  four|   t|         4|
|       1|  5|  five|   a|         5|
|       1|  6|   six|   t|         6|
|       1|  7| seven|   a|         7|
|       1|  9| eight|   t|         8|
|       1|  9|  nine|   a|         9|
|       1| 10|   ten|   t|        10|
|       1| 11|eleven|   a|        11|
+--------+---+------+----+----------+
In a nutshell, my question is: how does Spark's Window orderBy handle rows
that are already ordered (sorted)? My assumption is that it is stable, i.e. it
does not change the relative order of rows that are already ordered, but I
could not find anything about this in the documentation. How can I verify
that my assumption is correct?
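
To make clear what I mean by "stable", here is a minimal plain-Python sketch
(no Spark involved). It uses Python's built-in sorted(), which is documented
to be stable, on the same rows as above. The names rows, stable, numbered and
tied_pairs are only for illustration. It shows the property I am assuming, and
says nothing about whether Spark's Window orderBy actually provides it.

```python
# Rows from the example above as (group_id, id, text, type) tuples,
# listed in the same order as in the first table.
rows = [
    (1, 1, "one", "a"),
    (1, 1, "two", "t"),
    (1, 2, "three", "a"),
    (1, 2, "four", "t"),
    (1, 5, "five", "a"),
    (1, 6, "six", "t"),
    (1, 7, "seven", "a"),
    (1, 9, "eight", "t"),
    (1, 9, "nine", "a"),
    (1, 10, "ten", "t"),
    (1, 11, "eleven", "a"),
]

# Sort by (group_id, id) only. Python's sorted() is stable, so rows that
# tie on the key keep their input order.
stable = sorted(rows, key=lambda r: (r[0], r[1]))

# Plain-Python stand-in for row_number(): number the sorted rows from 1.
numbered = [(row, n) for n, row in enumerate(stable, start=1)]

# Adjacent rows that tie on (group_id, id); these are the only rows whose
# relative order depends on the sort being stable.
tied_pairs = [
    (a[2], b[2])
    for a, b in zip(stable, stable[1:])
    if a[:2] == b[:2]
]

print(stable == rows)  # with a stable sort the input order is unchanged
print(tied_pairs)
```

The tied rows (id 1, 2 and 9) are exactly the ones my question is about: with
a stable sort they keep their input order, and I would like to know whether
the same holds for row_number() over the window above.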

I am using Python 3.5 and PySpark 2.3.1.

Thanks.
Muddasser


