ZygD created SPARK-38614:
----------------------------

             Summary: df.show(3) does not equal df.show() first rows
                 Key: SPARK-38614
                 URL: https://issues.apache.org/jira/browse/SPARK-38614
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.1
            Reporter: ZygD

*Minimal reproducible example*
```python
from pyspark.sql import SparkSession, functions as F, Window as W

spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
```

*Expected result*
```none
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
```

*Actual result*
```none
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
```
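
*Note on the numbers*

The values in the actual result match what `percent_rank` would return if it were evaluated over only the first n + 1 rows (4 rows for `show(3)`, 6 rows for `show(5)`) instead of all 101 rows. The sketch below checks that arithmetic in plain Python, without Spark. It assumes the usual definition `(rank - 1) / (rows_in_partition - 1)`; the helper name `percent_rank` and the n + 1 truncation are illustrative assumptions, not something confirmed by this report.

```python
def percent_rank(values):
    """percent_rank for distinct, already sorted values: (rank - 1) / (n - 1).

    Hypothetical helper for illustration only, not the Spark implementation.
    """
    n = len(values)
    if n <= 1:
        return [0.0] * n
    # With distinct sorted values, the 1-based rank of item i is i + 1.
    return [i / (n - 1) for i in range(n)]


ids = list(range(101))  # same ids as spark.range(101)

# Window evaluated over all 101 rows, then truncated for display.
full = percent_rank(ids)

# Window evaluated over only the first n + 1 rows (assumed truncation).
truncated_4 = percent_rank(ids[:4])  # n = 3
truncated_6 = percent_rank(ids[:6])  # n = 5

print(full[:5])         # compare with the expected result
print(truncated_4[:3])  # compare with the actual result of df.show(3)
print(truncated_6[:5])  # compare with the actual result of df.show(5)
```

If this reading is right, the row limit is being applied before the window function runs, which would change the denominator of `percent_rank` and produce the results shown above.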