ZygD created SPARK-38614:
----------------------------

             Summary: df.show(3) does not equal df.show() first rows
                 Key: SPARK-38614
                 URL: https://issues.apache.org/jira/browse/SPARK-38614
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.1
            Reporter: ZygD


*Minimal reproducible example*

```python
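from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

# percent_rank over all 101 rows, ordered by id (one unpartitioned window)
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)  # first 3 rows
df.show(5)  # first 5 rows; the first 3 should match the output above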
```

*Expected result*

```none
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
```

*Actual result*

```none
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
```
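
*Arithmetic check of the values above*

The following is a minimal pure-Python sketch, not Spark code, that recomputes the numbers in both results. It uses the standard definition of percent rank, `(rank - 1) / (n - 1)`, where `n` is the number of rows in the window. The helper name `percent_rank` and the row counts 4 and 6 are assumptions made for illustration: the actual values happen to match a window evaluated over only `n + 1` rows, which would be consistent with the row limit being applied before the window function, but the report itself does not confirm the cause.

```python
def percent_rank(values):
    """Return (rank - 1) / (n - 1) for each value, ranking in ascending
    order, with tied values sharing the lowest rank."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    # Map each distinct value to the 0-based position of its first occurrence
    # in sorted order; that position equals rank - 1.
    first_pos = {}
    for pos, v in enumerate(sorted(values)):
        first_pos.setdefault(v, pos)
    return [first_pos[v] / (n - 1) for v in values]


# Expected result: the window covers all 101 rows, then rows are cut off.
full_3 = percent_rank(list(range(101)))[:3]
full_5 = percent_rank(list(range(101)))[:5]

# Assumption: the window only sees n + 1 rows (4 for show(3), 6 for show(5)).
over_4 = percent_rank(list(range(4)))[:3]
over_6 = percent_rank(list(range(6)))[:5]

print(full_3, full_5)
print(over_4, over_6)
```

Under that assumption, `full_3` and `full_5` correspond to the expected tables, while `over_4` and `over_6` correspond to the actual ones.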



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
