[jira] [Commented] (SPARK-24042) High-order function: zip_with_index

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770659#comment-17770659
 ] 

ZygD commented on SPARK-24042:
--

[~Tagar] has incorrectly linked another issue to this one. Even though this one is 
resolved, the other one is not. Can we "unlink" the issue SPARK-23074?

> High-order function: zip_with_index
> ---
>
>                 Key: SPARK-24042
>                 URL: https://issues.apache.org/jira/browse/SPARK-24042
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Marek Novotny
>            Priority: Major
>
> Implement function {{zip_with_index(array[, indexFirst])}} that transforms 
> the input array by encapsulating elements into pairs with indexes indicating 
> the order.
> Examples:
> {{zip_with_index(array("d", "a", null, "b")) => 
> [("d",0),("a",1),(null,2),("b",3)]}}
> {{zip_with_index(array("d", "a", null, "b"), true) => 
> [(0,"d"),(1,"a"),(2,null),(3,"b")]}}
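The behaviour requested above can be sketched in plain Python. This only illustrates the semantics of the two examples and is not Spark's implementation; the `index_first` parameter name and the list-of-tuples representation are assumptions made for the sketch.

```python
from typing import Any, List, Tuple


def zip_with_index(array: List[Any], index_first: bool = False) -> List[Tuple[Any, Any]]:
    """Pair each element with its 0-based position in the input list."""
    if index_first:
        # (index, element) pairs, as in the second example of the proposal.
        return [(i, x) for i, x in enumerate(array)]
    # (element, index) pairs, the default ordering in the proposal.
    return [(x, i) for i, x in enumerate(array)]


print(zip_with_index(["d", "a", None, "b"]))
print(zip_with_index(["d", "a", None, "b"], True))
```

Null elements keep their position, so the index always reflects the original order of the array.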



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:58 AM:
---

[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. [The linked 
closed issue|https://issues.apache.org/jira/browse/SPARK-24042] is about 
arrays, while this is not. 


was (Author: JIRAUSER286869):
[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. [The linked 
closed|https://issues.apache.org/jira/browse/SPARK-24042] issue is about 
arrays, while this is not. 

> Dataframe-ified zipwithindex
> 
>
>                 Key: SPARK-23074
>                 URL: https://issues.apache.org/jira/browse/SPARK-23074
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Minor
>              Labels: bulk-closed, dataframe, rdd
>
> Would be great to have a DataFrame-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
> def dfZipWithIndex(
>   df: DataFrame,
>   offset: Int = 1,
>   colName: String = "id",
>   inFront: Boolean = true
> ) : DataFrame = {
>   df.sqlContext.createDataFrame(
>     // Prepend or append the offset row index to every row.
>     df.rdd.zipWithIndex.map(ln =>
>       Row.fromSeq(
>         (if (inFront) Seq(ln._2 + offset) else Seq())
>           ++ ln._1.toSeq ++
>         (if (inFront) Seq() else Seq(ln._2 + offset))
>       )
>     ),
>     // Extend the schema with a non-nullable LongType index column.
>     StructType(
>       (if (inFront) Array(StructField(colName, LongType, false)) else Array[StructField]())
>         ++ df.schema.fields ++
>       (if (inFront) Array[StructField]() else Array(StructField(colName, LongType, false)))
>     )
>   )
> }
> {code}
> credits: 
> [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]
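To make the intent of the helper above concrete, here is a plain-Python model of the same transformation with no Spark involved. It is a sketch under stated assumptions: rows are modelled as tuples and the schema as a list of column names, and the function name `zip_rows_with_index` is hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def zip_rows_with_index(
    rows: Sequence[Tuple[Any, ...]],
    columns: List[str],
    offset: int = 1,
    col_name: str = "id",
    in_front: bool = True,
) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Return (new_columns, new_rows) with a consecutive index column added."""
    if in_front:
        # Index column goes first, mirroring inFront = true in the Scala helper.
        new_columns = [col_name] + columns
        new_rows = [(i + offset,) + tuple(r) for i, r in enumerate(rows)]
    else:
        # Index column goes last.
        new_columns = columns + [col_name]
        new_rows = [tuple(r) + (i + offset,) for i, r in enumerate(rows)]
    return new_columns, new_rows


cols, data = zip_rows_with_index([("a", 10), ("b", 20)], ["name", "value"])
print(cols)  # ['id', 'name', 'value']
print(data)  # [(1, 'a', 10), (2, 'b', 20)]
```

The point of the request is that the index is consecutive across the whole table, which is what distinguishes it from the array function in SPARK-24042.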






[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:57 AM:
---

[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. [The linked 
closed|https://issues.apache.org/jira/browse/SPARK-24042] issue is about 
arrays, while this is not. 


was (Author: JIRAUSER286869):
[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. The linked closed 
issue is about arrays, while this is not. 




[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:54 AM:
---

[~gurwls223] [~Tagar] 
The problem is {*}not solved{*}! This was incorrectly closed. The linked closed 
issue is about arrays, while this is not. 


was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. [The linked closed 
issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and 
this is not. 




[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:51 AM:
---

The problem is not solved! This was incorrectly closed. The linked closed issue 
is about arrays, and this is not. 


was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. [The linked 
issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and 
this is not. 




[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD edited comment on SPARK-23074 at 9/30/23 7:51 AM:
---

The problem is not solved! This was incorrectly closed. [The linked closed 
issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and 
this is not. 


was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. The linked closed issue 
is about arrays, and this is not. 




[jira] [Commented] (SPARK-23074) Dataframe-ified zipwithindex

2023-09-30 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658
 ] 

ZygD commented on SPARK-23074:
--

The problem is not solved! This was incorrectly closed. [The linked 
issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and 
this is not. 




[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results

2022-06-16 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Description: 
Expected result is obtained using Spark 3.1.2, but not 3.2.0, 3.2.1 or 3.3.0.

*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5) {code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows{code}

  was:
Expected result is obtained using Spark 3.1.2, but not 3.2.0 or 3.2.1



> After Spark update, df.show() shows incorrect F.percent_rank results
> 
>
>                 Key: SPARK-38614
>                 URL: https://issues.apache.org/jira/browse/SPARK-38614
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0
>            Reporter: ZygD
>            Priority: Major
>              Labels: correctness
>
> Expected result is obtained using Spark 3.1.2, but not 3.2.0, 3.2.1 or 3.3.0.
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>  
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5) {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code:java}
> +---+------------------+
> | id|                pr|
> +---+------------------+
> |  0|               0.0|
> |  1|0.3333333333333333|
> |  2|0.6666666666666666|
> +---+------------------+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows{code}
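For reference, percent_rank is defined as (rank - 1) / (rows in the partition - 1). The plain-Python sketch below evaluates that formula for different partition sizes; the helper name is hypothetical, and it assumes distinct values in a single partition, as in the reproducer. It shows that the expected output matches a window over all 101 rows, whereas steps of 0.2 are what the formula yields for a window of only six rows. This is an observation about the numbers, not a confirmed diagnosis of the Spark behaviour.

```python
from typing import List


def percent_rank(n_rows: int) -> List[float]:
    """percent_rank for n_rows distinct, sorted values: (rank - 1) / (n_rows - 1)."""
    if n_rows <= 1:
        # A single-row (or empty) partition has percent_rank 0.0 by convention.
        return [0.0] * n_rows
    return [(rank - 1) / (n_rows - 1) for rank in range(1, n_rows + 1)]


print(percent_rank(101)[:5])  # window over all 101 rows
print(percent_rank(6)[:5])    # window over only 6 rows
print(percent_rank(4)[:3])    # window over only 4 rows
```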






[jira] [Commented] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results

2022-06-16 Thread ZygD (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555124#comment-17555124
 ] 

ZygD commented on SPARK-38614:
--

Just checked with the new 3.3.0 release. The error still persists.




[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results

2022-06-16 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Affects Version/s: 3.3.0




[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results

2022-04-01 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Component/s: SQL




[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results

2022-03-24 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Summary: After Spark update, df.show() shows incorrect F.percent_rank 
results  (was: df.show() shows incorrect F.percent_rank results)




[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results

2022-03-21 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Description: 
Expected result is obtained using Spark 3.1.1, but not 3.2.1


  was:
Expected result is obtained using Spark 3.0.2, but not 3.2.1






[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results

2022-03-21 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Description: 
Expected result is obtained using Spark 3.1.2, but not 3.2.0 or 3.2.1


  was:
Expected result is obtained using Spark 3.1.1, but not 3.2.1






[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results

2022-03-21 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Affects Version/s: 3.2.0




[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results

2022-03-21 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Labels: correctness  (was: )

> df.show() shows incorrect F.percent_rank results
> 
>
> Key: SPARK-38614
> URL: https://issues.apache.org/jira/browse/SPARK-38614
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: ZygD
>Priority: Major
>  Labels: correctness
>
> Expected result is obtained using Spark 3.0.2, but not 3.2.1
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>  
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5) {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code:java}
> +---+--+
> | id|pr|
> +---+--+
> |  0|   0.0|
> |  1|0.|
> |  2|0.|
> +---+--+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows{code}






[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results

2022-03-21 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Summary: df.show() shows incorrect F.percent_rank results  (was: df.show(3) 
does not equal df.show() first rows)

> df.show() shows incorrect F.percent_rank results
> 
>
> Key: SPARK-38614
> URL: https://issues.apache.org/jira/browse/SPARK-38614
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: ZygD
>Priority: Major
>
> Expected result is obtained using Spark 3.0.2, but not 3.2.1
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>  
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5) {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code:java}
> +---+--+
> | id|pr|
> +---+--+
> |  0|   0.0|
> |  1|0.|
> |  2|0.|
> +---+--+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows{code}






[jira] [Updated] (SPARK-38614) df.show(3) does not equal df.show() first rows

2022-03-21 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Description: 
Expected result is obtained using Spark 3.0.2, but not 3.2.1

*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5) {code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows{code}
*Actual result*
{code:java}
+---+--+
| id|pr|
+---+--+
|  0|   0.0|
|  1|0.|
|  2|0.|
+---+--+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows{code}

  was:
*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5) {code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows{code}
*Actual result*
{code:java}
+---+--+
| id|pr|
+---+--+
|  0|   0.0|
|  1|0.|
|  2|0.|
+---+--+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows{code}


> df.show(3) does not equal df.show() first rows
> --
>
> Key: SPARK-38614
> URL: https://issues.apache.org/jira/browse/SPARK-38614
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: ZygD
>Priority: Major
>
> Expected result is obtained using Spark 3.0.2, but not 3.2.1
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>  
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5) {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code:java}
> +---+--+
> | id|pr|
> +---+--+
> |  0|   0.0|
> |  1|0.|
> |  2|0.|
> +---+--+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows{code}






[jira] [Updated] (SPARK-38614) df.show(3) does not equal df.show() first rows

2022-03-21 Thread ZygD (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZygD updated SPARK-38614:
-
Description: 
*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5) {code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows{code}
*Actual result*
{code:java}
+---+--+
| id|pr|
+---+--+
|  0|   0.0|
|  1|0.|
|  2|0.|
+---+--+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows{code}

  was:
*Minimal reproducible example*

```python
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
```

*Expected result*

```none

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows

```

*Actual result*

```none

+---+--+
| id|pr|
+---+--+
|  0|   0.0|
|  1|0.|
|  2|0.|
+---+--+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows

```


> df.show(3) does not equal df.show() first rows
> --
>
> Key: SPARK-38614
> URL: https://issues.apache.org/jira/browse/SPARK-38614
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: ZygD
>Priority: Major
>
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>  
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5) {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code:java}
> +---+--+
> | id|pr|
> +---+--+
> |  0|   0.0|
> |  1|0.|
> |  2|0.|
> +---+--+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows{code}






[jira] [Created] (SPARK-38614) df.show(3) does not equal df.show() first rows

2022-03-21 Thread ZygD (Jira)
ZygD created SPARK-38614:


 Summary: df.show(3) does not equal df.show() first rows
 Key: SPARK-38614
 URL: https://issues.apache.org/jira/browse/SPARK-38614
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.2.1
Reporter: ZygD


*Minimal reproducible example*

```python
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
```

*Expected result*

```none

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows

```

*Actual result*

```none

+---+--+
| id|pr|
+---+--+
|  0|   0.0|
|  1|0.|
|  2|0.|
+---+--+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows

```
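
*Note on the numbers*

For anyone comparing the two outputs: percent_rank is defined as (rank - 1) / (rows in the window partition - 1). The pure-Python sketch below evaluates that formula without Spark. The helper name `percent_rank` is hypothetical and is not the PySpark function. It shows that the expected values match a 101-row window, while the 0.2 steps in the actual `df.show(5)` output are what the formula gives for a window of only 6 rows. This is an arithmetic observation only, not a confirmed diagnosis of the cause.

```python
def percent_rank(n_rows):
    """Return (rank - 1) / (n_rows - 1) for ranks 1..n_rows (distinct, ordered values)."""
    if n_rows < 2:
        # With fewer than two rows the only possible percent rank is 0.0.
        return [0.0] * n_rows
    return [(rank - 1) / (n_rows - 1) for rank in range(1, n_rows + 1)]


# First five values when the window spans all 101 rows of spark.range(101).
full_window = [round(v, 2) for v in percent_rank(101)[:5]]

# First five values when the window spans only 6 rows (an assumed size for illustration).
six_row_window = [round(v, 2) for v in percent_rank(6)[:5]]

print(full_window)     # matches the "Expected result" column for df.show(5)
print(six_row_window)  # matches the "Actual result" column for df.show(5)
```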


