[jira] [Commented] (SPARK-24042) High-order function: zip_with_index
[ https://issues.apache.org/jira/browse/SPARK-24042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770659#comment-17770659 ]

ZygD commented on SPARK-24042:
------------------------------

[~Tagar] has incorrectly linked another issue to this one. Even though this one is resolved, the other one is not. Can we "unlink" the issue SPARK-23074?

> High-order function: zip_with_index
> -----------------------------------
>
>                 Key: SPARK-24042
>                 URL: https://issues.apache.org/jira/browse/SPARK-24042
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Marek Novotny
>            Priority: Major
>
> Implement function {{zip_with_index(array[, indexFirst])}} that transforms the input array by encapsulating elements into pairs with indexes indicating the order.
> Examples:
> {{zip_with_index(array("d", "a", null, "b")) => [("d",0),("a",1),(null,2),("b",3)]}}
> {{zip_with_index(array("d", "a", null, "b"), true) => [(0,"d"),(1,"a"),(2,null),(3,"b")]}}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
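The array semantics requested in SPARK-24042 can be modelled in a few lines of plain Python. This is only an illustrative sketch, not Spark code: the Python function name and the `index_first` parameter are stand-ins for the proposed `zip_with_index(array[, indexFirst])`, and `None` plays the role of SQL `null`.

```python
from typing import Any, List, Tuple


def zip_with_index(values: List[Any], index_first: bool = False) -> List[Tuple[Any, Any]]:
    """Pair every element with its 0-based position; None elements are kept."""
    if index_first:
        # (index, element) pairs, as in the second example of the issue
        return [(i, v) for i, v in enumerate(values)]
    # (element, index) pairs, the default order in the issue
    return [(v, i) for i, v in enumerate(values)]


print(zip_with_index(["d", "a", None, "b"]))
print(zip_with_index(["d", "a", None, "b"], True))
```

Note that this works on the elements of a single array value, which is a different problem from numbering the rows of a DataFrame.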
[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658 ]

ZygD edited comment on SPARK-23074 at 9/30/23 7:58 AM:
-------------------------------------------------------

[~gurwls223] [~Tagar] The problem is {*}not solved{*}! This was incorrectly closed. [The linked closed issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, while this is not.

was (Author: JIRAUSER286869):
[~gurwls223] [~Tagar] The problem is {*}not solved{*}! This was incorrectly closed. [The linked closed|https://issues.apache.org/jira/browse/SPARK-24042] issue is about arrays, while this is not.

> Dataframe-ified zipwithindex
> ----------------------------
>
>                 Key: SPARK-23074
>                 URL: https://issues.apache.org/jira/browse/SPARK-23074
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Minor
>              Labels: bulk-closed, dataframe, rdd
>
> Would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
> def dfZipWithIndex(
>   df: DataFrame,
>   offset: Int = 1,
>   colName: String = "id",
>   inFront: Boolean = true
> ) : DataFrame = {
>   df.sqlContext.createDataFrame(
>     // Pair every row with its index, then place the index first or last
>     df.rdd.zipWithIndex.map(ln =>
>       Row.fromSeq(
>         (if (inFront) Seq(ln._2 + offset) else Seq())
>           ++ ln._1.toSeq ++
>         (if (inFront) Seq() else Seq(ln._2 + offset))
>       )
>     ),
>     // Extend the original schema with the new non-nullable Long column
>     StructType(
>       (if (inFront) Array(StructField(colName,LongType,false)) else Array[StructField]())
>         ++ df.schema.fields ++
>       (if (inFront) Array[StructField]() else Array(StructField(colName,LongType,false)))
>     )
>   )
> }
> {code}
> credits: [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]
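What the quoted `dfZipWithIndex` helper does to each row can be sketched without Spark. The following is a plain-Python model under stated assumptions: rows are represented as tuples, the function name `zip_rows_with_index` is hypothetical, and the parameters `offset` and `in_front` mirror `offset` and `inFront` from the Scala snippet. It does not touch schemas or distributed execution.

```python
from typing import Any, List, Sequence, Tuple


def zip_rows_with_index(
    rows: Sequence[Tuple[Any, ...]],
    offset: int = 1,
    in_front: bool = True,
) -> List[Tuple[Any, ...]]:
    """Add a consecutive id to every row, either as first or as last field."""
    indexed = []
    for position, row in enumerate(rows):
        row_id = position + offset  # same role as ln._2 + offset in the Scala code
        indexed.append((row_id, *row) if in_front else (*row, row_id))
    return indexed


rows = [("d", 10), ("a", 20), ("b", 30)]
print(zip_rows_with_index(rows))
print(zip_rows_with_index(rows, offset=0, in_front=False))
```

The ids here number whole rows, whereas SPARK-24042 numbers the elements inside one array value.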
[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658 ]

ZygD edited comment on SPARK-23074 at 9/30/23 7:57 AM:
-------------------------------------------------------

[~gurwls223] [~Tagar] The problem is {*}not solved{*}! This was incorrectly closed. [The linked closed|https://issues.apache.org/jira/browse/SPARK-24042] issue is about arrays, while this is not.

was (Author: JIRAUSER286869):
[~gurwls223] [~Tagar] The problem is {*}not solved{*}! This was incorrectly closed. The linked closed issue is about arrays, while this is not.
[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658 ]

ZygD edited comment on SPARK-23074 at 9/30/23 7:54 AM:
-------------------------------------------------------

[~gurwls223] [~Tagar] The problem is {*}not solved{*}! This was incorrectly closed. The linked closed issue is about arrays, while this is not.

was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. [The linked closed issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and this is not.
[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658 ]

ZygD edited comment on SPARK-23074 at 9/30/23 7:51 AM:
-------------------------------------------------------

The problem is not solved! This was incorrectly closed. The linked closed issue is about arrays, and this is not.

was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. [The linked issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and this is not.
[jira] [Comment Edited] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658 ]

ZygD edited comment on SPARK-23074 at 9/30/23 7:51 AM:
-------------------------------------------------------

The problem is not solved! This was incorrectly closed. [The linked closed issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and this is not.

was (Author: JIRAUSER286869):
The problem is not solved! This was incorrectly closed. The linked closed issue is about arrays, and this is not.
[jira] [Commented] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17770658#comment-17770658 ]

ZygD commented on SPARK-23074:
------------------------------

The problem is not solved! This was incorrectly closed. [The linked issue|https://issues.apache.org/jira/browse/SPARK-24042] is about arrays, and this is not.
[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Description:
Expected result is obtained using Spark 3.1.2, but not 3.2.0, 3.2.1 or 3.3.0.

*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
{code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
{code}

  was:
Expected result is obtained using Spark 3.1.2, but not 3.2.0 or 3.2.1

*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
{code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
{code}

> After Spark update, df.show() shows incorrect F.percent_rank results
> --------------------------------------------------------------------
>
>                 Key: SPARK-38614
>                 URL: https://issues.apache.org/jira/browse/SPARK-38614
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0
>            Reporter: ZygD
>            Priority: Major
>              Labels: correctness
>
> Expected result is obtained using Spark 3.1.2, but not 3.2.0, 3.2.1 or 3.3.0.
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5)
> {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
>
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows
> {code}
> *Actual result*
> {code:java}
> +---+------------------+
> | id|                pr|
> +---+------------------+
> |  0|               0.0|
> |  1|0.3333333333333333|
> |  2|0.6666666666666666|
> +---+------------------+
> only showing top 3 rows
>
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows
> {code}
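The expected values in the report follow from the standard SQL definition of percent_rank, (rank - 1) / (rows in partition - 1). Here is a small plain-Python check of that arithmetic, assuming distinct, ordered values as in `spark.range(101)`. The helper name `percent_rank` below is a local function written for illustration, not the PySpark API.

```python
from typing import List


def percent_rank(n_rows: int) -> List[float]:
    """percent_rank of n_rows distinct ordered values: (rank - 1) / (n_rows - 1)."""
    if n_rows <= 1:
        return [0.0] * n_rows
    return [(rank - 1) / (n_rows - 1) for rank in range(1, n_rows + 1)]


# Over all 101 rows the step is 1/100, matching the expected output.
print(percent_rank(101)[:5])

# Over only 6 rows the step is 1/5.
print(percent_rank(6)[:5])
```

The 0.2 steps in the actual `df.show(5)` output coincide with what the formula yields over a 6-row window rather than the full 101 rows. That is an observation about the numbers only; the thread itself does not state a cause.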
[jira] [Commented] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17555124#comment-17555124 ]

ZygD commented on SPARK-38614:
------------------------------

Just checked with the new 3.3.0 release. The error still persists.
[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Affects Version/s: 3.3.0
[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Component/s: SQL
[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Summary: After Spark update, df.show() shows incorrect F.percent_rank results  (was: df.show() shows incorrect F.percent_rank results)
[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Description:
Expected result is obtained using Spark 3.1.1, but not 3.2.1

*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
{code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
{code}

  was:
Expected result is obtained using Spark 3.0.2, but not 3.2.1

*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
{code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
{code}
[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Description:
Expected result is obtained using Spark 3.1.2, but not 3.2.0 or 3.2.1

*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
{code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
{code}

  was:
Expected result is obtained using Spark 3.1.1, but not 3.2.1

*Minimal reproducible example*
{code:java}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
{code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
{code}
[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Affects Version/s: 3.2.0
[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Labels: correctness  (was: )
[jira] [Updated] (SPARK-38614) df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZygD updated SPARK-38614: - Summary: df.show() shows incorrect F.percent_rank results (was: df.show(3) does not equal df.show() first rows) > df.show() shows incorrect F.percent_rank results > > > Key: SPARK-38614 > URL: https://issues.apache.org/jira/browse/SPARK-38614 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: ZygD >Priority: Major > > Expected result is obtained using Spark 3.0.2, but not 3.2.1 > *Minimal reproducible example* > {code:java} > from pyspark.sql import SparkSession, functions as F, Window as W > spark = SparkSession.builder.getOrCreate() > > df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id'))) > df.show(3) > df.show(5) {code} > *Expected result* > {code:java} > +---++ > | id| pr| > +---++ > | 0| 0.0| > | 1|0.01| > | 2|0.02| > +---++ > only showing top 3 rows > +---++ > | id| pr| > +---++ > | 0| 0.0| > | 1|0.01| > | 2|0.02| > | 3|0.03| > | 4|0.04| > +---++ > only showing top 5 rows{code} > *Actual result* > {code:java} > +---+--+ > | id|pr| > +---+--+ > | 0| 0.0| > | 1|0.| > | 2|0.| > +---+--+ > only showing top 3 rows > +---+---+ > | id| pr| > +---+---+ > | 0|0.0| > | 1|0.2| > | 2|0.4| > | 3|0.6| > | 4|0.8| > +---+---+ > only showing top 5 rows{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38614) df.show(3) does not equal df.show() first rows
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZygD updated SPARK-38614: - Description: Expected result is obtained using Spark 3.0.2, but not 3.2.1 *Minimal reproducible example* {code:java} from pyspark.sql import SparkSession, functions as F, Window as W spark = SparkSession.builder.getOrCreate() df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id'))) df.show(3) df.show(5) {code} *Expected result* {code:java} +---++ | id| pr| +---++ | 0| 0.0| | 1|0.01| | 2|0.02| +---++ only showing top 3 rows +---++ | id| pr| +---++ | 0| 0.0| | 1|0.01| | 2|0.02| | 3|0.03| | 4|0.04| +---++ only showing top 5 rows{code} *Actual result* {code:java} +---+--+ | id|pr| +---+--+ | 0| 0.0| | 1|0.| | 2|0.| +---+--+ only showing top 3 rows +---+---+ | id| pr| +---+---+ | 0|0.0| | 1|0.2| | 2|0.4| | 3|0.6| | 4|0.8| +---+---+ only showing top 5 rows{code} was: *Minimal reproducible example* {code:java} from pyspark.sql import SparkSession, functions as F, Window as W spark = SparkSession.builder.getOrCreate() df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id'))) df.show(3) df.show(5) {code} *Expected result* {code:java} +---++ | id| pr| +---++ | 0| 0.0| | 1|0.01| | 2|0.02| +---++ only showing top 3 rows +---++ | id| pr| +---++ | 0| 0.0| | 1|0.01| | 2|0.02| | 3|0.03| | 4|0.04| +---++ only showing top 5 rows{code} *Actual result* {code:java} +---+--+ | id|pr| +---+--+ | 0| 0.0| | 1|0.| | 2|0.| +---+--+ only showing top 3 rows +---+---+ | id| pr| +---+---+ | 0|0.0| | 1|0.2| | 2|0.4| | 3|0.6| | 4|0.8| +---+---+ only showing top 5 rows{code} > df.show(3) does not equal df.show() first rows > -- > > Key: SPARK-38614 > URL: https://issues.apache.org/jira/browse/SPARK-38614 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: ZygD >Priority: Major > > Expected result is obtained using Spark 3.0.2, but not 3.2.1 > *Minimal reproducible 
example* > {code:java} > from pyspark.sql import SparkSession, functions as F, Window as W > spark = SparkSession.builder.getOrCreate() > > df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id'))) > df.show(3) > df.show(5) {code} > *Expected result* > {code:java} > +---++ > | id| pr| > +---++ > | 0| 0.0| > | 1|0.01| > | 2|0.02| > +---++ > only showing top 3 rows > +---++ > | id| pr| > +---++ > | 0| 0.0| > | 1|0.01| > | 2|0.02| > | 3|0.03| > | 4|0.04| > +---++ > only showing top 5 rows{code} > *Actual result* > {code:java} > +---+--+ > | id|pr| > +---+--+ > | 0| 0.0| > | 1|0.| > | 2|0.| > +---+--+ > only showing top 3 rows > +---+---+ > | id| pr| > +---+---+ > | 0|0.0| > | 1|0.2| > | 2|0.4| > | 3|0.6| > | 4|0.8| > +---+---+ > only showing top 5 rows{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38614) df.show(3) does not equal df.show() first rows
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZygD updated SPARK-38614: - Description: *Minimal reproducible example* {code:java} from pyspark.sql import SparkSession, functions as F, Window as W spark = SparkSession.builder.getOrCreate() df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id'))) df.show(3) df.show(5) {code} *Expected result* {code:java} +---++ | id| pr| +---++ | 0| 0.0| | 1|0.01| | 2|0.02| +---++ only showing top 3 rows +---++ | id| pr| +---++ | 0| 0.0| | 1|0.01| | 2|0.02| | 3|0.03| | 4|0.04| +---++ only showing top 5 rows{code} *Actual result* {code:java} +---+--+ | id|pr| +---+--+ | 0| 0.0| | 1|0.| | 2|0.| +---+--+ only showing top 3 rows +---+---+ | id| pr| +---+---+ | 0|0.0| | 1|0.2| | 2|0.4| | 3|0.6| | 4|0.8| +---+---+ only showing top 5 rows{code} was: *Minimal reproducible example* ```python from pyspark.sql import SparkSession, functions as F, Window as W spark = SparkSession.builder.getOrCreate() df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id'))) df.show(3) df.show(5) ``` *Expected result* ```none +---++ | id| pr| +---++ | 0| 0.0| | 1|0.01| | 2|0.02| +---++ only showing top 3 rows +---++ | id| pr| +---++ | 0| 0.0| | 1|0.01| | 2|0.02| | 3|0.03| | 4|0.04| +---++ only showing top 5 rows ``` *Actual result* ```none +---+--+ | id| pr| +---+--+ | 0| 0.0| | 1|0.| | 2|0.| +---+--+ only showing top 3 rows +---+---+ | id| pr| +---+---+ | 0|0.0| | 1|0.2| | 2|0.4| | 3|0.6| | 4|0.8| +---+---+ only showing top 5 rows ``` > df.show(3) does not equal df.show() first rows > -- > > Key: SPARK-38614 > URL: https://issues.apache.org/jira/browse/SPARK-38614 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: ZygD >Priority: Major > > *Minimal reproducible example* > {code:java} > from pyspark.sql import SparkSession, functions as F, Window as W > spark = SparkSession.builder.getOrCreate() > > df = 
spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id'))) > df.show(3) > df.show(5) {code} > *Expected result* > {code:java} > +---++ > | id| pr| > +---++ > | 0| 0.0| > | 1|0.01| > | 2|0.02| > +---++ > only showing top 3 rows > +---++ > | id| pr| > +---++ > | 0| 0.0| > | 1|0.01| > | 2|0.02| > | 3|0.03| > | 4|0.04| > +---++ > only showing top 5 rows{code} > *Actual result* > {code:java} > +---+--+ > | id|pr| > +---+--+ > | 0| 0.0| > | 1|0.| > | 2|0.| > +---+--+ > only showing top 3 rows > +---+---+ > | id| pr| > +---+---+ > | 0|0.0| > | 1|0.2| > | 2|0.4| > | 3|0.6| > | 4|0.8| > +---+---+ > only showing top 5 rows{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38614) df.show(3) does not equal df.show() first rows
ZygD created SPARK-38614:

Summary: df.show(3) does not equal df.show() first rows
Key: SPARK-38614
URL: https://issues.apache.org/jira/browse/SPARK-38614
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 3.2.1
Reporter: ZygD

*Minimal reproducible example*
```python
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
```
*Expected result*
```none
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
```
*Actual result*
```none
+---+--+
| id| pr|
+---+--+
|  0| 0.0|
|  1|0.|
|  2|0.|
+---+--+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
```

--
This message was sent by Atlassian Jira (v8.20.1#820001)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
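
For reference, the expected column in the report follows the usual definition of percent_rank, (rank - 1) / (rows in partition - 1), evaluated over all 101 rows. The plain-Python sketch below reproduces those values without Spark. The helper name `percent_rank_values` is hypothetical. The second call rests on an assumption that the report does not state: that the rank was computed over only the rows fetched by `show(n)`, taken here to be n + 1. That reading is consistent with the 0.2 step in the reported `show(5)` output, but the report itself does not confirm it.

```python
def percent_rank_values(n_rows):
    """Hypothetical helper: percent_rank for n_rows distinct, ordered values.

    Uses the standard definition (rank - 1) / (n_rows - 1),
    with 0.0 when the partition holds a single row.
    """
    if n_rows <= 1:
        return [0.0] * n_rows
    return [i / (n_rows - 1) for i in range(n_rows)]


# Rank computed over the full 101-row range, then truncated for display.
full = percent_rank_values(101)
print(full[:3])
print(full[:5])

# ASSUMPTION (not from the report): the rank is computed over only the
# n + 1 rows fetched for show(n), here n = 5.
truncated = percent_rank_values(5 + 1)
print(truncated[:5])
```

If the assumption holds, the first two printed lists correspond to the expected tables and the third to the reported `show(5)` output.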