[jira] [Commented] (SPARK-40472) Improve pyspark.sql.functions example experience
[ https://issues.apache.org/jira/browse/SPARK-40472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606879#comment-17606879 ] deshanxiao commented on SPARK-40472: [~hyukjin.kwon] OK, thanks~ > Improve pyspark.sql.functions example experience > --- > > Key: SPARK-40472 > URL: https://issues.apache.org/jira/browse/SPARK-40472 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Minor > > There are many examples in pyspark.sql.functions: > {code:java} > Examples > > >>> df = spark.range(1) > >>> df.select(lit(5).alias('height'), df.id).show() > +------+---+ > |height| id| > +------+---+ > |     5|  0| > +------+---+ {code} > We can add import statements so that the user can run them directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40472) Improve pyspark.sql.functions example experience
[ https://issues.apache.org/jira/browse/SPARK-40472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao resolved SPARK-40472. Resolution: Fixed > Improve pyspark.sql.functions example experience > --- > > Key: SPARK-40472 > URL: https://issues.apache.org/jira/browse/SPARK-40472 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Minor > > There are many examples in pyspark.sql.functions: > {code:java} > Examples > > >>> df = spark.range(1) > >>> df.select(lit(5).alias('height'), df.id).show() > +------+---+ > |height| id| > +------+---+ > |     5|  0| > +------+---+ {code} > We can add import statements so that the user can run them directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40472) Improve pyspark.sql.functions example experience
[ https://issues.apache.org/jira/browse/SPARK-40472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-40472: --- Description: There are many examples in pyspark.sql.functions: {code:java} Examples >>> df = spark.range(1) >>> df.select(lit(5).alias('height'), df.id).show() +------+---+ |height| id| +------+---+ |     5|  0| +------+---+ {code} We can add import statements so that the user can run them directly. > Improve pyspark.sql.functions example experience > --- > > Key: SPARK-40472 > URL: https://issues.apache.org/jira/browse/SPARK-40472 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Minor > > There are many examples in pyspark.sql.functions: > {code:java} > Examples > > >>> df = spark.range(1) > >>> df.select(lit(5).alias('height'), df.id).show() > +------+---+ > |height| id| > +------+---+ > |     5|  0| > +------+---+ {code} > We can add import statements so that the user can run them directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40472) Improve pyspark.sql.functions example experience
deshanxiao created SPARK-40472: -- Summary: Improve pyspark.sql.functions example experience Key: SPARK-40472 URL: https://issues.apache.org/jira/browse/SPARK-40472 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.3.0 Reporter: deshanxiao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
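For illustration, a self-contained version of the example discussed in this ticket, with the missing import statements added (a sketch assuming a default local PySpark session):

{code:python}
# With the imports included, a user can paste and run the doctest directly.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()
df = spark.range(1)
df.select(lit(5).alias('height'), df.id).show()
# +------+---+
# |height| id|
# +------+---+
# |     5|  0|
# +------+---+
{code}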
[jira] [Created] (SPARK-40192) Remove redundant groupby
deshanxiao created SPARK-40192: -- Summary: Remove redundant groupby Key: SPARK-40192 URL: https://issues.apache.org/jira/browse/SPARK-40192 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: deshanxiao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-40103) Support read/write.csv() in SparkR
[ https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580623#comment-17580623 ] deshanxiao edited comment on SPARK-40103 at 8/17/22 7:23 AM: - Yes, read.csv and read.csv2 are already used in the R utils package. was (Author: deshanxiao): Yes read.csv, read.csv2 have benn used in R utils packages. > Support read/write.csv() in SparkR > -- > > Key: SPARK-40103 > URL: https://issues.apache.org/jira/browse/SPARK-40103 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > Today, almost all languages support the DataFrameReader.csv API; only R is > missing it. We need to use df.read() to read the csv file. We need a > higher-level API for it. > Java: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] > Scala: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] > Python: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40103) Support read/write.csv() in SparkR
[ https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580623#comment-17580623 ] deshanxiao commented on SPARK-40103: Yes, read.csv and read.csv2 are already used in the R utils package. > Support read/write.csv() in SparkR > -- > > Key: SPARK-40103 > URL: https://issues.apache.org/jira/browse/SPARK-40103 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > Today, almost all languages support the DataFrameReader.csv API; only R is > missing it. We need to use df.read() to read the csv file. We need a > higher-level API for it. > Java: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] > Scala: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] > Python: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40103) Support read/write.csv() in SparkR
[ https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-40103: --- Description: Today, almost all languages support the DataFrameReader.csv API; only R is missing it. We need to use df.read() to read the csv file. We need a higher-level API for it. Java: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] Scala: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] Python: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] was: Today, all major languages support the DataFrameReader.csv API, only R is missing. we need to use df.read() to read the csv file. We need a more high-level api for it. Java: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] Scala: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] Python: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] > Support read/write.csv() in SparkR > -- > > Key: SPARK-40103 > URL: https://issues.apache.org/jira/browse/SPARK-40103 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > Today, almost all languages support the DataFrameReader.csv API; only R is > missing it. We need to use df.read() to read the csv file. We need a > higher-level API for it. > Java: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] > Scala: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] > Python: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40103) Support read/write.csv() in SparkR
[ https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-40103: --- Issue Type: New Feature (was: Improvement) > Support read/write.csv() in SparkR > -- > > Key: SPARK-40103 > URL: https://issues.apache.org/jira/browse/SPARK-40103 > Project: Spark > Issue Type: New Feature > Components: SparkR >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > Today, almost all languages support the DataFrameReader.csv API; only R is > missing it. We need to use df.read() to read the csv file. We need a > higher-level API for it. > Java: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] > Scala: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] > Python: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40103) Support read.csv in SparkR
[ https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-40103: --- Description: Today, all major languages support the DataFrameReader.csv API; only R is missing it. We need to use df.read() to read the csv file. We need a higher-level API for it. Java: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] Scala: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] Python: [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] > Support read.csv in SparkR > -- > > Key: SPARK-40103 > URL: https://issues.apache.org/jira/browse/SPARK-40103 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > Today, all major languages support the DataFrameReader.csv API; only R is > missing it. We need to use df.read() to read the csv file. We need a > higher-level API for it. > Java: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] > Scala: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] > Python: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40103) Support read.csv() in SparkR
[ https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-40103: --- Summary: Support read.csv() in SparkR (was: Support read.csv in SparkR) > Support read.csv() in SparkR > > > Key: SPARK-40103 > URL: https://issues.apache.org/jira/browse/SPARK-40103 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > Today, all major languages support the DataFrameReader.csv API; only R is > missing it. We need to use df.read() to read the csv file. We need a > higher-level API for it. > Java: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] > Scala: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] > Python: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40103) Support read/write.csv() in SparkR
[ https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-40103: --- Summary: Support read/write.csv() in SparkR (was: Support read.csv() in SparkR) > Support read/write.csv() in SparkR > -- > > Key: SPARK-40103 > URL: https://issues.apache.org/jira/browse/SPARK-40103 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > Today, all major languages support the DataFrameReader.csv API; only R is > missing it. We need to use df.read() to read the csv file. We need a > higher-level API for it. > Java: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html] > Scala: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame] > Python: > [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40103) Support read.csv in SparkR
deshanxiao created SPARK-40103: -- Summary: Support read.csv in SparkR Key: SPARK-40103 URL: https://issues.apache.org/jira/browse/SPARK-40103 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 3.3.0 Reporter: deshanxiao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
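For reference, the high-level CSV reader/writer that the ticket cites on the Python side; the proposal is to expose an equivalent read.csv()/write.csv() pair in SparkR (paths below are illustrative):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# DataFrameReader.csv: one call instead of the generic read/format path.
df = spark.read.csv("/tmp/people.csv", header=True, inferSchema=True)

# DataFrameWriter.csv: the matching high-level writer.
df.write.csv("/tmp/people_out", header=True)
{code}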
[jira] [Commented] (SPARK-39934) takeRDD in R is slow
[ https://issues.apache.org/jira/browse/SPARK-39934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580183#comment-17580183 ] deshanxiao commented on SPARK-39934: [~hyukjin.kwon] I have confirmed in the code below that the takeRDD method in RDD.R is only used in tests. It doesn't affect the actual running code. Thank you~ > takeRDD in R is slow > > > Key: SPARK-39934 > URL: https://issues.apache.org/jira/browse/SPARK-39934 > Project: Spark > Issue Type: Improvement > Components: R >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > The SparkR:::takeRDD API retrieves the result one partition per round. We > can re-implement it following the current Scala code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39934) takeRDD in R is slow
[ https://issues.apache.org/jira/browse/SPARK-39934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575221#comment-17575221 ] deshanxiao commented on SPARK-39934: [~hyukjin.kwon] Hi, maybe my wording was unclear. I mean that *take* has performance problems because it only fetches one partition at a time, even if take is not exposed to the user. > takeRDD in R is slow > > > Key: SPARK-39934 > URL: https://issues.apache.org/jira/browse/SPARK-39934 > Project: Spark > Issue Type: Improvement > Components: R >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > The SparkR:::takeRDD API retrieves the result one partition per round. We > can re-implement it following the current Scala code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39934) takeRDD in R is slow
[ https://issues.apache.org/jira/browse/SPARK-39934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-39934: --- Description: The SparkR:::takeRDD API retrieves the result one partition per round. We can re-implement it following the current Scala code. > takeRDD in R is slow > > > Key: SPARK-39934 > URL: https://issues.apache.org/jira/browse/SPARK-39934 > Project: Spark > Issue Type: Improvement > Components: R >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Major > > The SparkR:::takeRDD API retrieves the result one partition per round. We > can re-implement it following the current Scala code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39934) takeRDD in R is slow
deshanxiao created SPARK-39934: -- Summary: takeRDD in R is slow Key: SPARK-39934 URL: https://issues.apache.org/jira/browse/SPARK-39934 Project: Spark Issue Type: Improvement Components: R Affects Versions: 3.3.0 Reporter: deshanxiao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
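A minimal Python sketch of the Scala-style take() the ticket refers to: instead of fetching one partition per round, grow the number of partitions scanned each round. Names and the scale-up factor are illustrative, loosely following PySpark's own RDD.take:

{code:python}
import itertools

def take_scaled(rdd, num, scale_up_factor=4):
    # Grow the number of partitions scanned per round instead of
    # fetching a single partition at a time.
    results = []
    total_parts = rdd.getNumPartitions()
    parts_scanned = 0
    num_parts_to_try = 1
    while len(results) < num and parts_scanned < total_parts:
        left = num - len(results)
        part_ids = list(range(parts_scanned,
                              min(parts_scanned + num_parts_to_try, total_parts)))
        # Run one job over the whole batch of partitions.
        taken = rdd.context.runJob(
            rdd, lambda it: list(itertools.islice(it, left)), part_ids)
        results.extend(taken)
        parts_scanned += len(part_ids)
        num_parts_to_try *= scale_up_factor
    return results[:num]
{code}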
[jira] [Updated] (SPARK-39916) Merge SchemaUtils from mllib to SQL
[ https://issues.apache.org/jira/browse/SPARK-39916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-39916: --- Description: Today we have two SchemaUtils classes: the SQL SchemaUtils and the MLlib SchemaUtils. The MLlib SchemaUtils carries a TODO tag to merge it into SQL. Let's do this! (was: Today we have two SchemaUtils: SQL SchemaUtils and mllib SchemaUtils. the SchemaUtils of mllib left a TODO tag. Let's do this!) > Merge SchemaUtils from mllib to SQL > -- > > Key: SPARK-39916 > URL: https://issues.apache.org/jira/browse/SPARK-39916 > Project: Spark > Issue Type: Improvement > Components: MLlib, SQL >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Minor > > Today we have two SchemaUtils classes: the SQL SchemaUtils and the MLlib > SchemaUtils. The MLlib SchemaUtils carries a TODO tag to merge it into SQL. Let's do this! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39916) Merge SchemaUtils from mllib to SQL
[ https://issues.apache.org/jira/browse/SPARK-39916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-39916: --- Description: Today we have two SchemaUtils classes: the SQL SchemaUtils and the MLlib SchemaUtils. The MLlib SchemaUtils carries a TODO tag. Let's do this! > Merge SchemaUtils from mllib to SQL > -- > > Key: SPARK-39916 > URL: https://issues.apache.org/jira/browse/SPARK-39916 > Project: Spark > Issue Type: Improvement > Components: MLlib, SQL >Affects Versions: 3.3.0 >Reporter: deshanxiao >Priority: Minor > > Today we have two SchemaUtils classes: the SQL SchemaUtils and the MLlib > SchemaUtils. The MLlib SchemaUtils carries a TODO tag. Let's do this! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39916) Merge SchemaUtils from mllib to SQL
deshanxiao created SPARK-39916: -- Summary: Merge SchemaUtils from mllib to SQL Key: SPARK-39916 URL: https://issues.apache.org/jira/browse/SPARK-39916 Project: Spark Issue Type: Improvement Components: MLlib, SQL Affects Versions: 3.3.0 Reporter: deshanxiao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31140) Support Quick sample in RDD
[ https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059916#comment-17059916 ] deshanxiao commented on SPARK-31140: Sure, you are right. I just suggest that we could add a new method "samplePartition" to do it. > Support Quick sample in RDD > --- > > Key: SPARK-31140 > URL: https://issues.apache.org/jira/browse/SPARK-31140 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Minor > > RDD.sample uses *filter* to pick out the data we need. It means > that if the raw data is very large, we must spend too much time reading it. We > can filter the raw partitions to speed up sampling. > {code:java} > override def compute(splitIn: Partition, context: TaskContext): Iterator[U] > = { > val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] > val thisSampler = sampler.clone > thisSampler.setSeed(split.seed) > thisSampler.sample(firstParent[T].iterator(split.prev, context)) > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31140) Support Quick sample in RDD
[ https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059908#comment-17059908 ] deshanxiao commented on SPARK-31140: [~viirya] Thanks for your comment! I mean that we can override *getPartitions* to filter the partitions directly. If we have 200 partitions, samplePartition will return 20 partitions when the ratio is 0.1. > Support Quick sample in RDD > --- > > Key: SPARK-31140 > URL: https://issues.apache.org/jira/browse/SPARK-31140 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Minor > > RDD.sample uses *filter* to pick out the data we need. It means > that if the raw data is very large, we must spend too much time reading it. We > can filter the raw partitions to speed up sampling. > {code:java} > override def compute(splitIn: Partition, context: TaskContext): Iterator[U] > = { > val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] > val thisSampler = sampler.clone > thisSampler.setSeed(split.seed) > thisSampler.sample(firstParent[T].iterator(split.prev, context)) > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31140) Support Quick sample in RDD
[ https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-31140: --- Description: RDD.sample uses *filter* to pick out the data we need. It means that if the raw data is very large, we must spend too much time reading it. We can filter the raw partitions to speed up sampling. {code:java} override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = { val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] val thisSampler = sampler.clone thisSampler.setSeed(split.seed) thisSampler.sample(firstParent[T].iterator(split.prev, context)) } {code} was: RDD.sample use the function of *filter* to pick up the data we need. It means that if the raw data is very huge, we must cost too much time to read it. We can filter the raw partition to speed up the processing of sample. {code:java} override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = { val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] val thisSampler = sampler.clone thisSampler.setSeed(split.seed) thisSampler.sample(firstParent[T].iterator(split.prev, context)) } {code} > Support Quick sample in RDD > --- > > Key: SPARK-31140 > URL: https://issues.apache.org/jira/browse/SPARK-31140 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Minor > > RDD.sample uses *filter* to pick out the data we need. It means > that if the raw data is very large, we must spend too much time reading it. We > can filter the raw partitions to speed up sampling. > {code:java} > override def compute(splitIn: Partition, context: TaskContext): Iterator[U] > = { > val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] > val thisSampler = sampler.clone > thisSampler.setSeed(split.seed) > thisSampler.sample(firstParent[T].iterator(split.prev, context)) > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31140) Support Quick sample in RDD
[ https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-31140: --- Description: RDD.sample uses *filter* to pick out the data we need. It means that if the raw data is very large, we must spend too much time reading it. We can filter the raw partitions to speed up sampling. {code:java} override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = { val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] val thisSampler = sampler.clone thisSampler.setSeed(split.seed) thisSampler.sample(firstParent[T].iterator(split.prev, context)) } {code} was: RDD.sample use *filter* to read the raw data. It means that if the raw data is very huge, we must cost too much time to read it. We can filter the raw partition to speed up the processing of sample. {code:java} override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = { val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] val thisSampler = sampler.clone thisSampler.setSeed(split.seed) thisSampler.sample(firstParent[T].iterator(split.prev, context)) } {code} > Support Quick sample in RDD > --- > > Key: SPARK-31140 > URL: https://issues.apache.org/jira/browse/SPARK-31140 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Minor > > RDD.sample uses *filter* to pick out the data we need. It means > that if the raw data is very large, we must spend too much time reading it. We > can filter the raw partitions to speed up sampling. > {code:java} > override def compute(splitIn: Partition, context: TaskContext): Iterator[U] > = { > val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] > val thisSampler = sampler.clone > thisSampler.setSeed(split.seed) > thisSampler.sample(firstParent[T].iterator(split.prev, context)) > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31140) Support Quick sample in RDD
deshanxiao created SPARK-31140: -- Summary: Support Quick sample in RDD Key: SPARK-31140 URL: https://issues.apache.org/jira/browse/SPARK-31140 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: deshanxiao RDD.sample uses *filter* to read the raw data. It means that if the raw data is very large, we must spend too much time reading it. We can filter the raw partitions to speed up sampling. {code:java} override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = { val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition] val thisSampler = sampler.clone thisSampler.setSeed(split.seed) thisSampler.sample(firstParent[T].iterator(split.prev, context)) } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
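A rough Python sketch of the "samplePartition" idea from the comments: sample at partition granularity so whole partitions can be skipped. This approximates it at the RDD API level with mapPartitionsWithIndex; the ticket itself proposes doing the filtering inside getPartitions so unsampled partitions are not even scheduled (all names here are illustrative):

{code:python}
import random

def sample_partitions(rdd, fraction, seed=42):
    # Keep a random subset of whole partitions instead of filtering
    # every record of every partition.
    rng = random.Random(seed)
    keep = {i for i in range(rdd.getNumPartitions()) if rng.random() < fraction}
    return rdd.mapPartitionsWithIndex(
        lambda idx, it: it if idx in keep else iter([]))
{code}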
[jira] [Updated] (SPARK-31112) Use multiple external catalogs to speed up metastore access
[ https://issues.apache.org/jira/browse/SPARK-31112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-31112: --- Description: Now, we use HiveClientImpl to access the Hive metastore. However, a long-running RPC in Hive will block all of the queries. Currently, we go through the externalCatalog member of SharedState, which is a singleton. Maybe we can use multiple external catalog instances to speed up metastore access in read-only situations. Original: Query 1: databaseExists -> getTable -> getPartition (6s) Query 2: databaseExists -> getTable -> getPartition (5s) Total cost: 11s Now: Query 1: databaseExists -> getTable -> getPartition (6s) Query 2: databaseExists -> getTable -> getPartition (5s) Total cost: 6s was:Now, we use HiveClientImpl to access hive metastore. However, a long running rpc in hive will block all of the query. Currently, we use the member of externalCatalog in ShardState to access. Maybe, we can use multiple extrenal catalog instance to speed up metastore access in read-only situation. > Use multiple external catalogs to speed up metastore access > -- > > Key: SPARK-31112 > URL: https://issues.apache.org/jira/browse/SPARK-31112 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Major > > Now, we use HiveClientImpl to access the Hive metastore. However, a long-running > RPC in Hive will block all of the queries. Currently, we go through the > externalCatalog member of SharedState, which is a singleton. > Maybe we can use multiple external catalog instances to speed up metastore > access in read-only situations. > Original: > Query 1: > databaseExists -> getTable -> getPartition (6s) > Query 2: > databaseExists -> getTable -> getPartition (5s) > Total cost: 11s > Now: > Query 1: > databaseExists -> getTable -> getPartition (6s) > Query 2: > databaseExists -> getTable -> getPartition (5s) > Total cost: 6s -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31112) Use multiple external catalogs to speed up metastore access
[ https://issues.apache.org/jira/browse/SPARK-31112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-31112: --- Description: Now, we use HiveClientImpl to access the Hive metastore. However, a long-running RPC in Hive will block all of the queries. Currently, we go through the externalCatalog member of SharedState. Maybe we can use multiple external catalog instances to speed up metastore access in read-only situations. (was: Now, we use HiveClientImpl to access hive metastore. However, a long running rpc in hive will block all of the query. Currently, we use the member of externalCatalog in ShardState to access. Maybe, we can use ) > Use multiple external catalogs to speed up metastore access > -- > > Key: SPARK-31112 > URL: https://issues.apache.org/jira/browse/SPARK-31112 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Major > > Now, we use HiveClientImpl to access the Hive metastore. However, a long-running > RPC in Hive will block all of the queries. Currently, we go through the > externalCatalog member of SharedState. Maybe we can use multiple external > catalog instances to speed up metastore access in read-only situations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31112) Use multiple external catalogs to speed up metastore access
[ https://issues.apache.org/jira/browse/SPARK-31112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-31112: --- Description: Now, we use HiveClientImpl to access the Hive metastore. However, a long-running RPC in Hive will block all of the queries. Currently, we go through the externalCatalog member of SharedState. Maybe, we can use > Use multiple external catalogs to speed up metastore access > -- > > Key: SPARK-31112 > URL: https://issues.apache.org/jira/browse/SPARK-31112 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Major > > Now, we use HiveClientImpl to access the Hive metastore. However, a long-running > RPC in Hive will block all of the queries. Currently, we go through the > externalCatalog member of SharedState. Maybe, we can use -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31112) Use multiple external catalogs to speed up metastore access
deshanxiao created SPARK-31112: -- Summary: Use multiple external catalogs to speed up metastore access Key: SPARK-31112 URL: https://issues.apache.org/jira/browse/SPARK-31112 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: deshanxiao -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
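An illustrative sketch of the idea, in Python with hypothetical names: keep a small pool of independent catalog clients for read-only calls so that one long-running RPC does not serialize every query behind the singleton:

{code:python}
import itertools
import threading

class CatalogClientPool:
    """Round-robin pool of independent metastore clients (sketch only)."""

    def __init__(self, make_client, size=4):
        # make_client is a hypothetical factory that builds one
        # independent metastore client per slot.
        self._clients = [make_client() for _ in range(size)]
        self._cycle = itertools.cycle(self._clients)
        self._lock = threading.Lock()

    def get(self):
        # Hand out clients round-robin; a slow RPC on one client no
        # longer blocks callers that were handed a different client.
        with self._lock:
            return next(self._cycle)
{code}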
[jira] [Updated] (SPARK-30883) Tests that use setWritable, setReadable and setExecutable should be cancelled when the user is root
[ https://issues.apache.org/jira/browse/SPARK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-30883: --- Environment: The Java APIs *setWritable, setReadable and setExecutable* don't work well because root can read, write, or execute every file. Maybe we could cancel these tests or fail fast when the mvn test starts. (was: The java api *setWritable,setReadable and setExecutable* dosen't work well when the user is root. Maybe, we could cancel these tests or fast failure when the mvn test is starting.) > Tests that use setWritable, setReadable and setExecutable should be cancelled > when the user is root > --- > > Key: SPARK-30883 > URL: https://issues.apache.org/jira/browse/SPARK-30883 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 > Environment: The Java APIs *setWritable, setReadable and setExecutable* > don't work well because root can read, write, or execute every file. > Maybe we could cancel these tests or fail fast when the mvn test > starts. >Reporter: deshanxiao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30883) Tests that use setWritable, setReadable and setExecutable should be cancelled when the user is root
[ https://issues.apache.org/jira/browse/SPARK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-30883: --- Environment: The Java APIs *setWritable, setReadable and setExecutable* don't work well when the user is root. Maybe we could cancel these tests or fail fast when the mvn test starts. (was: The java api *setWritable,setReadable and setExecutable* dosen't work when the user is root. Maybe, we could cancel these tests or fast failure when the mvn test is starting.) > Tests that use setWritable, setReadable and setExecutable should be cancelled > when the user is root > --- > > Key: SPARK-30883 > URL: https://issues.apache.org/jira/browse/SPARK-30883 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 > Environment: The Java APIs *setWritable, setReadable and setExecutable* > don't work well when the user is root. Maybe we could cancel these tests > or fail fast when the mvn test starts. >Reporter: deshanxiao >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30883) Tests that use setWritable, setReadable and setExecutable should be cancelled when the user is root
deshanxiao created SPARK-30883: -- Summary: Tests that use setWritable, setReadable and setExecutable should be cancelled when the user is root Key: SPARK-30883 URL: https://issues.apache.org/jira/browse/SPARK-30883 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.0.0 Environment: The Java APIs *setWritable, setReadable and setExecutable* don't work when the user is root. Maybe we could cancel these tests or fail fast when the mvn test starts. Reporter: deshanxiao -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
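Spark's suites are ScalaTest, but the proposed guard is easy to sketch; here it is in Python/pytest form for illustration only (the permission-bit assertion is vacuous under root, so the test is skipped):

{code:python}
import os
import pytest

# Skip permission-bit tests when the effective user is root, since root
# can read/write/execute regardless of the mode bits (POSIX-only sketch).
requires_non_root = pytest.mark.skipif(
    getattr(os, "geteuid", lambda: -1)() == 0,
    reason="root ignores file permission bits")

@requires_non_root
def test_read_only_directory_is_not_writable(tmp_path):
    d = tmp_path / "readonly"
    d.mkdir()
    d.chmod(0o500)  # drop write permission, keep read+execute
    assert not os.access(d, os.W_OK)
{code}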
[jira] [Created] (SPARK-30123) PartitionPruning should consider more cases
deshanxiao created SPARK-30123: -- Summary: PartitionPruning should consider more cases Key: SPARK-30123 URL: https://issues.apache.org/jira/browse/SPARK-30123 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: deshanxiao If the left side has a partition scan and the right side has a pruning filter but hasBenefit is false, the right side will never have a subquery inserted. {code:java} var partScan = getPartitionTableScan(l, left) if (partScan.isDefined && canPruneLeft(joinType) && hasPartitionPruningFilter(right)) { val hasBenefit = pruningHasBenefit(l, partScan.get, r, right) newLeft = insertPredicate(l, newLeft, r, right, rightKeys, hasBenefit) } else { partScan = getPartitionTableScan(r, right) if (partScan.isDefined && canPruneRight(joinType) && hasPartitionPruningFilter(left) ) { val hasBenefit = pruningHasBenefit(r, partScan.get, l, left) newRight = insertPredicate(r, newRight, l, left, leftKeys, hasBenefit) } } case _ => } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30106) DynamicPartitionPruningSuite#"no predicate on the dimension table" is not tested
deshanxiao created SPARK-30106: -- Summary: DynamicPartitionPruningSuite#"no predicate on the dimension table" is not tested Key: SPARK-30106 URL: https://issues.apache.org/jira/browse/SPARK-30106 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.0.0 Reporter: deshanxiao The test "no predicate on the dimension table" has no partition key, so the case is not actually exercised. We can change the SQL to test it. {code:java} Given("no predicate on the dimension table") withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") { val df = sql( """ |SELECT * FROM fact_sk f |JOIN dim_store s |ON f.date_id = s.store_id """.stripMargin) checkPartitionPruningPredicate(df, false, false) } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30073) HistoryPage render "count" cost too much time
[ https://issues.apache.org/jira/browse/SPARK-30073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984786#comment-16984786 ] deshanxiao commented on SPARK-30073: [~kabhwan] Sorry, I have changed it to Spark 2.3.2. Thank you! > HistoryPage render "count" cost too much time > - > > Key: SPARK-30073 > URL: https://issues.apache.org/jira/browse/SPARK-30073 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.2 >Reporter: deshanxiao >Priority: Major > > {code:java} > "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 > nid=0x2c744 runnable [0x7f23775e6000] >java.lang.Thread.State: RUNNABLE > at > org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native > Method) > at > org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162) > at > org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128) > at > org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144) > at > org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218) > at > org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115) > at scala.collection.AbstractIterator.count(Iterator.scala:1336) > at > org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) > at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at org.spark_project.jetty.server.handler.ContextHandler.do > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30073) HistoryPage render "count" cost too much time
[ https://issues.apache.org/jira/browse/SPARK-30073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-30073: --- Affects Version/s: (was: 3.0.0) 2.3.2 > HistoryPage render "count" cost too much time > - > > Key: SPARK-30073 > URL: https://issues.apache.org/jira/browse/SPARK-30073 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.3.2 >Reporter: deshanxiao >Priority: Major > > {code:java} > "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 > nid=0x2c744 runnable [0x7f23775e6000] >java.lang.Thread.State: RUNNABLE > at > org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native > Method) > at > org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162) > at > org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128) > at > org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144) > at > org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218) > at > org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115) > at scala.collection.AbstractIterator.count(Iterator.scala:1336) > at > org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) > at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at org.spark_project.jetty.server.handler.ContextHandler.do > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30073) HistoryPage render "count" cost too much time
[ https://issues.apache.org/jira/browse/SPARK-30073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-30073: --- Description: {code:java} "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 nid=0x2c744 runnable [0x7f23775e6000] java.lang.Thread.State: RUNNABLE at org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native Method) at org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162) at org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128) at org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144) at org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218) at org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115) at scala.collection.AbstractIterator.count(Iterator.scala:1336) at org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50) at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) at org.spark_project.jetty.server.handler.ContextHandler.do {code} > HistoryPage render "count" cost too much time > - > > Key: SPARK-30073 > URL: https://issues.apache.org/jira/browse/SPARK-30073 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Major > > {code:java} > "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 > nid=0x2c744 runnable [0x7f23775e6000] >java.lang.Thread.State: RUNNABLE > at > org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native > Method) > at > org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162) > at > org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128) > at > org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144) > at > org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218) > at > org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115) > at scala.collection.AbstractIterator.count(Iterator.scala:1336) > at > org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) > at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) > at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at org.spark_project.jetty.server.handler.ContextHandler.do > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30073) HistoryPage render "count" cost too much time
[ https://issues.apache.org/jira/browse/SPARK-30073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-30073: --- Environment: (was: {code:java} "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 nid=0x2c744 runnable [0x7f23775e6000] java.lang.Thread.State: RUNNABLE at org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native Method) at org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162) at org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128) at org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144) at org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218) at org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115) at scala.collection.AbstractIterator.count(Iterator.scala:1336) at org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50) at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) at org.spark_project.jetty.server.handler.ContextHandler.do {code} ) > HistoryPage render "count" cost too much time > - > > Key: SPARK-30073 > URL: https://issues.apache.org/jira/browse/SPARK-30073 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30073) HistoryPage render "count" cost too much time
deshanxiao created SPARK-30073: -- Summary: HistoryPage render "count" cost too much time Key: SPARK-30073 URL: https://issues.apache.org/jira/browse/SPARK-30073 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.0.0 Environment: {code:java} "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 nid=0x2c744 runnable [0x7f23775e6000] java.lang.Thread.State: RUNNABLE at org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native Method) at org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162) at org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128) at org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144) at org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218) at org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115) at scala.collection.AbstractIterator.count(Iterator.scala:1336) at org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50) at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82) at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) at org.spark_project.jetty.server.handler.ContextHandler.do {code} Reporter: deshanxiao -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade
[ https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983536#comment-16983536 ] deshanxiao commented on SPARK-27780: I couldn't agree more. Adding a shuffle service version is very necessary. > Shuffle server & client should be versioned to enable smoother upgrade > -- > > Key: SPARK-27780 > URL: https://issues.apache.org/jira/browse/SPARK-27780 > Project: Spark > Issue Type: New Feature > Components: Shuffle, Spark Core >Affects Versions: 3.0.0 >Reporter: Imran Rashid >Priority: Major > > The external shuffle service is often upgraded at a different time than spark > itself. However, this causes problems when the protocol changes between the > shuffle service and the spark runtime -- this forces users to upgrade > everything simultaneously. > We should add versioning to the shuffle client & server, so they know what > messages the other will support. This would allow better handling of mixed > versions, from better error msgs to allowing some mismatched versions (with > reduced capabilities). > This originally came up in a discussion here: > https://github.com/apache/spark/pull/24565#issuecomment-493496466 > There are a few ways we could do the versioning which we still need to > discuss: > 1) Version specified by config. This allows for mixed versions across the > cluster and rolling upgrades. It also will let a spark 3.0 client talk to a > 2.4 shuffle service. But, may be a nuisance for users to get this right. > 2) Auto-detection during registration with local shuffle service. This makes > the versioning easy for the end user, and can even handle a 2.4 shuffle > service though it does not support the new versioning. However, it will not > handle a rolling upgrade correctly -- if the local shuffle service has been > upgraded, but other nodes in the cluster have not, it will get the version > wrong. > 3) Exchange versions per-connection. When a connection is opened, the server > & client could first exchange messages with their versions, so they know how > to continue communication after that. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
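A toy sketch of option 3 from the ticket (per-connection version exchange); the framing and field names are invented for illustration and are not Spark's wire format:

{code:python}
import json
import struct

SUPPORTED_VERSIONS = {1, 2}

def encode_hello(version):
    # Length-prefixed JSON "hello" carrying our protocol version.
    payload = json.dumps({"version": version}).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def negotiate(my_versions, peer_version):
    # Continue with the highest version both sides support; fail with a
    # clear error if there is no common version at all.
    common = {v for v in my_versions if v <= peer_version}
    if not common:
        raise ValueError(f"no common protocol version with peer v{peer_version}")
    return max(common)

# e.g. negotiate(SUPPORTED_VERSIONS, 1) -> 1, so both sides speak v1.
{code}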
[jira] [Created] (SPARK-29711) Dynamically adjust Spark SQL class log level in beeline
deshanxiao created SPARK-29711: -- Summary: Dynamically adjust Spark SQL class log level in beeline Key: SPARK-29711 URL: https://issues.apache.org/jira/browse/SPARK-29711 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: deshanxiao We can change the log level in beeline like: set spark.log.level=debug. It would not be a big change, but it would be useful -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
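Spark already exposes a similar runtime knob on the driver (SparkContext.setLogLevel); below is a minimal sketch of the mechanism a `set spark.log.level=...` command could delegate to, assuming Log4j 1.x as Spark used at the time:
{code:java}
import org.apache.log4j.{Level, LogManager}

object DynamicLogLevel {
  // Falls back to INFO if the level name is unrecognized.
  def set(levelName: String): Unit = {
    val level = Level.toLevel(levelName.toUpperCase, Level.INFO)
    LogManager.getRootLogger.setLevel(level)
  }
}
{code}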
[jira] [Created] (SPARK-28987) DiskBlockManager#createTempShuffleBlock should skip read-only directories
deshanxiao created SPARK-28987: -- Summary: DiskBlockManager#createTempShuffleBlock should skip read-only directories Key: SPARK-28987 URL: https://issues.apache.org/jira/browse/SPARK-28987 Project: Spark Issue Type: Improvement Components: Shuffle Affects Versions: 3.0.0 Reporter: deshanxiao DiskBlockManager#createTempShuffleBlock only checks that the path does not already exist. I think we could also check whether the path is writable. That is reasonable because we invoke createTempShuffleBlock to create a new path to write files into, so it should be writable. stack: {code:java} Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1765 in stage 368592.0 failed 4 times, most recent failure: Lost task 1765.3 in stage 368592.0 (TID 66021932, test-hadoop-prc-st2808.bj, executor 251): java.io.FileNotFoundException: /home/work/hdd6/yarn/test-hadoop/nodemanager/usercache/sql_test/appcache/application_1560996968289_16320/blockmgr-14608b48-7efd-4fd3-b050-2ac9953390d4/1e/temp_shuffle_00c7b87f-d7ed-49f3-90e7-1c8358bcfd74 (No such file or directory) at java.io.FileOutputStream.open0(Native Method) at java.io.FileOutputStream.open(FileOutputStream.java:270) at java.io.FileOutputStream.<init>(FileOutputStream.java:213) at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:139) at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:150) at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:268) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:159) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:100) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1515) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1503) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1502) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1502) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:816) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1740) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1695) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1684) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
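A minimal sketch of the proposed check; the helper below is hypothetical, not the actual DiskBlockManager code:
{code:java}
import java.io.File

object WritableDirs {
  // Returns the first candidate that exists as a directory and is writable;
  // a read-only disk is skipped instead of failing later with a
  // FileNotFoundException when the writer opens the file.
  def pickWritableDir(candidates: Seq[File]): Option[File] =
    candidates.find(d => d.isDirectory && d.canWrite)
}
{code}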
[jira] [Created] (SPARK-28944) Expose peak memory of executor in metrics for parameter tuning
deshanxiao created SPARK-28944: -- Summary: Expose peak memory of executor in metrics for parameter tuning Key: SPARK-28944 URL: https://issues.apache.org/jira/browse/SPARK-28944 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: deshanxiao Maybe we can collect the peak executor memory in the heartbeat for tuning parameters such as spark.executor.memory -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
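Illustrative only (not Spark's metrics code): how an executor-side peak could be sampled from the standard JVM MemoryMXBean and reported with each heartbeat:
{code:java}
import java.lang.management.ManagementFactory

object PeakMemoryTracker {
  @volatile private var peakHeapBytes = 0L

  // Called on each (hypothetical) heartbeat; returns the high-water mark
  // of JVM heap usage observed so far.
  def sample(): Long = {
    val used = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage.getUsed
    peakHeapBytes = math.max(peakHeapBytes, used)
    peakHeapBytes
  }
}
{code}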
[jira] [Updated] (SPARK-28658) Yarn FinalStatus is always "success" in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-28658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-28658: --- Description: In yarn-client mode, the finalStatus of the application will always be success because the ApplicationMaster returns success when the driver disconnects. A simple example: {code:java} sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect {code} When we run the code in yarn-client mode, the finalStatus will be success, which misleads us. Maybe we can use a clearer state instead of "success". was: In yarn-client mode, the finalStatus of application will always be success because the ApplicationMaster returns success when the driver disconnected. A simple examle is that: {code:java} sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect {code} When we run the code in yarn-client mode, the finalStatus will be success. It misleads us. > Yarn FinalStatus is always "success" in yarn-client mode > -- > > Key: SPARK-28658 > URL: https://issues.apache.org/jira/browse/SPARK-28658 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Major > > In yarn-client mode, the finalStatus of the application will always be success > because the ApplicationMaster returns success when the driver disconnects. > A simple example: > {code:java} > sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect > {code} > When we run the code in yarn-client mode, the finalStatus will be success, > which misleads us. Maybe we can use a clearer state instead of "success". -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28658) Yarn FinalStatus is always "success" in yarn-client mode
deshanxiao created SPARK-28658: -- Summary: Yarn FinalStatus is always "success" in yarn-client mode Key: SPARK-28658 URL: https://issues.apache.org/jira/browse/SPARK-28658 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.0.0 Reporter: deshanxiao In yarn-client mode, the finalStatus of the application will always be success because the ApplicationMaster returns success when the driver disconnects. A simple example: {code:java} sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect {code} When we run the code in yarn-client mode, the finalStatus will be success, which misleads us. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27171) Support Full-Partition limit in the first scan
deshanxiao created SPARK-27171: -- Summary: Support Full-Partition limit in the first scan Key: SPARK-27171 URL: https://issues.apache.org/jira/browse/SPARK-27171 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0, 2.3.2 Reporter: deshanxiao SparkPlan#executeTake must pick elements starting from a single partition, which can be slow for some queries. Although Spark is better suited to batch queries, it would not hurt to add a switch that lets the user scan all partitions on the first pass of a limit. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
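A conceptual sketch (not Spark's actual implementation) of the incremental scan executeTake performs, alongside the proposed full-partition switch; the growth factor mirrors spark.sql.limit.scaleUpFactor (default 4):
{code:java}
object TakeSketch {
  // partitions: pre-materialized rows per partition, for illustration only.
  def takeRows[T](partitions: Seq[Seq[T]], limit: Int, fullScan: Boolean): Seq[T] = {
    if (fullScan) {
      // Proposed behavior: one pass over every partition.
      partitions.flatten.take(limit)
    } else {
      // Current behavior: scan 1 partition, then grow the batch until
      // enough rows are collected.
      var collected = Vector.empty[T]
      var scanned = 0
      var batch = 1
      while (collected.size < limit && scanned < partitions.size) {
        val rows = partitions.slice(scanned, scanned + batch).flatten
        collected ++= rows.take(limit - collected.size)
        scanned += batch
        batch *= 4
      }
      collected
    }
  }
}
{code}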
[jira] [Created] (SPARK-26954) Do not re-attempt when user code throws an exception
deshanxiao created SPARK-26954: -- Summary: Do not re-attempt when user code throws an exception Key: SPARK-26954 URL: https://issues.apache.org/jira/browse/SPARK-26954 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 2.4.0, 2.3.3 Reporter: deshanxiao YARN re-attempts a failed app based on YarnRMClient#unregister. However, some attempts are useless: {code:java} sc.parallelize(Seq(1,2,3)).map(_ => throw new RuntimeException("exception")).collect() {code} For some environment errors, such as a dead node, re-attempting is reasonable. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26954) Do not re-attempt when user code throws an exception
[ https://issues.apache.org/jira/browse/SPARK-26954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-26954: --- Description: YARN re-attempts a failed app based on YarnRMClient#unregister. However, some attempts are useless, as in this example: {code:java} sc.parallelize(Seq(1,2,3)).map(_ => throw new RuntimeException("exception")).collect() {code} Attempts made when a "FileNotFoundException" is thrown in user code also look unreasonable. For some environment errors, such as a dead node, re-attempting is reasonable. So it would be better not to re-attempt on user exceptions. was: Yarn attemps the failed App depending on YarnRMClient#unregister. However, some attemps are useless: {code:java} sc.parallelize(Seq(1,2,3)).map(_ => throw new RuntimeException("exception")).collect() {code} Some environment errors, such as node dead, attemps reasonablely. So, it will be bettler to at > Do not re-attempt when user code throws an exception > - > > Key: SPARK-26954 > URL: https://issues.apache.org/jira/browse/SPARK-26954 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.3.3, 2.4.0 >Reporter: deshanxiao >Priority: Critical > > YARN re-attempts a failed app based on YarnRMClient#unregister. However, > some attempts are useless, as in this example: > {code:java} > sc.parallelize(Seq(1,2,3)).map(_ => throw new > RuntimeException("exception")).collect() > {code} > Attempts made when a "FileNotFoundException" is thrown in user code also look > unreasonable. > For some environment errors, such as a dead node, re-attempting is reasonable. So it > would be better not to re-attempt on user exceptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26954) Do not re-attempt when user code throws an exception
[ https://issues.apache.org/jira/browse/SPARK-26954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-26954: --- Description: YARN re-attempts a failed app based on YarnRMClient#unregister. However, some attempts are useless: {code:java} sc.parallelize(Seq(1,2,3)).map(_ => throw new RuntimeException("exception")).collect() {code} For some environment errors, such as a dead node, re-attempting is reasonable. So, it will be better to at was: Yarn attemps the failed App depending on YarnRMClient#unregister. However, some attemps are useless: {code:java} sc.parallelize(Seq(1,2,3)).map(_ => throw new RuntimeException("exception")).collect() {code} Some environment errors, such as node dead, attemps reasonablely. > Do not re-attempt when user code throws an exception > - > > Key: SPARK-26954 > URL: https://issues.apache.org/jira/browse/SPARK-26954 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.3.3, 2.4.0 >Reporter: deshanxiao >Priority: Critical > > YARN re-attempts a failed app based on YarnRMClient#unregister. However, > some attempts are useless: > {code:java} > sc.parallelize(Seq(1,2,3)).map(_ => throw new > RuntimeException("exception")).collect() > {code} > For some environment errors, such as a dead node, re-attempting is reasonable. So, it will > be better to at -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
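A sketch under stated assumptions of how the ApplicationMaster could decide whether another YARN attempt is worthwhile; the helper and the assumption that Spark surfaces task failures as a SparkException wrapping the user's exception are illustrative, not Spark's actual logic:
{code:java}
import org.apache.spark.SparkException

object AttemptPolicy {
  // A deterministic user error (e.g. the RuntimeException in the example
  // above) will fail identically on every attempt, so retrying wastes work.
  def shouldReattempt(failure: Throwable): Boolean = failure match {
    case e: SparkException if e.getCause != null => false // user code failed
    case _ => true // environment issue: a new attempt may succeed
  }
}
{code}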
[jira] [Created] (SPARK-26714) The job whose partition num is zero is not shown in WebUI
deshanxiao created SPARK-26714: -- Summary: The job whose partition num is zero is not shown in WebUI Key: SPARK-26714 URL: https://issues.apache.org/jira/browse/SPARK-26714 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.4.0, 2.3.1 Reporter: deshanxiao When the job's partition count is zero, it will still get a job ID, but it is not shown in the UI. I think that's strange. Example: mkdir /home/test/testdir sc.textFile("/home/test/testdir") -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
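A reproduction sketch of the report above (the directory path comes from the example; the zero-partition behavior is the reporter's observation):
{code:java}
// Assumes /home/test/testdir exists and is empty, and `sc` is a SparkContext.
val rdd = sc.textFile("/home/test/testdir")
println(rdd.getNumPartitions) // 0: no input splits for an empty directory
rdd.count() // a job id is allocated, but nothing appears in the Web UI
{code}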
[jira] [Commented] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles
[ https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739083#comment-16739083 ] deshanxiao commented on SPARK-26570: [~hyukjin.kwon] OK, I will try it. Thank you! > Out of memory when InMemoryFileIndex bulkListLeafFiles > -- > > Key: SPARK-26570 > URL: https://issues.apache.org/jira/browse/SPARK-26570 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: deshanxiao >Priority: Major > Attachments: screenshot-1.png > > > *bulkListLeafFiles* collects all FileStatus objects in memory for every query, > which may cause an OOM on the driver. I hit the problem with Spark 2.3.2; the > latest version may have it as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles
[ https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-26570: --- Description: *bulkListLeafFiles* collects all FileStatus objects in memory for every query, which may cause an OOM on the driver. I hit the problem with Spark 2.3.2; the latest version may have it as well. (was: The *bulkListLeafFiles* will collect all filestatus in memory for every query which may cause the oom of driver. I use the spark 2.3.2 meeting with the problem. Maybe the latest one ) > Out of memory when InMemoryFileIndex bulkListLeafFiles > -- > > Key: SPARK-26570 > URL: https://issues.apache.org/jira/browse/SPARK-26570 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: deshanxiao >Priority: Major > Attachments: screenshot-1.png > > > *bulkListLeafFiles* collects all FileStatus objects in memory for every query, > which may cause an OOM on the driver. I hit the problem with Spark 2.3.2; the > latest version may have it as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles
[ https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-26570: --- Description: *bulkListLeafFiles* collects all FileStatus objects in memory for every query, which may cause an OOM on the driver. I hit the problem with Spark 2.3.2. Maybe the latest one (was: The bulkListLeafFiles will collect all filestatus in memory for every query which may cause the oom of driver.) > Out of memory when InMemoryFileIndex bulkListLeafFiles > -- > > Key: SPARK-26570 > URL: https://issues.apache.org/jira/browse/SPARK-26570 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: deshanxiao >Priority: Major > Attachments: screenshot-1.png > > > *bulkListLeafFiles* collects all FileStatus objects in memory for every query, > which may cause an OOM on the driver. I hit the problem with Spark 2.3.2. Maybe the > latest one -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles
[ https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737061#comment-16737061 ] deshanxiao commented on SPARK-26570: !screenshot-1.png! > Out of memory when InMemoryFileIndex bulkListLeafFiles > -- > > Key: SPARK-26570 > URL: https://issues.apache.org/jira/browse/SPARK-26570 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: deshanxiao >Priority: Major > Attachments: screenshot-1.png > > > bulkListLeafFiles collects all FileStatus objects in memory for every query, > which may cause an OOM on the driver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles
[ https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-26570: --- Attachment: screenshot-1.png > Out of memory when InMemoryFileIndex bulkListLeafFiles > -- > > Key: SPARK-26570 > URL: https://issues.apache.org/jira/browse/SPARK-26570 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: deshanxiao >Priority: Major > Attachments: screenshot-1.png > > > bulkListLeafFiles collects all FileStatus objects in memory for every query, > which may cause an OOM on the driver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles
deshanxiao created SPARK-26570: -- Summary: Out of memory when InMemoryFileIndex bulkListLeafFiles Key: SPARK-26570 URL: https://issues.apache.org/jira/browse/SPARK-26570 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.2 Reporter: deshanxiao bulkListLeafFiles collects all FileStatus objects in memory for every query, which may cause an OOM on the driver. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
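A back-of-the-envelope illustration of the report above; both numbers below are assumptions chosen only to show the scale, not measurements:
{code:java}
object ListingFootprint {
  def main(args: Array[String]): Unit = {
    val files = 50000000L       // assumed: 50M files across the table's partitions
    val bytesPerStatus = 1000L  // assumed: rough FileStatus + path overhead
    val driverBytes = files * bytesPerStatus
    println(s"~${driverBytes / (1L << 30)} GiB held on the driver heap") // ~46 GiB
  }
}
{code}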
[jira] [Commented] (SPARK-26457) Show hadoop configurations in HistoryServer environment tab
[ https://issues.apache.org/jira/browse/SPARK-26457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735707#comment-16735707 ] deshanxiao commented on SPARK-26457: [~planga82] Hi, thanks for your reply! I know that YARN provides all Hadoop configurations, but I think it would be nice for the HistoryServer to unify all configurations in one place. I care about the case where different Hadoop versions behave differently, or where some configurations require a specific Hadoop version. It would make debugging such problems convenient. Thanks a lot! > Show hadoop configurations in HistoryServer environment tab > --- > > Key: SPARK-26457 > URL: https://issues.apache.org/jira/browse/SPARK-26457 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Web UI >Affects Versions: 2.3.2, 2.4.0 > Environment: Maybe it is good to show some > configurations in the HistoryServer environment tab for debugging some Hadoop-related bugs >Reporter: deshanxiao >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26528) FsHistoryProviderSuite fails in IDEA because the "spark.testing" property is not set
[ https://issues.apache.org/jira/browse/SPARK-26528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-26528: --- Priority: Minor (was: Major) > FsHistoryProviderSuite fails in IDEA because the "spark.testing" > property is not set > - > > Key: SPARK-26528 > URL: https://issues.apache.org/jira/browse/SPARK-26528 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: deshanxiao >Priority: Minor > > Running the FsHistoryProviderSuite in IDEA failed because the property > "spark.testing" is not set. In this situation, the replay executor may replay a > file twice. > {code:java} > private val replayExecutor: ExecutorService = { > if (!conf.contains("spark.testing")) { > ThreadUtils.newDaemonFixedThreadPool(NUM_PROCESSING_THREADS, > "log-replay-executor") > } else { > MoreExecutors.sameThreadExecutor() > } > } > {code} > {code:java} > "SPARK-3697: ignore files that cannot be read." > 2 was not equal to 1 > ScalaTestFailureLocation: > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12 at > (FsHistoryProviderSuite.scala:179) > Expected :1 > Actual :2 > > org.scalatest.exceptions.TestFailedException: 2 was not equal to 1 > at > org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) > at > org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668) > at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:179) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51) > at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384) > at scala.collection.immutable.List.foreach(List.scala:381) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) > at 
org.scalatest.Suite$class.run(Suite.scala:1147) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233) > at org.scalatest.SuperEngine.runImpl(Engine.scala:521) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233) > at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$run(FsHistoryProviderSuite.scala:51) > at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:258) > at >
[jira] [Created] (SPARK-26528) FsHistoryProviderSuite fails in IDEA because the "spark.testing" property is not set
deshanxiao created SPARK-26528: -- Summary: FsHistoryProviderSuite fails in IDEA because the "spark.testing" property is not set Key: SPARK-26528 URL: https://issues.apache.org/jira/browse/SPARK-26528 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0, 2.3.2 Reporter: deshanxiao Running the FsHistoryProviderSuite in IDEA failed because the property "spark.testing" is not set. In this situation, the replay executor may replay a file twice. {code:java} private val replayExecutor: ExecutorService = { if (!conf.contains("spark.testing")) { ThreadUtils.newDaemonFixedThreadPool(NUM_PROCESSING_THREADS, "log-replay-executor") } else { MoreExecutors.sameThreadExecutor() } } {code} {code:java} "SPARK-3697: ignore files that cannot be read." 2 was not equal to 1 ScalaTestFailureLocation: org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12 at (FsHistoryProviderSuite.scala:179) Expected :1 Actual :2 org.scalatest.exceptions.TestFailedException: 2 was not equal to 1 at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668) at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704) at org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:179) at org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196) at org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203) at org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384) at scala.collection.immutable.List.foreach(List.scala:381) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379) at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461) at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229) at org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at org.scalatest.Suite$class.run(Suite.scala:1147) at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233) at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233) at 
org.scalatest.SuperEngine.runImpl(Engine.scala:521) at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233) at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31) at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213) at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210) at org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$run(FsHistoryProviderSuite.scala:51) at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:258) at org.apache.spark.deploy.history.FsHistoryProviderSuite.run(FsHistoryProviderSuite.scala:51) at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340) at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334) at
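A minimal workaround sketch for running the suite from IDEA; SBT and Maven set this property through the build, so the line below just mimics that before the suite constructs its SparkConf:
{code:java}
// In the IDEA run configuration, add -Dspark.testing=true to the VM options,
// or set it programmatically before the suite starts:
System.setProperty("spark.testing", "true")
{code}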
[jira] [Created] (SPARK-26457) Show hadoop configurations in HistoryServer environment tab
deshanxiao created SPARK-26457: -- Summary: Show hadoop configurations in HistoryServer environment tab Key: SPARK-26457 URL: https://issues.apache.org/jira/browse/SPARK-26457 Project: Spark Issue Type: New Feature Components: Spark Core, Web UI Affects Versions: 2.4.0, 2.3.2 Environment: Maybe it is good to show some configurations in the HistoryServer environment tab for debugging some Hadoop-related bugs Reporter: deshanxiao -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26333) FsHistoryProviderSuite failed because setReadable doesn't work in RedHat
[ https://issues.apache.org/jira/browse/SPARK-26333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718385#comment-16718385 ] deshanxiao commented on SPARK-26333: [~vanzin] Yes, you are right! Thank you very much! But why doesn't setReadable work as root? > FsHistoryProviderSuite failed because setReadable doesn't work in RedHat > > > Key: SPARK-26333 > URL: https://issues.apache.org/jira/browse/SPARK-26333 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: deshanxiao >Priority: Major > > FsHistoryProviderSuite failed in the case "SPARK-3697: ignore files that cannot > be read.". I tried invoking logFile2.canRead after invoking > "setReadable(false, false)", and found that the result of > "logFile2.canRead" is true, while on my Ubuntu 16.04 it returns false. > The environment: > RedHat: > Linux version 3.10.0-693.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) > (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 > 22:26:13 UTC 2017 > JDK > Java version: 1.8.0_151, vendor: Oracle Corporation > {code:java} > org.scalatest.exceptions.TestFailedException: 2 was not equal to 1 > at > org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) > at > org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668) > at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:183) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:182) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.org$apache$spark$deploy$history$FsHistoryProviderSuite$$updateAndCheck(FsHistoryProviderSuite.scala:841) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:182) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51) > at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384) > at scala.collection.immutable.List.foreach(List.scala:381) > at 
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-26333) FsHistoryProviderSuite failed because setReadable doesn't work in RedHat
[ https://issues.apache.org/jira/browse/SPARK-26333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-26333: --- Comment: was deleted (was: [~vanzin] No, I am not running as root.) > FsHistoryProviderSuite failed because setReadable doesn't work in RedHat > > > Key: SPARK-26333 > URL: https://issues.apache.org/jira/browse/SPARK-26333 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: deshanxiao >Priority: Major > > FsHistoryProviderSuite failed in the case "SPARK-3697: ignore files that cannot > be read.". I tried invoking logFile2.canRead after invoking > "setReadable(false, false)", and found that the result of > "logFile2.canRead" is true, while on my Ubuntu 16.04 it returns false. > The environment: > RedHat: > Linux version 3.10.0-693.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) > (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 > 22:26:13 UTC 2017 > JDK > Java version: 1.8.0_151, vendor: Oracle Corporation > {code:java} > org.scalatest.exceptions.TestFailedException: 2 was not equal to 1 > at > org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) > at > org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668) > at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:183) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:182) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.org$apache$spark$deploy$history$FsHistoryProviderSuite$$updateAndCheck(FsHistoryProviderSuite.scala:841) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:182) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51) > at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384) > at scala.collection.immutable.List.foreach(List.scala:381) > at 
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26333) FsHistoryProviderSuite failed because setReadable doesn't work in RedHat
[ https://issues.apache.org/jira/browse/SPARK-26333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718337#comment-16718337 ] deshanxiao commented on SPARK-26333: [~vanzin] No, I am not running as root. > FsHistoryProviderSuite failed because setReadable doesn't work in RedHat > > > Key: SPARK-26333 > URL: https://issues.apache.org/jira/browse/SPARK-26333 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: deshanxiao >Priority: Major > > FsHistoryProviderSuite failed in the case "SPARK-3697: ignore files that cannot > be read.". I tried invoking logFile2.canRead after invoking > "setReadable(false, false)", and found that the result of > "logFile2.canRead" is true, while on my Ubuntu 16.04 it returns false. > The environment: > RedHat: > Linux version 3.10.0-693.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) > (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 > 22:26:13 UTC 2017 > JDK > Java version: 1.8.0_151, vendor: Oracle Corporation > {code:java} > org.scalatest.exceptions.TestFailedException: 2 was not equal to 1 > at > org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) > at > org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668) > at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:183) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:182) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.org$apache$spark$deploy$history$FsHistoryProviderSuite$$updateAndCheck(FsHistoryProviderSuite.scala:841) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:182) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51) > at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203) > at > org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384) > at scala.collection.immutable.List.foreach(List.scala:381) > at 
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26333) FsHistoryProviderSuite failed because setReadable doesn't work in RedHat
deshanxiao created SPARK-26333: -- Summary: FsHistoryProviderSuite failed because setReadable doesn't work in RedHat Key: SPARK-26333 URL: https://issues.apache.org/jira/browse/SPARK-26333 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0 Reporter: deshanxiao FsHistoryProviderSuite failed in the case "SPARK-3697: ignore files that cannot be read.". I tried invoking logFile2.canRead after invoking "setReadable(false, false)", and found that the result of "logFile2.canRead" is true, while on my Ubuntu 16.04 it returns false. The environment: RedHat: Linux version 3.10.0-693.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 22:26:13 UTC 2017 JDK Java version: 1.8.0_151, vendor: Oracle Corporation {code:java} org.scalatest.exceptions.TestFailedException: 2 was not equal to 1 at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340) at org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668) at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704) at org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:183) at org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:182) at org.apache.spark.deploy.history.FsHistoryProviderSuite.org$apache$spark$deploy$history$FsHistoryProviderSuite$$updateAndCheck(FsHistoryProviderSuite.scala:841) at org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:182) at org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68) at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196) at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196) at org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51) at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203) at org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396) at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384) at scala.collection.immutable.List.foreach(List.scala:381) at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
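A quick reproduction sketch of the root cause discussed in the comments above: when the JVM runs as root, POSIX permission checks are bypassed, so canRead stays true even after setReadable(false, false):
{code:java}
import java.io.File

object SetReadableCheck {
  def main(args: Array[String]): Unit = {
    val f = File.createTempFile("perm-check", ".log")
    f.deleteOnExit()
    f.setReadable(false, false) // remove read permission for everyone
    // Prints false for a normal user; true when the JVM runs as root.
    println(s"canRead after setReadable(false): ${f.canRead}")
  }
}
{code}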
[jira] [Commented] (SPARK-25120) EventLogListener may miss driver SparkListenerBlockManagerAdded event
[ https://issues.apache.org/jira/browse/SPARK-25120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580670#comment-16580670 ] deshanxiao commented on SPARK-25120: Sure. I find that the "Executors" tab in the HistoryServer sometimes misses the driver's info in the executor-id column, which is inconvenient when we analyze driver problems. [~hyukjin.kwon] > EventLogListener may miss driver SparkListenerBlockManagerAdded event > -- > > Key: SPARK-25120 > URL: https://issues.apache.org/jira/browse/SPARK-25120 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: deshanxiao >Priority: Major > > Sometimes the "Executors" tab in the Spark history server cannot show driver > information because the SparkListenerBlockManagerAdded event is lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25120) EventLogListener may miss driver SparkListenerBlockManagerAdded event
deshanxiao created SPARK-25120: -- Summary: EventLogListener may miss driver SparkListenerBlockManagerAdded event Key: SPARK-25120 URL: https://issues.apache.org/jira/browse/SPARK-25120 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1 Reporter: deshanxiao Sometimes the "Executors" tab in the Spark history server cannot show driver information because the SparkListenerBlockManagerAdded event is lost. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25100) Using KryoSerializer and setting registrationRequired true can lead to job failure
deshanxiao created SPARK-25100: -- Summary: Using KryoSerializer and setting registrationRequired true can lead to job failure Key: SPARK-25100 URL: https://issues.apache.org/jira/browse/SPARK-25100 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.1 Reporter: deshanxiao When spark.serializer is org.apache.spark.serializer.KryoSerializer and spark.kryo.registrationRequired is true in SparkConf, invoking saveAsNewAPIHadoopDataset to store data in HDFS fails because the class TaskCommitMessage has not been registered. {code:java} java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage Note: To register this class use: kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class); at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488) at com.twitter.chill.KryoBase.getRegistration(KryoBase.scala:52) at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97) at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517) at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622) at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:347) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
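A workaround sketch for the report above: either leave spark.kryo.registrationRequired unset (false), or register the class named in the stack trace explicitly. SparkConf#registerKryoClasses is a real API; the configuration below is illustrative:
{code:java}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  // Register the internal message class that the executor serializes when
  // committing tasks; the class name comes from the stack trace above.
  .registerKryoClasses(Array(
    Class.forName("org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage")))
{code}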