[jira] [Commented] (SPARK-40472) Improve pyspark.sql.function example experience

2022-09-19 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606879#comment-17606879
 ] 

deshanxiao commented on SPARK-40472:


[~hyukjin.kwon] OK, thanks~ 

> Improve pyspark.sql.function example experience
> ---
>
> Key: SPARK-40472
> URL: https://issues.apache.org/jira/browse/SPARK-40472
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Minor
>
> There are many examples in pyspark.sql.function:
> {code:java}
>     Examples
>     
>     >>> df = spark.range(1)
>     >>> df.select(lit(5).alias('height'), df.id).show()
>     +--+---+
>     |height| id|
>     +--+---+
>     |     5|  0|
>     +--+---+ {code}
> We can add import statements so that the user can directly run it.
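
For reference, a minimal self-contained version of the docstring example, assuming a local SparkSession; the two import lines are the kind of statements the issue proposes adding:

{code:python}
# Hedged sketch: the same doctest, made directly runnable by adding imports.
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.range(1)
df.select(lit(5).alias('height'), df.id).show()
# +------+---+
# |height| id|
# +------+---+
# |     5|  0|
# +------+---+
{code}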



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40472) Improve pyspark.sql.function example experience

2022-09-19 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao resolved SPARK-40472.

Resolution: Fixed

> Improve pyspark.sql.function example experience
> ---
>
> Key: SPARK-40472
> URL: https://issues.apache.org/jira/browse/SPARK-40472
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Minor
>
> There are many examples in pyspark.sql.function:
> {code:java}
>     Examples
>     
>     >>> df = spark.range(1)
>     >>> df.select(lit(5).alias('height'), df.id).show()
>     +--+---+
>     |height| id|
>     +--+---+
>     |     5|  0|
>     +--+---+ {code}
> We can add import statements so that the user can directly run it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40472) Improve pyspark.sql.function example experience

2022-09-16 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-40472:
---
Description: 
There are many examples in pyspark.sql.function:
{code:java}
    Examples
    
    >>> df = spark.range(1)
    >>> df.select(lit(5).alias('height'), df.id).show()
    +--+---+
    |height| id|
    +--+---+
    |     5|  0|
    +--+---+ {code}
We can add import statements so that the user can directly run it.

> Improve pyspark.sql.function example experience
> ---
>
> Key: SPARK-40472
> URL: https://issues.apache.org/jira/browse/SPARK-40472
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Minor
>
> There are many examples in pyspark.sql.function:
> {code:java}
>     Examples
>     
>     >>> df = spark.range(1)
>     >>> df.select(lit(5).alias('height'), df.id).show()
>     +--+---+
>     |height| id|
>     +--+---+
>     |     5|  0|
>     +--+---+ {code}
> We can add import statements so that the user can directly run it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40472) Improve pyspark.sql.function example experience

2022-09-16 Thread deshanxiao (Jira)
deshanxiao created SPARK-40472:
--

 Summary: Improve pyspark.sql.function example experience
 Key: SPARK-40472
 URL: https://issues.apache.org/jira/browse/SPARK-40472
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: deshanxiao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40192) Remove redundant groupby

2022-08-23 Thread deshanxiao (Jira)
deshanxiao created SPARK-40192:
--

 Summary: Remove redundant groupby
 Key: SPARK-40192
 URL: https://issues.apache.org/jira/browse/SPARK-40192
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: deshanxiao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-40103) Support read/write.csv() in SparkR

2022-08-17 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580623#comment-17580623
 ] 

deshanxiao edited comment on SPARK-40103 at 8/17/22 7:23 AM:
-

Yes, read.csv and read.csv2 have been used in the R utils package.


was (Author: deshanxiao):
Yes read.csv, read.csv2 have benn used in R utils packages.

> Support read/write.csv() in SparkR
> --
>
> Key: SPARK-40103
> URL: https://issues.apache.org/jira/browse/SPARK-40103
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> Today, almost all languages support the DataFrameReader.csv API; only R is 
> missing it. We need to use df.read() to read the CSV file. We need a more 
> high-level API for it.
> Java:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]
> Scala:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]
> Python:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]
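
For comparison, a minimal sketch of the PySpark call linked above, assuming an existing SparkSession and hypothetical file paths:

{code:python}
# Hedged sketch of DataFrameReader.csv() / DataFrameWriter.csv() in PySpark;
# /tmp/people.csv and /tmp/people_out are hypothetical paths for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.csv("/tmp/people.csv", header=True, inferSchema=True)
df.printSchema()
df.write.csv("/tmp/people_out", header=True, mode="overwrite")
{code}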



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40103) Support read/write.csv() in SparkR

2022-08-17 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580623#comment-17580623
 ] 

deshanxiao commented on SPARK-40103:


Yes, read.csv and read.csv2 have been used in the R utils package.

> Support read/write.csv() in SparkR
> --
>
> Key: SPARK-40103
> URL: https://issues.apache.org/jira/browse/SPARK-40103
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> Today, almost all languages support the DataFrameReader.csv API; only R is 
> missing it. We need to use df.read() to read the CSV file. We need a more 
> high-level API for it.
> Java:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]
> Scala:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]
> Python:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40103) Support read/write.csv() in SparkR

2022-08-16 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-40103:
---
Description: 
Today, almost all languages support the DataFrameReader.csv API; only R is missing 
it. We need to use df.read() to read the CSV file. We need a more high-level API 
for it.

Java:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]

Scala:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]

Python:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]

  was:
Today, all major languages support the DataFrameReader.csv API, only R is 
missing. we need to use df.read() to read the csv file. We need a more 
high-level api for it.

Java:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]

Scala:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]

Python:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]


> Support read/write.csv() in SparkR
> --
>
> Key: SPARK-40103
> URL: https://issues.apache.org/jira/browse/SPARK-40103
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> Today, almost all languages support the DataFrameReader.csv API; only R is 
> missing it. We need to use df.read() to read the CSV file. We need a more 
> high-level API for it.
> Java:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]
> Scala:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]
> Python:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40103) Support read/write.csv() in SparkR

2022-08-16 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-40103:
---
Issue Type: New Feature  (was: Improvement)

> Support read/write.csv() in SparkR
> --
>
> Key: SPARK-40103
> URL: https://issues.apache.org/jira/browse/SPARK-40103
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> Today, almost all languages support the DataFrameReader.csv API; only R is 
> missing it. We need to use df.read() to read the CSV file. We need a more 
> high-level API for it.
> Java:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]
> Scala:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]
> Python:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40103) Support read.csv in SparkR

2022-08-16 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-40103:
---
Description: 
Today, all major languages support the DataFrameReader.csv API; only R is 
missing it. We need to use df.read() to read the CSV file. We need a more 
high-level API for it.

Java:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]

Scala:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]

Python:
[DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]

> Support read.csv in SparkR
> --
>
> Key: SPARK-40103
> URL: https://issues.apache.org/jira/browse/SPARK-40103
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> Today, all major languages support the DataFrameReader.csv API; only R is 
> missing it. We need to use df.read() to read the CSV file. We need a more 
> high-level API for it.
> Java:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]
> Scala:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]
> Python:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40103) Support read.csv() in SparkR

2022-08-16 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-40103:
---
Summary: Support read.csv() in SparkR  (was: Support read.csv in SparkR)

> Support read.csv() in SparkR
> 
>
> Key: SPARK-40103
> URL: https://issues.apache.org/jira/browse/SPARK-40103
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> Today, all major languages support the DataFrameReader.csv API; only R is 
> missing it. We need to use df.read() to read the CSV file. We need a more 
> high-level API for it.
> Java:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]
> Scala:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]
> Python:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40103) Support read/write.csv() in SparkR

2022-08-16 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-40103:
---
Summary: Support read/write.csv() in SparkR  (was: Support read.csv() in 
SparkR)

> Support read/write.csv() in SparkR
> --
>
> Key: SPARK-40103
> URL: https://issues.apache.org/jira/browse/SPARK-40103
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> Today, all major languages support the DataFrameReader.csv API; only R is 
> missing it. We need to use df.read() to read the CSV file. We need a more 
> high-level API for it.
> Java:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html]
> Scala:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/DataFrameReader.html#csv(paths:String*):org.apache.spark.sql.DataFrame]
> Python:
> [DataFrameReader.csv()|https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrameReader.csv.html#pyspark.sql.DataFrameReader.csv]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40103) Support read.csv in SparkR

2022-08-16 Thread deshanxiao (Jira)
deshanxiao created SPARK-40103:
--

 Summary: Support read.csv in SparkR
 Key: SPARK-40103
 URL: https://issues.apache.org/jira/browse/SPARK-40103
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 3.3.0
Reporter: deshanxiao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39934) takeRDD in R is slow

2022-08-16 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17580183#comment-17580183
 ] 

deshanxiao commented on SPARK-39934:


[~hyukjin.kwon] I have confirmed from the code that the takeRDD method in RDD.R 
is only used in tests. It doesn't affect the actual running code. Thank you~

> takeRDD in R is slow
> 
>
> Key: SPARK-39934
> URL: https://issues.apache.org/jira/browse/SPARK-39934
> Project: Spark
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> The SparkR:::takeRDD API retrieves the result one partition per round. We 
> can re-implement it following the current Scala code.
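
For reference, a rough PySpark-level sketch (not SparkR code) of the batching strategy that the Scala take() uses and that the issue suggests porting to SparkR:::takeRDD: scan a growing number of partitions per round instead of one partition per round. The function name and growth factor here are illustrative assumptions.

{code:python}
import itertools

def take_sketch(rdd, num):
    """Hedged sketch of take(): fetch from a growing batch of partitions per round."""
    items = []
    parts_scanned = 0
    total_parts = rdd.getNumPartitions()
    while len(items) < num and parts_scanned < total_parts:
        # The first round scans a single partition; later rounds grow the batch
        # so that sparse partitions do not force one job per partition.
        if parts_scanned == 0:
            num_parts = 1
        else:
            num_parts = min(parts_scanned * 4, total_parts - parts_scanned)
        left = num - len(items)
        parts = range(parts_scanned, parts_scanned + num_parts)
        # runJob only evaluates the selected partitions.
        results = rdd.context.runJob(
            rdd, lambda it: list(itertools.islice(it, left)), parts)
        for partition_items in results:
            items.extend(partition_items)
        parts_scanned += num_parts
    return items[:num]
{code}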



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39934) takeRDD in R is slow

2022-08-04 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575221#comment-17575221
 ] 

deshanxiao commented on SPARK-39934:


[~hyukjin.kwon] Hi, maybe there was something wrong with my wording. I mean 
that *take* has performance problems because it only takes one partition at a 
time, even though take is not exposed to the user.

> takeRDD in R is slow
> 
>
> Key: SPARK-39934
> URL: https://issues.apache.org/jira/browse/SPARK-39934
> Project: Spark
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> The SparkR:::takeRDD API retrieves the result one partition per round. We 
> can re-implement it following the current Scala code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39934) takeRDD in R is slow

2022-08-01 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-39934:
---
Description: The SparkR:::takeRDD API retrieves the result one partition 
per round. We can re-implement it following the current Scala code.

> takeRDD in R is slow
> 
>
> Key: SPARK-39934
> URL: https://issues.apache.org/jira/browse/SPARK-39934
> Project: Spark
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Major
>
> The SparkR:::takeRDD API retrieves the result one partition per round. We 
> can re-implement it following the current Scala code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39934) takeRDD in R is slow

2022-08-01 Thread deshanxiao (Jira)
deshanxiao created SPARK-39934:
--

 Summary: takeRDD in R is slow
 Key: SPARK-39934
 URL: https://issues.apache.org/jira/browse/SPARK-39934
 Project: Spark
  Issue Type: Improvement
  Components: R
Affects Versions: 3.3.0
Reporter: deshanxiao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39916) Merge SchemaUtils from mlib to SQL

2022-07-28 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-39916:
---
Description: Today we have two SchemaUtils: SQL SchemaUtils and mllib 
SchemaUtils. The SchemaUtils of mllib left a TODO tag to merge it into SQL. Let's do 
this!  (was: Today we have two SchemaUtils: SQL SchemaUtils and mllib 
SchemaUtils. the SchemaUtils of mllib left a TODO tag. Let's do this!)

> Merge SchemaUtils from mlib to SQL
> --
>
> Key: SPARK-39916
> URL: https://issues.apache.org/jira/browse/SPARK-39916
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, SQL
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Minor
>
> Today we have two SchemaUtils: SQL SchemaUtils and mllib SchemaUtils. The 
> SchemaUtils of mllib left a TODO tag to merge it into SQL. Let's do this!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39916) Merge SchemaUtils from mlib to SQL

2022-07-28 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-39916:
---
Description: Today we have two SchemaUtils: SQL SchemaUtils and mllib 
SchemaUtils. The SchemaUtils of mllib left a TODO tag. Let's do this!

> Merge SchemaUtils from mlib to SQL
> --
>
> Key: SPARK-39916
> URL: https://issues.apache.org/jira/browse/SPARK-39916
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, SQL
>Affects Versions: 3.3.0
>Reporter: deshanxiao
>Priority: Minor
>
> Today we have two SchemaUtils: SQL SchemaUtils and mllib SchemaUtils. The 
> SchemaUtils of mllib left a TODO tag. Let's do this!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39916) Merge SchemaUtils from mlib to SQL

2022-07-28 Thread deshanxiao (Jira)
deshanxiao created SPARK-39916:
--

 Summary: Merge SchemaUtils from mlib to SQL
 Key: SPARK-39916
 URL: https://issues.apache.org/jira/browse/SPARK-39916
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, SQL
Affects Versions: 3.3.0
Reporter: deshanxiao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31140) Support Quick sample in RDD

2020-03-15 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059916#comment-17059916
 ] 

deshanxiao commented on SPARK-31140:


Sure, you are right. I just suggest that we could add a new method 
"samplePartition" to do it.

> Support Quick sample in RDD
> ---
>
> Key: SPARK-31140
> URL: https://issues.apache.org/jira/browse/SPARK-31140
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Minor
>
> RDD.sample uses *filter* to pick out the data we need. It means 
> that if the raw data is very large, we spend too much time reading it. We 
> can filter at the raw-partition level to speed up sampling.
> {code:java}
>   override def compute(splitIn: Partition, context: TaskContext): Iterator[U] 
> = {
> val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
> val thisSampler = sampler.clone
> thisSampler.setSeed(split.seed)
> thisSampler.sample(firstParent[T].iterator(split.prev, context))
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31140) Support Quick sample in RDD

2020-03-15 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059908#comment-17059908
 ] 

deshanxiao commented on SPARK-31140:


[~viirya] Thanks for your comment! I mean that we can override 
*getPartitions* to filter the partitions directly. If we have 200 partitions, 
samplePartition will return 20 partitions when the ratio is 0.1.
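
A rough PySpark-level sketch of that idea (samplePartition itself is hypothetical, not an existing API): keep a random subset of whole partitions instead of filtering every row.

{code:python}
import random

def sample_partitions(rdd, fraction, seed=None):
    """Hedged sketch: keep roughly fraction * numPartitions whole partitions."""
    rng = random.Random(seed)
    n = rdd.getNumPartitions()
    keep = set(rng.sample(range(n), max(1, int(n * fraction))))

    def keep_only(index, iterator):
        # Partitions outside the sample return an empty iterator without being
        # consumed, e.g. 20 of 200 partitions are kept when fraction = 0.1.
        return iterator if index in keep else iter([])

    return rdd.mapPartitionsWithIndex(keep_only)
{code}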

> Support Quick sample in RDD
> ---
>
> Key: SPARK-31140
> URL: https://issues.apache.org/jira/browse/SPARK-31140
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Minor
>
> RDD.sample uses *filter* to pick out the data we need. It means 
> that if the raw data is very large, we spend too much time reading it. We 
> can filter at the raw-partition level to speed up sampling.
> {code:java}
>   override def compute(splitIn: Partition, context: TaskContext): Iterator[U] 
> = {
> val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
> val thisSampler = sampler.clone
> thisSampler.setSeed(split.seed)
> thisSampler.sample(firstParent[T].iterator(split.prev, context))
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31140) Support Quick sample in RDD

2020-03-12 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-31140:
---
Description: 
RDD.sample uses *filter* to pick out the data we need. It means 
that if the raw data is very large, we spend too much time reading it. We 
can filter at the raw-partition level to speed up sampling.


{code:java}
  override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = 
{
val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
val thisSampler = sampler.clone
thisSampler.setSeed(split.seed)
thisSampler.sample(firstParent[T].iterator(split.prev, context))
  }
{code}


  was:
RDD.sample use the function of *filter* to pick up the data we need. It means 
that if the raw data is very huge, we must cost too much time to read it. We 
can filter the raw partition to speed up the processing of sample.


{code:java}
  override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = 
{
val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
val thisSampler = sampler.clone
thisSampler.setSeed(split.seed)
thisSampler.sample(firstParent[T].iterator(split.prev, context))
  }
{code}



> Support Quick sample in RDD
> ---
>
> Key: SPARK-31140
> URL: https://issues.apache.org/jira/browse/SPARK-31140
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Minor
>
> RDD.sample uses *filter* to pick out the data we need. It means 
> that if the raw data is very large, we spend too much time reading it. We 
> can filter at the raw-partition level to speed up sampling.
> {code:java}
>   override def compute(splitIn: Partition, context: TaskContext): Iterator[U] 
> = {
> val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
> val thisSampler = sampler.clone
> thisSampler.setSeed(split.seed)
> thisSampler.sample(firstParent[T].iterator(split.prev, context))
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31140) Support Quick sample in RDD

2020-03-12 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-31140:
---
Description: 
RDD.sample use the function of *filter* to pick up the data we need. It means 
that if the raw data is very huge, we must cost too much time to read it. We 
can filter the raw partition to speed up the processing of sample.


{code:java}
  override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = 
{
val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
val thisSampler = sampler.clone
thisSampler.setSeed(split.seed)
thisSampler.sample(firstParent[T].iterator(split.prev, context))
  }
{code}


  was:
RDD.sample use *filter* to read the raw data. It means that if the raw data is 
very huge, we must cost too much time to read it. We can filter the raw 
partition to speed up the processing of sample.


{code:java}
  override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = 
{
val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
val thisSampler = sampler.clone
thisSampler.setSeed(split.seed)
thisSampler.sample(firstParent[T].iterator(split.prev, context))
  }
{code}



> Support Quick sample in RDD
> ---
>
> Key: SPARK-31140
> URL: https://issues.apache.org/jira/browse/SPARK-31140
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Minor
>
> RDD.sample use the function of *filter* to pick up the data we need. It means 
> that if the raw data is very huge, we must cost too much time to read it. We 
> can filter the raw partition to speed up the processing of sample.
> {code:java}
>   override def compute(splitIn: Partition, context: TaskContext): Iterator[U] 
> = {
> val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
> val thisSampler = sampler.clone
> thisSampler.setSeed(split.seed)
> thisSampler.sample(firstParent[T].iterator(split.prev, context))
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31140) Support Quick sample in RDD

2020-03-12 Thread deshanxiao (Jira)
deshanxiao created SPARK-31140:
--

 Summary: Support Quick sample in RDD
 Key: SPARK-31140
 URL: https://issues.apache.org/jira/browse/SPARK-31140
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: deshanxiao


RDD.sample use *filter* to read the raw data. It means that if the raw data is 
very huge, we must cost too much time to read it. We can filter the raw 
partition to speed up the processing of sample.


{code:java}
  override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = 
{
val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
val thisSampler = sampler.clone
thisSampler.setSeed(split.seed)
thisSampler.sample(firstParent[T].iterator(split.prev, context))
  }
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31112) Use multiple extrenal catalog to speed up metastore access

2020-03-11 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-31112:
---
Description: 
Now we use HiveClientImpl to access the Hive metastore. However, a long-running 
RPC in Hive will block all of the queries. Currently, we access it through the 
externalCatalog member in SharedState, which is a singleton.

Maybe we can use multiple external catalog instances to speed up metastore 
access in read-only situations.

Original:

Query 1:

DatabaseExist -> getTable -> getPartiton  (6s)

Query 2:

DatabaseExist -> getTable -> getPartiton   (5s)

Total cost: 11s

Now:

Query 1:

DatabaseExist -> getTable -> getPartiton  (6s)

Query 2:

DatabaseExist -> getTable -> getPartiton   (5s)

Total cost: 6s

  was:Now, we use HiveClientImpl to access hive metastore. However,  a long 
running rpc in hive will block all of the query. Currently, we use the member 
of externalCatalog in ShardState to access. Maybe, we can use multiple extrenal 
catalog instance to speed up metastore access in read-only situation.


> Use multiple extrenal catalog to speed up metastore access
> --
>
> Key: SPARK-31112
> URL: https://issues.apache.org/jira/browse/SPARK-31112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Major
>
> Now we use HiveClientImpl to access the Hive metastore. However, a long-running 
> RPC in Hive will block all of the queries. Currently, we access it through the 
> externalCatalog member in SharedState, which is a singleton.
> Maybe we can use multiple external catalog instances to speed up metastore 
> access in read-only situations.
> Original:
> Query 1:
> DatabaseExist -> getTable -> getPartiton  (6s)
> Query 2:
> DatabaseExist -> getTable -> getPartiton   (5s)
> Total cost: 11s
> Now:
> Query 1:
> DatabaseExist -> getTable -> getPartiton  (6s)
> Query 2:
> DatabaseExist -> getTable -> getPartiton   (5s)
> Total cost: 6s
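
A language-agnostic sketch of the idea (hypothetical names, not Spark code): borrow one of several catalog clients per call, so read-only metastore requests from different queries do not serialize behind a single long-running RPC.

{code:python}
import queue

class CatalogClientPool:
    """Hedged sketch: a small pool of metastore clients instead of a singleton."""

    def __init__(self, make_client, size=4):
        self._clients = queue.Queue()
        for _ in range(size):
            self._clients.put(make_client())

    def call(self, fn):
        client = self._clients.get()      # borrow an idle client (blocks if all are busy)
        try:
            return fn(client)             # e.g. lambda c: c.get_table(db, table)
        finally:
            self._clients.put(client)     # hand it back so other queries can proceed
{code}

With such a pool, the two query sequences above could run in parallel, which is what the 11s vs. 6s comparison illustrates.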



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31112) Use multiple extrenal catalog to speed up metastore access

2020-03-11 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-31112:
---
Description: Now we use HiveClientImpl to access the Hive metastore. However, 
a long-running RPC in Hive will block all of the queries. Currently, we access it 
through the externalCatalog member in SharedState. Maybe we can use multiple 
external catalog instances to speed up metastore access in read-only situations.  
(was: Now, we use HiveClientImpl to access hive metastore. However,  a long 
running rpc in hive will block all of the query. Currently, we use the member 
of externalCatalog in ShardState to access. Maybe, we can use )

> Use multiple extrenal catalog to speed up metastore access
> --
>
> Key: SPARK-31112
> URL: https://issues.apache.org/jira/browse/SPARK-31112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Major
>
> Now we use HiveClientImpl to access the Hive metastore. However, a long-running 
> RPC in Hive will block all of the queries. Currently, we access it through the 
> externalCatalog member in SharedState. Maybe we can use multiple external 
> catalog instances to speed up metastore access in read-only situations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31112) Use multiple extrenal catalog to speed up metastore access

2020-03-11 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-31112:
---
Description: Now, we use HiveClientImpl to access hive metastore. However,  
a long running rpc in hive will block all of the query. Currently, we use the 
member of externalCatalog in ShardState to access. Maybe, we can use 

> Use multiple extrenal catalog to speed up metastore access
> --
>
> Key: SPARK-31112
> URL: https://issues.apache.org/jira/browse/SPARK-31112
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Major
>
> Now, we use HiveClientImpl to access hive metastore. However,  a long running 
> rpc in hive will block all of the query. Currently, we use the member of 
> externalCatalog in ShardState to access. Maybe, we can use 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31112) Use multiple extrenal catalog to speed up metastore access

2020-03-10 Thread deshanxiao (Jira)
deshanxiao created SPARK-31112:
--

 Summary: Use multiple extrenal catalog to speed up metastore access
 Key: SPARK-31112
 URL: https://issues.apache.org/jira/browse/SPARK-31112
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: deshanxiao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30883) Tests that use setWritable,setReadable and setExecutable should be cancel when user is root

2020-02-20 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-30883:
---
Environment: The Java APIs *setWritable, setReadable and setExecutable* 
don't work well because root can read, write, or execute every file. Maybe 
we could cancel these tests or fail fast when the mvn test is starting.  
(was: The java api *setWritable,setReadable and setExecutable* dosen't work 
well when the user is root. Maybe, we could cancel these tests or fast failure 
when the mvn test is starting.)

> Tests that use setWritable,setReadable and setExecutable should be cancel 
> when user is root
> ---
>
> Key: SPARK-30883
> URL: https://issues.apache.org/jira/browse/SPARK-30883
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.0.0
> Environment: The Java APIs *setWritable, setReadable and setExecutable* 
> don't work well because root can read, write, or execute every file. 
> Maybe we could cancel these tests or fail fast when the mvn test is 
> starting.
>Reporter: deshanxiao
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30883) Tests that use setWritable,setReadable and setExecutable should be cancel when user is root

2020-02-19 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-30883:
---
Environment: The java api *setWritable,setReadable and setExecutable* 
dosen't work well when the user is root. Maybe, we could cancel these tests or 
fast failure when the mvn test is starting.  (was: The java api 
*setWritable,setReadable and setExecutable* dosen't work when the user is root. 
Maybe, we could cancel these tests or fast failure when the mvn test is 
starting.)

> Tests that use setWritable,setReadable and setExecutable should be cancel 
> when user is root
> ---
>
> Key: SPARK-30883
> URL: https://issues.apache.org/jira/browse/SPARK-30883
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.0.0
> Environment: The java api *setWritable,setReadable and setExecutable* 
> dosen't work well when the user is root. Maybe, we could cancel these tests 
> or fast failure when the mvn test is starting.
>Reporter: deshanxiao
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30883) Tests that use setWritable,setReadable and setExecutable should be cancel when user is root

2020-02-19 Thread deshanxiao (Jira)
deshanxiao created SPARK-30883:
--

 Summary: Tests that use setWritable,setReadable and setExecutable 
should be cancel when user is root
 Key: SPARK-30883
 URL: https://issues.apache.org/jira/browse/SPARK-30883
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.0.0
 Environment: The java api *setWritable,setReadable and setExecutable* 
dosen't work when the user is root. Maybe, we could cancel these tests or fast 
failure when the mvn test is starting.
Reporter: deshanxiao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30123) PartitionPruning should consider more case

2019-12-04 Thread deshanxiao (Jira)
deshanxiao created SPARK-30123:
--

 Summary: PartitionPruning should consider more case
 Key: SPARK-30123
 URL: https://issues.apache.org/jira/browse/SPARK-30123
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: deshanxiao


If the left side has a partition scan and the right side has a pruning filter but 
hasBenefit is false, the right side will never have a subquery inserted.


{code:java}
var partScan = getPartitionTableScan(l, left)
if (partScan.isDefined && canPruneLeft(joinType) &&
hasPartitionPruningFilter(right)) {
  val hasBenefit = pruningHasBenefit(l, partScan.get, r, right)
  newLeft = insertPredicate(l, newLeft, r, right, rightKeys, 
hasBenefit)
} else {
  partScan = getPartitionTableScan(r, right)
  if (partScan.isDefined && canPruneRight(joinType) &&
  hasPartitionPruningFilter(left) ) {
val hasBenefit = pruningHasBenefit(r, partScan.get, l, left)
newRight = insertPredicate(r, newRight, l, left, leftKeys, 
hasBenefit)
  }
}
  case _ =>
}
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30106) DynamicPartitionPruningSuite#"no predicate on the dimension table" is not be tested

2019-12-02 Thread deshanxiao (Jira)
deshanxiao created SPARK-30106:
--

 Summary: DynamicPartitionPruningSuite#"no predicate on the 
dimension table" is not be tested
 Key: SPARK-30106
 URL: https://issues.apache.org/jira/browse/SPARK-30106
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.0.0
Reporter: deshanxiao


The test "no predicate on the dimension table is not be tested" has no partiton 
key. We can change the sql to test it.

{code:java}
  Given("no predicate on the dimension table")
  withSQLConf(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED.key -> "true") {
val df = sql(
  """
|SELECT * FROM fact_sk f
|JOIN dim_store s
|ON f.date_id = s.store_id
  """.stripMargin)

checkPartitionPruningPredicate(df, false, false)
  }
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30073) HistoryPage render "count" cost too much time

2019-11-28 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16984786#comment-16984786
 ] 

deshanxiao commented on SPARK-30073:


[~kabhwan]
Sorry, I have changed it to Spark 2.3.2. Thank you!

> HistoryPage render "count" cost too much time
> -
>
> Key: SPARK-30073
> URL: https://issues.apache.org/jira/browse/SPARK-30073
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: deshanxiao
>Priority: Major
>
> {code:java}
> "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 
> nid=0x2c744 runnable [0x7f23775e6000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native 
> Method)
> at 
> org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162)
> at 
> org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128)
> at 
> org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144)
> at 
> org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218)
> at 
> org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at 
> scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115)
> at scala.collection.AbstractIterator.count(Iterator.scala:1336)
> at 
> org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50)
> at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
> at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
> at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
> at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
> at org.spark_project.jetty.server.handler.ContextHandler.do
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30073) HistoryPage render "count" cost too much time

2019-11-28 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-30073:
---
Affects Version/s: 2.3.2  (was: 3.0.0)

> HistoryPage render "count" cost too much time
> -
>
> Key: SPARK-30073
> URL: https://issues.apache.org/jira/browse/SPARK-30073
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: deshanxiao
>Priority: Major
>
> {code:java}
> "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 
> nid=0x2c744 runnable [0x7f23775e6000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native 
> Method)
> at 
> org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162)
> at 
> org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128)
> at 
> org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144)
> at 
> org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218)
> at 
> org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at 
> scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115)
> at scala.collection.AbstractIterator.count(Iterator.scala:1336)
> at 
> org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50)
> at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
> at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
> at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
> at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
> at org.spark_project.jetty.server.handler.ContextHandler.do
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30073) HistoryPage render "count" cost too much time

2019-11-28 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-30073:
---
Description: 
{code:java}
"qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 
nid=0x2c744 runnable [0x7f23775e6000]
   java.lang.Thread.State: RUNNABLE
at 
org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native 
Method)
at 
org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162)
at 
org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128)
at 
org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144)
at 
org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218)
at 
org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115)
at scala.collection.AbstractIterator.count(Iterator.scala:1336)
at 
org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
at org.spark_project.jetty.server.handler.ContextHandler.do
{code}


> HistoryPage render "count" cost too much time
> -
>
> Key: SPARK-30073
> URL: https://issues.apache.org/jira/browse/SPARK-30073
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Major
>
> {code:java}
> "qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 
> nid=0x2c744 runnable [0x7f23775e6000]
>java.lang.Thread.State: RUNNABLE
> at 
> org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native 
> Method)
> at 
> org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162)
> at 
> org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128)
> at 
> org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144)
> at 
> org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218)
> at 
> org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111)
> at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at 
> scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115)
> at scala.collection.AbstractIterator.count(Iterator.scala:1336)
> at 
> org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50)
> at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
> at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
> at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
> at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
> at org.spark_project.jetty.server.handler.ContextHandler.do
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30073) HistoryPage render "count" cost too much time

2019-11-28 Thread deshanxiao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-30073:
---
Environment: (was: 
{code:java}
"qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 
nid=0x2c744 runnable [0x7f23775e6000]
   java.lang.Thread.State: RUNNABLE
at 
org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native 
Method)
at 
org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162)
at 
org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128)
at 
org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144)
at 
org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218)
at 
org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115)
at scala.collection.AbstractIterator.count(Iterator.scala:1336)
at 
org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
at org.spark_project.jetty.server.handler.ContextHandler.do
{code}
)

> HistoryPage render "count" cost too much time
> -
>
> Key: SPARK-30073
> URL: https://issues.apache.org/jira/browse/SPARK-30073
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30073) HistoryPage render "count" cost too much time

2019-11-28 Thread deshanxiao (Jira)
deshanxiao created SPARK-30073:
--

 Summary: HistoryPage render "count" cost too much time
 Key: SPARK-30073
 URL: https://issues.apache.org/jira/browse/SPARK-30073
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.0.0
 Environment: 
{code:java}
"qtp1010584177-537" #537 daemon prio=5 os_prio=0 tid=0x7f2734185000 
nid=0x2c744 runnable [0x7f23775e6000]
   java.lang.Thread.State: RUNNABLE
at 
org.fusesource.leveldbjni.internal.NativeIterator$IteratorJNI.Prev(Native 
Method)
at 
org.fusesource.leveldbjni.internal.NativeIterator.prev(NativeIterator.java:162)
at 
org.fusesource.leveldbjni.internal.JniDBIterator.peekPrev(JniDBIterator.java:128)
at 
org.fusesource.leveldbjni.internal.JniDBIterator.prev(JniDBIterator.java:144)
at 
org.apache.spark.util.kvstore.LevelDBIterator.loadNext(LevelDBIterator.java:218)
at 
org.apache.spark.util.kvstore.LevelDBIterator.hasNext(LevelDBIterator.java:111)
at 
scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:42)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115)
at scala.collection.AbstractIterator.count(Iterator.scala:1336)
at 
org.apache.spark.deploy.history.HistoryPage.render(HistoryPage.scala:50)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
at org.spark_project.jetty.server.handler.ContextHandler.do
{code}

Reporter: deshanxiao
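
The thread dump above shows HistoryPage.render spending its time counting entries 
behind a LevelDBIterator. A rough, hypothetical illustration of why that is slow 
(this is not the actual HistoryPage code, just the pattern):

{code:java}
// hypothetical illustration only: counting by exhausting an iterator touches every
// stored application, so each page render pays one native LevelDB step per entry
// (each next() lands in LevelDBIterator.loadNext) instead of a cheap lookup
case class AppSummary(id: String, completed: Boolean)

def completedCount(apps: Iterator[AppSummary]): Int =
  apps.count(_.completed)

// caching the count, or maintaining it incrementally as applications are added,
// would avoid re-walking the whole store on every render
{code}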






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27780) Shuffle server & client should be versioned to enable smoother upgrade

2019-11-27 Thread deshanxiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983536#comment-16983536
 ] 

deshanxiao commented on SPARK-27780:


I couldn't agree more. Adding a shuffle service version is very necessary. 

> Shuffle server & client should be versioned to enable smoother upgrade
> --
>
> Key: SPARK-27780
> URL: https://issues.apache.org/jira/browse/SPARK-27780
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle, Spark Core
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Priority: Major
>
> The external shuffle service is often upgraded at a different time than spark 
> itself.  However, this causes problems when the protocol changes between the 
> shuffle service and the spark runtime -- this forces users to upgrade 
> everything simultaneously.
> We should add versioning to the shuffle client & server, so they know what 
> messages the other will support.  This would allow better handling of mixed 
> versions, from better error msgs to allowing some mismatched versions (with 
> reduced capabilities).
> This originally came up in a discussion here: 
> https://github.com/apache/spark/pull/24565#issuecomment-493496466
> There are a few ways we could do the versioning which we still need to 
> discuss:
> 1) Version specified by config.  This allows for mixed versions across the 
> cluster and rolling upgrades.  It also will let a spark 3.0 client talk to a 
> 2.4 shuffle service.  But, may be a nuisance for users to get this right.
> 2) Auto-detection during registration with local shuffle service.  This makes 
> the versioning easy for the end user, and can even handle a 2.4 shuffle 
> service though it does not support the new versioning.  However, it will not 
> handle a rolling upgrade correctly -- if the local shuffle service has been 
> upgraded, but other nodes in the cluster have not, it will get the version 
> wrong.
> 3) Exchange versions per-connection.  When a connection is opened, the server 
> & client could first exchange messages with their versions, so they know how 
> to continue communication after that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29711) Dynamic adjust spark sql class log level in beeline

2019-11-01 Thread deshanxiao (Jira)
deshanxiao created SPARK-29711:
--

 Summary: Dynamic adjust spark sql class log level in beeline
 Key: SPARK-29711
 URL: https://issues.apache.org/jira/browse/SPARK-29711
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: deshanxiao


We could change the log level in beeline with something like: set 
spark.log.level=debug. It would not be a big change, but it would be useful.
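
A minimal sketch of the idea, assuming log4j 1.x and a hypothetical hook in the SET 
command handling (the key spark.log.level is part of the proposal, not an existing 
configuration):

{code:java}
import org.apache.log4j.{Level, LogManager}

// hypothetical hook: when a SET command carries the proposed key
// "spark.log.level", adjust the Spark loggers at runtime instead of requiring
// a restart with a new log4j.properties
def maybeAdjustLogLevel(key: String, value: String): Unit = {
  if (key.equalsIgnoreCase("spark.log.level")) {
    LogManager.getLogger("org.apache.spark").setLevel(Level.toLevel(value.toUpperCase))
  }
}
{code}

In beeline the usage would then look like: set spark.log.level=debug; before running 
the statements to be debugged.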



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28987) DiskBlockManager#createTempShuffleBlock should skip directory which is read-only

2019-09-05 Thread deshanxiao (Jira)
deshanxiao created SPARK-28987:
--

 Summary: DiskBlockManager#createTempShuffleBlock should skip 
directory which is read-only
 Key: SPARK-28987
 URL: https://issues.apache.org/jira/browse/SPARK-28987
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: deshanxiao


DiskBlockManager#createTempShuffleBlock only checks that the path does not already 
exist. I think we could also check whether the path is writable. That is reasonable 
because we invoke createTempShuffleBlock to create a new path to write files into, 
so it should be writable.

stack:
{code:java}
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1765 in stage 368592.0 failed 4 times, most recent failure: Lost task 
1765.3 in stage 368592.0 (TID 66021932, test-hadoop-prc-st2808.bj, executor 
251): java.io.FileNotFoundException: 
/home/work/hdd6/yarn/test-hadoop/nodemanager/usercache/sql_test/appcache/application_1560996968289_16320/blockmgr-14608b48-7efd-4fd3-b050-2ac9953390d4/1e/temp_shuffle_00c7b87f-d7ed-49f3-90e7-1c8358bcfd74
 (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.(FileOutputStream.java:213)
at 
org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:139)
at 
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:150)
at 
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:268)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:159)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:100)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1515)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1503)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1502)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1502)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816)
at scala.Option.foreach(Option.scala:257)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:816)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1740)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1695)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1684)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

{code}
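
A minimal sketch of the proposed check (hypothetical code, not the actual 
DiskBlockManager implementation): only consider local dirs that are writable, so a 
disk remounted read-only is skipped instead of failing later in 
DiskBlockObjectWriter:

{code:java}
import java.io.File
import java.util.UUID
import scala.util.Random

// hypothetical sketch: choose where to put a temp shuffle block, skipping local
// dirs that are not writable, instead of only checking that the target file
// does not already exist
def createTempShuffleFile(localDirs: Array[File]): File = {
  val writableDirs = localDirs.filter(d => d.isDirectory && d.canWrite)
  require(writableDirs.nonEmpty, "No writable local directory available for shuffle")
  var file = new File(writableDirs(Random.nextInt(writableDirs.length)),
    s"temp_shuffle_${UUID.randomUUID()}")
  while (file.exists()) {
    file = new File(writableDirs(Random.nextInt(writableDirs.length)),
      s"temp_shuffle_${UUID.randomUUID()}")
  }
  file
}
{code}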




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28944) Expose peak memory of executor in metrics for parameter tuning

2019-09-01 Thread deshanxiao (Jira)
deshanxiao created SPARK-28944:
--

 Summary: Expose peak memory of executor in metrics for parameter 
tuning
 Key: SPARK-28944
 URL: https://issues.apache.org/jira/browse/SPARK-28944
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: deshanxiao


Maybe we can collect the peak executor memory in the heartbeat to help tune 
parameters such as spark.executor.memory.
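
A rough sketch of what could be sampled on the executor side per heartbeat (an 
assumption only, not the actual Spark metrics code):

{code:java}
import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// hypothetical sketch: sum the peak usage of all JVM memory pools so the driver
// could surface a per-executor peak next to the configured spark.executor.memory
def peakJvmMemoryBytes(): Long =
  ManagementFactory.getMemoryPoolMXBeans.asScala
    .flatMap(pool => Option(pool.getPeakUsage))  // a pool can become invalid (null)
    .map(_.getUsed)
    .sum
{code}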



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28658) Yarn FinalStatus is always "success" in yarn-client mode

2019-08-08 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-28658:
---
Description: 
In yarn-client mode, the finalStatus of the application will always be success 
because the ApplicationMaster returns success when the driver disconnects.

A simple example is:


{code:java}
sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect
{code}

When we run this code in yarn-client mode, the finalStatus will be success, which 
misleads us. Maybe we could use a clearer state instead of "success".



  was:
In yarn-client mode,  the finalStatus of application will always be success 
because the ApplicationMaster returns success when the driver disconnected.

A simple examle is that:


{code:java}
sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect
{code}

When we run the code in yarn-client mode, the finalStatus will be success. It 
misleads us.



>  Yarn FinalStatus is always "success"  in yarn-client mode
> --
>
> Key: SPARK-28658
> URL: https://issues.apache.org/jira/browse/SPARK-28658
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Major
>
> In yarn-client mode,  the finalStatus of application will always be success 
> because the ApplicationMaster returns success when the driver disconnected.
> A simple examle is that:
> {code:java}
> sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect
> {code}
> When we run the code in yarn-client mode, the finalStatus will be success. It 
> misleads us. Maybe we can use a clearer state not a "success".



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28658) Yarn FinalStatus is always "success" in yarn-client mode

2019-08-08 Thread deshanxiao (JIRA)
deshanxiao created SPARK-28658:
--

 Summary:  Yarn FinalStatus is always "success"  in yarn-client mode
 Key: SPARK-28658
 URL: https://issues.apache.org/jira/browse/SPARK-28658
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 3.0.0
Reporter: deshanxiao


In yarn-client mode, the finalStatus of the application will always be success 
because the ApplicationMaster returns success when the driver disconnects.

A simple example is:


{code:java}
sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect
{code}

When we run this code in yarn-client mode, the finalStatus will be success, which 
misleads us.




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27171) Support Full-Partiton limit in the first scan

2019-03-14 Thread deshanxiao (JIRA)
deshanxiao created SPARK-27171:
--

 Summary: Support Full-Partiton limit in the first scan
 Key: SPARK-27171
 URL: https://issues.apache.org/jira/browse/SPARK-27171
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0, 2.3.2
Reporter: deshanxiao


SparkPlan#executeTake has to pick elements starting from a single partition, which 
can be slow for some queries, even though Spark is better suited to batch queries. 
It would not hurt to add a switch that lets users scan all partitions in the first 
pass of a limit.
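
A simplified sketch of the idea (names and the growth factor are illustrative, not 
the real SparkPlan code): today the first pass scans one partition and later passes 
grow the batch, while the proposed switch would let the first pass cover every 
partition at once:

{code:java}
// hypothetical sketch of how many partitions the next take() pass would scan
def partitionsForNextPass(alreadyScanned: Int, totalParts: Int,
                          fullFirstScan: Boolean): Int = {
  if (alreadyScanned == 0) {
    if (fullFirstScan) totalParts else 1  // proposed switch: scan everything up front
  } else {
    math.min(totalParts - alreadyScanned, alreadyScanned * 4)  // illustrative factor
  }
}
{code}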



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26954) Do not attemp when user code throws exception

2019-02-21 Thread deshanxiao (JIRA)
deshanxiao created SPARK-26954:
--

 Summary: Do not attemp when user code throws exception
 Key: SPARK-26954
 URL: https://issues.apache.org/jira/browse/SPARK-26954
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 2.4.0, 2.3.3
Reporter: deshanxiao


YARN re-attempts a failed app depending on YarnRMClient#unregister. However, some 
attempts are useless:

{code:java}
sc.parallelize(Seq(1,2,3)).map(_ => throw new 
RuntimeException("exception")).collect()
{code}

For some environment errors, such as a dead node, re-attempting is reasonable.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26954) Do not attemp when user code throws exception

2019-02-21 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-26954:
---
Description: 
YARN re-attempts a failed app depending on YarnRMClient#unregister. However, some 
attempts are useless, as in this example:

{code:java}
sc.parallelize(Seq(1,2,3)).map(_ => throw new 
RuntimeException("exception")).collect()
{code}

Also, attempts made when a "FileNotFoundException" is thrown in user code look 
unreasonable.

For some environment errors, such as a dead node, re-attempting is reasonable. So it 
would be better not to re-attempt when the failure comes from a user exception.


  was:
Yarn attemps the failed App depending on YarnRMClient#unregister. However, some 
attemps are useless:

{code:java}
sc.parallelize(Seq(1,2,3)).map(_ => throw new 
RuntimeException("exception")).collect()
{code}

Some environment errors, such as node dead, attemps reasonablely. So, it will 
be bettler to at



> Do not attemp when user code throws exception
> -
>
> Key: SPARK-26954
> URL: https://issues.apache.org/jira/browse/SPARK-26954
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.3, 2.4.0
>Reporter: deshanxiao
>Priority: Critical
>
> Yarn attemps the failed App depending on YarnRMClient#unregister. However, 
> some attemps are useless like the example:
> {code:java}
> sc.parallelize(Seq(1,2,3)).map(_ => throw new 
> RuntimeException("exception")).collect()
> {code}
> Also some attemps when "FileNorFoundException" is thrown in user code looks 
> unreasonable.
> Some environment errors, such as node dead, attemps reasonablely. So, it will 
> be better that user exception will not attemp.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26954) Do not attemp when user code throws exception

2019-02-21 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-26954:
---
Description: 
Yarn attemps the failed App depending on YarnRMClient#unregister. However, some 
attemps are useless:

{code:java}
sc.parallelize(Seq(1,2,3)).map(_ => throw new 
RuntimeException("exception")).collect()
{code}

Some environment errors, such as node dead, attemps reasonablely. So, it will 
be bettler to at


  was:
Yarn attemps the failed App depending on YarnRMClient#unregister. However, some 
attemps are useless:

{code:java}
sc.parallelize(Seq(1,2,3)).map(_ => throw new 
RuntimeException("exception")).collect()
{code}

Some environment errors, such as node dead, attemps reasonablely.



> Do not attemp when user code throws exception
> -
>
> Key: SPARK-26954
> URL: https://issues.apache.org/jira/browse/SPARK-26954
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.3.3, 2.4.0
>Reporter: deshanxiao
>Priority: Critical
>
> Yarn attemps the failed App depending on YarnRMClient#unregister. However, 
> some attemps are useless:
> {code:java}
> sc.parallelize(Seq(1,2,3)).map(_ => throw new 
> RuntimeException("exception")).collect()
> {code}
> Some environment errors, such as node dead, attemps reasonablely. So, it will 
> be bettler to at



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26714) The job whose partiton num is zero not shown in WebUI

2019-01-24 Thread deshanxiao (JIRA)
deshanxiao created SPARK-26714:
--

 Summary: The job whose partiton num is zero not shown in WebUI
 Key: SPARK-26714
 URL: https://issues.apache.org/jira/browse/SPARK-26714
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.4.0, 2.3.1
Reporter: deshanxiao


When the job's partition count is zero, it still gets a job id but is not shown in 
the UI. I think that's strange.

Example:

mkdir /home/test/testdir

sc.textFile("/home/test/testdir")



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles

2019-01-09 Thread deshanxiao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739083#comment-16739083
 ] 

deshanxiao commented on SPARK-26570:


[~hyukjin.kwon] OK, I will try it. Thank you!

> Out of memory when InMemoryFileIndex bulkListLeafFiles
> --
>
> Key: SPARK-26570
> URL: https://issues.apache.org/jira/browse/SPARK-26570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: deshanxiao
>Priority: Major
> Attachments: screenshot-1.png
>
>
> The *bulkListLeafFiles* will collect all filestatus in memory for every query 
> which may cause the oom of driver. I use the spark 2.3.2 meeting with the 
> problem. Maybe the latest one also exists the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles

2019-01-08 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-26570:
---
Description: The *bulkListLeafFiles* call collects all FileStatus objects in memory 
for every query, which may cause an OOM on the driver. I hit the problem with Spark 
2.3.2; the latest version may have the same problem.  (was: 
The *bulkListLeafFiles* will collect all filestatus in memory for every query 
which may cause the oom of driver. I use the spark 2.3.2 meeting with the 
problem. Maybe the latest one )

> Out of memory when InMemoryFileIndex bulkListLeafFiles
> --
>
> Key: SPARK-26570
> URL: https://issues.apache.org/jira/browse/SPARK-26570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: deshanxiao
>Priority: Major
> Attachments: screenshot-1.png
>
>
> The *bulkListLeafFiles* will collect all filestatus in memory for every query 
> which may cause the oom of driver. I use the spark 2.3.2 meeting with the 
> problem. Maybe the latest one also exists the problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles

2019-01-08 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-26570:
---
Description: The *bulkListLeafFiles* will collect all filestatus in memory 
for every query which may cause the oom of driver. I use the spark 2.3.2 
meeting with the problem. Maybe the latest one   (was: The bulkListLeafFiles 
will collect all filestatus in memory for every query which may cause the oom 
of driver.)

> Out of memory when InMemoryFileIndex bulkListLeafFiles
> --
>
> Key: SPARK-26570
> URL: https://issues.apache.org/jira/browse/SPARK-26570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: deshanxiao
>Priority: Major
> Attachments: screenshot-1.png
>
>
> The *bulkListLeafFiles* will collect all filestatus in memory for every query 
> which may cause the oom of driver. I use the spark 2.3.2 meeting with the 
> problem. Maybe the latest one 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles

2019-01-08 Thread deshanxiao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737061#comment-16737061
 ] 

deshanxiao commented on SPARK-26570:


 !screenshot-1.png! 

> Out of memory when InMemoryFileIndex bulkListLeafFiles
> --
>
> Key: SPARK-26570
> URL: https://issues.apache.org/jira/browse/SPARK-26570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: deshanxiao
>Priority: Major
> Attachments: screenshot-1.png
>
>
> The bulkListLeafFiles will collect all filestatus in memory for every query 
> which may cause the oom of driver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles

2019-01-08 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-26570:
---
Attachment: screenshot-1.png

> Out of memory when InMemoryFileIndex bulkListLeafFiles
> --
>
> Key: SPARK-26570
> URL: https://issues.apache.org/jira/browse/SPARK-26570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: deshanxiao
>Priority: Major
> Attachments: screenshot-1.png
>
>
> The bulkListLeafFiles will collect all filestatus in memory for every query 
> which may cause the oom of driver.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26570) Out of memory when InMemoryFileIndex bulkListLeafFiles

2019-01-08 Thread deshanxiao (JIRA)
deshanxiao created SPARK-26570:
--

 Summary: Out of memory when InMemoryFileIndex bulkListLeafFiles
 Key: SPARK-26570
 URL: https://issues.apache.org/jira/browse/SPARK-26570
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.2
Reporter: deshanxiao


The bulkListLeafFiles call collects all FileStatus objects in memory for every 
query, which may cause an OOM on the driver.
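
A rough illustration of the memory pattern using plain Hadoop FileSystem calls (not 
the actual InMemoryFileIndex code): every leaf FileStatus of every queried root ends 
up resident on the driver at the same time, so very large directory trees can 
exhaust driver memory:

{code:java}
import scala.collection.mutable
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

// hypothetical illustration: list a root recursively and keep every leaf
// FileStatus in one in-memory buffer, as the driver effectively does per query
def listAllLeafStatuses(root: Path): Seq[FileStatus] = {
  val fs = root.getFileSystem(new Configuration())
  val leaves = mutable.ArrayBuffer[FileStatus]()
  var pending: List[Path] = List(root)
  while (pending.nonEmpty) {
    val dir = pending.head
    pending = pending.tail
    fs.listStatus(dir).foreach { status =>
      if (status.isDirectory) pending = status.getPath :: pending
      else leaves += status
    }
  }
  leaves  // all statuses are held in memory at once
}
{code}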



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26457) Show hadoop configurations in HistoryServer environment tab

2019-01-07 Thread deshanxiao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735707#comment-16735707
 ] 

deshanxiao commented on SPARK-26457:


[~planga82]
Hi, thanks for your reply! I know that YARN provides all of the Hadoop 
configurations. But I think it would still be nice for the HistoryServer to unify 
all of the configuration in one place. I care about cases where different Hadoop 
versions behave differently, or where a configuration requires a specific Hadoop 
version. It would make it more convenient for us to debug such problems.
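
A rough sketch of what the HistoryServer could capture for such a tab (an assumption 
on my side, not existing Spark code): snapshot the Hadoop configuration visible to 
the process and render it like the Spark properties table.

{code:java}
import scala.collection.JavaConverters._
import org.apache.hadoop.conf.Configuration

// hypothetical sketch: collect the effective Hadoop configuration as sorted
// key/value pairs so it could be shown next to the Spark properties
def hadoopConfSnapshot(): Seq[(String, String)] = {
  val conf = new Configuration()  // picks up core-site.xml, hdfs-site.xml, ...
  conf.asScala.map(entry => entry.getKey -> entry.getValue).toSeq.sortBy(_._1)
}
{code}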

Thanks a lot!

> Show hadoop configurations in HistoryServer environment tab
> ---
>
> Key: SPARK-26457
> URL: https://issues.apache.org/jira/browse/SPARK-26457
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.2, 2.4.0
> Environment: Maybe it is good to show some configurations in 
> HistoryServer environment tab for debugging some bugs about hadoop
>Reporter: deshanxiao
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26528) FsHistoryProviderSuite failed in IDEA because not exist "spark.testing" property

2019-01-03 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-26528:
---
Priority: Minor  (was: Major)

> FsHistoryProviderSuite failed in IDEA because not exist "spark.testing" 
> property 
> -
>
> Key: SPARK-26528
> URL: https://issues.apache.org/jira/browse/SPARK-26528
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: deshanxiao
>Priority: Minor
>
> Running the FsHistoryProviderSuite in idea failled because the property 
> "spark.testing" not exist.In this situation, replay executor may replay a 
> file twice.
> {code:java}
>   private val replayExecutor: ExecutorService = {
> if (!conf.contains("spark.testing")) {
>   ThreadUtils.newDaemonFixedThreadPool(NUM_PROCESSING_THREADS, 
> "log-replay-executor")
> } else {
>   MoreExecutors.sameThreadExecutor()
> }
>   }
> {code}
> {code:java}
> "SPARK-3697: ignore files that cannot be read."
> 2 was not equal to 1
> ScalaTestFailureLocation: 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12 at 
> (FsHistoryProviderSuite.scala:179)
> Expected :1
> Actual   :2
>  
> org.scalatest.exceptions.TestFailedException: 2 was not equal to 1
>   at 
> org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340)
>   at 
> org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668)
>   at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:179)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51)
>   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite$class.run(Suite.scala:1147)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$run(FsHistoryProviderSuite.scala:51)
>   at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:258)
>   at 
> 

[jira] [Created] (SPARK-26528) FsHistoryProviderSuite failed in IDEA because not exist "spark.testing" property

2019-01-03 Thread deshanxiao (JIRA)
deshanxiao created SPARK-26528:
--

 Summary: FsHistoryProviderSuite failed in IDEA because not exist 
"spark.testing" property 
 Key: SPARK-26528
 URL: https://issues.apache.org/jira/browse/SPARK-26528
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.0, 2.3.2
Reporter: deshanxiao


Running the FsHistoryProviderSuite in IDEA fails because the property 
"spark.testing" does not exist. In this situation, the replay executor may replay a 
file twice.


{code:java}
  private val replayExecutor: ExecutorService = {
if (!conf.contains("spark.testing")) {
  ThreadUtils.newDaemonFixedThreadPool(NUM_PROCESSING_THREADS, 
"log-replay-executor")
} else {
  MoreExecutors.sameThreadExecutor()
}
  }
{code}
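
One possible workaround when launching the suite from an IDE (an assumption on my 
side, not an official fix) is to define the flag that this code checks; SparkConf 
picks up system properties starting with "spark.":

{code:java}
// hypothetical workaround: add this VM option to the IDEA run configuration
//   -Dspark.testing=true
// or set it programmatically before the suite creates its SparkConf:
System.setProperty("spark.testing", "true")
{code}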


{code:java}
"SPARK-3697: ignore files that cannot be read."

2 was not equal to 1
ScalaTestFailureLocation: 
org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12 at 
(FsHistoryProviderSuite.scala:179)
Expected :1
Actual   :2
 

org.scalatest.exceptions.TestFailedException: 2 was not equal to 1
at 
org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340)
at 
org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668)
at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704)
at 
org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:179)
at 
org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
at 
org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51)
at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203)
at 
org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51)
at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
at org.scalatest.Suite$class.run(Suite.scala:1147)
at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
at 
org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
at 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
at 
org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$run(FsHistoryProviderSuite.scala:51)
at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:258)
at 
org.apache.spark.deploy.history.FsHistoryProviderSuite.run(FsHistoryProviderSuite.scala:51)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at 
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
at 
org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334)
at 

[jira] [Created] (SPARK-26457) Show hadoop configurations in HistoryServer environment tab

2018-12-27 Thread deshanxiao (JIRA)
deshanxiao created SPARK-26457:
--

 Summary: Show hadoop configurations in HistoryServer environment 
tab
 Key: SPARK-26457
 URL: https://issues.apache.org/jira/browse/SPARK-26457
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core, Web UI
Affects Versions: 2.4.0, 2.3.2
 Environment: Maybe it is good to show some configurations in 
HistoryServer environment tab for debugging some bugs about hadoop
Reporter: deshanxiao






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26333) FsHistoryProviderSuite failed because setReadable doesn't work in RedHat

2018-12-11 Thread deshanxiao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718385#comment-16718385
 ] 

deshanxiao commented on SPARK-26333:


[~vanzin] Yes, you are right! Thank you very much! But why doesn't setReadable 
work as root? 
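
A small self-contained check of the behaviour in question (illustration only): on 
most Linux machines this prints canRead=false after the permission bits are cleared, 
but when the JVM runs as root canRead stays true, because root bypasses the mode 
bits that setReadable manipulates.

{code:java}
import java.io.File
import java.nio.file.Files

object ReadableCheck {
  def main(args: Array[String]): Unit = {
    val f: File = Files.createTempFile("perm-check", ".log").toFile
    f.setReadable(false, false)  // clear the read bits for everyone
    println(s"user=${System.getProperty("user.name")} canRead=${f.canRead}")
    f.delete()
  }
}
{code}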

> FsHistoryProviderSuite failed because setReadable doesn't work in RedHat
> 
>
> Key: SPARK-26333
> URL: https://issues.apache.org/jira/browse/SPARK-26333
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: deshanxiao
>Priority: Major
>
> FsHistoryProviderSuite failed in case "SPARK-3697: ignore files that cannot 
> be read.". I try to invoke logFile2.canRead after invoking 
> "setReadable(false, false)" . And I find that the result of 
> "logFile2.canRead" is true but in my ubuntu16.04 return false.
> The environment:
> RedHat:
> Linux version 3.10.0-693.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) 
> (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 
> 22:26:13 UTC 2017
> JDK
> Java version: 1.8.0_151, vendor: Oracle Corporation
> {code:java}
>  org.scalatest.exceptions.TestFailedException: 2 was not equal to 1
>   at 
> org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340)
>   at 
> org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668)
>   at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:183)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:182)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.org$apache$spark$deploy$history$FsHistoryProviderSuite$$updateAndCheck(FsHistoryProviderSuite.scala:841)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:182)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51)
>   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-26333) FsHistoryProviderSuite failed because setReadable doesn't work in RedHat

2018-12-11 Thread deshanxiao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

deshanxiao updated SPARK-26333:
---
Comment: was deleted

(was: [~vanzin] No, I am not running as root.)

> FsHistoryProviderSuite failed because setReadable doesn't work in RedHat
> 
>
> Key: SPARK-26333
> URL: https://issues.apache.org/jira/browse/SPARK-26333
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: deshanxiao
>Priority: Major
>
> FsHistoryProviderSuite failed in case "SPARK-3697: ignore files that cannot 
> be read.". I try to invoke logFile2.canRead after invoking 
> "setReadable(false, false)" . And I find that the result of 
> "logFile2.canRead" is true but in my ubuntu16.04 return false.
> The environment:
> RedHat:
> Linux version 3.10.0-693.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) 
> (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 
> 22:26:13 UTC 2017
> JDK
> Java version: 1.8.0_151, vendor: Oracle Corporation
> {code:java}
>  org.scalatest.exceptions.TestFailedException: 2 was not equal to 1
>   at 
> org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340)
>   at 
> org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668)
>   at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:183)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:182)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.org$apache$spark$deploy$history$FsHistoryProviderSuite$$updateAndCheck(FsHistoryProviderSuite.scala:841)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:182)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51)
>   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26333) FsHistoryProviderSuite failed because setReadable doesn't work in RedHat

2018-12-11 Thread deshanxiao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718337#comment-16718337
 ] 

deshanxiao commented on SPARK-26333:


[~vanzin] No, I am not running as root.

> FsHistoryProviderSuite failed because setReadable doesn't work in RedHat
> 
>
> Key: SPARK-26333
> URL: https://issues.apache.org/jira/browse/SPARK-26333
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: deshanxiao
>Priority: Major
>
> FsHistoryProviderSuite failed in case "SPARK-3697: ignore files that cannot 
> be read.". I try to invoke logFile2.canRead after invoking 
> "setReadable(false, false)" . And I find that the result of 
> "logFile2.canRead" is true but in my ubuntu16.04 return false.
> The environment:
> RedHat:
> Linux version 3.10.0-693.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) 
> (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 
> 22:26:13 UTC 2017
> JDK
> Java version: 1.8.0_151, vendor: Oracle Corporation
> {code:java}
>  org.scalatest.exceptions.TestFailedException: 2 was not equal to 1
>   at 
> org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340)
>   at 
> org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668)
>   at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:183)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:182)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.org$apache$spark$deploy$history$FsHistoryProviderSuite$$updateAndCheck(FsHistoryProviderSuite.scala:841)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:182)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51)
>   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203)
>   at 
> org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26333) FsHistoryProviderSuite failed because setReadable doesn't work in RedHat

2018-12-10 Thread deshanxiao (JIRA)
deshanxiao created SPARK-26333:
--

 Summary: FsHistoryProviderSuite failed because setReadable doesn't 
work in RedHat
 Key: SPARK-26333
 URL: https://issues.apache.org/jira/browse/SPARK-26333
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: deshanxiao


FsHistoryProviderSuite fails in the case "SPARK-3697: ignore files that cannot be 
read.". I tried invoking logFile2.canRead after calling "setReadable(false, false)", 
and found that "logFile2.canRead" returns true on RedHat, whereas on my Ubuntu 16.04 
it returns false.

The environment:

RedHat:
Linux version 3.10.0-693.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) (gcc 
version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Sep 12 22:26:13 
UTC 2017

JDK
Java version: 1.8.0_151, vendor: Oracle Corporation

{code:java}
 org.scalatest.exceptions.TestFailedException: 2 was not equal to 1
  at org.scalatest.MatchersHelper$.indicateFailure(MatchersHelper.scala:340)
  at 
org.scalatest.Matchers$ShouldMethodHelper$.shouldMatcher(Matchers.scala:6668)
  at org.scalatest.Matchers$AnyShouldWrapper.should(Matchers.scala:6704)
  at 
org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:183)
  at 
org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12$$anonfun$apply$7.apply(FsHistoryProviderSuite.scala:182)
  at 
org.apache.spark.deploy.history.FsHistoryProviderSuite.org$apache$spark$deploy$history$FsHistoryProviderSuite$$updateAndCheck(FsHistoryProviderSuite.scala:841)
  at 
org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:182)
  at 
org.apache.spark.deploy.history.FsHistoryProviderSuite$$anonfun$12.apply(FsHistoryProviderSuite.scala:148)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
  at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
  at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
  at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
  at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
  at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
  at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
  at 
org.apache.spark.deploy.history.FsHistoryProviderSuite.org$scalatest$BeforeAndAfter$$super$runTest(FsHistoryProviderSuite.scala:51)
  at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:203)
  at 
org.apache.spark.deploy.history.FsHistoryProviderSuite.runTest(FsHistoryProviderSuite.scala:51)
  at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
  at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
  at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
  at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25120) EventLogListener may miss driver SparkListenerBlockManagerAdded event

2018-08-14 Thread deshanxiao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16580670#comment-16580670
 ] 

deshanxiao commented on SPARK-25120:


Sure. I find that the "Executors" tab in the HistoryServer sometimes misses the 
driver info in the executor-id column, which is inconvenient when we analyze driver 
problems. [~hyukjin.kwon]

> EventLogListener may miss driver SparkListenerBlockManagerAdded event 
> --
>
> Key: SPARK-25120
> URL: https://issues.apache.org/jira/browse/SPARK-25120
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: deshanxiao
>Priority: Major
>
> Sometimes in spark history tab "Executors" , it couldn't find driver 
> information because the event of SparkListenerBlockManagerAdded is lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25120) EventLogListener may miss driver SparkListenerBlockManagerAdded event

2018-08-14 Thread deshanxiao (JIRA)
deshanxiao created SPARK-25120:
--

 Summary: EventLogListener may miss driver 
SparkListenerBlockManagerAdded event 
 Key: SPARK-25120
 URL: https://issues.apache.org/jira/browse/SPARK-25120
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.1
Reporter: deshanxiao


Sometimes the "Executors" tab in the Spark history server cannot find the driver 
information because the SparkListenerBlockManagerAdded event is lost.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25100) Using KryoSerializer and setting registrationRequired true can lead job failed

2018-08-13 Thread deshanxiao (JIRA)
deshanxiao created SPARK-25100:
--

 Summary: Using KryoSerializer and setting registrationRequired 
true can lead job failed
 Key: SPARK-25100
 URL: https://issues.apache.org/jira/browse/SPARK-25100
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.1
Reporter: deshanxiao


When spark.serializer is org.apache.spark.serializer.KryoSerializer and 
spark.kryo.registrationRequired is true in the SparkConf, and I invoke 
saveAsNewAPIHadoopDataset to store data in HDFS, the job fails because the class 
TaskCommitMessage has not been registered.

 
{code:java}
java.lang.IllegalArgumentException: Class is not registered: 
org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage
Note: To register this class use: 
kryo.register(org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage.class);
at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488)
at com.twitter.chill.KryoBase.getRegistration(KryoBase.scala:52)
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97)
at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
at 
org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:347)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
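
One possible workaround (my assumption, not an official recommendation): register 
the class named in the error explicitly so that registrationRequired stops rejecting 
it, e.g. via SparkConf.registerKryoClasses:

{code:java}
import org.apache.spark.SparkConf

// hypothetical workaround sketch: explicitly register the internal message class
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  .registerKryoClasses(Array(
    Class.forName("org.apache.spark.internal.io.FileCommitProtocol$TaskCommitMessage")))
{code}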
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org