[jira] [Updated] (SPARK-35746) Task id in the Stage page timeline is incorrect

2021-06-11 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35746:
---
Attachment: image-2021-06-12-07-03-09-808.png

> Task id in the Stage page timeline is incorrect
> ---
>
> Key: SPARK-35746
> URL: https://issues.apache.org/jira/browse/SPARK-35746
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.2
>Reporter: shahid
>Priority: Minor
> Attachments: image-2021-06-12-07-03-09-808.png
>
>
> !image-2021-06-12-06-56-21-486.png!






[jira] [Updated] (SPARK-35746) Task id in the Stage page timeline is incorrect

2021-06-11 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35746:
---
Description: !image-2021-06-12-07-03-09-808.png!  (was: 
!image-2021-06-12-06-56-21-486.png!)

> Task id in the Stage page timeline is incorrect
> ---
>
> Key: SPARK-35746
> URL: https://issues.apache.org/jira/browse/SPARK-35746
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0, 3.1.2
>Reporter: shahid
>Priority: Minor
> Attachments: image-2021-06-12-07-03-09-808.png
>
>
> !image-2021-06-12-07-03-09-808.png!






[jira] [Created] (SPARK-35746) Task id in the Stage page timeline is incorrect

2021-06-11 Thread shahid (Jira)
shahid created SPARK-35746:
--

 Summary: Task id in the Stage page timeline is incorrect
 Key: SPARK-35746
 URL: https://issues.apache.org/jira/browse/SPARK-35746
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.1.2, 3.0.0
Reporter: shahid


!image-2021-06-12-06-56-21-486.png!






[jira] [Commented] (SPARK-35423) The output of PCA is inconsistent

2021-05-31 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354810#comment-17354810
 ] 

shahid commented on SPARK-35423:


I would like to analyse this issue

> The output of PCA is inconsistent
> -
>
> Key: SPARK-35423
> URL: https://issues.apache.org/jira/browse/SPARK-35423
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 3.1.1
> Environment: Spark Version: 3.1.1 
>Reporter: cqfrog
>Priority: Major
>
> 1. The example from the docs
>  
> {code:java}
> import org.apache.spark.ml.feature.PCA
> import org.apache.spark.ml.linalg.Vectors
> val data = Array(
>   Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
>   Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
>   Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
> )
> val df = spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")
> val pca = new PCA()
>   .setInputCol("features")
>   .setOutputCol("pcaFeatures")
>   .setK(3)
>   .fit(df)
> val result = pca.transform(df).select("pcaFeatures")
> result.show(false)
> {code}
>  
>  
> the output shows:
> {code:java}
> +-----------------------------------------------------------+
> |pcaFeatures                                                |
> +-----------------------------------------------------------+
> |[1.6485728230883807,-4.013282700516296,-5.524543751369388] |
> |[-4.645104331781534,-1.1167972663619026,-5.524543751369387]|
> |[-6.428880535676489,-5.337951427775355,-5.524543751369389] |
> +-----------------------------------------------------------+
> {code}
> 2. Change the vector format
> I modified the code from "Vectors.sparse(5, Seq((1, 1.0), (3, 7.0)))" to 
> "Vectors.dense(0.0,1.0,0.0,7.0,0.0)".
> But the output shows:
> {code:java}
> +------------------------------------------------------------+
> |pcaFeatures                                                 |
> +------------------------------------------------------------+
> |[1.6485728230883814,-4.0132827005162985,-1.0091435193998504]|
> |[-4.645104331781533,-1.1167972663619048,-1.0091435193998501]|
> |[-6.428880535676488,-5.337951427775359,-1.009143519399851]  |
> +------------------------------------------------------------+
> {code}
> It's strange that the two outputs are inconsistent. Why?
> Thanks.
>  
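A way to narrow this down is to compare the fitted principal-component matrices 
directly, rather than the transformed rows. A minimal sketch, assuming dfSparse 
and dfDense (hypothetical names) are two DataFrames built from the two input 
variants quoted above:

{code:java}
import org.apache.spark.ml.feature.PCA
import org.apache.spark.sql.DataFrame

// Fit one model per input representation and print the loading matrices.
def fitPc(df: DataFrame) =
  new PCA().setInputCol("features").setOutputCol("pcaFeatures").setK(3).fit(df).pc

println(fitPc(dfSparse))  // dfSparse: row 1 as Vectors.sparse(5, Seq((1, 1.0), (3, 7.0)))
println(fitPc(dfDense))   // dfDense: row 1 as Vectors.dense(0.0, 1.0, 0.0, 7.0, 0.0)
// If only the third column differs, the first two components agree and the
// discrepancy is confined to the last, lowest-variance direction.
{code}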






[jira] [Updated] (SPARK-35567) Explain cost is not showing statistics for all the nodes

2021-05-30 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35567:
---
Description: 
The EXPLAIN COST command doesn't show statistics for all the nodes in most of 
the TPC-DS queries.

For example, Query 1:

!image-2021-05-31-05-09-09-637.png!

> Explain cost is not showing statistics for all the nodes
> 
>
> Key: SPARK-35567
> URL: https://issues.apache.org/jira/browse/SPARK-35567
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 3.0.0, 3.1.2
>Reporter: shahid
>Priority: Minor
> Attachments: image-2021-05-31-05-09-09-637.png
>
>
> The EXPLAIN COST command doesn't show statistics for all the nodes in most 
> of the TPC-DS queries.
> For example, Query 1:
> !image-2021-05-31-05-09-09-637.png!
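A sketch of how the per-node statistics can be inspected (assuming CBO is 
enabled and the TPC-DS tables have been analyzed; tpcdsQuery1 is a placeholder 
for the Query 1 text):

{code:java}
spark.conf.set("spark.sql.cbo.enabled", "true")
// EXPLAIN COST returns a single string column holding the optimized logical
// plan, where each node should carry a Statistics(sizeInBytes=..., rowCount=...)
// annotation.
val plan = spark.sql("EXPLAIN COST " + tpcdsQuery1).head.getString(0)
println(plan)  // nodes whose annotation lacks rowCount are the ones at issue
{code}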






[jira] [Updated] (SPARK-35567) Explain cost is not showing statistics for all the nodes

2021-05-30 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35567:
---
Attachment: image-2021-05-31-05-09-09-637.png

> Explain cost is not showing statistics for all the nodes
> 
>
> Key: SPARK-35567
> URL: https://issues.apache.org/jira/browse/SPARK-35567
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 3.0.0, 3.1.2
>Reporter: shahid
>Priority: Minor
> Attachments: image-2021-05-31-05-09-09-637.png
>
>







[jira] [Created] (SPARK-35567) Explain cost is not showing statistics for all the nodes

2021-05-30 Thread shahid (Jira)
shahid created SPARK-35567:
--

 Summary: Explain cost is not showing statistics for all the nodes
 Key: SPARK-35567
 URL: https://issues.apache.org/jira/browse/SPARK-35567
 Project: Spark
  Issue Type: Bug
  Components: Optimizer, SQL
Affects Versions: 3.1.2, 3.0.0
Reporter: shahid









[jira] [Updated] (SPARK-22639) no rowcount estimation returned if groupby clause involves substring

2021-05-25 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-22639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-22639:
---
Labels:   (was: bulk-closed)

> no rowcount estimation returned if groupby clause involves substring
> 
>
> Key: SPARK-22639
> URL: https://issues.apache.org/jira/browse/SPARK-22639
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 2.2.0, 3.1.1
>Reporter: ey-chih chow
>Priority: Major
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> CBO cannot estimate the row count if the GROUP BY clause of a query involves 
> the substring expression. For example, we cannot estimate the row count of 
> the following query, extracted from the TPC-DS queries and based on the 
> TPC-DS schema:
> SELECT item.`i_brand`, count(1), date_dim.`d_year`, item.`i_brand_id`,
>   sum(store_sales.`ss_ext_sales_price`) AS `ext_price`, item.`i_item_sk`
> FROM store_sales
>   INNER JOIN date_dim ON (date_dim.`d_date_sk` = store_sales.`ss_sold_date_sk`)
>   INNER JOIN item ON (store_sales.`ss_item_sk` = item.`i_item_sk`)
> GROUP BY item.`i_brand`, date_dim.`d_date`, substring(item.`i_item_desc`, 1, 30),
>   date_dim.`d_year`, item.`i_brand_id`, item.`i_item_sk`
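For reference, a minimal sketch (not from the original report) of how the 
missing row count can be observed, assuming the TPC-DS tables exist with column 
statistics and CBO is enabled:

{code:java}
spark.conf.set("spark.sql.cbo.enabled", "true")
// On older releases the column list must be spelled out instead of ALL COLUMNS.
Seq("store_sales", "date_dim", "item").foreach { t =>
  spark.sql(s"ANALYZE TABLE $t COMPUTE STATISTICS FOR ALL COLUMNS")
}
val df = spark.sql(queryAbove)  // queryAbove: the GROUP BY ... substring(...) query quoted above
// rowCount is an Option; per this report it comes back None when substring
// appears in the GROUP BY clause.
println(df.queryExecution.optimizedPlan.stats.rowCount)
{code}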






[jira] [Updated] (SPARK-22639) no rowcount estimation returned if groupby clause involves substring

2021-05-25 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-22639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-22639:
---
Affects Version/s: 3.1.1

> no rowcount estimation returned if groupby clause involves substring
> 
>
> Key: SPARK-22639
> URL: https://issues.apache.org/jira/browse/SPARK-22639
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 2.2.0, 3.1.1
>Reporter: ey-chih chow
>Priority: Major
>  Labels: bulk-closed
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> CBO cannot estimate the row count if the GROUP BY clause of a query involves 
> the substring expression. For example, we cannot estimate the row count of 
> the following query, extracted from the TPC-DS queries and based on the 
> TPC-DS schema:
> SELECT item.`i_brand`, count(1), date_dim.`d_year`, item.`i_brand_id`,
>   sum(store_sales.`ss_ext_sales_price`) AS `ext_price`, item.`i_item_sk`
> FROM store_sales
>   INNER JOIN date_dim ON (date_dim.`d_date_sk` = store_sales.`ss_sold_date_sk`)
>   INNER JOIN item ON (store_sales.`ss_item_sk` = item.`i_item_sk`)
> GROUP BY item.`i_brand`, date_dim.`d_date`, substring(item.`i_item_desc`, 1, 30),
>   date_dim.`d_year`, item.`i_brand_id`, item.`i_item_sk`






[jira] [Reopened] (SPARK-22639) no rowcount estimation returned if groupby clause involves substring

2021-05-25 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-22639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid reopened SPARK-22639:


This is happening with the latest master as well.

> no rowcount estimation returned if groupby clause involves substring
> 
>
> Key: SPARK-22639
> URL: https://issues.apache.org/jira/browse/SPARK-22639
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 2.2.0
>Reporter: ey-chih chow
>Priority: Major
>  Labels: bulk-closed
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> CBO cannot estimate the row count if the GROUP BY clause of a query involves 
> the substring expression. For example, we cannot estimate the row count of 
> the following query, extracted from the TPC-DS queries and based on the 
> TPC-DS schema:
> SELECT item.`i_brand`, count(1), date_dim.`d_year`, item.`i_brand_id`,
>   sum(store_sales.`ss_ext_sales_price`) AS `ext_price`, item.`i_item_sk`
> FROM store_sales
>   INNER JOIN date_dim ON (date_dim.`d_date_sk` = store_sales.`ss_sold_date_sk`)
>   INNER JOIN item ON (store_sales.`ss_item_sk` = item.`i_item_sk`)
> GROUP BY item.`i_brand`, date_dim.`d_date`, substring(item.`i_item_desc`, 1, 30),
>   date_dim.`d_year`, item.`i_brand_id`, item.`i_item_sk`






[jira] [Resolved] (SPARK-35409) Explain spark sql plan command using show is difficult to read

2021-05-15 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid resolved SPARK-35409.

Resolution: Won't Fix

> Explain spark sql plan command using show is difficult to read
> --
>
> Key: SPARK-35409
> URL: https://issues.apache.org/jira/browse/SPARK-35409
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: shahid
>Priority: Minor
>
> scala> sql("explain cost select * from store_sales").show(false)
>  
>  
>  
>  
>  
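For anyone hitting the same truncation: printing the plan string directly 
avoids the show() cell limit. A sketch, not necessarily the resolution reached 
here:

{code:java}
// Print the single returned plan string in full...
println(sql("explain cost select * from store_sales").head.getString(0))
// ...or ask the Dataset for the cost-mode plan directly (Spark 3.0+).
spark.table("store_sales").explain("cost")
{code}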






[jira] [Updated] (SPARK-35409) Explain spark sql plan command using show is difficult to read

2021-05-15 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35409:
---
Attachment: (was: image-2021-05-15-13-20-39-138.png)

> Explain spark sql plan command using show is difficult to read
> --
>
> Key: SPARK-35409
> URL: https://issues.apache.org/jira/browse/SPARK-35409
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: shahid
>Priority: Minor
>
> scala> sql("explain cost select * from store_sales").show(false)
>  
>  
>  
>  
>  






[jira] [Updated] (SPARK-35409) Explain spark sql plan command using show is difficult to read

2021-05-15 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35409:
---
Description: 
scala> sql("explain cost select * from store_sales").show(false)

 

 

 

 

 

  was:
scala> sql("explain cost select * from store_sales").show(false)

!image-2021-05-15-13-20-39-138.png!

 

 

 

 


> Explain spark sql plan command using show is difficult to read
> --
>
> Key: SPARK-35409
> URL: https://issues.apache.org/jira/browse/SPARK-35409
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: shahid
>Priority: Minor
> Attachments: image-2021-05-15-13-20-39-138.png
>
>
> scala> sql("explain cost select * from store_sales").show(false)
>  
>  
>  
>  
>  






[jira] [Updated] (SPARK-35409) Explain spark sql plan command using show is difficult to read

2021-05-15 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35409:
---
Attachment: image-2021-05-15-13-20-39-138.png

> Explain spark sql plan command using show is difficult to read
> --
>
> Key: SPARK-35409
> URL: https://issues.apache.org/jira/browse/SPARK-35409
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: shahid
>Priority: Minor
> Attachments: image-2021-05-15-13-20-39-138.png
>
>
> scala> sql("explain cost select * from store_sales").show(false)
> !image-2021-05-15-13-15-49-161.png!
>  
>  
>  
>  






[jira] [Updated] (SPARK-35409) Explain spark sql plan command using show is difficult to read

2021-05-15 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35409:
---
Description: 
scala> sql("explain cost select * from store_sales").show(false)

!image-2021-05-15-13-15-49-161.png!

 

 

 

 

  was:
scala> sql("explain cost select * from store_sales").show(false)

!image-2021-05-15-13-15-49-161.png!

 

Before:

 

 


> Explain spark sql plan command using show is difficult to read
> --
>
> Key: SPARK-35409
> URL: https://issues.apache.org/jira/browse/SPARK-35409
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: shahid
>Priority: Minor
> Attachments: image-2021-05-15-13-20-39-138.png
>
>
> scala> sql("explain cost select * from store_sales").show(false)
> !image-2021-05-15-13-15-49-161.png!
>  
>  
>  
>  






[jira] [Created] (SPARK-35409) Explain spark sql plan command using show is difficult to read

2021-05-15 Thread shahid (Jira)
shahid created SPARK-35409:
--

 Summary: Explain spark sql plan command using show is difficult to 
read
 Key: SPARK-35409
 URL: https://issues.apache.org/jira/browse/SPARK-35409
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: shahid
 Attachments: image-2021-05-15-13-20-39-138.png

scala> sql("explain cost select * from store_sales").show(false)

!image-2021-05-15-13-15-49-161.png!

 

Before:

 

 






[jira] [Updated] (SPARK-35409) Explain spark sql plan command using show is difficult to read

2021-05-15 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-35409:
---
Description: 
scala> sql("explain cost select * from store_sales").show(false)

!image-2021-05-15-13-20-39-138.png!

 

 

 

 

  was:
scala> sql("explain cost select * from store_sales").show(false)

!image-2021-05-15-13-15-49-161.png!

 

 

 

 


> Explain spark sql plan command using show is difficult to read
> --
>
> Key: SPARK-35409
> URL: https://issues.apache.org/jira/browse/SPARK-35409
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: shahid
>Priority: Minor
> Attachments: image-2021-05-15-13-20-39-138.png
>
>
> scala> sql("explain cost select * from store_sales").show(false)
> !image-2021-05-15-13-20-39-138.png!
>  
>  
>  
>  






[jira] [Created] (SPARK-35368) [SQL] Update histogram statistics for RANGE operator stats estimation

2021-05-11 Thread shahid (Jira)
shahid created SPARK-35368:
--

 Summary: [SQL] Update histogram statistics for RANGE operator stats 
estimation
 Key: SPARK-35368
 URL: https://issues.apache.org/jira/browse/SPARK-35368
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: 3.1.0
Reporter: shahid









[jira] [Comment Edited] (SPARK-35079) Transform with udf gives incorrect result

2021-05-10 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342171#comment-17342171
 ] 

shahid edited comment on SPARK-35079 at 5/10/21, 10:29 PM:
---

It seems this is not reproducible with the master branch:

{code:java}
+-----------------------------------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda x_0#3993), namedlambdavariable()))|
+-----------------------------------------------------------------------------+
|[a, b, c]                                                                    |
+-----------------------------------------------------------------------------+
{code}




was (Author: shahid):
It seems this is not reproducible with the master branch:
+-----------------------------------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda x_0#3993), namedlambdavariable()))|
+-----------------------------------------------------------------------------+
|[a, b, c]                                                                    |
+-----------------------------------------------------------------------------+



> Transform with udf gives incorrect result
> -
>
> Key: SPARK-35079
> URL: https://issues.apache.org/jira/browse/SPARK-35079
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: koert kuipers
>Priority: Minor
>
> I think this is a correctness bug in Spark 3.1.1; the behavior is correct in 
> Spark 3.0.1.
> In Spark 3.0.1:
> {code:java}
> scala> import spark.implicits._
> scala> import org.apache.spark.sql.functions._
> scala> val x = Seq(Seq("aa", "bb", "cc")).toDF
> x: org.apache.spark.sql.DataFrame = [value: array<string>]
> scala> x.select(transform(col("value"), col => udf((_: 
> String).drop(1)).apply(col))).show
> +---------------------------------------------------+
> |transform(value, lambdafunction(UDF(lambda 'x), x))|
> +---------------------------------------------------+
> |                                          [a, b, c]|
> +---------------------------------------------------+
> {code}
> In Spark 3.1.1:
> {code:java}
> scala> import spark.implicits._
> scala> import org.apache.spark.sql.functions._
> scala> val x = Seq(Seq("aa", "bb", "cc")).toDF
> x: org.apache.spark.sql.DataFrame = [value: array<string>]
> scala> x.select(transform(col("value"), col => udf((_: 
> String).drop(1)).apply(col))).show
> +---------------------------------------------------+
> |transform(value, lambdafunction(UDF(lambda 'x), x))|
> +---------------------------------------------------+
> |                                          [c, c, c]|
> +---------------------------------------------------+
> {code}
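As a side note, a UDF-free variant that sidesteps the UDF-inside-lambda path 
entirely is possible with the built-in substring; a sketch, not the fix for 
this ticket:

{code:java}
import org.apache.spark.sql.functions._
// substring is 1-based, so starting at position 2 drops the first character,
// matching the UDF above; the large length just means "to the end of string".
x.select(transform(col("value"), c => substring(c, 2, Int.MaxValue))).show
{code}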






[jira] [Commented] (SPARK-35079) Transform with udf gives incorrect result

2021-05-10 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342171#comment-17342171
 ] 

shahid commented on SPARK-35079:


It seems this is not reproducible with the master branch:
+-----------------------------------------------------------------------------+
|transform(value, lambdafunction(UDF(lambda x_0#3993), namedlambdavariable()))|
+-----------------------------------------------------------------------------+
|[a, b, c]                                                                    |
+-----------------------------------------------------------------------------+



> Transform with udf gives incorrect result
> -
>
> Key: SPARK-35079
> URL: https://issues.apache.org/jira/browse/SPARK-35079
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: koert kuipers
>Priority: Minor
>
> I think this is a correctness bug in Spark 3.1.1; the behavior is correct in 
> Spark 3.0.1.
> In Spark 3.0.1:
> {code:java}
> scala> import spark.implicits._
> scala> import org.apache.spark.sql.functions._
> scala> val x = Seq(Seq("aa", "bb", "cc")).toDF
> x: org.apache.spark.sql.DataFrame = [value: array<string>]
> scala> x.select(transform(col("value"), col => udf((_: 
> String).drop(1)).apply(col))).show
> +---------------------------------------------------+
> |transform(value, lambdafunction(UDF(lambda 'x), x))|
> +---------------------------------------------------+
> |                                          [a, b, c]|
> +---------------------------------------------------+
> {code}
> In Spark 3.1.1:
> {code:java}
> scala> import spark.implicits._
> scala> import org.apache.spark.sql.functions._
> scala> val x = Seq(Seq("aa", "bb", "cc")).toDF
> x: org.apache.spark.sql.DataFrame = [value: array<string>]
> scala> x.select(transform(col("value"), col => udf((_: 
> String).drop(1)).apply(col))).show
> +---------------------------------------------------+
> |transform(value, lambdafunction(UDF(lambda 'x), x))|
> +---------------------------------------------------+
> |                                          [c, c, c]|
> +---------------------------------------------------+
> {code}






[jira] [Created] (SPARK-35362) Update null count in the column stats for UNION stats estimation

2021-05-10 Thread shahid (Jira)
shahid created SPARK-35362:
--

 Summary: Update null count in the column stats for UNION stats 
estimation
 Key: SPARK-35362
 URL: https://issues.apache.org/jira/browse/SPARK-35362
 Project: Spark
  Issue Type: Improvement
  Components: Optimizer
Affects Versions: 3.0.2
Reporter: shahid









[jira] [Commented] (SPARK-30092) Number of active tasks is negative in Live UI Executors page

2019-12-04 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987799#comment-16987799
 ] 

shahid commented on SPARK-30092:


[~zhongyu09] Do you have any steps for reproducing the issue?

> Number of active tasks is negative in Live UI Executors page
> 
>
> Key: SPARK-30092
> URL: https://issues.apache.org/jira/browse/SPARK-30092
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.1
> Environment: Hadoop version: 2.7.3
> ResourceManager version: 2.7.3
>Reporter: ZhongYu
>Priority: Major
> Attachments: wx20191202-102...@2x.png
>
>
> The number of active tasks is negative in the Live UI Executors page when 
> there are executor losses and task failures. I am using Spark on YARN, built 
> on AWS spot instances. When a YARN worker is lost, there is a high 
> probability that the active task count in the Spark Live UI becomes negative.
> I saw the related tickets below, which were resolved in earlier versions of 
> Spark, but the same thing happened again in Spark 2.4.1. See attachment.
> https://issues.apache.org/jira/browse/SPARK-8560
> https://issues.apache.org/jira/browse/SPARK-10141
> https://issues.apache.org/jira/browse/SPARK-19356






[jira] [Resolved] (SPARK-30068) Flaky test: org.apache.spark.sql.hive.thriftserver.ui.ThriftServerPageSuite

2019-12-01 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid resolved SPARK-30068.

Resolution: Done

> Flaky test: org.apache.spark.sql.hive.thriftserver.ui.ThriftServerPageSuite
> ---
>
> Key: SPARK-30068
> URL: https://issues.apache.org/jira/browse/SPARK-30068
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> [info] ThriftServerPageSuite:
> [info] - thriftserver page should load successfully *** FAILED *** (728 
> milliseconds)
> [info]   org.mockito.exceptions.base.MockitoException: ClassCastException 
> occurred while creating the mockito mock :
> [info]   class to mock : 'javax.servlet.http.HttpServletRequest', loaded by 
> classloader : 'sun.misc.Launcher$AppClassLoader@5479e3f'
> [info]   created class : 
> 'org.mockito.codegen.HttpServletRequest$MockitoMock$1557201756', loaded by 
> classloader : 
> 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@2693a1cb'
> [info]   proxy instance class : 
> 'org.mockito.codegen.HttpServletRequest$MockitoMock$1557201756', loaded by 
> classloader : 
> 'net.bytebuddy.dynamic.loading.MultipleParentClassLoader@2693a1cb'
> [info]   instance creation by : ObjenesisInstantiator
> [info] 
> [info] You might experience classloading issues, please ask the mockito 
> mailing-list.
> [info]   at 
> org.apache.spark.sql.hive.thriftserver.ui.ThriftServerPageSuite.$anonfun$new$1(ThriftServerPageSuite.scala:50)
> {noformat}
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/114566/testReport/]
> [https://github.com/spark-thriftserver/spark/runs/292406745]






[jira] [Commented] (SPARK-30092) Number of active tasks is negative in UI Executors page

2019-12-01 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985805#comment-16985805
 ] 

shahid commented on SPARK-30092:


Is it the Live UI or the History UI? Also, could you please check whether any 
event drops happened?

> Number of active tasks is negative in UI Executors page
> ---
>
> Key: SPARK-30092
> URL: https://issues.apache.org/jira/browse/SPARK-30092
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.1
> Environment: Hadoop version: 2.7.3
> ResourceManager version: 2.7.3
>Reporter: ZhongYu
>Priority: Major
> Attachments: wx20191202-102...@2x.png
>
>
> The number of active tasks is negative in the UI Executors page when there 
> are executor losses and task failures. I am using Spark on YARN, built on 
> AWS spot instances. When a YARN worker is lost, there is a high probability 
> that the active task count in the Spark UI becomes negative.
> I saw the related tickets below, which were resolved in earlier versions of 
> Spark, but the same thing happened again in Spark 2.4.1. See attachment.
> https://issues.apache.org/jira/browse/SPARK-8560
> https://issues.apache.org/jira/browse/SPARK-10141
> https://issues.apache.org/jira/browse/SPARK-19356






[jira] [Created] (SPARK-29902) Add listener event queue capacity configuration to documentation

2019-11-14 Thread shahid (Jira)
shahid created SPARK-29902:
--

 Summary: Add listener event queue capacity configuration to 
documentation
 Key: SPARK-29902
 URL: https://issues.apache.org/jira/browse/SPARK-29902
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.0.0
Reporter: shahid


Add listener event queue capacity configuration to documentation
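For context, this is a static configuration that has to be set before the 
SparkContext starts; a sketch of its use (value illustrative):

{code:java}
import org.apache.spark.SparkConf

// Default is 10000; raising it trades driver memory for fewer dropped
// listener events.
val conf = new SparkConf()
  .set("spark.scheduler.listenerbus.eventqueue.capacity", "20000")
{code}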






[jira] [Commented] (SPARK-29827) Wrong persist strategy in mllib.clustering.BisectingKMeans.run

2019-11-10 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971336#comment-16971336
 ] 

shahid commented on SPARK-29827:


I would like to analyze this issue

> Wrong persist strategy in mllib.clustering.BisectingKMeans.run
> --
>
> Key: SPARK-29827
> URL: https://issues.apache.org/jira/browse/SPARK-29827
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.4.3
>Reporter: Dong Wang
>Priority: Major
>
> There are three persist misuses in mllib.clustering.BisectingKMeans.run.
>  * First, the rdd {color:#de350b}_input_{color} should be persisted, because 
> it was not only used by the action _first(),_ but also used by other __ 
> actions in the following code.
>  * Second, the rdd {color:#de350b}_assignments_{color} should be persisted. 
> It was used in the fuction _summarize()_ more than once, which containts an 
> action on _assignments_.
>  * Third, once the rdd _{color:#de350b}assignments{color}_ is persisted_,_ 
> persisting the rdd {color:#de350b}_norms_{color} would be unnecessary. 
> Because {color:#de350b}_norms_ {color} is an intermediate rdd. Since its 
> child rdd {color:#de350b}_assignments_{color} is persisted, it is unnecessary 
> to persist {color:#de350b}_norms_{color} anymore.
> {code:scala}
>   private[spark] def run(
>   input: RDD[Vector],
>   instr: Option[Instrumentation]): BisectingKMeansModel = {
> if (input.getStorageLevel == StorageLevel.NONE) {
>   logWarning(s"The input RDD ${input.id} is not directly cached, which 
> may hurt performance if"
> + " its parent RDDs are also not cached.")
> }
> // Needs to persist input
> val d = input.map(_.size).first() 
> logInfo(s"Feature dimension: $d.")
> val dMeasure: DistanceMeasure = 
> DistanceMeasure.decodeFromString(this.distanceMeasure)
> // Compute and cache vector norms for fast distance computation.
> val norms = input.map(v => Vectors.norm(v, 
> 2.0)).persist(StorageLevel.MEMORY_AND_DISK)  // Unnecessary persist
> val vectors = input.zip(norms).map { case (x, norm) => new 
> VectorWithNorm(x, norm) }
> var assignments = vectors.map(v => (ROOT_INDEX, v))  // Needs to persist
> var activeClusters = summarize(d, assignments, dMeasure)
> {code}
> This issue is reported by our tool CacheCheck, which is used to dynamically 
> detecting persist()/unpersist() api misuses.
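A sketch of what the suggested persist placement would look like, based only 
on the three points above (not a committed fix):

{code:scala}
// First suggestion: persist input, since first() is not its only action.
input.persist(StorageLevel.MEMORY_AND_DISK)
val d = input.map(_.size).first()
// Third suggestion: norms stays unpersisted, because its child RDD
// (assignments) is persisted below.
val norms = input.map(v => Vectors.norm(v, 2.0))
val vectors = input.zip(norms).map { case (x, norm) => new VectorWithNorm(x, norm) }
// Second suggestion: persist assignments, since summarize() runs actions on
// it more than once.
var assignments = vectors.map(v => (ROOT_INDEX, v))
  .persist(StorageLevel.MEMORY_AND_DISK)
{code}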






[jira] [Commented] (SPARK-29765) Monitoring UI throws IndexOutOfBoundsException when accessing metrics of attempt in stage

2019-11-06 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968760#comment-16968760
 ] 

shahid commented on SPARK-29765:


Event drops can also happen if we don't provide sufficient memory. In that 
case, increasing the configs won't help, I think.

If event drops happen, the entire UI behaves weirdly.

> Monitoring UI throws IndexOutOfBoundsException when accessing metrics of 
> attempt in stage
> -
>
> Key: SPARK-29765
> URL: https://issues.apache.org/jira/browse/SPARK-29765
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: Amazon EMR 5.27
>Reporter: Viacheslav Tradunsky
>Priority: Major
>
> When clicking on one of the largest tasks by input, I get to 
> [http://:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0|http://10.207.110.207:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0]
>  with 500 error
> {code:java}
> java.lang.IndexOutOfBoundsException: 95745 at 
> scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:132) at 
> scala.collection.immutable.Vector.apply(Vector.scala:122) at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply$mcDJ$sp(AppStatusStore.scala:255)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>  at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:246) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.mutable.ArrayOps$ofLong.map(ArrayOps.scala:246) at 
> org.apache.spark.status.AppStatusStore.scanTasks$1(AppStatusStore.scala:254) 
> at 
> org.apache.spark.status.AppStatusStore.taskSummary(AppStatusStore.scala:287) 
> at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:321) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) 
> at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:539) at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) 
> at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  

[jira] [Commented] (SPARK-29765) Monitoring UI throws IndexOutOfBoundsException when accessing metrics of attempt in stage

2019-11-06 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968737#comment-16968737
 ] 

shahid commented on SPARK-29765:


Thanks. Yes, you can try increasing 
`spark.scheduler.listenerbus.eventqueue.size`. 
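
(This is a static conf, so it has to be set before the application starts; a 
sketch, value illustrative. On Spark 2.3+ the same setting is also spelled 
spark.scheduler.listenerbus.eventqueue.capacity.)

{code:java}
import org.apache.spark.sql.SparkSession

// Must be set at startup, e.g. through the session builder or
// spark-submit --conf; it cannot be changed on a running application.
val spark = SparkSession.builder()
  .config("spark.scheduler.listenerbus.eventqueue.size", "20000")
  .getOrCreate()
{code}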

> Monitoring UI throws IndexOutOfBoundsException when accessing metrics of 
> attempt in stage
> -
>
> Key: SPARK-29765
> URL: https://issues.apache.org/jira/browse/SPARK-29765
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: Amazon EMR 5.27
>Reporter: Viacheslav Tradunsky
>Priority: Major
>
> When clicking on one of the largest tasks by input, I get to 
> [http://:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0|http://10.207.110.207:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0]
>  with 500 error
> {code:java}
> java.lang.IndexOutOfBoundsException: 95745 at 
> scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:132) at 
> scala.collection.immutable.Vector.apply(Vector.scala:122) at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply$mcDJ$sp(AppStatusStore.scala:255)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>  at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:246) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.mutable.ArrayOps$ofLong.map(ArrayOps.scala:246) at 
> org.apache.spark.status.AppStatusStore.scanTasks$1(AppStatusStore.scala:254) 
> at 
> org.apache.spark.status.AppStatusStore.taskSummary(AppStatusStore.scala:287) 
> at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:321) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) 
> at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:539) at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) 
> at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748){code}




[jira] [Commented] (SPARK-29765) Monitoring UI throws IndexOutOfBoundsException when accessing metrics of attempt in stage

2019-11-06 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968707#comment-16968707
 ] 

shahid commented on SPARK-29765:


Yes, then it seems some event drop has happened. Could you check the logs 
related to the event drop? To prevent this, you can increase the event queue 
capacity, which I think is 1000 by default.

> Monitoring UI throws IndexOutOfBoundsException when accessing metrics of 
> attempt in stage
> -
>
> Key: SPARK-29765
> URL: https://issues.apache.org/jira/browse/SPARK-29765
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: Amazon EMR 5.27
>Reporter: Viacheslav Tradunsky
>Priority: Major
>
> When clicking on one of the largest tasks by input, I get to 
> [http://:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0|http://10.207.110.207:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0]
>  with 500 error
> {code:java}
> java.lang.IndexOutOfBoundsException: 95745 at 
> scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:132) at 
> scala.collection.immutable.Vector.apply(Vector.scala:122) at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply$mcDJ$sp(AppStatusStore.scala:255)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>  at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:246) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.mutable.ArrayOps$ofLong.map(ArrayOps.scala:246) at 
> org.apache.spark.status.AppStatusStore.scanTasks$1(AppStatusStore.scala:254) 
> at 
> org.apache.spark.status.AppStatusStore.taskSummary(AppStatusStore.scala:287) 
> at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:321) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) 
> at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:539) at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) 
> at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> 

[jira] [Commented] (SPARK-29765) Monitoring UI throws IndexOutOfBoundsException when accessing metrics of attempt in stage

2019-11-06 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968652#comment-16968652
 ] 

shahid commented on SPARK-29765:


I am still not sure about the root cause, as I am not able to reproduce it with 
small data. From the number, I can see that it is related to cleaning up the 
store when the number of tasks exceeds the threshold. If you can still 
reproduce it with the same data even after increasing the threshold, then it 
might be due to some other issue.

> Monitoring UI throws IndexOutOfBoundsException when accessing metrics of 
> attempt in stage
> -
>
> Key: SPARK-29765
> URL: https://issues.apache.org/jira/browse/SPARK-29765
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: Amazon EMR 5.27
>Reporter: Viacheslav Tradunsky
>Priority: Major
>
> When clicking on one of the largest tasks by input, I get to 
> [http://:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0|http://10.207.110.207:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0]
>  with 500 error
> {code:java}
> java.lang.IndexOutOfBoundsException: 95745 at 
> scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:132) at 
> scala.collection.immutable.Vector.apply(Vector.scala:122) at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply$mcDJ$sp(AppStatusStore.scala:255)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>  at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:246) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.mutable.ArrayOps$ofLong.map(ArrayOps.scala:246) at 
> org.apache.spark.status.AppStatusStore.scanTasks$1(AppStatusStore.scala:254) 
> at 
> org.apache.spark.status.AppStatusStore.taskSummary(AppStatusStore.scala:287) 
> at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:321) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) 
> at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:539) at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) 
> at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> 

[jira] [Comment Edited] (SPARK-29765) Monitoring UI throws IndexOutOfBoundsException when accessing metrics of attempt in stage

2019-11-06 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16968642#comment-16968642
 ] 

shahid edited comment on SPARK-29765 at 11/6/19 7:39 PM:
-

Just to narrow down the problem, can you please increase 
`spark.ui.retainedTasks` from 10 to 20 and check if the issue still 
exists? Thanks


was (Author: shahid):
Just to narrow down the problem, can you please increase 
`spark.ui.retainedTasks` from 10 to 20? Thanks

> Monitoring UI throws IndexOutOfBoundsException when accessing metrics of 
> attempt in stage
> -
>
> Key: SPARK-29765
> URL: https://issues.apache.org/jira/browse/SPARK-29765
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: Amazon EMR 5.27
>Reporter: Viacheslav Tradunsky
>Priority: Major
>
> When clicking on one of the largest tasks by input, I get to 
> [http://:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0|http://10.207.110.207:20888/proxy/application_1572992299050_0001/stages/stage/?id=74&attempt=0]
>  with 500 error
> {code:java}
> java.lang.IndexOutOfBoundsException: 95745 at 
> scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:132) at 
> scala.collection.immutable.Vector.apply(Vector.scala:122) at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply$mcDJ$sp(AppStatusStore.scala:255)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>  at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:246) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.mutable.ArrayOps$ofLong.map(ArrayOps.scala:246) at 
> org.apache.spark.status.AppStatusStore.scanTasks$1(AppStatusStore.scala:254) 
> at 
> org.apache.spark.status.AppStatusStore.taskSummary(AppStatusStore.scala:287) 
> at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:321) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) 
> at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:539) at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) 
> at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> 

[jira] [Commented] (SPARK-29765) Monitoring UI throws IndexOutOfBoundsException when accessing metrics of attempt in stage

2019-11-06 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968642#comment-16968642
 ] 

shahid commented on SPARK-29765:


Just to narrow down the problem, could you please increase 
`spark.ui.retainedTasks` from 10 to 20? Thanks.
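
For reference, a minimal sketch of how that property can be raised when creating the context (the app name and master here are illustrative, and the value 20 simply mirrors the suggestion above):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Raise the number of tasks the UI retains per stage before it starts evicting.
val conf = new SparkConf()
  .setAppName("retained-tasks-demo") // illustrative
  .setMaster("local[2]")             // illustrative
  .set("spark.ui.retainedTasks", "20")
val sc = new SparkContext(conf)
{code}

The same setting can also be passed to spark-submit via `--conf spark.ui.retainedTasks=20`.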

> Monitoring UI throws IndexOutOfBoundsException when accessing metrics of 
> attempt in stage
> -
>
> Key: SPARK-29765
> URL: https://issues.apache.org/jira/browse/SPARK-29765
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: Amazon EMR 5.27
>Reporter: Viacheslav Tradunsky
>Priority: Major
>
> When clicking on one of the largest tasks by input, I get to 
> [http://:20888/proxy/application_1572992299050_0001/stages/stage/?id=74=0|http://10.207.110.207:20888/proxy/application_1572992299050_0001/stages/stage/?id=74=0]
>  with a 500 error
> {code:java}
> java.lang.IndexOutOfBoundsException: 95745 at 
> scala.collection.immutable.Vector.checkRangeConvert(Vector.scala:132) at 
> scala.collection.immutable.Vector.apply(Vector.scala:122) at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply$mcDJ$sp(AppStatusStore.scala:255)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> org.apache.spark.status.AppStatusStore$$anonfun$scanTasks$1$1.apply(AppStatusStore.scala:254)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>  at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:246) at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
> scala.collection.mutable.ArrayOps$ofLong.map(ArrayOps.scala:246) at 
> org.apache.spark.status.AppStatusStore.scanTasks$1(AppStatusStore.scala:254) 
> at 
> org.apache.spark.status.AppStatusStore.taskSummary(AppStatusStore.scala:287) 
> at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:321) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at 
> org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at 
> javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) 
> at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:539) at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) 
> at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-29588) Improvements in WebUI JDBC/ODBC server page

2019-11-06 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-29588:
---
Priority: Major  (was: Minor)

> Improvements in WebUI JDBC/ODBC server page
> ---
>
> Key: SPARK-29588
> URL: https://issues.apache.org/jira/browse/SPARK-29588
> Project: Spark
>  Issue Type: Umbrella
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Priority: Major
>
> Improvements in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29726) Support KV store for listener HiveThriftServer2Listener

2019-11-02 Thread shahid (Jira)
shahid created SPARK-29726:
--

 Summary: Support KV store for listener HiveThriftServer2Listener
 Key: SPARK-29726
 URL: https://issues.apache.org/jira/browse/SPARK-29726
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 3.0.0
Reporter: shahid


Support KVstore for HiveThriftServer2Listener



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29726) Support KV store for listener HiveThriftServer2Listener

2019-11-02 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965409#comment-16965409
 ] 

shahid commented on SPARK-29726:


I will raise a PR

> Support KV store for listener HiveThriftServer2Listener
> ---
>
> Key: SPARK-29726
> URL: https://issues.apache.org/jira/browse/SPARK-29726
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Support KVstore for HiveThriftServer2Listener



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29725) Add UT for WebUI page for JDBC/ODBC tab

2019-11-02 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965408#comment-16965408
 ] 

shahid commented on SPARK-29725:


I will raise a PR

> Add UT for WebUI page for JDBC/ODBC tab
> ---
>
> Key: SPARK-29725
> URL: https://issues.apache.org/jira/browse/SPARK-29725
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Add UT for WebUI page for JDBC/ODBC tab



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29725) Add UT for WebUI page for JDBC/ODBC tab

2019-11-02 Thread shahid (Jira)
shahid created SPARK-29725:
--

 Summary: Add UT for WebUI page for JDBC/ODBC tab
 Key: SPARK-29725
 URL: https://issues.apache.org/jira/browse/SPARK-29725
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 3.0.0
Reporter: shahid


Add UT for WebUI page for JDBC/ODBC tab



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29724) Support JDBC/ODBC tab for HistoryServer WebUI

2019-11-02 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965407#comment-16965407
 ] 

shahid commented on SPARK-29724:


I will raise a PR

> Support JDBC/ODBC tab for HistoryServer WebUI
> -
>
> Key: SPARK-29724
> URL: https://issues.apache.org/jira/browse/SPARK-29724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: shahid
>Priority: Major
>
> Support JDBC/ODBC tab for HistoryServerWebUI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29724) Support JDBC/ODBC tab for HistoryServer WebUI

2019-11-02 Thread shahid (Jira)
shahid created SPARK-29724:
--

 Summary: Support JDBC/ODBC tab for HistoryServer WebUI
 Key: SPARK-29724
 URL: https://issues.apache.org/jira/browse/SPARK-29724
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 3.0.0
Reporter: shahid


Support JDBC/ODBC tab for HistoryServerWebUI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29590) Support hiding table in JDBC/ODBC server page in WebUI

2019-10-28 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961006#comment-16961006
 ] 

shahid commented on SPARK-29590:


Hi [~hyukjin.kwon], this JIRA is similar to 
https://issues.apache.org/jira/browse/SPARK-25575; this one is for supporting it 
in the JDBC/ODBC server page.

> Support hiding table in JDBC/ODBC server page in WebUI
> --
>
> Key: SPARK-29590
> URL: https://issues.apache.org/jira/browse/SPARK-29590
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Support hiding table in JDBC/ODBC server page in WebUI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29590) Support hiding table in JDBC/ODBC server page in WebUI

2019-10-24 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958909#comment-16958909
 ] 

shahid commented on SPARK-29590:


I will raise a PR

> Support hiding table in JDBC/ODBC server page in WebUI
> --
>
> Key: SPARK-29590
> URL: https://issues.apache.org/jira/browse/SPARK-29590
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Support hiding table in JDBC/ODBC server page in WebUI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29589) Support pagination for sql session stats table in JDBC/ODBC server page

2019-10-24 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958907#comment-16958907
 ] 

shahid commented on SPARK-29589:


I will raise a PR

> Support pagination for sql session stats table in JDBC/ODBC server page
> ---
>
> Key: SPARK-29589
> URL: https://issues.apache.org/jira/browse/SPARK-29589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Support pagination for sql session stats table in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29590) Support hiding table in JDBC/ODBC server page in WebUI

2019-10-24 Thread shahid (Jira)
shahid created SPARK-29590:
--

 Summary: Support hiding table in JDBC/ODBC server page in WebUI
 Key: SPARK-29590
 URL: https://issues.apache.org/jira/browse/SPARK-29590
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 2.4.4, 3.0.0
Reporter: shahid


Support hiding table in JDBC/ODBC server page in WebUI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29588) Improvements in WebUI JDBC/ODBC server page

2019-10-24 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-29588:
---
Summary: Improvements in WebUI JDBC/ODBC server page  (was: Improvements in 
JDBC/ODBC server page)

> Improvements in WebUI JDBC/ODBC server page
> ---
>
> Key: SPARK-29588
> URL: https://issues.apache.org/jira/browse/SPARK-29588
> Project: Spark
>  Issue Type: Umbrella
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Improvements in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29589) Support pagination for sql session stats table in JDBC/ODBC server page

2019-10-24 Thread shahid (Jira)
shahid created SPARK-29589:
--

 Summary: Support pagination for sql session stats table in 
JDBC/ODBC server page
 Key: SPARK-29589
 URL: https://issues.apache.org/jira/browse/SPARK-29589
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 2.4.4, 3.0.0
Reporter: shahid


Support pagination for sql session stats table in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29588) Improvements in JDBC/ODBC server page

2019-10-24 Thread shahid (Jira)
shahid created SPARK-29588:
--

 Summary: Improvements in JDBC/ODBC server page
 Key: SPARK-29588
 URL: https://issues.apache.org/jira/browse/SPARK-29588
 Project: Spark
  Issue Type: Umbrella
  Components: Web UI
Affects Versions: 2.4.4, 3.0.0
Reporter: shahid


Improvements in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29571) Fix UT in AllExecutionsPageSuite class

2019-10-23 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958171#comment-16958171
 ] 

shahid commented on SPARK-29571:


Could you clarify which UT is failing?

> Fix UT in  AllExecutionsPageSuite class
> ---
>
> Key: SPARK-29571
> URL: https://issues.apache.org/jira/browse/SPARK-29571
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Ankit Raj Boudh
>Priority: Minor
>
> The "sorting should be successful" UT in the AllExecutionsPageSuite class is 
> failing due to an invalid assert condition. This needs to be handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29559) Support pagination for JDBC/ODBC UI page

2019-10-22 Thread shahid (Jira)
shahid created SPARK-29559:
--

 Summary: Support pagination for JDBC/ODBC UI page
 Key: SPARK-29559
 URL: https://issues.apache.org/jira/browse/SPARK-29559
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.4.4, 3.0.0
Reporter: shahid


Support pagination for JDBC/ODBC UI page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29235) CrossValidatorModel.avgMetrics disappears after model is written/read again

2019-09-24 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937427#comment-16937427
 ] 

shahid commented on SPARK-29235:


I would like to analyze the issue.

> CrossValidatorModel.avgMetrics disappears after model is written/read again
> ---
>
> Key: SPARK-29235
> URL: https://issues.apache.org/jira/browse/SPARK-29235
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.1, 2.4.3
> Environment: Databricks cluster:
> {
>     "num_workers": 4,
>     "cluster_name": "mabedfor-test-classfix",
>     "spark_version": "5.3.x-cpu-ml-scala2.11",
>     "spark_conf": {
>     "spark.databricks.delta.preview.enabled": "true"
>     },
>     "node_type_id": "Standard_DS12_v2",
>     "driver_node_type_id": "Standard_DS12_v2",
>     "ssh_public_keys": [],
>     "custom_tags": {},
>     "spark_env_vars": {
>     "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
>     },
>     "autotermination_minutes": 120,
>     "enable_elastic_disk": true,
>     "cluster_source": "UI",
>     "init_scripts": [],
>     "cluster_id": "0722-165622-calls746"
> }
>Reporter: Matthew Bedford
>Priority: Minor
>
>  
>  Right after a CrossValidatorModel is trained, it has avgMetrics.  After the 
> model is written to disk and read later, it no longer has avgMetrics.  To 
> reproduce:
> {{from pyspark.ml.tuning import CrossValidator, CrossValidatorModel}}
> {{cv = CrossValidator(...) #fill with params}}
> {{cvModel = cv.fit(trainDF) #given dataframe with training data}}
> {{print(cvModel.avgMetrics) #prints a nonempty list as expected}}
> {{cvModel.write().save("/tmp/model")}}
> {{cvModel2 = CrossValidatorModel.read().load("/tmp/model")}}
> {{print(cvModel2.avgMetrics) #BUG - prints an empty list}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28599) Support pagination and sorting for ThriftServerPage

2019-08-02 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898614#comment-16898614
 ] 

shahid commented on SPARK-28599:


Thanks [~yumwang]. I will analyze this.
In the SQL tab, we added pagination mainly because of performance issues, such as 
the page crashing when loading a large number of queries. I would like to know 
whether a similar issue exists in the Thrift Server tab.

> Support pagination and sorting for ThriftServerPage
> ---
>
> Key: SPARK-28599
> URL: https://issues.apache.org/jira/browse/SPARK-28599
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> SQLTab support pagination and sorting, but ThriftServerPage missing this 
> feature.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25788) Elastic net penalties for GLMs

2019-07-18 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16887874#comment-16887874
 ] 

shahid commented on SPARK-25788:


[~pralabhkumar] Yeah, please go ahead. I don't think I have enough bandwidth 
to look into the issue.

> Elastic net penalties for GLMs 
> ---
>
> Key: SPARK-25788
> URL: https://issues.apache.org/jira/browse/SPARK-25788
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.3.2
>Reporter: Christian Lorentzen
>Priority: Major
>
> Currently, both LinearRegression and LogisticRegression support an elastic 
> net penality (setElasticNetParam), i.e. L1 and L2 penalties. This feature 
> could and should also be added to GeneralizedLinearRegression.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27802) SparkUI throws NoSuchElementException when inconsistency appears between `ExecutorStageSummaryWrapper`s and `ExecutorSummaryWrapper`s

2019-06-25 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872740#comment-16872740
 ] 

shahid commented on SPARK-27802:


Could you please provide steps to reproduce the issue?

> SparkUI throws NoSuchElementException when inconsistency appears between 
> `ExecutorStageSummaryWrapper`s and `ExecutorSummaryWrapper`s
> -
>
> Key: SPARK-27802
> URL: https://issues.apache.org/jira/browse/SPARK-27802
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: liupengcheng
>Priority: Major
>
> Recently, we hit this issue when testing Spark 2.3. It reports the following 
> error message when clicking on the stage UI link.
> We added more logs to print the executorId (here it is 10) to debug, and finally 
> found out that it's caused by an inconsistency between the list of 
> `ExecutorStageSummaryWrapper`s and the `ExecutorSummaryWrapper`s in the 
> KVStore. The number of deadExecutors may exceed the threshold and be removed 
> from the list of `ExecutorSummaryWrapper`s; however, they may still be kept in 
> the list of `ExecutorStageSummaryWrapper`s in the store.
> {code:java}
> HTTP ERROR 500
> Problem accessing /stages/stage/. Reason:
> Server Error
> Caused by:
> java.util.NoSuchElementException: 10
>   at 
> org.apache.spark.util.kvstore.InMemoryStore.read(InMemoryStore.java:83)
>   at 
> org.apache.spark.status.ElementTrackingStore.read(ElementTrackingStore.scala:95)
>   at 
> org.apache.spark.status.AppStatusStore.executorSummary(AppStatusStore.scala:70)
>   at 
> org.apache.spark.ui.jobs.ExecutorTable$$anonfun$createExecutorTable$2.apply(ExecutorTable.scala:99)
>   at 
> org.apache.spark.ui.jobs.ExecutorTable$$anonfun$createExecutorTable$2.apply(ExecutorTable.scala:92)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.ui.jobs.ExecutorTable.createExecutorTable(ExecutorTable.scala:92)
>   at 
> org.apache.spark.ui.jobs.ExecutorTable.toNodeSeq(ExecutorTable.scala:75)
>   at org.apache.spark.ui.jobs.StagePage.render(StagePage.scala:478)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
>   at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:82)
>   at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>   at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:166)
>   at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.spark_project.jetty.server.Server.handle(Server.java:539)
>   at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at 
> org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> 

[jira] [Commented] (SPARK-25861) Remove unused refreshInterval parameter from the headerSparkPage method.

2019-05-08 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835598#comment-16835598
 ] 

shahid commented on SPARK-25861:


[~drKalko] Normally the browser supports refreshing the page.
This JIRA was for removing unused code from Spark.

> Remove unused refreshInterval parameter from the headerSparkPage method.
> 
>
> Key: SPARK-25861
> URL: https://issues.apache.org/jira/browse/SPARK-25861
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: shahid
>Assignee: shahid
>Priority: Trivial
> Fix For: 3.0.0
>
>
> https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L221
>  
> refreshInterval is not used anywhere in the headerSparkPage method. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27636) Remove cached RDD blocks after PIC execution

2019-05-05 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833462#comment-16833462
 ] 

shahid commented on SPARK-27636:


I will raise a PR

> Remove cached RDD blocks after PIC execution
> 
>
> Key: SPARK-27636
> URL: https://issues.apache.org/jira/browse/SPARK-27636
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.3.3, 2.4.2, 3.0.0
>Reporter: shahid
>Priority: Major
>
> Test steps to reproduce:
> 1) bin/spark-shell
> import org.apache.spark.ml.clustering.PowerIterationClustering
> val dataset = spark.createDataFrame(Seq(
>   (0L, 1L, 1.0),
>   (1L, 2L, 1.0),
>   (3L, 4L, 1.0),
>   (4L, 0L, 0.1))).toDF("src", "dst", "weight")
> val model = new PowerIterationClustering().
>   setMaxIter(10).
>   setInitMode("degree").
>   setWeightCol("weight")
> val prediction = model.assignClusters(dataset).select("id", "cluster")
> 2) Open the storage tab of the UI. We can see many RDD blocks cached, even 
> after PIC has finished running.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27636) Remove cached RDD blocks after PIC execution

2019-05-05 Thread shahid (JIRA)
shahid created SPARK-27636:
--

 Summary: Remove cached RDD blocks after PIC execution
 Key: SPARK-27636
 URL: https://issues.apache.org/jira/browse/SPARK-27636
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 2.4.2, 2.3.3, 3.0.0
Reporter: shahid


Test steps to reproduce:
1) bin/spark-shell
import org.apache.spark.ml.clustering.PowerIterationClustering
val dataset = spark.createDataFrame(Seq(
  (0L, 1L, 1.0),
  (1L, 2L, 1.0),
  (3L, 4L, 1.0),
  (4L, 0L, 0.1))).toDF("src", "dst", "weight")
val model = new PowerIterationClustering().
  setMaxIter(10).
  setInitMode("degree").
  setWeightCol("weight")
val prediction = model.assignClusters(dataset).select("id", "cluster")

2) Open the storage tab of the UI. We can see many RDD blocks cached, even after 
PIC has finished running.
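
Until such a cleanup lands, a hedged workaround sketch (run in the spark-shell session from the steps above) is to drop the leftover blocks manually:

{code:scala}
// List the RDDs still cached after assignClusters() and unpersist them.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"unpersisting cached RDD $id (name: ${rdd.name})")
  rdd.unpersist()
}
{code}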



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27068) Support failed jobs ui and completed jobs ui use different queue

2019-04-18 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821286#comment-16821286
 ] 

shahid edited comment on SPARK-27068 at 4/18/19 4:31 PM:
-

cc [~srowen] Can we raise a PR for the issue? The actual issue is: when there 
are lots of jobs, the UI cleans up older jobs once their number exceeds a 
threshold. Eventually it removes failed jobs as well, so if the user wants to see 
the reason for a failure, it is no longer available in the UI. 
 The solution could be to remove jobs only from the successful jobs table 
and retain the failed or killed jobs table. Kindly give your feedback.


was (Author: shahid):
cc [~srowen] Can we raise a PR for the issue?. The actual issue is, when there 
are lots of jobs, UI cleans older jobs if the number of jobs exceeds a 
threshold. Eventually it removes failure jobs as well. If user want to see the 
reason for failure, it won't be available in UI. 
 The solution could be, we can remove the jobs only from successful jobs table 
and retain failed of killed jobs table. Kindly give the feedback

> Support failed jobs ui and completed jobs ui use different queue
> 
>
> Key: SPARK-27068
> URL: https://issues.apache.org/jira/browse/SPARK-27068
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: zhoukang
>Priority: Major
>
> For some long-running jobs, we may want to check the cause of some failed 
> jobs.
> But most jobs have completed and the failed jobs UI may disappear; we can use a 
> different queue for these two kinds of jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27068) Support failed jobs ui and completed jobs ui use different queue

2019-04-18 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821286#comment-16821286
 ] 

shahid commented on SPARK-27068:


cc [~srowen] Can we raise a PR for the issue? The actual issue is: when there 
are lots of jobs, the UI cleans up older jobs once their number exceeds a 
threshold. Eventually it removes failed jobs as well, so if the user wants to see 
the reason for a failure, it is no longer available in the UI. 
 The solution could be to remove jobs only from the successful jobs table 
and retain the failed or killed jobs table. Kindly give your feedback.
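
To make the proposal concrete, a rough sketch of the eviction rule (the JobSummary type here is a hypothetical stand-in for the listener's internal job data, not the real AppStatusListener code):

{code:scala}
import org.apache.spark.JobExecutionStatus

case class JobSummary(jobId: Int, status: JobExecutionStatus) // hypothetical

// Once the retained-jobs threshold is exceeded, evict only successful jobs,
// so failed/killed jobs keep their failure reason visible in the UI.
def evictable(jobs: Seq[JobSummary], retained: Int): Seq[JobSummary] = {
  val overflow = jobs.size - retained
  if (overflow <= 0) Seq.empty
  else jobs.filter(_.status == JobExecutionStatus.SUCCEEDED).take(overflow)
}
{code}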

> Support failed jobs ui and completed jobs ui use different queue
> 
>
> Key: SPARK-27068
> URL: https://issues.apache.org/jira/browse/SPARK-27068
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: zhoukang
>Priority: Major
>
> For some long-running jobs, we may want to check the cause of some failed 
> jobs.
> But most jobs have completed and the failed jobs UI may disappear; we can use a 
> different queue for these two kinds of jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27468) "Storage Level" in "RDD Storage Page" is not correct

2019-04-17 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820525#comment-16820525
 ] 

shahid commented on SPARK-27468:


[~zsxwing] Thanks

> "Storage Level" in "RDD Storage Page" is not correct
> 
>
> Key: SPARK-27468
> URL: https://issues.apache.org/jira/browse/SPARK-27468
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.1
>Reporter: Shixiong Zhu
>Priority: Major
> Attachments: Screenshot from 2019-04-17 10-42-55.png
>
>
> I ran the following unit test and checked the UI.
> {code}
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[2,1,1024]")
>   .set("spark.ui.enabled", "true")
> sc = new SparkContext(conf)
> val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2)
> rdd.count()
> Thread.sleep(360)
> {code}
> The storage level is "Memory Deserialized 1x Replicated" in the RDD storage 
> page.
> I tried to debug and found this is because Spark emitted the following two 
> events:
> {code}
> event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 
> 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 
> replicas),56,0))
> event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 
> 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 
> replicas),56,0))
> {code}
> The storage level in the second event will overwrite the first one. "1 
> replicas" comes from this line: 
> https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457
> Maybe AppStatusListener should calculate the replicas from events?
> Another fact we may need to think about is when replicas is 2, will two Spark 
> events arrive in the same order? Currently, two RPCs from different executors 
> can arrive in any order.
> Credit goes to [~srfnmnk] who reported this issue originally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27485) Certain query plans fail to run when autoBroadcastJoinThreshold is set to -1

2019-04-17 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820189#comment-16820189
 ] 

shahid commented on SPARK-27485:


Could you please share a test to reproduce this?
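
For anyone putting a reproduction together, the configuration from the title is set like this (shown for orientation only, not as a confirmed repro of the bug):

{code:scala}
// -1 disables auto broadcast joins entirely.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)
{code}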

> Certain query plans fail to run when autoBroadcastJoinThreshold is set to -1
> 
>
> Key: SPARK-27485
> URL: https://issues.apache.org/jira/browse/SPARK-27485
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 2.4.0
>Reporter: Muthu Jayakumar
>Priority: Minor
>
> Certain queries fail with
> {noformat}
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:349)
>   at scala.None$.get(Option.scala:347)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements.$anonfun$reorder$1(EnsureRequirements.scala:238)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements.$anonfun$reorder$1$adapted(EnsureRequirements.scala:233)
>   at scala.collection.immutable.List.foreach(List.scala:388)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements.reorder(EnsureRequirements.scala:233)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements.reorderJoinKeys(EnsureRequirements.scala:262)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements.org$apache$spark$sql$execution$exchange$EnsureRequirements$$reorderJoinPredicates(EnsureRequirements.scala:289)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:304)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements$$anonfun$apply$1.applyOrElse(EnsureRequirements.scala:296)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$4(TreeNode.scala:282)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:282)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:326)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:326)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:326)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:326)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$1(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:326)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:324)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:275)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:296)
>   at 
> org.apache.spark.sql.execution.exchange.EnsureRequirements.apply(EnsureRequirements.scala:38)
>   at 
> org.apache.spark.sql.execution.QueryExecution.$anonfun$prepareForExecution$1(QueryExecution.scala:87)
>   at 
> 

[jira] [Created] (SPARK-27486) Enable History server storage information test

2019-04-17 Thread shahid (JIRA)
shahid created SPARK-27486:
--

 Summary: Enable History server storage information test
 Key: SPARK-27486
 URL: https://issues.apache.org/jira/browse/SPARK-27486
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.1, 2.3.3, 3.0.0
Reporter: shahid


After SPARK-22050, we can store information about block-update events in the 
event log by enabling "spark.eventLog.logBlockUpdates.enabled=true". The test 
related to storage in the History Server suite was disabled after 
SPARK-13845. So we can re-enable the test by adding an event log for an 
application that ran with "spark.eventLog.logBlockUpdates.enabled=true".
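
For reference, a minimal sketch of a configuration that produces such an event log (the event-log directory is illustrative):

{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("block-update-logging")             // illustrative
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.dir", "/tmp/spark-events") // illustrative path
  .set("spark.eventLog.logBlockUpdates.enabled", "true")
{code}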



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27486) Enable History server storage information test

2019-04-17 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819759#comment-16819759
 ] 

shahid commented on SPARK-27486:


I will raise a PR

> Enable History server storage information test
> --
>
> Key: SPARK-27486
> URL: https://issues.apache.org/jira/browse/SPARK-27486
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.3, 2.4.1, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> After SPARK-22050, we can store information about block-update events in the 
> event log by enabling "spark.eventLog.logBlockUpdates.enabled=true". The test 
> related to storage in the History Server suite was disabled after 
> SPARK-13845. So we can re-enable the test by adding an event log for an 
> application that ran with 
> "spark.eventLog.logBlockUpdates.enabled=true"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27468) "Storage Level" in "RDD Storage Page" is not correct

2019-04-16 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819679#comment-16819679
 ] 

shahid commented on SPARK-27468:


Hi [~srfnmnk], I tried to reproduce this in the master branch. The steps I 
followed are shown below.
1) bin/spark-shell --master local[2]


{code:scala}
scala> import org.apache.spark.storage.StorageLevel
scala> val rdd = sc.parallelize(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2)
scala> rdd.count
{code}

The storage tab in the UI is shown below:

 !Screenshot from 2019-04-17 10-42-55.png! 

So it seems I am not able to reproduce the issue. Could you please tell me 
whether the test steps are correct, or whether I need to enable any 
configuration? Thank you.
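
For completeness, a hedged variant closer to the original report's setup: with a multi-process master such as local-cluster there are two executors, so the requested 2x replication can actually happen, whereas local[2] runs a single block manager and the block is never replicated.

{code:scala}
// bin/spark-shell --master "local-cluster[2,1,1024]"
scala> import org.apache.spark.storage.StorageLevel
scala> val rdd = sc.parallelize(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2)
scala> rdd.count
{code}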


> "Storage Level" in "RDD Storage Page" is not correct
> 
>
> Key: SPARK-27468
> URL: https://issues.apache.org/jira/browse/SPARK-27468
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.1
>Reporter: Shixiong Zhu
>Priority: Major
> Attachments: Screenshot from 2019-04-17 10-42-55.png
>
>
> I ran the following unit test and checked the UI.
> {code}
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[2,1,1024]")
>   .set("spark.ui.enabled", "true")
> sc = new SparkContext(conf)
> val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2)
> rdd.count()
> Thread.sleep(360)
> {code}
> The storage level is "Memory Deserialized 1x Replicated" in the RDD storage 
> page.
> I tried to debug and found this is because Spark emitted the following two 
> events:
> {code}
> event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 
> 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 
> replicas),56,0))
> event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 
> 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 
> replicas),56,0))
> {code}
> The storage level in the second event will overwrite the first one. "1 
> replicas" comes from this line: 
> https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457
> Maybe AppStatusListener should calculate the replicas from events?
> Another fact we may need to think about is when replicas is 2, will two Spark 
> events arrive in the same order? Currently, two RPCs from different executors 
> can arrive in any order.
> Credit goes to [~srfnmnk] who reported this issue originally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27468) "Storage Level" in "RDD Storage Page" is not correct

2019-04-16 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-27468:
---
Attachment: Screenshot from 2019-04-17 10-42-55.png

> "Storage Level" in "RDD Storage Page" is not correct
> 
>
> Key: SPARK-27468
> URL: https://issues.apache.org/jira/browse/SPARK-27468
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.1
>Reporter: Shixiong Zhu
>Priority: Major
> Attachments: Screenshot from 2019-04-17 10-42-55.png
>
>
> I ran the following unit test and checked the UI.
> {code}
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[2,1,1024]")
>   .set("spark.ui.enabled", "true")
> sc = new SparkContext(conf)
> val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2)
> rdd.count()
> Thread.sleep(360)
> {code}
> The storage level is "Memory Deserialized 1x Replicated" in the RDD storage 
> page.
> I tried to debug and found this is because Spark emitted the following two 
> events:
> {code}
> event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 
> 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 
> replicas),56,0))
> event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 
> 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 
> replicas),56,0))
> {code}
> The storage level in the second event will overwrite the first one. "1 
> replicas" comes from this line: 
> https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457
> Maybe AppStatusListener should calculate the replicas from events?
> Another fact we may need to think about is when replicas is 2, will two Spark 
> events arrive in the same order? Currently, two RPCs from different executors 
> can arrive in any order.
> Credit goes to [~srfnmnk] who reported this issue originally.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16859) History Server storage information is missing

2019-04-16 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819662#comment-16819662
 ] 

shahid edited comment on SPARK-16859 at 4/17/19 2:07 AM:
-

[~Hauer] [~toopt4] Can you try enabling 
"spark.eventLog.logBlockUpdates.enabled=true" and see if the History Server 
storage tab is still empty?


was (Author: shahid):
[~Hauer][~toopt4] Can you try enabling 
"spark.eventLog.logBlockUpdates.enabled=true" and see, if still History server 
storage tab is empty?

> History Server storage information is missing
> -
>
> Key: SPARK-16859
> URL: https://issues.apache.org/jira/browse/SPARK-16859
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Andrei Ivanov
>Priority: Major
>  Labels: historyserver, newbie
>
> It looks like the job history storage tab in the history server is broken for 
> completed jobs since *1.6.2*. 
> More specifically, it's broken since 
> [SPARK-13845|https://issues.apache.org/jira/browse/SPARK-13845].
> I've fixed it for my installation by effectively reverting the above patch 
> ([see|https://github.com/EinsamHauer/spark/commit/3af62ea09af8bb350c8c8a9117149c09b8feba08]).
> IMHO, the most straightforward fix would be to implement 
> _SparkListenerBlockUpdated_ serialization to JSON in _JsonProtocol_, making 
> sure it works from _ReplayListenerBus_.
> The downside will be that it will still work incorrectly with pre-patch job 
> histories. But then, it hasn't worked since *1.6.2* anyhow.
> PS: I'd really love to have this fixed eventually, but I'm pretty new to 
> Apache Spark and missing hands-on Scala experience, so I'd prefer that it be 
> fixed by someone experienced with roadmap vision. If nobody volunteers I'll 
> try to patch myself.  
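
As context for the serialization proposal quoted above, a hedged sketch of a listener observing the SparkListenerBlockUpdated events in question; it only prints them and does not implement the proposed JsonProtocol changes:

{code:scala}
import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}

// Logs each block update, the event the storage tab is built from.
class BlockUpdateLogger extends SparkListener {
  override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {
    val info = event.blockUpdatedInfo
    println(s"${info.blockId} on ${info.blockManagerId} -> ${info.storageLevel}")
  }
}
// Register with: sc.addSparkListener(new BlockUpdateLogger())
{code}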



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16859) History Server storage information is missing

2019-04-16 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819662#comment-16819662
 ] 

shahid commented on SPARK-16859:


[~Hauer][~toopt4] Can you try enabling 
"spark.eventLog.logBlockUpdates.enabled=true" and see if the History Server 
storage tab is still empty?

> History Server storage information is missing
> -
>
> Key: SPARK-16859
> URL: https://issues.apache.org/jira/browse/SPARK-16859
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Andrei Ivanov
>Priority: Major
>  Labels: historyserver, newbie
>
> It looks like the job history storage tab in the history server is broken for 
> completed jobs since *1.6.2*. 
> More specifically, it's broken since 
> [SPARK-13845|https://issues.apache.org/jira/browse/SPARK-13845].
> I've fixed it for my installation by effectively reverting the above patch 
> ([see|https://github.com/EinsamHauer/spark/commit/3af62ea09af8bb350c8c8a9117149c09b8feba08]).
> IMHO, the most straightforward fix would be to implement 
> _SparkListenerBlockUpdated_ serialization to JSON in _JsonProtocol_, making 
> sure it works from _ReplayListenerBus_.
> The downside will be that it will still work incorrectly with pre-patch job 
> histories. But then, it hasn't worked since *1.6.2* anyhow.
> PS: I'd really love to have this fixed eventually, but I'm pretty new to 
> Apache Spark and missing hands-on Scala experience, so I'd prefer that it be 
> fixed by someone experienced with roadmap vision. If nobody volunteers I'll 
> try to patch myself.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27465) Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package

2019-04-15 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818658#comment-16818658
 ] 

shahid commented on SPARK-27465:


I will analyze the issue.

> Kafka Client 0.11.0.0 is not Supporting the kafkatestutils package
> --
>
> Key: SPARK-27465
> URL: https://issues.apache.org/jira/browse/SPARK-27465
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Praveen
>Priority: Critical
>
> Hi Team,
> We are getting the below exceptions with Kafka Client version 0.11.0.0 for the 
> KafkaTestUtils package, but it's working fine when we use Kafka Client 
> version 0.10.0.1. Please suggest the way forward. We are using the package "
> import org.apache.spark.streaming.kafka010.KafkaTestUtils;"
>  
> ERROR:
> java.lang.NoSuchMethodError: 
> kafka.server.KafkaServer$.$lessinit$greater$default$2()Lkafka/utils/Time;
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:110)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils$$anonfun$setupEmbeddedKafkaServer$2.apply(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2234)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>  at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2226)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:107)
>  at 
> org.apache.spark.streaming.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
>  at 
> com.netcracker.rms.smart.esp.ESPTestEnv.prepareKafkaTestUtils(ESPTestEnv.java:203)
>  at com.netcracker.rms.smart.esp.ESPTestEnv.setUp(ESPTestEnv.java:157)
>  at 
> com.netcracker.rms.smart.esp.TestEventStreamProcessor.setUp(TestEventStreamProcessor.java:58)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27468) "Storage Level" in "RDD Storage Page" is not correct

2019-04-15 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818662#comment-16818662
 ] 

shahid commented on SPARK-27468:


I would like to analyze the issue.

> "Storage Level" in "RDD Storage Page" is not correct
> 
>
> Key: SPARK-27468
> URL: https://issues.apache.org/jira/browse/SPARK-27468
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.1
>Reporter: Shixiong Zhu
>Priority: Major
>
> I ran the following unit test and checked the UI.
> {code}
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local-cluster[2,1,1024]")
>   .set("spark.ui.enabled", "true")
> sc = new SparkContext(conf)
> val rdd = sc.makeRDD(1 to 10, 1).persist(StorageLevel.MEMORY_ONLY_2)
> rdd.count()
> Thread.sleep(360)
> {code}
> The storage level is "Memory Deserialized 1x Replicated" in the RDD storage 
> page.
> I tried to debug and found this is because Spark emitted the following two 
> events:
> {code}
> event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(1, 
> 10.8.132.160, 65473, None),rdd_0_0,StorageLevel(memory, deserialized, 2 
> replicas),56,0))
> event: SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(0, 
> 10.8.132.160, 65474, None),rdd_0_0,StorageLevel(memory, deserialized, 1 
> replicas),56,0))
> {code}
> The storage level in the second event will overwrite the first one. "1 
> replicas" comes from this line: 
> https://github.com/apache/spark/blob/3ab96d7acf870e53c9016b0b63d0b328eec23bed/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L1457
> Maybe AppStatusListener should calculate the replicas from events?
> Another fact we may need to think about is when replicas is 2, will two Spark 
> events arrive in the same order? Currently, two RPCs from different executors 
> can arrive in any order.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27068) Support failed jobs ui and completed jobs ui use different queue

2019-04-11 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815435#comment-16815435
 ] 

shahid commented on SPARK-27068:


Hi [~vanzin], currently, when cleaning up jobs once the UI job limit is exceeded, 
AppStatusListener removes completed as well as failed jobs.
The issue here is that the user wants to see the failed jobs table, but it gets 
removed while cleaning up the jobs.

Is it possible not to remove the "FailedJobs" table during cleanup, so that users 
can see the exact reason for the failure?
I can raise a PR if this is a valid scenario. 

Thanks

> Support failed jobs ui and completed jobs ui use different queue
> 
>
> Key: SPARK-27068
> URL: https://issues.apache.org/jira/browse/SPARK-27068
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: zhoukang
>Priority: Major
>
> For some long-running jobs, we may want to check the cause of some failed 
> jobs.
> But most jobs have completed and the failed jobs UI may disappear; we can use a 
> different queue for these two kinds of jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27434) memory leak in spark driver

2019-04-11 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815424#comment-16815424
 ] 

shahid commented on SPARK-27434:


Could you please provide steps for reproducing the issue?

> memory leak in spark driver
> ---
>
> Key: SPARK-27434
> URL: https://issues.apache.org/jira/browse/SPARK-27434
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
> Environment: OS: Centos 7
> JVM: 
> **_openjdk version "1.8.0_201"_
> _OpenJDK Runtime Environment (IcedTea 3.11.0) (Alpine 8.201.08-r0)_
> _OpenJDK 64-Bit Server VM (build 25.201-b08, mixed mode)_
> Spark version: 2.4.0
>Reporter: Ryne Yang
>Priority: Major
> Attachments: Screen Shot 2019-04-10 at 12.11.35 PM.png
>
>
> we got a OOM exception on the driver after driver has completed multiple 
> jobs(we are reusing spark context). 
> so we took a heap dump and looked at the leak analysis, found out that under 
> AsyncEventQueue there are 3.5GB of heap allocated. Possibly a leak. 
>  
> can someone take a look at? 
> here is the heap analysis: 
> !Screen Shot 2019-04-10 at 12.11.35 PM.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27068) Support failed jobs ui and completed jobs ui use different queue

2019-04-10 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815010#comment-16815010
 ] 

shahid commented on SPARK-27068:


Failed jobs are already in a single table, right? I meant: for running jobs 
there is one table, for completed jobs another, and for failed jobs a separate 
table, right? Do you mean that, while cleaning up jobs, we shouldn't remove 
entries from the failed jobs table?

> Support failed jobs ui and completed jobs ui use different queue
> 
>
> Key: SPARK-27068
> URL: https://issues.apache.org/jira/browse/SPARK-27068
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: zhoukang
>Priority: Major
>
> For some long running jobs,we may want to check out the cause of some failed 
> jobs.
> But most jobs has completed and failed jobs ui may disappear, we can use 
> different queue for this two kinds of jobs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20656) Incremental parsing of event logs in SHS

2019-03-27 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16802945#comment-16802945
 ] 

shahid commented on SPARK-20656:


I would like to work on it. 

> Incremental parsing of event logs in SHS
> 
>
> Key: SPARK-20656
> URL: https://issues.apache.org/jira/browse/SPARK-20656
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> This feature is mentioned in the spec attached to SPARK-18085 but there's not 
> a lot of discussion about it.
> It would be good to implement incremental parsing of event logs in the SHS. 
> With the new work, UI data is stored on disk, so it should be possible to 
> save enough metadata about the event log and the state of the listeners to 
> allow one to resume parsing the log of a live application at the point where 
> it stopped in the previous iteration. 
> This would considerably speed up parsing on updates, and could be done 
> speculatively so that UIs for new applications are available in the SHS 
> almost immediately.
> I'm filing this as a separate enhancement because I don't want to block 
> SPARK-18085 on this.
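
To make the idea concrete, a minimal sketch of offset-based resumption (an 
assumption, not the SHS implementation; the listener state, e.g. the disk 
store, would have to be checkpointed alongside the offset):

{code}
import java.io.{BufferedReader, InputStreamReader}
import java.nio.charset.StandardCharsets

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical checkpoint: how far into the event log we have already parsed.
case class ParseCheckpoint(logPath: String, bytesParsed: Long)

// Resume parsing from the saved offset and return an updated checkpoint.
// Assumes an uncompressed, '\n'-terminated JSON-lines event log.
def resumeParsing(fs: FileSystem, cp: ParseCheckpoint)(replayLine: String => Unit): ParseCheckpoint = {
  val in = fs.open(new Path(cp.logPath))
  try {
    in.seek(cp.bytesParsed) // skip everything replayed in the previous pass
    val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
    var offset = cp.bytesParsed
    var line = reader.readLine()
    while (line != null) {
      replayLine(line) // feed the JSON event to the listeners
      offset += line.getBytes(StandardCharsets.UTF_8).length + 1
      line = reader.readLine()
    }
    cp.copy(bytesParsed = offset)
  } finally {
    in.close()
  }
}
{code}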



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27204) First time Loading application page from History Server is taking time when event log size is huge

2019-03-21 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798178#comment-16798178
 ] 

shahid commented on SPARK-27204:


Yes. Actually, to load the UI page for the first time, the SHS needs to replay 
the entire event log. So, if the event log is huge, it takes a lot of time to 
open the app UI the first time. This issue is still there, as it wasn't 
addressed in SPARK-18085.

> First time Loading application page from History Server is taking time when 
> event log size is huge
> --
>
> Key: SPARK-27204
> URL: https://issues.apache.org/jira/browse/SPARK-27204
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.3, 2.4.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> 1. Launch spark shell and submit a long running job.
> 2. Measure the loading time of Job History Page first time.
> 3. For Example Event Log Size = 18GB, With disk store, Application page 
> Loading time takes first time 47 Min



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27231) Stack overflow error, when we increase the number of iteration in PIC

2019-03-21 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-27231:
---
Description: 
 val dataset = spark.createDataFrame(Seq(
   (0L, 1L, 1.0),
   (0L, 2L, 1.0),
   (1L, 2L, 1.0),
   (3L, 4L, 1.0),
   (4L, 0L, 0.1)
 )).toDF("src", "dst", "weight")

 val model = new PowerIterationClustering().
   setK(2).
   setMaxIter(100).
   setInitMode("degree").
   setWeightCol("weight")

 val prediction = model.assignClusters(dataset).select("id", "cluster")

java.lang.StackOverflowError
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2188)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at 
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
at 
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)


  was:
 val dataset = spark.createDataFrame(Seq(
   (0L, 1L, 1.0),
   (0L, 2L, 1.0),
   (1L, 2L, 1.0),
   (3L, 4L, 1.0),
   (4L, 0L, 0.1)
 )).toDF("src", "dst", "weight")

 val model = new PowerIterationClustering().
   setK(2).
   setMaxIter(100).
   setInitMode("degree").
   setWeightCol("weight")

 val prediction = model.assignClusters(dataset).select("id", "cluster")


> Stack overflow error, when we increase the number of iteration in PIC
> -
>
> Key: SPARK-27231
> URL: https://issues.apache.org/jira/browse/SPARK-27231
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.3, 2.4.0
>Reporter: shahid
>Priority: Minor
>
>  val dataset = spark.createDataFrame(Seq(
>(0L, 1L, 1.0),
>(0L, 2L, 1.0),
>(1L, 2L, 1.0),
>(3L, 4L, 1.0),
>(4L, 0L, 0.1)
>  )).toDF("src", "dst", "weight")
>  val model = new PowerIterationClustering().
>setK(2).
>setMaxIter(100).
>setInitMode("degree").
>setWeightCol("weight")
>  val prediction = model.assignClusters(dataset).select("id", "cluster")
> java.lang.StackOverflowError
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2188)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27231) Stack overflow error, when we increase the number of iteration in PIC

2019-03-21 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798170#comment-16798170
 ] 

shahid commented on SPARK-27231:


I am analyzing the issue.
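
In the meantime, a hedged workaround sketch (an assumption on my part, not a 
confirmed fix): a StackOverflowError thrown during deserialization usually 
points at a very long RDD lineage built up across the iterations, so a larger 
JVM thread stack and/or explicit checkpointing typically works around it:

{code}
import org.apache.spark.SparkConf

// Workaround sketch (assumptions, not a confirmed fix for this issue).
// These options would normally be passed to spark-submit; extraJavaOptions
// only take effect before the corresponding JVM starts.
val conf = new SparkConf()
  // larger thread stacks, so deep (de)serialization recursion does not overflow
  .set("spark.driver.extraJavaOptions", "-Xss16m")
  .set("spark.executor.extraJavaOptions", "-Xss16m")

// And/or truncate the lineage so it cannot grow with the iteration count:
// spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")
{code}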

> Stack overflow error, when we increase the number of iteration in PIC
> -
>
> Key: SPARK-27231
> URL: https://issues.apache.org/jira/browse/SPARK-27231
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.3, 2.4.0
>Reporter: shahid
>Priority: Minor
>
>  val dataset = spark.createDataFrame(Seq(
>(0L, 1L, 1.0),
>(0L, 2L, 1.0),
>(1L, 2L, 1.0),
>(3L, 4L, 1.0),
>(4L, 0L, 0.1)
>  )).toDF("src", "dst", "weight")
>  val model = new PowerIterationClustering().
>setK(2).
>setMaxIter(100).
>setInitMode("degree").
>setWeightCol("weight")
>  val prediction = model.assignClusters(dataset).select("id", "cluster")



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27231) Stack overflow error, when we increase the number of iteration in PIC

2019-03-21 Thread shahid (JIRA)
shahid created SPARK-27231:
--

 Summary: Stack overflow error, when we increase the number of 
iteration in PIC
 Key: SPARK-27231
 URL: https://issues.apache.org/jira/browse/SPARK-27231
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 2.4.0, 2.3.3
Reporter: shahid


 val dataset = spark.createDataFrame(Seq(
   (0L, 1L, 1.0),
   (0L, 2L, 1.0),
   (1L, 2L, 1.0),
   (3L, 4L, 1.0),
   (4L, 0L, 0.1)
 )).toDF("src", "dst", "weight")

 val model = new PowerIterationClustering().
   setK(2).
   setMaxIter(100).
   setInitMode("degree").
   setWeightCol("weight")

 val prediction = model.assignClusters(dataset).select("id", "cluster")



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27204) First time Loading application page from History Server is taking time when event log size is huge

2019-03-21 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797888#comment-16797888
 ] 

shahid commented on SPARK-27204:


Hi @HyukjinKwon, this is still an issue after SPARK-18085. I will raise a PR.

> First time Loading application page from History Server is taking time when 
> event log size is huge
> --
>
> Key: SPARK-27204
> URL: https://issues.apache.org/jira/browse/SPARK-27204
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.3, 2.4.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> 1. Launch spark shell and submit a long running job.
> 2. Measure the loading time of Job History Page first time.
> 3. For Example Event Log Size = 18GB, With disk store, Application page 
> Loading time takes first time 47 Min



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27169) number of active tasks is negative on executors page

2019-03-20 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797066#comment-16797066
 ] 

shahid commented on SPARK-27169:


Yes, that means many event drops have happened. Can you try increasing the 
queue size? Raising "spark.scheduler.listenerbus.eventqueue.capacity" (default 
10000) might help. If events are dropped, the UI will display incorrectly, and 
I'm not sure there is anything we can do from the UI side.

Do you have any reproducible steps for that, so that I can try?
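
For example (a sketch; the value 100000 is an arbitrary assumption to be 
tuned to your event volume):

{code}
import org.apache.spark.SparkConf

// Raise the listener-bus queue capacity before the SparkContext starts.
val conf = new SparkConf()
  .set("spark.scheduler.listenerbus.eventqueue.capacity", "100000")
{code}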

> number of active tasks is negative on executors page
> 
>
> Key: SPARK-27169
> URL: https://issues.apache.org/jira/browse/SPARK-27169
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: acupple
>Priority: Minor
> Attachments: QQ20190315-102215.png, QQ20190315-102235.png, 
> image-2019-03-19-15-17-25-522.png, image-2019-03-19-15-21-03-766.png, 
> job_1924.log, stage_3511.log
>
>
> I use spark to process some data in HDFS and HBASE, I use one thread consume 
> message from a queue, and then submit to a thread pool(16 fix size)for spark 
> processor.
> But when run for some time, the active jobs will be thousands, and number of 
> active tasks are negative.
> Actually, these jobs are already done when I check driver logs。
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27169) number of active tasks is negative on executors page

2019-03-20 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797066#comment-16797066
 ] 

shahid edited comment on SPARK-27169 at 3/20/19 11:26 AM:
--

Yes, that means many event drops have happened. Can you try increasing the 
queue size? Raising "spark.scheduler.listenerbus.eventqueue.capacity" (default 
10000) might help. If events are dropped, the UI will display incorrectly, and 
I'm not sure there is anything we can do from the UI side.

Do you have any reproducible steps for that, so that I can try?


was (Author: shahid):
Yes, that means many event drops have happened. Can you try increasing the 
queue size? Raising "spark.scheduler.listenerbus.eventqueue.capacity" (default 
10000) might help. If events are dropped, the UI will display incorrectly, and 
I'm not sure there is anything we can do from the UI side.

Do you have any reproducible steps for that, so that I can try?

> number of active tasks is negative on executors page
> 
>
> Key: SPARK-27169
> URL: https://issues.apache.org/jira/browse/SPARK-27169
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: acupple
>Priority: Minor
> Attachments: QQ20190315-102215.png, QQ20190315-102235.png, 
> image-2019-03-19-15-17-25-522.png, image-2019-03-19-15-21-03-766.png, 
> job_1924.log, stage_3511.log
>
>
> I use spark to process some data in HDFS and HBASE, I use one thread consume 
> message from a queue, and then submit to a thread pool(16 fix size)for spark 
> processor.
> But when run for some time, the active jobs will be thousands, and number of 
> active tasks are negative.
> Actually, these jobs are already done when I check driver logs。
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27169) number of active tasks is negative on executors page

2019-03-20 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797040#comment-16797040
 ] 

shahid commented on SPARK-27169:


Hi, it seems we can't tell from the above log whether an event drop happened 
or not. Could you please check whether the phrase "Dropping event from queue" 
appears in the driver log?

> number of active tasks is negative on executors page
> 
>
> Key: SPARK-27169
> URL: https://issues.apache.org/jira/browse/SPARK-27169
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: acupple
>Priority: Minor
> Attachments: QQ20190315-102215.png, QQ20190315-102235.png, 
> image-2019-03-19-15-17-25-522.png, image-2019-03-19-15-21-03-766.png, 
> job_1924.log, stage_3511.log
>
>
> I use spark to process some data in HDFS and HBASE, I use one thread consume 
> message from a queue, and then submit to a thread pool(16 fix size)for spark 
> processor.
> But when run for some time, the active jobs will be thousands, and number of 
> active tasks are negative.
> Actually, these jobs are already done when I check driver logs。
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27204) First time Loading application page from History Server is taking time when event log size is huge

2019-03-19 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796110#comment-16796110
 ] 

shahid commented on SPARK-27204:


Thanks. I will analyze this.

> First time Loading application page from History Server is taking time when 
> event log size is huge
> --
>
> Key: SPARK-27204
> URL: https://issues.apache.org/jira/browse/SPARK-27204
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.3, 2.4.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> 1. Launch spark shell and submit a long running job.
> 2. Measure the loading time of Job History Page first time.
> 3. For Example Event Log Size = 18GB, With disk store, Application page 
> Loading time takes first time 47 Min



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27169) number of active tasks is negative on executors page

2019-03-19 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795878#comment-16795878
 ] 

shahid edited comment on SPARK-27169 at 3/19/19 9:38 AM:
-

Thank you. Could you please provide the full event log, if possible?


was (Author: shahid):
Thank you. Could you please provide the full event log, if possible? I think 
some task events are missing in the event log.

> number of active tasks is negative on executors page
> 
>
> Key: SPARK-27169
> URL: https://issues.apache.org/jira/browse/SPARK-27169
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: acupple
>Priority: Minor
> Attachments: QQ20190315-102215.png, QQ20190315-102235.png, 
> image-2019-03-19-15-17-25-522.png, image-2019-03-19-15-21-03-766.png, 
> job_1924.log
>
>
> I use spark to process some data in HDFS and HBASE, I use one thread consume 
> message from a queue, and then submit to a thread pool(16 fix size)for spark 
> processor.
> But when run for some time, the active jobs will be thousands, and number of 
> active tasks are negative.
> Actually, these jobs are already done when I check driver logs。
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27169) number of active tasks is negative on executors page

2019-03-19 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795878#comment-16795878
 ] 

shahid commented on SPARK-27169:


Thank you. Could you please provide the full event log, if possible? I think 
some task events are missing in the event log.

> number of active tasks is negative on executors page
> 
>
> Key: SPARK-27169
> URL: https://issues.apache.org/jira/browse/SPARK-27169
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: acupple
>Priority: Minor
> Attachments: QQ20190315-102215.png, QQ20190315-102235.png, 
> image-2019-03-19-15-17-25-522.png, image-2019-03-19-15-21-03-766.png, 
> job_1924.log
>
>
> I use spark to process some data in HDFS and HBASE, I use one thread consume 
> message from a queue, and then submit to a thread pool(16 fix size)for spark 
> processor.
> But when run for some time, the active jobs will be thousands, and number of 
> active tasks are negative.
> Actually, these jobs are already done when I check driver logs。
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27169) number of active tasks is negative on executors page

2019-03-18 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794794#comment-16794794
 ] 

shahid commented on SPARK-27169:


Do you have the event log corresponding to the application?

> number of active tasks is negative on executors page
> 
>
> Key: SPARK-27169
> URL: https://issues.apache.org/jira/browse/SPARK-27169
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: acupple
>Priority: Minor
> Attachments: QQ20190315-102215.png, QQ20190315-102235.png
>
>
> I use spark to process some data in HDFS and HBASE, I use one thread consume 
> message from a queue, and then submit to a thread pool(16 fix size)for spark 
> processor.
> But when run for some time, the active jobs will be thousands, and number of 
> active tasks are negative.
> Actually, these jobs are already done when I check driver logs。
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27169) number of active tasks is negative on executors page

2019-03-16 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793675#comment-16793675
 ] 

shahid edited comment on SPARK-27169 at 3/16/19 8:40 AM:
-

Seems an event drop has happened. Could you please provide the logs?


was (Author: shahid):
Seems an event drop has happened.

> number of active tasks is negative on executors page
> 
>
> Key: SPARK-27169
> URL: https://issues.apache.org/jira/browse/SPARK-27169
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: acupple
>Priority: Minor
> Attachments: QQ20190315-102215.png, QQ20190315-102235.png
>
>
> I use spark to process some data in HDFS and HBASE, I use one thread consume 
> message from a queue, and then submit to a thread pool(16 fix size)for spark 
> processor.
> But when run for some time, the active jobs will be thousands, and number of 
> active tasks are negative.
> Actually, these jobs are already done when I check driver logs。
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27169) number of active tasks is negative on executors page

2019-03-15 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793675#comment-16793675
 ] 

shahid commented on SPARK-27169:


Seems an event drop has happened.

> number of active tasks is negative on executors page
> 
>
> Key: SPARK-27169
> URL: https://issues.apache.org/jira/browse/SPARK-27169
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.2
>Reporter: acupple
>Priority: Minor
> Attachments: QQ20190315-102215.png, QQ20190315-102235.png
>
>
> I use spark to process some data in HDFS and HBASE, I use one thread consume 
> message from a queue, and then submit to a thread pool(16 fix size)for spark 
> processor.
> But when run for some time, the active jobs will be thousands, and number of 
> active tasks are negative.
> Actually, these jobs are already done when I check driver logs。
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27145) Close store after test, in the SQLAppStatusListenerSuite

2019-03-13 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-27145:
---
Summary: Close store after test, in the SQLAppStatusListenerSuite  (was: 
Close store after test, in SQLAppStatusListenerSuite)

> Close store after test, in the SQLAppStatusListenerSuite
> 
>
> Key: SPARK-27145
> URL: https://issues.apache.org/jira/browse/SPARK-27145
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.3.3, 2.4.0, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> We create many stores in the SQLAppStatusListenerSuite, but we need to close 
> the store after each test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27145) Close store after test, in SQLAppStatusListenerSuite

2019-03-13 Thread shahid (JIRA)
shahid created SPARK-27145:
--

 Summary: Close store after test, in SQLAppStatusListenerSuite
 Key: SPARK-27145
 URL: https://issues.apache.org/jira/browse/SPARK-27145
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 2.4.0, 2.3.3, 3.0.0
Reporter: shahid


We create many stores in the SQLAppStatusListenerSuite, but we need to close 
the store after each test.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23134) WebUI is showing the cache table details even after cache idle timeout

2019-03-03 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid resolved SPARK-23134.

Resolution: Duplicate

> WebUI is showing the cache table details even after cache idle timeout
> --
>
> Key: SPARK-23134
> URL: https://issues.apache.org/jira/browse/SPARK-23134
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0, 2.2.0, 2.2.1
> Environment:  Run Cache command with below configuration to cache the 
> RDD blocks
>   spark.dynamicAllocation.cachedExecutorIdleTimeout=120s
>   spark.dynamicAllocation.executorIdleTimeout=60s
>   spark.dynamicAllocation.enabled=true
>   spark.dynamicAllocation.minExecutors=0
>   spark.dynamicAllocation.maxExecutors=8
>  
>  
>  
>Reporter: shahid
>Priority: Major
>
> After cachedExecutorIdleTimeout, WebUI shows the cached partition details in 
> the storage tab. It should be the same scenario as in the case of uncache 
> table, where the storage tab of the web UI shows "RDD not found".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-23134) WebUI is showing the cache table details even after cache idle timeout

2019-03-03 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-23134:
---
Comment: was deleted

(was: Will resolve once the Jira SPARK-27012 merged)

> WebUI is showing the cache table details even after cache idle timeout
> --
>
> Key: SPARK-23134
> URL: https://issues.apache.org/jira/browse/SPARK-23134
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0, 2.2.0, 2.2.1
> Environment:  Run Cache command with below configuration to cache the 
> RDD blocks
>   spark.dynamicAllocation.cachedExecutorIdleTimeout=120s
>   spark.dynamicAllocation.executorIdleTimeout=60s
>   spark.dynamicAllocation.enabled=true
>   spark.dynamicAllocation.minExecutors=0
>   spark.dynamicAllocation.maxExecutors=8
>  
>  
>  
>Reporter: shahid
>Priority: Major
>
> After cachedExecutorIdleTimeout, WebUI shows the cached partition details in 
> the storage tab. It should be the same scenario as in the case of uncache 
> table, where the storage tab of the web UI shows "RDD not found".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-23134) WebUI is showing the cache table details even after cache idle timeout

2019-03-03 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid reopened SPARK-23134:


> WebUI is showing the cache table details even after cache idle timeout
> --
>
> Key: SPARK-23134
> URL: https://issues.apache.org/jira/browse/SPARK-23134
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0, 2.2.0, 2.2.1
> Environment:  Run Cache command with below configuration to cache the 
> RDD blocks
>   spark.dynamicAllocation.cachedExecutorIdleTimeout=120s
>   spark.dynamicAllocation.executorIdleTimeout=60s
>   spark.dynamicAllocation.enabled=true
>   spark.dynamicAllocation.minExecutors=0
>   spark.dynamicAllocation.maxExecutors=8
>  
>  
>  
>Reporter: shahid
>Priority: Major
>
> After cachedExecutorIdleTimeout, WebUI shows the cached partition details in 
> the storage tab. It should be the same scenario as in the case of uncache 
> table, where the storage tab of the web UI shows "RDD not found".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23134) WebUI is showing the cache table details even after cache idle timeout

2019-03-03 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid resolved SPARK-23134.

Resolution: Duplicate

> WebUI is showing the cache table details even after cache idle timeout
> --
>
> Key: SPARK-23134
> URL: https://issues.apache.org/jira/browse/SPARK-23134
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0, 2.2.0, 2.2.1
> Environment:  Run Cache command with below configuration to cache the 
> RDD blocks
>   spark.dynamicAllocation.cachedExecutorIdleTimeout=120s
>   spark.dynamicAllocation.executorIdleTimeout=60s
>   spark.dynamicAllocation.enabled=true
>   spark.dynamicAllocation.minExecutors=0
>   spark.dynamicAllocation.maxExecutors=8
>  
>  
>  
>Reporter: shahid
>Priority: Major
>
> After cachedExecutorIdleTimeout, WebUI shows the cached partition details in 
> the storage tab. It should be the same scenario as in the case of uncache 
> table, where the storage tab of the web UI shows "RDD not found".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-23134) WebUI is showing the cache table details even after cache idle timeout

2019-03-03 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid reopened SPARK-23134:


Will resolve once JIRA SPARK-27012 is merged.

> WebUI is showing the cache table details even after cache idle timeout
> --
>
> Key: SPARK-23134
> URL: https://issues.apache.org/jira/browse/SPARK-23134
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0, 2.2.0, 2.2.1
> Environment:  Run Cache command with below configuration to cache the 
> RDD blocks
>   spark.dynamicAllocation.cachedExecutorIdleTimeout=120s
>   spark.dynamicAllocation.executorIdleTimeout=60s
>   spark.dynamicAllocation.enabled=true
>   spark.dynamicAllocation.minExecutors=0
>   spark.dynamicAllocation.maxExecutors=8
>  
>  
>  
>Reporter: shahid
>Priority: Major
>
> After cachedExecutorIdleTimeout, WebUI shows the cached partition details in 
> the storage tab. It should be the same scenario as in the case of uncache 
> table, where the storage tab of the web UI shows "RDD not found".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23134) WebUI is showing the cache table details even after cache idle timeout

2019-03-03 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid resolved SPARK-23134.

Resolution: Duplicate

> WebUI is showing the cache table details even after cache idle timeout
> --
>
> Key: SPARK-23134
> URL: https://issues.apache.org/jira/browse/SPARK-23134
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0, 2.2.0, 2.2.1
> Environment:  Run Cache command with below configuration to cache the 
> RDD blocks
>   spark.dynamicAllocation.cachedExecutorIdleTimeout=120s
>   spark.dynamicAllocation.executorIdleTimeout=60s
>   spark.dynamicAllocation.enabled=true
>   spark.dynamicAllocation.minExecutors=0
>   spark.dynamicAllocation.maxExecutors=8
>  
>  
>  
>Reporter: shahid
>Priority: Major
>
> After cachedExecutorIdleTimeout, WebUI shows the cached partition details in 
> the storage tab. It should be the same scenario as in the case of uncache 
> table, where the storage tab of the web UI shows "RDD not found".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-02 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781670#comment-16781670
 ] 

shahid edited comment on SPARK-27019 at 3/2/19 9:44 PM:


Seems event reordering has happened: the job start event came after the SQL 
execution end event when the query failed. Could you please share the Spark 
event log for the application, if possible?


was (Author: shahid):
Seems event reordering has happened: the job start event came after the SQL 
execution end event when the query failed. Could you please share the Spark 
event log for the application, if possible, as I'm unable to reproduce this in 
our cluster?

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Priority: Major
> Attachments: Screenshot from 2019-03-01 21-31-48.png, 
> application_1550040445209_4748, query-1-details.png, query-1-list.png, 
> query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the 
> Spark UI, where submitted/duration make no sense, description has the ID 
> instead of the actual description.
> Clicking on the link to open a query, the SQL plan is missing as well.
> I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything particular that leads to 
> that: it doesn't occur in all my jobs, but it does occur in a lot of them 
> still.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-01 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781826#comment-16781826
 ] 

shahid edited comment on SPARK-27019 at 3/1/19 4:11 PM:


Yes. The issue happened because of event reordering when the query failed: the 
JobStart event came after the SQLExecutionEnd event, so the UI displayed it 
incorrectly. I will analyze the issue and send a patch.

 !Screenshot from 2019-03-01 21-31-48.png! 


was (Author: shahid):
Yes. The issue happened because of event reordering when the query failed: the 
JobStart event came after the SQLExecutionEnd event, so the UI displayed it 
incorrectly.

 !Screenshot from 2019-03-01 21-31-48.png! 

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Priority: Major
> Attachments: Screenshot from 2019-03-01 21-31-48.png, 
> application_1550040445209_4748, query-1-details.png, query-1-list.png, 
> query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the 
> Spark UI, where submitted/duration make no sense, description has the ID 
> instead of the actual description.
> Clicking on the link to open a query, the SQL plan is missing as well.
> I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything particular that leads to 
> that: it doesn't occur in all my jobs, but it does occur in a lot of them 
> still.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-01 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781826#comment-16781826
 ] 

shahid commented on SPARK-27019:


Yes. The issue happened because of event reordering when the query failed: the 
JobStart event came after the SQLExecutionEnd event, so the UI displayed it 
incorrectly.

 !Screenshot from 2019-03-01 21-31-48.png! 
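
For illustration, a hypothetical sketch of how a listener could tolerate the 
reordering (an assumption, not the eventual fix): keep execution entries keyed 
by executionId and materialize a placeholder when a JobStart arrives for an 
execution that has already ended, or has not been seen yet:

{code}
import scala.collection.mutable

// Hypothetical tracker, not SQLAppStatusListener itself.
class ReorderTolerantTracker {
  case class ExecutionEntry(executionId: Long,
                            var description: String = "",
                            var ended: Boolean = false,
                            jobs: mutable.Set[Int] = mutable.Set.empty)

  private val executions = mutable.Map.empty[Long, ExecutionEntry]

  def onExecutionStart(id: Long, description: String): Unit =
    executions.getOrElseUpdate(id, ExecutionEntry(id)).description = description

  def onJobStart(id: Long, jobId: Int): Unit =
    // out-of-order case: materialize the entry instead of dropping the event
    executions.getOrElseUpdate(id, ExecutionEntry(id)).jobs += jobId

  def onExecutionEnd(id: Long): Unit =
    executions.getOrElseUpdate(id, ExecutionEntry(id)).ended = true
}
{code}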

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Priority: Major
> Attachments: Screenshot from 2019-03-01 21-31-48.png, 
> application_1550040445209_4748, query-1-details.png, query-1-list.png, 
> query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the 
> Spark UI, where submitted/duration make no sense, description has the ID 
> instead of the actual description.
> Clicking on the link to open a query, the SQL plan is missing as well.
> I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything particular that leads to 
> that: it doesn't occur in all my jobs, but it does occur in a lot of them 
> still.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-01 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-27019:
---
Attachment: Screenshot from 2019-03-01 21-31-48.png

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Priority: Major
> Attachments: Screenshot from 2019-03-01 21-31-48.png, 
> application_1550040445209_4748, query-1-details.png, query-1-list.png, 
> query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the 
> Spark UI, where submitted/duration make no sense, description has the ID 
> instead of the actual description.
> Clicking on the link to open a query, the SQL plan is missing as well.
> I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything particular that leads to 
> that: it doesn't occur in all my jobs, but it does occur in a lot of them 
> still.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-01 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781670#comment-16781670
 ] 

shahid commented on SPARK-27019:


Seems event reordering has happened: the job start event came after the SQL 
execution end event.

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Priority: Major
> Attachments: query-1-details.png, query-1-list.png, query-job-1.png, 
> screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the 
> Spark UI, where submitted/duration make no sense, description has the ID 
> instead of the actual description.
> Clicking on the link to open a query, the SQL plan is missing as well.
> I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything particular that leads to 
> that: it doesn't occur in all my jobs, but it does occur in a lot of them 
> still.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-01 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781670#comment-16781670
 ] 

shahid edited comment on SPARK-27019 at 3/1/19 1:33 PM:


Seems event reordering has happened: the job start event came after the SQL 
execution end event when the query failed. Could you please share the Spark 
event log for the application, if possible, as I'm unable to reproduce this in 
our cluster?


was (Author: shahid):
Seems event reordering has happened: the job start event came after the SQL 
execution end event when the query failed.

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Priority: Major
> Attachments: query-1-details.png, query-1-list.png, query-job-1.png, 
> screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the 
> Spark UI, where submitted/duration make no sense, description has the ID 
> instead of the actual description.
> Clicking on the link to open a query, the SQL plan is missing as well.
> I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything particular that leads to 
> that: it doesn't occur in all my jobs, but it does occur in a lot of them 
> still.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-01 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781670#comment-16781670
 ] 

shahid edited comment on SPARK-27019 at 3/1/19 1:29 PM:


Seems event reordering has happened: the job start event came after the SQL 
execution end event when the query failed.


was (Author: shahid):
Seems event reordering has happened: the job start event came after the SQL 
execution end event.

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Priority: Major
> Attachments: query-1-details.png, query-1-list.png, query-job-1.png, 
> screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the 
> Spark UI, where submitted/duration make no sense, description has the ID 
> instead of the actual description.
> Clicking on the link to open a query, the SQL plan is missing as well.
> I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything particular that leads to 
> that: it doesn't occur in all my jobs, but it does occur in a lot of them 
> still.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


