[jira] [Assigned] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33911:


Assignee: (was: Apache Spark)

> Update SQL migration guide about changes in HiveClientImpl
> --
>
> Key: SPARK-33911
> URL: https://issues.apache.org/jira/browse/SPARK-33911
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> 1. https://github.com/apache/spark/pull/30802
> 2. https://github.com/apache/spark/pull/30711






[jira] [Commented] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254759#comment-17254759
 ] 

Apache Spark commented on SPARK-33911:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30925

> Update SQL migration guide about changes in HiveClientImpl
> --
>
> Key: SPARK-33911
> URL: https://issues.apache.org/jira/browse/SPARK-33911
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> 1. https://github.com/apache/spark/pull/30802
> 2. https://github.com/apache/spark/pull/30711






[jira] [Assigned] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33911:


Assignee: Apache Spark

> Update SQL migration guide about changes in HiveClientImpl
> --
>
> Key: SPARK-33911
> URL: https://issues.apache.org/jira/browse/SPARK-33911
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> 1. https://github.com/apache/spark/pull/30802
> 2. https://github.com/apache/spark/pull/30711






[jira] [Updated] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl

2020-12-24 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33911:
---
Affects Version/s: 3.0.2

> Update SQL migration guide about changes in HiveClientImpl
> --
>
> Key: SPARK-33911
> URL: https://issues.apache.org/jira/browse/SPARK-33911
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> 1. https://github.com/apache/spark/pull/30802
> 2. https://github.com/apache/spark/pull/30711






[jira] [Updated] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl

2020-12-24 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33911:
---
Affects Version/s: 2.4.8

> Update SQL migration guide about changes in HiveClientImpl
> --
>
> Key: SPARK-33911
> URL: https://issues.apache.org/jira/browse/SPARK-33911
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> 1. https://github.com/apache/spark/pull/30802
> 2. https://github.com/apache/spark/pull/30711






[jira] [Updated] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl

2020-12-24 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33911:
---
Affects Version/s: 3.1.0

> Update SQL migration guide about changes in HiveClientImpl
> --
>
> Key: SPARK-33911
> URL: https://issues.apache.org/jira/browse/SPARK-33911
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> 1. https://github.com/apache/spark/pull/30802
> 2. https://github.com/apache/spark/pull/30711






[jira] [Created] (SPARK-33912) Refactor DependencyUtils ivy property parameter

2020-12-24 Thread angerszhu (Jira)
angerszhu created SPARK-33912:
-

 Summary: Refactor DependencyUtils ivy property parameter
 Key: SPARK-33912
 URL: https://issues.apache.org/jira/browse/SPARK-33912
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: angerszhu


According to the discussion in https://github.com/apache/spark/pull/29966#discussion_r533573137.






[jira] [Updated] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl

2020-12-24 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33911:
---
Affects Version/s: (was: 2.4.8)
   (was: 3.1.0)

> Update SQL migration guide about changes in HiveClientImpl
> --
>
> Key: SPARK-33911
> URL: https://issues.apache.org/jira/browse/SPARK-33911
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> 1. https://github.com/apache/spark/pull/30802
> 2. https://github.com/apache/spark/pull/30711






[jira] [Created] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl

2020-12-24 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33911:
--

 Summary: Update SQL migration guide about changes in HiveClientImpl
 Key: SPARK-33911
 URL: https://issues.apache.org/jira/browse/SPARK-33911
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.4.8, 3.1.0, 3.2.0
Reporter: Maxim Gekk


1. https://github.com/apache/spark/pull/30802
2. https://github.com/apache/spark/pull/30711






[jira] [Updated] (SPARK-33910) Simplify conditional

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33910:

Description: 
Simplify CaseWhen/If conditionals. We can improve these cases:
1. Reduce datasource reads.
2. Simplify CaseWhen/If to support filter pushdown.
{code:sql}
create table t1 using parquet as select * from range(100);
create table t2 using parquet as select * from range(200);

create temp view v1 as
select 'a' as event_type, * from t1
union all
select CASE WHEN id = 1 THEN 'b' WHEN id = 3 THEN 'c' end as event_type, * from t2;

explain select * from v1 where event_type = 'a';
{code}

Before this PR:

{noformat}
== Physical Plan ==
Union
:- *(1) Project [a AS event_type#30533, id#30535L]
:  +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[id#30535L] Batched: true, DataFilters: [], 
Format: Parquet
+- *(2) Project [CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
END AS event_type#30534, id#30536L]
   +- *(2) Filter (CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
END = a)
  +- *(2) ColumnarToRow
 +- FileScan parquet default.t2[id#30536L] Batched: true, DataFilters: 
[(CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c END = a)], 
Format: Parquet
{noformat}

After this PR:


{noformat}
== Physical Plan ==
*(1) Project [a AS event_type#8, id#4L]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#4L] Batched: true, DataFilters: [], 
Format: Parquet
{noformat}


  was:
Simplify CaseWhen/If with EqualTo if all values are Literal and always false; 
this is a real case from production:
{code:sql}
create table t1 using parquet as select * from range(100);
create table t2 using parquet as select * from range(200);

create temp view v1 as
select 'a' as event_type, * from t1
union all
select CASE WHEN id = 1 THEN 'b' WHEN id = 3 THEN 'c' end as event_type, * from t2;

explain select * from v1 where event_type = 'a';
{code}

Before this PR:


{noformat}
== Physical Plan ==
Union
:- *(1) Project [a AS event_type#30533, id#30535L]
:  +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[id#30535L] Batched: true, DataFilters: [], 
Format: Parquet
+- *(2) Project [CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
END AS event_type#30534, id#30536L]
   +- *(2) Filter (CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
END = a)
  +- *(2) ColumnarToRow
 +- FileScan parquet default.t2[id#30536L] Batched: true, DataFilters: 
[(CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c END = a)], 
Format: Parquet
{noformat}

After this PR:


{noformat}
== Physical Plan ==
*(1) Project [a AS event_type#8, id#4L]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#4L] Batched: true, DataFilters: [], 
Format: Parquet
{noformat}



>  Simplify conditional
> -
>
> Key: SPARK-33910
> URL: https://issues.apache.org/jira/browse/SPARK-33910
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Simplify CaseWhen/If conditionals. We can improve these cases:
> 1. Reduce datasource reads.
> 2. Simplify CaseWhen/If to support filter pushdown.
> {code:sql}
> create table t1 using parquet as select * from range(100);
> create table t2 using parquet as select * from range(200);
> create temp view v1 as
> select 'a' as event_type, * from t1
> union all
> select CASE WHEN id = 1 THEN 'b' WHEN id = 3 THEN 'c' end as event_type, * from t2;
> explain select * from v1 where event_type = 'a';
> {code}
> Before this PR:
> {noformat}
> == Physical Plan ==
> Union
> :- *(1) Project [a AS event_type#30533, id#30535L]
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t1[id#30535L] Batched: true, DataFilters: 
> [], Format: Parquet
> +- *(2) Project [CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
> END AS event_type#30534, id#30536L]
>+- *(2) Filter (CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN 
> c END = a)
>   +- *(2) ColumnarToRow
>  +- FileScan parquet default.t2[id#30536L] Batched: true, 
> DataFilters: [(CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
> END = a)], Format: Parquet
> {noformat}
> After this PR:
> {noformat}
> == Physical Plan ==
> *(1) Project [a AS event_type#8, id#4L]
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[id#4L] Batched: true, DataFilters: [], 
> Format: Parquet
> {noformat}

[jira] [Updated] (SPARK-33910) Simplify conditional

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33910:

Description: 
Simplify CaseWhen/If with EqualTo if all values are Literal and always false; 
this is a real case from production:
{code:sql}
create table t1 using parquet as select * from range(100);
create table t2 using parquet as select * from range(200);

create temp view v1 as
select 'a' as event_type, * from t1
union all
select CASE WHEN id = 1 THEN 'b' WHEN id = 3 THEN 'c' end as event_type, * from t2;

explain select * from v1 where event_type = 'a';
{code}

Before this PR:


{noformat}
== Physical Plan ==
Union
:- *(1) Project [a AS event_type#30533, id#30535L]
:  +- *(1) ColumnarToRow
: +- FileScan parquet default.t1[id#30535L] Batched: true, DataFilters: [], 
Format: Parquet
+- *(2) Project [CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
END AS event_type#30534, id#30536L]
   +- *(2) Filter (CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
END = a)
  +- *(2) ColumnarToRow
 +- FileScan parquet default.t2[id#30536L] Batched: true, DataFilters: 
[(CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c END = a)], 
Format: Parquet
{noformat}

After this PR:


{noformat}
== Physical Plan ==
*(1) Project [a AS event_type#8, id#4L]
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#4L] Batched: true, DataFilters: [], 
Format: Parquet
{noformat}


>  Simplify conditional
> -
>
> Key: SPARK-33910
> URL: https://issues.apache.org/jira/browse/SPARK-33910
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> Simplify CaseWhen/If with EqualTo if all values are Literal and always false; 
> this is a real case from production:
> {code:sql}
> create table t1 using parquet as select * from range(100);
> create table t2 using parquet as select * from range(200);
> create temp view v1 as
> select 'a' as event_type, * from t1
> union all
> select CASE WHEN id = 1 THEN 'b' WHEN id = 3 THEN 'c' end as event_type, * from t2;
> explain select * from v1 where event_type = 'a';
> {code}
> Before this PR:
> {noformat}
> == Physical Plan ==
> Union
> :- *(1) Project [a AS event_type#30533, id#30535L]
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t1[id#30535L] Batched: true, DataFilters: 
> [], Format: Parquet
> +- *(2) Project [CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
> END AS event_type#30534, id#30536L]
>+- *(2) Filter (CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN 
> c END = a)
>   +- *(2) ColumnarToRow
>  +- FileScan parquet default.t2[id#30536L] Batched: true, 
> DataFilters: [(CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
> END = a)], Format: Parquet
> {noformat}
> After this PR:
> {noformat}
> == Physical Plan ==
> *(1) Project [a AS event_type#8, id#4L]
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[id#4L] Batched: true, DataFilters: [], 
> Format: Parquet
> {noformat}






[jira] [Updated] (SPARK-33884) Simplify conditional if all branches are foldable boolean type

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33884:

Summary: Simplify conditional if all branches are foldable boolean type  
(was: Simplify CaseWhen when one clause is null and another is boolean)

> Simplify conditional if all branches are foldable boolean type
> --
>
> Key: SPARK-33884
> URL: https://issues.apache.org/jira/browse/SPARK-33884
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> This PR simplifies {{CaseWhen}} when there is only one branch and one clause 
> is null while the other is boolean. This simplification is similar to 
> SPARK-32721.
> ||Expression||After simplification||
> |case when cond then null else false end|and(cond, null)|
> |case when cond then null else true end|or(not(cond), null)|
> |case when cond then false else null end|and(not(cond), null)|
> |case when cond then false end|and(not(cond), null)|
> |case when cond then true else null end|or(cond, null)|
> |case when cond then true end|or(cond, null)|
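
A minimal sketch of the first rewrite in the table above, in a filter context and with illustrative names (`t`, `cond`) that are not from the ticket. Every branch of the CASE evaluates to NULL or FALSE, so under SQL's three-valued filter semantics both forms keep no rows:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
// cond alternates between true and false across four rows.
spark.sql("select id, id % 2 = 0 as cond from range(4)").createOrReplaceTempView("t")

// Original predicate: CASE WHEN cond THEN NULL ELSE FALSE END.
spark.sql("select * from t where case when cond then null else false end").show()
// Rewritten predicate from the table: and(cond, null). NULL is treated as
// FALSE in a WHERE clause, so both queries return zero rows.
spark.sql("select * from t where cond and null").show()
{code}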






[jira] [Updated] (SPARK-33798) Add new rule to push down the foldable expressions through CaseWhen/If

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33798:

Parent: SPARK-33910
Issue Type: Sub-task  (was: Improvement)

> Add new rule to push down the foldable expressions through CaseWhen/If
> --
>
> Key: SPARK-33798
> URL: https://issues.apache.org/jira/browse/SPARK-33798
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Simplify CaseWhen/If with EqualTo if all values are Literal and always false; 
> this is a real case from production:
> {code:sql}
> create table t1 using parquet as select * from range(100);
> create table t2 using parquet as select * from range(200);
> create temp view v1 as
> select 'a' as event_type, * from t1
> union all
> select CASE WHEN id = 1 THEN 'b' WHEN id = 3 THEN 'c' end as event_type, * from t2;
> explain select * from v1 where event_type = 'a';
> {code}
> Before this PR:
> {noformat}
> == Physical Plan ==
> Union
> :- *(1) Project [a AS event_type#30533, id#30535L]
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t1[id#30535L] Batched: true, DataFilters: 
> [], Format: Parquet
> +- *(2) Project [CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
> END AS event_type#30534, id#30536L]
>+- *(2) Filter (CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN 
> c END = a)
>   +- *(2) ColumnarToRow
>  +- FileScan parquet default.t2[id#30536L] Batched: true, 
> DataFilters: [(CASE WHEN (id#30536L = 1) THEN b WHEN (id#30536L = 3) THEN c 
> END = a)], Format: Parquet
> {noformat}
> After this PR:
> {noformat}
> == Physical Plan ==
> *(1) Project [a AS event_type#8, id#4L]
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[id#4L] Batched: true, DataFilters: [], 
> Format: Parquet
> {noformat}






[jira] [Updated] (SPARK-33845) Improve SimplifyConditionals

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33845:

Parent: SPARK-33910
Issue Type: Sub-task  (was: Improvement)

>  Improve SimplifyConditionals
> -
>
> Key: SPARK-33845
> URL: https://issues.apache.org/jira/browse/SPARK-33845
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Simplify If(cond, TrueLiteral, FalseLiteral) to cond.
> Simplify If(cond, FalseLiteral, TrueLiteral) to Not(cond).
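
A quick way to observe the rule, assuming a running `spark` session (the queries are illustrative, not from the ticket): after this change the optimized logical plan should carry the bare predicate instead of an `if` expression.

{code:scala}
// `if(id > 5, true, false)` should simplify to `(id > 5)`, and
// `if(id > 5, false, true)` to `NOT (id > 5)`; compare the analyzed and
// optimized logical plans printed by explain(true).
spark.sql("select * from range(10) where if(id > 5, true, false)").explain(true)
spark.sql("select * from range(10) where if(id > 5, false, true)").explain(true)
{code}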






[jira] [Updated] (SPARK-33847) Replace None of elseValue inside CaseWhen if all branches are FalseLiteral

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33847:

Parent: SPARK-33910
Issue Type: Sub-task  (was: Improvement)

> Replace None of elseValue inside CaseWhen if all branches are FalseLiteral
> --
>
> Key: SPARK-33847
> URL: https://issues.apache.org/jira/browse/SPARK-33847
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> {code:scala}
> spark.sql("create table t1 using parquet as select id from range(10)")
> spark.sql("select id from t1 where (CASE WHEN id = 1 THEN 'a' WHEN id = 3 
> THEN 'b' end) = 'c' ").explain()
> {code}
> Before:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [CASE 
> WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
> {noformat}
> After:
> {noformat}
> == Physical Plan ==
> LocalTableScan <empty>, [id#1L]
> {noformat}






[jira] [Updated] (SPARK-33848) Push the cast into (if / case) branches

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33848:

Parent: SPARK-33910
Issue Type: Sub-task  (was: Improvement)

> Push the cast into (if / case) branches
> ---
>
> Key: SPARK-33848
> URL: https://issues.apache.org/jira/browse/SPARK-33848
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> Push the cast into (if / case) branches. The use case is:
> {code:sql}
> create table t1 using parquet as select id from range(10);
> explain select id from t1 where (CASE WHEN id = 1 THEN '1' WHEN id = 3 THEN 
> '2' end) > 3;
> {code}
> Before this PR:
> {noformat}
> == Physical Plan ==
> *(1) Filter (cast(CASE WHEN (id#1L = 1) THEN 1 WHEN (id#1L = 3) THEN 2 END as 
> int) > 3)
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: 
> [(cast(CASE WHEN (id#1L = 1) THEN 1 WHEN (id#1L = 3) THEN 2 END as int) > 
> 3)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>
> {noformat}
> After this PR:
> {noformat}
> == Physical Plan ==
> LocalTableScan <empty>, [id#1L]
> {noformat}






[jira] [Updated] (SPARK-33861) Simplify conditional in predicate

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33861:

Parent: SPARK-33910
Issue Type: Sub-task  (was: Improvement)

> Simplify conditional in predicate
> -
>
> Key: SPARK-33861
> URL: https://issues.apache.org/jira/browse/SPARK-33861
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from 
> range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 
> 5").explain()
> {noformat}
> Before this PR:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct<a:bigint,b:bigint>
> {noformat}
> After this PR:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND 
> ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), 
> GreaterThan(a,2)], ReadSchema: struct<a:bigint,b:bigint>
> {noformat}






[jira] [Updated] (SPARK-33884) Simplify CaseWhen when one clause is null and another is boolean

2020-12-24 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33884:

Parent: SPARK-33910
Issue Type: Sub-task  (was: Improvement)

> Simplify CaseWhen when one clause is null and another is boolean
> 
>
> Key: SPARK-33884
> URL: https://issues.apache.org/jira/browse/SPARK-33884
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Priority: Major
>
> This PR simplifies {{CaseWhen}} when there is only one branch and one clause 
> is null while the other is boolean. This simplification is similar to 
> SPARK-32721.
> ||Expression||After simplification||
> |case when cond then null else false end|and(cond, null)|
> |case when cond then null else true end|or(not(cond), null)|
> |case when cond then false else null end|and(not(cond), null)|
> |case when cond then false end|and(not(cond), null)|
> |case when cond then true else null end|or(cond, null)|
> |case when cond then true end|or(cond, null)|






[jira] [Created] (SPARK-33910) Simplify conditional

2020-12-24 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-33910:
---

 Summary:  Simplify conditional
 Key: SPARK-33910
 URL: https://issues.apache.org/jira/browse/SPARK-33910
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Affects Versions: 3.2.0
Reporter: Yuming Wang









[jira] [Commented] (SPARK-33804) Cleanup "view bounds are deprecated" compilation warnings

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254735#comment-17254735
 ] 

Apache Spark commented on SPARK-33804:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30924

> Cleanup "view bounds are deprecated" compilation warnings
> -
>
> Key: SPARK-33804
> URL: https://issues.apache.org/jira/browse/SPARK-33804
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are only 3 compilation warnings related to `view bounds are deprecated` 
> in SequenceFileRDDFunctions:
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:35:
>  view bounds are deprecated; use an implicit parameter instead.
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:35:
>  view bounds are deprecated; use an implicit parameter instead.
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:55:
>  view bounds are deprecated; use an implicit parameter instead.
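
The mechanical fix the compiler suggests, sketched on a toy method rather than the actual SequenceFileRDDFunctions signatures: a view bound `K <% V` desugars to an implicit conversion parameter, which can be written out explicitly.

{code:scala}
import org.apache.hadoop.io.Writable

// Deprecated view-bound form:
//   def save[K <% Writable](k: K): Unit = ...
// Equivalent explicit form, which compiles without the warning:
def save[K](k: K)(implicit toWritable: K => Writable): Unit = {
  val w = toWritable(k) // the conversion the view bound supplied implicitly
  println(w)
}
{code}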






[jira] [Assigned] (SPARK-33804) Cleanup "view bounds are deprecated" compilation warnings

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33804:


Assignee: (was: Apache Spark)

> Cleanup "view bounds are deprecated" compilation warnings
> -
>
> Key: SPARK-33804
> URL: https://issues.apache.org/jira/browse/SPARK-33804
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Priority: Minor
>
> There are only 3 compilation warnings related to `view bounds are deprecated` 
> in SequenceFileRDDFunctions:
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:35:
>  view bounds are deprecated; use an implicit parameter instead.
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:35:
>  view bounds are deprecated; use an implicit parameter instead.
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:55:
>  view bounds are deprecated; use an implicit parameter instead.






[jira] [Assigned] (SPARK-33804) Cleanup "view bounds are deprecated" compilation warnings

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33804:


Assignee: Apache Spark

> Cleanup "view bounds are deprecated" compilation warnings
> -
>
> Key: SPARK-33804
> URL: https://issues.apache.org/jira/browse/SPARK-33804
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> There are only 3 compilation warnings related to `view bounds are deprecated` 
> in SequenceFileRDDFunctions:
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:35:
>  view bounds are deprecated; use an implicit parameter instead.
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:35:
>  view bounds are deprecated; use an implicit parameter instead.
> [WARNING] 
> /spark-source/core/src/main/scala/org/apache/spark/rdd/SequenceFileRDDFunctions.scala:55:
>  view bounds are deprecated; use an implicit parameter instead.






[jira] [Assigned] (SPARK-33909) Check rand functions seed is legal at analyzer side

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33909:


Assignee: (was: Apache Spark)

> Check rand functions seed is legal at analyzer side
> --
>
> Key: SPARK-33909
> URL: https://issues.apache.org/jira/browse/SPARK-33909
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> It's better to check that the seed expression is legal on the analyzer side 
> instead of at execution time.
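
A hedged illustration (the query and the exact exception are my own assumptions, not from the ticket): a seed that cannot be evaluated up front, such as a column reference, should be rejected while the query is analyzed instead of failing after tasks have started.

{code:scala}
import org.apache.spark.sql.AnalysisException

// rand(seed) expects a foldable integral seed; a per-row column reference
// is not one, so an analyzer-side check can fail fast here.
try {
  spark.sql("select rand(id) from range(3)").collect()
} catch {
  case e: AnalysisException => println(s"rejected during analysis: ${e.getMessage}")
}
{code}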






[jira] [Commented] (SPARK-33909) Check rand functions seed is legal at analyzer side

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254730#comment-17254730
 ] 

Apache Spark commented on SPARK-33909:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30923

> Check rand functions seed is legal at analyzer side
> --
>
> Key: SPARK-33909
> URL: https://issues.apache.org/jira/browse/SPARK-33909
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> It's better to check that the seed expression is legal on the analyzer side 
> instead of at execution time.






[jira] [Assigned] (SPARK-33909) Check rand functions seed is legal at analyzer side

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33909:


Assignee: Apache Spark

> Check rand functions seed is legal at analyzer side
> --
>
> Key: SPARK-33909
> URL: https://issues.apache.org/jira/browse/SPARK-33909
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>
> It's better to check that the seed expression is legal on the analyzer side 
> instead of at execution time.






[jira] [Commented] (SPARK-33909) Check rand functions seed is legal at analyzer side

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254731#comment-17254731
 ] 

Apache Spark commented on SPARK-33909:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30923

> Check rand functions seed is legal at analyzer side
> --
>
> Key: SPARK-33909
> URL: https://issues.apache.org/jira/browse/SPARK-33909
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> It's better to check that the seed expression is legal on the analyzer side 
> instead of at execution time.






[jira] [Assigned] (SPARK-33908) Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33908:


Assignee: Apache Spark

> Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter
> 
>
> Key: SPARK-33908
> URL: https://issues.apache.org/jira/browse/SPARK-33908
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Per the discussion in https://github.com/apache/spark/pull/29966#discussion_r531917374






[jira] [Assigned] (SPARK-33908) Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33908:


Assignee: (was: Apache Spark)

> Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter
> 
>
> Key: SPARK-33908
> URL: https://issues.apache.org/jira/browse/SPARK-33908
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Per the discussion in https://github.com/apache/spark/pull/29966#discussion_r531917374






[jira] [Commented] (SPARK-33908) Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254720#comment-17254720
 ] 

Apache Spark commented on SPARK-33908:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/30922

> Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter
> 
>
> Key: SPARK-33908
> URL: https://issues.apache.org/jira/browse/SPARK-33908
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Per the discussion in https://github.com/apache/spark/pull/29966#discussion_r531917374






[jira] [Updated] (SPARK-33909) Check rand functions seed is legal at analyzer side

2020-12-24 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-33909:

Description: It's better to check that the seed expression is legal on the 
analyzer side instead of at execution time.  (was: It's better to check the 
seed expression data type on the analyzer side instead of at execution time.)

> Check rand functions seed is legal at analyzer side
> --
>
> Key: SPARK-33909
> URL: https://issues.apache.org/jira/browse/SPARK-33909
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> It's better to check that the seed expression is legal on the analyzer side 
> instead of at execution time.






[jira] [Updated] (SPARK-33909) Check rand functions seed is legal at analyzer side

2020-12-24 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-33909:

Summary: Check rand functions seed is legal at analyzer side  (was: Check 
rand functions seed data type at analyzer side)

> Check rand functions seed is legal at analyzer side
> --
>
> Key: SPARK-33909
> URL: https://issues.apache.org/jira/browse/SPARK-33909
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> It's better to check the seed expression data type on the analyzer side 
> instead of at execution time.






[jira] [Created] (SPARK-33909) Check rand functions seed data type at analyzer side

2020-12-24 Thread ulysses you (Jira)
ulysses you created SPARK-33909:
---

 Summary: Check rand functions seed data type at analyzer side
 Key: SPARK-33909
 URL: https://issues.apache.org/jira/browse/SPARK-33909
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: ulysses you


It's better to check the seed expression data type on the analyzer side 
instead of at execution time.






[jira] [Created] (SPARK-33908) Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter

2020-12-24 Thread angerszhu (Jira)
angerszhu created SPARK-33908:
-

 Summary: Refactor SparkSubmitUtils.resolveMavenCoordinates return 
parameter
 Key: SPARK-33908
 URL: https://issues.apache.org/jira/browse/SPARK-33908
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: angerszhu


Per the discussion in https://github.com/apache/spark/pull/29966#discussion_r531917374






[jira] [Resolved] (SPARK-33084) Add jar support ivy path

2020-12-24 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-33084.
--
Fix Version/s: 3.2.0
 Assignee: angerszhu
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29966

> Add jar support ivy path
> 
>
> Key: SPARK-33084
> URL: https://issues.apache.org/jira/browse/SPARK-33084
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> Support ADD JAR with an Ivy path






[jira] [Assigned] (SPARK-30027) Support codegen for filter exprs in HashAggregateExec

2020-12-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-30027:
-

Assignee: Takeshi Yamamuro

> Support codegen for filter exprs in HashAggregateExec
> -
>
> Key: SPARK-30027
> URL: https://issues.apache.org/jira/browse/SPARK-30027
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
>
> This intends to support codegen for filter exprs in HashAggregateExec.
> This is a follow-up work of https://issues.apache.org/jira/browse/SPARK-27986
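
For context, the filter expressions in question come from the SQL FILTER clause on aggregates added by SPARK-27986; a small query that exercises this path in HashAggregateExec (illustrative, not from the ticket):

{code:scala}
// Sum only the even ids. The FILTER predicate is evaluated inside the hash
// aggregate, which is the evaluation this ticket moves into generated code.
spark.sql("select sum(id) filter (where id % 2 = 0) from range(10)").explain()
{code}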






[jira] [Resolved] (SPARK-30027) Support codegen for filter exprs in HashAggregateExec

2020-12-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30027.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 27019
[https://github.com/apache/spark/pull/27019]

> Support codegen for filter exprs in HashAggregateExec
> -
>
> Key: SPARK-30027
> URL: https://issues.apache.org/jira/browse/SPARK-30027
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.2.0
>
>
> This intends to support codegen for filter exprs in HashAggregateExec.
> This is a follow-up work of https://issues.apache.org/jira/browse/SPARK-27986






[jira] [Resolved] (SPARK-33857) Unify random functions and make Uuid Shuffle support seed in SQL

2020-12-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33857.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30864
[https://github.com/apache/spark/pull/30864]

> Unify random functions and make Uuid Shuffle support seed in SQL
> 
>
> Key: SPARK-33857
> URL: https://issues.apache.org/jira/browse/SPARK-33857
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.2.0
>
>
> Unify the seed of random functions:
> 1. Add a placeholder expression `DefaultSeed` as the default seed.
> 2. Change the default seed of `Rand`, `Randn`, `Uuid`, and `Shuffle` to 
> `DefaultSeed`.
> 3. Replace `DefaultSeed` with the real seed in the `ResolveRandomSeed` rule.
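
The user-visible contract this preserves is that an explicit seed makes the function reproducible; a small check, with names of my own choosing:

{code:scala}
// Two evaluations of rand with the same explicit seed over the same data and
// partitioning produce identical values; per this ticket, Uuid and Shuffle
// get the same seed plumbing through the DefaultSeed placeholder.
val a = spark.sql("select rand(42) as r from range(5)").collect()
val b = spark.sql("select rand(42) as r from range(5)").collect()
assert(a.sameElements(b))
{code}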






[jira] [Assigned] (SPARK-33857) Unify random functions and make Uuid Shuffle support seed in SQL

2020-12-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33857:
-

Assignee: ulysses you

> Unify random functions and make Uuid Shuffle support seed in SQL
> 
>
> Key: SPARK-33857
> URL: https://issues.apache.org/jira/browse/SPARK-33857
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
>
> Unify the seed of random functions:
> 1. Add a placeholder expression `DefaultSeed` as the default seed.
> 2. Change the default seed of `Rand`, `Randn`, `Uuid`, and `Shuffle` to 
> `DefaultSeed`.
> 3. Replace `DefaultSeed` with the real seed in the `ResolveRandomSeed` rule.






[jira] [Resolved] (SPARK-33907) Only prune columns of from_json if parsing options are empty

2020-12-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33907.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30921
[https://github.com/apache/spark/pull/30921]

> Only prune columns of from_json if parsing options are empty
> ---
>
> Key: SPARK-33907
> URL: https://issues.apache.org/jira/browse/SPARK-33907
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.1.0
>
>
> For safety, we should only prune columns from a from_json expression if the 
> parsing options are empty.
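
An illustrative pair of queries (mine, not from the ticket) showing the boundary: without options the optimizer may prune the parsed schema down to the single field used, while passing any option, such as a parsing mode, should now leave the schema alone.

{code:scala}
// Only field `a` is consumed, so the from_json schema can be pruned ...
spark.sql("""select from_json('{"a":1,"b":2}', 'a INT, b INT').a""").explain(true)
// ... but with a non-empty options map the expression is left untouched.
spark.sql(
  """select from_json('{"a":1,"b":2}', 'a INT, b INT', map('mode', 'FAILFAST')).a"""
).explain(true)
{code}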






[jira] [Commented] (SPARK-31685) Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN expiration issue

2020-12-24 Thread Jim Huang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254670#comment-17254670
 ] 

Jim Huang commented on SPARK-31685:
---

Thanks Rajeev for all the work you have done on this ticket so far.

I am running into the same issue with the following stack and Spark version:
 * Hadoop 2.7.3
 * spark-2.4.5-bin-without-hadoop
 * yarn-client mode

The logic path that triggers this bug is elusive and difficult to pinpoint 
because I have observed only a few occurrences, and each time the total 
wall-clock runtime of the Spark Structured Streaming job seems to be random 
(hundreds of hours).

Currently, I *believe* it may have to do with another runtime event taking 
place. The current hypothesis: when the YARN node running one of the Spark 
executors fails for any reason (e.g. YARN health-check script, node 
decommissioning, container preemption), the YARN AM (Application Master) 
assigns and starts a new container on another YARN node. The YARN AM then 
reports that it tried to restart that particular Spark executor task within 
that container 3 times and failed with the exact error message reported here, 
causing the entire Spark job to fail. I believe this external event is another 
logic path that eventually hits the code you are testing.

> Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN 
> expiration issue
> ---
>
> Key: SPARK-31685
> URL: https://issues.apache.org/jira/browse/SPARK-31685
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: Rajeev Kumar
>Priority: Major
>
> I am facing issue for spark-2.4.4-bin-hadoop2.7. I am using spark structured 
> streaming with Kafka. Reading the stream from Kafka and saving it to HBase.
> I get this error on the driver after 24 hours.
>  
> {code:java}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 6972072 for ) is expired
> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
> at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:130)
> at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1169)
> at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1165)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at 
> org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1171)
> at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1630)
> at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.exists(CheckpointFileManager.scala:326)
> at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:142)
> at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:382)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
> at 
> 

[jira] [Commented] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254669#comment-17254669
 ] 

Baohe Zhang commented on SPARK-33906:
-

The more underlying reason seems to be that the stage completes within a 
heartbeat period, so the heartbeat doesn't piggyback the executor peak memory 
metrics.

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Priority: Blocker
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors page; it will 
> get stuck at this page: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completes in less time 
> than one heartbeat interval, so its entry in stageTCMP has already been 
> removed by the time reportHeartbeat is called.
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js.
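
For context on why the key can simply be absent: the REST model treats the 
peak metrics as an optional field, and a None is omitted from the serialized 
JSON rather than rendered as null, so the page script has to tolerate an 
undefined key. A minimal Scala sketch with assumed, simplified types (not the 
actual org.apache.spark.status.api.v1 classes):
{code:scala}
// Simplified stand-in for the REST-layer executor summary; the field types
// here are hypothetical and for illustration only.
case class ExecutorSummarySketch(
    id: String,
    maxMemory: Long,
    peakMemoryMetrics: Option[Map[String, Long]])

val driver = ExecutorSummarySketch("driver", 455501414L,
  Some(Map("JVMHeapMemory" -> 135021152L)))
val exec0 = ExecutorSummarySketch("0", 455501414L, None)

// Mirrors the serializer's behavior: a None emits no key at all, which is
// the case the executorspage.js check has to guard against.
def renderPeak(s: ExecutorSummarySketch): String =
  s.peakMemoryMetrics.fold("")(m => "\"peakMemoryMetrics\" : " + m)
{code}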



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254668#comment-17254668
 ] 

Baohe Zhang commented on SPARK-33906:
-

[~dongjoon] Yes.

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Priority: Blocker
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors tab; the page 
> gets stuck: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completes in less time 
> than one heartbeat interval, so its entry in stageTCMP has already been 
> removed by the time reportHeartbeat is called.
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23432) Expose executor memory metrics in the web UI for executors

2020-12-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254667#comment-17254667
 ] 

Dongjoon Hyun commented on SPARK-23432:
---

SPARK-33906 seems to report an issue with this patch.

> Expose executor memory metrics in the web UI for executors
> --
>
> Key: SPARK-23432
> URL: https://issues.apache.org/jira/browse/SPARK-23432
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Edward Lu
>Assignee: Zhongwei Zhu
>Priority: Major
> Fix For: 3.1.0
>
>
> Add the new memory metrics (jvmUsedMemory, executionMemory, storageMemory, 
> and unifiedMemory, etc.) to the executors tab, in the summary and for each 
> executor.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254666#comment-17254666
 ] 

Dongjoon Hyun commented on SPARK-33906:
---

Thank you for reporting, [~Baohe Zhang]. Is this caused by SPARK-23432?

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Priority: Blocker
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors tab; the page 
> gets stuck: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completes in less time 
> than one heartbeat interval, so its entry in stageTCMP has already been 
> removed by the time reportHeartbeat is called.
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33906:
--
Priority: Blocker  (was: Major)

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Priority: Blocker
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors tab; the page 
> gets stuck: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completes in less time 
> than one heartbeat interval, so its entry in stageTCMP has already been 
> removed by the time reportHeartbeat is called.
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33907) Only prune columns of from_json if parsing options is empty

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33907:


Assignee: Apache Spark  (was: L. C. Hsieh)

> Only prune columns of from_json if parsing options is empty
> ---
>
> Key: SPARK-33907
> URL: https://issues.apache.org/jira/browse/SPARK-33907
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> For safety, we should only prune columns from the from_json expression if 
> the parsing options are empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33907) Only prune columns of from_json if parsing options is empty

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33907:


Assignee: L. C. Hsieh  (was: Apache Spark)

> Only prune columns of from_json if parsing options is empty
> ---
>
> Key: SPARK-33907
> URL: https://issues.apache.org/jira/browse/SPARK-33907
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> For safety, we should only prune columns from the from_json expression if 
> the parsing options are empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33907) Only prune columns of from_json if parsing options is empty

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254658#comment-17254658
 ] 

Apache Spark commented on SPARK-33907:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30921

> Only prune columns of from_json if parsing options is empty
> ---
>
> Key: SPARK-33907
> URL: https://issues.apache.org/jira/browse/SPARK-33907
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> For safety, we should only prune columns from the from_json expression if 
> the parsing options are empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33907) Only prune columns of from_json if parsing options is empty

2020-12-24 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-33907:

Affects Version/s: 3.2.0

> Only prune columns of from_json if parsing options is empty
> ---
>
> Key: SPARK-33907
> URL: https://issues.apache.org/jira/browse/SPARK-33907
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> For safety, we should only prune columns from the from_json expression if 
> the parsing options are empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33907) Only prune columns of from_json if parsing options is empty

2020-12-24 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-33907:
---

 Summary: Only prune columns of from_json if parsing options is 
empty
 Key: SPARK-33907
 URL: https://issues.apache.org/jira/browse/SPARK-33907
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh


For safety, we should only prune columns from the from_json expression if the 
parsing options are empty.
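
A hedged sketch of the distinction (the column names, the sample record, and 
the timestampFormat option are made up for illustration, not taken from this 
ticket): with no options, the optimizer may safely narrow the schema passed 
to from_json to the fields actually selected, but a parsing option can change 
how the whole record is parsed, so pruning is skipped in that case.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

val schema = new StructType().add("a", IntegerType).add("ts", TimestampType)
val df = Seq("""{"a": 1, "ts": "2020/12/24 00:00:00"}""").toDF("json")

// No options: pruning the schema down to field "a" alone is safe.
df.select(from_json($"json", schema)("a")).show()

// With a parsing option, pruning is skipped for safety.
val opts = Map("timestampFormat" -> "yyyy/MM/dd HH:mm:ss")
df.select(from_json($"json", schema, opts)("a")).show()
{code}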



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33784) Rename dataSourceRewriteRules and customDataSourceRewriteRules in BaseSessionStateBuilder

2020-12-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33784.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/30917

> Rename dataSourceRewriteRules and customDataSourceRewriteRules in 
> BaseSessionStateBuilder
> -
>
> Key: SPARK-33784
> URL: https://issues.apache.org/jira/browse/SPARK-33784
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Anton Okolnychyi
>Priority: Blocker
> Fix For: 3.1.0
>
>
> This is under discussion at 
> https://github.com/apache/spark/pull/30558#discussion_r533885837.
> We happened to have a rule extension that is not specific to data source 
> rewrites (SPARK-33612), but we named it as if it were, and people agree it 
> deserves a better name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33621) Add a way to inject data source rewrite rules

2020-12-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33621:
--
Fix Version/s: (was: 3.2.0)
   3.1.0

> Add a way to inject data source rewrite rules
> -
>
> Key: SPARK-33621
> URL: https://issues.apache.org/jira/browse/SPARK-33621
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.1.0
>
>
> {{SparkSessionExtensions}} allows us to inject optimization rules, but they 
> are added to the operator optimization batch. There are cases when users 
> need to run rules after the operator optimization batch. Currently, this is 
> not possible.
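
A minimal sketch of the extension point under discussion (the rule body is a 
do-nothing placeholder and the names are illustrative): rules injected 
through injectOptimizerRule land in the operator optimization batch, which is 
exactly the limitation this ticket lifts.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Placeholder rule; a real one would rewrite the plan.
case class MyRule(spark: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

val spark = SparkSession.builder()
  .master("local[*]")
  .withExtensions(extensions => extensions.injectOptimizerRule(MyRule))
  .getOrCreate()
{code}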



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33906:


Assignee: Apache Spark

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Assignee: Apache Spark
>Priority: Major
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors tab; the page 
> gets stuck: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completes in less time 
> than one heartbeat interval, so its entry in stageTCMP has already been 
> removed by the time reportHeartbeat is called.
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33906:


Assignee: (was: Apache Spark)

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Priority: Major
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors tab; the page 
> gets stuck: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completes in less time 
> than one heartbeat interval, so its entry in stageTCMP has already been 
> removed by the time reportHeartbeat is called.
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33621) Add a way to inject data source rewrite rules

2020-12-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254651#comment-17254651
 ] 

Dongjoon Hyun commented on SPARK-33621:
---

This landed at branch-3.1 via https://github.com/apache/spark/pull/30917

> Add a way to inject data source rewrite rules
> -
>
> Key: SPARK-33621
> URL: https://issues.apache.org/jira/browse/SPARK-33621
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.1.0
>
>
> {{SparkSessionExtensions}} allows us to inject optimization rules, but they 
> are added to the operator optimization batch. There are cases when users 
> need to run rules after the operator optimization batch. Currently, this is 
> not possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254650#comment-17254650
 ] 

Apache Spark commented on SPARK-33906:
--

User 'baohe-zhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30920

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Priority: Major
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors tab; the page 
> gets stuck: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completes in less time 
> than one heartbeat interval, so its entry in stageTCMP has already been 
> removed by the time reportHeartbeat is called.
> How to fix it?
> Check whether peakMemoryMetrics is undefined in executorspage.js.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33621) Add a way to inject data source rewrite rules

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254649#comment-17254649
 ] 

Apache Spark commented on SPARK-33621:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/30917

> Add a way to inject data source rewrite rules
> -
>
> Key: SPARK-33621
> URL: https://issues.apache.org/jira/browse/SPARK-33621
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.2.0
>
>
> {{SparkSessionExtensions}} allows us to inject optimization rules, but they 
> are added to the operator optimization batch. There are cases when users 
> need to run rules after the operator optimization batch. Currently, this is 
> not possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33621) Add a way to inject data source rewrite rules

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254648#comment-17254648
 ] 

Apache Spark commented on SPARK-33621:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/30917

> Add a way to inject data source rewrite rules
> -
>
> Key: SPARK-33621
> URL: https://issues.apache.org/jira/browse/SPARK-33621
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.2.0
>
>
> {{SparkSessionExtensions}} allows us to inject optimization rules, but they 
> are added to the operator optimization batch. There are cases when users 
> need to run rules after the operator optimization batch. Currently, this is 
> not possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Baohe Zhang updated SPARK-33906:

Description: 
How to reproduce it?

In macOS standalone mode, open a spark-shell and run

$SPARK_HOME/bin/spark-shell --master spark://localhost:7077
{code:scala}
val x = sc.makeRDD(1 to 10, 5)
x.count()
{code}
Then open the app UI in the browser and click the Executors tab; the page 
gets stuck: 

 !executor-page.png! 

Also, the JSON returned by the REST API endpoint 
http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
is missing "peakMemoryMetrics" for the executors.
{noformat}
[ {
  "id" : "driver",
  "hostPort" : "192.168.1.241:50042",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 0,
  "maxTasks" : 0,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:18.033GMT",
  "executorLogs" : { },
  "memoryMetrics" : {
"usedOnHeapStorageMemory" : 0,
"usedOffHeapStorageMemory" : 0,
"totalOnHeapStorageMemory" : 455501414,
"totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "peakMemoryMetrics" : {
"JVMHeapMemory" : 135021152,
"JVMOffHeapMemory" : 149558576,
"OnHeapExecutionMemory" : 0,
"OffHeapExecutionMemory" : 0,
"OnHeapStorageMemory" : 3301,
"OffHeapStorageMemory" : 0,
"OnHeapUnifiedMemory" : 3301,
"OffHeapUnifiedMemory" : 0,
"DirectPoolMemory" : 67963178,
"MappedPoolMemory" : 0,
"ProcessTreeJVMVMemory" : 0,
"ProcessTreeJVMRSSMemory" : 0,
"ProcessTreePythonVMemory" : 0,
"ProcessTreePythonRSSMemory" : 0,
"ProcessTreeOtherVMemory" : 0,
"ProcessTreeOtherRSSMemory" : 0,
"MinorGCCount" : 15,
"MinorGCTime" : 101,
"MajorGCCount" : 0,
"MajorGCTime" : 0
  },
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
}, {
  "id" : "0",
  "hostPort" : "192.168.1.241:50054",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 12,
  "maxTasks" : 12,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 5,
  "totalTasks" : 5,
  "totalDuration" : 2107,
  "totalGCTime" : 25,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:20.335GMT",
  "executorLogs" : {
"stdout" : 
"http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
"stderr" : 
"http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
  },
  "memoryMetrics" : {
"usedOnHeapStorageMemory" : 0,
"usedOffHeapStorageMemory" : 0,
"totalOnHeapStorageMemory" : 455501414,
"totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
} ]
{noformat}

I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
returns an empty map, which causes peakExecutorMetrics to be set to None in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
The likely reason for the empty map is that the stage completes in less time 
than one heartbeat interval, so its entry in stageTCMP has already been 
removed by the time reportHeartbeat is called.

How to fix it?

Check whether peakMemoryMetrics is undefined in executorspage.js.

  was:
How to reproduce it?

In macOS standalone mode, open a spark-shell and run

$SPARK_HOME/bin/spark-shell --master spark://localhost:7077
{code:scala}
val x = sc.makeRDD(1 to 10, 5)
x.count()
{code}
Then open the app UI in the browser and click the Executors tab; the page 
gets stuck: 

 !executor-page.png! 

Also, the JSON returned by the REST API endpoint 
http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
is missing "peakMemoryMetrics" for the executors.
{noformat}
[ {
  "id" : "driver",
  "hostPort" : "192.168.1.241:50042",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 0,
  "maxTasks" : 0,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:18.033GMT",
  "executorLogs" : { },
  "memoryMetrics" : {
"usedOnHeapStorageMemory" : 0,
"usedOffHeapStorageMemory" : 0,
"totalOnHeapStorageMemory" : 455501414,
"totalOffHeapStorageMemory" : 0
 

[jira] [Commented] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254647#comment-17254647
 ] 

Baohe Zhang commented on SPARK-33906:
-

I will put up a PR soon.

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Priority: Major
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors tab; the page 
> gets stuck: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The likely reason for the empty map is that the stage completes in less time 
> than one heartbeat interval, so its entry in stageTCMP has already been 
> removed by the time reportHeartbeat is called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Baohe Zhang updated SPARK-33906:

Description: 
How to reproduce it?

In macOS standalone mode, open a spark-shell and run

$SPARK_HOME/bin/spark-shell --master spark://localhost:7077
{code:scala}
val x = sc.makeRDD(1 to 10, 5)
x.count()
{code}
Then open the app UI in the browser and click the Executors tab; the page 
gets stuck: 

 !executor-page.png! 

Also, the JSON returned by the REST API endpoint 
http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
is missing "peakMemoryMetrics" for the executors.
{noformat}
[ {
  "id" : "driver",
  "hostPort" : "192.168.1.241:50042",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 0,
  "maxTasks" : 0,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:18.033GMT",
  "executorLogs" : { },
  "memoryMetrics" : {
"usedOnHeapStorageMemory" : 0,
"usedOffHeapStorageMemory" : 0,
"totalOnHeapStorageMemory" : 455501414,
"totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "peakMemoryMetrics" : {
"JVMHeapMemory" : 135021152,
"JVMOffHeapMemory" : 149558576,
"OnHeapExecutionMemory" : 0,
"OffHeapExecutionMemory" : 0,
"OnHeapStorageMemory" : 3301,
"OffHeapStorageMemory" : 0,
"OnHeapUnifiedMemory" : 3301,
"OffHeapUnifiedMemory" : 0,
"DirectPoolMemory" : 67963178,
"MappedPoolMemory" : 0,
"ProcessTreeJVMVMemory" : 0,
"ProcessTreeJVMRSSMemory" : 0,
"ProcessTreePythonVMemory" : 0,
"ProcessTreePythonRSSMemory" : 0,
"ProcessTreeOtherVMemory" : 0,
"ProcessTreeOtherRSSMemory" : 0,
"MinorGCCount" : 15,
"MinorGCTime" : 101,
"MajorGCCount" : 0,
"MajorGCTime" : 0
  },
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
}, {
  "id" : "0",
  "hostPort" : "192.168.1.241:50054",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 12,
  "maxTasks" : 12,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 5,
  "totalTasks" : 5,
  "totalDuration" : 2107,
  "totalGCTime" : 25,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:20.335GMT",
  "executorLogs" : {
"stdout" : 
"http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
"stderr" : 
"http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
  },
  "memoryMetrics" : {
"usedOnHeapStorageMemory" : 0,
"usedOffHeapStorageMemory" : 0,
"totalOnHeapStorageMemory" : 455501414,
"totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
} ]
{noformat}

I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
returns an empty map, which causes peakExecutorMetrics to be set to None in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
The likely reason for the empty map is that the stage completes in less time 
than one heartbeat interval, so its entry in stageTCMP has already been 
removed by the time reportHeartbeat is called.

  was:
How to reproduce it?

In macOS standalone mode, open a spark-shell and run

$SPARK_HOME/bin/spark-shell --master spark://localhost:7077
{code:scala}
val x = sc.makeRDD(1 to 10, 5)
x.count()
{code}
Then open the app UI in the browser and click the Executors tab; the page 
gets stuck: 

!image-2020-12-24-14-12-22-983.png!

Also, the JSON returned by the REST API endpoint 
http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
is missing "peakMemoryMetrics" for the executors.
{noformat}
[ {
  "id" : "driver",
  "hostPort" : "192.168.1.241:50042",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 0,
  "maxTasks" : 0,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:18.033GMT",
  "executorLogs" : { },
  "memoryMetrics" : {
"usedOnHeapStorageMemory" : 0,
"usedOffHeapStorageMemory" : 0,
"totalOnHeapStorageMemory" : 455501414,
"totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "peakMemoryMetrics" : {

[jira] [Updated] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Baohe Zhang updated SPARK-33906:

Attachment: executor-page.png

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Priority: Major
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors tab; the page 
> gets stuck: 
> !image-2020-12-24-14-12-22-983.png!
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
> returns an empty map, which causes peakExecutorMetrics to be None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> A possible reason for the empty map is that the stage completes in less time 
> than the heartbeat interval, so the stage entry in stageTCMP has already been 
> removed before reportHeartbeat is called.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-24 Thread Baohe Zhang (Jira)
Baohe Zhang created SPARK-33906:
---

 Summary: SPARK UI Executors page stuck when 
ExecutorSummary.peakMemoryMetrics is unset
 Key: SPARK-33906
 URL: https://issues.apache.org/jira/browse/SPARK-33906
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.2.0
Reporter: Baohe Zhang


How to reproduce it?

In macOS standalone mode, open a spark-shell and run

$SPARK_HOME/bin/spark-shell --master spark://localhost:7077
{code:scala}
val x = sc.makeRDD(1 to 10, 5)
x.count()
{code}
Then open the app UI in the browser and click the Executors page; it gets 
stuck at this page: 

!image-2020-12-24-14-12-22-983.png!

Also, the JSON returned by the REST API endpoint 
http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
is missing "peakMemoryMetrics" for some executors.
{noformat}
[ {
  "id" : "driver",
  "hostPort" : "192.168.1.241:50042",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 0,
  "maxTasks" : 0,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 0,
  "totalTasks" : 0,
  "totalDuration" : 0,
  "totalGCTime" : 0,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:18.033GMT",
  "executorLogs" : { },
  "memoryMetrics" : {
"usedOnHeapStorageMemory" : 0,
"usedOffHeapStorageMemory" : 0,
"totalOnHeapStorageMemory" : 455501414,
"totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "peakMemoryMetrics" : {
"JVMHeapMemory" : 135021152,
"JVMOffHeapMemory" : 149558576,
"OnHeapExecutionMemory" : 0,
"OffHeapExecutionMemory" : 0,
"OnHeapStorageMemory" : 3301,
"OffHeapStorageMemory" : 0,
"OnHeapUnifiedMemory" : 3301,
"OffHeapUnifiedMemory" : 0,
"DirectPoolMemory" : 67963178,
"MappedPoolMemory" : 0,
"ProcessTreeJVMVMemory" : 0,
"ProcessTreeJVMRSSMemory" : 0,
"ProcessTreePythonVMemory" : 0,
"ProcessTreePythonRSSMemory" : 0,
"ProcessTreeOtherVMemory" : 0,
"ProcessTreeOtherRSSMemory" : 0,
"MinorGCCount" : 15,
"MinorGCTime" : 101,
"MajorGCCount" : 0,
"MajorGCTime" : 0
  },
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
}, {
  "id" : "0",
  "hostPort" : "192.168.1.241:50054",
  "isActive" : true,
  "rddBlocks" : 0,
  "memoryUsed" : 0,
  "diskUsed" : 0,
  "totalCores" : 12,
  "maxTasks" : 12,
  "activeTasks" : 0,
  "failedTasks" : 0,
  "completedTasks" : 5,
  "totalTasks" : 5,
  "totalDuration" : 2107,
  "totalGCTime" : 25,
  "totalInputBytes" : 0,
  "totalShuffleRead" : 0,
  "totalShuffleWrite" : 0,
  "isBlacklisted" : false,
  "maxMemory" : 455501414,
  "addTime" : "2020-12-24T19:44:20.335GMT",
  "executorLogs" : {
"stdout" : 
"http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
"stderr" : 
"http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
  },
  "memoryMetrics" : {
"usedOnHeapStorageMemory" : 0,
"usedOffHeapStorageMemory" : 0,
"totalOnHeapStorageMemory" : 455501414,
"totalOffHeapStorageMemory" : 0
  },
  "blacklistedInStages" : [ ],
  "attributes" : { },
  "resources" : { },
  "resourceProfileId" : 0,
  "isExcluded" : false,
  "excludedInStages" : [ ]
} ]
{noformat}

I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates 
returns an empty map, which causes peakExecutorMetrics to be None in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
A possible reason for the empty map is that the stage completes in less time 
than the heartbeat interval, so the stage entry in stageTCMP has already been 
removed before reportHeartbeat is called.
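
A minimal sketch of the Option flow described above, assuming simplified 
stand-ins for ExecutorMetricsPoller.getExecutorUpdates and the LiveEntity 
update (the types and signatures here are illustrative, not Spark's real ones):

{code:scala}
// Hedged sketch: if the poller's per-stage map is already empty (the stage
// finished before the heartbeat fired), the loop below never runs, the peak
// metrics stay None, and "peakMemoryMetrics" is absent from the REST JSON.
case class ExecutorMetrics(values: Map[String, Long])

def getExecutorUpdates(): Map[(Int, Int), ExecutorMetrics] =
  Map.empty // stage entry already evicted from stageTCMP

var peakExecutorMetrics: Option[ExecutorMetrics] = None
getExecutorUpdates().foreach { case (_, metrics) =>
  peakExecutorMetrics = Some(metrics) // never reached when the map is empty
}
assert(peakExecutorMetrics.isEmpty)
{code}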



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33905) Can't configure response header size

2020-12-24 Thread jcc (Jira)
jcc created SPARK-33905:
---

 Summary: Can't configure response header size
 Key: SPARK-33905
 URL: https://issues.apache.org/jira/browse/SPARK-33905
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 3.0.0
Reporter: jcc


We use JWT tokens to secure our Spark UI. These tokens can get quite large. We 
have configured {{spark.ui.requestHeaderSize}}. However, we also use the Spark 
master UI as a proxy for our workers and for driver apps. When HTTP requests 
flow through the Spark master UI, we get the error below while building the 
responses.

We believe a {{spark.ui.responseHeaderSize}} setting would solve this issue. 
Jetty's HttpConfiguration has a property to set the maximum response header size.

 

org.sparkproject.jetty.http.BadMessageException: 500: Request header too large
 at 
org.sparkproject.jetty.http.HttpGenerator.generateRequest(HttpGenerator.java:279)
 at 
org.sparkproject.jetty.client.http.HttpSenderOverHTTP$HeadersCallback.process(HttpSenderOverHTTP.java:231)
 at 
org.sparkproject.jetty.util.IteratingCallback.processing(IteratingCallback.java:241)
 at 
org.sparkproject.jetty.util.IteratingCallback.iterate(IteratingCallback.java:224)
 at 
org.sparkproject.jetty.client.http.HttpSenderOverHTTP.sendHeaders(HttpSenderOverHTTP.java:62)
 at org.sparkproject.jetty.client.HttpSender.send(HttpSender.java:214)
 at 
org.sparkproject.jetty.client.http.HttpChannelOverHTTP.send(HttpChannelOverHTTP.java:85)
 at org.sparkproject.jetty.client.HttpChannel.send(HttpChannel.java:128)
 at org.sparkproject.jetty.client.HttpConnection.send(HttpConnection.java:201)
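
A minimal Jetty sketch of the proposed knob, assuming a hypothetical 
spark.ui.responseHeaderSize that mirrors the existing spark.ui.requestHeaderSize 
wiring. HttpConfiguration.setResponseHeaderSize is a real Jetty API (shaded as 
org.sparkproject.jetty inside Spark); the Spark config key is only the proposal 
here, not an existing setting.

{code:scala}
import org.eclipse.jetty.server.HttpConfiguration

// Sizes in bytes; 8 KiB is Jetty's default in both directions.
val requestHeaderSize = 8 * 1024   // today: driven by spark.ui.requestHeaderSize
val responseHeaderSize = 8 * 1024  // proposed: spark.ui.responseHeaderSize

val httpConfig = new HttpConfiguration()
httpConfig.setRequestHeaderSize(requestHeaderSize)
httpConfig.setResponseHeaderSize(responseHeaderSize)
{code}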



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33830) Describe the PURGE option of ALTER TABLE .. DROP PARTITION

2020-12-24 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk resolved SPARK-33830.

Resolution: Won't Fix

> Describe the PURGE option of ALTER TABLE .. DROP PARTITION
> --
>
> Key: SPARK-33830
> URL: https://issues.apache.org/jira/browse/SPARK-33830
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33904) Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33904:


Assignee: Apache Spark

> Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`
> ---
>
> Key: SPARK-33904
> URL: https://issues.apache.org/jira/browse/SPARK-33904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The v1 INSERT INTO command recognizes `spark_catalog` as the default session 
> catalog:
> {code:sql}
> spark-sql> create table spark_catalog.ns.tbl (c int);
> spark-sql> insert into spark_catalog.ns.tbl select 0;
> spark-sql> select * from spark_catalog.ns.tbl;
> 0
> {code}
> but the `saveAsTable()` and `insertInto()` methods don't allow writing to a 
> table when the catalog spark_catalog is explicitly specified:
> {code:scala}
> scala> sql("CREATE NAMESPACE spark_catalog.ns")
> scala> Seq(0).toDF().write.saveAsTable("spark_catalog.ns.tbl")
> org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
> identifier spark_catalog.ns.tbl.
>   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:629)
>   ... 47 elided
> scala> Seq(0).toDF().write.insertInto("spark_catalog.ns.tbl")
> org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
> identifier spark_catalog.ns.tbl.
>   at 
> org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:498)
>   ... 47 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33904) Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33904:


Assignee: (was: Apache Spark)

> Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`
> ---
>
> Key: SPARK-33904
> URL: https://issues.apache.org/jira/browse/SPARK-33904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The v1 INSERT INTO command recognizes `spark_catalog` as the default session 
> catalog:
> {code:sql}
> spark-sql> create table spark_catalog.ns.tbl (c int);
> spark-sql> insert into spark_catalog.ns.tbl select 0;
> spark-sql> select * from spark_catalog.ns.tbl;
> 0
> {code}
> but the `saveAsTable()` and `insertInto()` methods don't allow writing to a 
> table when the catalog spark_catalog is explicitly specified:
> {code:scala}
> scala> sql("CREATE NAMESPACE spark_catalog.ns")
> scala> Seq(0).toDF().write.saveAsTable("spark_catalog.ns.tbl")
> org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
> identifier spark_catalog.ns.tbl.
>   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:629)
>   ... 47 elided
> scala> Seq(0).toDF().write.insertInto("spark_catalog.ns.tbl")
> org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
> identifier spark_catalog.ns.tbl.
>   at 
> org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:498)
>   ... 47 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33904) Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254628#comment-17254628
 ] 

Apache Spark commented on SPARK-33904:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30919

> Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`
> ---
>
> Key: SPARK-33904
> URL: https://issues.apache.org/jira/browse/SPARK-33904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The v1 INSERT INTO command recognizes `spark_catalog` as the default session 
> catalog:
> {code:sql}
> spark-sql> create table spark_catalog.ns.tbl (c int);
> spark-sql> insert into spark_catalog.ns.tbl select 0;
> spark-sql> select * from spark_catalog.ns.tbl;
> 0
> {code}
> but the `saveAsTable()` and `insertInto()` methods don't allow writing to a 
> table when the catalog spark_catalog is explicitly specified:
> {code:scala}
> scala> sql("CREATE NAMESPACE spark_catalog.ns")
> scala> Seq(0).toDF().write.saveAsTable("spark_catalog.ns.tbl")
> org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
> identifier spark_catalog.ns.tbl.
>   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:629)
>   ... 47 elided
> scala> Seq(0).toDF().write.insertInto("spark_catalog.ns.tbl")
> org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
> identifier spark_catalog.ns.tbl.
>   at 
> org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:498)
>   ... 47 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33904) Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254627#comment-17254627
 ] 

Apache Spark commented on SPARK-33904:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30919

> Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`
> ---
>
> Key: SPARK-33904
> URL: https://issues.apache.org/jira/browse/SPARK-33904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The v1 INSERT INTO command recognizes `spark_catalog` as the default session 
> catalog:
> {code:sql}
> spark-sql> create table spark_catalog.ns.tbl (c int);
> spark-sql> insert into spark_catalog.ns.tbl select 0;
> spark-sql> select * from spark_catalog.ns.tbl;
> 0
> {code}
> but the `saveAsTable()` and `insertInto()` methods don't allow writing to a 
> table when the catalog spark_catalog is explicitly specified:
> {code:scala}
> scala> sql("CREATE NAMESPACE spark_catalog.ns")
> scala> Seq(0).toDF().write.saveAsTable("spark_catalog.ns.tbl")
> org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
> identifier spark_catalog.ns.tbl.
>   at 
> org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:629)
>   ... 47 elided
> scala> Seq(0).toDF().write.insertInto("spark_catalog.ns.tbl")
> org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
> identifier spark_catalog.ns.tbl.
>   at 
> org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:498)
>   ... 47 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33904) Recognize `spark_catalog` in `saveAsTable()` and `insertInto()`

2020-12-24 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33904:
--

 Summary: Recognize `spark_catalog` in `saveAsTable()` and 
`insertInto()`
 Key: SPARK-33904
 URL: https://issues.apache.org/jira/browse/SPARK-33904
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The v1 INSERT INTO command recognizes `spark_catalog` as the default session 
catalog:
{code:sql}
spark-sql> create table spark_catalog.ns.tbl (c int);
spark-sql> insert into spark_catalog.ns.tbl select 0;
spark-sql> select * from spark_catalog.ns.tbl;
0
{code}
but the `saveAsTable()` and `insertInto()` methods don't allow writing to a 
table when the catalog spark_catalog is explicitly specified:
{code:scala}
scala> sql("CREATE NAMESPACE spark_catalog.ns")
scala> Seq(0).toDF().write.saveAsTable("spark_catalog.ns.tbl")
org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
identifier spark_catalog.ns.tbl.
  at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:629)
  ... 47 elided
scala> Seq(0).toDF().write.insertInto("spark_catalog.ns.tbl")
org.apache.spark.sql.AnalysisException: Couldn't find a catalog to handle the 
identifier spark_catalog.ns.tbl.
  at org.apache.spark.sql.DataFrameWriter.insertInto(DataFrameWriter.scala:498)
  ... 47 elided
{code}
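
A possible workaround sketch until this is supported: drop the catalog 
qualifier, since DataFrameWriter then resolves the identifier against the 
default session catalog, which spark_catalog aliases (names taken from the 
example above):

{code:scala}
// Hedged workaround: write without the spark_catalog prefix.
Seq(0).toDF().write.saveAsTable("ns.tbl")
Seq(0).toDF().write.insertInto("ns.tbl")
{code}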



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-33900:
---
Fix Version/s: 3.2.0
   3.1.0
   3.0.2

> Show shuffle read size / records correctly when only remotebytesread is 
> available
> -
>
> Key: SPARK-33900
> URL: https://issues.apache.org/jira/browse/SPARK-33900
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> At present, the stage page displays Shuffle Read Size / Records only when 
> localBytesRead > 0.
> Sometimes the shuffle read metrics have remoteBytesRead > 0 while 
> localBytesRead = 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta resolved SPARK-33900.

Resolution: Fixed

> Show shuffle read size / records correctly when only remotebytesread is 
> available
> -
>
> Key: SPARK-33900
> URL: https://issues.apache.org/jira/browse/SPARK-33900
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> At present, the stage page displays Shuffle Read Size / Records only when 
> localBytesRead > 0.
> Sometimes the shuffle read metrics have remoteBytesRead > 0 while 
> localBytesRead = 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta reassigned SPARK-33900:
--

Assignee: dzcxzl

> Show shuffle read size / records correctly when only remotebytesread is 
> available
> -
>
> Key: SPARK-33900
> URL: https://issues.apache.org/jira/browse/SPARK-33900
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Fix For: 3.0.2, 3.1.0, 3.2.0
>
>
> At present, the stage page displays Shuffle Read Size / Records only when 
> localBytesRead > 0.
> Sometimes the shuffle read metrics have remoteBytesRead > 0 while 
> localBytesRead = 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33901) Char and Varchar display error after DDLs

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33901:


Assignee: (was: Apache Spark)

> Char and Varchar display error after DDLs
> -
>
> Key: SPARK-33901
> URL: https://issues.apache.org/jira/browse/SPARK-33901
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> CTAS / CREATE TABLE LIKE / CVAS / ALTER TABLE ADD COLUMNS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33901) Char and Varchar display error after DDLs

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33901:


Assignee: Apache Spark

> Char and Varchar display error after DDLs
> -
>
> Key: SPARK-33901
> URL: https://issues.apache.org/jira/browse/SPARK-33901
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Major
>
> CTAS / CREATE TABLE LIKE / CVAS / ALTER TABLE ADD COLUMNS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org





[jira] [Commented] (SPARK-33901) Char and Varchar display error after DDLs

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254550#comment-17254550
 ] 

Apache Spark commented on SPARK-33901:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/30918

> Char and Varchar display error after DDLs
> -
>
> Key: SPARK-33901
> URL: https://issues.apache.org/jira/browse/SPARK-33901
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> CTAS / CREATE TABLE LIKE / CVAS / ALTER TABLE ADD COLUMNS



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33903) CREATE VIEW AS SELECT FOR V2

2020-12-24 Thread Kent Yao (Jira)
Kent Yao created SPARK-33903:


 Summary: CREATE VIEW AS SELECT FOR V2
 Key: SPARK-33903
 URL: https://issues.apache.org/jira/browse/SPARK-33903
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33902) CREATE TABLE LIKE FOR V2

2020-12-24 Thread Kent Yao (Jira)
Kent Yao created SPARK-33902:


 Summary: CREATE TABLE LIKE FOR V2
 Key: SPARK-33902
 URL: https://issues.apache.org/jira/browse/SPARK-33902
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33901) Char and Varchar display error after DDLs

2020-12-24 Thread Kent Yao (Jira)
Kent Yao created SPARK-33901:


 Summary: Char and Varchar display error after DDLs
 Key: SPARK-33901
 URL: https://issues.apache.org/jira/browse/SPARK-33901
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Kent Yao


CTAS / CREATE TABLE LIKE / CVAS / ALTER TABLE ADD COLUMNS
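
A hedged repro sketch for one of the listed DDLs, CTAS (table and column names 
are illustrative):

{code:scala}
// If char/varchar metadata is lost by the DDL, DESC on the derived table
// shows string instead of char(5).
spark.sql("CREATE TABLE src (c CHAR(5)) USING parquet")
spark.sql("CREATE TABLE dst USING parquet AS SELECT * FROM src") // CTAS
spark.sql("DESC dst").show()
{code}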



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33784) Rename dataSourceRewriteRules and customDataSourceRewriteRules in BaseSessionStateBuilder

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254514#comment-17254514
 ] 

Apache Spark commented on SPARK-33784:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/30917

> Rename dataSourceRewriteRules and customDataSourceRewriteRules in 
> BaseSessionStateBuilder
> -
>
> Key: SPARK-33784
> URL: https://issues.apache.org/jira/browse/SPARK-33784
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Anton Okolnychyi
>Priority: Blocker
>
> This is under discussion at 
> https://github.com/apache/spark/pull/30558#discussion_r533885837.
> We happened to add rule extensions that are not specific to data source 
> rewrites (SPARK-33612), but we named them so, and people agree on choosing a 
> better name here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254487#comment-17254487
 ] 

Apache Spark commented on SPARK-33900:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/30916

> Show shuffle read size / records correctly when only remotebytesread is 
> available
> -
>
> Key: SPARK-33900
> URL: https://issues.apache.org/jira/browse/SPARK-33900
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the stage page displays Shuffle Read Size / Records only when 
> localBytesRead > 0.
> Sometimes the shuffle read metrics have remoteBytesRead > 0 while 
> localBytesRead = 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33900:


Assignee: Apache Spark

> Show shuffle read size / records correctly when only remotebytesread is 
> available
> -
>
> Key: SPARK-33900
> URL: https://issues.apache.org/jira/browse/SPARK-33900
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Assignee: Apache Spark
>Priority: Trivial
>
> At present, the stage page displays Shuffle Read Size / Records only when 
> localBytesRead > 0.
> Sometimes the shuffle read metrics have remoteBytesRead > 0 while 
> localBytesRead = 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33900:


Assignee: (was: Apache Spark)

> Show shuffle read size / records correctly when only remotebytesread is 
> available
> -
>
> Key: SPARK-33900
> URL: https://issues.apache.org/jira/browse/SPARK-33900
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the stage page displays Shuffle Read Size / Records only when 
> localBytesRead > 0.
> Sometimes the shuffle read metrics have remoteBytesRead > 0 while 
> localBytesRead = 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254485#comment-17254485
 ] 

Apache Spark commented on SPARK-33900:
--

User 'cxzl25' has created a pull request for this issue:
https://github.com/apache/spark/pull/30916

> Show shuffle read size / records correctly when only remotebytesread is 
> available
> -
>
> Key: SPARK-33900
> URL: https://issues.apache.org/jira/browse/SPARK-33900
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the stage page displays Shuffle Read Size / Records only when 
> localBytesRead > 0.
> Sometimes the shuffle read metrics have remoteBytesRead > 0 while 
> localBytesRead = 0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33900) Show shuffle read size / records correctly when only remotebytesread is available

2020-12-24 Thread dzcxzl (Jira)
dzcxzl created SPARK-33900:
--

 Summary: Show shuffle read size / records correctly when only 
remotebytesread is available
 Key: SPARK-33900
 URL: https://issues.apache.org/jira/browse/SPARK-33900
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.1
Reporter: dzcxzl


At present, the stage page displays Shuffle Read Size / Records only when 
localBytesRead > 0.

Sometimes the shuffle read metrics have remoteBytesRead > 0 while 
localBytesRead = 0.
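
A hedged sketch of the display condition implied by the fix (variable names are 
illustrative, not Spark's internals): show the shuffle read column whenever any 
bytes were read, local or remote.

{code:scala}
// Gate the UI column on total bytes read rather than local bytes alone,
// so remote-only reads still render.
val localBytesRead = 0L
val remoteBytesRead = 64L * 1024
val showShuffleRead = (localBytesRead + remoteBytesRead) > 0 // true
{code}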



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17254481#comment-17254481
 ] 

Apache Spark commented on SPARK-33899:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30915

> v1 SHOW TABLES fails with assert on spark_catalog
> -
>
> Key: SPARK-33899
> URL: https://issues.apache.org/jira/browse/SPARK-33899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with an internal 
> assertion error when a database is not specified:
> {code:sql}
> spark-sql> show tables in spark_catalog;
> 20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in 
> spark_catalog]
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33899:


Assignee: Apache Spark

> v1 SHOW TABLES fails with assert on spark_catalog
> -
>
> Key: SPARK-33899
> URL: https://issues.apache.org/jira/browse/SPARK-33899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with an internal 
> assertion error when a database is not specified:
> {code:sql}
> spark-sql> show tables in spark_catalog;
> 20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in 
> spark_catalog]
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33899:


Assignee: (was: Apache Spark)

> v1 SHOW TABLES fails with assert on spark_catalog
> -
>
> Key: SPARK-33899
> URL: https://issues.apache.org/jira/browse/SPARK-33899
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with an internal 
> assertion error when a database is not specified:
> {code:sql}
> spark-sql> show tables in spark_catalog;
> 20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in 
> spark_catalog]
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:208)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
>   at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33899) v1 SHOW TABLES fails with assert on spark_catalog

2020-12-24 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33899:
--

 Summary: v1 SHOW TABLES fails with assert on spark_catalog
 Key: SPARK-33899
 URL: https://issues.apache.org/jira/browse/SPARK-33899
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The v1 SHOW TABLES, SHOW TABLE EXTENDED and SHOW VIEWS fail with an internal 
assertion error when a database is not specified:
{code:sql}
spark-sql> show tables in spark_catalog;
20/12/24 11:19:46 ERROR SparkSQLDriver: Failed in [show tables in spark_catalog]
java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:208)
at 
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:366)
at 
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:49)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$3(AnalysisHelper.scala:90)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
{code}
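
As a hedged workaround sketch, specifying the database explicitly avoids the 
failing code path ("default" below is illustrative):

{code:scala}
// Works: both catalog and database are given, so the assert is not hit.
spark.sql("SHOW TABLES IN spark_catalog.default").show()
{code}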



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33892) Char and Varchar display in show table/column definition command

2020-12-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33892:
---

Assignee: Kent Yao

> Char and Varchar display in show table/column definition command
> 
>
> Key: SPARK-33892
> URL: https://issues.apache.org/jira/browse/SPARK-33892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> Show the raw char/varchar type in DESC/SHOW TABLE/COLUMN commands.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33892) Char and Varchar display in show table/column definition command

2020-12-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33892.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30908
[https://github.com/apache/spark/pull/30908]

> Char and Varchar display in show table/column definition command
> 
>
> Key: SPARK-33892
> URL: https://issues.apache.org/jira/browse/SPARK-33892
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.1.0
>
>
> Show the raw char/varchar type in DESC/SHOW TABLE/COLUMN commands.
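
A hedged illustration of the intended behavior (table and column names are 
made up):

{code:scala}
// After the fix, DESC should surface char(5)/varchar(10) rather than string.
spark.sql("CREATE TABLE t (c CHAR(5), v VARCHAR(10)) USING parquet")
spark.sql("DESC t").show()
{code}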



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33881) Check null and empty string as partition values in v1 and v2 tests

2020-12-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33881:
---

Assignee: Maxim Gekk

> Check null and empty string as partition values in v1 and v2 tests
> --
>
> Key: SPARK-33881
> URL: https://issues.apache.org/jira/browse/SPARK-33881
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33881) Check null and empty string as partition values in v1 and v2 tests

2020-12-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33881.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30893
[https://github.com/apache/spark/pull/30893]

> Check null and empty string as partition values in v1 and v2 tests
> --
>
> Key: SPARK-33881
> URL: https://issues.apache.org/jira/browse/SPARK-33881
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33443) LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33443:
---

Assignee: jiaan.geng

> LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]
> 
>
> Key: SPARK-33443
> URL: https://issues.apache.org/jira/browse/SPARK-33443
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> The current implementation of LEAD/LAG doesn't support IGNORE/RESPECT NULLS, 
> but mainstream databases support this syntax.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33443) LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

2020-12-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33443.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30387
[https://github.com/apache/spark/pull/30387]

> LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]
> 
>
> Key: SPARK-33443
> URL: https://issues.apache.org/jira/browse/SPARK-33443
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.2.0
>
>
> The current implementation of LEAD/LAG doesn't support IGNORE/RESPECT NULLS, 
> but mainstream databases support this syntax.
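
A hedged example of the target syntax (table and column names are 
illustrative):

{code:scala}
// IGNORE NULLS skips null rows when picking the offset value;
// RESPECT NULLS (the default) takes rows exactly as they come.
spark.sql("""
  SELECT id,
         LAG(v)  IGNORE NULLS  OVER (ORDER BY id) AS prev_non_null,
         LEAD(v) RESPECT NULLS OVER (ORDER BY id) AS next_value
  FROM t
""").show()
{code}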



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33861) Simplify conditional in predicate

2020-12-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33861.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30865
[https://github.com/apache/spark/pull/30865]

> Simplify conditional in predicate
> -
>
> Key: SPARK-33861
> URL: https://issues.apache.org/jira/browse/SPARK-33861
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.2.0
>
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from 
> range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 
> 5").explain()
> {noformat}
> Before this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct
> {noformat}
> After this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND 
> ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), 
> GreaterThan(a,2)], ReadSchema: struct
> {noformat}
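
The rewrite at work, sketched: a CASE WHEN with no ELSE branch evaluates to 
null, which a filter treats as false, so the predicate above can be flattened 
into a plain conjunction that the scan is able to push down:

{code:scala}
// Equivalent, pushdown-friendly form of the CASE WHEN predicate above.
spark.sql("select * from t1 where a > 2 AND b + 10 > 5").explain()
{code}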



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33861) Simplify conditional in predicate

2020-12-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33861:
---

Assignee: Yuming Wang

> Simplify conditional in predicate
> -
>
> Key: SPARK-33861
> URL: https://issues.apache.org/jira/browse/SPARK-33861
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> The use case is:
> {noformat}
> spark.sql("create table t1 using parquet as select id as a, id as b from 
> range(10)")
> spark.sql("select * from t1 where CASE WHEN a > 2 THEN b + 10 END > 
> 5").explain()
> {noformat}
> Before this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [CASE WHEN (a#3L > 2) THEN ((b#4L + 10) > 5) END], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct
> {noformat}
> After this pr:
> {noformat}
> == Physical Plan ==
> *(1) Filter (((isnotnull(a#3L) AND isnotnull(b#4L)) AND (a#3L > 2)) AND 
> ((b#4L + 10) > 5))
> +- *(1) ColumnarToRow
>+- FileScan parquet default.t1[a#3L,b#4L] Batched: true, DataFilters: 
> [isnotnull(a#3L), isnotnull(b#4L), (a#3L > 2), ((b#4L + 10) > 5)], Format: 
> Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a), IsNotNull(b), 
> GreaterThan(a,2)], ReadSchema: struct
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org