[jira] [Commented] (SPARK-21492) Memory leak in SortMergeJoin

2019-09-11 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928204#comment-16928204
 ] 

zhoukang commented on SPARK-21492:
--

Any progress on this issue? [~jiangxb1987]
We have also encountered this problem.

> Memory leak in SortMergeJoin
> 
>
> Key: SPARK-21492
> URL: https://issues.apache.org/jira/browse/SPARK-21492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
>Reporter: Zhan Zhang
>Priority: Major
>
> In SortMergeJoin, if the iterator is not exhausted, there will be a memory leak
> caused by the Sort. The memory is not released until the task ends, and cannot
> be used by other operators, causing a performance drop or OOM.
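For illustration, a minimal sketch of the kind of query that can hit this, assuming the join is planned as a SortMergeJoin and its output iterator is cut short by a LIMIT (the data sizes and column names are made up for the example):

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("smj-leak-sketch").getOrCreate()
import spark.implicits._

// Disable broadcast joins so the join below is planned as a SortMergeJoin.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1L)

val left  = spark.range(0L, 10000000L).select($"id", ($"id" % 100).as("k"))
val right = spark.range(0L, 10000000L).select($"id".as("rid"), ($"id" % 100).as("k"))

// The LIMIT stops consuming the join's output iterator early, so the memory
// acquired by the sorts feeding the join is only released when the task ends.
left.join(right, "k").limit(10).collect()
{code}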



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928180#comment-16928180
 ] 

feiwang commented on SPARK-29037:
-

[~cloud_fan]

> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.3.3
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of a table, each task of the commit stage
> first saves its output to a staging dir; when the task completes, it saves its
> output to committedTaskPath, and when all tasks of the stage succeed, all task
> output under committedTaskPath is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in committedTaskPath and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this
> committedTaskPath dir.
> When the task commit stage of the new application succeeds, all task output
> under this committedTaskPath, which contains parts of the old application's
> task output, is moved to the destination dir and the result is duplicated.
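For illustration, a hedged sketch of the scenario under Hadoop's FileOutputCommitter (the table names and partition value are made up; the staging layout referred to is the committer's default under the table path):

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("rerun-duplicate-sketch")
  .enableHiveSupport()
  .getOrCreate()

// Run 1: killed while its tasks are committing. Committed task output is left
// behind under the committer's staging area, e.g. $tablePath/_temporary/0/.
spark.sql("INSERT OVERWRITE TABLE target PARTITION (dt = '2019-09-11') SELECT * FROM source")

// Run 2: the rerun writes to the same table path, so FileOutputCommitter reuses
// the same _temporary/0 directory. On job commit it also promotes the leftover
// committed output from run 1, and the partition ends up with duplicated rows.
spark.sql("INSERT OVERWRITE TABLE target PARTITION (dt = '2019-09-11') SELECT * FROM source")
{code}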



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

feiwang updated SPARK-29037:

Comment: was deleted

(was: If we have several applications that insert overwrite a partition of the 
same table running at the same time, there may be data corruption when they 
commit task output at the same time.)

> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.3.3
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of a table, each task of the commit stage
> first saves its output to a staging dir; when the task completes, it saves its
> output to committedTaskPath, and when all tasks of the stage succeed, all task
> output under committedTaskPath is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in committedTaskPath and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this
> committedTaskPath dir.
> When the task commit stage of the new application succeeds, all task output
> under this committedTaskPath, which contains parts of the old application's
> task output, is moved to the destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

feiwang updated SPARK-29037:

Description: 
When we insert overwrite a partition of a table, each task of the commit stage 
first saves its output to a staging dir; when the task completes, it saves its 
output to committedTaskPath, and when all tasks of the stage succeed, all task 
output under committedTaskPath is moved to the destination dir.

However, when we kill an application while it is committing tasks' output, parts 
of the tasks' results are kept in committedTaskPath and are not cleared 
gracefully.

When we then rerun this application, the new application reuses this 
committedTaskPath dir.

When the task commit stage of the new application succeeds, all task output 
under this committedTaskPath, which contains parts of the old application's task 
output, is moved to the destination dir and the result is duplicated.



  was:
When we insert overwrite a partition of table.
For a stage, whose tasks commit output, a task saves output to a staging dir 
firstly,  when this task complete, it will save output to 
when all tasks of this stage success, all task output under staging dir will be 
moved to destination dir.

However, when we kill an application, which is committing tasks' output, parts 
of tasks' results will be kept in staging dir, which would not be cleared 
gracefully.

Then we rerun this application and the new application will reuse this staging 
dir.

And when the task commit stage of new application success, all task output 
under this staging dir, which contains parts of old application's task output , 
would be moved to destination dir and the result is duplicated.




> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.3.3
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of a table, each task of the commit stage
> first saves its output to a staging dir; when the task completes, it saves its
> output to committedTaskPath, and when all tasks of the stage succeed, all task
> output under committedTaskPath is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in committedTaskPath and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this
> committedTaskPath dir.
> When the task commit stage of the new application succeeds, all task output
> under this committedTaskPath, which contains parts of the old application's
> task output, is moved to the destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29064) Add PrometheusResource to export Executor metrics

2019-09-11 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-29064:
-

 Summary: Add PrometheusResource to export Executor metrics
 Key: SPARK-29064
 URL: https://issues.apache.org/jira/browse/SPARK-29064
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

feiwang updated SPARK-29037:

Description: 
When we insert overwrite a partition of table.
For a stage, whose tasks commit output, a task saves output to a staging dir 
firstly,  when this task complete, it will save output to 
when all tasks of this stage success, all task output under staging dir will be 
moved to destination dir.

However, when we kill an application, which is committing tasks' output, parts 
of tasks' results will be kept in staging dir, which would not be cleared 
gracefully.

Then we rerun this application and the new application will reuse this staging 
dir.

And when the task commit stage of new application success, all task output 
under this staging dir, which contains parts of old application's task output , 
would be moved to destination dir and the result is duplicated.



  was:
When we insert overwrite a partition of table.
For a stage, whose tasks commit output, a task saves output to a staging dir 
firstly, when all tasks of this stage success, all task output under staging 
dir will be moved to destination dir.

However, when we kill an application, which is committing tasks' output, parts 
of tasks' results will be kept in staging dir, which would not be cleared 
gracefully.

Then we rerun this application and the new application will reuse this staging 
dir.

And when the task commit stage of new application success, all task output 
under this staging dir, which contains parts of old application's task output , 
would be moved to destination dir and the result is duplicated.




> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.3.3
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of table.
> For a stage, whose tasks commit output, a task saves output to a staging dir 
> firstly,  when this task complete, it will save output to 
> when all tasks of this stage success, all task output under staging dir will 
> be moved to destination dir.
> However, when we kill an application, which is committing tasks' output, 
> parts of tasks' results will be kept in staging dir, which would not be 
> cleared gracefully.
> Then we rerun this application and the new application will reuse this 
> staging dir.
> And when the task commit stage of new application success, all task output 
> under this staging dir, which contains parts of old application's task output 
> , would be moved to destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

feiwang updated SPARK-29037:

Affects Version/s: 2.3.3

> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.3.3
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of a table, each task of the commit stage
> first saves its output to a staging dir; when all tasks of the stage succeed,
> all task output under the staging dir is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in the staging dir and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this staging
> dir.
> When the task commit stage of the new application succeeds, all task output
> under this staging dir, which contains parts of the old application's task
> output, is moved to the destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928170#comment-16928170
 ] 

feiwang commented on SPARK-29037:
-

If we have several applications that insert overwrite a partition of the same 
table running at the same time, there may be data corruption when they commit 
task output at the same time.

> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of a table, each task of the commit stage
> first saves its output to a staging dir; when all tasks of the stage succeed,
> all task output under the staging dir is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in the staging dir and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this staging
> dir.
> When the task commit stage of the new application succeeds, all task output
> under this staging dir, which contains parts of the old application's task
> output, is moved to the destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928168#comment-16928168
 ] 

feiwang commented on SPARK-29037:
-

This committedTaskPath is hard-coded in the FileOutputCommitter class.

> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of a table, each task of the commit stage
> first saves its output to a staging dir; when all tasks of the stage succeed,
> all task output under the staging dir is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in the staging dir and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this staging
> dir.
> When the task commit stage of the new application succeeds, all task output
> under this staging dir, which contains parts of the old application's task
> output, is moved to the destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928165#comment-16928165
 ] 

feiwang commented on SPARK-29037:
-

This is the unit test log.
 !screenshot-1.png! 

We can see that the task's output is always saved under 
$tablePath/_temporary/0/.
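As a quick way to see what a killed run leaves behind, a small sketch that lists whatever remains under that staging directory (the table path below is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative table path; point this at the table written by the killed run.
val staging = new Path("/warehouse/db.db/target/_temporary/0")
val fs = staging.getFileSystem(new Configuration())

// Any committed task directories left here will be promoted by the next run.
if (fs.exists(staging)) {
  fs.listStatus(staging).foreach(status => println(status.getPath))
}
{code}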

> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of a table, each task of the commit stage
> first saves its output to a staging dir; when all tasks of the stage succeed,
> all task output under the staging dir is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in the staging dir and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this staging
> dir.
> When the task commit stage of the new application succeeds, all task output
> under this staging dir, which contains parts of the old application's task
> output, is moved to the destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

feiwang updated SPARK-29037:

Attachment: screenshot-1.png

> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> For a stage whose tasks commit output, each task first saves its output to a
> staging dir; when all tasks of the stage succeed, all task output under the
> staging dir is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in the staging dir and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this staging
> dir.
> When the task commit stage of the new application succeeds, all task output
> under this staging dir, which contains parts of the old application's task
> output, is moved to the destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun

2019-09-11 Thread feiwang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

feiwang updated SPARK-29037:

Description: 
When we insert overwrite a partition of a table, each task of the commit stage 
first saves its output to a staging dir; when all tasks of the stage succeed, 
all task output under the staging dir is moved to the destination dir.

However, when we kill an application while it is committing tasks' output, parts 
of the tasks' results are kept in the staging dir and are not cleared 
gracefully.

When we then rerun this application, the new application reuses this staging 
dir.

When the task commit stage of the new application succeeds, all task output 
under this staging dir, which contains parts of the old application's task 
output, is moved to the destination dir and the result is duplicated.



  was:
For a stage, whose tasks commit output, a task saves output to a staging dir 
firstly, when all tasks of this stage success, all task output under staging 
dir will be moved to destination dir.

However, when we kill an application, which is committing tasks' output, parts 
of tasks' results will be kept in staging dir, which would not be cleared 
gracefully.

Then we rerun this application and the new application will reuse this staging 
dir.

And when the task commit stage of new application success, all task output 
under this staging dir, which contains parts of old application's task output , 
would be moved to destination dir and the result is duplicated.




> [Core] Spark gives duplicate result when an application was killed and rerun
> 
>
> Key: SPARK-29037
> URL: https://issues.apache.org/jira/browse/SPARK-29037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: feiwang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> When we insert overwrite a partition of a table, each task of the commit stage
> first saves its output to a staging dir; when all tasks of the stage succeed,
> all task output under the staging dir is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> parts of the tasks' results are kept in the staging dir and are not cleared
> gracefully.
> When we then rerun this application, the new application reuses this staging
> dir.
> When the task commit stage of the new application succeeds, all task output
> under this staging dir, which contains parts of the old application's task
> output, is moved to the destination dir and the result is duplicated.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29063) fillna support for joined table

2019-09-11 Thread Yuanjian Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanjian Li updated SPARK-29063:

Description: 
When you have a joined table that has the same field name in both original 
tables, fillna will fail even if you specify a subset that does not include the 
'ambiguous' fields.
{code:java}
scala> val df1 = Seq(("f1-1", "f2", null), ("f1-2", null, null), ("f1-3", "f2", 
"f3-1"), ("f1-4", "f2", "f3-1")).toDF("f1", "f2", "f3")
scala> val df2 = Seq(("f1-1", null, null), ("f1-2", "f2", null), ("f1-3", "f2", 
"f4-1")).toDF("f1", "f2", "f4")
scala> val df_join = df1.alias("df1").join(df2.alias("df2"), Seq("f1"), 
joinType="left_outer")
scala> df_join.na.fill("", cols=Seq("f4"))

org.apache.spark.sql.AnalysisException: Reference 'f2' is ambiguous, could be: 
df1.f2, df2.f2.;
{code}
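As context, one possible workaround sketch until this is supported (not the proposed fix): rename the duplicated column on one side before joining, so the na.fill subset lookup no longer hits the ambiguous reference. The names follow the example above.

{code:java}
// Workaround sketch: disambiguate the duplicate "f2" column before joining.
val df2Renamed = df2.withColumnRenamed("f2", "df2_f2")
val dfJoinRenamed = df1.join(df2Renamed, Seq("f1"), "left_outer")
dfJoinRenamed.na.fill("", Seq("f4")).show()
{code}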

> fillna support for joined table
> ---
>
> Key: SPARK-29063
> URL: https://issues.apache.org/jira/browse/SPARK-29063
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuanjian Li
>Priority: Major
>
> When you have a joined table that has the same field name in both original
> tables, fillna will fail even if you specify a subset that does not include
> the 'ambiguous' fields.
> {code:java}
> scala> val df1 = Seq(("f1-1", "f2", null), ("f1-2", null, null), ("f1-3", 
> "f2", "f3-1"), ("f1-4", "f2", "f3-1")).toDF("f1", "f2", "f3")
> scala> val df2 = Seq(("f1-1", null, null), ("f1-2", "f2", null), ("f1-3", 
> "f2", "f4-1")).toDF("f1", "f2", "f4")
> scala> val df_join = df1.alias("df1").join(df2.alias("df2"), Seq("f1"), 
> joinType="left_outer")
> scala> df_join.na.fill("", cols=Seq("f4"))
> org.apache.spark.sql.AnalysisException: Reference 'f2' is ambiguous, could 
> be: df1.f2, df2.f2.;
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29063) fillna support for joined table

2019-09-11 Thread Yuanjian Li (Jira)
Yuanjian Li created SPARK-29063:
---

 Summary: fillna support for joined table
 Key: SPARK-29063
 URL: https://issues.apache.org/jira/browse/SPARK-29063
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuanjian Li






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View

2019-09-11 Thread Lantao Jin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928144#comment-16928144
 ] 

Lantao Jin commented on SPARK-29038:


[~smilegator] Yes, it's physically stored. I will create a detailed design 
document with more details to illustrate the implementation.

> SPIP: Support Spark Materialized View
> -
>
> Key: SPARK-29038
> URL: https://issues.apache.org/jira/browse/SPARK-29038
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> A materialized view is an important approach in a DBMS for caching data to
> accelerate queries. By creating a materialized view through SQL, the data to
> be cached can be chosen very flexibly and configured for specific usage
> scenarios. The Materialization Manager automatically updates the cached data
> according to changes in the detail source tables, simplifying the user's work.
> When a user submits a query, the Spark optimizer rewrites the execution plan
> based on the available materialized views to determine the optimal execution
> plan.
> Details in [design 
> doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]
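For context, a sketch of the manual pattern the proposal would automate (table names are illustrative): today one materializes an aggregate into an ordinary table by hand and must remember to query and refresh it; with the proposed materialized views, the Materialization Manager would refresh it and the optimizer would rewrite queries against the detail table automatically.

{code:java}
// Manual pattern today (illustrative names): materialize an aggregate by hand...
spark.sql("CREATE TABLE sales_daily USING parquet AS SELECT dt, sum(amount) AS total FROM sales GROUP BY dt")

// ...and remember to query (and periodically rebuild) the summary table yourself.
spark.sql("SELECT dt, total FROM sales_daily").show()
{code}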



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29046) Possible NPE on SQLConf.get when SparkContext is stopping in another thread

2019-09-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29046.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25753
[https://github.com/apache/spark/pull/25753]

> Possible NPE on SQLConf.get when SparkContext is stopping in another thread
> ---
>
> Key: SPARK-29046
> URL: https://issues.apache.org/jira/browse/SPARK-29046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> We encountered an NPE in listener code that deals with the query plan. According
> to the stack trace below, the only possible cause of the NPE is
> SparkContext._dagScheduler being null, which is only possible while the
> SparkContext is stopping (unless null is set from outside).
>  
> {code:java}
> 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
> 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
> 19/09/11 00:22:24 INFO ui.SparkUI: Stopped Spark web UI at http://:32770
> 19/09/11 00:22:24 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
> 19/09/11 00:22:24 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
> 19/09/11 00:22:24 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices(serviceOption=None, services=List(), started=false)
> 19/09/11 00:22:24 WARN sql.SparkExecutionPlanProcessor: Caught exception during parsing event
> java.lang.NullPointerException at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> scala.Option.map(Option.scala:146) at 
> org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:133) at 
> org.apache.spark.sql.types.StructType.simpleString(StructType.scala:352) at 
> com.hortonworks.spark.atlas.types.internal$.sparkTableToEntity(internal.scala:102)
>  at 
> com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:62)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:240)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:239)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities(CommandsHarvester.scala:239)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$CreateDataSourceTableAsSelectHarvester$.harvest(CommandsHarvester.scala:104)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:138)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
>  at scala.Option.foreach(Option.scala:257) at 
> 

[jira] [Assigned] (SPARK-29046) Possible NPE on SQLConf.get when SparkContext is stopping in another thread

2019-09-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-29046:


Assignee: Jungtaek Lim

> Possible NPE on SQLConf.get when SparkContext is stopping in another thread
> ---
>
> Key: SPARK-29046
> URL: https://issues.apache.org/jira/browse/SPARK-29046
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
>
> We encountered an NPE in listener code that deals with the query plan. According
> to the stack trace below, the only possible cause of the NPE is
> SparkContext._dagScheduler being null, which is only possible while the
> SparkContext is stopping (unless null is set from outside).
>  
> {code:java}
> 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
> 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
> 19/09/11 00:22:24 INFO ui.SparkUI: Stopped Spark web UI at http://:32770
> 19/09/11 00:22:24 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
> 19/09/11 00:22:24 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
> 19/09/11 00:22:24 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices(serviceOption=None, services=List(), started=false)
> 19/09/11 00:22:24 WARN sql.SparkExecutionPlanProcessor: Caught exception during parsing event
> java.lang.NullPointerException at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at 
> scala.Option.map(Option.scala:146) at 
> org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:133) at 
> org.apache.spark.sql.types.StructType.simpleString(StructType.scala:352) at 
> com.hortonworks.spark.atlas.types.internal$.sparkTableToEntity(internal.scala:102)
>  at 
> com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:62)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:240)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:239)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities(CommandsHarvester.scala:239)
>  at 
> com.hortonworks.spark.atlas.sql.CommandsHarvester$CreateDataSourceTableAsSelectHarvester$.harvest(CommandsHarvester.scala:104)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:138)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at 
> scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
>  at 
> com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
>  at 
> com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
>  at scala.Option.foreach(Option.scala:257) at 
> com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71)
>  at 
> 

[jira] [Updated] (SPARK-29050) Fix typo in some docs

2019-09-11 Thread dengziming (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dengziming updated SPARK-29050:
---
Issue Type: Improvement  (was: Bug)

> Fix typo in some docs
> -
>
> Key: SPARK-29050
> URL: https://issues.apache.org/jira/browse/SPARK-29050
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.3.3, 2.4.3, 3.0.0
>Reporter: dengziming
>Priority: Trivial
>
> Change 'a hdfs' to 'an hdfs'
> Change 'an unique' to 'a unique'
> Change 'an url' to 'a url'
> Change 'a error' to 'an error'



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29050) Fix typo in some docs

2019-09-11 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928121#comment-16928121
 ] 

dengziming commented on SPARK-29050:


[~srowen] thank you!

> Fix typo in some docs
> -
>
> Key: SPARK-29050
> URL: https://issues.apache.org/jira/browse/SPARK-29050
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.3.3, 2.4.3, 3.0.0
>Reporter: dengziming
>Priority: Trivial
>
> Change 'a hdfs' to 'an hdfs'
> Change 'an unique' to 'a unique'
> Change 'an url' to 'a url'
> Change 'a error' to 'an error'



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29050) Fix typo in some docs

2019-09-11 Thread dengziming (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dengziming updated SPARK-29050:
---
Issue Type: Bug  (was: Improvement)

> Fix typo in some docs
> -
>
> Key: SPARK-29050
> URL: https://issues.apache.org/jira/browse/SPARK-29050
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.3, 2.4.3, 3.0.0
>Reporter: dengziming
>Priority: Trivial
>
> Change 'a hdfs' to 'an hdfs'
> Change 'an unique' to 'a unique'
> Change 'an url' to 'a url'
> Change 'a error' to 'an error'



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29062) Add V1_BATCH_WRITE to the TableCapabilityChecks in the Analyzer

2019-09-11 Thread Burak Yavuz (Jira)
Burak Yavuz created SPARK-29062:
---

 Summary: Add V1_BATCH_WRITE to the TableCapabilityChecks in the 
Analyzer
 Key: SPARK-29062
 URL: https://issues.apache.org/jira/browse/SPARK-29062
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Burak Yavuz


Currently the checks in the Analyzer require that V2 tables define BATCH_WRITE, 
even for tables that only provide V1 write fallbacks. This is confusing, as 
these tables may not have the V2 writer interface implemented yet.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29061) Prints bytecode statistics in debugCodegen

2019-09-11 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-29061:


 Summary: Prints bytecode statistics in debugCodegen
 Key: SPARK-29061
 URL: https://issues.apache.org/jira/browse/SPARK-29061
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Takeshi Yamamuro


This ticket aims to print bytecode statistics (max class bytecode size, max 
method bytecode size, and max constant pool size) for generated classes in the 
debug output of {{debugCodegen}}. 
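For context, a minimal sketch of how the codegen debug output is produced today; the proposed statistics would be appended to this output:

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._

val spark = SparkSession.builder().appName("debug-codegen-sketch").getOrCreate()
val df = spark.range(0L, 100L).selectExpr("id", "id * 2 AS doubled")

// Prints the generated Java code for each WholeStageCodegen subtree; the ticket
// proposes also printing max method/class bytecode sizes and constant pool size.
df.debugCodegen()
{code}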



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29057) remove InsertIntoTable

2019-09-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29057.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25763
[https://github.com/apache/spark/pull/25763]

> remove InsertIntoTable
> --
>
> Key: SPARK-29057
> URL: https://issues.apache.org/jira/browse/SPARK-29057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View

2019-09-11 Thread Adrian Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928100#comment-16928100
 ] 

Adrian Wang commented on SPARK-29038:
-

This seems to duplicate our proposal in SPARK-26764. We have implemented 
similar features and already have them running in our customer's production 
environment.

> SPIP: Support Spark Materialized View
> -
>
> Key: SPARK-29038
> URL: https://issues.apache.org/jira/browse/SPARK-29038
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> A materialized view is an important approach in a DBMS for caching data to
> accelerate queries. By creating a materialized view through SQL, the data to
> be cached can be chosen very flexibly and configured for specific usage
> scenarios. The Materialization Manager automatically updates the cached data
> according to changes in the detail source tables, simplifying the user's work.
> When a user submits a query, the Spark optimizer rewrites the execution plan
> based on the available materialized views to determine the optimal execution
> plan.
> Details in [design 
> doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29041) Allow createDataFrame to accept bytes as binary type

2019-09-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29041.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25749
[https://github.com/apache/spark/pull/25749]

> Allow createDataFrame to accept bytes as binary type
> 
>
> Key: SPARK-29041
> URL: https://issues.apache.org/jira/browse/SPARK-29041
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> {code}
> spark.createDataFrame([[b"abcd"]], "col binary")
> {code}
> simply fails as below:
> in Python 3
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 787, in 
> createDataFrame
> rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 442, in 
> _createFromLocal
> data = list(data)
>   File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
> verify_func(obj)
>   File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
> verifier(v)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
> verify_acceptable_types(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1282, in 
> verify_acceptable_types
> % (dataType, obj, type(obj
> TypeError: field col: BinaryType can not accept object b'abcd' in type <class 'bytes'>
> {code}
> in Python 2:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 787, in 
> createDataFrame
> rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 442, in 
> _createFromLocal
> data = list(data)
>   File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
> verify_func(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
> verifier(v)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
> verify_acceptable_types(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1282, in 
> verify_acceptable_types
> % (dataType, obj, type(obj
> TypeError: field col: BinaryType can not accept object 'abcd' in type <type 'str'>
> {code}
> {{bytes}} should also be accepted as a binary type.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29041) Allow createDataFrame to accept bytes as binary type

2019-09-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-29041:


Assignee: Hyukjin Kwon

> Allow createDataFrame to accept bytes as binary type
> 
>
> Key: SPARK-29041
> URL: https://issues.apache.org/jira/browse/SPARK-29041
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> {code}
> spark.createDataFrame([[b"abcd"]], "col binary")
> {code}
> simply fails as below:
> in Python 3
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 787, in 
> createDataFrame
> rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 442, in 
> _createFromLocal
> data = list(data)
>   File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
> verify_func(obj)
>   File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
> verifier(v)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
> verify_acceptable_types(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1282, in 
> verify_acceptable_types
> % (dataType, obj, type(obj
> TypeError: field col: BinaryType can not accept object b'abcd' in type <class 'bytes'>
> {code}
> in Python 2:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 787, in 
> createDataFrame
> rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 442, in 
> _createFromLocal
> data = list(data)
>   File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
> verify_func(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
> verifier(v)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
> verify_acceptable_types(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1282, in 
> verify_acceptable_types
> % (dataType, obj, type(obj
> TypeError: field col: BinaryType can not accept object 'abcd' in type <type 'str'>
> {code}
> {{bytes}} should also be accepted as a binary type.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24663) Flaky test: StreamingContextSuite "stop slow receiver gracefully"

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-24663:
--

Assignee: Jungtaek Lim

> Flaky test: StreamingContextSuite "stop slow receiver gracefully"
> -
>
> Key: SPARK-24663
> URL: https://issues.apache.org/jira/browse/SPARK-24663
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Minor
>
> This is another test that sometimes fails on our build machines, although I 
> can't find failures on the riselab jenkins servers. Failure looks like:
> {noformat}
> org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply$mcV$sp(StreamingContextSuite.scala:356)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
> {noformat}
> The test fails in about 2s, while a successful run generally takes 15s. 
> Looking at the logs, the receiver hasn't even started when things fail, which 
> points at a race during test initialization.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24663) Flaky test: StreamingContextSuite "stop slow receiver gracefully"

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-24663.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25725
[https://github.com/apache/spark/pull/25725]

> Flaky test: StreamingContextSuite "stop slow receiver gracefully"
> -
>
> Key: SPARK-24663
> URL: https://issues.apache.org/jira/browse/SPARK-24663
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> This is another test that sometimes fails on our build machines, although I 
> can't find failures on the riselab jenkins servers. Failure looks like:
> {noformat}
> org.scalatest.exceptions.TestFailedException: 0 was not greater than 0
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply$mcV$sp(StreamingContextSuite.scala:356)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
>   at 
> org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335)
> {noformat}
> The test fails in about 2s, while a successful run generally takes 15s. 
> Looking at the logs, the receiver hasn't even started when things fail, which 
> points at a race during test initialization.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.<init>()V

2019-09-11 Thread Michael Heuer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921700#comment-16921700
 ] 

Michael Heuer edited comment on SPARK-27781 at 9/11/19 7:47 PM:


-This is still an issue with the Spark 2.4.4 binary distribution for Scala 2.12 
without Hadoop.-
 https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/3047/


was (Author: heuermh):
-This is still an issue with the Spark 2.4.4 binary distribution for Scala 2.12 
without Hadoop.-
 [-https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/3047/-]

> Tried to access method org.apache.avro.specific.SpecificData.<init>()V
> --
>
> Key: SPARK-27781
> URL: https://issues.apache.org/jira/browse/SPARK-27781
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Michael Heuer
>Priority: Major
> Fix For: 2.4.4
>
> Attachments: reproduce.sh
>
>
> It appears that there is a conflict in avro dependency versions at runtime 
> when using Spark 2.4.3 and Scala 2.12 
> (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop 
> 2.7.7.
>  
> Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes 
> avro-1.8.2.jar
> {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}}
> {{jars/avro-1.8.2.jar}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
> {{jars/avro-ipc-1.8.2.jar}}
>  
> Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop 
> does not
> {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
>  
> Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which 
> conflicts at runtime
> {{$ find hadoop-2.7.7 -name *.jar | grep avro}}
> {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}}
> {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}}
> {{share/hadoop/tools/lib/avro-1.7.4.jar}}
> {{share/hadoop/common/lib/avro-1.7.4.jar}}
> {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}}
>  
> Issue filed downstream in
> [https://github.com/bigdatagenomics/adam/issues/2151]
>  
> Attached a smaller reproducing test case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.<init>()V

2019-09-11 Thread Michael Heuer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Heuer resolved SPARK-27781.
---
Fix Version/s: 2.4.4
   Resolution: Fixed

> Tried to access method org.apache.avro.specific.SpecificData.<init>()V
> --
>
> Key: SPARK-27781
> URL: https://issues.apache.org/jira/browse/SPARK-27781
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Michael Heuer
>Priority: Major
> Fix For: 2.4.4
>
> Attachments: reproduce.sh
>
>
> It appears that there is a conflict in avro dependency versions at runtime 
> when using Spark 2.4.3 and Scala 2.12 
> (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop 
> 2.7.7.
>  
> Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes 
> avro-1.8.2.jar
> {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}}
> {{jars/avro-1.8.2.jar}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
> {{jars/avro-ipc-1.8.2.jar}}
>  
> Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop 
> does not
> {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
>  
> Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which 
> conflicts at runtime
> {{$ find hadoop-2.7.7 -name *.jar | grep avro}}
> {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}}
> {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}}
> {{share/hadoop/tools/lib/avro-1.7.4.jar}}
> {{share/hadoop/common/lib/avro-1.7.4.jar}}
> {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}}
>  
> Issue filed downstream in
> [https://github.com/bigdatagenomics/adam/issues/2151]
>  
> Attached a smaller reproducing test case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.<init>()V

2019-09-11 Thread Michael Heuer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921700#comment-16921700
 ] 

Michael Heuer edited comment on SPARK-27781 at 9/11/19 7:46 PM:


-This is still an issue with the Spark 2.4.4 binary distribution for Scala 2.12 
without Hadoop.-
 [-https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/3047/-]


was (Author: heuermh):
This is still an issue with the Spark 2.4.4 binary distribution for Scala 2.12 
without Hadoop.
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/3047/

> Tried to access method org.apache.avro.specific.SpecificData.<init>()V
> --
>
> Key: SPARK-27781
> URL: https://issues.apache.org/jira/browse/SPARK-27781
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3, 2.4.4
>Reporter: Michael Heuer
>Priority: Major
> Attachments: reproduce.sh
>
>
> It appears that there is a conflict in avro dependency versions at runtime 
> when using Spark 2.4.3 and Scala 2.12 
> (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop 
> 2.7.7.
>  
> Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes 
> avro-1.8.2.jar
> {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}}
> {{jars/avro-1.8.2.jar}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
> {{jars/avro-ipc-1.8.2.jar}}
>  
> Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop 
> does not
> {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
>  
> Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which 
> conflicts at runtime
> {{$ find hadoop-2.7.7 -name *.jar | grep avro}}
> {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}}
> {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}}
> {{share/hadoop/tools/lib/avro-1.7.4.jar}}
> {{share/hadoop/common/lib/avro-1.7.4.jar}}
> {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}}
>  
> Issue filed downstream in
> [https://github.com/bigdatagenomics/adam/issues/2151]
>  
> Attached a smaller reproducing test case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.<init>()V

2019-09-11 Thread Michael Heuer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927953#comment-16927953
 ] 

Michael Heuer commented on SPARK-27781:
---

This issue has been fixed in Spark 2.4.4, and is fixed in the ADAM Jenkins CI:

https://github.com/bigdatagenomics/adam/pull/2206

> Tried to access method org.apache.avro.specific.SpecificData.<init>()V
> --
>
> Key: SPARK-27781
> URL: https://issues.apache.org/jira/browse/SPARK-27781
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3, 2.4.4
>Reporter: Michael Heuer
>Priority: Major
> Attachments: reproduce.sh
>
>
> It appears that there is a conflict in avro dependency versions at runtime 
> when using Spark 2.4.3 and Scala 2.12 
> (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop 
> 2.7.7.
>  
> Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes 
> avro-1.8.2.jar
> {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}}
> {{jars/avro-1.8.2.jar}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
> {{jars/avro-ipc-1.8.2.jar}}
>  
> Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop 
> does not
> {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
>  
> Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which 
> conflicts at runtime
> {{$ find hadoop-2.7.7 -name *.jar | grep avro}}
> {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}}
> {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}}
> {{share/hadoop/tools/lib/avro-1.7.4.jar}}
> {{share/hadoop/common/lib/avro-1.7.4.jar}}
> {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}}
>  
> Issue filed downstream in
> [https://github.com/bigdatagenomics/adam/issues/2151]
>  
> Attached a smaller reproducing test case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.<init>()V

2019-09-11 Thread Michael Heuer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Heuer updated SPARK-27781:
--
Affects Version/s: (was: 2.4.4)

> Tried to access method org.apache.avro.specific.SpecificData.<init>()V
> --
>
> Key: SPARK-27781
> URL: https://issues.apache.org/jira/browse/SPARK-27781
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Michael Heuer
>Priority: Major
> Attachments: reproduce.sh
>
>
> It appears that there is a conflict in avro dependency versions at runtime 
> when using Spark 2.4.3 and Scala 2.12 
> (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop 
> 2.7.7.
>  
> Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes 
> avro-1.8.2.jar
> {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}}
> {{jars/avro-1.8.2.jar}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
> {{jars/avro-ipc-1.8.2.jar}}
>  
> Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop 
> does not
> {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}}
> {{jars/avro-mapred-1.8.2-hadoop2.jar}}
>  
> Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which 
> conflicts at runtime
> {{$ find hadoop-2.7.7 -name *.jar | grep avro}}
> {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}}
> {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}}
> {{share/hadoop/tools/lib/avro-1.7.4.jar}}
> {{share/hadoop/common/lib/avro-1.7.4.jar}}
> {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}}
>  
> Issue filed downstream in
> [https://github.com/bigdatagenomics/adam/issues/2151]
>  
> Attached a smaller reproducing test case.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927912#comment-16927912
 ] 

koert kuipers commented on SPARK-29027:
---

[~gsomogyi] If you email me at koert at tresata dot com, I can send the logs.

> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Priority: Minor
>
> i am seeing consistent failure of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  5.456 
> s]
> [INFO] Spark Project Networking 

[jira] [Created] (SPARK-29060) Add tree traversal helper for adaptive spark plans

2019-09-11 Thread Maryann Xue (Jira)
Maryann Xue created SPARK-29060:
---

 Summary: Add tree traversal helper for adaptive spark plans
 Key: SPARK-29060
 URL: https://issues.apache.org/jira/browse/SPARK-29060
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maryann Xue






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29059) Support for Hive Materialized Views in Spark SQL.

2019-09-11 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated SPARK-29059:
--
Description: 
Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark 
Catalyst does not optimize queries against Hive tables using Materialized View 
the way Apache Calcite does it for Hive. This Jira is to add support for the 
same.

We have developed it in our internal trunk and would like to open source it. It 
would consist of 3 major parts:
 # Reading MV related Hive Metadata
 # Implication Engine, which would figure out whether an expression exp1 implies 
another expression exp2, i.e., whether exp1 => exp2 is a tautology. This is similar 
to the RexImplication checker in Apache Calcite.
 # Catalyst rule to replace tables by their Materialized View using the Implication 
Engine. For example, if MV 'mv' has been created in Hive using the query 'select * 
from foo where x > 10 && x < 110', then the query 'select * from foo where x > 70 
and x < 100' will be transformed into 'select * from mv where x > 70 and x < 100'.

Note that the Implication Engine and the Catalyst rule are generic and can be used even when 
Spark decides to have its own Materialized View.

  was:
Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark 
Catalyst does not optimize queries against Hive tables using Materialized View 
the way Apache Calcite does it for Hive. This Jira is to add support for the 
same.

We have developed it in our internal track would like to open source it. It 
would consist of 3 major parts:
 # Reading MV related Hive Metadata
 # Implication Engine which would figure out if an expression exp1 implies 
another expression exp2 i.e., if exp1 => exp2 is a tautology. This is similar 
to RexImplication checker in Apache Calcite.
 # Catalyst rule to replace tables by it's Materialized view using Implication 
Engine. For e.g., if MV 'mv' has been created in Hive using query 'select * 
from foo where x > 10 && x <110'  then query 'select * from foo where x > 70 
and x < 100' will be transformed into 'select * from mv where x >70 and x < 100'

Note that Implication Engine and Catalyst Rule is generic can be used even when 
Spark decides to have it's own Materialized View.


> Support for Hive Materialized Views in Spark SQL.
> -
>
> Key: SPARK-29059
> URL: https://issues.apache.org/jira/browse/SPARK-29059
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Amogh Margoor
>Priority: Minor
>
> Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark 
> Catalyst does not optimize queries against Hive tables using Materialized 
> View the way Apache Calcite does it for Hive. This Jira is to add support for 
> the same.
> We have developed it in our internal trunk and would like to open source it. 
> It would consist of 3 major parts:
>  # Reading MV related Hive Metadata
>  # Implication Engine, which would figure out whether an expression exp1 implies 
> another expression exp2, i.e., whether exp1 => exp2 is a tautology. This is similar 
> to the RexImplication checker in Apache Calcite.
>  # Catalyst rule to replace tables by their Materialized View using the 
> Implication Engine. For example, if MV 'mv' has been created in Hive using the query 
> 'select * from foo where x > 10 && x < 110', then the query 'select * from foo 
> where x > 70 and x < 100' will be transformed into 'select * from mv where 
> x > 70 and x < 100'.
> Note that the Implication Engine and the Catalyst rule are generic and can be used even 
> when Spark decides to have its own Materialized View.
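
To make the Implication Engine idea concrete, here is a toy sketch in Scala (an 
illustration only, not the proposed engine and not any Spark API) that checks 
containment for the single-column range predicates used in the example above:
{code}
// Toy containment check for open ranges lower < x < upper; all names are illustrative.
final case class Interval(lower: Int, upper: Int)

// query implies mv iff the query's range is contained in the MV's range
def implies(query: Interval, mv: Interval): Boolean =
  query.lower >= mv.lower && query.upper <= mv.upper

val mvPredicate    = Interval(10, 110)  // mv:    x > 10 and x < 110
val queryPredicate = Interval(70, 100)  // query: x > 70 and x < 100

// prints true, so the query could be rewritten to scan 'mv' instead of 'foo'
println(implies(queryPredicate, mvPredicate))
{code}
In the real rule the check would run over Catalyst expressions rather than plain 
integers, but the containment reasoning is the same.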



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29059) Support for Hive Materialized Views in Spark SQL.

2019-09-11 Thread Amogh Margoor (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amogh Margoor updated SPARK-29059:
--
Summary: Support for Hive Materialized Views in Spark SQL.  (was: Support 
for Hive Materialized Views for Spark SQL.)

> Support for Hive Materialized Views in Spark SQL.
> -
>
> Key: SPARK-29059
> URL: https://issues.apache.org/jira/browse/SPARK-29059
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Amogh Margoor
>Priority: Minor
>
> Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark 
> Catalyst does not optimize queries against Hive tables using Materialized 
> View the way Apache Calcite does it for Hive. This Jira is to add support for 
> the same.
> We have developed it in our internal track would like to open source it. It 
> would consist of 3 major parts:
>  # Reading MV related Hive Metadata
>  # Implication Engine which would figure out if an expression exp1 implies 
> another expression exp2 i.e., if exp1 => exp2 is a tautology. This is similar 
> to RexImplication checker in Apache Calcite.
>  # Catalyst rule to replace tables by it's Materialized view using 
> Implication Engine. For e.g., if MV 'mv' has been created in Hive using query 
> 'select * from foo where x > 10 && x <110'  then query 'select * from foo 
> where x > 70 and x < 100' will be transformed into 'select * from mv where x 
> >70 and x < 100'
> Note that Implication Engine and Catalyst Rule is generic can be used even 
> when Spark decides to have it's own Materialized View.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29059) Support for Hive Materialized Views for Spark SQL.

2019-09-11 Thread Amogh Margoor (Jira)
Amogh Margoor created SPARK-29059:
-

 Summary: Support for Hive Materialized Views for Spark SQL.
 Key: SPARK-29059
 URL: https://issues.apache.org/jira/browse/SPARK-29059
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Amogh Margoor


Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark 
Catalyst does not optimize queries against Hive tables using Materialized View 
the way Apache Calcite does it for Hive. This Jira is to add support for the 
same.

We have developed it in our internal track would like to open source it. It 
would consist of 3 major parts:
 # Reading MV related Hive Metadata
 # Implication Engine which would figure out if an expression exp1 implies 
another expression exp2 i.e., if exp1 => exp2 is a tautology. This is similar 
to RexImplication checker in Apache Calcite.
 # Catalyst rule to replace tables by it's Materialized view using Implication 
Engine. For e.g., if MV 'mv' has been created in Hive using query 'select * 
from foo where x > 10 && x <110'  then query 'select * from foo where x > 70 
and x < 100' will be transformed into 'select * from mv where x >70 and x < 100'

Note that Implication Engine and Catalyst Rule is generic can be used even when 
Spark decides to have it's own Materialized View.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927889#comment-16927889
 ] 

koert kuipers commented on SPARK-29027:
---

Just for this one test, the debug logs are 62 MB of Kerberos and LDAP output. It's 
difficult to say what's sensitive and what's not.

> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Priority: Minor
>
> i am seeing consistent failure of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  5.456 
> 

[jira] [Created] (SPARK-29058) Reading csv file with DROPMALFORMED showing incorrect record count

2019-09-11 Thread Suchintak Patnaik (Jira)
Suchintak Patnaik created SPARK-29058:
-

 Summary: Reading csv file with DROPMALFORMED showing incorrect 
record count
 Key: SPARK-29058
 URL: https://issues.apache.org/jira/browse/SPARK-29058
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 2.3.0
Reporter: Suchintak Patnaik


The Spark SQL CSV reader drops malformed records as expected, but the reported 
record count is incorrect.

Consider this file (fruit.csv)

apple,red,1,3
banana,yellow,2,4.56
orange,orange,3,5

Defining schema as follows:

schema = "Fruit string,color string,price int,quantity int"

Notice that the "quantity" field is defined as integer type, but the 2nd row in 
the file contains a floating point value, hence it is a corrupt record.


>>> df = spark.read.csv(path="fruit.csv",mode="DROPMALFORMED",schema=schema)
>>> df.show()
+------+------+-----+--------+
| Fruit| color|price|quantity|
+------+------+-----+--------+
| apple|   red|    1|       3|
|orange|orange|    3|       5|
+------+------+-----+--------+

>>> df.count()
3

The malformed record is dropped as expected, but an incorrect record count is 
displayed.

Here df.count() should return 2.
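
For reference, an equivalent Scala sketch with a commonly used workaround is below. 
The mismatch is consistent with count() requiring no columns, so the malformed value 
is never parsed and the row is never dropped; forcing full materialization (for 
example by caching) makes the count agree with show(). This is a hedged sketch under 
that assumption, not a confirmed root-cause analysis for this ticket:
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dropmalformed-count").getOrCreate()

// Same schema and file as the snippet above.
val df = spark.read
  .schema("Fruit string, color string, price int, quantity int")
  .option("mode", "DROPMALFORMED")
  .csv("fruit.csv")

df.count()          // may report 3: no columns are needed, so rows are not fully parsed
df.cache().count()  // reports 2: caching materializes all columns, so the malformed row is dropped
{code}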




 

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29007) Possible leak of SparkContext in tests / test suites initializing StreamingContext

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-29007.

Fix Version/s: 3.0.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

> Possible leak of SparkContext in tests / test suites initializing 
> StreamingContext
> --
>
> Key: SPARK-29007
> URL: https://issues.apache.org/jira/browse/SPARK-29007
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, MLlib, Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> There are lots of tests that create a StreamingContext, which creates a new 
> SparkContext in its constructor, and we don't have enough guards to prevent 
> leakage of the SparkContext across test suites. Ideally we should ensure the 
> SparkContext is not leaked between test suites, or even between tests if each 
> test creates a StreamingContext.
>  
> One example of such leakage is below:
> {noformat}
> [info] *** 4 SUITES ABORTED ***
> [info] *** 131 TESTS FAILED ***
> [error] Error: Total 418, Failed 131, Errors 4, Passed 283, Ignored 1
> [error] Failed tests:
> [error]   org.apache.spark.streaming.scheduler.JobGeneratorSuite
> [error]   org.apache.spark.streaming.ReceiverInputDStreamSuite
> [error]   org.apache.spark.streaming.WindowOperationsSuite
> [error]   org.apache.spark.streaming.StreamingContextSuite
> [error]   org.apache.spark.streaming.scheduler.ReceiverTrackerSuite
> [error]   org.apache.spark.streaming.CheckpointSuite
> [error]   org.apache.spark.streaming.UISeleniumSuite
> [error]   
> org.apache.spark.streaming.scheduler.ExecutorAllocationManagerSuite
> [error]   org.apache.spark.streaming.ReceiverSuite
> [error]   org.apache.spark.streaming.BasicOperationsSuite
> [error]   org.apache.spark.streaming.InputStreamsSuite
> [error] Error during tests:
> [error]   org.apache.spark.streaming.MapWithStateSuite
> [error]   org.apache.spark.streaming.DStreamScopeSuite
> [error]   org.apache.spark.streaming.rdd.MapWithStateRDDSuite
> [error]   org.apache.spark.streaming.scheduler.InputInfoTrackerSuite
>  {noformat}
> {noformat}
> [info] JobGeneratorSuite:
> [info] - SPARK-6222: Do not clear received block data too soon *** FAILED *** 
> (2 milliseconds)
> [info]   org.apache.spark.SparkException: Only one SparkContext should be 
> running in this JVM (see SPARK-2243).The currently running SparkContext was 
> created at:
> [info] org.apache.spark.SparkContext.(SparkContext.scala:82)
> [info] 
> org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:851)
> [info] 
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:85)
> [info] 
> org.apache.spark.streaming.TestSuiteBase.setupStreams(TestSuiteBase.scala:317)
> [info] 
> org.apache.spark.streaming.TestSuiteBase.setupStreams$(TestSuiteBase.scala:311)
> [info] 
> org.apache.spark.streaming.CheckpointSuite.setupStreams(CheckpointSuite.scala:209)
> [info] 
> org.apache.spark.streaming.CheckpointSuite.$anonfun$new$3(CheckpointSuite.scala:258)
> [info] scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info] org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
> [info] org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> [info] org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
> [info] org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
> [info]   at 
> org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2512)
> [info]   at scala.Option.foreach(Option.scala:274)
> [info]   at 
> org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2509)
> [info]   at 
> org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2586)
> [info]   at org.apache.spark.SparkContext.(SparkContext.scala:87)
> [info]   at 
> org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:851)
> [info]   at 
> org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:85)
> [info]   at 
> 
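
The description above points at a missing guard rather than a specific product bug; 
as an illustration of the kind of guard it calls for, here is a minimal sketch (not 
the actual fix merged for this ticket), assuming ScalaTest's BeforeAndAfterEach mixin 
and the public StreamingContext.getActive() API:
{code}
import org.apache.spark.streaming.StreamingContext
import org.scalatest.{BeforeAndAfterEach, Suite}

// Stop any StreamingContext, and the SparkContext it owns, after every test
// so neither can leak into the next test or suite.
trait StopActiveContexts extends BeforeAndAfterEach { self: Suite =>
  override protected def afterEach(): Unit = {
    try {
      StreamingContext.getActive().foreach(_.stop(stopSparkContext = true))
    } finally {
      super.afterEach()
    }
  }
}
{code}
A suite would simply mix this trait in; anything heavier, such as tracking contexts 
across suites, would need support in the shared test harness.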

[jira] [Resolved] (SPARK-26989) Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-26989.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25706
[https://github.com/apache/spark/pull/25706]

> Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage 
> attempt don't trigger multiple stage retries
> ---
>
> Key: SPARK-26989
> URL: https://issues.apache.org/jira/browse/SPARK-26989
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102761/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/Barrier_task_failures_from_the_same_stage_attempt_don_t_trigger_multiple_stage_retries/
> {noformat}
> org.apache.spark.scheduler.DAGSchedulerSuite.Barrier task failures from the 
> same stage attempt don't trigger multiple stage retries
> Error Message
> org.scalatest.exceptions.TestFailedException: ArrayBuffer() did not equal 
> List(0)
> Stacktrace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
> ArrayBuffer() did not equal List(0)
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2644)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:104)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:122)
> {noformat}
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109303/consoleFull
> {code}
> - Barrier task failures from the same stage attempt don't trigger multiple 
> stage retries *** FAILED ***
>   ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26989) Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries

2019-09-11 Thread Marcelo Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-26989:
--

Assignee: Jungtaek Lim

> Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage 
> attempt don't trigger multiple stage retries
> ---
>
> Key: SPARK-26989
> URL: https://issues.apache.org/jira/browse/SPARK-26989
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102761/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/Barrier_task_failures_from_the_same_stage_attempt_don_t_trigger_multiple_stage_retries/
> {noformat}
> org.apache.spark.scheduler.DAGSchedulerSuite.Barrier task failures from the 
> same stage attempt don't trigger multiple stage retries
> Error Message
> org.scalatest.exceptions.TestFailedException: ArrayBuffer() did not equal 
> List(0)
> Stacktrace
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
> ArrayBuffer() did not equal List(0)
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2644)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:104)
>   at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:122)
> {noformat}
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109303/consoleFull
> {code}
> - Barrier task failures from the same stage attempt don't trigger multiple 
> stage retries *** FAILED ***
>   ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927822#comment-16927822
 ] 

Gabor Somogyi commented on SPARK-29027:
---

You can remove the sensitive parts; or, if you only trust me, that's fine too, but you 
lose the benefit of community knowledge. Maybe somebody would pinpoint the 
issue right away.

> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Priority: Minor
>
> i am seeing consistent failure of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local 

[jira] [Created] (SPARK-29057) remove InsertIntoTable

2019-09-11 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-29057:
---

 Summary: remove InsertIntoTable
 Key: SPARK-29057
 URL: https://issues.apache.org/jira/browse/SPARK-29057
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29014) DataSourceV2: Clean up current, default, and session catalog uses

2019-09-11 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927796#comment-16927796
 ] 

Wenchen Fan commented on SPARK-29014:
-

It doesn't require a major refactor but it's easier and cleaner to make this 
change with a refactor that centralizes the catalog/table lookup logic.

> DataSourceV2: Clean up current, default, and session catalog uses
> -
>
> Key: SPARK-29014
> URL: https://issues.apache.org/jira/browse/SPARK-29014
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ryan Blue
>Priority: Blocker
>
> Catalog tracking in DSv2 has evolved since the initial changes went in. We 
> need to make sure that handling is consistent across plans using the latest 
> rules:
>  * The _current_ catalog should be used when no catalog is specified
>  * The _default_ catalog is the catalog _current_ is initialized to
>  * If the _default_ catalog is not set, then it is the built-in Spark session 
> catalog, which will be called `spark_catalog` (This is the v2 session catalog)
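
As a concrete illustration of these rules, the sketch below shows how a v2 catalog 
and the default catalog are wired up through configuration; the keys shown are the 
ones used by Spark 3.0 (spark.sql.catalog.<name> and spark.sql.defaultCatalog, 
though the exact default-catalog key was still settling on master at the time), and 
the catalog name mycat and its implementation class are hypothetical placeholders:
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dsv2-catalog-sketch")
  // Register a v2 catalog plugin under the name "mycat" (class name is a placeholder).
  .config("spark.sql.catalog.mycat", "com.example.MyCatalogPlugin")
  // Make "mycat" the default catalog, i.e. what the current catalog is initialized to.
  .config("spark.sql.defaultCatalog", "mycat")
  .getOrCreate()

// Unqualified names resolve against the current catalog ("mycat" here);
// the built-in v2 session catalog stays reachable as "spark_catalog".
spark.sql("SELECT * FROM db.tbl")                    // resolved via mycat
spark.sql("SELECT * FROM spark_catalog.default.tbl") // built-in session catalog
{code}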



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View

2019-09-11 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927784#comment-16927784
 ] 

Xiao Li commented on SPARK-29038:
-

So far, the doc does not contain enough detail. It requires a comprehensive 
comparison with the corresponding features in other commercial databases. We 
also need to document how to implement them one by one.

Also, based on my understanding, the materialized view should not be 
memory-based; it has to be physically stored. Usage of the Spark cache could affect 
other memory-intensive queries, and any major feature built on cache usage requires 
a memory manager.

I am not against this, but the effort to support this feature is pretty 
big.

> SPIP: Support Spark Materialized View
> -
>
> Key: SPARK-29038
> URL: https://issues.apache.org/jira/browse/SPARK-29038
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> Materialized views are an important approach in DBMSs to cache data and 
> accelerate queries. By creating a materialized view through SQL, the data 
> that can be cached is very flexible and can be configured freely 
> according to specific usage scenarios. The Materialization Manager 
> automatically updates the cached data according to changes in the detail source 
> tables, simplifying the user's work. When a user submits a query, the Spark 
> optimizer rewrites the execution plan based on the available materialized views 
> to determine the optimal execution plan.
> Details in [design 
> doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927785#comment-16927785
 ] 

koert kuipers commented on SPARK-29027:
---

[~gsomogyi] I can email you the debug log file directly if that's OK. I'd rather not 
post it publicly.

> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Priority: Minor
>
> i am seeing consistent failure of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  5.456 
> s]
> [INFO] Spark Project 

[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927783#comment-16927783
 ] 

koert kuipers commented on SPARK-29027:
---

I get the same error in sbt, I think, plus I find sbt a lot easier to handle :)
{code}
[info] KafkaDelegationTokenSuite:
[info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** 
(10 seconds, 543 milliseconds)
[info]   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication 
failure
[info]   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
[info]   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
[info]   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
[info]   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157)
[info]   at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131)
[info]   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
[info]   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
[info]   at 
org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
[info]   at 
org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
[info]   at 
org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
[info]   at 
org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
[info]   at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
[info]   at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:507)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[info]   at java.lang.Thread.run(Thread.java:748)
org.apache.directory.api.ldap.model.exception.LdapOperationErrorException: 
/home/koert/src/spark/target/tmp/spark-dc223dd0-e499-4ccf-9600-c70e4706a909/1568218986864/partitions/system/1.3.6.1.4.1.18060.0.4.1.2.50.lg
 (No such file or directory)
at 
org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.modify(AbstractBTreePartition.java:1183)
at 
org.apache.directory.server.core.shared.partition.DefaultPartitionNexus.sync(DefaultPartitionNexus.java:335)
at 
org.apache.directory.server.core.DefaultDirectoryService.shutdown(DefaultDirectoryService.java:1299)
at 
org.apache.directory.server.core.DefaultDirectoryService$1.run(DefaultDirectoryService.java:1230)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: 
/home/koert/src/spark/target/tmp/spark-dc223dd0-e499-4ccf-9600-c70e4706a909/1568218986864/partitions/system/1.3.6.1.4.1.18060.0.4.1.2.50.lg
 (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.(FileOutputStream.java:213)
at java.io.FileOutputStream.(FileOutputStream.java:101)
at jdbm.recman.TransactionManager.open(TransactionManager.java:209)
at 
jdbm.recman.TransactionManager.synchronizeLogFromMemory(TransactionManager.java:202)
at 
jdbm.recman.TransactionManager.synchronizeLog(TransactionManager.java:135)
at 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmIndex.sync(JdbmIndex.java:698)
at 
org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmPartition.sync(JdbmPartition.java:312)
at 
org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.modify(AbstractBTreePartition.java:1228)
at 
org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.modify(AbstractBTreePartition.java:1173)
... 4 more
java.io.FileNotFoundException: 
/home/koert/src/spark/target/tmp/spark-dc223dd0-e499-4ccf-9600-c70e4706a909/1568218986864/partitions/example/1.3.6.1.4.1.18060.0.4.1.2.5.lg
 (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.(FileOutputStream.java:213)
at java.io.FileOutputStream.(FileOutputStream.java:101)
at jdbm.recman.TransactionManager.open(TransactionManager.java:209)
at 
jdbm.recman.TransactionManager.synchronizeLogFromMemory(TransactionManager.java:202)
at 
jdbm.recman.TransactionManager.synchronizeLog(TransactionManager.java:135)
at 

[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927715#comment-16927715
 ] 

koert kuipers commented on SPARK-29027:
---

I renamed /etc/krb5.conf and it did not change anything; still the same failure.

{code}
~/spark/external/kafka-0-10-sql$ mvn dependency:tree -Dverbose | grep zookeeper
[INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.7:test
{code}

> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Priority: Minor
>
> i am seeing consistent failure of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.<init>(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.<init>(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... 

[jira] [Updated] (SPARK-29056) ThriftServerSessionPage displays 1970/01/01 for queries that are not finished and not closed

2019-09-11 Thread Juliusz Sompolski (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski updated SPARK-29056:
--
Issue Type: Bug  (was: Improvement)

> ThriftServerSessionPage displays 1970/01/01 for queries that are not finished 
> and not closed
> 
>
> Key: SPARK-29056
> URL: https://issues.apache.org/jira/browse/SPARK-29056
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Spark UI ODBC/JDBC tab session page displays 1970/01/01 (timestamp 0) as 
> finish/close time for queries that haven't finished yet.
> !image-2019-09-11-17-21-52-771.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29056) ThriftServerSessionPage displays 1970/01/01 for queries that are not finished and not closed

2019-09-11 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-29056:
-

 Summary: ThriftServerSessionPage displays 1970/01/01 for queries 
that are not finished and not closed
 Key: SPARK-29056
 URL: https://issues.apache.org/jira/browse/SPARK-29056
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Juliusz Sompolski


Spark UI ODBC/JDBC tab session page displays 1970/01/01 (timestamp 0) as 
finish/close time for queries that haven't finished yet.

!image-2019-09-11-17-21-52-771.png!
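
For context, a minimal sketch of why an unset finish/close time renders as 1970/01/01. This is not the ThriftServerSessionPage code itself; the formatter pattern and time zone are illustrative. An unfinished query still has its timestamp at 0, and formatting that raw millisecond value gives the Unix epoch:

{code}
import java.text.SimpleDateFormat
import java.util.{Date, TimeZone}

// An unfinished query has no finish/close time yet, so the stored value is 0L.
val unsetFinishTime = 0L

// Formatting the raw epoch value yields 1970/01/01 (shown in UTC here).
val fmt = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss")
fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
println(fmt.format(new Date(unsetFinishTime))) // 1970/01/01 00:00:00

// A fix would render a placeholder (e.g. "-") instead of formatting a still-unset timestamp.
{code}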



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927646#comment-16927646
 ] 

koert kuipers commented on SPARK-29027:
---

let me try to get debug logs

> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Priority: Minor
>
> i am seeing consistent failure of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.<init>(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.<init>(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local DB . SUCCESS [  5.456 
> s]
> [INFO] Spark Project Networking ... SUCCESS [ 49.819 
> s]
> 

[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927640#comment-16927640
 ] 

Gabor Somogyi commented on SPARK-29027:
---

Could you give, for example, the output of this command:
{quote}[gaborsomogyi:~/spark/external/kafka-0-10-sql] master(+8/-2)+ ± mvn 
dependency:tree -Dverbose | grep zookeeper{quote}


> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Priority: Minor
>
> i am seeing consistent failure of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.<init>(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.<init>(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local 

[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails

2019-09-11 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927637#comment-16927637
 ] 

Gabor Somogyi commented on SPARK-29027:
---

I've tried to create a krb5.conf file with various contents but was not able to 
make the test fail. [~koert], please attach something so we can proceed.


> KafkaDelegationTokenSuite fails
> ---
>
> Key: SPARK-29027
> URL: https://issues.apache.org/jira/browse/SPARK-29027
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
> Environment: {code}
> commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4
> Author: Sean Owen 
> Date:   Mon Sep 9 10:19:40 2019 -0500
> {code}
> Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10)
>Reporter: koert kuipers
>Priority: Minor
>
> i am seeing consistent failure of KafkaDelegationTokenSuite on master
> {code}
> JsonUtilsSuite:
> - parsing partitions
> - parsing partitionOffsets
> KafkaDelegationTokenSuite:
> javax.security.sasl.SaslException: Failure to initialize security context 
> [Caused by GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:125)
>   at 
> com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85)
>   at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114)
>   at 
> org.apache.zookeeper.server.ZooKeeperSaslServer.<init>(ZooKeeperSaslServer.java:48)
>   at 
> org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:100)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: GSSException: No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos credentails)
>   at 
> sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87)
>   at 
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127)
>   at 
> sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193)
>   at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427)
>   at sun.security.jgss.GSSCredentialImpl.<init>(GSSCredentialImpl.java:62)
>   at 
> sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154)
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(GssKrb5Server.java:108)
>   ... 12 more
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED ***
>   org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure
>   at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947)
>   at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924)
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:157)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:131)
>   at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93)
>   at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   ...
> KafkaSourceOffsetSuite:
> - comparison {"t":{"0":1}} <=> {"t":{"0":2}}
> - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}}
> - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}}
> - basic serialization - deserialization
> - OffsetSeqLog serialization - deserialization
> - read Spark 2.1.0 offset format
> {code}
> {code}
> [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
> [INFO] 
> [INFO] Spark Project Parent POM ... SUCCESS [  4.178 
> s]
> [INFO] Spark Project Tags . SUCCESS [  9.373 
> s]
> [INFO] Spark Project Sketch ... SUCCESS [ 24.586 
> s]
> [INFO] Spark Project Local DB 

[jira] [Commented] (SPARK-28985) Pyspark ClassificationModel and RegressionModel support column setters/getters/predict

2019-09-11 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927612#comment-16927612
 ] 

Huaxin Gao commented on SPARK-28985:


Thanks [~podongfeng], I will work on this.

> Pyspark ClassificationModel and RegressionModel support column 
> setters/getters/predict
> --
>
> Key: SPARK-28985
> URL: https://issues.apache.org/jira/browse/SPARK-28985
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Priority: Minor
>
> 1, add common abstract classes like JavaClassificationModel & 
> JavaProbabilisticClassificationModel
> 2, add column setters/getters, and predict method
> 3, update the test suites to verify newly added functions



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Affects Version/s: (was: 2.4.2)
   (was: 2.4.1)
   (was: 2.4.0)

> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-14-26-765.png, 
> image-2019-09-11-16-14-34-963.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-14-34-963.png!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  
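
For reference, a minimal sketch of the cleanup path described above (purely illustrative, not the reporter's workload; master, sizes and iteration counts are arbitrary). ContextCleaner removes broadcast blocks asynchronously once the corresponding Broadcast objects are garbage collected, so churning through short-lived broadcasts should keep driver storage flat; steady growth of the driver's "Storage Memory" on the Executors page is the symptom being reported.

{code}
import org.apache.spark.sql.SparkSession

// Illustrative churn of short-lived broadcasts. With a healthy ContextCleaner the
// broadcast blocks are removed once the Broadcast objects become unreachable;
// a leak shows up as ever-growing driver storage memory in the UI.
object BroadcastChurn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("broadcast-churn").getOrCreate()
    val sc = spark.sparkContext
    for (i <- 1 to 1000) {
      val b = sc.broadcast(Array.fill(10000)(i))           // becomes unreachable after each iteration
      sc.parallelize(1 to 100).map(_ + b.value.length).count()
      if (i % 100 == 0) System.gc()                        // give the cleaner a chance to run
    }
    spark.stop()
  }
}
{code}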



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29050) Fix typo in some docs

2019-09-11 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-29050:
--
Issue Type: Improvement  (was: Bug)
  Priority: Trivial  (was: Major)

This can't be considered a bug, or even major. I fixed it. Please read 
https://spark.apache.org/contributing.html

> Fix typo in some docs
> -
>
> Key: SPARK-29050
> URL: https://issues.apache.org/jira/browse/SPARK-29050
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.3.3, 2.4.3, 3.0.0
>Reporter: dengziming
>Priority: Trivial
>
> 'a hdfs' changed to 'an hdfs'
> 'an unique' changed to 'a unique'
> 'an url' changed to 'a url'
> 'a error' changed to 'an error'



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27492) GPU scheduling - High level user documentation

2019-09-11 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-27492.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

> GPU scheduling - High level user documentation
> --
>
> Key: SPARK-27492
> URL: https://issues.apache.org/jira/browse/SPARK-27492
> Project: Spark
>  Issue Type: Story
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.0.0
>
>
> For the SPIP - Accelerator-aware task scheduling for Spark 
> (https://issues.apache.org/jira/browse/SPARK-24615), add some high-level user 
> documentation about how the pieces of this feature work together, and point to 
> things like the example discovery script.
>  
>  - Make sure to document the discovery script, what permissions it needs, and 
> any security implications.
>  - Document the standalone/local-cluster mode limitation of only a single 
> resource file or discovery script, so users have to coordinate for it to work 
> correctly.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling

2019-09-11 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927558#comment-16927558
 ] 

Thomas Graves commented on SPARK-27495:
---

[~felixcheung]  [~jiangxb1987]  I put this up for vote on the dev mailing list. 
 Could you please take a look and comment there?

> SPIP: Support Stage level resource configuration and scheduling
> ---
>
> Key: SPARK-27495
> URL: https://issues.apache.org/jira/browse/SPARK-27495
> Project: Spark
>  Issue Type: Epic
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
>
> *Q1.* What are you trying to do? Articulate your objectives using absolutely 
> no jargon.
> Objectives:
>  # Allow users to specify task and executor resource requirements at the 
> stage level. 
>  # Spark will use the stage level requirements to acquire the necessary 
> resources/executors and schedule tasks based on the per stage requirements.
> Many times users have different resource requirements for different stages of 
> their application so they want to be able to configure resources at the stage 
> level. For instance, you have a single job that has 2 stages. The first stage 
> does some  ETL which requires a lot of tasks, each with a small amount of 
> memory and 1 core each. Then you have a second stage where you feed that ETL 
> data into an ML algorithm. The second stage only requires a few executors but 
> each executor needs a lot of memory, GPUs, and many cores.  This feature 
> allows the user to specify the task and executor resource requirements for 
> the ETL Stage and then change them for the ML stage of the job. 
> Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and 
> extra Resources (GPU/FPGA/etc). It has the potential to allow for other 
> things like limiting the number of tasks per stage, specifying other 
> parameters for things like shuffle, etc. Initially I would propose we only 
> support resources as they are now. So Task resources would be cpu and other 
> resources (GPU, FPGA), that way we aren't adding in extra scheduling things 
> at this point.  Executor resources would be cpu, memory, and extra 
> resources(GPU,FPGA, etc). Changing the executor resources will rely on 
> dynamic allocation being enabled.
> Main use cases:
>  # ML use case where user does ETL and feeds it into an ML algorithm where 
> it’s using the RDD API. This should work with barrier scheduling as well once 
> it supports dynamic allocation.
>  # This adds the framework/api for Spark's own internal use.  In the future 
> (not covered by this SPIP), Catalyst could control the stage level resources 
> as it finds the need to change it between stages for different optimizations. 
> For instance, with the new columnar plugin to the query planner we can insert 
> stages into the plan that would change running something on the CPU in row 
> format to running it on the GPU in columnar format. This API would allow the 
> planner to make sure the stages that run on the GPU get the corresponding GPU 
> resources it needs to run. Another possible use case for catalyst is that it 
> would allow catalyst to add in more optimizations to where the user doesn’t 
> need to configure container sizes at all. If the optimizer/planner can handle 
> that for the user, everyone wins.
> This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I 
> think the DataSet API will require more changes because it specifically hides 
> the RDD from the users via the plans and catalyst can optimize the plan and 
> insert things into the plan. The only way I’ve found to make this work with 
> the Dataset API would be modifying all the plans to be able to get the 
> resource requirements down into where it creates the RDDs, which I believe 
> would be a lot of change.  If other people know better options, it would be 
> great to hear them.
> *Q2.* What problem is this proposal NOT designed to solve?
> The initial implementation is not going to add Dataset APIs.
> We are starting with allowing users to specify a specific set of 
> task/executor resources and plan to design it to be extendable, but the first 
> implementation will not support changing generic SparkConf configs and only 
> specific limited resources.
> This initial version will have a programmatic API for specifying the resource 
> requirements per stage, we can add the ability to perhaps have profiles in 
> the configs later if its useful.
> *Q3.* How is it done today, and what are the limits of current practice?
> Currently this is either done by having multiple spark jobs or requesting 
> containers with the max resources needed for any part of the job.  To do this 
> today, you can break it into 

[jira] [Updated] (SPARK-28987) DiskBlockManager#createTempShuffleBlock should skip directory which is read-only

2019-09-11 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-28987:
--
Priority: Minor  (was: Major)

> DiskBlockManager#createTempShuffleBlock should skip directory which is 
> read-only
> 
>
> Key: SPARK-28987
> URL: https://issues.apache.org/jira/browse/SPARK-28987
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Minor
>
> DiskBlockManager#createTempShuffleBlock only checks that the path does not 
> already exist. I think we could also check whether the path is writeable. 
> That is reasonable because we invoke createTempShuffleBlock to create a new 
> path to write files into, so it should be writeable (a sketch of such a check 
> follows the stack trace below).
> stack:
> {code:java}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1765 in stage 368592.0 failed 4 times, most recent failure: Lost task 
> 1765.3 in stage 368592.0 (TID 66021932, test-hadoop-prc-st2808.bj, executor 
> 251): java.io.FileNotFoundException: 
> /home/work/hdd6/yarn/test-hadoop/nodemanager/usercache/sql_test/appcache/application_1560996968289_16320/blockmgr-14608b48-7efd-4fd3-b050-2ac9953390d4/1e/temp_shuffle_00c7b87f-d7ed-49f3-90e7-1c8358bcfd74
>  (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at 
> org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:139)
> at 
> org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:150)
> at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:268)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:159)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:100)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1515)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1503)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1502)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1502)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816)
> at scala.Option.foreach(Option.scala:257)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:816)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1740)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1695)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1684)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28987) DiskBlockManager#createTempShuffleBlock should skip directory which is read-only

2019-09-11 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-28987.
---
Resolution: Won't Fix

> DiskBlockManager#createTempShuffleBlock should skip directory which is 
> read-only
> 
>
> Key: SPARK-28987
> URL: https://issues.apache.org/jira/browse/SPARK-28987
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: deshanxiao
>Priority: Minor
>
> DiskBlockManager#createTempShuffleBlock only checks that the path does not 
> already exist. I think we could also check whether the path is writeable. 
> That is reasonable because we invoke createTempShuffleBlock to create a new 
> path to write files into, so it should be writeable.
> stack:
> {code:java}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 1765 in stage 368592.0 failed 4 times, most recent failure: Lost task 
> 1765.3 in stage 368592.0 (TID 66021932, test-hadoop-prc-st2808.bj, executor 
> 251): java.io.FileNotFoundException: 
> /home/work/hdd6/yarn/test-hadoop/nodemanager/usercache/sql_test/appcache/application_1560996968289_16320/blockmgr-14608b48-7efd-4fd3-b050-2ac9953390d4/1e/temp_shuffle_00c7b87f-d7ed-49f3-90e7-1c8358bcfd74
>  (No such file or directory)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at 
> org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:139)
> at 
> org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:150)
> at 
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:268)
> at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:159)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:100)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:
> at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1515)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1503)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1502)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1502)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816)
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816)
> at scala.Option.foreach(Option.scala:257)
> at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:816)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1740)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1695)
> at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1684)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Attachment: image-2019-09-11-16-14-34-963.png

> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-14-26-765.png, 
> image-2019-09-11-16-14-34-963.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-13-32-650.png|width=685,height=89!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Attachment: image-2019-09-11-16-14-26-765.png

> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-14-26-765.png, 
> image-2019-09-11-16-14-34-963.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-13-32-650.png|width=685,height=89!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Description: 
In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
BlockManager removes the broadcast blocks from the memory, as you can see in 
the following screenshot:

!image-2019-09-11-16-14-34-963.png!

But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!!

*NOTE:* After few hours of use I have application interruption with the 
following error :

{color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}

 

  was:
In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
BlockManager removes the broadcast blocks from the memory, as you can see in 
the following screenshot:

!image-2019-09-11-16-13-32-650.png|width=685,height=89!

But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!!

*NOTE:* After few hours of use I have application interruption with the 
following error :

{color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}

 


> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-14-26-765.png, 
> image-2019-09-11-16-14-34-963.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-14-34-963.png!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Attachment: (was: image-2019-09-11-16-13-20-588.png)

> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-14-26-765.png, 
> image-2019-09-11-16-14-34-963.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-13-32-650.png|width=685,height=89!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Description: 
In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
BlockManager removes the broadcast blocks from the memory, as you can see in 
the following screenshot:

In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
BlockManager removes the broadcast blocks from the memory, as you can see in 
the following screenshot:

!image-2019-09-11-16-13-20-588.png|width=685,height=89!

But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!!

*NOTE:* After few hours of use I have application interruption with the 
following error :

{color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}

 

But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!!

*NOTE:* After few hours of use I have application interruption with the 
following error :

{color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}

 

  was:
In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
BlockManager removes the broadcast blocks from the memory, as you can see in 
the following screenshot:

!image-2019-09-11-16-09-06-720.png|width=685,height=89!

But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!!

*NOTE:* After few hours of use I have application interruption with the 
following error :

{color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}

 


> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-13-20-588.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-13-20-588.png|width=685,height=89!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Description: 
In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
BlockManager removes the broadcast blocks from the memory, as you can see in 
the following screenshot:

!image-2019-09-11-16-13-32-650.png|width=685,height=89!

But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!!

*NOTE:* After few hours of use I have application interruption with the 
following error :

{color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}

 

  was:
In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
BlockManager removes the broadcast blocks from the memory, as you can see in 
the following screenshot:

In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
BlockManager removes the broadcast blocks from the memory, as you can see in 
the following screenshot:

!image-2019-09-11-16-13-20-588.png|width=685,height=89!

But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!!

*NOTE:* After few hours of use I have application interruption with the 
following error :

{color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}

 

But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!!

*NOTE:* After few hours of use I have application interruption with the 
following error :

{color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}

 


> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-13-20-588.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-13-32-650.png|width=685,height=89!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28906) `bin/spark-submit --version` shows incorrect info

2019-09-11 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-28906.
---
Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 25655
[https://github.com/apache/spark/pull/25655]

> `bin/spark-submit --version` shows incorrect info
> -
>
> Key: SPARK-28906
> URL: https://issues.apache.org/jira/browse/SPARK-28906
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 
> 2.4.4, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
> Attachments: image-2019-08-29-05-50-13-526.png
>
>
> Since Spark 2.3.1, `spark-submit` shows incorrect information.
> {code}
> $ bin/spark-submit --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
>       /_/
> Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222
> Branch
> Compiled by user  on 2019-02-04T13:00:46Z
> Revision
> Url
> Type --help for more information.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Attachment: image-2019-09-11-16-13-20-588.png

> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-13-20-588.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-09-06-720.png|width=685,height=89!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Attachment: (was: image-2019-09-11-16-13-32-650.png)

> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-13-20-588.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-13-32-650.png|width=685,height=89!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

George Papa updated SPARK-29055:

Attachment: image-2019-09-11-16-13-32-650.png

> Memory leak in Spark Driver
> ---
>
> Key: SPARK-29055
> URL: https://issues.apache.org/jira/browse/SPARK-29055
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: George Papa
>Priority: Major
> Attachments: image-2019-09-11-16-13-20-588.png
>
>
> In Spark 2.3.3+ the driver memory is increasing continuously. I don't have 
> this issue with Spark 2.1.1.
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and 
> BlockManager removes the broadcast blocks from the memory, as you can see in 
> the following screenshot:
> !image-2019-09-11-16-13-20-588.png|width=685,height=89!
> But in Spark 2.3.3+ I don't see this cleaning and the driver storage 
> increases!!
> *NOTE:* After few hours of use I have application interruption with the 
> following error :
> {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28906) `bin/spark-submit --version` shows incorrect info

2019-09-11 Thread Sean Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-28906:
-

Assignee: Kazuaki Ishizaki

> `bin/spark-submit --version` shows incorrect info
> -
>
> Key: SPARK-28906
> URL: https://issues.apache.org/jira/browse/SPARK-28906
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 
> 2.4.4, 3.0.0
>Reporter: Marcelo Vanzin
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Attachments: image-2019-08-29-05-50-13-526.png
>
>
> Since Spark 2.3.1, `spark-submit` shows a wrong information.
> {code}
> $ bin/spark-submit --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
>       /_/
> Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222
> Branch
> Compiled by user  on 2019-02-04T13:00:46Z
> Revision
> Url
> Type --help for more information.
> {code}
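
The empty "Branch", "Revision", and "Url" lines above normally come from a properties resource that is generated at build time and packaged into the Spark jars. As a small diagnostic sketch (assuming that resource is the usual spark-version-info.properties file with keys such as version, branch, revision, url, user and date), the following prints whatever the build recorded, which makes the blank fields easy to spot:

{code}
import java.util.Properties
import scala.collection.JavaConverters._

// Diagnostic sketch: dump the build-info resource that the spark-submit banner
// is assumed to be populated from. If branch/revision/url are empty here, the
// banner shows them empty as well.
object PrintSparkBuildInfo {
  def main(args: Array[String]): Unit = {
    val resource = "spark-version-info.properties"
    val in = Thread.currentThread().getContextClassLoader.getResourceAsStream(resource)
    if (in == null) {
      println(s"$resource not found on the classpath")
    } else {
      try {
        val props = new Properties()
        props.load(in)
        props.asScala.toSeq.sortBy(_._1).foreach { case (k, v) =>
          val shown = if (v.trim.isEmpty) "<empty>" else v
          println(f"$k%-10s = $shown")
        }
      } finally {
        in.close()
      }
    }
  }
}
{code}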



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29055) Memory leak in Spark Driver

2019-09-11 Thread George Papa (Jira)
George Papa created SPARK-29055:
---

 Summary: Memory leak in Spark Driver
 Key: SPARK-29055
 URL: https://issues.apache.org/jira/browse/SPARK-29055
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Spark Core
Affects Versions: 2.4.4, 2.4.3, 2.4.2, 2.4.1, 2.4.0, 2.3.3
Reporter: George Papa
 Attachments: image-2019-09-11-16-13-20-588.png

In Spark 2.3.3+ the driver memory increases continuously. I don't have this 
issue with Spark 2.1.1.

In Spark 2.1.1 I can see that the ContextCleaner runs, cleans up the driver, and 
the BlockManager removes the broadcast blocks from memory, as you can see in 
the following screenshot:

!image-2019-09-11-16-09-06-720.png|width=685,height=89!

But in Spark 2.3.3+ I don't see this cleaning, and the driver storage keeps increasing.

*NOTE:* After a few hours of use the application is interrupted with the 
following error:

{color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
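
For illustration only (this is not the reporter's application), a minimal sketch of a common workaround is shown below: a standalone local job that destroys each broadcast explicitly once it is no longer needed, instead of relying on the ContextCleaner to release the blocks after GC.

{code}
import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming a standalone local job: release broadcast blocks
// eagerly with destroy() rather than waiting for the ContextCleaner.
object BroadcastCleanupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-cleanup-sketch")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    (1 to 100).foreach { i =>
      val lookup = sc.broadcast((1 to 10000).map(n => n -> s"value-$n").toMap)
      val hits = sc.parallelize(1 to 100000)
        .filter(n => lookup.value.contains(n % 10000))
        .count()
      println(s"iteration $i -> $hits matches")
      // Drop the broadcast's blocks on the driver and executors right away.
      lookup.destroy()
    }

    spark.stop()
  }
}
{code}

This does not address the cleaner itself, but it keeps driver-side broadcast state bounded while the behaviour above is investigated.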

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-09-11 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927533#comment-16927533
 ] 

Jungtaek Lim commented on SPARK-29043:
--

5+! I'm very surprised to hear that, as it means 5+ files are stored in the same 
directory and listed via the SHS, and 5+ UI objects are loaded and rendered in 
the SHS (one JVM).

I'd appreciate it if you could review the design doc for SPARK-28594 to see whether 
it helps your case, and participate in the code review. Thanks!

> [History Server]Only one replay thread of FsHistoryProvider work because of 
> straggler
> -
>
> Key: SPARK-29043
> URL: https://issues.apache.org/jira/browse/SPARK-29043
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: feiwang
>Priority: Major
> Attachments: image-2019-09-11-15-09-22-912.png, 
> image-2019-09-11-15-10-25-326.png, screenshot-1.png
>
>
> As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for 
> spark history server.
> However, there is only one replay thread work because of straggler.
> Let's check the code.
> https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547
> There is a synchronous operation for all replay tasks.
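
To make the reported behaviour concrete, here is a hedged, self-contained sketch (not the actual FsHistoryProvider code) of the pattern the linked lines describe: a fixed pool whose batch of replay tasks is awaited as a whole, so one straggling event log keeps the other threads idle until the next batch can start.

{code}
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

// Illustrative only: mimics a replay pool that blocks on a whole batch.
object StragglerSketch {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(30) // like spark.history.fs.numReplayThreads=30
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

    // Stand-in for parsing one event log; the sleep length plays the log size.
    def replay(millis: Long): Future[Unit] = Future { Thread.sleep(millis) }

    val batch = Seq[Long](10, 10, 10, 10, 60000) // one straggler log in the batch

    // Blocking on the whole batch means the next scan cannot start until the
    // straggler finishes, even though 29 threads became free almost instantly.
    Await.result(Future.sequence(batch.map(replay)), Duration.Inf)
    pool.shutdown()
  }
}
{code}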



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29054) Invalidate Kafka consumer when new delegation token available

2019-09-11 Thread Gabor Somogyi (Jira)
Gabor Somogyi created SPARK-29054:
-

 Summary: Invalidate Kafka consumer when new delegation token 
available
 Key: SPARK-29054
 URL: https://issues.apache.org/jira/browse/SPARK-29054
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.0.0
Reporter: Gabor Somogyi


Kafka consumers are cached. If a delegation token is used and the token has 
expired, an exception is thrown, and a new consumer is then created in a task 
retry with the latest delegation token. This can be enhanced by detecting the 
existence of a new delegation token and invalidating the cached consumer proactively.
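
A minimal sketch of the idea with invented names (TokenAwareConsumerCache and CachedConsumer are not Spark APIs): cache each consumer together with the token identifier it was built from, and rebuild it as soon as a newer token is observed, instead of waiting for an authentication failure and a task retry.

{code}
import scala.collection.mutable

// Illustrative cache entry: the consumer plus the delegation token it was built with.
final case class CachedConsumer[C](consumer: C, tokenId: String)

// Hypothetical cache: `create` builds a consumer, `close` releases one.
class TokenAwareConsumerCache[C](create: () => C, close: C => Unit) {
  private val cache = mutable.Map.empty[String, CachedConsumer[C]]

  /** Returns a consumer for `key`, recreating it if the delegation token changed. */
  def acquire(key: String, currentTokenId: String): C = synchronized {
    cache.get(key) match {
      case Some(entry) if entry.tokenId == currentTokenId =>
        entry.consumer
      case stale =>
        // Either no consumer yet, or it was built from an older token: replace it.
        stale.foreach(e => close(e.consumer))
        val fresh = CachedConsumer(create(), currentTokenId)
        cache.update(key, fresh)
        fresh.consumer
    }
  }
}
{code}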



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28985) Pyspark ClassificationModel and RegressionModel support column setters/getters/predict

2019-09-11 Thread zhengruifeng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927498#comment-16927498
 ] 

zhengruifeng commented on SPARK-28985:
--

[~huaxingao] You can refer to my old PRs 
[https://github.com/apache/spark/pull/16171] and 
[https://github.com/apache/spark/pull/25662] if you want to take this over. 
Thanks!

> Pyspark ClassificationModel and RegressionModel support column 
> setters/getters/predict
> --
>
> Key: SPARK-28985
> URL: https://issues.apache.org/jira/browse/SPARK-28985
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Priority: Minor
>
> 1, add common abstract classes like JavaClassificationModel & 
> JavaProbabilisticClassificationModel
> 2, add column setters/getters, and predict method
> 3, update the test suites to verify newly added functions



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29053) Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field

2019-09-11 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29053:
-
Description: 
Spark Thrift JDBC/ODBC Server application UI: *Sorting* is not working for the 
*Duration* field.

*Test Steps*
 1. Install Spark
 2. Start Spark beeline
 3. Submit some SQL queries
 4. Close some Spark applications
 5. Check the Spark Web UI JDBC/ODBC Server tab
 6. Try sorting based on each field: User/IP/Session ID/Finish Time/Duration/Total Execute

*Issue:*
 *Sorting (ascending or descending)* on the *Duration* column is not correct in the 
*JDBC/ODBC Server tab* (it works in some tabs; the SQL tab is OK). It looks like 
sorting is done on the string/number only instead of the actual duration in 
days/weeks/hours. The issue is present in both the *Session Statistics* and 
*SQL Statistics* sections. Please check it.

!Sort Icon.png!

  was:
Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for 
*Duration* field.

*Test Steps*
1.Install spark
2.Start Spark beeline
3.Submit some SQL queries
4.Close some spark applications
5.Check the Spark Web UI JDBC/ODBC Server TAB.
7.Try sorting based on each filed USer/IP/Session ID/Finish Time/DUration/Total 
execute

*Issue:*
*Sorting[ascending or descending]* based on *Duration* is not proper in 
*JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like 
sorting is based on string/number only instead of proper days/weeks/hours ..
Issue there in *Session Statistics* & *SQL Statistics* sessions .Please check 
it.



> Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for 
> Duration field
> ---
>
> Key: SPARK-29053
> URL: https://issues.apache.org/jira/browse/SPARK-29053
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Sort Icon.png
>
>
> Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for 
> *Duration* field.
> *Test Steps*
>  1.Install spark
>  2.Start Spark beeline
>  3.Submit some SQL queries
>  4.Close some spark applications
>  5.Check the Spark Web UI JDBC/ODBC Server TAB.
>  7.Try sorting based on each filed USer/IP/Session ID/Finish 
> Time/DUration/Total execute
> *Issue:*
>  *Sorting[ascending or descending]* based on *Duration* is not proper in 
> *JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like 
> sorting is based on string/number only instead of proper days/weeks/hours ..
>  Issue there in *Session Statistics* & *SQL Statistics* sessions .Please 
> check it.
> !Sort Icon.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29053) Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field

2019-09-11 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29053:
-
Attachment: Sort Icon.png

> Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for 
> Duration field
> ---
>
> Key: SPARK-29053
> URL: https://issues.apache.org/jira/browse/SPARK-29053
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Sort Icon.png
>
>
> Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for 
> *Duration* field.
> *Test Steps*
> 1.Install spark
> 2.Start Spark beeline
> 3.Submit some SQL queries
> 4.Close some spark applications
> 5.Check the Spark Web UI JDBC/ODBC Server TAB.
> 7.Try sorting based on each filed USer/IP/Session ID/Finish 
> Time/DUration/Total execute
> *Issue:*
> *Sorting[ascending or descending]* based on *Duration* is not proper in 
> *JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like 
> sorting is based on string/number only instead of proper days/weeks/hours ..
> Issue there in *Session Statistics* & *SQL Statistics* sessions .Please check 
> it.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29053) Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field

2019-09-11 Thread Rakesh Raushan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927467#comment-16927467
 ] 

Rakesh Raushan commented on SPARK-29053:


I will work on this one.

> Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for 
> Duration field
> ---
>
> Key: SPARK-29053
> URL: https://issues.apache.org/jira/browse/SPARK-29053
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3
>Reporter: jobit mathew
>Priority: Minor
>
> Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for 
> *Duration* field.
> *Test Steps*
> 1.Install spark
> 2.Start Spark beeline
> 3.Submit some SQL queries
> 4.Close some spark applications
> 5.Check the Spark Web UI JDBC/ODBC Server TAB.
> 7.Try sorting based on each filed USer/IP/Session ID/Finish 
> Time/DUration/Total execute
> *Issue:*
> *Sorting[ascending or descending]* based on *Duration* is not proper in 
> *JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like 
> sorting is based on string/number only instead of proper days/weeks/hours ..
> Issue there in *Session Statistics* & *SQL Statistics* sessions .Please check 
> it.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29053) Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field

2019-09-11 Thread jobit mathew (Jira)
jobit mathew created SPARK-29053:


 Summary: Spark Thrift JDBC/ODBC Server application UI, Sorting is 
not working for Duration field
 Key: SPARK-29053
 URL: https://issues.apache.org/jira/browse/SPARK-29053
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.4.3
Reporter: jobit mathew


Spark Thrift JDBC/ODBC Server application UI: *Sorting* is not working for the 
*Duration* field.

*Test Steps*
1. Install Spark
2. Start Spark beeline
3. Submit some SQL queries
4. Close some Spark applications
5. Check the Spark Web UI JDBC/ODBC Server tab
6. Try sorting based on each field: User/IP/Session ID/Finish Time/Duration/Total Execute

*Issue:*
*Sorting (ascending or descending)* on the *Duration* column is not correct in the 
*JDBC/ODBC Server tab* (it works in some tabs; the SQL tab is OK). It looks like 
sorting is done on the string/number only instead of the actual duration in 
days/weeks/hours. The issue is present in both the *Session Statistics* and 
*SQL Statistics* sections. Please check it.




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29038) SPIP: Support Spark Materialized View

2019-09-11 Thread Lantao Jin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927374#comment-16927374
 ] 

Lantao Jin edited comment on SPARK-29038 at 9/11/19 10:12 AM:
--

[~smilegator], materialized views are not part of ANSI SQL: 
https://en.wikipedia.org/wiki/Materialized_view
Our implementation follows the CTAS syntax in Spark.


was (Author: cltlfcjin):
[~smilegator] Sure, we will totally fellow ANSI SQL when commit although it 
contains some unstandard ones in our internal version.

> SPIP: Support Spark Materialized View
> -
>
> Key: SPARK-29038
> URL: https://issues.apache.org/jira/browse/SPARK-29038
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> Materialized view is an important approach in DBMS to cache data to 
> accelerate queries. By creating a materialized view through SQL, the data 
> that can be cached is very flexible, and needs to be configured arbitrarily 
> according to specific usage scenarios. The Materialization Manager 
> automatically updates the cache data according to changes in detail source 
> tables, simplifying user work. When user submit query, Spark optimizer 
> rewrites the execution plan based on the available materialized view to 
> determine the optimal execution plan.
> Details in [design 
> doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]
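
As a rough illustration of what the proposal would automate (the `sales` data below is made up and none of this is the SPIP's API), the materialized-view pattern currently has to be hand-rolled: precompute an aggregate, store it as a table, and point queries at the stored result yourself. The proposed manager would keep such a cache fresh and rewrite matching queries to use it automatically.

{code}
import org.apache.spark.sql.SparkSession

// Manual stand-in for a materialized view, for illustration only.
object ManualMaterializationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("manual-materialization-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    // Tiny stand-in for a large detail table.
    Seq(("2019-09-01", 10.0), ("2019-09-01", 5.0), ("2019-09-02", 7.5))
      .toDF("sale_date", "amount")
      .createOrReplaceTempView("sales")

    // "Materialize" a pre-aggregated result by persisting it as a table.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS sales_daily_mv USING parquet AS
        |SELECT sale_date, SUM(amount) AS total_amount FROM sales GROUP BY sale_date
        |""".stripMargin)

    // Today callers must reference the precomputed table explicitly.
    spark.sql("SELECT * FROM sales_daily_mv WHERE sale_date >= '2019-09-01'").show()

    spark.stop()
  }
}
{code}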



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29052) Create a Migration Guide tap in Spark documentation

2019-09-11 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-29052:


 Summary: Create a Migration Guide tap in Spark documentation
 Key: SPARK-29052
 URL: https://issues.apache.org/jira/browse/SPARK-29052
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, ML, PySpark, Spark Core, SparkR, SQL, 
Structured Streaming
Affects Versions: 3.0.0
Reporter: Hyukjin Kwon


Currently, there are no migration sections for PySpark, Spark Core, or Structured 
Streaming.
It is difficult for users to know what to do when they upgrade.

It would be great if we created a Migration Guide tab and put the related migration 
notes together.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields

2019-09-11 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29051:
-
Description: 
Spark Application UI *search is not working* for some fields in the *Spark Web UI 
Executors tab* and on the Spark History Server page.

*Test Steps*
 1. Install Spark
 2. Start Spark SQL/Shell/beeline
 3. Submit some SQL queries
 4. Close some Spark applications
 5. Check the Spark Web UI Executors tab and verify search
 6. Check the Spark History Server page and verify search

*Issue 1*

Searching of some field contents is not working in the *Spark Web UI Executors 
tab* (Spark SQL/Shell/JDBC server UIs).

• *Input column*: search works incorrectly. For example, if the input is 34.5KB, 
searching for 34.5 finds nothing, but 345 shows the result, which is wrong.
 • Task time search is OK, but *GC time* search is not working.
 • *Thread Dump*: search is not working [need to confirm whether it should be 
searchable, but since stdout text is searchable, Thread Dump text should also be 
searchable].
 • *Storage memory*: for example, 384.1 is not found by search.

!Search Missing.png!

*Issue 2:*

On the *Spark History Server page*, the completed-tasks search is not working on 
*Duration column values*. We get the proper search result when searching content 
from any column except Duration. *For example, if Duration is 6.1 min*, we cannot 
get results for 6.1 min or even 6.1.

!Duration Search.png!

  !Duration Search1.png!

  was:
Spark Application UI *Search is not working* for some fields in *Spark Web UI 
Executors TAB* and Spark job History Server page

*Test Steps*
 1.Install spark
 2.Start Spark SQL/Shell/beeline
 3.Submit some SQL queries 
 4.Close some spark applications
 5.Check the Spark Web UI Executors TAB and verify search
 6.Check Spark job History Server page and verify search

*Issue 1*

Searching of some field contents are not working in *Spark Web UI Executors 
TAB*(Spark SQL/Shell/JDBC server UIs ).

• *Input column*(search working wrongly .Example if input is 34.5KB,searching 
of 34.5 won't take ,but 345 shows the search result -it is wrong)
 • Task time search is Ok, but *GC time* search not working
 • *Thread Dump* -search not working [have to confirm it is required to add in 
search, but we are able to search stdout text in that case Thread Dump text 
also should be searchable ]
 • *Storage memory* example 384.1 search not searching.

*Issue 2:*

*Spark job History Server page*,completed tasks- search is not working based on 
*Duration column values*. We are getting the proper search result, if we search 
the content from any other columns except Duration.*For example if Duration is 
6.1 min* we can not search result for 6.1 min or even 6.1.

 


> Spark Application UI search is not working for some fields
> --
>
> Key: SPARK-29051
> URL: https://issues.apache.org/jira/browse/SPARK-29051
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3, 2.4.4
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Duration Search.png, Duration Search1.png, Search 
> Missing.png, Search Missing.png
>
>
> Spark Application UI *Search is not working* for some fields in *Spark Web UI 
> Executors TAB* and Spark job History Server page
> *Test Steps*
>  1.Install spark
>  2.Start Spark SQL/Shell/beeline
>  3.Submit some SQL queries 
>  4.Close some spark applications
>  5.Check the Spark Web UI Executors TAB and verify search
>  6.Check Spark job History Server page and verify search
> *Issue 1*
> Searching of some field contents are not working in *Spark Web UI Executors 
> TAB*(Spark SQL/Shell/JDBC server UIs ).
> • *Input column*(search working wrongly .Example if input is 34.5KB,searching 
> of 34.5 won't take ,but 345 shows the search result -it is wrong)
>  • Task time search is Ok, but *GC time* search not working
>  • *Thread Dump* -search not working [have to confirm it is required to add 
> in search, but we are able to search stdout text in that case Thread Dump 
> text also should be searchable ]
>  • *Storage memory* example 384.1 search not searching.
> !Search Missing.png!
> *Issue 2:*
> *Spark job History Server page*,completed tasks- search is not working based 
> on *Duration column values*. We are getting the proper search result, if we 
> search the content from any other columns except Duration.*For example if 
> Duration is 6.1 min* we can not search result for 6.1 min or even 6.1.
> !Duration Search.png!
>   !Duration Search1.png!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields

2019-09-11 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29051:
-
Attachment: Duration Search1.png

> Spark Application UI search is not working for some fields
> --
>
> Key: SPARK-29051
> URL: https://issues.apache.org/jira/browse/SPARK-29051
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3, 2.4.4
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Duration Search.png, Duration Search1.png, Search 
> Missing.png, Search Missing.png
>
>
> Spark Application UI *Search is not working* for some fields in *Spark Web UI 
> Executors TAB* and Spark job History Server page
> *Test Steps*
>  1.Install spark
>  2.Start Spark SQL/Shell/beeline
>  3.Submit some SQL queries 
>  4.Close some spark applications
>  5.Check the Spark Web UI Executors TAB and verify search
>  6.Check Spark job History Server page and verify search
> *Issue 1*
> Searching of some field contents are not working in *Spark Web UI Executors 
> TAB*(Spark SQL/Shell/JDBC server UIs ).
> • *Input column*(search working wrongly .Example if input is 34.5KB,searching 
> of 34.5 won't take ,but 345 shows the search result -it is wrong)
>  • Task time search is Ok, but *GC time* search not working
>  • *Thread Dump* -search not working [have to confirm it is required to add 
> in search, but we are able to search stdout text in that case Thread Dump 
> text also should be searchable ]
>  • *Storage memory* example 384.1 search not searching.
> *Issue 2:*
> *Spark job History Server page*,completed tasks- search is not working based 
> on *Duration column values*. We are getting the proper search result, if we 
> search the content from any other columns except Duration.*For example if 
> Duration is 6.1 min* we can not search result for 6.1 min or even 6.1.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields

2019-09-11 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29051:
-
Attachment: Duration Search.png

> Spark Application UI search is not working for some fields
> --
>
> Key: SPARK-29051
> URL: https://issues.apache.org/jira/browse/SPARK-29051
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3, 2.4.4
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Duration Search.png, Search Missing.png, Search 
> Missing.png
>
>
> Spark Application UI *Search is not working* for some fields in *Spark Web UI 
> Executors TAB* and Spark job History Server page
> *Test Steps*
>  1.Install spark
>  2.Start Spark SQL/Shell/beeline
>  3.Submit some SQL queries 
>  4.Close some spark applications
>  5.Check the Spark Web UI Executors TAB and verify search
>  6.Check Spark job History Server page and verify search
> *Issue 1*
> Searching of some field contents are not working in *Spark Web UI Executors 
> TAB*(Spark SQL/Shell/JDBC server UIs ).
> • *Input column*(search working wrongly .Example if input is 34.5KB,searching 
> of 34.5 won't take ,but 345 shows the search result -it is wrong)
>  • Task time search is Ok, but *GC time* search not working
>  • *Thread Dump* -search not working [have to confirm it is required to add 
> in search, but we are able to search stdout text in that case Thread Dump 
> text also should be searchable ]
>  • *Storage memory* example 384.1 search not searching.
> *Issue 2:*
> *Spark job History Server page*,completed tasks- search is not working based 
> on *Duration column values*. We are getting the proper search result, if we 
> search the content from any other columns except Duration.*For example if 
> Duration is 6.1 min* we can not search result for 6.1 min or even 6.1.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields

2019-09-11 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29051:
-
Attachment: Search Missing.png

> Spark Application UI search is not working for some fields
> --
>
> Key: SPARK-29051
> URL: https://issues.apache.org/jira/browse/SPARK-29051
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3, 2.4.4
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Duration Search.png, Search Missing.png, Search 
> Missing.png
>
>
> Spark Application UI *Search is not working* for some fields in *Spark Web UI 
> Executors TAB* and Spark job History Server page
> *Test Steps*
>  1.Install spark
>  2.Start Spark SQL/Shell/beeline
>  3.Submit some SQL queries 
>  4.Close some spark applications
>  5.Check the Spark Web UI Executors TAB and verify search
>  6.Check Spark job History Server page and verify search
> *Issue 1*
> Searching of some field contents are not working in *Spark Web UI Executors 
> TAB*(Spark SQL/Shell/JDBC server UIs ).
> • *Input column*(search working wrongly .Example if input is 34.5KB,searching 
> of 34.5 won't take ,but 345 shows the search result -it is wrong)
>  • Task time search is Ok, but *GC time* search not working
>  • *Thread Dump* -search not working [have to confirm it is required to add 
> in search, but we are able to search stdout text in that case Thread Dump 
> text also should be searchable ]
>  • *Storage memory* example 384.1 search not searching.
> *Issue 2:*
> *Spark job History Server page*,completed tasks- search is not working based 
> on *Duration column values*. We are getting the proper search result, if we 
> search the content from any other columns except Duration.*For example if 
> Duration is 6.1 min* we can not search result for 6.1 min or even 6.1.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields

2019-09-11 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29051:
-
Attachment: Search Missing.png

> Spark Application UI search is not working for some fields
> --
>
> Key: SPARK-29051
> URL: https://issues.apache.org/jira/browse/SPARK-29051
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3, 2.4.4
>Reporter: jobit mathew
>Priority: Minor
> Attachments: Search Missing.png
>
>
> Spark Application UI *Search is not working* for some fields in *Spark Web UI 
> Executors TAB* and Spark job History Server page
> *Test Steps*
>  1.Install spark
>  2.Start Spark SQL/Shell/beeline
>  3.Submit some SQL queries 
>  4.Close some spark applications
>  5.Check the Spark Web UI Executors TAB and verify search
>  6.Check Spark job History Server page and verify search
> *Issue 1*
> Searching of some field contents are not working in *Spark Web UI Executors 
> TAB*(Spark SQL/Shell/JDBC server UIs ).
> • *Input column*(search working wrongly .Example if input is 34.5KB,searching 
> of 34.5 won't take ,but 345 shows the search result -it is wrong)
>  • Task time search is Ok, but *GC time* search not working
>  • *Thread Dump* -search not working [have to confirm it is required to add 
> in search, but we are able to search stdout text in that case Thread Dump 
> text also should be searchable ]
>  • *Storage memory* example 384.1 search not searching.
> *Issue 2:*
> *Spark job History Server page*,completed tasks- search is not working based 
> on *Duration column values*. We are getting the proper search result, if we 
> search the content from any other columns except Duration.*For example if 
> Duration is 6.1 min* we can not search result for 6.1 min or even 6.1.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28882) Memory leak when stopping spark session

2019-09-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-28882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Łukasz Pińkowski updated SPARK-28882:
-
Description: 
When the stop() method is called on a Spark session, the underlying SparkContext 
is also stopped.

This also stops the underlying ContextCleaner thread, usually before it has been 
able to clean all context objects (not all of them have been returned to the 
ReferenceQueue by GC yet). This causes a memory leak because this ReferenceQueue 
is never collected by GC.

There should be at least a comment in the documentation that calling the stop() 
method on a session or context may lead to memory leaks.

  was:
When calling stop() method on spark session underlying SparkContext is being 
stopped.

It causes also stop of underlying ContextCleaner thread, usually before it is 
able to clean all context objects (not all of them are returned to 
ReferenceQueue by GC). It causes memory leak because this ReferenceQueue is 
never collected by GC.


> Memory leak when stopping spark session
> ---
>
> Key: SPARK-28882
> URL: https://issues.apache.org/jira/browse/SPARK-28882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Łukasz Pińkowski
>Priority: Major
>
> When calling stop() method on spark session underlying SparkContext is being 
> stopped.
> It causes also stop of underlying ContextCleaner thread, usually before it is 
> able to clean all context objects (not all of them are returned to 
> ReferenceQueue by GC). It causes memory leak because this ReferenceQueue is 
> never collected by GC.
>  
> There should be at least comment in documentation that calling stop() method 
> on session or context may lead to memory leaks.
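
A hedged mitigation sketch rather than a fix: when an application otherwise builds and stops sessions repeatedly, reusing one long-lived SparkSession per JVM and stopping it exactly once at the end leaves far less for the ContextCleaner to abandon when stop() finally runs. The job body below is made up for illustration.

{code}
import org.apache.spark.sql.SparkSession

// Mitigation sketch: share one session per JVM instead of stop()/rebuild cycles.
object SharedSessionSketch {
  // getOrCreate() returns the active session if one exists, so repeated callers
  // share one SparkContext rather than churning through stop()/start cycles.
  lazy val spark: SparkSession = SparkSession.builder()
    .appName("shared-session-sketch")
    .master("local[2]")
    .getOrCreate()

  def runJob(jobId: Int): Long =
    spark.range(0, 1000000).where(s"id % ${jobId + 1} = 0").count()

  def main(args: Array[String]): Unit = {
    val counts = (1 to 5).map(runJob)
    println(counts.mkString(", "))
    spark.stop() // stop exactly once, when the JVM is done with Spark
  }
}
{code}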



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29051) Spark Application UI search is not working for some fields

2019-09-11 Thread Aman Omer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927434#comment-16927434
 ] 

Aman Omer commented on SPARK-29051:
---

I would like to handle this.

> Spark Application UI search is not working for some fields
> --
>
> Key: SPARK-29051
> URL: https://issues.apache.org/jira/browse/SPARK-29051
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.3, 2.4.4
>Reporter: jobit mathew
>Priority: Minor
>
> Spark Application UI *Search is not working* for some fields in *Spark Web UI 
> Executors TAB* and Spark job History Server page
> *Test Steps*
>  1.Install spark
>  2.Start Spark SQL/Shell/beeline
>  3.Submit some SQL queries 
>  4.Close some spark applications
>  5.Check the Spark Web UI Executors TAB and verify search
>  6.Check Spark job History Server page and verify search
> *Issue 1*
> Searching of some field contents are not working in *Spark Web UI Executors 
> TAB*(Spark SQL/Shell/JDBC server UIs ).
> • *Input column*(search working wrongly .Example if input is 34.5KB,searching 
> of 34.5 won't take ,but 345 shows the search result -it is wrong)
>  • Task time search is Ok, but *GC time* search not working
>  • *Thread Dump* -search not working [have to confirm it is required to add 
> in search, but we are able to search stdout text in that case Thread Dump 
> text also should be searchable ]
>  • *Storage memory* example 384.1 search not searching.
> *Issue 2:*
> *Spark job History Server page*,completed tasks- search is not working based 
> on *Duration column values*. We are getting the proper search result, if we 
> search the content from any other columns except Duration.*For example if 
> Duration is 6.1 min* we can not search result for 6.1 min or even 6.1.
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29051) Spark Application UI search is not working for some fields

2019-09-11 Thread jobit mathew (Jira)
jobit mathew created SPARK-29051:


 Summary: Spark Application UI search is not working for some fields
 Key: SPARK-29051
 URL: https://issues.apache.org/jira/browse/SPARK-29051
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.4.4, 2.4.3
Reporter: jobit mathew


Spark Application UI *search is not working* for some fields in the *Spark Web UI 
Executors tab* and on the Spark History Server page.

*Test Steps*
 1. Install Spark
 2. Start Spark SQL/Shell/beeline
 3. Submit some SQL queries
 4. Close some Spark applications
 5. Check the Spark Web UI Executors tab and verify search
 6. Check the Spark History Server page and verify search

*Issue 1*

Searching of some field contents is not working in the *Spark Web UI Executors 
tab* (Spark SQL/Shell/JDBC server UIs).

• *Input column*: search works incorrectly. For example, if the input is 34.5KB, 
searching for 34.5 finds nothing, but 345 shows the result, which is wrong.
 • Task time search is OK, but *GC time* search is not working.
 • *Thread Dump*: search is not working [need to confirm whether it should be 
searchable, but since stdout text is searchable, Thread Dump text should also be 
searchable].
 • *Storage memory*: for example, 384.1 is not found by search.

*Issue 2:*

On the *Spark History Server page*, the completed-tasks search is not working on 
*Duration column values*. We get the proper search result when searching content 
from any column except Duration. *For example, if Duration is 6.1 min*, we cannot 
get results for 6.1 min or even 6.1.

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29050) Fix typo in some docs

2019-09-11 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927421#comment-16927421
 ] 

dengziming edited comment on SPARK-29050 at 9/11/19 9:17 AM:
-

Hi, I have already done this.

[https://github.com/apache/spark/pull/25756]


was (Author: dengziming):
[https://github.com/apache/spark/pull/25756]

> Fix typo in some docs
> -
>
> Key: SPARK-29050
> URL: https://issues.apache.org/jira/browse/SPARK-29050
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.3, 2.4.3, 3.0.0
>Reporter: dengziming
>Priority: Major
>
> 'a hdfs' change into  'an hdfs'
> 'an unique' change into 'a unique'
> 'an url' change into 'a url'
> 'a error' change into 'an error'



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29050) Fix typo in some docs

2019-09-11 Thread dengziming (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927421#comment-16927421
 ] 

dengziming commented on SPARK-29050:


[https://github.com/apache/spark/pull/25756]

> Fix typo in some docs
> -
>
> Key: SPARK-29050
> URL: https://issues.apache.org/jira/browse/SPARK-29050
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.3, 2.4.3, 3.0.0
>Reporter: dengziming
>Priority: Major
>
> 'a hdfs' change into  'an hdfs'
> 'an unique' change into 'a unique'
> 'an url' change into 'a url'
> 'a error' change into 'an error'



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29050) Fix typo in some docs

2019-09-11 Thread dengziming (Jira)
dengziming created SPARK-29050:
--

 Summary: Fix typo in some docs
 Key: SPARK-29050
 URL: https://issues.apache.org/jira/browse/SPARK-29050
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 2.4.3, 2.3.3, 3.0.0
Reporter: dengziming


'a hdfs' changed to 'an hdfs'
'an unique' changed to 'a unique'
'an url' changed to 'a url'
'a error' changed to 'an error'



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927402#comment-16927402
 ] 

feiwang edited comment on SPARK-29043 at 9/11/19 8:41 AM:
--

[~kabhwan]
* How long "spark.history.fs.update.interval" has been set?20s
* How many applications are reloaded per each call of checkForLogs?   5+
* How big the event log for each application is?there maybe many large logs.

I think SPARK-28594 is more helpful for our case.


was (Author: hzfeiwang):
* How long "spark.history.fs.update.interval" has been set?20s
* How many applications are reloaded per each call of checkForLogs?   5+
* How big the event log for each application is?there maybe many large logs.

I think SPARK-28594 is more helpful for our case.

> [History Server]Only one replay thread of FsHistoryProvider work because of 
> straggler
> -
>
> Key: SPARK-29043
> URL: https://issues.apache.org/jira/browse/SPARK-29043
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: feiwang
>Priority: Major
> Attachments: image-2019-09-11-15-09-22-912.png, 
> image-2019-09-11-15-10-25-326.png, screenshot-1.png
>
>
> As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for 
> spark history server.
> However, there is only one replay thread work because of straggler.
> Let's check the code.
> https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547
> There is a synchronous operation for all replay tasks.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-09-11 Thread feiwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927402#comment-16927402
 ] 

feiwang commented on SPARK-29043:
-

* How long "spark.history.fs.update.interval" has been set?20s
* How many applications are reloaded per each call of checkForLogs?   5+
* How big the event log for each application is?there maybe many large logs.

I think SPARK-28594 is more helpful for our case.

> [History Server]Only one replay thread of FsHistoryProvider work because of 
> straggler
> -
>
> Key: SPARK-29043
> URL: https://issues.apache.org/jira/browse/SPARK-29043
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: feiwang
>Priority: Major
> Attachments: image-2019-09-11-15-09-22-912.png, 
> image-2019-09-11-15-10-25-326.png, screenshot-1.png
>
>
> As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for 
> spark history server.
> However, there is only one replay thread work because of straggler.
> Let's check the code.
> https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547
> There is a synchronous operation for all replay tasks.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames

2019-09-11 Thread Xianyin Xin (Jira)
Xianyin Xin created SPARK-29049:
---

 Summary: Rename DataSourceStrategy#normalizeFilters to 
DataSourceStrategy#normalizeAttrNames
 Key: SPARK-29049
 URL: https://issues.apache.org/jira/browse/SPARK-29049
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Xianyin Xin






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View

2019-09-11 Thread Lantao Jin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927374#comment-16927374
 ] 

Lantao Jin commented on SPARK-29038:


[~smilegator] Sure, we will fully follow ANSI SQL when we commit it, although our 
internal version contains some non-standard syntax.

> SPIP: Support Spark Materialized View
> -
>
> Key: SPARK-29038
> URL: https://issues.apache.org/jira/browse/SPARK-29038
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> Materialized view is an important approach in DBMS to cache data to 
> accelerate queries. By creating a materialized view through SQL, the data 
> that can be cached is very flexible, and needs to be configured arbitrarily 
> according to specific usage scenarios. The Materialization Manager 
> automatically updates the cache data according to changes in detail source 
> tables, simplifying user work. When user submit query, Spark optimizer 
> rewrites the execution plan based on the available materialized view to 
> determine the optimal execution plan.
> Details in [design 
> doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection

2019-09-11 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu reassigned SPARK-29048:
--

Assignee: Weichen Xu

> Query optimizer slow when using Column.isInCollection() with a large size 
> collection
> 
>
> Key: SPARK-29048
> URL: https://issues.apache.org/jira/browse/SPARK-29048
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Query optimizer slow when using Column.isInCollection() with a large size 
> collection.
> The query optimizer takes a long time to do its thing and on the UI all I see 
> is "Running commands". This can take from 10s of minutes to 11 hours 
> depending on how many values there are.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection

2019-09-11 Thread Weichen Xu (Jira)
Weichen Xu created SPARK-29048:
--

 Summary: Query optimizer slow when using Column.isInCollection() 
with a large size collection
 Key: SPARK-29048
 URL: https://issues.apache.org/jira/browse/SPARK-29048
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Weichen Xu


The query optimizer is slow when using Column.isInCollection() with a large 
collection.

The query optimizer takes a long time, and in the UI all I see is 
"Running commands". This can take from tens of minutes to 11 hours depending 
on how many values there are.
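
Until the optimizer cost itself is addressed, a common workaround is sketched below (the column and value sets are made up): put the large value set into a small DataFrame and use a broadcast left-semi join instead of isInCollection, so the plan does not have to carry one enormous IN expression.

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

// Workaround sketch: replace a huge isInCollection() literal list with a
// broadcast left-semi join against a DataFrame of the wanted values.
object LargeInListSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("large-in-list-sketch")
      .master("local[2]")
      .getOrCreate()
    import spark.implicits._

    val data = spark.range(0, 10000000).toDF("id")
    val wanted = (0L until 500000L by 7L).toDF("id") // the large "IN" list

    // Instead of data.filter($"id".isInCollection(hugeSeq)), which embeds every
    // value into the plan, let a semi join do the membership test.
    val filtered = data.join(broadcast(wanted), Seq("id"), "left_semi")
    println(filtered.count())

    spark.stop()
  }
}
{code}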



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


