[jira] [Commented] (SPARK-21492) Memory leak in SortMergeJoin
[ https://issues.apache.org/jira/browse/SPARK-21492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928204#comment-16928204 ]

zhoukang commented on SPARK-21492:
----------------------------------

Any progress on this issue? [~jiangxb1987] We have also encountered this problem.

> Memory leak in SortMergeJoin
> ----------------------------
>
>                 Key: SPARK-21492
>                 URL: https://issues.apache.org/jira/browse/SPARK-21492
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.3.0, 2.3.1, 3.0.0
>            Reporter: Zhan Zhang
>            Priority: Major
>
> In SortMergeJoin, if the iterator is not exhausted, there is a memory leak
> caused by the Sort: the memory is not released until the task ends, and it
> cannot be used by other operators, causing a performance drop or an OOM.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
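For readers unfamiliar with why an unexhausted join iterator can pin memory, here is a toy sort-merge join in Python. This is our illustration only, not Spark's Scala implementation: the point is that the buffered matching rows stay referenced until iteration finishes, just as the issue describes Spark's sort buffers being held until the task ends.

```python
# Illustrative only: a toy sort-merge join, not Spark's implementation.
# If the consumer abandons the generator early, `buffered` (and the
# generator frame) stay alive until the generator is garbage-collected,
# which mirrors the "memory held until task end" symptom above.

def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, each sorted by key."""
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk = left[i][0]
        rk = right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Buffer every right row sharing this key, then emit the
            # cross product with each matching left row.
            buffered = []
            k = j
            while k < len(right) and right[k][0] == lk:
                buffered.append(right[k][1])
                k += 1
            while i < len(left) and left[i][0] == lk:
                for rv in buffered:
                    yield (lk, left[i][1], rv)
                i += 1
            j = k

left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z")]
rows = list(sort_merge_join(left, right))
# rows == [(2, "b", "x"), (2, "b", "y"), (2, "c", "x"), (2, "c", "y")]
```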
[jira] [Commented] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928180#comment-16928180 ]

feiwang commented on SPARK-29037:
---------------------------------

[~cloud_fan]

> [Core] Spark gives duplicate result when an application was killed and rerun
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-29037
>                 URL: https://issues.apache.org/jira/browse/SPARK-29037
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.3.3
>            Reporter: feiwang
>            Priority: Major
>         Attachments: screenshot-1.png
>
> This happens when we insert overwrite a partition of a table.
> In a stage whose tasks commit output, a task first saves its output to a
> staging dir; when the task completes, it saves the output to
> committedTaskPath, and when all tasks of the stage succeed, all task output
> under committedTaskPath is moved to the destination dir.
> However, when we kill an application while it is committing tasks' output,
> part of the tasks' results is left behind in committedTaskPath and is not
> cleaned up gracefully.
> When we then rerun the application, the new application reuses the same
> committedTaskPath dir. When the task commit stage of the new application
> succeeds, all task output under committedTaskPath, including the leftover
> output of the old application, is moved to the destination dir, so the
> result is duplicated.
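The failure mode in the description can be sketched with plain file operations. This is a minimal simulation, not Spark or Hadoop code; the directory names and the `run_job` helper are ours, chosen only to mirror the "committed dir reused across runs" sequence:

```python
# A minimal sketch (not Spark/Hadoop code) of the failure mode described
# above: a committed-output dir that survives a killed run gets merged
# into the destination by the next run, duplicating results.
import os
import shutil
import tempfile

def run_job(root, task_outputs, crash_before_final_move=False):
    """Write each task's output under a shared committed dir, then move
    everything to the destination. Names here are illustrative."""
    committed = os.path.join(root, "_temporary", "0")  # reused across runs
    dest = os.path.join(root, "dest")
    os.makedirs(committed, exist_ok=True)
    os.makedirs(dest, exist_ok=True)
    for name, data in task_outputs:
        with open(os.path.join(committed, name), "w") as f:
            f.write(data)
    if crash_before_final_move:
        return  # killed mid-commit: leftovers stay in the committed dir
    for name in os.listdir(committed):
        shutil.move(os.path.join(committed, name), dest)
    shutil.rmtree(os.path.join(root, "_temporary"))

root = tempfile.mkdtemp()
# First run is killed after committing one task's output.
run_job(root, [("part-00000-run1", "old")], crash_before_final_move=True)
# The rerun commits its own output, then moves *everything* it finds.
run_job(root, [("part-00000-run2", "new")])
result = sorted(os.listdir(os.path.join(root, "dest")))
# result == ['part-00000-run1', 'part-00000-run2']: the killed run's
# leftover output reached the destination alongside the new output.
```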
[jira] [Issue Comment Deleted] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feiwang updated SPARK-29037:
----------------------------
    Comment: was deleted

(was: If we have several applications that insert overwrite a partition of the same table running at the same time, there may be data corruption when they commit task output at the same time.)
[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feiwang updated SPARK-29037:
----------------------------
    Description:
        When we insert overwrite a partition of a table: in a stage whose
        tasks commit output, a task first saves its output to a staging
        dir; when the task completes, it saves the output to
        committedTaskPath, and when all tasks of the stage succeed, all
        task output under committedTaskPath is moved to the destination
        dir.
        However, when we kill an application while it is committing tasks'
        output, part of the tasks' results is left in committedTaskPath and
        is not cleaned up gracefully.
        When we rerun the application, the new application reuses this
        committedTaskPath dir, and when its task commit stage succeeds, all
        task output under committedTaskPath, including the old
        application's leftover output, is moved to the destination dir, so
        the result is duplicated.

    (was: the previous revision of the same description, which attributed
    the leftover output to the staging dir rather than committedTaskPath.)
[jira] [Created] (SPARK-29064) Add PrometheusResource to export Executor metrics
Dongjoon Hyun created SPARK-29064:
---------------------------------

             Summary: Add PrometheusResource to export Executor metrics
                 Key: SPARK-29064
                 URL: https://issues.apache.org/jira/browse/SPARK-29064
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Dongjoon Hyun
[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feiwang updated SPARK-29037:
----------------------------
    Description:
        When we insert overwrite a partition of a table: in a stage whose
        tasks commit output, a task first saves its output to a staging
        dir; when this task completes, it will save output to
        when all tasks of this stage succeed, all task output under the
        staging dir is moved to the destination dir.
        However, when we kill an application while it is committing tasks'
        output, part of the tasks' results is left in the staging dir and
        is not cleaned up gracefully.
        When we rerun the application, the new application reuses this
        staging dir, and when its task commit stage succeeds, all task
        output under the staging dir, including the old application's
        leftover output, is moved to the destination dir, so the result is
        duplicated.

    (was: the same description without the incomplete sentence "it will
    save output to".)
[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feiwang updated SPARK-29037:
----------------------------
    Affects Version/s: 2.3.3
[jira] [Commented] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928170#comment-16928170 ]

feiwang commented on SPARK-29037:
---------------------------------

If we have several applications that insert overwrite a partition of the same table running at the same time, there may be data corruption when they commit task output at the same time.
[jira] [Commented] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928168#comment-16928168 ]

feiwang commented on SPARK-29037:
---------------------------------

This committedTaskPath is hard-coded in the FileOutputCommitter class.
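The directory layout the comments refer to can be sketched as below. The helper names here are ours and purely illustrative; only the `$output/_temporary/<appAttemptId>` convention comes from the issue discussion and the screenshot, and the key point is that the app attempt id defaults to 0 for every run, so a rerun resolves to the same path as the killed run:

```python
# Illustrative sketch of the committed-task-path layout discussed above.
# Helper names are ours; only the "_temporary/<appAttemptId>" convention
# is taken from the issue discussion.
import posixpath

PENDING_DIR = "_temporary"  # the hard-coded pending-output dir name

def job_attempt_path(output_path, app_attempt_id=0):
    # e.g. /warehouse/tbl/_temporary/0
    return posixpath.join(output_path, PENDING_DIR, str(app_attempt_id))

def committed_task_path(output_path, task_attempt_id, app_attempt_id=0):
    # e.g. /warehouse/tbl/_temporary/0/task_20190912_0001_m_000000
    return posixpath.join(job_attempt_path(output_path, app_attempt_id),
                          task_attempt_id)

p = committed_task_path("/warehouse/tbl", "task_20190912_0001_m_000000")
# p == "/warehouse/tbl/_temporary/0/task_20190912_0001_m_000000"
# With app_attempt_id fixed at 0, every run of the job shares this dir.
```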
[jira] [Commented] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928165#comment-16928165 ]

feiwang commented on SPARK-29037:
---------------------------------

This is the unit test log. !screenshot-1.png!
We can see that the task's output is always saved at $tablePath/_temporary/0/.
[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feiwang updated SPARK-29037:
----------------------------
    Attachment: screenshot-1.png
[jira] [Updated] (SPARK-29037) [Core] Spark gives duplicate result when an application was killed and rerun
[ https://issues.apache.org/jira/browse/SPARK-29037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

feiwang updated SPARK-29037:
----------------------------
    Description: prefixed the description with the sentence "When we
    insert overwrite a partition of a table."; the rest of the staging-dir
    description is unchanged from the previous revision.
[jira] [Updated] (SPARK-29063) fillna support for joined table
[ https://issues.apache.org/jira/browse/SPARK-29063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuanjian Li updated SPARK-29063:
--------------------------------
    Description:
When you have a joined table that has the same field name in both original tables, fillna fails even if you specify a subset that does not include the 'ambiguous' fields.
{code:java}
scala> val df1 = Seq(("f1-1", "f2", null), ("f1-2", null, null), ("f1-3", "f2", "f3-1"), ("f1-4", "f2", "f3-1")).toDF("f1", "f2", "f3")
scala> val df2 = Seq(("f1-1", null, null), ("f1-2", "f2", null), ("f1-3", "f2", "f4-1")).toDF("f1", "f2", "f4")
scala> val df_join = df1.alias("df1").join(df2.alias("df2"), Seq("f1"), joinType="left_outer")
scala> df_join.na.fill("", cols=Seq("f4"))
org.apache.spark.sql.AnalysisException: Reference 'f2' is ambiguous, could be: df1.f2, df2.f2.;
{code}
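For comparison, the behavior the issue asks for can be sketched with a pandas analogy (not Spark code): renaming the clashing columns at join time means a later fill on a subset never has to resolve an ambiguous reference. The frame and column names below mirror the Scala repro above; the `suffixes` approach is our illustration of one way to sidestep the ambiguity, not a proposed Spark fix.

```python
# A pandas analogy (not Spark) of the desired behavior: disambiguate the
# shared column names up front, then fill only the requested subset.
import pandas as pd

df1 = pd.DataFrame({"f1": ["f1-1", "f1-2", "f1-3", "f1-4"],
                    "f2": ["f2", None, "f2", "f2"],
                    "f3": [None, None, "f3-1", "f3-1"]})
df2 = pd.DataFrame({"f1": ["f1-1", "f1-2", "f1-3"],
                    "f2": [None, "f2", "f2"],
                    "f4": [None, None, "f4-1"]})

# suffixes= renames the clashing f2 columns to f2_df1 / f2_df2, so a
# fill restricted to "f4" never touches an ambiguous name.
joined = df1.merge(df2, on="f1", how="left", suffixes=("_df1", "_df2"))
joined["f4"] = joined["f4"].fillna("")
# Row f1-4 has no match in df2, so its f4 was NaN and is now "".
```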
[jira] [Created] (SPARK-29063) fillna support for joined table
Yuanjian Li created SPARK-29063:
-------------------------------

             Summary: fillna support for joined table
                 Key: SPARK-29063
                 URL: https://issues.apache.org/jira/browse/SPARK-29063
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Yuanjian Li
[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928144#comment-16928144 ]

Lantao Jin commented on SPARK-29038:
------------------------------------

[~smilegator] Yes, it's physically stored. I will create detailed documentation that illustrates the implementation.

> SPIP: Support Spark Materialized View
> -------------------------------------
>
>                 Key: SPARK-29038
>                 URL: https://issues.apache.org/jira/browse/SPARK-29038
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> A materialized view is an important approach in a DBMS to cache data to
> accelerate queries. By creating a materialized view through SQL, the data
> that can be cached is very flexible and can be configured for specific
> usage scenarios. The materialization manager automatically updates the
> cached data according to changes in the detail source tables, simplifying
> the user's work. When a user submits a query, the Spark optimizer rewrites
> the execution plan based on the available materialized views to determine
> the optimal execution plan.
> Details in the [design doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]
[jira] [Resolved] (SPARK-29046) Possible NPE on SQLConf.get when SparkContext is stopping in another thread
[ https://issues.apache.org/jira/browse/SPARK-29046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-29046.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 25753
[https://github.com/apache/spark/pull/25753]

> Possible NPE on SQLConf.get when SparkContext is stopping in another thread
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-29046
>                 URL: https://issues.apache.org/jira/browse/SPARK-29046
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>             Fix For: 3.0.0
>
> We encountered an NPE in listener code that deals with the query plan, and
> according to the stack trace below, the only possible cause of the NPE is
> SparkContext._dagScheduler being null, which is only possible while
> SparkContext is stopping (unless null is set from outside).
>
> {code:java}
> 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
> 19/09/11 00:22:24 INFO ui.SparkUI: Stopped Spark web UI at http://:32770
> 19/09/11 00:22:24 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
> 19/09/11 00:22:24 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
> 19/09/11 00:22:24 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices(serviceOption=None, services=List(), started=false)
> 19/09/11 00:22:24 WARN sql.SparkExecutionPlanProcessor: Caught exception during parsing event
> java.lang.NullPointerException
>   at org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133)
>   at org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133)
>   at scala.Option.map(Option.scala:146)
>   at org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:133)
>   at org.apache.spark.sql.types.StructType.simpleString(StructType.scala:352)
>   at com.hortonworks.spark.atlas.types.internal$.sparkTableToEntity(internal.scala:102)
>   at com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:62)
>   at com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45)
>   at com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:240)
>   at com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:239)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
>   at com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities(CommandsHarvester.scala:239)
>   at com.hortonworks.spark.atlas.sql.CommandsHarvester$CreateDataSourceTableAsSelectHarvester$.harvest(CommandsHarvester.scala:104)
>   at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:138)
>   at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
>   at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89)
>   at com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63)
>   at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72)
>   at com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71)
>   at scala.Option.foreach(Option.scala:257)
> {code}
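The defensive pattern for this class of bug can be sketched generically. This is a Python illustration of the idea only, not the actual Scala fix merged in the pull request: snapshot a field that another thread may null out during shutdown, and fall back when it is already gone.

```python
# A generic sketch (not the actual fix) of guarding a read against a
# field another thread nulls out during shutdown.
import threading

class Context:
    def __init__(self):
        self.scheduler = object()   # stands in for a live scheduler ref

    def stop(self):
        self.scheduler = None       # another thread may do this anytime

FALLBACK = "default-conf"

def get_conf(ctx):
    scheduler = ctx.scheduler       # read the field once into a local
    if scheduler is None:           # context is stopping: degrade
        return FALLBACK
    return "session-conf"

ctx = Context()
before = get_conf(ctx)              # "session-conf" while ctx is live
t = threading.Thread(target=ctx.stop)
t.start()
t.join()
after = get_conf(ctx)               # "default-conf": no exception raised
```

Reading the field into a local before the None check matters: checking `ctx.scheduler` and then dereferencing it again would leave a window for the stopping thread to null it in between.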
[jira] [Assigned] (SPARK-29046) Possible NPE on SQLConf.get when SparkContext is stopping in another thread
[ https://issues.apache.org/jira/browse/SPARK-29046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-29046: Assignee: Jungtaek Lim > Possible NPE on SQLConf.get when SparkContext is stopping in another thread > --- > > Key: SPARK-29046 > URL: https://issues.apache.org/jira/browse/SPARK-29046 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Minor > > We encountered NPE in listener code which deals with query plan - and > according to the stack trace below, only possible case of NPE is > SparkContext._dagScheduler being null, which is only possible while stopping > SparkContext (unless null is set from outside). > > {code:java} > 19/09/11 00:22:24 INFO server.AbstractConnector: Stopped > Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}19/09/11 00:22:24 INFO > server.AbstractConnector: Stopped > Spark@49d8c117{HTTP/1.1,[http/1.1]}{0.0.0.0:0}19/09/11 00:22:24 INFO > ui.SparkUI: Stopped Spark web UI at http://:3277019/09/11 00:22:24 INFO > cluster.YarnClusterSchedulerBackend: Shutting down all executors19/09/11 > 00:22:24 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each > executor to shut down19/09/11 00:22:24 INFO > cluster.SchedulerExtensionServices: Stopping > SchedulerExtensionServices(serviceOption=None, services=List(), > started=false)19/09/11 00:22:24 WARN sql.SparkExecutionPlanProcessor: Caught > exception during parsing eventjava.lang.NullPointerException at > org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at > org.apache.spark.sql.internal.SQLConf$$anonfun$15.apply(SQLConf.scala:133) at > scala.Option.map(Option.scala:146) at > org.apache.spark.sql.internal.SQLConf$.get(SQLConf.scala:133) at > org.apache.spark.sql.types.StructType.simpleString(StructType.scala:352) at > com.hortonworks.spark.atlas.types.internal$.sparkTableToEntity(internal.scala:102) > at > 
com.hortonworks.spark.atlas.types.AtlasEntityUtils$class.tableToEntity(AtlasEntityUtils.scala:62) > at > com.hortonworks.spark.atlas.sql.CommandsHarvester$.tableToEntity(CommandsHarvester.scala:45) > at > com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:240) > at > com.hortonworks.spark.atlas.sql.CommandsHarvester$$anonfun$com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities$1.apply(CommandsHarvester.scala:239) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at > scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at > com.hortonworks.spark.atlas.sql.CommandsHarvester$.com$hortonworks$spark$atlas$sql$CommandsHarvester$$discoverInputsEntities(CommandsHarvester.scala:239) > at > com.hortonworks.spark.atlas.sql.CommandsHarvester$CreateDataSourceTableAsSelectHarvester$.harvest(CommandsHarvester.scala:104) > at > com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:138) > at > com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor$$anonfun$2.apply(SparkExecutionPlanProcessor.scala:89) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at > 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at > scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at > com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:89) > at > com.hortonworks.spark.atlas.sql.SparkExecutionPlanProcessor.process(SparkExecutionPlanProcessor.scala:63) > at > com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:72) > at > com.hortonworks.spark.atlas.AbstractEventProcessor$$anonfun$eventProcess$1.apply(AbstractEventProcessor.scala:71) > at scala.Option.foreach(Option.scala:257) at > com.hortonworks.spark.atlas.AbstractEventProcessor.eventProcess(AbstractEventProcessor.scala:71) > at >
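According to the stack trace, the NPE can only come from dereferencing SparkContext._dagScheduler after SparkContext.stop() has nulled it from another thread. A minimal Python model of the race (hypothetical names; the real code is the Scala at SQLConf.scala:133) shows the shape of a null-safe fix — snapshot the field once and guard before use:

```python
class DagScheduler:
    def active_job_count(self):
        return 0


class FakeSparkContext:
    """Minimal stand-in for SparkContext; stop() nulls the scheduler field."""
    def __init__(self):
        self.dag_scheduler = DagScheduler()

    def stop(self):
        # Mirrors SparkContext.stop(): the scheduler reference becomes None,
        # possibly while another thread is still reading it.
        self.dag_scheduler = None


def get_conf_unsafe(sc):
    # Mirrors the buggy pattern: no null check, so this raises if stop()
    # ran concurrently (the Scala analogue is the reported NPE).
    return sc.dag_scheduler.active_job_count()


def get_conf_safe(sc, fallback="fallback-conf"):
    # Snapshot once, then guard -- the shape of fix this ticket calls for.
    scheduler = sc.dag_scheduler
    if scheduler is None:
        return fallback
    return scheduler.active_job_count()
```

The safe variant degrades to a fallback value instead of throwing when the context is mid-shutdown.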
[jira] [Updated] (SPARK-29050) Fix typo in some docs
[ https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dengziming updated SPARK-29050: --- Issue Type: Improvement (was: Bug) > Fix typo in some docs > - > > Key: SPARK-29050 > URL: https://issues.apache.org/jira/browse/SPARK-29050 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 2.3.3, 2.4.3, 3.0.0 >Reporter: dengziming >Priority: Trivial > > 'a hdfs' change into 'an hdfs' > 'an unique' change into 'a unique' > 'an url' change into 'a url' > 'a error' change into 'an error' -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29050) Fix typo in some docs
[ https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928121#comment-16928121 ] dengziming commented on SPARK-29050: [~srowen] thank you! > Fix typo in some docs > - > > Key: SPARK-29050 > URL: https://issues.apache.org/jira/browse/SPARK-29050 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 2.3.3, 2.4.3, 3.0.0 >Reporter: dengziming >Priority: Trivial > > 'a hdfs' change into 'an hdfs' > 'an unique' change into 'a unique' > 'an url' change into 'a url' > 'a error' change into 'an error' -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29050) Fix typo in some docs
[ https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dengziming updated SPARK-29050: --- Issue Type: Bug (was: Improvement) > Fix typo in some docs > - > > Key: SPARK-29050 > URL: https://issues.apache.org/jira/browse/SPARK-29050 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.3.3, 2.4.3, 3.0.0 >Reporter: dengziming >Priority: Trivial > > 'a hdfs' change into 'an hdfs' > 'an unique' change into 'a unique' > 'an url' change into 'a url' > 'a error' change into 'an error' -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29062) Add V1_BATCH_WRITE to the TableCapabilityChecks in the Analyzer
Burak Yavuz created SPARK-29062: --- Summary: Add V1_BATCH_WRITE to the TableCapabilityChecks in the Analyzer Key: SPARK-29062 URL: https://issues.apache.org/jira/browse/SPARK-29062 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Burak Yavuz Currently the checks in the Analyzer require that V2 Tables have BATCH_WRITE defined for all tables that have V1 Write fallbacks. This is confusing as these tables may not have the V2 writer interface implemented yet. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
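The proposed relaxation can be sketched as a plain capability check (constant names modeled on Spark's TableCapability enum; the check logic here is illustrative, not the actual Analyzer code): a table passes the write check if it declares either the native V2 capability or the V1 fallback.

```python
# Stand-ins for org.apache.spark.sql.connector.catalog.TableCapability values.
BATCH_WRITE = "BATCH_WRITE"
V1_BATCH_WRITE = "V1_BATCH_WRITE"


def check_batch_write(capabilities):
    """Analyzer-style check: accept tables that can write via the V2 path
    OR fall back to a V1 writer, instead of requiring BATCH_WRITE of all."""
    if BATCH_WRITE in capabilities or V1_BATCH_WRITE in capabilities:
        return True
    raise ValueError("table does not support batch writes")
```

A table with only a V1 write fallback no longer has to claim a V2 writer it does not implement.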
[jira] [Created] (SPARK-29061) Prints bytecode statistics in debugCodegen
Takeshi Yamamuro created SPARK-29061: Summary: Prints bytecode statistics in debugCodegen Key: SPARK-29061 URL: https://issues.apache.org/jira/browse/SPARK-29061 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Takeshi Yamamuro This ticket aims to print bytecode statistics (max class bytecode size, max method bytecode size, and max constant pool size) for generated classes in the debug output of {{debugCodegen}}.
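The statistics themselves are simple maxima over the generated classes; a sketch of the aggregation (the data shape is hypothetical — the real numbers come from the Janino compiler's output for each generated class):

```python
def codegen_stats(classes):
    """classes: one dict per generated class with the measured quantities.
    Returns the maxima that a debugCodegen-style report would print."""
    return {
        "maxClassCodeSize": max(c["class_bytecode_size"] for c in classes),
        "maxMethodCodeSize": max(max(c["method_bytecode_sizes"]) for c in classes),
        "maxConstantPoolSize": max(c["constant_pool_size"] for c in classes),
    }
```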
[jira] [Resolved] (SPARK-29057) remove InsertIntoTable
[ https://issues.apache.org/jira/browse/SPARK-29057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29057. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25763 [https://github.com/apache/spark/pull/25763] > remove InsertIntoTable > -- > > Key: SPARK-29057 > URL: https://issues.apache.org/jira/browse/SPARK-29057 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928100#comment-16928100 ] Adrian Wang commented on SPARK-29038: - This seems to duplicate our proposal in SPARK-26764. We have implemented similar features and already have them running in our customer's production environment. > SPIP: Support Spark Materialized View > - > > Key: SPARK-29038 > URL: https://issues.apache.org/jira/browse/SPARK-29038 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Lantao Jin >Priority: Major > > Materialized view is an important approach in DBMS to cache data to > accelerate queries. By creating a materialized view through SQL, users can > flexibly choose which data to cache for their specific usage scenarios. The > Materialization Manager automatically updates the cached data according to > changes in the detail source tables, simplifying user work. When a user > submits a query, the Spark optimizer rewrites the execution plan based on the > available materialized views to determine the optimal execution plan. > Details in [design > doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]
[jira] [Resolved] (SPARK-29041) Allow createDataFrame to accept bytes as binary type
[ https://issues.apache.org/jira/browse/SPARK-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29041. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25749 [https://github.com/apache/spark/pull/25749] > Allow createDataFrame to accept bytes as binary type > > > Key: SPARK-29041 > URL: https://issues.apache.org/jira/browse/SPARK-29041 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4, 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > {code} > spark.createDataFrame([[b"abcd"]], "col binary") > {code} > simply fails as below: > in Python 3 > {code} > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 787, in > createDataFrame > rdd, schema = self._createFromLocal(map(prepare, data), schema) > File "/.../spark/python/pyspark/sql/session.py", line 442, in > _createFromLocal > data = list(data) > File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare > verify_func(obj) > File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct > verifier(v) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default > verify_acceptable_types(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1282, in > verify_acceptable_types > % (dataType, obj, type(obj > TypeError: field col: BinaryType can not accept object b'abcd' in type 'bytes'> > {code} > in Python 2: > {code} > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 787, in > createDataFrame > rdd, schema = self._createFromLocal(map(prepare, data), schema) > File "/.../spark/python/pyspark/sql/session.py", line 442, in > 
_createFromLocal > data = list(data) > File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare > verify_func(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct > verifier(v) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default > verify_acceptable_types(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1282, in > verify_acceptable_types > % (dataType, obj, type(obj > TypeError: field col: BinaryType can not accept object 'abcd' in type 'str'> > {code} > {{bytes}} should also be accepted as binary type
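The fix amounts to widening the set of Python types the BinaryType verifier accepts; a simplified model (hypothetical names — the real check lives in pyspark/sql/types.py and raises the TypeError quoted above):

```python
# Before the fix the verifier effectively accepted only bytearray; adding
# `bytes` lets literals such as b"abcd" pass verification.
ACCEPTABLE_BINARY_TYPES = (bytearray, bytes)


def verify_binary(obj):
    """Simplified BinaryType acceptance check."""
    if not isinstance(obj, ACCEPTABLE_BINARY_TYPES):
        raise TypeError(
            "field col: BinaryType can not accept object %r in type %s"
            % (obj, type(obj))
        )
    return obj
```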
[jira] [Assigned] (SPARK-29041) Allow createDataFrame to accept bytes as binary type
[ https://issues.apache.org/jira/browse/SPARK-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-29041: Assignee: Hyukjin Kwon > Allow createDataFrame to accept bytes as binary type > > > Key: SPARK-29041 > URL: https://issues.apache.org/jira/browse/SPARK-29041 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4, 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > {code} > spark.createDataFrame([[b"abcd"]], "col binary") > {code} > simply fails as below: > in Python 3 > {code} > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 787, in > createDataFrame > rdd, schema = self._createFromLocal(map(prepare, data), schema) > File "/.../spark/python/pyspark/sql/session.py", line 442, in > _createFromLocal > data = list(data) > File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare > verify_func(obj) > File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct > verifier(v) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default > verify_acceptable_types(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1282, in > verify_acceptable_types > % (dataType, obj, type(obj > TypeError: field col: BinaryType can not accept object b'abcd' in type 'bytes'> > {code} > in Python 2: > {code} > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 787, in > createDataFrame > rdd, schema = self._createFromLocal(map(prepare, data), schema) > File "/.../spark/python/pyspark/sql/session.py", line 442, in > _createFromLocal > data = list(data) > File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare > 
verify_func(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct > verifier(v) > File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify > verify_value(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default > verify_acceptable_types(obj) > File "/.../spark/python/pyspark/sql/types.py", line 1282, in > verify_acceptable_types > % (dataType, obj, type(obj > TypeError: field col: BinaryType can not accept object 'abcd' in type 'str'> > {code} > {{bytes}} should also be accepted as binary type
[jira] [Assigned] (SPARK-24663) Flaky test: StreamingContextSuite "stop slow receiver gracefully"
[ https://issues.apache.org/jira/browse/SPARK-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-24663: -- Assignee: Jungtaek Lim > Flaky test: StreamingContextSuite "stop slow receiver gracefully" > - > > Key: SPARK-24663 > URL: https://issues.apache.org/jira/browse/SPARK-24663 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Jungtaek Lim >Priority: Minor > > This is another test that sometimes fails on our build machines, although I > can't find failures on the riselab jenkins servers. Failure looks like: > {noformat} > org.scalatest.exceptions.TestFailedException: 0 was not greater than 0 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply$mcV$sp(StreamingContextSuite.scala:356) > at > org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335) > at > org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335) > {noformat} > The test fails in about 2s, while a successful run generally takes 15s. > Looking at the logs, the receiver hasn't even started when things fail, which > points at a race during test initialization. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24663) Flaky test: StreamingContextSuite "stop slow receiver gracefully"
[ https://issues.apache.org/jira/browse/SPARK-24663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-24663. Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25725 [https://github.com/apache/spark/pull/25725] > Flaky test: StreamingContextSuite "stop slow receiver gracefully" > - > > Key: SPARK-24663 > URL: https://issues.apache.org/jira/browse/SPARK-24663 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 2.4.0, 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Jungtaek Lim >Priority: Minor > Fix For: 3.0.0 > > > This is another test that sometimes fails on our build machines, although I > can't find failures on the riselab jenkins servers. Failure looks like: > {noformat} > org.scalatest.exceptions.TestFailedException: 0 was not greater than 0 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply$mcV$sp(StreamingContextSuite.scala:356) > at > org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335) > at > org.apache.spark.streaming.StreamingContextSuite$$anonfun$24.apply(StreamingContextSuite.scala:335) > {noformat} > The test fails in about 2s, while a successful run generally takes 15s. > Looking at the logs, the receiver hasn't even started when things fail, which > points at a race during test initialization. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.()V
[ https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921700#comment-16921700 ] Michael Heuer edited comment on SPARK-27781 at 9/11/19 7:47 PM: -This is still an issue with the Spark 2.4.4 binary distribution for Scala 2.12 without Hadoop.- https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/3047/ was (Author: heuermh): -This is still an issue with the Spark 2.4.4 binary distribution for Scala 2.12 without Hadoop.- [-https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/3047/-] > Tried to access method org.apache.avro.specific.SpecificData.()V > -- > > Key: SPARK-27781 > URL: https://issues.apache.org/jira/browse/SPARK-27781 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Michael Heuer >Priority: Major > Fix For: 2.4.4 > > Attachments: reproduce.sh > > > It appears that there is a conflict in avro dependency versions at runtime > when using Spark 2.4.3 and Scala 2.12 > (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop > 2.7.7. 
> > Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes > avro-1.8.2.jar > {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}} > {{jars/avro-1.8.2.jar}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > {{jars/avro-ipc-1.8.2.jar}} > > Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop > does not > {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > > Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which > conflicts at runtime > {{$ find hadoop-2.7.7 -name *.jar | grep avro}} > {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}} > {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}} > {{share/hadoop/tools/lib/avro-1.7.4.jar}} > {{share/hadoop/common/lib/avro-1.7.4.jar}} > {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}} > > Issue filed downstream in > [https://github.com/bigdatagenomics/adam/issues/2151] > > Attached a smaller reproducing test case. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
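The conflict described above is visible from jar file names alone; a small sketch that flags mixed Avro artifact versions on a classpath (illustrative only — real classpath layouts vary, and the version pattern here is an assumption):

```python
import re


def avro_versions(jar_names):
    """Collect distinct Avro artifact versions from a list of jar file names,
    e.g. 'avro-1.7.4.jar' or 'avro-mapred-1.8.2-hadoop2.jar'."""
    versions = set()
    for name in jar_names:
        m = re.match(r"avro(?:-[a-z]+)*-(\d+\.\d+\.\d+)", name)
        if m:
            versions.add(m.group(1))
    return versions


def has_avro_conflict(jar_names):
    # More than one distinct version on the same classpath risks the kind of
    # runtime IllegalAccessError/linkage failure reported in this issue.
    return len(avro_versions(jar_names)) > 1
```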
[jira] [Resolved] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.()V
[ https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Heuer resolved SPARK-27781. --- Fix Version/s: 2.4.4 Resolution: Fixed > Tried to access method org.apache.avro.specific.SpecificData.()V > -- > > Key: SPARK-27781 > URL: https://issues.apache.org/jira/browse/SPARK-27781 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Michael Heuer >Priority: Major > Fix For: 2.4.4 > > Attachments: reproduce.sh > > > It appears that there is a conflict in avro dependency versions at runtime > when using Spark 2.4.3 and Scala 2.12 > (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop > 2.7.7. > > Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes > avro-1.8.2.jar > {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}} > {{jars/avro-1.8.2.jar}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > {{jars/avro-ipc-1.8.2.jar}} > > Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop > does not > {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > > Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which > conflicts at runtime > {{$ find hadoop-2.7.7 -name *.jar | grep avro}} > {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}} > {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}} > {{share/hadoop/tools/lib/avro-1.7.4.jar}} > {{share/hadoop/common/lib/avro-1.7.4.jar}} > {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}} > > Issue filed downstream in > [https://github.com/bigdatagenomics/adam/issues/2151] > > Attached a smaller reproducing test case. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.()V
[ https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921700#comment-16921700 ] Michael Heuer edited comment on SPARK-27781 at 9/11/19 7:46 PM: -This is still an issue with the Spark 2.4.4 binary distribution for Scala 2.12 without Hadoop.- [-https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/3047/-] was (Author: heuermh): This is still an issue with the Spark 2.4.4 binary distribution for Scala 2.12 without Hadoop. https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/3047/ > Tried to access method org.apache.avro.specific.SpecificData.()V > -- > > Key: SPARK-27781 > URL: https://issues.apache.org/jira/browse/SPARK-27781 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3, 2.4.4 >Reporter: Michael Heuer >Priority: Major > Attachments: reproduce.sh > > > It appears that there is a conflict in avro dependency versions at runtime > when using Spark 2.4.3 and Scala 2.12 > (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop > 2.7.7. 
> > Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes > avro-1.8.2.jar > {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}} > {{jars/avro-1.8.2.jar}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > {{jars/avro-ipc-1.8.2.jar}} > > Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop > does not > {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > > Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which > conflicts at runtime > {{$ find hadoop-2.7.7 -name *.jar | grep avro}} > {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}} > {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}} > {{share/hadoop/tools/lib/avro-1.7.4.jar}} > {{share/hadoop/common/lib/avro-1.7.4.jar}} > {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}} > > Issue filed downstream in > [https://github.com/bigdatagenomics/adam/issues/2151] > > Attached a smaller reproducing test case. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.()V
[ https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927953#comment-16927953 ] Michael Heuer commented on SPARK-27781: --- This issue has been fixed in Spark 2.4.4, and fixed in ADAM Jenkins CI https://github.com/bigdatagenomics/adam/pull/2206 > Tried to access method org.apache.avro.specific.SpecificData.()V > -- > > Key: SPARK-27781 > URL: https://issues.apache.org/jira/browse/SPARK-27781 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3, 2.4.4 >Reporter: Michael Heuer >Priority: Major > Attachments: reproduce.sh > > > It appears that there is a conflict in avro dependency versions at runtime > when using Spark 2.4.3 and Scala 2.12 > (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop > 2.7.7. > > Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes > avro-1.8.2.jar > {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}} > {{jars/avro-1.8.2.jar}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > {{jars/avro-ipc-1.8.2.jar}} > > Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop > does not > {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > > Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which > conflicts at runtime > {{$ find hadoop-2.7.7 -name *.jar | grep avro}} > {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}} > {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}} > {{share/hadoop/tools/lib/avro-1.7.4.jar}} > {{share/hadoop/common/lib/avro-1.7.4.jar}} > {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}} > > Issue filed downstream in > [https://github.com/bigdatagenomics/adam/issues/2151] > > Attached a smaller reproducing test case. 
[jira] [Updated] (SPARK-27781) Tried to access method org.apache.avro.specific.SpecificData.()V
[ https://issues.apache.org/jira/browse/SPARK-27781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Heuer updated SPARK-27781: -- Affects Version/s: (was: 2.4.4) > Tried to access method org.apache.avro.specific.SpecificData.()V > -- > > Key: SPARK-27781 > URL: https://issues.apache.org/jira/browse/SPARK-27781 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Michael Heuer >Priority: Major > Attachments: reproduce.sh > > > It appears that there is a conflict in avro dependency versions at runtime > when using Spark 2.4.3 and Scala 2.12 > (spark-2.4.3-bin-without-hadoop-scala-2.12 binary distribution) and Hadoop > 2.7.7. > > Specifically, the Spark 2.4.3 binary distribution for Hadoop 2.7.x includes > avro-1.8.2.jar > {{$ find spark-2.4.3-bin-hadoop2.7 *.jar | grep avro}} > {{jars/avro-1.8.2.jar}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > {{jars/avro-ipc-1.8.2.jar}} > > Whereas the Spark 2.4.3 binary distribution for Scala 2.12 without Hadoop > does not > {{$ find spark-2.4.3-bin-without-hadoop-scala-2.12 *.jar | grep avro}} > {{jars/avro-mapred-1.8.2-hadoop2.jar}} > > Including Hadoop 2.7.7 onto the classpath brings in avro-1.7.4.jar, which > conflicts at runtime > {{$ find hadoop-2.7.7 -name *.jar | grep avro}} > {{share/hadoop/mapreduce/lib/avro-1.7.4.jar}} > {{share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/avro-1.7.4.jar}} > {{share/hadoop/tools/lib/avro-1.7.4.jar}} > {{share/hadoop/common/lib/avro-1.7.4.jar}} > {{hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/avro-1.7.4.jar}} > > Issue filed downstream in > [https://github.com/bigdatagenomics/adam/issues/2151] > > Attached a smaller reproducing test case. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927912#comment-16927912 ] koert kuipers commented on SPARK-29027: --- [~gsomogyi] if you email me at koert at tresata dot com i can send logs > KafkaDelegationTokenSuite fails > --- > > Key: SPARK-29027 > URL: https://issues.apache.org/jira/browse/SPARK-29027 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: {code} > commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 > Author: Sean Owen > Date: Mon Sep 9 10:19:40 2019 -0500 > {code} > Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10) >Reporter: koert kuipers >Priority: Minor > > i am seeing consistent failure of KafkaDelegationTokenSuite on master > {code} > JsonUtilsSuite: > - parsing partitions > - parsing partitionOffsets > KafkaDelegationTokenSuite: > javax.security.sasl.SaslException: Failure to initialize security context > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125) > at > com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85) > at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48) > at > org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100) > at > 
org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails) > at > sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127) > at > sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193) > at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427) > at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62) > at > sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108) > ... 12 more > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** > org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure > at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) > at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) > at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > ... 
> KafkaSourceOffsetSuite: > - comparison {"t":{"0":1}} <=> {"t":{"0":2}} > - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}} > - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}} > - basic serialization - deserialization > - OffsetSeqLog serialization - deserialization > - read Spark 2.1.0 offset format > {code} > {code} > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 4.178 > s] > [INFO] Spark Project Tags . SUCCESS [ 9.373 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 24.586 > s] > [INFO] Spark Project Local DB . SUCCESS [ 5.456 > s] > [INFO] Spark Project Networking
[jira] [Created] (SPARK-29060) Add tree traversal helper for adaptive spark plans
Maryann Xue created SPARK-29060: --- Summary: Add tree traversal helper for adaptive spark plans Key: SPARK-29060 URL: https://issues.apache.org/jira/browse/SPARK-29060 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maryann Xue -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
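The ticket above carries no description, but its subject — a tree traversal helper for plan trees — can be sketched generically. This is a hypothetical pure-Python illustration (Spark's actual `TreeNode` API is in Scala, and the `PlanNode`/`foreach_up` names here are invented for the example):

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    """Minimal stand-in for a query-plan tree node."""
    name: str
    children: list = field(default_factory=list)

def foreach_up(node, f):
    """Post-order (bottom-up) traversal: visit all children before the node itself,
    the order in which adaptive execution typically finalizes child stages first."""
    for child in node.children:
        foreach_up(child, f)
    f(node)

# Example: collect node names bottom-up from a tiny plan.
plan = PlanNode("Project", [PlanNode("Join", [PlanNode("ScanA"), PlanNode("ScanB")])])
visited = []
foreach_up(plan, lambda n: visited.append(n.name))
# visited == ["ScanA", "ScanB", "Join", "Project"]
```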
[jira] [Updated] (SPARK-29059) Support for Hive Materialized Views in Spark SQL.
[ https://issues.apache.org/jira/browse/SPARK-29059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amogh Margoor updated SPARK-29059: -- Description: Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark Catalyst does not optimize queries against Hive tables using Materialized View the way Apache Calcite does it for Hive. This Jira is to add support for the same. We have developed it in our internal trunk and would like to open source it. It would consist of 3 major parts: # Reading MV related Hive Metadata # Implication Engine which would figure out if an expression exp1 implies another expression exp2 i.e., if exp1 => exp2 is a tautology. This is similar to RexImplication checker in Apache Calcite. # Catalyst rule to replace tables by it's Materialized view using Implication Engine. For e.g., if MV 'mv' has been created in Hive using query 'select * from foo where x > 10 && x <110' then query 'select * from foo where x > 70 and x < 100' will be transformed into 'select * from mv where x >70 and x < 100' Note that Implication Engine and Catalyst Rule is generic can be used even when Spark decides to have it's own Materialized View. was: Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark Catalyst does not optimize queries against Hive tables using Materialized View the way Apache Calcite does it for Hive. This Jira is to add support for the same. We have developed it in our internal track would like to open source it. It would consist of 3 major parts: # Reading MV related Hive Metadata # Implication Engine which would figure out if an expression exp1 implies another expression exp2 i.e., if exp1 => exp2 is a tautology. This is similar to RexImplication checker in Apache Calcite. # Catalyst rule to replace tables by it's Materialized view using Implication Engine. 
For e.g., if MV 'mv' has been created in Hive using query 'select * from foo where x > 10 && x <110' then query 'select * from foo where x > 70 and x < 100' will be transformed into 'select * from mv where x >70 and x < 100' Note that Implication Engine and Catalyst Rule is generic can be used even when Spark decides to have it's own Materialized View. > Support for Hive Materialized Views in Spark SQL. > - > > Key: SPARK-29059 > URL: https://issues.apache.org/jira/browse/SPARK-29059 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Amogh Margoor >Priority: Minor > > Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark > Catalyst does not optimize queries against Hive tables using Materialized > View the way Apache Calcite does it for Hive. This Jira is to add support for > the same. > We have developed it in our internal trunk and would like to open source it. > It would consist of 3 major parts: > # Reading MV related Hive Metadata > # Implication Engine which would figure out if an expression exp1 implies > another expression exp2 i.e., if exp1 => exp2 is a tautology. This is similar > to RexImplication checker in Apache Calcite. > # Catalyst rule to replace tables by it's Materialized view using > Implication Engine. For e.g., if MV 'mv' has been created in Hive using query > 'select * from foo where x > 10 && x <110' then query 'select * from foo > where x > 70 and x < 100' will be transformed into 'select * from mv where x > >70 and x < 100' > Note that Implication Engine and Catalyst Rule is generic can be used even > when Spark decides to have it's own Materialized View. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
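The interval example in the description above (a query predicate `x > 70 and x < 100` being implied by the MV predicate `x > 10 and x < 110`) can be sketched outside Spark. This is a minimal illustration of the implication idea for single-column open intervals only — not the proposed Implication Engine, which would need to handle arbitrary expressions:

```python
def implies(p1, p2):
    """Return True if interval predicate p1 => p2, i.e. every row satisfying p1
    also satisfies p2. Each predicate is an open interval (lo, hi) on the same
    column, meaning lo < x < hi. p1 implies p2 exactly when p1's interval is
    contained in p2's."""
    lo1, hi1 = p1
    lo2, hi2 = p2
    return lo1 >= lo2 and hi1 <= hi2

# The query predicate 70 < x < 100 is implied by the MV predicate 10 < x < 110,
# so the query against 'foo' can be rewritten to scan the materialized view 'mv'.
assert implies((70, 100), (10, 110))      # rewrite is valid
assert not implies((5, 100), (10, 110))   # rows with 5 < x <= 10 are absent from mv
```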
[jira] [Updated] (SPARK-29059) Support for Hive Materialized Views in Spark SQL.
[ https://issues.apache.org/jira/browse/SPARK-29059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amogh Margoor updated SPARK-29059: -- Summary: Support for Hive Materialized Views in Spark SQL. (was: Support for Hive Materialized Views for Spark SQL.) > Support for Hive Materialized Views in Spark SQL. > - > > Key: SPARK-29059 > URL: https://issues.apache.org/jira/browse/SPARK-29059 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Amogh Margoor >Priority: Minor > > Materialized view was introduced in Apache Hive 3.0.0. Currently, Spark > Catalyst does not optimize queries against Hive tables using Materialized > View the way Apache Calcite does it for Hive. This Jira is to add support for > the same. > We have developed it in our internal track would like to open source it. It > would consist of 3 major parts: > # Reading MV related Hive Metadata > # Implication Engine which would figure out if an expression exp1 implies > another expression exp2 i.e., if exp1 => exp2 is a tautology. This is similar > to RexImplication checker in Apache Calcite. > # Catalyst rule to replace tables by it's Materialized view using > Implication Engine. For e.g., if MV 'mv' has been created in Hive using query > 'select * from foo where x > 10 && x <110' then query 'select * from foo > where x > 70 and x < 100' will be transformed into 'select * from mv where x > >70 and x < 100' > Note that Implication Engine and Catalyst Rule is generic can be used even > when Spark decides to have it's own Materialized View. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29059) Support for Hive Materialized Views for Spark SQL.
Amogh Margoor created SPARK-29059: - Summary: Support for Hive Materialized Views for Spark SQL. Key: SPARK-29059 URL: https://issues.apache.org/jira/browse/SPARK-29059 Project: Spark Issue Type: Task Components: Spark Core Affects Versions: 3.0.0 Reporter: Amogh Margoor Materialized views were introduced in Apache Hive 3.0.0. Currently, Spark Catalyst does not optimize queries against Hive tables using materialized views the way Apache Calcite does for Hive. This Jira is to add support for the same. We have developed it in our internal trunk and would like to open source it. It would consist of 3 major parts: # Reading MV-related Hive metadata # An Implication Engine which figures out whether an expression exp1 implies another expression exp2, i.e., whether exp1 => exp2 is a tautology. This is similar to the RexImplication checker in Apache Calcite. # A Catalyst rule to replace tables with their materialized views using the Implication Engine. E.g., if MV 'mv' has been created in Hive using the query 'select * from foo where x > 10 and x < 110', then the query 'select * from foo where x > 70 and x < 100' will be transformed into 'select * from mv where x > 70 and x < 100'. Note that the Implication Engine and Catalyst rule are generic and can be used even when Spark decides to have its own materialized views.
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927889#comment-16927889 ] koert kuipers commented on SPARK-29027: --- just for this one test debug logs is 62mb of kerberos and ldap stuff. its difficult to say whats sensitive and whats not. > KafkaDelegationTokenSuite fails > --- > > Key: SPARK-29027 > URL: https://issues.apache.org/jira/browse/SPARK-29027 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: {code} > commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 > Author: Sean Owen > Date: Mon Sep 9 10:19:40 2019 -0500 > {code} > Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10) >Reporter: koert kuipers >Priority: Minor > > i am seeing consistent failure of KafkaDelegationTokenSuite on master > {code} > JsonUtilsSuite: > - parsing partitions > - parsing partitionOffsets > KafkaDelegationTokenSuite: > javax.security.sasl.SaslException: Failure to initialize security context > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125) > at > com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85) > at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48) > at > org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100) > at > 
org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails) > at > sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127) > at > sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193) > at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427) > at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62) > at > sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108) > ... 12 more > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** > org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure > at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) > at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) > at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > ... 
> KafkaSourceOffsetSuite: > - comparison {"t":{"0":1}} <=> {"t":{"0":2}} > - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}} > - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}} > - basic serialization - deserialization > - OffsetSeqLog serialization - deserialization > - read Spark 2.1.0 offset format > {code} > {code} > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 4.178 > s] > [INFO] Spark Project Tags . SUCCESS [ 9.373 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 24.586 > s] > [INFO] Spark Project Local DB . SUCCESS [ 5.456 >
[jira] [Created] (SPARK-29058) Reading csv file with DROPMALFORMED showing incorrect record count
Suchintak Patnaik created SPARK-29058: - Summary: Reading csv file with DROPMALFORMED showing incorrect record count Key: SPARK-29058 URL: https://issues.apache.org/jira/browse/SPARK-29058 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 2.3.0 Reporter: Suchintak Patnaik

The Spark SQL CSV reader drops malformed records as expected, but the record count it reports is incorrect. Consider this file (fruit.csv):

apple,red,1,3
banana,yellow,2,4.56
orange,orange,3,5

Define the schema as follows:

schema = "Fruit string,color string,price int,quantity int"

Notice that the "quantity" field is defined as integer type, but the 2nd row in the file contains a floating-point value, so that row is a corrupt record.

>>> df = spark.read.csv(path="fruit.csv",mode="DROPMALFORMED",schema=schema)
>>> df.show()
+------+------+-----+--------+
| Fruit| color|price|quantity|
+------+------+-----+--------+
| apple|   red|    1|       3|
|orange|orange|    3|       5|
+------+------+-----+--------+

>>> df.count()
3

The malformed record is dropped as expected, but an incorrect record count is displayed: df.count() should return 2.
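The DROPMALFORMED semantics the reporter expects can be sketched in plain Python (not Spark's parser — a minimal illustration of what "dropped from both the data and the count" means; the mismatch in Spark likely stems from count() taking an optimized path that never parses the pruned columns, so malformed values go undetected):

```python
import csv
import io

def read_dropmalformed(text, types):
    """Parse CSV rows against a list of column-type converters, dropping any row
    where a value fails to convert. Because dropping happens during the single
    parse, len() of the result always agrees with the rows actually kept."""
    kept = []
    for row in csv.reader(io.StringIO(text)):
        try:
            kept.append([t(v) for t, v in zip(types, row)])
        except ValueError:
            continue  # malformed row: excluded from both display and count
    return kept

data = "apple,red,1,3\nbanana,yellow,2,4.56\norange,orange,3,5\n"
rows = read_dropmalformed(data, [str, str, int, int])
# len(rows) == 2: the banana row is dropped because int('4.56') fails
```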
[jira] [Resolved] (SPARK-29007) Possible leak of SparkContext in tests / test suites initializing StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-29007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-29007. Fix Version/s: 3.0.0 Assignee: Jungtaek Lim Resolution: Fixed > Possible leak of SparkContext in tests / test suites initializing > StreamingContext > -- > > Key: SPARK-29007 > URL: https://issues.apache.org/jira/browse/SPARK-29007 > Project: Spark > Issue Type: Bug > Components: DStreams, MLlib, Spark Core >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Minor > Fix For: 3.0.0 > > > There're lots of tests creating StreamingContext with creating new > SparkContext in its constructor, and we don't have enough guard to prevent > leakage of SparkContext in test suites. Ideally we should ensure SparkContext > is not leaked between test suites, even between tests if each test creates > StreamingContext. > > One of example for leakage is below: > {noformat} > [info] *** 4 SUITES ABORTED *** > [info] *** 131 TESTS FAILED *** > [error] Error: Total 418, Failed 131, Errors 4, Passed 283, Ignored 1 > [error] Failed tests: > [error] org.apache.spark.streaming.scheduler.JobGeneratorSuite > [error] org.apache.spark.streaming.ReceiverInputDStreamSuite > [error] org.apache.spark.streaming.WindowOperationsSuite > [error] org.apache.spark.streaming.StreamingContextSuite > [error] org.apache.spark.streaming.scheduler.ReceiverTrackerSuite > [error] org.apache.spark.streaming.CheckpointSuite > [error] org.apache.spark.streaming.UISeleniumSuite > [error] > org.apache.spark.streaming.scheduler.ExecutorAllocationManagerSuite > [error] org.apache.spark.streaming.ReceiverSuite > [error] org.apache.spark.streaming.BasicOperationsSuite > [error] org.apache.spark.streaming.InputStreamsSuite > [error] Error during tests: > [error] org.apache.spark.streaming.MapWithStateSuite > [error] org.apache.spark.streaming.DStreamScopeSuite > [error] org.apache.spark.streaming.rdd.MapWithStateRDDSuite > [error] 
org.apache.spark.streaming.scheduler.InputInfoTrackerSuite > {noformat} > {{}} > {noformat} > [info] JobGeneratorSuite: > [info] - SPARK-6222: Do not clear received block data too soon *** FAILED *** > (2 milliseconds) > [info] org.apache.spark.SparkException: Only one SparkContext should be > running in this JVM (see SPARK-2243).The currently running SparkContext was > created at: > [info] org.apache.spark.SparkContext.(SparkContext.scala:82) > [info] > org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:851) > [info] > org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:85) > [info] > org.apache.spark.streaming.TestSuiteBase.setupStreams(TestSuiteBase.scala:317) > [info] > org.apache.spark.streaming.TestSuiteBase.setupStreams$(TestSuiteBase.scala:311) > [info] > org.apache.spark.streaming.CheckpointSuite.setupStreams(CheckpointSuite.scala:209) > [info] > org.apache.spark.streaming.CheckpointSuite.$anonfun$new$3(CheckpointSuite.scala:258) > [info] scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] org.scalatest.Transformer.apply(Transformer.scala:22) > [info] org.scalatest.Transformer.apply(Transformer.scala:20) > [info] org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > [info] org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) > [info] org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > [info] org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > [info] org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > [info] org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > [info] org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > [info] at > 
org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2512) > [info] at scala.Option.foreach(Option.scala:274) > [info] at > org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2509) > [info] at > org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2586) > [info] at org.apache.spark.SparkContext.(SparkContext.scala:87) > [info] at > org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:851) > [info] at > org.apache.spark.streaming.StreamingContext.(StreamingContext.scala:85) > [info] at >
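The failure mode above — a leaked context from one suite tripping the "Only one SparkContext should be running in this JVM" guard in the next — suggests the standard fix: wrap each test's context in try/finally so stop() runs even when the body throws. A pure-Python analogy of the guard and the helper (the real guard is `SparkContext.assertNoOtherContextIsRunning` in Scala; the names here are illustrative):

```python
class SingleContextGuard:
    """Mimics the one-active-context-per-process invariant."""
    _active = None

    def __init__(self, name):
        # Refuse to construct a second context while one is still running,
        # like SparkContext's assertNoOtherContextIsRunning check.
        if SingleContextGuard._active is not None:
            raise RuntimeError(
                "Only one context should be running (leaked: %s)"
                % SingleContextGuard._active.name)
        self.name = name
        SingleContextGuard._active = self

    def stop(self):
        if SingleContextGuard._active is self:
            SingleContextGuard._active = None

def with_context(name, body):
    """Run a test body with a fresh context, guaranteeing cleanup on failure."""
    ctx = SingleContextGuard(name)
    try:
        return body(ctx)
    finally:
        ctx.stop()  # always runs, so the next suite can create its own context

out = with_context("test-suite", lambda ctx: ctx.name)
# out == "test-suite", and no context is left behind afterwards
```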
[jira] [Resolved] (SPARK-26989) Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries
[ https://issues.apache.org/jira/browse/SPARK-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-26989. Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25706 [https://github.com/apache/spark/pull/25706] > Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage > attempt don't trigger multiple stage retries > --- > > Key: SPARK-26989 > URL: https://issues.apache.org/jira/browse/SPARK-26989 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102761/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/Barrier_task_failures_from_the_same_stage_attempt_don_t_trigger_multiple_stage_retries/ > {noformat} > org.apache.spark.scheduler.DAGSchedulerSuite.Barrier task failures from the > same stage attempt don't trigger multiple stage retries > Error Message > org.scalatest.exceptions.TestFailedException: ArrayBuffer() did not equal > List(0) > Stacktrace > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: > ArrayBuffer() did not equal List(0) > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2644) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at 
org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:104) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > at > org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:122) > {noformat} > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109303/consoleFull > {code} > - Barrier task failures from the same stage attempt don't trigger multiple > stage retries *** FAILED *** > ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26989) Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage attempt don't trigger multiple stage retries
[ https://issues.apache.org/jira/browse/SPARK-26989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-26989: -- Assignee: Jungtaek Lim > Flaky test:DAGSchedulerSuite.Barrier task failures from the same stage > attempt don't trigger multiple stage retries > --- > > Key: SPARK-26989 > URL: https://issues.apache.org/jira/browse/SPARK-26989 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Jungtaek Lim >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/102761/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/Barrier_task_failures_from_the_same_stage_attempt_don_t_trigger_multiple_stage_retries/ > {noformat} > org.apache.spark.scheduler.DAGSchedulerSuite.Barrier task failures from the > same stage attempt don't trigger multiple stage retries > Error Message > org.scalatest.exceptions.TestFailedException: ArrayBuffer() did not equal > List(0) > Stacktrace > sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: > ArrayBuffer() did not equal List(0) > at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528) > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501) > at > org.apache.spark.scheduler.DAGSchedulerSuite.$anonfun$new$144(DAGSchedulerSuite.scala:2644) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > at 
org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:104) > at > org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) > at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) > at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) > at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) > at > org.apache.spark.scheduler.DAGSchedulerSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(DAGSchedulerSuite.scala:122) > {noformat} > - > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109303/consoleFull > {code} > - Barrier task failures from the same stage attempt don't trigger multiple > stage retries *** FAILED *** > ArrayBuffer(0) did not equal List(0) (DAGSchedulerSuite.scala:2656) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927822#comment-16927822 ] Gabor Somogyi commented on SPARK-29027: --- You can remove the sensitive parts or if you only trust me then fine but you loose the possibility of community knowledge. Maybe somebody would pinpoint the issue right away. > KafkaDelegationTokenSuite fails > --- > > Key: SPARK-29027 > URL: https://issues.apache.org/jira/browse/SPARK-29027 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: {code} > commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 > Author: Sean Owen > Date: Mon Sep 9 10:19:40 2019 -0500 > {code} > Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10) >Reporter: koert kuipers >Priority: Minor > > i am seeing consistent failure of KafkaDelegationTokenSuite on master > {code} > JsonUtilsSuite: > - parsing partitions > - parsing partitionOffsets > KafkaDelegationTokenSuite: > javax.security.sasl.SaslException: Failure to initialize security context > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125) > at > com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85) > at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48) > at > 
org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails) > at > sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127) > at > sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193) > at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427) > at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62) > at > sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108) > ... 12 more > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** > org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure > at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) > at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) > at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > ... 
> KafkaSourceOffsetSuite: > - comparison {"t":{"0":1}} <=> {"t":{"0":2}} > - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}} > - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}} > - basic serialization - deserialization > - OffsetSeqLog serialization - deserialization > - read Spark 2.1.0 offset format > {code} > {code} > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 4.178 > s] > [INFO] Spark Project Tags . SUCCESS [ 9.373 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 24.586 > s] > [INFO] Spark Project Local
[jira] [Created] (SPARK-29057) remove InsertIntoTable
Wenchen Fan created SPARK-29057: --- Summary: remove InsertIntoTable Key: SPARK-29057 URL: https://issues.apache.org/jira/browse/SPARK-29057 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29014) DataSourceV2: Clean up current, default, and session catalog uses
[ https://issues.apache.org/jira/browse/SPARK-29014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927796#comment-16927796 ] Wenchen Fan commented on SPARK-29014: - It doesn't require a major refactor but it's easier and cleaner to make this change with a refactor that centralizes the catalog/table lookup logic. > DataSourceV2: Clean up current, default, and session catalog uses > - > > Key: SPARK-29014 > URL: https://issues.apache.org/jira/browse/SPARK-29014 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Ryan Blue >Priority: Blocker > > Catalog tracking in DSv2 has evolved since the initial changes went in. We > need to make sure that handling is consistent across plans using the latest > rules: > * The _current_ catalog should be used when no catalog is specified > * The _default_ catalog is the catalog _current_ is initialized to > * If the _default_ catalog is not set, then it is the built-in Spark session > catalog, which will be called `spark_catalog` (This is the v2 session catalog) -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
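The three resolution rules listed in the issue above can be condensed into a small fallback chain. This is a simplified pure-Python sketch of the intended precedence (in Spark the current catalog is initialized from the default rather than consulted lazily; the function and parameter names are invented for illustration):

```python
BUILTIN_SESSION_CATALOG = "spark_catalog"  # the v2 session catalog's name

def resolve_catalog(identifier_catalog=None, current=None, default=None):
    """Pick the catalog for a table reference:
    1. an explicit catalog in the identifier wins;
    2. otherwise the current catalog is used;
    3. current starts out as the default catalog;
    4. with no default set, fall back to the built-in session catalog."""
    if identifier_catalog is not None:
        return identifier_catalog
    if current is not None:
        return current
    if default is not None:
        return default
    return BUILTIN_SESSION_CATALOG

# Nothing configured: unqualified names hit the built-in v2 session catalog.
assert resolve_catalog() == "spark_catalog"
# An explicit catalog in the identifier overrides everything else.
assert resolve_catalog(identifier_catalog="h2", current="prod") == "h2"
```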
[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927784#comment-16927784 ] Xiao Li commented on SPARK-29038: - So far, the doc does not contain enough details. It requires a comprehensive comparison with the corresponding features in other commercial databases. We also need to document how to implement them one by one. Also, based on my understanding, the materialized view should not be memory-based. It has to be physically stored. Using the Spark cache could affect other memory-intensive queries, and any major feature built on the cache requires a memory manager. I am not against this, but the effort to support this feature is substantial. > SPIP: Support Spark Materialized View > - > > Key: SPARK-29038 > URL: https://issues.apache.org/jira/browse/SPARK-29038 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Lantao Jin >Priority: Major > > A materialized view is an important DBMS mechanism for caching data to > accelerate queries. By creating a materialized view through SQL, the data to > be cached can be chosen very flexibly and configured according to specific > usage scenarios. The Materialization Manager automatically updates the cached > data according to changes in the detail source tables, simplifying the user's > work. When a user submits a query, the Spark optimizer rewrites the execution > plan based on the available materialized views to determine the optimal > execution plan. > Details in [design > doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927785#comment-16927785 ] koert kuipers commented on SPARK-29027: --- [~gsomogyi] i can email you debug log file directly if thats ok. rather not post it publicly. > KafkaDelegationTokenSuite fails > --- > > Key: SPARK-29027 > URL: https://issues.apache.org/jira/browse/SPARK-29027 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: {code} > commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 > Author: Sean Owen > Date: Mon Sep 9 10:19:40 2019 -0500 > {code} > Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10) >Reporter: koert kuipers >Priority: Minor > > i am seeing consistent failure of KafkaDelegationTokenSuite on master > {code} > JsonUtilsSuite: > - parsing partitions > - parsing partitionOffsets > KafkaDelegationTokenSuite: > javax.security.sasl.SaslException: Failure to initialize security context > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125) > at > com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85) > at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48) > at > org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100) > at > 
org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails) > at > sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127) > at > sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193) > at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427) > at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62) > at > sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108) > ... 12 more > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** > org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure > at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) > at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) > at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > ... 
> KafkaSourceOffsetSuite: > - comparison {"t":{"0":1}} <=> {"t":{"0":2}} > - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}} > - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}} > - basic serialization - deserialization > - OffsetSeqLog serialization - deserialization > - read Spark 2.1.0 offset format > {code} > {code} > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 4.178 > s] > [INFO] Spark Project Tags . SUCCESS [ 9.373 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 24.586 > s] > [INFO] Spark Project Local DB . SUCCESS [ 5.456 > s] > [INFO] Spark Project
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927783#comment-16927783 ] koert kuipers commented on SPARK-29027: --- i get same error in sbt i think, plus i find sbt a lot easier to handle :) {code} [info] KafkaDelegationTokenSuite: [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** (10 seconds, 543 milliseconds) [info] org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure [info] at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) [info] at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) [info] at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) [info] at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) [info] at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) [info] at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) [info] at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) [info] at org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) [info] at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) [info] at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) [info] at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) [info] at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56) [info] at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314) [info] at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:507) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:296) [info] at sbt.ForkMain$Run$2.call(ForkMain.java:286) [info] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [info] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [info] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [info] at java.lang.Thread.run(Thread.java:748) org.apache.directory.api.ldap.model.exception.LdapOperationErrorException: /home/koert/src/spark/target/tmp/spark-dc223dd0-e499-4ccf-9600-c70e4706a909/1568218986864/partitions/system/1.3.6.1.4.1.18060.0.4.1.2.50.lg (No such file or directory) at org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.modify(AbstractBTreePartition.java:1183) at org.apache.directory.server.core.shared.partition.DefaultPartitionNexus.sync(DefaultPartitionNexus.java:335) at org.apache.directory.server.core.DefaultDirectoryService.shutdown(DefaultDirectoryService.java:1299) at org.apache.directory.server.core.DefaultDirectoryService$1.run(DefaultDirectoryService.java:1230) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.FileNotFoundException: /home/koert/src/spark/target/tmp/spark-dc223dd0-e499-4ccf-9600-c70e4706a909/1568218986864/partitions/system/1.3.6.1.4.1.18060.0.4.1.2.50.lg (No such file or directory) at java.io.FileOutputStream.open0(Native Method) at java.io.FileOutputStream.open(FileOutputStream.java:270) at java.io.FileOutputStream.(FileOutputStream.java:213) at java.io.FileOutputStream.(FileOutputStream.java:101) at jdbm.recman.TransactionManager.open(TransactionManager.java:209) at jdbm.recman.TransactionManager.synchronizeLogFromMemory(TransactionManager.java:202) at jdbm.recman.TransactionManager.synchronizeLog(TransactionManager.java:135) at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmIndex.sync(JdbmIndex.java:698) at org.apache.directory.server.core.partition.impl.btree.jdbm.JdbmPartition.sync(JdbmPartition.java:312) at org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.modify(AbstractBTreePartition.java:1228) at 
org.apache.directory.server.core.partition.impl.btree.AbstractBTreePartition.modify(AbstractBTreePartition.java:1173) ... 4 more java.io.FileNotFoundException: /home/koert/src/spark/target/tmp/spark-dc223dd0-e499-4ccf-9600-c70e4706a909/1568218986864/partitions/example/1.3.6.1.4.1.18060.0.4.1.2.5.lg (No such file or directory) at java.io.FileOutputStream.open0(Native Method) at java.io.FileOutputStream.open(FileOutputStream.java:270) at java.io.FileOutputStream.(FileOutputStream.java:213) at java.io.FileOutputStream.(FileOutputStream.java:101) at jdbm.recman.TransactionManager.open(TransactionManager.java:209) at jdbm.recman.TransactionManager.synchronizeLogFromMemory(TransactionManager.java:202) at jdbm.recman.TransactionManager.synchronizeLog(TransactionManager.java:135) at
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927715#comment-16927715 ] koert kuipers commented on SPARK-29027: --- i renamed /etc/krb5.conf and it did not change anything. still same failure. {code} ~/spark/external/kafka-0-10-sql$ mvn dependency:tree -Dverbose | grep zookeeper [INFO] +- org.apache.zookeeper:zookeeper:jar:3.4.7:test {code} > KafkaDelegationTokenSuite fails > --- > > Key: SPARK-29027 > URL: https://issues.apache.org/jira/browse/SPARK-29027 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: {code} > commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 > Author: Sean Owen > Date: Mon Sep 9 10:19:40 2019 -0500 > {code} > Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10) >Reporter: koert kuipers >Priority: Minor > > i am seeing consistent failure of KafkaDelegationTokenSuite on master > {code} > JsonUtilsSuite: > - parsing partitions > - parsing partitionOffsets > KafkaDelegationTokenSuite: > javax.security.sasl.SaslException: Failure to initialize security context > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125) > at > com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85) > at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48) > at > 
org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails) > at > sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127) > at > sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193) > at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427) > at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62) > at > sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108) > ... 12 more > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** > org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure > at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) > at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) > at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > ... 
> KafkaSourceOffsetSuite: > - comparison {"t":{"0":1}} <=> {"t":{"0":2}} > - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}} > - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}} > - basic serialization - deserialization > - OffsetSeqLog serialization - deserialization > - read Spark 2.1.0 offset format > {code} > {code} > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 4.178 > s] > [INFO] Spark Project Tags . SUCCESS [ 9.373 > s] > [INFO] Spark Project Sketch ...
[jira] [Updated] (SPARK-29056) ThriftServerSessionPage displays 1970/01/01 for queries that are not finished and not closed
[ https://issues.apache.org/jira/browse/SPARK-29056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Juliusz Sompolski updated SPARK-29056: -- Issue Type: Bug (was: Improvement) > ThriftServerSessionPage displays 1970/01/01 for queries that are not finished > and not closed > > > Key: SPARK-29056 > URL: https://issues.apache.org/jira/browse/SPARK-29056 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Juliusz Sompolski >Priority: Major > > Spark UI ODBC/JDBC tab session page displays 1970/01/01 (timestamp 0) as > finish/close time for queries that haven't finished yet. > !image-2019-09-11-17-21-52-771.png!
[jira] [Created] (SPARK-29056) ThriftServerSessionPage displays 1970/01/01 for queries that are not finished and not closed
Juliusz Sompolski created SPARK-29056: - Summary: ThriftServerSessionPage displays 1970/01/01 for queries that are not finished and not closed Key: SPARK-29056 URL: https://issues.apache.org/jira/browse/SPARK-29056 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Juliusz Sompolski Spark UI ODBC/JDBC tab session page displays 1970/01/01 (timestamp 0) as finish/close time for queries that haven't finished yet. !image-2019-09-11-17-21-52-771.png!
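The 1970/01/01 display described above is the classic symptom of formatting an unset (zero) millisecond timestamp, which lands exactly on the Unix epoch. A minimal sketch of the fix is to guard against the sentinel value before formatting; the function and field names here are illustrative, not the actual Spark UI code.

```python
from datetime import datetime, timezone

def format_end_time(finish_timestamp_ms):
    """Render a finish/close time for a UI page.

    A timestamp of 0 means the query has not finished (or the session has
    not closed) yet; formatting it directly would display 1970/01/01, the
    Unix epoch. Guard against it and show a placeholder instead.
    """
    if finish_timestamp_ms <= 0:
        return "-"
    dt = datetime.fromtimestamp(finish_timestamp_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y/%m/%d %H:%M:%S")
```

With the guard, an in-flight query renders as `-` rather than an epoch date, while a real millisecond timestamp formats normally.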
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927646#comment-16927646 ] koert kuipers commented on SPARK-29027: --- let me try to get debug logs > KafkaDelegationTokenSuite fails > --- > > Key: SPARK-29027 > URL: https://issues.apache.org/jira/browse/SPARK-29027 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: {code} > commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 > Author: Sean Owen > Date: Mon Sep 9 10:19:40 2019 -0500 > {code} > Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10) >Reporter: koert kuipers >Priority: Minor > > i am seeing consistent failure of KafkaDelegationTokenSuite on master > {code} > JsonUtilsSuite: > - parsing partitions > - parsing partitionOffsets > KafkaDelegationTokenSuite: > javax.security.sasl.SaslException: Failure to initialize security context > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125) > at > com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85) > at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48) > at > org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156) > at > 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails) > at > sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127) > at > sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193) > at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427) > at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62) > at > sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108) > ... 12 more > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** > org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure > at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) > at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) > at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > ... 
> KafkaSourceOffsetSuite: > - comparison {"t":{"0":1}} <=> {"t":{"0":2}} > - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}} > - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}} > - basic serialization - deserialization > - OffsetSeqLog serialization - deserialization > - read Spark 2.1.0 offset format > {code} > {code} > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 4.178 > s] > [INFO] Spark Project Tags . SUCCESS [ 9.373 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 24.586 > s] > [INFO] Spark Project Local DB . SUCCESS [ 5.456 > s] > [INFO] Spark Project Networking ... SUCCESS [ 49.819 > s] >
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927640#comment-16927640 ] Gabor Somogyi commented on SPARK-29027: --- Can you give for example the output of this cmd: {quote}[gaborsomogyi:~/spark/external/kafka-0-10-sql] master(+8/-2)+ ± mvn dependency:tree -Dverbose | grep zookeeper{quote} > KafkaDelegationTokenSuite fails > --- > > Key: SPARK-29027 > URL: https://issues.apache.org/jira/browse/SPARK-29027 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: {code} > commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 > Author: Sean Owen > Date: Mon Sep 9 10:19:40 2019 -0500 > {code} > Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10) >Reporter: koert kuipers >Priority: Minor > > i am seeing consistent failure of KafkaDelegationTokenSuite on master > {code} > JsonUtilsSuite: > - parsing partitions > - parsing partitionOffsets > KafkaDelegationTokenSuite: > javax.security.sasl.SaslException: Failure to initialize security context > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125) > at > com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85) > at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48) > at > 
org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails) > at > sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127) > at > sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193) > at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427) > at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62) > at > sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108) > ... 12 more > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** > org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure > at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) > at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) > at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > ... 
> KafkaSourceOffsetSuite: > - comparison {"t":{"0":1}} <=> {"t":{"0":2}} > - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}} > - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}} > - basic serialization - deserialization > - OffsetSeqLog serialization - deserialization > - read Spark 2.1.0 offset format > {code} > {code} > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 4.178 > s] > [INFO] Spark Project Tags . SUCCESS [ 9.373 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 24.586 > s] > [INFO] Spark Project Local
[jira] [Commented] (SPARK-29027) KafkaDelegationTokenSuite fails
[ https://issues.apache.org/jira/browse/SPARK-29027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927637#comment-16927637 ] Gabor Somogyi commented on SPARK-29027: --- I've tried to create a krb5.conf file which contains various things but not able to make the test fail. [~koert] please attach something to proceed. > KafkaDelegationTokenSuite fails > --- > > Key: SPARK-29027 > URL: https://issues.apache.org/jira/browse/SPARK-29027 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 > Environment: {code} > commit 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 > Author: Sean Owen > Date: Mon Sep 9 10:19:40 2019 -0500 > {code} > Ubuntu 16.04 with OpenJDK 1.8 (1.8.0_222-8u222-b10-1ubuntu1~16.04.1-b10) >Reporter: koert kuipers >Priority: Minor > > i am seeing consistent failure of KafkaDelegationTokenSuite on master > {code} > JsonUtilsSuite: > - parsing partitions > - parsing partitionOffsets > KafkaDelegationTokenSuite: > javax.security.sasl.SaslException: Failure to initialize security context > [Caused by GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails)] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:125) > at > com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(FactoryImpl.java:85) > at javax.security.sasl.Sasl.createSaslServer(Sasl.java:524) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:118) > at > org.apache.zookeeper.server.ZooKeeperSaslServer$1.run(ZooKeeperSaslServer.java:114) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.createSaslServer(ZooKeeperSaslServer.java:114) > at > org.apache.zookeeper.server.ZooKeeperSaslServer.(ZooKeeperSaslServer.java:48) > at > org.apache.zookeeper.server.NIOServerCnxn.(NIOServerCnxn.java:100) > at > 
org.apache.zookeeper.server.NIOServerCnxnFactory.createConnection(NIOServerCnxnFactory.java:156) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:197) > at java.lang.Thread.run(Thread.java:748) > Caused by: GSSException: No valid credentials provided (Mechanism level: > Failed to find any Kerberos credentails) > at > sun.security.jgss.krb5.Krb5AcceptCredential.getInstance(Krb5AcceptCredential.java:87) > at > sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:127) > at > sun.security.jgss.GSSManagerImpl.getCredentialElement(GSSManagerImpl.java:193) > at sun.security.jgss.GSSCredentialImpl.add(GSSCredentialImpl.java:427) > at sun.security.jgss.GSSCredentialImpl.(GSSCredentialImpl.java:62) > at > sun.security.jgss.GSSManagerImpl.createCredential(GSSManagerImpl.java:154) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.(GssKrb5Server.java:108) > ... 12 more > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED *** > org.I0Itec.zkclient.exception.ZkAuthFailedException: Authentication failure > at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:947) > at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:924) > at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:1231) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:157) > at org.I0Itec.zkclient.ZkClient.(ZkClient.java:131) > at kafka.utils.ZkUtils$.createZkClientAndConnection(ZkUtils.scala:93) > at kafka.utils.ZkUtils$.apply(ZkUtils.scala:75) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:202) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:243) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > ... 
> KafkaSourceOffsetSuite: > - comparison {"t":{"0":1}} <=> {"t":{"0":2}} > - comparison {"t":{"1":0,"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1},"T":{"0":0}} <=> {"t":{"0":2},"T":{"0":1}} > - comparison {"t":{"0":1}} <=> {"t":{"1":1,"0":2}} > - comparison {"t":{"0":1}} <=> {"t":{"1":3,"0":2}} > - basic serialization - deserialization > - OffsetSeqLog serialization - deserialization > - read Spark 2.1.0 offset format > {code} > {code} > [INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT: > [INFO] > [INFO] Spark Project Parent POM ... SUCCESS [ 4.178 > s] > [INFO] Spark Project Tags . SUCCESS [ 9.373 > s] > [INFO] Spark Project Sketch ... SUCCESS [ 24.586 > s] > [INFO] Spark Project Local DB
[jira] [Commented] (SPARK-28985) Pyspark ClassificationModel and RegressionModel support column setters/getters/predict
[ https://issues.apache.org/jira/browse/SPARK-28985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927612#comment-16927612 ] Huaxin Gao commented on SPARK-28985: Thanks, [~podongfeng]. I will work on this. > Pyspark ClassificationModel and RegressionModel support column > setters/getters/predict > -- > > Key: SPARK-28985 > URL: https://issues.apache.org/jira/browse/SPARK-28985 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Priority: Minor > > 1, add common abstract classes like JavaClassificationModel & > JavaProbabilisticClassificationModel > 2, add column setters/getters, and predict method > 3, update the test suites to verify newly added functions
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Affects Version/s: (was: 2.4.2) (was: 2.4.1) (was: 2.4.0) > Memory leak in Spark Driver > --- > > Key: SPARK-29055 > URL: https://issues.apache.org/jira/browse/SPARK-29055 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 2.3.3, 2.4.3, 2.4.4 >Reporter: George Papa >Priority: Major > Attachments: image-2019-09-11-16-14-26-765.png, > image-2019-09-11-16-14-34-963.png > > > In Spark 2.3.3+ the driver memory is increasing continuously. I don't have > this issue with Spark 2.1.1. > In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and > BlockManager removes the broadcast blocks from the memory, as you can see in > the following screenshot: > !image-2019-09-11-16-14-34-963.png! > But in Spark 2.3.3+ I don't see this cleaning and the driver storage > increases!! > *NOTE:* After a few hours of use the application is interrupted with the > following error: > {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} >
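The cleanup mechanism described above can be illustrated with a small sketch. This is plain Python, not Spark code: Spark's ContextCleaner tracks driver-side objects (broadcasts, RDDs, accumulators) via weak references and removes their blocks from the BlockManager once the objects become unreachable. The class names below are illustrative only.

```python
import gc
import weakref

# Toy stand-in for the BlockManager: a map of broadcast id -> stored blocks.
class BlockManagerToy:
    def __init__(self):
        self.blocks = {}

    def put_broadcast(self, broadcast_id, payload):
        self.blocks[broadcast_id] = payload

    def remove_broadcast(self, broadcast_id):
        self.blocks.pop(broadcast_id, None)

# Toy broadcast variable: registers its blocks on creation and attaches a
# finalizer so the blocks are removed once the object is garbage collected,
# mimicking the ContextCleaner pattern. If this cleanup never runs, block
# metadata accumulates in the driver -- the symptom reported in this issue.
class BroadcastToy:
    def __init__(self, broadcast_id, payload, block_manager):
        self.id = broadcast_id
        block_manager.put_broadcast(broadcast_id, payload)
        self._finalizer = weakref.finalize(
            self, block_manager.remove_broadcast, broadcast_id)

bm = BlockManagerToy()
b = BroadcastToy(0, [1, 2, 3], bm)
del b          # broadcast becomes unreachable...
gc.collect()   # ...and its blocks are released from the block manager
```

When the finalizer fires, the block entry disappears; a driver whose cleaner loop stalls would instead keep every entry until OOM.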
[jira] [Updated] (SPARK-29050) Fix typo in some docs
[ https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-29050: -- Issue Type: Improvement (was: Bug) Priority: Trivial (was: Major) This can't be considered a bug, or even major. I fixed it. Please read https://spark.apache.org/contributing.html > Fix typo in some docs > - > > Key: SPARK-29050 > URL: https://issues.apache.org/jira/browse/SPARK-29050 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 2.3.3, 2.4.3, 3.0.0 >Reporter: dengziming >Priority: Trivial > > 'a hdfs' change into 'an hdfs' > 'an unique' change into 'a unique' > 'an url' change into 'a url' > 'a error' change into 'an error' -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
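The four replacements listed in the description amount to whole-word article fixes; a small sketch of applying them mechanically:

```python
import re

# The article fixes from the issue description, applied as whole-word
# replacements so e.g. "an urls" inside a longer token is not touched.
FIXES = {
    "a hdfs": "an hdfs",
    "an unique": "a unique",
    "an url": "a url",
    "a error": "an error",
}

def fix_articles(text):
    for wrong, right in FIXES.items():
        text = re.sub(r"\b%s\b" % re.escape(wrong), right, text)
    return text
```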
[jira] [Resolved] (SPARK-27492) GPU scheduling - High level user documentation
[ https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-27492. --- Fix Version/s: 3.0.0 Resolution: Fixed > GPU scheduling - High level user documentation > -- > > Key: SPARK-27492 > URL: https://issues.apache.org/jira/browse/SPARK-27492 > Project: Spark > Issue Type: Story > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.0.0 > > > For the SPIP - Accelerator-aware task scheduling for Spark, > https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user > documentation about how this feature works together and point to things like > the example discovery script, etc. > > - make sure to document the discovery script and what permissions are needed > and any security implications > - Document standalone - local-cluster mode limitation of only a single > resource file or discovery script so you have to have coordination on for it > to work right. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
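The discovery script mentioned above is an executable that Spark runs on a worker/executor to find a resource's addresses; it prints a single JSON object naming the resource and its addresses. A minimal example (the hard-coded addresses are placeholders; a real script would query something like nvidia-smi, and the exact JSON contract should be checked against the final documentation):

```python
#!/usr/bin/env python3
# Minimal GPU discovery script sketch: emit one JSON object with the
# resource name and the list of device addresses Spark may assign.
import json

def discover_gpus():
    # Placeholder addresses; replace with real device enumeration.
    return {"name": "gpu", "addresses": ["0", "1"]}

if __name__ == "__main__":
    print(json.dumps(discover_gpus()))
```

As the resolution notes, the script must be executable by the user running the Spark process, which is also why its permissions and security implications need documenting.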
[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling
[ https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927558#comment-16927558 ] Thomas Graves commented on SPARK-27495: --- [~felixcheung] [~jiangxb1987] I put this up for vote on the dev mailing list. Could you please take a look and comment there? > SPIP: Support Stage level resource configuration and scheduling > --- > > Key: SPARK-27495 > URL: https://issues.apache.org/jira/browse/SPARK-27495 > Project: Spark > Issue Type: Epic > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > > *Q1.* What are you trying to do? Articulate your objectives using absolutely > no jargon. > Objectives: > # Allow users to specify task and executor resource requirements at the > stage level. > # Spark will use the stage level requirements to acquire the necessary > resources/executors and schedule tasks based on the per stage requirements. > Many times users have different resource requirements for different stages of > their application so they want to be able to configure resources at the stage > level. For instance, you have a single job that has 2 stages. The first stage > does some ETL which requires a lot of tasks, each with a small amount of > memory and 1 core each. Then you have a second stage where you feed that ETL > data into an ML algorithm. The second stage only requires a few executors but > each executor needs a lot of memory, GPUs, and many cores. This feature > allows the user to specify the task and executor resource requirements for > the ETL Stage and then change them for the ML stage of the job. > Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and > extra Resources (GPU/FPGA/etc). It has the potential to allow for other > things like limiting the number of tasks per stage, specifying other > parameters for things like shuffle, etc. 
Initially I would propose we only > support resources as they are now. So Task resources would be cpu and other > resources (GPU, FPGA), that way we aren't adding in extra scheduling things > at this point. Executor resources would be cpu, memory, and extra > resources(GPU,FPGA, etc). Changing the executor resources will rely on > dynamic allocation being enabled. > Main use cases: > # ML use case where user does ETL and feeds it into an ML algorithm where > it’s using the RDD API. This should work with barrier scheduling as well once > it supports dynamic allocation. > # This adds the framework/api for Spark's own internal use. In the future > (not covered by this SPIP), Catalyst could control the stage level resources > as it finds the need to change it between stages for different optimizations. > For instance, with the new columnar plugin to the query planner we can insert > stages into the plan that would change running something on the CPU in row > format to running it on the GPU in columnar format. This API would allow the > planner to make sure the stages that run on the GPU get the corresponding GPU > resources it needs to run. Another possible use case for catalyst is that it > would allow catalyst to add in more optimizations to where the user doesn’t > need to configure container sizes at all. If the optimizer/planner can handle > that for the user, everyone wins. > This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I > think the DataSet API will require more changes because it specifically hides > the RDD from the users via the plans and catalyst can optimize the plan and > insert things into the plan. The only way I’ve found to make this work with > the Dataset API would be modifying all the plans to be able to get the > resource requirements down into where it creates the RDDs, which I believe > would be a lot of change. If other people know better options, it would be > great to hear them. 
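The per-stage requirements described above could be expressed through a profile object attached to a stage. The SPIP had not finalized an API at this point, so the builder below is a hypothetical sketch with illustrative names only:

```python
# Hypothetical per-stage resource profile: task-level and executor-level
# requirements collected through a fluent builder.
class ResourceProfile:
    def __init__(self):
        self.task_resources = {}
        self.executor_resources = {}

    def require_task(self, name, amount):
        self.task_resources[name] = amount
        return self

    def require_executor(self, name, amount):
        self.executor_resources[name] = amount
        return self

# ETL stage: many small CPU-only tasks.
etl_profile = (ResourceProfile()
               .require_task("cpus", 1)
               .require_executor("memory", "2g"))

# ML stage: few executors, each with GPUs and much more memory.
ml_profile = (ResourceProfile()
              .require_task("gpu", 1)
              .require_executor("memory", "24g")
              .require_executor("gpu", 2))
```

In the eventual design such a profile would presumably be attached to an RDD (e.g. something like rdd.withResources(profile)) so the scheduler can acquire matching executors before that stage runs; that attachment point is also an assumption here.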
> *Q2.* What problem is this proposal NOT designed to solve? > The initial implementation is not going to add Dataset APIs. > We are starting with allowing users to specify a specific set of > task/executor resources and plan to design it to be extendable, but the first > implementation will not support changing generic SparkConf configs and only > specific limited resources. > This initial version will have a programmatic API for specifying the resource > requirements per stage, we can add the ability to perhaps have profiles in > the configs later if its useful. > *Q3.* How is it done today, and what are the limits of current practice? > Currently this is either done by having multiple spark jobs or requesting > containers with the max resources needed for any part of the job. To do this > today, you can break it into
[jira] [Updated] (SPARK-28987) DiskBlockManager#createTempShuffleBlock should skip directory which is read-only
[ https://issues.apache.org/jira/browse/SPARK-28987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-28987: -- Priority: Minor (was: Major) > DiskBlockManager#createTempShuffleBlock should skip directory which is > read-only > > > Key: SPARK-28987 > URL: https://issues.apache.org/jira/browse/SPARK-28987 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Minor > > DiskBlockManager#createTempShuffleBlock only considers whether the path > already exists. I think we could also check whether the path is writable. > That's reasonable because we invoke createTempShuffleBlock to create a new > path to write files into, so it should be writable. > stack: > {code:java} > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 1765 in stage 368592.0 failed 4 times, most recent failure: Lost task > 1765.3 in stage 368592.0 (TID 66021932, test-hadoop-prc-st2808.bj, executor > 251): java.io.FileNotFoundException: > /home/work/hdd6/yarn/test-hadoop/nodemanager/usercache/sql_test/appcache/application_1560996968289_16320/blockmgr-14608b48-7efd-4fd3-b050-2ac9953390d4/1e/temp_shuffle_00c7b87f-d7ed-49f3-90e7-1c8358bcfd74 > (No such file or directory) > at java.io.FileOutputStream.open0(Native Method) > at java.io.FileOutputStream.open(FileOutputStream.java:270) > at java.io.FileOutputStream.(FileOutputStream.java:213) > at > org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:139) > at > org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:150) > at > org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:268) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:159) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at > 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:100) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1515) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1503) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1502) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1502) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:816) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:816) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1740) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1695) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1684) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
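The proposed behavior, skipping non-writable local directories when placing a temp shuffle file, can be sketched in plain Python (not Spark code; the function name and layout are illustrative):

```python
import os
import tempfile
import uuid

# Pick the first local directory that exists and is writable, then build a
# unique temp-shuffle file path inside it, instead of failing later with
# FileNotFoundException as in the stack trace above.
def create_temp_shuffle_file(local_dirs):
    for d in local_dirs:
        # os.access is advisory; a real implementation would still handle
        # errors from the eventual open() call.
        if os.path.isdir(d) and os.access(d, os.W_OK):
            return os.path.join(d, "temp_shuffle_%s" % uuid.uuid4())
    raise IOError("No writable local directory available")
```

The "Won't Fix" resolution below suggests this pre-check was judged not worth it, likely because the write can still fail for other reasons and must be handled anyway.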
[jira] [Resolved] (SPARK-28987) DiskBlockManager#createTempShuffleBlock should skip directory which is read-only
[ https://issues.apache.org/jira/browse/SPARK-28987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28987. --- Resolution: Won't Fix > DiskBlockManager#createTempShuffleBlock should skip directory which is > read-only
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Attachment: image-2019-09-11-16-14-34-963.png
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Attachment: image-2019-09-11-16-14-26-765.png
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Description: In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and BlockManager removes the broadcast blocks from the memory, as you can see in the following screenshot: !image-2019-09-11-16-14-34-963.png! But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!! *NOTE:* After few hours of use I have application interruption with the following error : {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} was: In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and BlockManager removes the broadcast blocks from the memory, as you can see in the following screenshot: !image-2019-09-11-16-13-32-650.png|width=685,height=89! But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!! *NOTE:* After few hours of use I have application interruption with the following error : {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} > Memory leak in Spark Driver > --- > > Key: SPARK-29055 > URL: https://issues.apache.org/jira/browse/SPARK-29055 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: George Papa >Priority: Major > Attachments: image-2019-09-11-16-14-26-765.png, > image-2019-09-11-16-14-34-963.png > > > In Spark 2.3.3+ the driver memory is increasing continuously. I don't have > this issue with Spark 2.1.1. 
> In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and > BlockManager removes the broadcast blocks from the memory, as you can see in > the following screenshot: > !image-2019-09-11-16-14-34-963.png! > But in Spark 2.3.3+ I don't see this cleaning and the driver storage > increases!! > *NOTE:* After a few hours of use the application is interrupted with the > following error: > {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} >
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Attachment: (was: image-2019-09-11-16-13-20-588.png)
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Description: In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and BlockManager removes the broadcast blocks from the memory, as you can see in the following screenshot: In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and BlockManager removes the broadcast blocks from the memory, as you can see in the following screenshot: !image-2019-09-11-16-13-20-588.png|width=685,height=89! But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!! *NOTE:* After few hours of use I have application interruption with the following error : {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!! *NOTE:* After few hours of use I have application interruption with the following error : {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} was: In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and BlockManager removes the broadcast blocks from the memory, as you can see in the following screenshot: !image-2019-09-11-16-09-06-720.png|width=685,height=89! But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!! 
*NOTE:* After a few hours of use the application is interrupted with the following error: {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Description: In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and BlockManager removes the broadcast blocks from the memory, as you can see in the following screenshot: !image-2019-09-11-16-13-32-650.png|width=685,height=89! But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!! *NOTE:* After few hours of use I have application interruption with the following error : {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} was: In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and BlockManager removes the broadcast blocks from the memory, as you can see in the following screenshot: In Spark 2.3.3+ the driver memory is increasing continuously. I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and BlockManager removes the broadcast blocks from the memory, as you can see in the following screenshot: !image-2019-09-11-16-13-20-588.png|width=685,height=89! But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!! *NOTE:* After few hours of use I have application interruption with the following error : {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} But in Spark 2.3.3+ I don't see this cleaning and the driver storage increases!! 
*NOTE:* After a few hours of use the application is interrupted with the following error: {color:#ff}java.lang.OutOfMemoryError: GC overhead limit exceeded{color}
[jira] [Resolved] (SPARK-28906) `bin/spark-submit --version` shows incorrect info
[ https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-28906. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 25655 [https://github.com/apache/spark/pull/25655] > `bin/spark-submit --version` shows incorrect info > - > > Key: SPARK-28906 > URL: https://issues.apache.org/jira/browse/SPARK-28906 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, > 2.4.4, 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Kazuaki Ishizaki >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > Attachments: image-2019-08-29-05-50-13-526.png > > > Since Spark 2.3.1, `spark-submit` shows a wrong information. > {code} > $ bin/spark-submit --version > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.3.3 > /_/ > Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222 > Branch > Compiled by user on 2019-02-04T13:00:46Z > Revision > Url > Type --help for more information. > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Attachment: image-2019-09-11-16-13-20-588.png
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Attachment: (was: image-2019-09-11-16-13-32-650.png) > Memory leak in Spark Driver > --- > > Key: SPARK-29055 > URL: https://issues.apache.org/jira/browse/SPARK-29055 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: George Papa >Priority: Major > Attachments: image-2019-09-11-16-13-20-588.png > > > In Spark 2.3.3+ the driver memory is increasing continuously. I don't have > this issue with Spark 2.1.1. > In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and > BlockManager removes the broadcast blocks from the memory, as you can see in > the following screenshot: > !image-2019-09-11-16-13-32-650.png|width=685,height=89! > But in Spark 2.3.3+ I don't see this cleaning and the driver storage > increases!! > *NOTE:* After few hours of use I have application interruption with the > following error : > {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29055) Memory leak in Spark Driver
[ https://issues.apache.org/jira/browse/SPARK-29055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] George Papa updated SPARK-29055: Attachment: image-2019-09-11-16-13-32-650.png > Memory leak in Spark Driver > --- > > Key: SPARK-29055 > URL: https://issues.apache.org/jira/browse/SPARK-29055 > Project: Spark > Issue Type: Bug > Components: Block Manager, Spark Core >Affects Versions: 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: George Papa >Priority: Major > Attachments: image-2019-09-11-16-13-20-588.png > > > In Spark 2.3.3+ the driver memory is increasing continuously. I don't have > this issue with Spark 2.1.1. > In Spark 2.1.1 I see the ContextCleaner runs and cleans the driver and > BlockManager removes the broadcast blocks from the memory, as you can see in > the following screenshot: > !image-2019-09-11-16-13-20-588.png|width=685,height=89! > But in Spark 2.3.3+ I don't see this cleaning and the driver storage > increases!! > *NOTE:* After few hours of use I have application interruption with the > following error : > {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28906) `bin/spark-submit --version` shows incorrect info
[ https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-28906: - Assignee: Kazuaki Ishizaki > `bin/spark-submit --version` shows incorrect info > - > > Key: SPARK-28906 > URL: https://issues.apache.org/jira/browse/SPARK-28906 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, > 2.4.4, 3.0.0 >Reporter: Marcelo Vanzin >Assignee: Kazuaki Ishizaki >Priority: Minor > Attachments: image-2019-08-29-05-50-13-526.png > > > Since Spark 2.3.1, `spark-submit` shows a wrong information. > {code} > $ bin/spark-submit --version > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.3.3 > /_/ > Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222 > Branch > Compiled by user on 2019-02-04T13:00:46Z > Revision > Url > Type --help for more information. > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29055) Memory leak in Spark Driver
George Papa created SPARK-29055: --- Summary: Memory leak in Spark Driver Key: SPARK-29055 URL: https://issues.apache.org/jira/browse/SPARK-29055 Project: Spark Issue Type: Bug Components: Block Manager, Spark Core Affects Versions: 2.4.4, 2.4.3, 2.4.2, 2.4.1, 2.4.0, 2.3.3 Reporter: George Papa Attachments: image-2019-09-11-16-13-20-588.png In Spark 2.3.3+ the driver memory increases continuously; I don't have this issue with Spark 2.1.1. In Spark 2.1.1 I can see that the ContextCleaner runs, cleaning the driver, and the BlockManager removes the broadcast blocks from memory, as you can see in the following screenshot: !image-2019-09-11-16-09-06-720.png|width=685,height=89! But in Spark 2.3.3+ I don't see this cleaning, and the driver storage keeps increasing. *NOTE:* After a few hours of use the application is interrupted with the following error: {color:#FF}java.lang.OutOfMemoryError: GC overhead limit exceeded{color} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler
[ https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927533#comment-16927533 ] Jungtaek Lim commented on SPARK-29043: -- 5+! I'm very surprised to hear that, as it means 5+ files are stored in the same directory and listed via the SHS, and 5+ UI objects are loaded and rendered in the SHS (one JVM). I'd appreciate it if you could review the design doc for SPARK-28594 to see whether it helps your case, and participate in the code review. Thanks! > [History Server]Only one replay thread of FsHistoryProvider work because of > straggler > - > > Key: SPARK-29043 > URL: https://issues.apache.org/jira/browse/SPARK-29043 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: feiwang >Priority: Major > Attachments: image-2019-09-11-15-09-22-912.png, > image-2019-09-11-15-10-25-326.png, screenshot-1.png > > > As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for > the Spark history server. > However, only one replay thread works because of a straggler. > Let's check the code: > https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547 > There is a synchronous operation shared by all replay tasks. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
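The straggler effect reported above is easy to reproduce outside Spark. Below is a minimal Python sketch, not the actual FsHistoryProvider code (names such as `replay_log` are invented for illustration), showing how a single shared synchronized section caps the effective parallelism of a 30-thread replay pool at one:

```python
# Minimal sketch of the straggler problem: every worker must take the same
# lock, so a straggler holding it blocks all other replay threads.
import threading
from concurrent.futures import ThreadPoolExecutor

replay_lock = threading.Lock()  # stands in for the shared synchronized block
active = 0
max_active = 0
gauge = threading.Lock()        # protects the two counters above

def replay_log(log_id):
    global active, max_active
    with replay_lock:           # all replay work funnels through here
        with gauge:
            active += 1
            max_active = max(max_active, active)
        # ... parse event log, rebuild UI state ...
        with gauge:
            active -= 1

# 30 workers, mirroring spark.history.fs.numReplayThreads=30
with ThreadPoolExecutor(max_workers=30) as pool:
    list(pool.map(replay_log, range(100)))

print(max_active)  # observed parallelism is 1 despite 30 threads
```

If one `replay_log` call stalls while holding `replay_lock`, the remaining 29 threads sit idle, which matches the single-active-thread behavior shown in the attachments.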
[jira] [Created] (SPARK-29054) Invalidate Kafka consumer when new delegation token available
Gabor Somogyi created SPARK-29054: - Summary: Invalidate Kafka consumer when new delegation token available Key: SPARK-29054 URL: https://issues.apache.org/jira/browse/SPARK-29054 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Gabor Somogyi Kafka consumers are cached. If a delegation token is used and the token has expired, an exception is thrown. In that case a new consumer is created on task retry with the latest delegation token. This can be enhanced by detecting the existence of a new delegation token. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
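A hedged sketch of the improvement idea, in plain Python rather than Spark's Scala internals (the class and parameter names below are invented for illustration): key each cached consumer by the delegation token it was created with, so the arrival of a newer token invalidates the stale entry immediately instead of waiting for an expiry exception and a task retry.

```python
# Sketch of token-aware consumer caching. `create_consumer` stands in for
# the real Kafka consumer constructor; `current_token` returns the latest
# delegation token known to the executor.
class ConsumerCache:
    def __init__(self, create_consumer, current_token):
        self._create = create_consumer
        self._current_token = current_token
        self._cache = {}  # topic-partition key -> (token, consumer)

    def get(self, key):
        token = self._current_token()
        entry = self._cache.get(key)
        if entry is not None and entry[0] == token:
            return entry[1]  # cached consumer, its token is still current
        # No entry, or the token changed: drop the stale consumer and rebuild
        consumer = self._create(key, token)
        self._cache[key] = (token, consumer)
        return consumer

# Toy usage: the "consumer" simply records which token it was built with.
tokens = ["token-v1"]
cache = ConsumerCache(lambda key, tok: (key, tok), lambda: tokens[-1])
assert cache.get("tp-0") == ("tp-0", "token-v1")
tokens.append("token-v2")  # a new delegation token arrives
assert cache.get("tp-0") == ("tp-0", "token-v2")  # stale consumer replaced
```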
[jira] [Commented] (SPARK-28985) Pyspark ClassificationModel and RegressionModel support column setters/getters/predict
[ https://issues.apache.org/jira/browse/SPARK-28985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927498#comment-16927498 ] zhengruifeng commented on SPARK-28985: -- [~huaxingao] You can refer to my old prs [https://github.com/apache/spark/pull/16171] and [https://github.com/apache/spark/pull/25662] if you want to take it over. Thanks! > Pyspark ClassificationModel and RegressionModel support column > setters/getters/predict > -- > > Key: SPARK-28985 > URL: https://issues.apache.org/jira/browse/SPARK-28985 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Priority: Minor > > 1, add common abstract classes like JavaClassificationModel & > JavaProbabilisticClassificationModel > 2, add column setters/getters, and predict method > 3, update the test suites to verify newly added functions -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29053) Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field
[ https://issues.apache.org/jira/browse/SPARK-29053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-29053: - Description: Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for *Duration* field. *Test Steps* 1.Install spark 2.Start Spark beeline 3.Submit some SQL queries 4.Close some spark applications 5.Check the Spark Web UI JDBC/ODBC Server TAB. 7.Try sorting based on each filed USer/IP/Session ID/Finish Time/DUration/Total execute *Issue:* *Sorting[ascending or descending]* based on *Duration* is not proper in *JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like sorting is based on string/number only instead of proper days/weeks/hours .. Issue there in *Session Statistics* & *SQL Statistics* sessions .Please check it. !Sort Icon.png! was: Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for *Duration* field. *Test Steps* 1.Install spark 2.Start Spark beeline 3.Submit some SQL queries 4.Close some spark applications 5.Check the Spark Web UI JDBC/ODBC Server TAB. 7.Try sorting based on each filed USer/IP/Session ID/Finish Time/DUration/Total execute *Issue:* *Sorting[ascending or descending]* based on *Duration* is not proper in *JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like sorting is based on string/number only instead of proper days/weeks/hours .. Issue there in *Session Statistics* & *SQL Statistics* sessions .Please check it. > Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for > Duration field > --- > > Key: SPARK-29053 > URL: https://issues.apache.org/jira/browse/SPARK-29053 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.3 >Reporter: jobit mathew >Priority: Minor > Attachments: Sort Icon.png > > > Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for > *Duration* field. 
> *Test Steps* > 1.Install spark > 2.Start Spark beeline > 3.Submit some SQL queries > 4.Close some spark applications > 5.Check the Spark Web UI JDBC/ODBC Server TAB. > 7.Try sorting based on each filed USer/IP/Session ID/Finish > Time/DUration/Total execute > *Issue:* > *Sorting[ascending or descending]* based on *Duration* is not proper in > *JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like > sorting is based on string/number only instead of proper days/weeks/hours .. > Issue there in *Session Statistics* & *SQL Statistics* sessions .Please > check it. > !Sort Icon.png! -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29053) Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field
[ https://issues.apache.org/jira/browse/SPARK-29053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-29053: - Attachment: Sort Icon.png > Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for > Duration field > --- > > Key: SPARK-29053 > URL: https://issues.apache.org/jira/browse/SPARK-29053 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.3 >Reporter: jobit mathew >Priority: Minor > Attachments: Sort Icon.png > > > Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for > *Duration* field. > *Test Steps* > 1.Install spark > 2.Start Spark beeline > 3.Submit some SQL queries > 4.Close some spark applications > 5.Check the Spark Web UI JDBC/ODBC Server TAB. > 7.Try sorting based on each filed USer/IP/Session ID/Finish > Time/DUration/Total execute > *Issue:* > *Sorting[ascending or descending]* based on *Duration* is not proper in > *JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like > sorting is based on string/number only instead of proper days/weeks/hours .. > Issue there in *Session Statistics* & *SQL Statistics* sessions .Please check > it. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29053) Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field
[ https://issues.apache.org/jira/browse/SPARK-29053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927467#comment-16927467 ] Rakesh Raushan commented on SPARK-29053: I will work on this one. > Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for > Duration field > --- > > Key: SPARK-29053 > URL: https://issues.apache.org/jira/browse/SPARK-29053 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.3 >Reporter: jobit mathew >Priority: Minor > > Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for > *Duration* field. > *Test Steps* > 1.Install spark > 2.Start Spark beeline > 3.Submit some SQL queries > 4.Close some spark applications > 5.Check the Spark Web UI JDBC/ODBC Server TAB. > 7.Try sorting based on each filed USer/IP/Session ID/Finish > Time/DUration/Total execute > *Issue:* > *Sorting[ascending or descending]* based on *Duration* is not proper in > *JDBC/ODBC Server TAB*.[It is working in some tab -SQL tab is OK].Looks like > sorting is based on string/number only instead of proper days/weeks/hours .. > Issue there in *Session Statistics* & *SQL Statistics* sessions .Please check > it. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29053) Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field
jobit mathew created SPARK-29053: Summary: Spark Thrift JDBC/ODBC Server application UI, Sorting is not working for Duration field Key: SPARK-29053 URL: https://issues.apache.org/jira/browse/SPARK-29053 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.4.3 Reporter: jobit mathew Spark Thrift JDBC/ODBC Server application UI, *Sorting* is not working for the *Duration* field. *Test Steps* 1. Install Spark 2. Start Spark beeline 3. Submit some SQL queries 4. Close some Spark applications 5. Check the Spark Web UI JDBC/ODBC Server tab 6. Try sorting based on each field: User/IP/Session ID/Finish Time/Duration/Total execute *Issue:* *Sorting [ascending or descending]* based on *Duration* is not correct in the *JDBC/ODBC Server tab* (it works in some tabs; the SQL tab is OK). It looks like sorting is based on the string value only instead of the actual days/weeks/hours. The issue is present in both the *Session Statistics* and *SQL Statistics* sections. Please check it. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
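The behavior described (e.g. a duration like "6.1 min" sorting out of order) is consistent with the UI comparing the rendered strings lexicographically. Here is a small Python sketch of the usual fix, parsing the display string back to a number of seconds and sorting on that key. The unit labels below are assumptions for illustration, not Spark's exact formats:

```python
# Sort key for human-readable durations: convert "6.1 min" etc. to seconds.
UNIT_SECONDS = {"ms": 0.001, "s": 1, "min": 60, "h": 3600}

def duration_seconds(text):
    value, unit = text.split()
    return float(value) * UNIT_SECONDS[unit]

durations = ["12 s", "6.1 min", "800 ms", "2 h"]
print(sorted(durations))                        # lexicographic: wrong order
print(sorted(durations, key=duration_seconds))  # numeric: correct order
```

The same idea applies to size columns ("34.5 KB") mentioned in the related search issue: sort on the parsed numeric value, render the formatted string.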
[jira] [Comment Edited] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927374#comment-16927374 ] Lantao Jin edited comment on SPARK-29038 at 9/11/19 10:12 AM: -- [~smilegator], materialized view is not ANSI SQL. https://en.wikipedia.org/wiki/Materialized_view Our implementation refers to the CTAS syntax in Spark. was (Author: cltlfcjin): [~smilegator] Sure, we will fully follow ANSI SQL when we commit, although our internal version contains some non-standard syntax. > SPIP: Support Spark Materialized View > - > > Key: SPARK-29038 > URL: https://issues.apache.org/jira/browse/SPARK-29038 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Lantao Jin >Priority: Major > > Materialized view is an important approach in DBMSs to cache data and > accelerate queries. By creating a materialized view through SQL, the data > that can be cached is very flexible and can be configured arbitrarily > according to specific usage scenarios. The Materialization Manager > automatically updates the cached data according to changes in the detail > source tables, simplifying user work. When a user submits a query, the Spark > optimizer rewrites the execution plan based on the available materialized > views to determine the optimal execution plan. > Details in [design > doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29052) Create a Migration Guide tab in Spark documentation
Hyukjin Kwon created SPARK-29052: Summary: Create a Migration Guide tab in Spark documentation Key: SPARK-29052 URL: https://issues.apache.org/jira/browse/SPARK-29052 Project: Spark Issue Type: Documentation Components: Documentation, ML, PySpark, Spark Core, SparkR, SQL, Structured Streaming Affects Versions: 3.0.0 Reporter: Hyukjin Kwon Currently, there are no migration sections for PySpark, Spark Core and Structured Streaming. It is difficult for users to know what to do when they upgrade. It would be great if we created a Migration Guide tab and put the related migration notes together. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields
[ https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-29051: - Description: Spark Application UI *Search is not working* for some fields in *Spark Web UI Executors TAB* and Spark job History Server page *Test Steps* 1.Install spark 2.Start Spark SQL/Shell/beeline 3.Submit some SQL queries 4.Close some spark applications 5.Check the Spark Web UI Executors TAB and verify search 6.Check Spark job History Server page and verify search *Issue 1* Searching of some field contents are not working in *Spark Web UI Executors TAB*(Spark SQL/Shell/JDBC server UIs ). • *Input column*(search working wrongly .Example if input is 34.5KB,searching of 34.5 won't take ,but 345 shows the search result -it is wrong) • Task time search is Ok, but *GC time* search not working • *Thread Dump* -search not working [have to confirm it is required to add in search, but we are able to search stdout text in that case Thread Dump text also should be searchable ] • *Storage memory* example 384.1 search not searching. !Search Missing.png! *Issue 2:* *Spark job History Server page*,completed tasks- search is not working based on *Duration column values*. We are getting the proper search result, if we search the content from any other columns except Duration.*For example if Duration is 6.1 min* we can not search result for 6.1 min or even 6.1. !Duration Search.png! !Duration Search1.png! was: Spark Application UI *Search is not working* for some fields in *Spark Web UI Executors TAB* and Spark job History Server page *Test Steps* 1.Install spark 2.Start Spark SQL/Shell/beeline 3.Submit some SQL queries 4.Close some spark applications 5.Check the Spark Web UI Executors TAB and verify search 6.Check Spark job History Server page and verify search *Issue 1* Searching of some field contents are not working in *Spark Web UI Executors TAB*(Spark SQL/Shell/JDBC server UIs ). 
• *Input column*(search working wrongly .Example if input is 34.5KB,searching of 34.5 won't take ,but 345 shows the search result -it is wrong) • Task time search is Ok, but *GC time* search not working • *Thread Dump* -search not working [have to confirm it is required to add in search, but we are able to search stdout text in that case Thread Dump text also should be searchable ] • *Storage memory* example 384.1 search not searching. *Issue 2:* *Spark job History Server page*,completed tasks- search is not working based on *Duration column values*. We are getting the proper search result, if we search the content from any other columns except Duration.*For example if Duration is 6.1 min* we can not search result for 6.1 min or even 6.1. > Spark Application UI search is not working for some fields > -- > > Key: SPARK-29051 > URL: https://issues.apache.org/jira/browse/SPARK-29051 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.3, 2.4.4 >Reporter: jobit mathew >Priority: Minor > Attachments: Duration Search.png, Duration Search1.png, Search > Missing.png, Search Missing.png > > > Spark Application UI *Search is not working* for some fields in *Spark Web UI > Executors TAB* and Spark job History Server page > *Test Steps* > 1.Install spark > 2.Start Spark SQL/Shell/beeline > 3.Submit some SQL queries > 4.Close some spark applications > 5.Check the Spark Web UI Executors TAB and verify search > 6.Check Spark job History Server page and verify search > *Issue 1* > Searching of some field contents are not working in *Spark Web UI Executors > TAB*(Spark SQL/Shell/JDBC server UIs ). 
> • *Input column*(search working wrongly .Example if input is 34.5KB,searching > of 34.5 won't take ,but 345 shows the search result -it is wrong) > • Task time search is Ok, but *GC time* search not working > • *Thread Dump* -search not working [have to confirm it is required to add > in search, but we are able to search stdout text in that case Thread Dump > text also should be searchable ] > • *Storage memory* example 384.1 search not searching. > !Search Missing.png! > *Issue 2:* > *Spark job History Server page*,completed tasks- search is not working based > on *Duration column values*. We are getting the proper search result, if we > search the content from any other columns except Duration.*For example if > Duration is 6.1 min* we can not search result for 6.1 min or even 6.1. > !Duration Search.png! > !Duration Search1.png! -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields
[ https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-29051: - Attachment: Duration Search1.png > Spark Application UI search is not working for some fields > -- > > Key: SPARK-29051 > URL: https://issues.apache.org/jira/browse/SPARK-29051 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.3, 2.4.4 >Reporter: jobit mathew >Priority: Minor > Attachments: Duration Search.png, Duration Search1.png, Search > Missing.png, Search Missing.png > > > Spark Application UI *Search is not working* for some fields in *Spark Web UI > Executors TAB* and Spark job History Server page > *Test Steps* > 1.Install spark > 2.Start Spark SQL/Shell/beeline > 3.Submit some SQL queries > 4.Close some spark applications > 5.Check the Spark Web UI Executors TAB and verify search > 6.Check Spark job History Server page and verify search > *Issue 1* > Searching of some field contents are not working in *Spark Web UI Executors > TAB*(Spark SQL/Shell/JDBC server UIs ). > • *Input column*(search working wrongly .Example if input is 34.5KB,searching > of 34.5 won't take ,but 345 shows the search result -it is wrong) > • Task time search is Ok, but *GC time* search not working > • *Thread Dump* -search not working [have to confirm it is required to add > in search, but we are able to search stdout text in that case Thread Dump > text also should be searchable ] > • *Storage memory* example 384.1 search not searching. > *Issue 2:* > *Spark job History Server page*,completed tasks- search is not working based > on *Duration column values*. We are getting the proper search result, if we > search the content from any other columns except Duration.*For example if > Duration is 6.1 min* we can not search result for 6.1 min or even 6.1. 
> -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields
[ https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-29051: - Attachment: Duration Search.png > Spark Application UI search is not working for some fields > -- > > Key: SPARK-29051 > URL: https://issues.apache.org/jira/browse/SPARK-29051 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.3, 2.4.4 >Reporter: jobit mathew >Priority: Minor > Attachments: Duration Search.png, Search Missing.png, Search > Missing.png > > > Spark Application UI *Search is not working* for some fields in *Spark Web UI > Executors TAB* and Spark job History Server page > *Test Steps* > 1.Install spark > 2.Start Spark SQL/Shell/beeline > 3.Submit some SQL queries > 4.Close some spark applications > 5.Check the Spark Web UI Executors TAB and verify search > 6.Check Spark job History Server page and verify search > *Issue 1* > Searching of some field contents are not working in *Spark Web UI Executors > TAB*(Spark SQL/Shell/JDBC server UIs ). > • *Input column*(search working wrongly .Example if input is 34.5KB,searching > of 34.5 won't take ,but 345 shows the search result -it is wrong) > • Task time search is Ok, but *GC time* search not working > • *Thread Dump* -search not working [have to confirm it is required to add > in search, but we are able to search stdout text in that case Thread Dump > text also should be searchable ] > • *Storage memory* example 384.1 search not searching. > *Issue 2:* > *Spark job History Server page*,completed tasks- search is not working based > on *Duration column values*. We are getting the proper search result, if we > search the content from any other columns except Duration.*For example if > Duration is 6.1 min* we can not search result for 6.1 min or even 6.1. 
> -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields
[ https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-29051: - Attachment: Search Missing.png > Spark Application UI search is not working for some fields > -- > > Key: SPARK-29051 > URL: https://issues.apache.org/jira/browse/SPARK-29051 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.3, 2.4.4 >Reporter: jobit mathew >Priority: Minor > Attachments: Duration Search.png, Search Missing.png, Search > Missing.png > > > Spark Application UI *Search is not working* for some fields in *Spark Web UI > Executors TAB* and Spark job History Server page > *Test Steps* > 1.Install spark > 2.Start Spark SQL/Shell/beeline > 3.Submit some SQL queries > 4.Close some spark applications > 5.Check the Spark Web UI Executors TAB and verify search > 6.Check Spark job History Server page and verify search > *Issue 1* > Searching of some field contents are not working in *Spark Web UI Executors > TAB*(Spark SQL/Shell/JDBC server UIs ). > • *Input column*(search working wrongly .Example if input is 34.5KB,searching > of 34.5 won't take ,but 345 shows the search result -it is wrong) > • Task time search is Ok, but *GC time* search not working > • *Thread Dump* -search not working [have to confirm it is required to add > in search, but we are able to search stdout text in that case Thread Dump > text also should be searchable ] > • *Storage memory* example 384.1 search not searching. > *Issue 2:* > *Spark job History Server page*,completed tasks- search is not working based > on *Duration column values*. We are getting the proper search result, if we > search the content from any other columns except Duration.*For example if > Duration is 6.1 min* we can not search result for 6.1 min or even 6.1. 
> -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29051) Spark Application UI search is not working for some fields
[ https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-29051: - Attachment: Search Missing.png > Spark Application UI search is not working for some fields > -- > > Key: SPARK-29051 > URL: https://issues.apache.org/jira/browse/SPARK-29051 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.3, 2.4.4 >Reporter: jobit mathew >Priority: Minor > Attachments: Search Missing.png > > > Spark Application UI *Search is not working* for some fields in *Spark Web UI > Executors TAB* and Spark job History Server page > *Test Steps* > 1.Install spark > 2.Start Spark SQL/Shell/beeline > 3.Submit some SQL queries > 4.Close some spark applications > 5.Check the Spark Web UI Executors TAB and verify search > 6.Check Spark job History Server page and verify search > *Issue 1* > Searching of some field contents are not working in *Spark Web UI Executors > TAB*(Spark SQL/Shell/JDBC server UIs ). > • *Input column*(search working wrongly .Example if input is 34.5KB,searching > of 34.5 won't take ,but 345 shows the search result -it is wrong) > • Task time search is Ok, but *GC time* search not working > • *Thread Dump* -search not working [have to confirm it is required to add > in search, but we are able to search stdout text in that case Thread Dump > text also should be searchable ] > • *Storage memory* example 384.1 search not searching. > *Issue 2:* > *Spark job History Server page*,completed tasks- search is not working based > on *Duration column values*. We are getting the proper search result, if we > search the content from any other columns except Duration.*For example if > Duration is 6.1 min* we can not search result for 6.1 min or even 6.1. > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28882) Memory leak when stopping spark session
[ https://issues.apache.org/jira/browse/SPARK-28882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Łukasz Pińkowski updated SPARK-28882:
-------------------------------------
    Description:
When the stop() method is called on a Spark session, the underlying SparkContext is stopped as well. This also stops the underlying ContextCleaner thread, usually before it is able to clean all context objects (not all of them have been returned to the ReferenceQueue by GC). This causes a memory leak, because the contents of this ReferenceQueue are never collected by GC.

There should be at least a note in the documentation that calling the stop() method on a session or context may lead to memory leaks.

  was:
When the stop() method is called on a Spark session, the underlying SparkContext is stopped as well. This also stops the underlying ContextCleaner thread, usually before it is able to clean all context objects (not all of them have been returned to the ReferenceQueue by GC). This causes a memory leak, because the contents of this ReferenceQueue are never collected by GC.

> Memory leak when stopping spark session
> ---------------------------------------
>
>                 Key: SPARK-28882
>                 URL: https://issues.apache.org/jira/browse/SPARK-28882
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Łukasz Pińkowski
>            Priority: Major
>
> When the stop() method is called on a Spark session, the underlying SparkContext is stopped as well.
> This also stops the underlying ContextCleaner thread, usually before it is able to clean all context objects (not all of them have been returned to the ReferenceQueue by GC). This causes a memory leak, because the contents of this ReferenceQueue are never collected by GC.
>
> There should be at least a note in the documentation that calling the stop() method on a session or context may lead to memory leaks.
[jira] [Commented] (SPARK-29051) Spark Application UI search is not working for some fields
[ https://issues.apache.org/jira/browse/SPARK-29051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927434#comment-16927434 ]

Aman Omer commented on SPARK-29051:
-----------------------------------
I would like to handle this.

> Spark Application UI search is not working for some fields
> ----------------------------------------------------------
>
>                 Key: SPARK-29051
>                 URL: https://issues.apache.org/jira/browse/SPARK-29051
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.4.3, 2.4.4
>            Reporter: jobit mathew
>            Priority: Minor
>
> Spark Application UI *search is not working* for some fields in the *Spark Web UI Executors tab* and on the Spark History Server page.
> *Test Steps*
> 1. Install Spark.
> 2. Start Spark SQL/Shell/Beeline.
> 3. Submit some SQL queries.
> 4. Close some Spark applications.
> 5. Open the Spark Web UI Executors tab and verify search.
> 6. Open the Spark History Server page and verify search.
> *Issue 1*
> Searching some field contents does not work in the *Spark Web UI Executors tab* (Spark SQL/Shell/JDBC server UIs).
> • *Input column*: search matches incorrectly. For example, if the input is 34.5KB, searching for 34.5 finds nothing, but 345 returns a match, which is wrong.
> • Task time search is OK, but *GC time* search does not work.
> • *Thread Dump*: search does not work [to be confirmed whether it should be searchable; since stdout text is searchable, Thread Dump text should be searchable too].
> • *Storage memory*: for example, 384.1 is not found by search.
> *Issue 2:*
> On the *Spark History Server page*, in completed tasks, search does not work on *Duration column values*. We get the proper result when searching content from any column other than Duration. *For example, if Duration is 6.1 min*, we cannot find a result for 6.1 min or even 6.1.
[jira] [Created] (SPARK-29051) Spark Application UI search is not working for some fields
jobit mathew created SPARK-29051:
---------------------------------

             Summary: Spark Application UI search is not working for some fields
                 Key: SPARK-29051
                 URL: https://issues.apache.org/jira/browse/SPARK-29051
             Project: Spark
          Issue Type: Bug
          Components: Web UI
    Affects Versions: 2.4.4, 2.4.3
            Reporter: jobit mathew

Spark Application UI *search is not working* for some fields in the *Spark Web UI Executors tab* and on the Spark History Server page.
*Test Steps*
1. Install Spark.
2. Start Spark SQL/Shell/Beeline.
3. Submit some SQL queries.
4. Close some Spark applications.
5. Open the Spark Web UI Executors tab and verify search.
6. Open the Spark History Server page and verify search.
*Issue 1*
Searching some field contents does not work in the *Spark Web UI Executors tab* (Spark SQL/Shell/JDBC server UIs).
• *Input column*: search matches incorrectly. For example, if the input is 34.5KB, searching for 34.5 finds nothing, but 345 returns a match, which is wrong.
• Task time search is OK, but *GC time* search does not work.
• *Thread Dump*: search does not work [to be confirmed whether it should be searchable; since stdout text is searchable, Thread Dump text should be searchable too].
• *Storage memory*: for example, 384.1 is not found by search.
*Issue 2:*
On the *Spark History Server page*, in completed tasks, search does not work on *Duration column values*. We get the proper result when searching content from any column other than Duration. *For example, if Duration is 6.1 min*, we cannot find a result for 6.1 min or even 6.1.
[jira] [Comment Edited] (SPARK-29050) Fix typo in some docs
[ https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927421#comment-16927421 ]

dengziming edited comment on SPARK-29050 at 9/11/19 9:17 AM:
------------------------------------------------------------
Hi, I have already done this: [https://github.com/apache/spark/pull/25756]

was (Author: dengziming):
[https://github.com/apache/spark/pull/25756]

> Fix typo in some docs
> ---------------------
>
>                 Key: SPARK-29050
>                 URL: https://issues.apache.org/jira/browse/SPARK-29050
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 2.3.3, 2.4.3, 3.0.0
>            Reporter: dengziming
>            Priority: Major
>
> 'a hdfs' change into 'an hdfs'
> 'an unique' change into 'a unique'
> 'an url' change into 'a url'
> 'a error' change into 'an error'
[jira] [Commented] (SPARK-29050) Fix typo in some docs
[ https://issues.apache.org/jira/browse/SPARK-29050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927421#comment-16927421 ]

dengziming commented on SPARK-29050:
------------------------------------
[https://github.com/apache/spark/pull/25756]

> Fix typo in some docs
> ---------------------
>
>                 Key: SPARK-29050
>                 URL: https://issues.apache.org/jira/browse/SPARK-29050
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 2.3.3, 2.4.3, 3.0.0
>            Reporter: dengziming
>            Priority: Major
>
> 'a hdfs' change into 'an hdfs'
> 'an unique' change into 'a unique'
> 'an url' change into 'a url'
> 'a error' change into 'an error'
[jira] [Created] (SPARK-29050) Fix typo in some docs
dengziming created SPARK-29050:
-------------------------------

             Summary: Fix typo in some docs
                 Key: SPARK-29050
                 URL: https://issues.apache.org/jira/browse/SPARK-29050
             Project: Spark
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 2.4.3, 2.3.3, 3.0.0
            Reporter: dengziming

'a hdfs' change into 'an hdfs'
'an unique' change into 'a unique'
'an url' change into 'a url'
'a error' change into 'an error'
[jira] [Comment Edited] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler
[ https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927402#comment-16927402 ]

feiwang edited comment on SPARK-29043 at 9/11/19 8:41 AM:
----------------------------------------------------------
[~kabhwan]
* What is "spark.history.fs.update.interval" set to? 20s.
* How many applications are reloaded per call of checkForLogs? 5+.
* How big is the event log for each application? There may be many large logs.

I think SPARK-28594 is more helpful for our case.

was (Author: hzfeiwang):
* What is "spark.history.fs.update.interval" set to? 20s.
* How many applications are reloaded per call of checkForLogs? 5+.
* How big is the event log for each application? There may be many large logs.

I think SPARK-28594 is more helpful for our case.

> [History Server]Only one replay thread of FsHistoryProvider work because of straggler
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-29043
>                 URL: https://issues.apache.org/jira/browse/SPARK-29043
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: feiwang
>            Priority: Major
>         Attachments: image-2019-09-11-15-09-22-912.png, image-2019-09-11-15-10-25-326.png, screenshot-1.png
>
> As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for the Spark History Server.
> However, only one replay thread works because of a straggler.
> Let's check the code:
> https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547
> There is a synchronous operation for all replay tasks.
[jira] [Commented] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler
[ https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927402#comment-16927402 ]

feiwang commented on SPARK-29043:
---------------------------------
* What is "spark.history.fs.update.interval" set to? 20s.
* How many applications are reloaded per call of checkForLogs? 5+.
* How big is the event log for each application? There may be many large logs.

I think SPARK-28594 is more helpful for our case.

> [History Server]Only one replay thread of FsHistoryProvider work because of straggler
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-29043
>                 URL: https://issues.apache.org/jira/browse/SPARK-29043
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: feiwang
>            Priority: Major
>         Attachments: image-2019-09-11-15-09-22-912.png, image-2019-09-11-15-10-25-326.png, screenshot-1.png
>
> As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for the Spark History Server.
> However, only one replay thread works because of a straggler.
> Let's check the code:
> https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547
> There is a synchronous operation for all replay tasks.
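The effect of that synchronous operation can be reproduced in miniature. This is a hypothetical Python sketch, not the Scala FsHistoryProvider code (the replay function and counters are invented): a pool of four workers whose task bodies share a single lock degrades to one effective thread, so a straggler holding the lock stalls everyone else.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()   # stands in for the shared synchronized operation
active = 0                # tasks currently inside the critical section
max_concurrent = 0        # high-water mark of concurrent tasks

def replay(app_id, duration):
    """Toy replay task: the shared lock serializes all workers."""
    global active, max_concurrent
    with lock:
        active += 1
        max_concurrent = max(max_concurrent, active)
        time.sleep(duration)   # a straggler here blocks every other worker
        active -= 1

# Four worker threads are available, but the lock lets only one run at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    for i in range(4):
        pool.submit(replay, i, 0.01)
```

With the lock in place, `max_concurrent` stays at 1 regardless of the pool size, which matches the symptom reported above: 30 configured replay threads, one of them doing work.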
[jira] [Created] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
Xianyin Xin created SPARK-29049:
--------------------------------

             Summary: Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
                 Key: SPARK-29049
                 URL: https://issues.apache.org/jira/browse/SPARK-29049
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Xianyin Xin
[jira] [Commented] (SPARK-29038) SPIP: Support Spark Materialized View
[ https://issues.apache.org/jira/browse/SPARK-29038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927374#comment-16927374 ]

Lantao Jin commented on SPARK-29038:
------------------------------------
[~smilegator] Sure, we will fully follow ANSI SQL when we commit this, although our internal version contains some non-standard syntax.

> SPIP: Support Spark Materialized View
> -------------------------------------
>
>                 Key: SPARK-29038
>                 URL: https://issues.apache.org/jira/browse/SPARK-29038
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> Materialized views are an important approach in DBMSs to cache data and accelerate queries. By creating a materialized view through SQL, the data that can be cached is very flexible and can be configured according to specific usage scenarios. The materialization manager automatically updates the cached data according to changes in the detail source tables, simplifying the user's work. When a user submits a query, the Spark optimizer rewrites the execution plan based on the available materialized views to determine the optimal execution plan.
> Details in the [design doc|https://docs.google.com/document/d/1q5pjSWoTNVc9zsAfbNzJ-guHyVwPsEroIEP8Cca179A/edit?usp=sharing]
[jira] [Assigned] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection
[ https://issues.apache.org/jira/browse/SPARK-29048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weichen Xu reassigned SPARK-29048:
----------------------------------
    Assignee: Weichen Xu

> Query optimizer slow when using Column.isInCollection() with a large size collection
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-29048
>                 URL: https://issues.apache.org/jira/browse/SPARK-29048
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Weichen Xu
>            Assignee: Weichen Xu
>            Priority: Major
>
> The query optimizer is slow when using Column.isInCollection() with a large collection.
> The query optimizer takes a long time, and on the UI all I see is "Running commands". This can take from tens of minutes to 11 hours depending on how many values there are.
[jira] [Created] (SPARK-29048) Query optimizer slow when using Column.isInCollection() with a large size collection
Weichen Xu created SPARK-29048:
-------------------------------

             Summary: Query optimizer slow when using Column.isInCollection() with a large size collection
                 Key: SPARK-29048
                 URL: https://issues.apache.org/jira/browse/SPARK-29048
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Weichen Xu

The query optimizer is slow when using Column.isInCollection() with a large collection.
The query optimizer takes a long time, and on the UI all I see is "Running commands". This can take from tens of minutes to 11 hours depending on how many values there are.
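The cost reported here is planning-time work inside the optimizer, but the underlying trade-off, testing membership against a large literal collection element by element versus against a hashed set, can be illustrated outside Spark. This is a loose analogy in plain Python, not the optimizer's actual code:

```python
import timeit

values = list(range(50_000))
as_list = values        # analogous to a long chain of per-element comparisons
as_set = set(values)    # analogous to a single hashed-set membership test

probe = 49_999          # worst case for the linear scan
t_list = timeit.timeit(lambda: probe in as_list, number=200)
t_set = timeit.timeit(lambda: probe in as_set, number=200)
# The hashed lookup is dramatically cheaper than the repeated linear scan.
```

The analogy only shows why per-element handling of a large value list scales badly; how Catalyst actually represents and optimizes large IN lists is outside the scope of this sketch.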