[jira] [Updated] (SPARK-30211) Use python3 in make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-30211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30211: -- Summary: Use python3 in make-distribution.sh (was: Update python version in make-distribution.sh) > Use python3 in make-distribution.sh > --- > > Key: SPARK-30211 > URL: https://issues.apache.org/jira/browse/SPARK-30211 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30211) Update python version in make-distribution.sh
[ https://issues.apache.org/jira/browse/SPARK-30211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30211. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26844 [https://github.com/apache/spark/pull/26844] > Update python version in make-distribution.sh > - > > Key: SPARK-30211 > URL: https://issues.apache.org/jira/browse/SPARK-30211 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0
[jira] [Created] (SPARK-30214) Support COMMENT ON syntax
Kent Yao created SPARK-30214: Summary: Support COMMENT ON syntax Key: SPARK-30214 URL: https://issues.apache.org/jira/browse/SPARK-30214 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Kent Yao https://prestosql.io/docs/current/sql/comment.html https://www.postgresql.org/docs/12/sql-comment.html We are going to disallow setting reserved properties via dbproperties or tblproperties directly; these will need a dedicated subclause in the CREATE syntax or specific ALTER commands.
[jira] [Created] (SPARK-30213) Remove the mutable status in QueryStage when enable AQE
Ke Jia created SPARK-30213: -- Summary: Remove the mutable status in QueryStage when enable AQE Key: SPARK-30213 URL: https://issues.apache.org/jira/browse/SPARK-30213 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Ke Jia Currently, ShuffleQueryStageExec contains mutable state, e.g. the mapOutputStatisticsFuture variable, so that state is not easy to carry over when we copy a ShuffleQueryStageExec.
[jira] [Commented] (SPARK-19335) Spark should support doing an efficient DataFrame Upsert via JDBC
[ https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993231#comment-16993231 ] Rinaz Belhaj commented on SPARK-19335: -- +1 This feature would be very useful. Any updates on this? > Spark should support doing an efficient DataFrame Upsert via JDBC > - > > Key: SPARK-19335 > URL: https://issues.apache.org/jira/browse/SPARK-19335 > Project: Spark > Issue Type: Improvement >Reporter: Ilya Ganelin >Priority: Minor > > Doing a database update, as opposed to an insert, is useful, particularly when > working with streaming applications which may require revisions to previously > stored data. > Spark DataFrames/DataSets do not currently support an Update feature via the > JDBC Writer, allowing only Overwrite or Append.
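Until such a feature exists, a common workaround is to push each partition through a plain JDBC/DB-API connection using an upsert statement inside foreachPartition. The sketch below (plain Python, no Spark required; the helper name and the PostgreSQL ON CONFLICT dialect are illustrative assumptions, not anything Spark ships) shows the kind of statement such a writer would issue:

```python
def build_upsert_sql(table, columns, key_columns):
    """Build a PostgreSQL-style upsert (INSERT ... ON CONFLICT) statement.

    Hypothetical helper: Spark's JDBC writer has no upsert mode, so a
    user-side workaround is to run statements like this per partition.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join(["?"] * len(columns))   # JDBC-style parameters
    keys = ", ".join(key_columns)
    # Non-key columns are overwritten from the incoming row on conflict.
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c not in key_columns
    )
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({keys}) DO UPDATE SET {updates}"
    )
```

Each executor task would then prepare this statement once and execute it for every row in its partition; other databases need their own dialect (e.g. MERGE).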
[jira] [Updated] (SPARK-30212) COUNT(DISTINCT) window function should be supported
[ https://issues.apache.org/jira/browse/SPARK-30212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kernel Force updated SPARK-30212: - Summary: COUNT(DISTINCT) window function should be supported (was: Could not use COUNT(DISTINCT) window function in SparkSQL) > COUNT(DISTINCT) window function should be supported > --- > > Key: SPARK-30212 > URL: https://issues.apache.org/jira/browse/SPARK-30212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Spark 2.4.4 > Scala 2.11.12 > Hive 2.3.6 >Reporter: Kernel Force >Priority: Major > Labels: SQL, distinct, window_function > > Suppose we have a typical table in Hive like below: > {code:sql} > CREATE TABLE DEMO_COUNT_DISTINCT ( > demo_date string, > demo_id string > ); > {code} > {noformat} > ++--+ > | demo_count_distinct.demo_date | demo_count_distinct.demo_id | > ++--+ > | 20180301 | 101 | > | 20180301 | 102 | > | 20180301 | 103 | > | 20180401 | 201 | > | 20180401 | 202 | > ++--+ > {noformat} > Now I want to count the distinct number of DEMO_DATE values while preserving every > column's data in each row. So I use the COUNT(DISTINCT) window function as below in Hive beeline, and it > works: > {code:sql} > SELECT T.*, COUNT(DISTINCT T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES > FROM DEMO_COUNT_DISTINCT T; > {code} > {noformat} > +--++-+ > | t.demo_date | t.demo_id | uniq_dates | > +--++-+ > | 20180401 | 202 | 2 | > | 20180401 | 201 | 2 | > | 20180301 | 103 | 2 | > | 20180301 | 102 | 2 | > | 20180301 | 101 | 2 | > +--++-+ > {noformat} > But when I ran the same SQL in SparkSQL, it threw an exception.
> {code:sql} > spark.sql(""" > SELECT T.*, COUNT(DISTINCT T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES > FROM DEMO_COUNT_DISTINCT T > """).show > {code} > {noformat} > org.apache.spark.sql.AnalysisException: Distinct window functions are not > supported: count(distinct DEMO_DATE#1) windowspecdefinition(null, > specifiedwindowframe(RowFrame, unboundedpreceding$(), > unboundedfollowing$()));; > Project [demo_date#1, demo_id#2, UNIQ_DATES#0L] > +- Project [demo_date#1, demo_id#2, UNIQ_DATES#0L, UNIQ_DATES#0L] > +- Window [count(distinct DEMO_DATE#1) windowspecdefinition(null, > specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) > AS UNIQ_DATES#0L], [null] > +- Project [demo_date#1, demo_id#2] > +- SubqueryAlias `T` > +- SubqueryAlias `default`.`demo_count_distinct` > +- HiveTableRelation `default`.`demo_count_distinct`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [demo_date#1, demo_id#2] > {noformat} > Then I tried the countDistinct function but also got an exception: > {code:sql} > spark.sql(""" > SELECT T.*, countDistinct(T.DEMO_DATE) OVER(PARTITION BY NULL) UNIQ_DATES > FROM DEMO_COUNT_DISTINCT T > """).show > {code} > {noformat} > org.apache.spark.sql.AnalysisException: Undefined function: 'countDistinct'. > This function is neither a registered temporary function nor a permanent > function registered in the database 'default'.; line 2 pos 12 > at > org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1279) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions$$anonfun$apply$15$$anonfun$applyOrElse$49.apply(Analyzer.scala:1279) > at > org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53) > .. > {noformat}
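A workaround often suggested for this limitation is to replace COUNT(DISTINCT x) OVER (...) with SIZE(COLLECT_SET(x)) OVER (...), which Spark accepts and which yields the same value. The plain-Python sketch below (no Spark required; the function name is hypothetical) mirrors that semantics: gather the set of distinct values per window partition and attach its size to every row.

```python
from collections import defaultdict

def count_distinct_over(rows, partition_key, value_key):
    """Mimic SIZE(COLLECT_SET(value)) OVER (PARTITION BY key):
    attach the number of distinct values in each partition to every row."""
    distinct = defaultdict(set)
    for row in rows:
        distinct[row[partition_key]].add(row[value_key])
    # Every row gets the distinct count of its own partition.
    return [dict(row, uniq=len(distinct[row[partition_key]])) for row in rows]
```

PARTITION BY NULL in the report puts all rows in one partition, so every row would receive the global distinct count.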
[jira] [Updated] (SPARK-30212) Could not use COUNT(DISTINCT) window function in SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-30212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kernel Force updated SPARK-30212: - Labels: SQL distinct window_function (was: ) > Could not use COUNT(DISTINCT) window function in SparkSQL > - > > Key: SPARK-30212 > URL: https://issues.apache.org/jira/browse/SPARK-30212 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 > Environment: Spark 2.4.4 > Scala 2.11.12 > Hive 2.3.6 >Reporter: Kernel Force >Priority: Major > Labels: SQL, distinct, window_function
[jira] [Created] (SPARK-30212) Could not use COUNT(DISTINCT) window function in SparkSQL
Dilly King created SPARK-30212: -- Summary: Could not use COUNT(DISTINCT) window function in SparkSQL Key: SPARK-30212 URL: https://issues.apache.org/jira/browse/SPARK-30212 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.4 Environment: Spark 2.4.4 Scala 2.11.12 Hive 2.3.6 Reporter: Dilly King
[jira] [Created] (SPARK-30211) Update python version in make-distribution.sh
Yuming Wang created SPARK-30211: --- Summary: Update python version in make-distribution.sh Key: SPARK-30211 URL: https://issues.apache.org/jira/browse/SPARK-30211 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.0.0 Reporter: Yuming Wang Assignee: Yuming Wang
[jira] [Resolved] (SPARK-30204) Support for config Pod DNS for Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-30204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vanderliang resolved SPARK-30204. - Resolution: Fixed > Support for config Pod DNS for Kubernetes > - > > Key: SPARK-30204 > URL: https://issues.apache.org/jira/browse/SPARK-30204 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: vanderliang >Priority: Major > > Currently we cannot configure the pod DNS nameservers and searches when submitting > a job via the CLI for Kubernetes. However, this is a common scenario for > hybrid cloud, where we use public cloud compute resources together with a private > DNS. > > {code:java} > apiVersion: v1 > kind: Pod > metadata: > namespace: default > name: dns-example > spec: > containers: > - name: test > image: nginx > dnsConfig: > nameservers: > - 1.2.3.4 > searches: > - ns1.svc.cluster-domain.example > - my.dns.search.suffix > options: > - name: ndots > value: "2" > - name: edns0 > {code} > As a result, we can use the following properties to specify the pod DNS config: > * spark.kubernetes.dnsConfig.nameservers, comma-separated list of the > Kubernetes DNS nameservers for the driver and executors. > * spark.kubernetes.dnsConfig.searches, comma-separated list of the > Kubernetes DNS searches for the driver and executors. > * spark.kubernetes.dnsConfig.options.[OptionVariableName], add the DNS > option variable specified by OptionVariableName to the driver and executor > process. The user can specify multiple of these to set multiple option > variables.
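If the proposed properties were implemented, the submission client would have to fold them into the pod spec's dnsConfig stanza shown above. A minimal sketch of that translation, assuming the property names from the proposal (this helper is illustrative, not part of Spark):

```python
def build_dns_config(conf):
    """Translate the proposed spark.kubernetes.dnsConfig.* properties
    (hypothetical; not part of Spark) into a pod-spec dnsConfig dict."""
    prefix = "spark.kubernetes.dnsConfig."
    dns = {}
    nameservers = conf.get(prefix + "nameservers")
    if nameservers:
        dns["nameservers"] = [s.strip() for s in nameservers.split(",")]
    searches = conf.get(prefix + "searches")
    if searches:
        dns["searches"] = [s.strip() for s in searches.split(",")]
    # Each ...options.NAME=VALUE property becomes one {name, value} entry.
    options = [
        {"name": key[len(prefix + "options."):], "value": value}
        for key, value in sorted(conf.items())
        if key.startswith(prefix + "options.")
    ]
    if options:
        dns["options"] = options
    return dns
```

The resulting dict matches the shape Kubernetes expects under spec.dnsConfig; value-less options such as edns0 would need an extra convention not covered by this sketch.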
[jira] [Assigned] (SPARK-29152) Spark Executor Plugin API shutdown is not proper when dynamic allocation enabled
[ https://issues.apache.org/jira/browse/SPARK-29152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Masiero Vanzin reassigned SPARK-29152: -- Assignee: Rakesh Raushan > Spark Executor Plugin API shutdown is not proper when dynamic allocation > enabled > > > Key: SPARK-29152 > URL: https://issues.apache.org/jira/browse/SPARK-29152 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0 >Reporter: jobit mathew >Assignee: Rakesh Raushan >Priority: Major > > *Issue Description* > Spark Executor Plugin API *shutdown handling is not proper* when dynamic > allocation is enabled. The plugin's shutdown method is not processed when dynamic > allocation is enabled and *executors become dead* after the inactive time. > *Test Precondition* > 1. Create a plugin and build a jar named SparkExecutorPlugin.jar: > import org.apache.spark.ExecutorPlugin; > public class ExecutoTest1 implements ExecutorPlugin { > public void init() { > System.out.println("Executor Plugin Initialised."); > } > public void shutdown() { > System.out.println("Executor plugin closed successfully."); > } > } > 2. Put the jar in the folder /spark/examples/jars > *Test Steps* > 1. Launch bin/spark-sql with dynamic allocation enabled: > ./spark-sql --master yarn --conf spark.executor.plugins=ExecutoTest1 --jars > /opt/HA/C10/install/spark/spark/examples/jars/SparkExecutorPlugin.jar --conf > spark.dynamicAllocation.enabled=true --conf > spark.dynamicAllocation.initialExecutors=2 --conf > spark.dynamicAllocation.minExecutors=1 > 2. Create a table, insert data and run select * from tablename. > 3. Check the Spark UI Jobs tab/SQL tab. > 4. Check all executors' (the Executors tab gives all executor details) > application log files for the executor plugin initialization and shutdown messages > or operations. > Example: > /yarn/logdir/application_1567156749079_0025/container_e02_1567156749079_0025_01_05/ > stdout > 5. Wait for the executor to be dead after the inactive time and check the > same container log. > 6. Kill the spark-sql session and check the container log for the executor plugin > shutdown. > *Expected Output* > 1. The job should succeed; the create table, insert and select queries should > succeed. > 2. While running queries, all executors' logs should contain the executor plugin > init message: "Executor Plugin Initialised." > 3. Once the executors are dead, the shutdown message should be in the log file: > "Executor plugin closed successfully." > 4. Once the SQL application is closed, the shutdown message should be in the log: > "Executor plugin closed successfully." > *Actual Output* > The shutdown method is not called when the executor is dead after the inactive time. > *Observation* > Without dynamic allocation the executor plugin works fine, but after > enabling dynamic allocation the executor shutdown is not processed.
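The contract the report expects can be stated compactly: every plugin that was initialized must be shut down exactly once, whichever way the executor exits (normal stop or dynamic-allocation idle timeout). A toy model of that lifecycle in plain Python (class names hypothetical; this is not Spark's implementation):

```python
class ExecutorPluginContainer:
    """Toy model of the expected contract: every plugin that was
    initialized gets shutdown() exactly once, however the executor exits
    (normal stop or dynamic-allocation idle timeout)."""

    def __init__(self, plugins):
        self.plugins = list(plugins)
        self.shut_down = False
        for p in self.plugins:
            p.init()

    def stop(self):
        # Must be wired into *every* exit path, including the one taken
        # when dynamic allocation removes an idle executor.
        if not self.shut_down:
            self.shut_down = True
            for p in self.plugins:
                p.shutdown()


class RecordingPlugin:
    """Test double that records lifecycle calls."""

    def __init__(self):
        self.events = []

    def init(self):
        self.events.append("init")

    def shutdown(self):
        self.events.append("shutdown")
```

The bug report amounts to stop() never being reached on the idle-timeout path; the fix merged for this issue routes plugin shutdown through the executor's normal termination handling.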
[jira] [Resolved] (SPARK-29152) Spark Executor Plugin API shutdown is not proper when dynamic allocation enabled
[ https://issues.apache.org/jira/browse/SPARK-29152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Masiero Vanzin resolved SPARK-29152. Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26810 [https://github.com/apache/spark/pull/26810] > Spark Executor Plugin API shutdown is not proper when dynamic allocation > enabled > > > Key: SPARK-29152 > URL: https://issues.apache.org/jira/browse/SPARK-29152 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0 >Reporter: jobit mathew >Assignee: Rakesh Raushan >Priority: Major > Fix For: 3.0.0
[jira] [Commented] (SPARK-30209) Display stageId, attemptId, taskId with SQL max metric in UI
[ https://issues.apache.org/jira/browse/SPARK-30209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993008#comment-16993008 ] Niranjan Artal commented on SPARK-30209: I am working on it. > Display stageId, attemptId, taskId with SQL max metric in UI > > > Key: SPARK-30209 > URL: https://issues.apache.org/jira/browse/SPARK-30209 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Niranjan Artal >Priority: Major > > It would be helpful if we could add stageId, stage attemptId and taskId > in the SQL UI for each of the max metric values. These additional metrics help > in debugging jobs more quickly. For a given operator, it will be easy to > identify the task which is taking maximum time to complete from the Spark UI.
[jira] [Updated] (SPARK-30209) Display stageId, attemptId, taskId with SQL max metric in UI
[ https://issues.apache.org/jira/browse/SPARK-30209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niranjan Artal updated SPARK-30209: --- Description: It would be helpful if we could add stageId, stage attemptId and taskId in the SQL UI for each of the max metric values. These additional metrics help in debugging jobs more quickly. For a given operator, it will be easy to identify the task which is taking maximum time to complete from the Spark UI. (was: It would be helpful if we could add stageId, stage attemptId and taskId in SQL UI. These additional metrics help in debugging the jobs quicker. For a given operator, it will be easy to identify the task which is taking maximum time to complete from the Spark UI.) > Display stageId, attemptId, taskId with SQL max metric in UI > > > Key: SPARK-30209 > URL: https://issues.apache.org/jira/browse/SPARK-30209 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Niranjan Artal >Priority: Major > > It would be helpful if we could add stageId, stage attemptId and taskId > in the SQL UI for each of the max metric values. These additional metrics help > in debugging jobs more quickly. For a given operator, it will be easy to > identify the task which is taking maximum time to complete from the Spark UI.
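Showing a max metric together with its stage/attempt/task means the metric must track the argmax, not just the maximum value. A minimal sketch of that bookkeeping (plain Python; the class name and display format are illustrative assumptions, not the actual Spark implementation):

```python
class MaxMetric:
    """Track the maximum value of a SQL metric together with the task
    that produced it (stageId, attemptId, taskId) -- a sketch of what
    the proposed UI change has to record."""

    def __init__(self):
        self.value = None
        self.source = None  # (stage_id, attempt_id, task_id)

    def update(self, value, stage_id, attempt_id, task_id):
        # Keep both the max and the identity of the task that set it.
        if self.value is None or value > self.value:
            self.value = value
            self.source = (stage_id, attempt_id, task_id)

    def render(self):
        if self.value is None:
            return "no data"
        stage, attempt, task = self.source
        return f"max {self.value} (stage {stage}.{attempt}, task {task})"
```

With this in place, a slow operator's max metric points straight at the offending task instead of leaving the user to scan all tasks of the stage.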
[jira] [Created] (SPARK-30210) Give more informative error for BinaryClassificationEvaluator when data with only one label is provided
Paul Anzel created SPARK-30210: -- Summary: Give more informative error for BinaryClassificationEvaluator when data with only one label is provided Key: SPARK-30210 URL: https://issues.apache.org/jira/browse/SPARK-30210 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 2.4.5 Environment: Pyspark on Databricks Reporter: Paul Anzel Hi all, When I was trying to do some machine learning work with pyspark I ran into a confusing error message: # Model and train/test set generated evaluator = BinaryClassificationEvaluator(labelCol=label, metricName='areaUnderROC') prediction = model.transform(test_data) auc = evaluator.evaluate(prediction) org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 21.0 failed 4 times, most recent failure: Lost task 37.3 in stage 21.0 (TID 2811, 10.139.65.48, executor 16): java.lang.ArrayIndexOutOfBoundsException After some investigation, I found that the issue was that the data I was trying to predict on only had one label represented, rather than both positive and negative labels. Easy enough to fix, but I would like to ask if we could replace this error with one that explicitly points out the issue. Would it be acceptable to have a check ahead of time on labels that ensures all labels are represented? Alternately, can we change the docs for BinaryClassificationEvaluator to explain what this error means?
[jira] [Updated] (SPARK-30210) Give more informative error for BinaryClassificationEvaluator when data with only one label is provided
[ https://issues.apache.org/jira/browse/SPARK-30210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Anzel updated SPARK-30210: --- Description: Hi all, When I was trying to do some machine learning work with pyspark I ran into a confusing error message: {{# Model and train/test set generated...}} {{ evaluator = BinaryClassificationEvaluator(labelCol=label, metricName='areaUnderROC')}} {{ prediction = model.transform(test_data)}} {{ auc = evaluator.evaluate(prediction)}} {{org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 21.0 failed 4 times, most recent failure: Lost task 37.3 in stage 21.0 (TID 2811, 10.139.65.48, executor 16): java.lang.ArrayIndexOutOfBoundsException}} After some investigation, I found that the issue was that the data I was trying to predict on only had one label represented, rather than both positive and negative labels. Easy enough to fix, but I would like to ask if we could replace this error with one that explicitly points out the issue. Would it be acceptable to have a check ahead of time on labels that ensures all labels are represented? Alternately, can we change the docs for BinaryClassificationEvaluator to explain what this error means? was: Hi all, When I was trying to do some machine learning work with pyspark I ran into a confusing error message: # Model and train/test set generated evaluator = BinaryClassificationEvaluator(labelCol=label, metricName='areaUnderROC') prediction = model.transform(test_data) auc = evaluator.evaluate(prediction) org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 21.0 failed 4 times, most recent failure: Lost task 37.3 in stage 21.0 (TID 2811, 10.139.65.48, executor 16): java.lang.ArrayIndexOutOfBoundsException After some investigation, I found that the issue was that the data I was trying to predict on only had one label represented, rather than both positive and negative labels. 
Easy enough to fix, but I would like to ask if we could replace this error with one that explicitly points out the issue. Would it be acceptable to have a check ahead of time on labels that ensures all labels are represented? Alternately, can we change the docs for BinaryClassificationEvaluator to explain what this error means? > Give more informative error for BinaryClassificationEvaluator when data with > only one label is provided > --- > > Key: SPARK-30210 > URL: https://issues.apache.org/jira/browse/SPARK-30210 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.4.5 > Environment: Pyspark on Databricks >Reporter: Paul Anzel >Priority: Minor > > Hi all, > When I was trying to do some machine learning work with pyspark I ran into a > confusing error message: > {{# Model and train/test set generated...}} > {{ evaluator = BinaryClassificationEvaluator(labelCol=label, > metricName='areaUnderROC')}} > {{ prediction = model.transform(test_data)}} > {{ auc = evaluator.evaluate(prediction)}} > {{org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 > in stage 21.0 failed 4 times, most recent failure: Lost task 37.3 in stage > 21.0 (TID 2811, 10.139.65.48, executor 16): > java.lang.ArrayIndexOutOfBoundsException}} > After some investigation, I found that the issue was that the data I was > trying to predict on only had one label represented, rather than both > positive and negative labels. Easy enough to fix, but I would like to ask if > we could replace this error with one that explicitly points out the issue. > Would it be acceptable to have a check ahead of time on labels that ensures > all labels are represented? Alternately, can we change the docs for > BinaryClassificationEvaluator to explain what this error means?
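The ahead-of-time check the reporter asks about is cheap to express: before evaluating, verify that both classes are present and fail with an explicit message instead of a downstream ArrayIndexOutOfBoundsException. A plain-Python sketch of such a user-side guard (helper name hypothetical; in practice the labels would come from the DataFrame's label column):

```python
def check_binary_labels(labels):
    """Guard suggested by the issue: fail fast with a clear message when
    the data contains only one class, instead of letting the evaluator
    die deep inside a Spark job. Hypothetical helper, not Spark API."""
    seen = set(labels)
    if len(seen) < 2:
        raise ValueError(
            f"BinaryClassificationEvaluator needs both classes present; "
            f"got only {sorted(seen)}"
        )
```

Running this on the test set's label column before evaluator.evaluate(prediction) would have surfaced the real problem immediately.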
[jira] [Created] (SPARK-30209) Display stageId, attemptId, taskId with SQL max metric in UI
Niranjan Artal created SPARK-30209: -- Summary: Display stageId, attemptId, taskId with SQL max metric in UI Key: SPARK-30209 URL: https://issues.apache.org/jira/browse/SPARK-30209 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 3.0.0 Reporter: Niranjan Artal It would be helpful if we could add stageId, stage attemptId and taskId in SQL UI. These additional metrics help in debugging the jobs quicker. For a given operator, it will be easy to identify the task which is taking maximum time to complete from the Spark UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.
[ https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21869: - Fix Version/s: (was: 3.0.0) > A cached Kafka producer should not be closed if any task is using it. > - > > Key: SPARK-21869 > URL: https://issues.apache.org/jira/browse/SPARK-21869 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Shixiong Zhu >Assignee: Gabor Somogyi >Priority: Major > > Right now a cached Kafka producer may be closed if a large task uses it for > more than 10 minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.
[ https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992975#comment-16992975 ] Shixiong Zhu commented on SPARK-21869: -- Reopened this. https://github.com/apache/spark/pull/25853 has been reverted. > A cached Kafka producer should not be closed if any task is using it. > - > > Key: SPARK-21869 > URL: https://issues.apache.org/jira/browse/SPARK-21869 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Shixiong Zhu >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > Right now a cached Kafka producer may be closed if a large task uses it for > more than 10 minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.
[ https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-21869: -- > A cached Kafka producer should not be closed if any task is using it. > - > > Key: SPARK-21869 > URL: https://issues.apache.org/jira/browse/SPARK-21869 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4, 3.0.0 >Reporter: Shixiong Zhu >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > Right now a cached Kafka producer may be closed if a large task uses it for > more than 10 minutes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29976) Allow speculation even if there is only one task
[ https://issues.apache.org/jira/browse/SPARK-29976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-29976. --- Fix Version/s: 3.0.0 Assignee: Yuchen Huo Resolution: Fixed > Allow speculation even if there is only one task > > > Key: SPARK-29976 > URL: https://issues.apache.org/jira/browse/SPARK-29976 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Yuchen Huo >Assignee: Yuchen Huo >Priority: Major > Fix For: 3.0.0 > > > In the current speculative execution implementation if there is only one task > in the stage then no speculative run would be conducted. However, there might > be cases where an executor have some problem in writing to its disk and just > hang forever. In this case, if the single task stage get assigned to the > problematic executor then the whole job would hang forever. It would be > better if we could run the task on another executor if this happens. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30208) A race condition when reading from Kafka in PySpark
Shixiong Zhu created SPARK-30208: Summary: A race condition when reading from Kafka in PySpark Key: SPARK-30208 URL: https://issues.apache.org/jira/browse/SPARK-30208 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.4 Reporter: Jiawen Zhu When using PySpark to read from Kafka, there is a race condition that Spark may use KafkaConsumer in multiple threads at the same time and throw the following error: {code} java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access at kafkashaded.org.apache.kafka.clients.consumer.KafkaConsumer.acquire(KafkaConsumer.java:2215) at kafkashaded.org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2104) at kafkashaded.org.apache.kafka.clients.consumer.KafkaConsumer.close(KafkaConsumer.java:2059) at org.apache.spark.sql.kafka010.InternalKafkaConsumer.close(KafkaDataConsumer.scala:451) at org.apache.spark.sql.kafka010.KafkaDataConsumer$NonCachedKafkaDataConsumer.release(KafkaDataConsumer.scala:508) at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.close(KafkaSourceRDD.scala:126) at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:66) at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anonfun$compute$3.apply(KafkaSourceRDD.scala:131) at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anonfun$compute$3.apply(KafkaSourceRDD.scala:130) at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:162) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:131) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:131) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:144) at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:142) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:142) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:130) at org.apache.spark.scheduler.Task.doRunTask(Task.scala:155) at org.apache.spark.scheduler.Task.run(Task.scala:112) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} When using PySpark, reading from Kafka actually happens in a separate writer thread rather than the task thread. When a task is terminated early (e.g., there is a limit operator), the task thread may stop the KafkaConsumer while the writer thread is using it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
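The thread-ownership check that produces the `ConcurrentModificationException` above can be modeled with a small Python sketch. This is illustrative only, not Spark or Kafka client code; `SingleThreadGuard` is a hypothetical name that mimics the fail-fast `acquire()` check inside `KafkaConsumer`.

```python
import threading

class SingleThreadGuard:
    """Fail-fast ownership check, loosely modeled on KafkaConsumer.acquire()."""

    def __init__(self):
        self._owner = None
        self._lock = threading.Lock()

    def acquire(self):
        # Record the owning thread; any other thread touching the
        # resource fails fast instead of corrupting shared state.
        with self._lock:
            me = threading.current_thread()
            if self._owner is not None and self._owner is not me:
                raise RuntimeError(
                    "consumer is not safe for multi-threaded access")
            self._owner = me

    def release(self):
        with self._lock:
            self._owner = None
```

In the bug above, the writer thread holds the consumer while reading from Kafka; when the task thread then closes the consumer during early termination, this kind of check is what raises.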
[jira] [Resolved] (SPARK-30205) Import ABC from collections.abc to remove deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-30205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30205. --- Fix Version/s: 3.0.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/26835 > Import ABC from collections.abc to remove deprecation warnings > -- > > Key: SPARK-30205 > URL: https://issues.apache.org/jira/browse/SPARK-30205 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Karthikeyan Singaravelan >Priority: Minor > Fix For: 3.0.0 > > > Importing ABC from collections module directly is deprecated since 3.4 and is > removed in Python 3.9. Thus this will cause ImportError for pyspark in Python > 3.9 in the resultiterable module where Iterable is used from collections at > https://github.com/tirkarthi/spark/blob/aa9da9365ff31948e42ab4c6dcc6cb4cec5fd852/python/pyspark/resultiterable.py#L23. > > Relevant CPython PR : https://github.com/python/cpython/pull/10596. > I am a new contributor and would like to work on this issue. > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
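The fix amounts to switching the import path; a minimal before/after in plain Python:

```python
# Deprecated since Python 3.3 and removed in Python 3.9:
#   from collections import Iterable   # ImportError on 3.9+

# Supported spelling, available on all Python 3 versions Spark targets:
from collections.abc import Iterable

print(isinstance([1, 2, 3], Iterable))  # True
print(isinstance(42, Iterable))         # False
```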
[jira] [Assigned] (SPARK-30205) Import ABC from collections.abc to remove deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-30205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-30205: - Assignee: Karthikeyan Singaravelan > Import ABC from collections.abc to remove deprecation warnings > -- > > Key: SPARK-30205 > URL: https://issues.apache.org/jira/browse/SPARK-30205 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Karthikeyan Singaravelan >Assignee: Karthikeyan Singaravelan >Priority: Minor > Fix For: 3.0.0 > > > Importing ABC from collections module directly is deprecated since 3.4 and is > removed in Python 3.9. Thus this will cause ImportError for pyspark in Python > 3.9 in the resultiterable module where Iterable is used from collections at > https://github.com/tirkarthi/spark/blob/aa9da9365ff31948e42ab4c6dcc6cb4cec5fd852/python/pyspark/resultiterable.py#L23. > > Relevant CPython PR : https://github.com/python/cpython/pull/10596. > I am a new contributor and would like to work on this issue. > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
[ https://issues.apache.org/jira/browse/SPARK-30130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992835#comment-16992835 ] Matt Boegner commented on SPARK-30130: -- [~Ankitraj] apologies, a typo was introduced when I copied the sample queries into the Jira code block. The query has been edited and should generate the error. Let me know if you have any questions. > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions > > > Key: SPARK-30130 > URL: https://issues.apache.org/jira/browse/SPARK-30130 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Matt Boegner >Priority: Minor > > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions. > {code:java} > val df = spark.sql(""" > with a as (select 0 as test, count(*) group by test) > select * from a > """) > df.show(){code} > This results in an error message like {color:#e01e5a}GROUP BY position 0 is > not in select list (valid range is [1, 2]){color} . > > However, this error does not appear in a traditional subselect format. For > example, this query executes correctly: > {code:java} > val df = spark.sql(""" > select * from (select 0 as test, count(*) group by test) a > """) > df.show(){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
[ https://issues.apache.org/jira/browse/SPARK-30130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Boegner updated SPARK-30130: - Description: Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions. {code:java} val df = spark.sql(""" with a as (select 0 as test, count(*) group by test) select * from a """) df.show(){code} This results in an error message like {color:#e01e5a}GROUP BY position 0 is not in select list (valid range is [1, 2]){color} . However, this error does not appear in a traditional subselect format. For example, this query executes correctly: {code:java} val df = spark.sql(""" select * from (select 0 as test, count(*) group by test) a """) df.show(){code} was: Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions. {code:java} val df = spark.sql(""" with a as (select 0 as test, count group by test) select * from a """) df.show(){code} This results in an error message like {color:#e01e5a}GROUP BY position 0 is not in select list (valid range is [1, 2]){color} . However, this error does not appear in a traditional subselect format. For example, this query executes correctly: {code:java} val df = spark.sql(""" select * from (select 0 as test, count group by test) a """) df.show(){code} > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions > > > Key: SPARK-30130 > URL: https://issues.apache.org/jira/browse/SPARK-30130 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Matt Boegner >Priority: Minor > > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions. 
> {code:java} > val df = spark.sql(""" > with a as (select 0 as test, count(*) group by test) > select * from a > """) > df.show(){code} > This results in an error message like {color:#e01e5a}GROUP BY position 0 is > not in select list (valid range is [1, 2]){color} . > > However, this error does not appear in a traditional subselect format. For > example, this query executes correctly: > {code:java} > val df = spark.sql(""" > select * from (select 0 as test, count(*) group by test) a > """) > df.show(){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29587) Real data type is not supported in Spark SQL which is supporting in postgresql
[ https://issues.apache.org/jira/browse/SPARK-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29587: --- Assignee: Kent Yao > Real data type is not supported in Spark SQL which is supporting in postgresql > -- > > Key: SPARK-29587 > URL: https://issues.apache.org/jira/browse/SPARK-29587 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.4 >Reporter: jobit mathew >Assignee: Kent Yao >Priority: Minor > > Real data type is not supported in Spark SQL which is supporting in > postgresql. > +*In postgresql query success*+ > CREATE TABLE weather2(prcp real); > insert into weather2 values(2.5); > select * from weather2; > > || ||prcp|| > |1|2,5| > +*In spark sql getting error*+ > spark-sql> CREATE TABLE weather2(prcp real); > Error in query: > DataType real is not supported.(line 1, pos 27) > == SQL == > CREATE TABLE weather2(prcp real) > --- > Better to add the datatype "real " support in sql also > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29587) Real data type is not supported in Spark SQL which is supporting in postgresql
[ https://issues.apache.org/jira/browse/SPARK-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29587. - Fix Version/s: 3.0.0 Resolution: Fixed > Real data type is not supported in Spark SQL which is supporting in postgresql > -- > > Key: SPARK-29587 > URL: https://issues.apache.org/jira/browse/SPARK-29587 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.4 >Reporter: jobit mathew >Assignee: Kent Yao >Priority: Minor > Fix For: 3.0.0 > > > Real data type is not supported in Spark SQL which is supporting in > postgresql. > +*In postgresql query success*+ > CREATE TABLE weather2(prcp real); > insert into weather2 values(2.5); > select * from weather2; > > || ||prcp|| > |1|2,5| > +*In spark sql getting error*+ > spark-sql> CREATE TABLE weather2(prcp real); > Error in query: > DataType real is not supported.(line 1, pos 27) > == SQL == > CREATE TABLE weather2(prcp real) > --- > Better to add the datatype "real " support in sql also > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-29587) Real data type is not supported in Spark SQL which is supporting in postgresql
[ https://issues.apache.org/jira/browse/SPARK-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reopened SPARK-29587: - > Real data type is not supported in Spark SQL which is supporting in postgresql > -- > > Key: SPARK-29587 > URL: https://issues.apache.org/jira/browse/SPARK-29587 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.4 >Reporter: jobit mathew >Assignee: Kent Yao >Priority: Minor > > Real data type is not supported in Spark SQL which is supporting in > postgresql. > +*In postgresql query success*+ > CREATE TABLE weather2(prcp real); > insert into weather2 values(2.5); > select * from weather2; > > || ||prcp|| > |1|2,5| > +*In spark sql getting error*+ > spark-sql> CREATE TABLE weather2(prcp real); > Error in query: > DataType real is not supported.(line 1, pos 27) > == SQL == > CREATE TABLE weather2(prcp real) > --- > Better to add the datatype "real " support in sql also > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30200) Add ExplainMode for Dataset.explain
[ https://issues.apache.org/jira/browse/SPARK-30200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30200. --- Fix Version/s: 3.0.0 Assignee: Takeshi Yamamuro Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/26829 > Add ExplainMode for Dataset.explain > --- > > Key: SPARK-30200 > URL: https://issues.apache.org/jira/browse/SPARK-30200 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro >Priority: Major > Fix For: 3.0.0 > > > This pr targets to add ExplainMode for explaining Dataset/DataFrame with a > given format mode (ExplainMode). ExplainMode has four types along with the > SQL EXPLAIN command: Simple, Extended, Codegen, Cost, and Formatted. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30207) Enhance the SQL NULL Semantics document
Yuanjian Li created SPARK-30207: --- Summary: Enhance the SQL NULL Semantics document Key: SPARK-30207 URL: https://issues.apache.org/jira/browse/SPARK-30207 Project: Spark Issue Type: Improvement Components: Documentation, SQL Affects Versions: 3.0.0 Reporter: Yuanjian Li Enhancement of the SQL NULL Semantics document: sql-ref-null-semantics.html. Clarify the behavior of `UNKNOWN` for both the `EXISTS` and `IN` operations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
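The UNKNOWN cases that the doc change targets can be sketched with a small three-valued-logic model in plain Python (illustrative, not Spark code; `None` stands for SQL's UNKNOWN, and `sql_in` is a hypothetical helper): `x IN (...)` is UNKNOWN when `x` is NULL, or when there is no match but the list contains a NULL.

```python
UNKNOWN = None  # model SQL three-valued logic: True / False / UNKNOWN

def sql_in(value, candidates):
    """Evaluate `value IN (candidates)` under SQL NULL semantics."""
    if value is None:
        return UNKNOWN                       # NULL IN (...) is UNKNOWN
    if value in [c for c in candidates if c is not None]:
        return True
    if any(c is None for c in candidates):
        return UNKNOWN                       # no match, but a NULL candidate
    return False

print(sql_in(1, [1, 2]))      # True
print(sql_in(3, [1, None]))   # None  (UNKNOWN)
print(sql_in(None, [1, 2]))   # None  (UNKNOWN)
```

Because `WHERE` keeps only rows that evaluate to TRUE, rows yielding UNKNOWN are filtered out, which is the behavior the documentation enhancement aims to spell out.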
[jira] [Resolved] (SPARK-30125) Remove PostgreSQL dialect
[ https://issues.apache.org/jira/browse/SPARK-30125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30125. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26763 [https://github.com/apache/spark/pull/26763] > Remove PostgreSQL dialect > - > > Key: SPARK-30125 > URL: https://issues.apache.org/jira/browse/SPARK-30125 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.0.0 > > > As discussed in > [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], > we need to remove the PostgreSQL dialect from the code base for several reasons: > 1. The current approach makes the codebase complicated and hard to maintain. > 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. > > Currently we have 3 features under the PostgreSQL dialect: > 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, .. > are also allowed as true strings. > 2. SPARK-29364: `date - date` returns interval in Spark (SQL standard > behavior), but returns int in PostgreSQL > 3. SPARK-28395: `int / int` returns double in Spark, but returns int in > PostgreSQL. (there is no standard) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30125) Remove PostgreSQL dialect
[ https://issues.apache.org/jira/browse/SPARK-30125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-30125: --- Assignee: Yuanjian Li > Remove PostgreSQL dialect > - > > Key: SPARK-30125 > URL: https://issues.apache.org/jira/browse/SPARK-30125 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > > As discussed in > [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], > we need to remove the PostgreSQL dialect from the code base for several reasons: > 1. The current approach makes the codebase complicated and hard to maintain. > 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. > > Currently we have 3 features under the PostgreSQL dialect: > 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, .. > are also allowed as true strings. > 2. SPARK-29364: `date - date` returns interval in Spark (SQL standard > behavior), but returns int in PostgreSQL > 3. SPARK-28395: `int / int` returns double in Spark, but returns int in > PostgreSQL. (there is no standard) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30205) Import ABC from collections.abc to remove deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-30205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30205: -- Labels: (was: python3) > Import ABC from collections.abc to remove deprecation warnings > -- > > Key: SPARK-30205 > URL: https://issues.apache.org/jira/browse/SPARK-30205 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 > Environment: Python version : 3.9 > Operating System : Linux >Reporter: Karthikeyan Singaravelan >Priority: Minor > > Importing ABC from collections module directly is deprecated since 3.4 and is > removed in Python 3.9. Thus this will cause ImportError for pyspark in Python > 3.9 in the resultiterable module where Iterable is used from collections at > https://github.com/tirkarthi/spark/blob/aa9da9365ff31948e42ab4c6dcc6cb4cec5fd852/python/pyspark/resultiterable.py#L23. > > Relevant CPython PR : https://github.com/python/cpython/pull/10596. > I am a new contributor and would like to work on this issue. > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30205) Import ABC from collections.abc to remove deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-30205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30205: -- Affects Version/s: (was: 2.4.4) 3.0.0 > Import ABC from collections.abc to remove deprecation warnings > -- > > Key: SPARK-30205 > URL: https://issues.apache.org/jira/browse/SPARK-30205 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 > Environment: Python version : 3.9 > Operating System : Linux >Reporter: Karthikeyan Singaravelan >Priority: Minor > Labels: python3 > > Importing ABC from collections module directly is deprecated since 3.4 and is > removed in Python 3.9. Thus this will cause ImportError for pyspark in Python > 3.9 in the resultiterable module where Iterable is used from collections at > https://github.com/tirkarthi/spark/blob/aa9da9365ff31948e42ab4c6dcc6cb4cec5fd852/python/pyspark/resultiterable.py#L23. > > Relevant CPython PR : https://github.com/python/cpython/pull/10596. > I am a new contributor and would like to work on this issue. > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30205) Import ABC from collections.abc to remove deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-30205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30205: -- Environment: (was: Python version : 3.9 Operating System : Linux) > Import ABC from collections.abc to remove deprecation warnings > -- > > Key: SPARK-30205 > URL: https://issues.apache.org/jira/browse/SPARK-30205 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Karthikeyan Singaravelan >Priority: Minor > > Importing ABC from collections module directly is deprecated since 3.4 and is > removed in Python 3.9. Thus this will cause ImportError for pyspark in Python > 3.9 in the resultiterable module where Iterable is used from collections at > https://github.com/tirkarthi/spark/blob/aa9da9365ff31948e42ab4c6dcc6cb4cec5fd852/python/pyspark/resultiterable.py#L23. > > Relevant CPython PR : https://github.com/python/cpython/pull/10596. > I am a new contributor and would like to work on this issue. > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30205) Import ABC from collections.abc to remove deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-30205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30205: -- Issue Type: Improvement (was: Bug) > Import ABC from collections.abc to remove deprecation warnings > -- > > Key: SPARK-30205 > URL: https://issues.apache.org/jira/browse/SPARK-30205 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.4 > Environment: Python version : 3.9 > Operating System : Linux >Reporter: Karthikeyan Singaravelan >Priority: Minor > Labels: python3 > > Importing ABC from collections module directly is deprecated since 3.4 and is > removed in Python 3.9. Thus this will cause ImportError for pyspark in Python > 3.9 in the resultiterable module where Iterable is used from collections at > https://github.com/tirkarthi/spark/blob/aa9da9365ff31948e42ab4c6dcc6cb4cec5fd852/python/pyspark/resultiterable.py#L23. > > Relevant CPython PR : https://github.com/python/cpython/pull/10596. > I am a new contributor and would like to work on this issue. > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30205) Import ABC from collections.abc to remove deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-30205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30205: -- Summary: Import ABC from collections.abc to remove deprecation warnings (was: Importing ABC from collections module is removed in Python 3.9) > Import ABC from collections.abc to remove deprecation warnings > -- > > Key: SPARK-30205 > URL: https://issues.apache.org/jira/browse/SPARK-30205 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 > Environment: Python version : 3.9 > Operating System : Linux >Reporter: Karthikeyan Singaravelan >Priority: Minor > Labels: python3 > > Importing ABC from collections module directly is deprecated since 3.4 and is > removed in Python 3.9. Thus this will cause ImportError for pyspark in Python > 3.9 in the resultiterable module where Iterable is used from collections at > https://github.com/tirkarthi/spark/blob/aa9da9365ff31948e42ab4c6dcc6cb4cec5fd852/python/pyspark/resultiterable.py#L23. > > Relevant CPython PR : https://github.com/python/cpython/pull/10596. > I am a new contributor and would like to work on this issue. > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30197) Add minimum `requirements-dev.txt` file to `python` directory
[ https://issues.apache.org/jira/browse/SPARK-30197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30197: -- Summary: Add minimum `requirements-dev.txt` file to `python` directory (was: Add `requirements.txt` file to `python` directory) > Add minimum `requirements-dev.txt` file to `python` directory > - > > Key: SPARK-30197 > URL: https://issues.apache.org/jira/browse/SPARK-30197 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-30197) Add `requirements.txt` file to `python` directory
[ https://issues.apache.org/jira/browse/SPARK-30197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-30197. - > Add `requirements.txt` file to `python` directory > - > > Key: SPARK-30197 > URL: https://issues.apache.org/jira/browse/SPARK-30197 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30197) Add `requirements.txt` file to `python` directory
[ https://issues.apache.org/jira/browse/SPARK-30197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-30197: -- Priority: Minor (was: Major) > Add `requirements.txt` file to `python` directory > - > > Key: SPARK-30197 > URL: https://issues.apache.org/jira/browse/SPARK-30197 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30197) Add `requirements.txt` file to `python` directory
[ https://issues.apache.org/jira/browse/SPARK-30197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30197. --- Resolution: Won't Do > Add `requirements.txt` file to `python` directory > - > > Key: SPARK-30197 > URL: https://issues.apache.org/jira/browse/SPARK-30197 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30206) Rename normalizeFilters in DataSourceStrategy to be generic
[ https://issues.apache.org/jira/browse/SPARK-30206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-30206. --- Fix Version/s: 3.0.0 Assignee: Anton Okolnychyi Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/26830 > Rename normalizeFilters in DataSourceStrategy to be generic > --- > > Key: SPARK-30206 > URL: https://issues.apache.org/jira/browse/SPARK-30206 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Anton Okolnychyi >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30206) Rename normalizeFilters in DataSourceStrategy to be generic
Dongjoon Hyun created SPARK-30206: - Summary: Rename normalizeFilters in DataSourceStrategy to be generic Key: SPARK-30206 URL: https://issues.apache.org/jira/browse/SPARK-30206 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29967) KMeans support instance weighting
[ https://issues.apache.org/jira/browse/SPARK-29967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29967. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26739 [https://github.com/apache/spark/pull/26739] > KMeans support instance weighting > - > > Key: SPARK-29967 > URL: https://issues.apache.org/jira/browse/SPARK-29967 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.0.0 > > > Since https://issues.apache.org/jira/browse/SPARK-9610, we started to support > instance weighting in ML. > However, Clustering and other implementations still do not support instance > weighting. > I think we need to start supporting weighting in KMeans, like scikit-learn > does. > It will contain three parts: > 1, move the impl from .mllib to .ml > 2, make .mllib.KMeans a wrapper of .ml.KMeans > 3, support instance weighting in .ml.KMeans -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
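As a rough illustration of part 3 above (a toy sketch, not Spark's implementation; the function name is hypothetical): instance weighting changes the centroid update of Lloyd's algorithm from a plain mean of the assigned points to a weighted mean.

```python
def weighted_centroid(points, weights):
    """Weighted mean of the points assigned to one cluster -- the core
    change that instance weighting makes to the k-means update step."""
    total = float(sum(weights))
    dim = len(points[0])
    return [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dim)
    ]

# A point with weight 3 pulls the centroid three times as hard as weight 1.
print(weighted_centroid([[0.0, 0.0], [2.0, 2.0]], [1.0, 3.0]))  # [1.5, 1.5]
```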
[jira] [Assigned] (SPARK-29967) KMeans support instance weighting
[ https://issues.apache.org/jira/browse/SPARK-29967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29967: Assignee: Huaxin Gao > KMeans support instance weighting > - > > Key: SPARK-29967 > URL: https://issues.apache.org/jira/browse/SPARK-29967 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: Huaxin Gao >Priority: Major > > > Since https://issues.apache.org/jira/browse/SPARK-9610, we started to support > instance weighting in ML. > However, Clustering and other implementations still do not support instance > weighting. > I think we need to start supporting weighting in KMeans, like scikit-learn > does. > It will contain three parts: > 1, move the impl from .mllib to .ml > 2, make .mllib.KMeans a wrapper of .ml.KMeans > 3, support instance weighting in .ml.KMeans -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20135) spark thriftserver2: no job running but containers not release on yarn
[ https://issues.apache.org/jira/browse/SPARK-20135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992644#comment-16992644 ] angerszhu commented on SPARK-20135: --- I met the same problem in Spark 2.4.0. [~xwc3504] Do you have any idea now? > spark thriftserver2: no job running but containers not release on yarn > -- > > Key: SPARK-20135 > URL: https://issues.apache.org/jira/browse/SPARK-20135 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 > Environment: spark 2.0.1 with hadoop 2.6.0 >Reporter: bruce xu >Priority: Major > Attachments: 0329-1.png, 0329-2.png, 0329-3.png > > > I enabled the executor dynamic allocation feature, however it doesn't work > sometimes. > I set the initial executor num to 50; after the job finished, the cores and mem > resources were not released. > From the Spark web UI, the active job/running task/stage num is 0, but the > executors page shows 1276 cores and 7288 active tasks. > From the YARN web UI, the thriftserver job's running container count is 639, > without releasing. > This may be a bug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30204) Support for config Pod DNS for Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-30204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vanderliang updated SPARK-30204: Description: Currently we cannot configure the pod DNS nameservers and searches when submitting a job via the CLI for Kubernetes. However, this is a common scenario in hybrid cloud setups, where we use public cloud compute resources together with a private DNS.
{code:java}
// code placeholder
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0
{code}
As a result, we can use the following properties to specify the pod DNS config. * spark.kubernetes.dnsConfig.nameservers, Comma-separated list of the Kubernetes DNS nameservers for the driver and executor. * spark.kubernetes.dnsConfig.searches, Comma-separated list of the Kubernetes DNS searches for the driver and executor. * spark.kubernetes.dnsConfig.options.[OptionVariableName], Add the DNS option variable specified by OptionVariableName to the driver and executor process. The user can specify multiple of these to set multiple option variables. was: Currently we cannot configure the pod DNS nameservers, searches and options when submitting a job via the CLI for Kubernetes. However, this is a common scenario in hybrid cloud setups, where we use public cloud compute resources together with a private DNS.
{code:java}
// code placeholder
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0
{code}
> Support for config Pod DNS for Kubernetes > - > > Key: SPARK-30204 > URL: https://issues.apache.org/jira/browse/SPARK-30204 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: vanderliang >Priority: Major > > Currently we cannot configure the pod DNS nameservers and searches when > submitting a job via the CLI for Kubernetes. However, this is a common scenario > in hybrid cloud setups, where we use public cloud compute resources together with > a private DNS. >
> {code:java}
> // code placeholder
> apiVersion: v1
> kind: Pod
> metadata:
>   namespace: default
>   name: dns-example
> spec:
>   containers:
>     - name: test
>       image: nginx
>   dnsConfig:
>     nameservers:
>       - 1.2.3.4
>     searches:
>       - ns1.svc.cluster-domain.example
>       - my.dns.search.suffix
>     options:
>       - name: ndots
>         value: "2"
>       - name: edns0
> {code}
> As a result, we can use the following properties to specify the pod DNS config. > * spark.kubernetes.dnsConfig.nameservers, Comma-separated list of the > Kubernetes DNS nameservers for the driver and executor. > * spark.kubernetes.dnsConfig.searches, Comma-separated list of the > Kubernetes DNS searches for the driver and executor. > * spark.kubernetes.dnsConfig.options.[OptionVariableName], Add the DNS > option variable specified by OptionVariableName to the driver and executor > process. The user can specify multiple of these to set multiple option > variables. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
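The spark.kubernetes.dnsConfig.* properties above are only proposed in this issue; they do not exist in Spark. As a sketch of how such properties could map onto a pod spec's dnsConfig section (all names hypothetical, mirroring the YAML in the issue):

```python
def dns_config_from_conf(conf):
    """Hypothetical mapping from the proposed spark.kubernetes.dnsConfig.*
    properties onto a pod spec's dnsConfig section."""
    prefix = "spark.kubernetes.dnsConfig."
    dns = {"nameservers": [], "searches": [], "options": []}
    for key, value in conf.items():
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if rest in ("nameservers", "searches"):
            dns[rest] = value.split(",")  # comma-separated lists, per the proposal
        elif rest.startswith("options."):
            dns["options"].append({"name": rest[len("options."):], "value": value})
    return dns

conf = {
    "spark.kubernetes.dnsConfig.nameservers": "1.2.3.4",
    "spark.kubernetes.dnsConfig.searches": "ns1.svc.cluster-domain.example,my.dns.search.suffix",
    "spark.kubernetes.dnsConfig.options.ndots": "2",
}
print(dns_config_from_conf(conf))
```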
[jira] [Updated] (SPARK-30204) Support for config Pod DNS for Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-30204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vanderliang updated SPARK-30204: Description: Currently we cannot configure the pod DNS nameservers, searches and options when submitting a job via the CLI for Kubernetes. However, this is a common scenario in hybrid cloud setups, where we use public cloud compute resources together with a private DNS.
{code:java}
// code placeholder
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0
{code}
was: Currently we cannot configure the pod DNS nameservers and searches when submitting a job via the CLI for Kubernetes. However, this is a common scenario in hybrid cloud setups, where we use public cloud compute resources together with a private DNS.
{code:java}
// code placeholder
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0
{code}
> Support for config Pod DNS for Kubernetes > - > > Key: SPARK-30204 > URL: https://issues.apache.org/jira/browse/SPARK-30204 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: vanderliang >Priority: Major > > Currently we cannot configure the pod DNS nameservers, searches and options > when submitting a job via the CLI for Kubernetes. However, this is a common > scenario in hybrid cloud setups, where we use public cloud compute resources > together with a private DNS.
> {code:java}
> // code placeholder
> apiVersion: v1
> kind: Pod
> metadata:
>   namespace: default
>   name: dns-example
> spec:
>   containers:
>     - name: test
>       image: nginx
>   dnsConfig:
>     nameservers:
>       - 1.2.3.4
>     searches:
>       - ns1.svc.cluster-domain.example
>       - my.dns.search.suffix
>     options:
>       - name: ndots
>         value: "2"
>       - name: edns0
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30204) Support for config Pod DNS for Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-30204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vanderliang updated SPARK-30204: Description: Currently we cannot configure the pod DNS nameservers and searches when submitting a job via the CLI for Kubernetes. However, this is a common scenario in hybrid cloud setups, where we use public cloud compute resources together with a private DNS.
{code:java}
// code placeholder
apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  dnsConfig:
    nameservers:
      - 1.2.3.4
    searches:
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"
      - name: edns0
{code}
was: Currently we cannot configure the pod DNS nameservers and searches when submitting a job via the CLI for Kubernetes. However, this is a common scenario in hybrid cloud setups, where we use public cloud compute resources together with a private DNS. > Support for config Pod DNS for Kubernetes > - > > Key: SPARK-30204 > URL: https://issues.apache.org/jira/browse/SPARK-30204 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: vanderliang >Priority: Major > > Currently we cannot configure the pod DNS nameservers and searches when > submitting a job via the CLI for Kubernetes. However, this is a common scenario > in hybrid cloud setups, where we use public cloud compute resources together with > a private DNS.
> {code:java}
> // code placeholder
> apiVersion: v1
> kind: Pod
> metadata:
>   namespace: default
>   name: dns-example
> spec:
>   containers:
>     - name: test
>       image: nginx
>   dnsConfig:
>     nameservers:
>       - 1.2.3.4
>     searches:
>       - ns1.svc.cluster-domain.example
>       - my.dns.search.suffix
>     options:
>       - name: ndots
>         value: "2"
>       - name: edns0
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30205) Importing ABC from collections module is removed in Python 3.9
Karthikeyan Singaravelan created SPARK-30205: Summary: Importing ABC from collections module is removed in Python 3.9 Key: SPARK-30205 URL: https://issues.apache.org/jira/browse/SPARK-30205 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.4.4 Environment: Python version : 3.9 Operating System : Linux Reporter: Karthikeyan Singaravelan Importing ABC from collections module directly is deprecated since 3.4 and is removed in Python 3.9. Thus this will cause ImportError for pyspark in Python 3.9 in the resultiterable module where Iterable is used from collections at https://github.com/tirkarthi/spark/blob/aa9da9365ff31948e42ab4c6dcc6cb4cec5fd852/python/pyspark/resultiterable.py#L23. Relevant CPython PR : https://github.com/python/cpython/pull/10596. I am a new contributor and would like to work on this issue. Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30204) Support for config Pod DNS for Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-30204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vanderliang updated SPARK-30204: Summary: Support for config Pod DNS for Kubernetes (was: Support for config DNS for Kubernetes) > Support for config Pod DNS for Kubernetes > - > > Key: SPARK-30204 > URL: https://issues.apache.org/jira/browse/SPARK-30204 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0 >Reporter: vanderliang >Priority: Major > > Currently we cannot configure the pod DNS nameservers and searches when > submitting a job via the CLI for Kubernetes. However, this is a common scenario > in hybrid cloud setups, where we use public cloud compute resources together with > a private DNS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30204) Support for config DNS for Kubernetes
vanderliang created SPARK-30204: --- Summary: Support for config DNS for Kubernetes Key: SPARK-30204 URL: https://issues.apache.org/jira/browse/SPARK-30204 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.1.0 Reporter: vanderliang Currently we cannot configure the pod DNS nameservers and searches when submitting a job via the CLI for Kubernetes. However, this is a common scenario in hybrid cloud setups, where we use public cloud compute resources together with a private DNS. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30151) Issue better error message when user-specified schema not match relation schema
[ https://issues.apache.org/jira/browse/SPARK-30151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-30151: --- Assignee: wuyi > Issue better error message when user-specified schema not match relation > schema > --- > > Key: SPARK-30151 > URL: https://issues.apache.org/jira/browse/SPARK-30151 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > In DataSource.resolveRelation(), when the relation schema does not match the > user-specified schema, it raises an exception saying that "$className does not > allow user-specified schemas." However, it does allow a user-specified schema > if it matches the relation schema. Instead, we should issue a better error > message telling the user what is really happening here, e.g. clarifying the > mismatched fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30151) Issue better error message when user-specified schema not match relation schema
[ https://issues.apache.org/jira/browse/SPARK-30151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-30151. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26781 [https://github.com/apache/spark/pull/26781] > Issue better error message when user-specified schema not match relation > schema > --- > > Key: SPARK-30151 > URL: https://issues.apache.org/jira/browse/SPARK-30151 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.0.0 > > > In DataSource.resolveRelation(), when the relation schema does not match the > user-specified schema, it raises an exception saying that "$className does not > allow user-specified schemas." However, it does allow a user-specified schema > if it matches the relation schema. Instead, we should issue a better error > message telling the user what is really happening here, e.g. clarifying the > mismatched fields. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30203) store assignable if there exists an appropriate user-defined cast function
[ https://issues.apache.org/jira/browse/SPARK-30203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-30203: - Description: h3. 9.2 Store assignment h4. Syntax Rules 1) Let T be the TARGET and let V be the VALUE in an application of the Syntax Rules of this Subclause. 2) Let TD and SD be the declared types of T and V, respectively. 3) If TD is character string, binary string, numeric, boolean, datetime, interval, or a user-defined type, then either SD shall be assignable to TD or there shall exist an appropriate user-defined cast function UDCF from SD to TD. _NOTE 319 — “Appropriate user-defined cast function” is defined in Subclause 4.11, “Data conversions”_ h3. 4.11 Data conversions Implicit type conversion can occur in expressions, fetch operations, single row select operations, inserts, deletes, and updates. Explicit type conversions can be specified by the use of the CAST operator. The current implementation for ANSI store assignment is totally out of context. 
According to this rule, `there shall exist an appropriate user-defined cast function UDCF`, the Spark legacy store assignment is just fine because we do have *appropriate cast functions*. At least according to the ANSI cast rules, the current ANSI store assignment policy is too strict relative to the ANSI cast rules
{code:java}
 * (SD) - (TD) -
 *     | EN AN C D T TS YM DT BO UDT B RT CT RW
 * EN  | Y Y Y N N N M M N M N M N N
 * AN  | Y Y Y N N N N N N M N M N N
 * C   | Y Y Y Y Y Y Y Y Y M N M N N
 * D   | N N Y Y N Y N N N M N M N N
 * T   | N N Y N Y Y N N N M N M N N
 * TS  | N N Y Y Y Y N N N M N M N N
 * YM  | M N Y N N N Y N N M N M N N
 * DT  | M N Y N N N N Y N M N M N N
 * BO  | N N Y N N N N N Y M N M N N
 * UDT | M M M M M M M M M M M M M N
 * B   | N N N N N N N N N M Y M N N
 * RT  | M M M M M M M M M M M M N N
 * CT  | N N N N N N N N N M N N M N
 * RW  | N N N N N N N N N N N N N M
 *
 * Where:
 * EN  = Exact Numeric
 * AN  = Approximate Numeric
 * C   = Character (Fixed- or Variable-Length, or Character Large Object)
 * D   = Date
 * T   = Time
 * TS  = Timestamp
 * YM  = Year-Month Interval
 * DT  = Day-Time Interval
 * BO  = Boolean
 * UDT = User-Defined Type
 * B   = Binary (Fixed- or Variable-Length or Binary Large Object)
 * RT  = Reference type
 * CT  = Collection type
 * RW  = Row type
{code}
_cc [~cloud_fan] [~gengliang]_ [~maropu] was: h3. 9.2 Store assignment h4. Syntax Rules 1) Let T be the TARGET and let V be the VALUE in an application of the Syntax Rules of this Subclause. 2) Let TD and SD be the declared types of T and V, respectively. 3) If TD is character string, binary string, numeric, boolean, datetime, interval, or a user-defined type, then either SD shall be assignable to TD or there shall exist an appropriate user-defined cast function UDCF from SD to TD. _NOTE 319 — “Appropriate user-defined cast function” is defined in Subclause 4.11, “Data conversions”_ h3.
4.11 Data conversions Implicit type conversion can occur in expressions, fetch operations, single row select operations, inserts, deletes, and updates. Explicit type conversions can be specified by the use of the CAST operator. The current implementation for ANSI store assignment is totally out of context. According to this rule, `there shall exist an appropriate user-defined cast function UDCF`, the Spark legacy store assignment is just fine because we do have *appropriate cast functions*. At least according to the ANSI cast rules, the current ANSI store assignment policy is too strict relative to the ANSI cast rules
{code:java}
 * (SD) - (TD) -
 *     | EN AN C D T TS YM DT BO UDT B RT CT RW
 * EN  | Y Y Y N N N M M N M N M N N
 * AN  | Y Y Y N N N N N N M N M N N
 * C   | Y Y Y Y Y Y Y Y Y M N M N N
 * D   | N N Y Y N Y N N N M N M N N
 * T   | N N Y N Y Y N N N M N M N N
 * TS  | N N Y Y Y Y N N N M N M N N
 * YM  | M N Y N N N Y N N M N M N N
 * DT  | M N Y N N N N Y N M N M N N
 * BO  | N N Y N N N N N Y M N M N N
 * UDT | M M M M M M M M M M M M M N
 * B   | N N N N N N N N N M Y M N N
 * RT  | M M M M M M M M M M M M N N
 * CT  | N N N N N N N N N M N N M N
 * RW  | N N N N N N N N N N N N N M
 *
 * Where:
 *
[jira] [Created] (SPARK-30203) store assignable if there exists an appropriate user-defined cast function
Kent Yao created SPARK-30203: Summary: store assignable if there exists an appropriate user-defined cast function Key: SPARK-30203 URL: https://issues.apache.org/jira/browse/SPARK-30203 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Kent Yao h3. 9.2 Store assignment h4. Syntax Rules 1) Let T be the TARGET and let V be the VALUE in an application of the Syntax Rules of this Subclause. 2) Let TD and SD be the declared types of T and V, respectively. 3) If TD is character string, binary string, numeric, boolean, datetime, interval, or a user-defined type, then either SD shall be assignable to TD or there shall exist an appropriate user-defined cast function UDCF from SD to TD. _NOTE 319 — “Appropriate user-defined cast function” is defined in Subclause 4.11, “Data conversions”_ h3. 4.11 Data conversions Implicit type conversion can occur in expressions, fetch operations, single row select operations, inserts, deletes, and updates. Explicit type conversions can be specified by the use of the CAST operator. The current implementation for ANSI store assignment is totally out of context. 
According to this rule, `there shall exist an appropriate user-defined cast function UDCF`, the Spark legacy store assignment is just fine because we do have *appropriate cast functions*. At least according to the ANSI cast rules, the current ANSI store assignment policy is too strict relative to the ANSI cast rules
{code:java}
 * (SD) - (TD) -
 *     | EN AN C D T TS YM DT BO UDT B RT CT RW
 * EN  | Y Y Y N N N M M N M N M N N
 * AN  | Y Y Y N N N N N N M N M N N
 * C   | Y Y Y Y Y Y Y Y Y M N M N N
 * D   | N N Y Y N Y N N N M N M N N
 * T   | N N Y N Y Y N N N M N M N N
 * TS  | N N Y Y Y Y N N N M N M N N
 * YM  | M N Y N N N Y N N M N M N N
 * DT  | M N Y N N N N Y N M N M N N
 * BO  | N N Y N N N N N Y M N M N N
 * UDT | M M M M M M M M M M M M M N
 * B   | N N N N N N N N N M Y M N N
 * RT  | M M M M M M M M M M M M N N
 * CT  | N N N N N N N N N M N N M N
 * RW  | N N N N N N N N N N N N N M
 *
 * Where:
 * EN  = Exact Numeric
 * AN  = Approximate Numeric
 * C   = Character (Fixed- or Variable-Length, or Character Large Object)
 * D   = Date
 * T   = Time
 * TS  = Timestamp
 * YM  = Year-Month Interval
 * DT  = Day-Time Interval
 * BO  = Boolean
 * UDT = User-Defined Type
 * B   = Binary (Fixed- or Variable-Length or Binary Large Object)
 * RT  = Reference type
 * CT  = Collection type
 * RW  = Row type
{code}
_cc [~cloud_fan] [~gengliang]_ [~maropu] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30202) impl QuantileTransform
zhengruifeng created SPARK-30202: Summary: impl QuantileTransform Key: SPARK-30202 URL: https://issues.apache.org/jira/browse/SPARK-30202 Project: Spark Issue Type: Improvement Components: ML, PySpark Affects Versions: 3.0.0 Reporter: zhengruifeng Recently, I encountered some practical scenarios that required mapping data to another distribution. I found that QuantileTransformer in sklearn is what I needed: I locally fitted a model on a sampled dataset and broadcast it to transform the whole dataset in PySpark. After that I implemented QuantileTransform as a new Estimator atop Spark. The impl follows scikit-learn's impl, however there are still several differences: 1, use QuantileSummaries for approximation, no matter the size of the dataset; 2, use linear interpolation; the logic is similar to the existing IsotonicRegression, while scikit-learn uses a bi-directional interpolation; 3, when skipZero=true, treat sparse vectors just like dense ones, while scikit-learn has two different logics for sparse and dense datasets. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
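For intuition, a quantile transform fits evenly spaced order statistics and then maps values onto [0, 1] by linear interpolation between them. This is a toy single-column sketch, not the proposed Spark implementation (which uses QuantileSummaries for approximation); the function names are hypothetical:

```python
import bisect

def fit_quantiles(values, n_quantiles=5):
    """Collect evenly spaced order statistics from a sorted sample."""
    s = sorted(values)
    idx = [round(i * (len(s) - 1) / (n_quantiles - 1)) for i in range(n_quantiles)]
    boundaries = [s[i] for i in idx]
    probs = [i / (n_quantiles - 1) for i in range(n_quantiles)]
    return boundaries, probs

def transform(v, boundaries, probs):
    """Piecewise-linear interpolation of v against the fitted quantiles,
    clipping values outside the fitted range to 0 or 1."""
    if v <= boundaries[0]:
        return 0.0
    if v >= boundaries[-1]:
        return 1.0
    j = bisect.bisect_right(boundaries, v)
    x0, x1 = boundaries[j - 1], boundaries[j]
    p0, p1 = probs[j - 1], probs[j]
    if x1 == x0:
        return p1
    return p0 + (p1 - p0) * (v - x0) / (x1 - x0)

b, p = fit_quantiles(list(range(11)))
print(transform(5, b, p))  # 0.5 (the median maps to the middle)
```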
[jira] [Created] (SPARK-30201) HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT
ulysses you created SPARK-30201: --- Summary: HiveOutputWriter standardOI should use ObjectInspectorCopyOption.DEFAULT Key: SPARK-30201 URL: https://issues.apache.org/jira/browse/SPARK-30201 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: ulysses you Now Spark uses `ObjectInspectorCopyOption.JAVA` as the OI option, which will convert any string to a UTF-8 string. When writing non-UTF-8 encoded data, `EFBFBD` will appear. We should use `ObjectInspectorCopyOption.DEFAULT` to pass the bytes through. Here is the way to reproduce: 1. make a file containing the hex bytes 'AABBCC', which are not valid UTF-8. 2. create table test1 (c string) location '$file_path'; 3. select hex(c) from test1; // AABBCC 4. create table test2 (c string) as select c from test1; 5. select hex(c) from test2; // EFBFBDEFBFBDEFBFBD -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
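The `EFBFBD` bytes in step 5 are the UTF-8 encoding of U+FFFD (the Unicode replacement character), repeated once per undecodable byte. A quick Python illustration of that corruption path (analogous to, not literally, the JAVA copy option's string conversion):

```python
raw = bytes.fromhex("AABBCC")  # the non-UTF-8 file content from step 1

# Each of these bytes is invalid as the start of a UTF-8 sequence, so decoding
# with replacement turns each one into U+FFFD, which re-encodes as EF BF BD.
corrupted = raw.decode("utf-8", errors="replace").encode("utf-8")

print(corrupted.hex().upper())  # EFBFBDEFBFBDEFBFBD
```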
[jira] [Commented] (SPARK-28664) ORDER BY in aggregate function
[ https://issues.apache.org/jira/browse/SPARK-28664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992427#comment-16992427 ] jiaan.geng commented on SPARK-28664: [https://github.com/postgres/postgres/blob/44e95b5728a4569c494fa4ea4317f8a2f50a206b/src/test/regress/expected/aggregates.out#L2239] [~yumwang] I don't understand the meaning of this syntax. If it is valuable, I will work on it. > ORDER BY in aggregate function > -- > > Key: SPARK-28664 > URL: https://issues.apache.org/jira/browse/SPARK-28664 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {code:sql} > SELECT min(x ORDER BY y) FROM (VALUES(1, NULL)) AS d(x,y); > SELECT min(x ORDER BY y) FROM (VALUES(1, 2)) AS d(x,y); > {code} > https://github.com/postgres/postgres/blob/44e95b5728a4569c494fa4ea4317f8a2f50a206b/src/test/regress/sql/aggregates.sql#L978-L982 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
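For background (my reading of the PostgreSQL feature, not a conclusion from this thread): an ORDER BY inside an aggregate call sorts the input rows before the aggregate consumes them, which only matters for order-sensitive aggregates. A toy Python model:

```python
def aggregate_with_order_by(rows, agg, value_of, order_key):
    """Toy model of SQL's `agg(x ORDER BY y)`: sort the rows by y,
    then feed the x values to the aggregate in that order."""
    ordered = sorted(rows, key=order_key)
    return agg(value_of(r) for r in ordered)

rows = [(1, None), (3, 0), (2, 5)]
key = lambda r: (r[1] is None, r[1] if r[1] is not None else 0)  # NULLs last

# min is order-insensitive, so the ORDER BY does not change its result...
print(aggregate_with_order_by(rows, min, lambda r: r[0], key))   # 1
# ...but an order-sensitive aggregate (collect into a list) does change.
print(aggregate_with_order_by(rows, list, lambda r: r[0], key))  # [3, 2, 1]
```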
[jira] [Commented] (SPARK-30092) Number of active tasks is negative in Live UI Executors page
[ https://issues.apache.org/jira/browse/SPARK-30092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992401#comment-16992401 ] ZhongYu commented on SPARK-30092: - It is hard to give steps that will reproduce this issue with certainty, but the following steps reproduce it with relatively high probability. # Deploy YARN using AWS EC2 (or another virtual machine) # Start a Spark job on YARN in client mode. # Stop some YARN EC2 slaves that the Spark job is running on > Number of active tasks is negative in Live UI Executors page > > > Key: SPARK-30092 > URL: https://issues.apache.org/jira/browse/SPARK-30092 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.1 > Environment: Hadoop version: 2.7.3 > ResourceManager version: 2.7.3 >Reporter: ZhongYu >Priority: Major > Attachments: wx20191202-102...@2x.png > > > The number of active tasks is negative in the Live UI Executors page when there > is executor loss and task failure. I am using Spark on YARN, built on > AWS spot instances. When a YARN worker is lost, there is a large probability that > the active task count becomes negative in the Spark Live UI. > I saw the related tickets below, resolved in earlier versions of Spark. But the > same thing happened again in Spark 2.4.1. See attachment. > https://issues.apache.org/jira/browse/SPARK-8560 > https://issues.apache.org/jira/browse/SPARK-10141 > https://issues.apache.org/jira/browse/SPARK-19356 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org