[jira] [Commented] (SPARK-40574) Add PURGE to DROP TABLE doc
[ https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609810#comment-17609810 ] Apache Spark commented on SPARK-40574: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/38011 > Add PURGE to DROP TABLE doc > --- > > Key: SPARK-40574 > URL: https://issues.apache.org/jira/browse/SPARK-40574 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40574) Add PURGE to DROP TABLE doc
[ https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40574: Assignee: Apache Spark > Add PURGE to DROP TABLE doc > --- > > Key: SPARK-40574 > URL: https://issues.apache.org/jira/browse/SPARK-40574 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40574) Add PURGE to DROP TABLE doc
[ https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40574: Assignee: (was: Apache Spark) > Add PURGE to DROP TABLE doc > --- > > Key: SPARK-40574 > URL: https://issues.apache.org/jira/browse/SPARK-40574 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40574) Add PURGE to DROP TABLE doc
[ https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609809#comment-17609809 ] Apache Spark commented on SPARK-40574: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/38011 > Add PURGE to DROP TABLE doc > --- > > Key: SPARK-40574 > URL: https://issues.apache.org/jira/browse/SPARK-40574 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Priority: Minor (was: Major) > Executor ID sorted as lexicographical order in Task Table of Stage Tab > -- > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Minor > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Issue Type: Improvement (was: Bug) > Executor ID sorted as lexicographical order in Task Table of Stage Tab > -- > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Minor > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40574) Add PURGE to DROP TABLE doc
Yuming Wang created SPARK-40574: --- Summary: Add PURGE to DROP TABLE doc Key: SPARK-40574 URL: https://issues.apache.org/jira/browse/SPARK-40574 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 3.4.0 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project
[ https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609802#comment-17609802 ] forrest lv commented on SPARK-26254: nice job > Move delegation token providers into a separate project > --- > > Key: SPARK-26254 > URL: https://issues.apache.org/jira/browse/SPARK-26254 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.0.0 > > > There was a discussion in > [PR#22598|https://github.com/apache/spark/pull/22598] that there are several > provided dependencies inside core project which shouldn't be there (for ex. > hive and kafka). This jira is to solve this problem. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40573) Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers
[ https://issues.apache.org/jira/browse/SPARK-40573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40573: Assignee: Apache Spark > Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary > integers > -- > > Key: SPARK-40573 > URL: https://issues.apache.org/jira/browse/SPARK-40573 > Project: Spark > Issue Type: Sub-task > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40573) Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers
[ https://issues.apache.org/jira/browse/SPARK-40573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40573: Assignee: (was: Apache Spark) > Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary > integers > -- > > Key: SPARK-40573 > URL: https://issues.apache.org/jira/browse/SPARK-40573 > Project: Spark > Issue Type: Sub-task > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40573) Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers
[ https://issues.apache.org/jira/browse/SPARK-40573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609791#comment-17609791 ] Apache Spark commented on SPARK-40573: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38009 > Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary > integers > -- > > Key: SPARK-40573 > URL: https://issues.apache.org/jira/browse/SPARK-40573 > Project: Spark > Issue Type: Sub-task > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40573) Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers
Ruifeng Zheng created SPARK-40573: - Summary: Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers Key: SPARK-40573 URL: https://issues.apache.org/jira/browse/SPARK-40573 Project: Spark Issue Type: Sub-task Components: ps Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
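For readers unfamiliar with the parameter named in the ticket, `ddof` (delta degrees of freedom) controls the divisor used for variance-style statistics: the sum of squared deviations is divided by N - ddof. A plain-Python sketch of the formula (illustrative only, not the pandas-on-Spark implementation):

```python
# Illustrative sketch of ddof (delta degrees of freedom):
# variance = sum((x - mean)^2) / (n - ddof).
# ddof=0 gives the population variance, ddof=1 the sample variance;
# the ticket is about letting GroupBy.std/var/sem accept other integers too.
def var(xs, ddof=1):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - ddof)

data = [1.0, 2.0, 3.0, 4.0]
print(var(data, ddof=0))  # population variance: 1.25
print(var(data, ddof=1))  # sample variance: ~1.667
print(var(data, ddof=2))  # an "arbitrary" ddof: 2.5
```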
[jira] [Commented] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609785#comment-17609785 ] Qian Sun commented on SPARK-40572: -- I think the root cause is that [executorId is a string in TaskDataWrapper|https://github.com/apache/spark/blob/072575c9e6fc304f09e01ad0ee180c8f309ede91/core/src/main/scala/org/apache/spark/status/storeTypes.scala#L174-L175]. Executor ID is a string throughout Apache Spark, and changing the type would introduce a large number of changes across the codebase. > Executor ID sorted as lexicographical order in UI Stages Tab > > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As the figure shows, Executor IDs are sorted in lexicographical order in the UI Stages > Tab; sorting them in numeric order would be better. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
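The behavior described in the comment is easy to reproduce outside Spark: because the IDs are stored as strings, a plain sort is lexicographic, while comparing by numeric value gives the order the UI should show. A plain-Python illustration (not Spark code):

```python
# Executor IDs are stored as strings, so a naive sort is lexicographic:
# "10" sorts before "2" because '1' < '2'.
ids = ["1", "10", "2", "3", "20"]
print(sorted(ids))           # lexicographic: ['1', '10', '2', '20', '3']
# The fix the issue asks for: compare by numeric value instead,
# without changing the stored string type.
print(sorted(ids, key=int))  # numeric: ['1', '2', '3', '10', '20']
```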
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Summary: Executor ID sorted as lexicographical order in Task Table of Stage Tab (was: Executor ID sorted as lexicographical order in UI Stages Tab) > Executor ID sorted as lexicographical order in Task Table of Stage Tab > -- > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master
[ https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40564: - Priority: Major (was: Blocker) > The distributed runtime has one more identical process with a small amount of > data on the master > > > Key: SPARK-40564 > URL: https://issues.apache.org/jira/browse/SPARK-40564 > Project: Spark > Issue Type: Question > Components: PySpark >Affects Versions: 3.3.0 > Environment: Hadoop 3.3.1 > Python 3.8 > Spark 3.3.0 > pyspark 3.3.0 > ubuntu 20.04 >Reporter: YuNing Liu >Priority: Major > Attachments: Part of the code.png, The output of the abnormal > process.png, Value of df.png > > > When I ran my program with the Dataframe structure in pyspark.pandas, there > was an abnormal extra process on the master. My dataframe contains three > columns named "id", "path", and "category". It contains more than 300,000 > rows in total, and the "id" values are only 1, 2, 3, and 4. When I > use "groupBy("id").apply(func)", my four nodes run normally, but there is > an abnormal process on the master, which contains 1001 rows. This > process also executes the code in "func" and is divided into four parts, each > part containing more than 200 rows. When I collect the results from > each node, I can only collect the results for the 1001 rows; the > results for the 300,000 rows are lost. When I tried to reduce the data to > about 20,000 rows, the problem still occurred and the row count was > still 1001. I suspect there is a problem with the implementation of this > API. I tried setting the number of data partitions to 4, but the problem > didn't go away. The value of the dataframe, part of the code, and the output > of the abnormal process are attached. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609784#comment-17609784 ] Hyukjin Kwon commented on SPARK-40563: -- Please go ahead. > Error at where clause, when sql case executes by else branch > > > Key: SPARK-40563 > URL: https://issues.apache.org/jira/browse/SPARK-40563 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Vadim >Priority: Major > Attachments: java-code-example.txt, sql.txt, stack-trace.txt > > > Hello! > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace. > - Spark version 3.3.0 > - Scala version 2.12 > - DatasourceV2 > - Postgres > - Postgres JDBC Driver: 42+ > - Java8 > Case: > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'foo'; -> works as expected* > *--* > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'default'; -> query throws an exception* > In the where clause, when we try to find rows via the else branch, Spark throws an exception: > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:208) > > org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589) > In the debugger, at def unapply in PushablePredicate.class: > when the sql case returns 'foo' -> function unapply accepts (t_name = 'foo'), as an > instance of Predicate > when the sql case returns 'default' -> function unapply accepts COALESCE(t_name = > 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion failed > with an error > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
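The COALESCE form reported above comes from how the ELSE-branch comparison must handle SQL NULLs: for a NULL t_name, (t_name = 'foo') evaluates to NULL, yet the CASE still takes the ELSE branch, so a pushed-down filter for the 'default' value has to treat NULL as a match. A plain-Python model of that three-valued logic (helper names here are illustrative, not Spark internals):

```python
# Plain-Python model of SQL three-valued logic, showing why the
# ELSE-branch predicate gets wrapped as NOT COALESCE(t_name = 'foo', FALSE):
# None plays the role of SQL NULL.
def sql_eq(a, b):
    # SQL equality: comparing with NULL yields NULL, not False.
    return None if a is None or b is None else a == b

def coalesce(*args):
    # First non-NULL argument, like SQL COALESCE.
    return next((a for a in args if a is not None), None)

for t_name in ("foo", "bar", None):
    # What the CASE expression evaluates to for this row.
    case_value = "foo" if sql_eq(t_name, "foo") else "default"
    # The optimizer's rewrite of: CASE ... END = 'default'
    rewritten = not coalesce(sql_eq(t_name, "foo"), False)
    # The rewrite agrees with the CASE result for all three rows,
    # including the NULL one.
    print(t_name, case_value, rewritten)
```

The exception occurs not because the rewrite is wrong, but because (per the debugger notes above) the rewritten expression arrives as a GeneralScalarExpression rather than a Predicate, tripping the assertion in PushablePredicate.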
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40563: - Fix Version/s: (was: 3.3.0) > Error at where clause, when sql case executes by else branch > > > Key: SPARK-40563 > URL: https://issues.apache.org/jira/browse/SPARK-40563 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Vadim >Priority: Major > Attachments: java-code-example.txt, sql.txt, stack-trace.txt > > > Hello! > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace. > - Spark version 3.3.0 > - Scala version 2.12 > - DatasourceV2 > - Postgres > - Postgres JDBC Driver: 42+ > - Java8 > Case: > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'foo'; -> works as expected* > *--* > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'default'; -> query throws an exception* > In the where clause, when we try to find rows via the else branch, Spark throws an exception: > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:208) > > org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589) > In the debugger, at def unapply in PushablePredicate.class: > when the sql case returns 'foo' -> function unapply accepts (t_name = 'foo'), as an > instance of Predicate > when the sql case returns 'default' -> function unapply accepts COALESCE(t_name = > 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion failed > with an error > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40563: - Target Version/s: (was: 3.3.0) > Error at where clause, when sql case executes by else branch > > > Key: SPARK-40563 > URL: https://issues.apache.org/jira/browse/SPARK-40563 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Vadim >Priority: Major > Fix For: 3.3.0 > > Attachments: java-code-example.txt, sql.txt, stack-trace.txt > > > Hello! > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace. > - Spark version 3.3.0 > - Scala version 2.12 > - DatasourceV2 > - Postgres > - Postgres JDBC Driver: 42+ > - Java8 > Case: > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'foo'; -> works as expected* > *--* > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'default'; -> query throws an exception* > In the where clause, when we try to find rows via the else branch, Spark throws an exception: > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:208) > > org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589) > In the debugger, at def unapply in PushablePredicate.class: > when the sql case returns 'foo' -> function unapply accepts (t_name = 'foo'), as an > instance of Predicate > when the sql case returns 'default' -> function unapply accepts COALESCE(t_name = > 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion failed > with an error > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures
[ https://issues.apache.org/jira/browse/SPARK-40571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609780#comment-17609780 ] Apache Spark commented on SPARK-40571: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/38008 > Construct a test case to verify fault-tolerance semantic with random python > worker failures > --- > > Key: SPARK-40571 > URL: https://issues.apache.org/jira/browse/SPARK-40571 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Priority: Major > > We'd like to make sure fault-tolerance semantic is respected with random > failures on python worker. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures
[ https://issues.apache.org/jira/browse/SPARK-40571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40571: Assignee: (was: Apache Spark) > Construct a test case to verify fault-tolerance semantic with random python > worker failures > --- > > Key: SPARK-40571 > URL: https://issues.apache.org/jira/browse/SPARK-40571 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Priority: Major > > We'd like to make sure fault-tolerance semantic is respected with random > failures on python worker. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures
[ https://issues.apache.org/jira/browse/SPARK-40571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40571: Assignee: Apache Spark > Construct a test case to verify fault-tolerance semantic with random python > worker failures > --- > > Key: SPARK-40571 > URL: https://issues.apache.org/jira/browse/SPARK-40571 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Assignee: Apache Spark >Priority: Major > > We'd like to make sure fault-tolerance semantic is respected with random > failures on python worker. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures
[ https://issues.apache.org/jira/browse/SPARK-40571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609779#comment-17609779 ] Apache Spark commented on SPARK-40571: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/38008 > Construct a test case to verify fault-tolerance semantic with random python > worker failures > --- > > Key: SPARK-40571 > URL: https://issues.apache.org/jira/browse/SPARK-40571 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Jungtaek Lim >Priority: Major > > We'd like to make sure fault-tolerance semantic is respected with random > failures on python worker. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40557) Re-generate Spark Connect Python protos
[ https://issues.apache.org/jira/browse/SPARK-40557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40557. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37993 [https://github.com/apache/spark/pull/37993] > Re-generate Spark Connect Python protos > --- > > Key: SPARK-40557 > URL: https://issues.apache.org/jira/browse/SPARK-40557 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Fix For: 3.4.0 > > > The existing protos have a reference to Databricks specific go package names > that have been removed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40557) Re-generate Spark Connect Python protos
[ https://issues.apache.org/jira/browse/SPARK-40557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40557: Assignee: Martin Grund > Re-generate Spark Connect Python protos > --- > > Key: SPARK-40557 > URL: https://issues.apache.org/jira/browse/SPARK-40557 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > > The existing protos have a reference to Databricks specific go package names > that have been removed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40561) Implement `min_count` in GroupBy.min
[ https://issues.apache.org/jira/browse/SPARK-40561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-40561. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37998 [https://github.com/apache/spark/pull/37998] > Implement `min_count` in GroupBy.min > > > Key: SPARK-40561 > URL: https://issues.apache.org/jira/browse/SPARK-40561 > Project: Spark > Issue Type: Sub-task > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40561) Implement `min_count` in GroupBy.min
[ https://issues.apache.org/jira/browse/SPARK-40561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-40561: - Assignee: Ruifeng Zheng > Implement `min_count` in GroupBy.min > > > Key: SPARK-40561 > URL: https://issues.apache.org/jira/browse/SPARK-40561 > Project: Spark > Issue Type: Sub-task > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
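The ticket itself carries no description, but pandas documents `min_count` on grouped reductions as the minimum number of valid (non-null) values a group needs for the operation to produce a result; with fewer, the result is null. A plain-Python sketch of those semantics, assuming pandas-on-Spark mirrors pandas here (the helper below is illustrative, not the actual implementation):

```python
# Sketch of pandas-style min_count semantics for a grouped min:
# if a group has fewer than min_count non-null values, the result is
# null (None here) instead of the minimum. Illustrative only.
def group_min(values, min_count=-1):
    valid = [v for v in values if v is not None]
    if min_count > 0 and len(valid) < min_count:
        return None
    return min(valid) if valid else None

print(group_min([5, None, 3]))               # 3
print(group_min([5, None, 3], min_count=3))  # None: only 2 valid values
print(group_min([5, None, 3], min_count=2))  # 3
```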
[jira] [Commented] (SPARK-40566) Add showIndex function
[ https://issues.apache.org/jira/browse/SPARK-40566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609771#comment-17609771 ] Apache Spark commented on SPARK-40566: -- User 'huleilei' has created a pull request for this issue: https://github.com/apache/spark/pull/38007 > Add showIndex function > --- > > Key: SPARK-40566 > URL: https://issues.apache.org/jira/browse/SPARK-40566 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: KaiXinXIaoLei >Priority: Major > > I find there isn't a showIndex function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Attachment: Executor_ID_IN_STAGES_TAB.png > Executor ID sorted as lexicographical order in UI Stages Tab > > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > Attachments: Executor_ID_IN_STAGES_TAB.png > > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40566) Add showIndex function
[ https://issues.apache.org/jira/browse/SPARK-40566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40566: Assignee: (was: Apache Spark) > Add showIndex function > --- > > Key: SPARK-40566 > URL: https://issues.apache.org/jira/browse/SPARK-40566 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: KaiXinXIaoLei >Priority: Major > > I find there isn't a showIndex function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40566) Add showIndex function
[ https://issues.apache.org/jira/browse/SPARK-40566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40566: Assignee: Apache Spark > Add showIndex function > --- > > Key: SPARK-40566 > URL: https://issues.apache.org/jira/browse/SPARK-40566 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: KaiXinXIaoLei >Assignee: Apache Spark >Priority: Major > > I find there isn't a showIndex function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40566) Add showIndex function
[ https://issues.apache.org/jira/browse/SPARK-40566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609770#comment-17609770 ] Apache Spark commented on SPARK-40566: -- User 'huleilei' has created a pull request for this issue: https://github.com/apache/spark/pull/38007 > Add showIndex function > --- > > Key: SPARK-40566 > URL: https://issues.apache.org/jira/browse/SPARK-40566 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: KaiXinXIaoLei >Priority: Major > > I find there isn't a showIndex function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab
Qian Sun created SPARK-40572: Summary: Executor ID sorted as lexicographical order in UI Stages Tab Key: SPARK-40572 URL: https://issues.apache.org/jira/browse/SPARK-40572 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.3.0 Reporter: Qian Sun As the figure shows, Executor IDs are sorted in lexicographical order in the UI Stages Tab; sorting them in numeric order would be better. !image-2022-09-27-09-26-46-755.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab
[ https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Qian Sun updated SPARK-40572: - Description: As figure shows, Executor ID sorted as lexicographical order in UI Stages Tab. Better sort as number order (was: As figure shows, Executor ID sorted as lexicographical order in UI Stages Tab. Better sort as number order !image-2022-09-27-09-26-46-755.png!) > Executor ID sorted as lexicographical order in UI Stages Tab > > > Key: SPARK-40572 > URL: https://issues.apache.org/jira/browse/SPARK-40572 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.3.0 >Reporter: Qian Sun >Priority: Major > > As figure shows, Executor ID sorted as lexicographical order in UI Stages > Tab. Better sort as number order -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
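The fix the issue asks for amounts to sorting by the numeric value of the executor ID rather than its string form. A minimal standalone sketch (the sample IDs are hypothetical; this is not Spark's UI code, which sorts table columns in JavaScript):

```python
# Executor IDs arrive as strings in the UI table (hypothetical sample IDs;
# "driver" is a real non-numeric ID that any fix has to tolerate).
ids = ["1", "10", "2", "driver", "3"]

# Lexicographic order -- what the Stages tab currently does.
lex = sorted(ids)

def sort_key(executor_id: str):
    # Numeric IDs first, in numeric order; non-numeric IDs after, alphabetically.
    return (0, int(executor_id)) if executor_id.isdigit() else (1, executor_id)

# Numeric order -- the behavior the issue requests.
num = sorted(ids, key=sort_key)

print(lex)  # ['1', '10', '2', '3', 'driver']
print(num)  # ['1', '2', '3', '10', 'driver']
```

The lexicographic result interleaves "10" between "1" and "2", which is exactly the symptom shown in the attached screenshot.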
[jira] [Created] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures
Jungtaek Lim created SPARK-40571: Summary: Construct a test case to verify fault-tolerance semantic with random python worker failures Key: SPARK-40571 URL: https://issues.apache.org/jira/browse/SPARK-40571 Project: Spark Issue Type: Sub-task Components: Structured Streaming Affects Versions: 3.4.0 Reporter: Jungtaek Lim We'd like to make sure fault-tolerance semantic is respected with random failures on python worker. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40570) Add doc for Docker Setup in standalone mode
Qian Sun created SPARK-40570: Summary: Add doc for Docker Setup in standalone mode Key: SPARK-40570 URL: https://issues.apache.org/jira/browse/SPARK-40570 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.4.0 Reporter: Qian Sun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40569) Expose port for spark standalone mode
Qian Sun created SPARK-40569: Summary: Expose port for spark standalone mode Key: SPARK-40569 URL: https://issues.apache.org/jira/browse/SPARK-40569 Project: Spark Issue Type: Sub-task Components: Project Infra Affects Versions: 3.4.0 Reporter: Qian Sun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35242) Support change catalog default database for spark
[ https://issues.apache.org/jira/browse/SPARK-35242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35242: --- Assignee: Gabor Roczei > Support change catalog default database for spark > - > > Key: SPARK-35242 > URL: https://issues.apache.org/jira/browse/SPARK-35242 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: hong dongdong >Assignee: Gabor Roczei >Priority: Major > Fix For: 3.4.0 > > > The Spark catalog default database can only be 'default'. When we cannot access > 'default', we will get the exception 'Permission denied:'. We should support > changing the default database for the catalog, like 'jdbc/thrift' does. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35242) Support change catalog default database for spark
[ https://issues.apache.org/jira/browse/SPARK-35242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35242. - Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37679 [https://github.com/apache/spark/pull/37679] > Support change catalog default database for spark > - > > Key: SPARK-35242 > URL: https://issues.apache.org/jira/browse/SPARK-35242 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: hong dongdong >Priority: Major > Fix For: 3.4.0 > > > The Spark catalog default database can only be 'default'. When we cannot access > 'default', we will get the exception 'Permission denied:'. We should support > changing the default database for the catalog, like 'jdbc/thrift' does. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40536) Make Spark Connect port configurable.
[ https://issues.apache.org/jira/browse/SPARK-40536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40536: Assignee: Apache Spark > Make Spark Connect port configurable. > - > > Key: SPARK-40536 > URL: https://issues.apache.org/jira/browse/SPARK-40536 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Apache Spark >Priority: Minor > > Make Spark Connect port configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40536) Make Spark Connect port configurable.
[ https://issues.apache.org/jira/browse/SPARK-40536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609718#comment-17609718 ] Apache Spark commented on SPARK-40536: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38006 > Make Spark Connect port configurable. > - > > Key: SPARK-40536 > URL: https://issues.apache.org/jira/browse/SPARK-40536 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Minor > > Make Spark Connect port configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40536) Make Spark Connect port configurable.
[ https://issues.apache.org/jira/browse/SPARK-40536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40536: Assignee: (was: Apache Spark) > Make Spark Connect port configurable. > - > > Key: SPARK-40536 > URL: https://issues.apache.org/jira/browse/SPARK-40536 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Priority: Minor > > Make Spark Connect port configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40550) DataSource V2: Handle DELETE commands for delta-based sources
[ https://issues.apache.org/jira/browse/SPARK-40550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609691#comment-17609691 ] Apache Spark commented on SPARK-40550: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/38005 > DataSource V2: Handle DELETE commands for delta-based sources > - > > Key: SPARK-40550 > URL: https://issues.apache.org/jira/browse/SPARK-40550 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > We need to support DELETE operations for delta-based sources per approved > SPIP. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40550) DataSource V2: Handle DELETE commands for delta-based sources
[ https://issues.apache.org/jira/browse/SPARK-40550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609690#comment-17609690 ] Apache Spark commented on SPARK-40550: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/38005 > DataSource V2: Handle DELETE commands for delta-based sources > - > > Key: SPARK-40550 > URL: https://issues.apache.org/jira/browse/SPARK-40550 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > We need to support DELETE operations for delta-based sources per approved > SPIP. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40550) DataSource V2: Handle DELETE commands for delta-based sources
[ https://issues.apache.org/jira/browse/SPARK-40550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40550: Assignee: (was: Apache Spark) > DataSource V2: Handle DELETE commands for delta-based sources > - > > Key: SPARK-40550 > URL: https://issues.apache.org/jira/browse/SPARK-40550 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > We need to support DELETE operations for delta-based sources per approved > SPIP. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40550) DataSource V2: Handle DELETE commands for delta-based sources
[ https://issues.apache.org/jira/browse/SPARK-40550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40550: Assignee: Apache Spark > DataSource V2: Handle DELETE commands for delta-based sources > - > > Key: SPARK-40550 > URL: https://issues.apache.org/jira/browse/SPARK-40550 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > We need to support DELETE operations for delta-based sources per approved > SPIP. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40551) DataSource V2: Add APIs for delta-based row-level operations
[ https://issues.apache.org/jira/browse/SPARK-40551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40551: Assignee: Apache Spark > DataSource V2: Add APIs for delta-based row-level operations > > > Key: SPARK-40551 > URL: https://issues.apache.org/jira/browse/SPARK-40551 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > Add DataSource V2 APIs for handling delta-based row-level operations per > approved SPIP. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40551) DataSource V2: Add APIs for delta-based row-level operations
[ https://issues.apache.org/jira/browse/SPARK-40551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609688#comment-17609688 ] Apache Spark commented on SPARK-40551: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/38004 > DataSource V2: Add APIs for delta-based row-level operations > > > Key: SPARK-40551 > URL: https://issues.apache.org/jira/browse/SPARK-40551 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > Add DataSource V2 APIs for handling delta-based row-level operations per > approved SPIP. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40551) DataSource V2: Add APIs for delta-based row-level operations
[ https://issues.apache.org/jira/browse/SPARK-40551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40551: Assignee: (was: Apache Spark) > DataSource V2: Add APIs for delta-based row-level operations > > > Key: SPARK-40551 > URL: https://issues.apache.org/jira/browse/SPARK-40551 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Anton Okolnychyi >Priority: Major > > Add DataSource V2 APIs for handling delta-based row-level operations per > approved SPIP. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40358) Migrate collection type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609592#comment-17609592 ] Max Gekk commented on SPARK-40358: -- [~lvshaokang] Sure, go ahead. > Migrate collection type check failures onto error classes > - > > Key: SPARK-40358 > URL: https://issues.apache.org/jira/browse/SPARK-40358 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in collection > expressions: > 1. BinaryArrayExpressionWithImplicitCast (1): > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69] > 2. MapContainsKey (2): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237 > 3. MapConcat (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663 > 4. MapFromEntries (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40358) Migrate collection type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609584#comment-17609584 ] Shaokang Lv commented on SPARK-40358: - Hi, [~maxgekk] , I will pick up this if possible. > Migrate collection type check failures onto error classes > - > > Key: SPARK-40358 > URL: https://issues.apache.org/jira/browse/SPARK-40358 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in collection > expressions: > 1. BinaryArrayExpressionWithImplicitCast (1): > [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69] > 2. MapContainsKey (2): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237 > 3. MapConcat (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663 > 4. MapFromEntries (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40357) Migrate window type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-40357: Assignee: Shaokang Lv > Migrate window type check failures onto error classes > - > > Key: SPARK-40357 > URL: https://issues.apache.org/jira/browse/SPARK-40357 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Shaokang Lv >Priority: Major > > Replace TypeCheckFailure by DataTypeMismatch in type checks in window > expressions: > 1. WindowSpecDefinition (4): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L68-L85 > 2. SpecifiedWindowFrame (3): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L216-L231 > 3. checkBoundary (2): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L264-L269 > 4. FrameLessOffsetWindowFunction (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L424 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40357) Migrate window type check failures onto error classes
[ https://issues.apache.org/jira/browse/SPARK-40357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40357. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37986 [https://github.com/apache/spark/pull/37986] > Migrate window type check failures onto error classes > - > > Key: SPARK-40357 > URL: https://issues.apache.org/jira/browse/SPARK-40357 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Shaokang Lv >Priority: Major > Fix For: 3.4.0 > > > Replace TypeCheckFailure by DataTypeMismatch in type checks in window > expressions: > 1. WindowSpecDefinition (4): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L68-L85 > 2. SpecifiedWindowFrame (3): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L216-L231 > 3. checkBoundary (2): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L264-L269 > 4. FrameLessOffsetWindowFunction (1): > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L424 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609536#comment-17609536 ] ming95 commented on SPARK-40563: I can reproduce this problem. I can try to fix this issue if no one else is working on it. :) > Error at where clause, when sql case executes by else branch > > > Key: SPARK-40563 > URL: https://issues.apache.org/jira/browse/SPARK-40563 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Vadim >Priority: Major > Fix For: 3.3.0 > > Attachments: java-code-example.txt, sql.txt, stack-trace.txt > > > Hello! > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace. > - Spark version 3.3.0 > - Scala version 2.12 > - DatasourceV2 > - Postgres > - Postgres JDBC Driver: 42+ > - Java8 > Case: > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'foo'; -> works as expected* > *--* > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'default'; -> query throws an exception* > In the where clause, when we try to find rows via the else branch, Spark throws an exception: > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:208) > > org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589) > In the debugger, in def unapply in PushablePredicate: > when the sql case returns 'foo', unapply accepts (t_name = 'foo') as an > instance of Predicate; > when the sql case returns 'default', unapply accepts COALESCE(t_name = > 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion fails > with the error above. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
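The debugger observation in the report can be modeled outside Spark. Under SQL's three-valued logic, the ELSE branch of the CASE is taken both when the WHEN condition is false and when it is NULL, which is why the optimizer can rewrite a filter on the ELSE value into a COALESCE(cond, FALSE) form before pushdown. A toy Python model of those semantics (illustrative only, not Spark code; the helper names are made up):

```python
# SQL booleans are three-valued: True, False, or None (SQL NULL).
def sql_eq(a, b):
    """'=' under SQL semantics: NULL if either side is NULL."""
    return None if a is None or b is None else a == b

def case_when(cond, then_value, else_value):
    """CASE WHEN cond THEN ... ELSE ... END: a NULL condition falls
    through to the ELSE branch, just like a False one."""
    return then_value if cond is True else else_value

def coalesce(value, default):
    return default if value is None else value

for t_name in ["foo", "bar", None]:
    cond = sql_eq(t_name, "foo")
    result = case_when(cond, "foo", "default")
    # Filtering on `... = 'default'` matches exactly the rows where
    # NOT(COALESCE(cond, FALSE)) holds -- including NULL conditions.
    assert (result == "default") == (not coalesce(cond, False))
```

That COALESCE wrapper is a GeneralScalarExpression rather than a Predicate in the V2 expression model, which matches what the reporter saw trip the assertion in PushablePredicate.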
[jira] [Created] (SPARK-40568) Spark Streaming support Debezium
melin created SPARK-40568: - Summary: Spark Streaming support Debezium Key: SPARK-40568 URL: https://issues.apache.org/jira/browse/SPARK-40568 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.4.0 Reporter: melin Debezium is a very popular CDC technology. Supporting Debezium in Spark Structured Streaming would make it easy to write change data into data lakes. The most commonly used scheme today is Flink CDC; hopefully Spark can support this as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40552) Upgrade protobuf-python from 4.21.5 to 4.21.6
[ https://issues.apache.org/jira/browse/SPARK-40552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40552: - Priority: Minor (was: Major) > Upgrade protobuf-python from 4.21.5 to 4.21.6 > - > > Key: SPARK-40552 > URL: https://issues.apache.org/jira/browse/SPARK-40552 > Project: Spark > Issue Type: Dependency upgrade > Components: Connect >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Minor > Fix For: 3.4.0 > > > [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941] > [Github|https://github.com/advisories/GHSA-8gq9-2x98-w8hf] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40552) Upgrade protobuf-python from 4.21.5 to 4.21.6
[ https://issues.apache.org/jira/browse/SPARK-40552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40552. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37991 [https://github.com/apache/spark/pull/37991] > Upgrade protobuf-python from 4.21.5 to 4.21.6 > - > > Key: SPARK-40552 > URL: https://issues.apache.org/jira/browse/SPARK-40552 > Project: Spark > Issue Type: Dependency upgrade > Components: Connect >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0 > > > [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941] > [Github|https://github.com/advisories/GHSA-8gq9-2x98-w8hf] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40552) Upgrade protobuf-python from 4.21.5 to 4.21.6
[ https://issues.apache.org/jira/browse/SPARK-40552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40552: - Component/s: Build > Upgrade protobuf-python from 4.21.5 to 4.21.6 > - > > Key: SPARK-40552 > URL: https://issues.apache.org/jira/browse/SPARK-40552 > Project: Spark > Issue Type: Dependency upgrade > Components: Build, Connect >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Minor > Fix For: 3.4.0 > > > [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941] > [Github|https://github.com/advisories/GHSA-8gq9-2x98-w8hf] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40552) Upgrade protobuf-python from 4.21.5 to 4.21.6
[ https://issues.apache.org/jira/browse/SPARK-40552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-40552: Assignee: Bjørn Jørgensen > Upgrade protobuf-python from 4.21.5 to 4.21.6 > - > > Key: SPARK-40552 > URL: https://issues.apache.org/jira/browse/SPARK-40552 > Project: Spark > Issue Type: Dependency upgrade > Components: Connect >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > > [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941] > [Github|https://github.com/advisories/GHSA-8gq9-2x98-w8hf] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40478) Add create datasource table options docs
[ https://issues.apache.org/jira/browse/SPARK-40478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40478. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37919 [https://github.com/apache/spark/pull/37919] > Add create datasource table options docs > > > Key: SPARK-40478 > URL: https://issues.apache.org/jira/browse/SPARK-40478 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40478) Add create datasource table options docs
[ https://issues.apache.org/jira/browse/SPARK-40478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-40478: Assignee: XiDuo You > Add create datasource table options docs > > > Key: SPARK-40478 > URL: https://issues.apache.org/jira/browse/SPARK-40478 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609507#comment-17609507 ] John Pellman edited comment on SPARK-12216 at 9/26/22 1:42 PM: --- Just as another data point, it appears that a variant of this issue also rears its head on GNU/Linux (Debian 10, 3.1.2, Scala 2.12.14) if you set your temp directory to be on an NFS mount: {code} 22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception while deleting Spark temp dir: /hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3 java.io.IOException: Failed to delete: /hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377 at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91) at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1141) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4(ShutdownHookManager.scala:65) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4$adapted(ShutdownHookManager.scala:62) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$2(ShutdownHookManager.scala:62) at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) {code} The problem in this case seems to be that {{spark-shell}} is attempting to do a recursive unlink while files are still open (NFS client-side [silly renames|http://nfs.sourceforge.net/#faq_d2]). It looks like this overall issue might be less of a "weird Windows thing" and more of an issue with spark-shell not waiting until all file handles are closed before attempting to remove the temp dir. This behavior cannot be reproduced consistently and appears to be non-deterministic. The obvious workaround here is to not put temp directories on NFS, but it does seem like you're relying upon file handling behavior that is specific to how Linux behaves using non-NFS volumes rather than doing a sanity check within spark-shell/scala(which might not be a bad idea). 
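The workaround the commenter hints at (not deleting blindly while NFS clients still hold handles open) can be sketched as a retry loop around the recursive delete. This is an illustrative Python sketch under those assumptions, not Spark's actual JavaUtils/ShutdownHookManager code, and the function name is made up:

```python
import errno
import shutil
import time

def delete_recursively_nfs_tolerant(path, retries=5, delay=0.5):
    """Best-effort recursive delete that retries while NFS '.nfsXXXX'
    silly-rename files (kept alive by still-open handles) make the
    directory non-empty. Returns True once the tree is gone."""
    for attempt in range(retries):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            return True  # already removed by someone else; nothing to do
        except OSError as e:
            # ENOTEMPTY/EBUSY typically mean a .nfsXXXX file reappeared
            # because an NFS client still holds a handle open; back off
            # briefly and try again.
            if e.errno in (errno.ENOTEMPTY, errno.EBUSY) and attempt < retries - 1:
                time.sleep(delay)
                continue
            raise
    return False
```

On local filesystems the first attempt succeeds and the loop never retries, so the sketch degrades gracefully to a plain recursive delete.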
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609507#comment-17609507 ] John Pellman edited comment on SPARK-12216 at 9/26/22 1:40 PM: --- Just as another data point, it appears that a variant of this issue also rears its head on GNU/Linux (Debian 10, 3.1.2, Scala 2.12.14) if you set your temp directory to be on an NFS mount: {code} 22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception while deleting Spark temp dir: /hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3 java.io.IOException: Failed to delete: /hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377 at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91) at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1141) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4(ShutdownHookManager.scala:65) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4$adapted(ShutdownHookManager.scala:62) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$2(ShutdownHookManager.scala:62) at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) {code} The problem in this case seems to be that {{spark-shell}} is attempting to do a recursive unlink while files are still open (NFS client-side [silly renames|http://nfs.sourceforge.net/#faq_d2]). It looks like this overall issue might be less of a "weird Windows thing" and more of an issue with spark-shell not waiting until all file handles are closed before attempting to remove the temp dir. This behavior cannot be reproduced consistently and appears to be non-deterministic. The obvious workaround here is to not put temp directories on NFS, but it does seem like you're relying upon file-handling behavior that is specific to Linux rather than doing a sanity check within spark-shell/Scala (which might not be a bad idea).
[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609507#comment-17609507 ] John Pellman commented on SPARK-12216: -- Just as another data point, it appears that a variant of this issue also rears its head on GNU/Linux (Debian 10, 3.1.2, Scala 2.12.14) if you set your temp directory to be on an NFS mount: {code} 22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception while deleting Spark temp dir: /hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3 java.io.IOException: Failed to delete: /hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377 at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118) at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91) at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1141) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4(ShutdownHookManager.scala:65) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4$adapted(ShutdownHookManager.scala:62) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$2(ShutdownHookManager.scala:62) at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996) at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188) at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) {code} The problem in this case seems to be that {{spark-shell}} is attempting to do a recursive unlink while files are still open (NFS client-side [silly renames|http://nfs.sourceforge.net/#faq_d2]). It looks like this overall issue might be less of a "weird Windows thing" and more of an issue with spark-shell not waiting until all file handles are closed before attempting to remove the temp dir. This behavior cannot be reproduced consistently and appears to be non-deterministic. The obvious workaround here is to not put temp directories on NFS, but it does seem like you're relying upon Linux to block the recursive unlink until all file handles are closed rather than doing a sanity check within spark-shell/Scala (which might not be a bad idea).
> Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: Windows 7 64-bit > Spark 1.5.2 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority: Minor > > The mailing list archives have no obvious solution to this: > scala> :q > Stopping spark context. > 15/12/08 16:24:22 ERROR
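The "sanity check" the comment above suggests can be sketched outside of Spark. The following is a hypothetical Python sketch, not Spark's actual ShutdownHookManager logic: it retries a recursive delete while NFS silly-rename files (`.nfsXXXX`) are still present, giving open file handles a chance to close; the function name and retry parameters are illustrative assumptions.

```python
import os
import shutil
import time

def delete_recursively_with_retry(path, attempts=5, delay=0.5):
    """Recursively delete `path`, retrying while NFS silly-rename
    files (.nfsXXXX) remain.

    On NFS, unlinking a file that some process still holds open
    renames it client-side to `.nfsXXXX` instead of removing it, so
    a naive recursive delete of the parent directory fails. Retrying
    gives open handles a chance to close, after which the
    silly-renamed files disappear and the directory can be removed.
    """
    last_error = None
    for _ in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except OSError as e:
            last_error = e
            # Only worth retrying if the leftovers look like NFS
            # silly renames; otherwise re-raise immediately.
            leftovers = [
                name
                for _, _, files in os.walk(path)
                for name in files
                if name.startswith(".nfs")
            ]
            if not leftovers:
                raise
            time.sleep(delay)
    raise last_error
```

On a local filesystem the first `rmtree` attempt simply succeeds; the retry loop only matters on NFS mounts.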
[jira] [Created] (SPARK-40567) SharedState to redact secrets when propagating them to HadoopConf
Steve Loughran created SPARK-40567: -- Summary: SharedState to redact secrets when propagating them to HadoopConf Key: SPARK-40567 URL: https://issues.apache.org/jira/browse/SPARK-40567 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Steve Loughran When SharedState propagates (key, value) pairs from initialConfigs to HadoopConf, it logs the values at debug. If the config contained secrets (cloud credentials, etc.) the log will contain them. The org.apache.hadoop.conf.ConfigRedactor class will redact values of all keys matching a pattern in "hadoop.security.sensitive-config-keys"; this is configured by default to be {code} "secret$", "password$", "ssl.keystore.pass$", "fs.s3.*[Ss]ecret.?[Kk]ey", "fs.s3a.*.server-side-encryption.key", "fs.s3a.encryption.algorithm", "fs.s3a.encryption.key", "fs.azure\\.account.key.*", "credential$", "oauth.*secret", "oauth.*password", "oauth.*token", "hadoop.security.sensitive-config-keys" {code} ...and it may be extended in site configs or future Hadoop releases. Spark should use the redactor for log hygiene/security. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
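For illustration, the key-based redaction that `org.apache.hadoop.conf.ConfigRedactor` performs can be sketched in Python against the default pattern list quoted above. This is a simplified stand-in, not Hadoop's implementation; the `<redacted>` placeholder string is an assumption.

```python
import re

# Default patterns from "hadoop.security.sensitive-config-keys",
# as quoted in the issue above. A key matching any pattern has its
# value replaced before it reaches a log line.
SENSITIVE_KEY_PATTERNS = [
    "secret$", "password$", "ssl.keystore.pass$",
    "fs.s3.*[Ss]ecret.?[Kk]ey",
    "fs.s3a.*.server-side-encryption.key",
    "fs.s3a.encryption.algorithm", "fs.s3a.encryption.key",
    "fs.azure\\.account.key.*",
    "credential$", "oauth.*secret", "oauth.*password",
    "oauth.*token", "hadoop.security.sensitive-config-keys",
]
_COMPILED = [re.compile(p) for p in SENSITIVE_KEY_PATTERNS]

def redact(key: str, value: str, placeholder: str = "<redacted>") -> str:
    """Return `value` unchanged unless `key` matches a sensitive-key
    pattern, in which case return the placeholder instead."""
    if any(p.search(key) for p in _COMPILED):
        return placeholder
    return value
```

The point of the issue is that SharedState's debug logging should pass values through such a redactor before printing them, so that keys like `fs.s3a.secret.key` never appear in logs in the clear.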
[jira] [Created] (SPARK-40566) Add showIndex function
KaiXinXIaoLei created SPARK-40566: - Summary: Add showIndex function Key: SPARK-40566 URL: https://issues.apache.org/jira/browse/SPARK-40566 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.3.0 Reporter: KaiXinXIaoLei I find there isn't a showIndex function. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40565) Non-deterministic filters shouldn't get pushed to V2 file sources
[ https://issues.apache.org/jira/browse/SPARK-40565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40565: Assignee: Apache Spark > Non-deterministic filters shouldn't get pushed to V2 file sources > - > > Key: SPARK-40565 > URL: https://issues.apache.org/jira/browse/SPARK-40565 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Adam Binford >Assignee: Apache Spark >Priority: Major > > Currently non-deterministic filters can be pushed down to V2 file sources, > which is different from V1, which prevents non-deterministic filters from > being pushed. > Main consequences: > * Things like doing a rand filter on a partition column will throw an > exception: > ** {{IllegalArgumentException: requirement failed: Nondeterministic > expression org.apache.spark.sql.catalyst.expressions.Rand should be > initialized before eval.}} > * {{Using a non-deterministic UDF to collect metrics via accumulators gets > pushed down and gives the wrong metrics}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40565) Non-deterministic filters shouldn't get pushed to V2 file sources
[ https://issues.apache.org/jira/browse/SPARK-40565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609489#comment-17609489 ] Apache Spark commented on SPARK-40565: -- User 'Kimahriman' has created a pull request for this issue: https://github.com/apache/spark/pull/38003 > Non-deterministic filters shouldn't get pushed to V2 file sources > - > > Key: SPARK-40565 > URL: https://issues.apache.org/jira/browse/SPARK-40565 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Adam Binford >Priority: Major > > Currently non-deterministic filters can be pushed down to V2 file sources, > which is different from V1, which prevents non-deterministic filters from > being pushed. > Main consequences: > * Things like doing a rand filter on a partition column will throw an > exception: > ** {{IllegalArgumentException: requirement failed: Nondeterministic > expression org.apache.spark.sql.catalyst.expressions.Rand should be > initialized before eval.}} > * {{Using a non-deterministic UDF to collect metrics via accumulators gets > pushed down and gives the wrong metrics}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40565) Non-deterministic filters shouldn't get pushed to V2 file sources
[ https://issues.apache.org/jira/browse/SPARK-40565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40565: Assignee: (was: Apache Spark) > Non-deterministic filters shouldn't get pushed to V2 file sources > - > > Key: SPARK-40565 > URL: https://issues.apache.org/jira/browse/SPARK-40565 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Adam Binford >Priority: Major > > Currently non-deterministic filters can be pushed down to V2 file sources, > which is different from V1, which prevents non-deterministic filters from > being pushed. > Main consequences: > * Things like doing a rand filter on a partition column will throw an > exception: > ** {{IllegalArgumentException: requirement failed: Nondeterministic > expression org.apache.spark.sql.catalyst.expressions.Rand should be > initialized before eval.}} > * {{Using a non-deterministic UDF to collect metrics via accumulators gets > pushed down and gives the wrong metrics}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master
[ https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuNing Liu updated SPARK-40564: --- Description: When I ran my program with the Dataframe structure in pyspark.pandas, there is an abnormal extra process on the master. My dataframe contains three columns named "id", "path", and "category". It contains more than 300,000 pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I use "groupBy('id').apply(func)", my four nodes run normally, but there is an abnormal process in the master, which contains 1001 pieces of data. This process also executes the code in "func" and is divided into four parts, each part contains more than 200 pieces of data. When I collect the results in each node, I can only collect the results of 1001 data points, and the results of 300,000 data points are lost. When I tried to reduce the number of data to about 20,000, this problem still occurred and the data volume was still 1001. I suspect there is a problem with the implementation of this API. I tried setting the number of data partitions to 4, but the problem didn't go away. The value of the dataframe, part of the code, and the output of the exception process are attached (was: When I ran my program with the Dataframe structure in pyspark.pandas, there is an abnormal extra process on the master. My dataframe contains three columns named "id", "path", and "category". It contains more than 300,000 pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I use "groupBy('id').apply(func)", my four nodes run normally, but there is an abnormal process in the master, which contains 1001 pieces of data. This process also executes the code in "func" and is divided into four parts, each part contains more than 200 pieces of data. When I collect the results in each node, I can only collect the results of 1001 data points, and the results of 300,000 data points are lost.
When I tried to reduce the number of data to about 20,000, this problem still occurred and the data volume was still 1001. I suspect there is a problem with the implementation of this API. I tried setting the number of data partitions to 4, but the problem didn't go away.) > The distributed runtime has one more identical process with a small amount of > data on the master > > > Key: SPARK-40564 > URL: https://issues.apache.org/jira/browse/SPARK-40564 > Project: Spark > Issue Type: Question > Components: PySpark >Affects Versions: 3.3.0 > Environment: Hadoop 3.3.1 > Python 3.8 > Spark 3.3.0 > pyspark 3.3.0 > ubuntu 20.04 >Reporter: YuNing Liu >Priority: Blocker > Attachments: Part of the code.png, The output of the abnormal > process.png, Value of df.png > > > When I ran my program with the Dataframe structure in pyspark.pandas, there > is an abnormal extra process on the master. My dataframe contains three > columns named "id", "path", and "category". It contains more than 300,000 > pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I > use "groupBy('id').apply(func)", my four nodes run normally, but there is > an abnormal process in the master, which contains 1001 pieces of data. This > process also executes the code in "func" and is divided into four parts, each > part contains more than 200 pieces of data. When I collect the results in > each node, I can only collect the results of 1001 data points, and the > results of 300,000 data points are lost. When I tried to reduce the number of > data to about 20,000, this problem still occurred and the data volume was > still 1001.
I suspect there is a problem with the implementation of this > API. I tried setting the number of data partitions to 4, but the problem > didn't go away. The value of the dataframe, part of the code, and the output > of the exception process are attached -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master
[ https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuNing Liu updated SPARK-40564: --- Attachment: Part of the code.png > The distributed runtime has one more identical process with a small amount of > data on the master > > > Key: SPARK-40564 > URL: https://issues.apache.org/jira/browse/SPARK-40564 > Project: Spark > Issue Type: Question > Components: PySpark >Affects Versions: 3.3.0 > Environment: Hadoop 3.3.1 > Python 3.8 > Spark 3.3.0 > pyspark 3.3.0 > ubuntu 20.04 >Reporter: YuNing Liu >Priority: Blocker > Attachments: Part of the code.png, The output of the abnormal > process.png, Value of df.png > > > When I ran my program with the Dataframe structure in pyspark.pandas, there > is an abnormal extra process on the master. My dataframe contains three > columns named "id", "path", and "category". It contains more than 300,000 > pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I > use "groupBy('id').apply(func)", my four nodes run normally, but there is > an abnormal process in the master, which contains 1001 pieces of data. This > process also executes the code in "func" and is divided into four parts, each > part contains more than 200 pieces of data. When I collect the results in > each node, I can only collect the results of 1001 data points, and the > results of 300,000 data points are lost. When I tried to reduce the number of > data to about 20,000, this problem still occurred and the data volume was > still 1001. I suspect there is a problem with the implementation of this > API. I tried setting the number of data partitions to 4, but the problem > didn't go away. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master
[ https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuNing Liu updated SPARK-40564: --- Attachment: Value of df.png > The distributed runtime has one more identical process with a small amount of > data on the master > > > Key: SPARK-40564 > URL: https://issues.apache.org/jira/browse/SPARK-40564 > Project: Spark > Issue Type: Question > Components: PySpark >Affects Versions: 3.3.0 > Environment: Hadoop 3.3.1 > Python 3.8 > Spark 3.3.0 > pyspark 3.3.0 > ubuntu 20.04 >Reporter: YuNing Liu >Priority: Blocker > Attachments: Part of the code.png, The output of the abnormal > process.png, Value of df.png > > > When I ran my program with the Dataframe structure in pyspark.pandas, there > is an abnormal extra process on the master. My dataframe contains three > columns named "id", "path", and "category". It contains more than 300,000 > pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I > use "groupBy('id').apply(func)", my four nodes run normally, but there is > an abnormal process in the master, which contains 1001 pieces of data. This > process also executes the code in "func" and is divided into four parts, each > part contains more than 200 pieces of data. When I collect the results in > each node, I can only collect the results of 1001 data points, and the > results of 300,000 data points are lost. When I tried to reduce the number of > data to about 20,000, this problem still occurred and the data volume was > still 1001. I suspect there is a problem with the implementation of this > API. I tried setting the number of data partitions to 4, but the problem > didn't go away. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master
[ https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuNing Liu updated SPARK-40564: --- Attachment: The output of the abnormal process.png > The distributed runtime has one more identical process with a small amount of > data on the master > > > Key: SPARK-40564 > URL: https://issues.apache.org/jira/browse/SPARK-40564 > Project: Spark > Issue Type: Question > Components: PySpark >Affects Versions: 3.3.0 > Environment: Hadoop 3.3.1 > Python 3.8 > Spark 3.3.0 > pyspark 3.3.0 > ubuntu 20.04 >Reporter: YuNing Liu >Priority: Blocker > Attachments: Part of the code.png, The output of the abnormal > process.png, Value of df.png > > > When I ran my program with the Dataframe structure in pyspark.pandas, there > is an abnormal extra process on the master. My dataframe contains three > columns named "id", "path", and "category". It contains more than 300,000 > pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I > use "groupBy('id').apply(func)", my four nodes run normally, but there is > an abnormal process in the master, which contains 1001 pieces of data. This > process also executes the code in "func" and is divided into four parts, each > part contains more than 200 pieces of data. When I collect the results in > each node, I can only collect the results of 1001 data points, and the > results of 300,000 data points are lost. When I tried to reduce the number of > data to about 20,000, this problem still occurred and the data volume was > still 1001. I suspect there is a problem with the implementation of this > API. I tried setting the number of data partitions to 4, but the problem > didn't go away. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master
[ https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuNing Liu updated SPARK-40564: --- Attachment: (was: 2022-09-26 20-54-37 的屏幕截图.png) > The distributed runtime has one more identical process with a small amount of > data on the master > > > Key: SPARK-40564 > URL: https://issues.apache.org/jira/browse/SPARK-40564 > Project: Spark > Issue Type: Question > Components: PySpark >Affects Versions: 3.3.0 > Environment: Hadoop 3.3.1 > Python 3.8 > Spark 3.3.0 > pyspark 3.3.0 > ubuntu 20.04 >Reporter: YuNing Liu >Priority: Blocker > > When I ran my program with the Dataframe structure in pyspark.pandas, there > is an abnormal extra process on the master. My dataframe contains three > columns named "id", "path", and "category". It contains more than 300,000 > pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I > use "groupBy('id').apply(func)", my four nodes run normally, but there is > an abnormal process in the master, which contains 1001 pieces of data. This > process also executes the code in "func" and is divided into four parts, each > part contains more than 200 pieces of data. When I collect the results in > each node, I can only collect the results of 1001 data points, and the > results of 300,000 data points are lost. When I tried to reduce the number of > data to about 20,000, this problem still occurred and the data volume was > still 1001. I suspect there is a problem with the implementation of this > API. I tried setting the number of data partitions to 4, but the problem > didn't go away. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40565) Non-deterministic filters shouldn't get pushed to V2 file sources
Adam Binford created SPARK-40565: Summary: Non-deterministic filters shouldn't get pushed to V2 file sources Key: SPARK-40565 URL: https://issues.apache.org/jira/browse/SPARK-40565 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Adam Binford Currently non-deterministic filters can be pushed down to V2 file sources, which is different from V1, which prevents non-deterministic filters from being pushed. Main consequences: * Things like doing a rand filter on a partition column will throw an exception: ** {{IllegalArgumentException: requirement failed: Nondeterministic expression org.apache.spark.sql.catalyst.expressions.Rand should be initialized before eval.}} * {{Using a non-deterministic UDF to collect metrics via accumulators gets pushed down and gives the wrong metrics}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
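The V1 behavior the issue refers to — pushing only deterministic predicates to the source and keeping everything else in the Spark-side filter — can be illustrated with a small sketch. The expression model and function names below are hypothetical stand-ins, not Catalyst's actual classes.

```python
from dataclasses import dataclass, field

@dataclass
class Expr:
    """Toy stand-in for a Catalyst filter expression."""
    name: str
    deterministic: bool = True
    children: list = field(default_factory=list)

def is_deterministic(e: Expr) -> bool:
    # An expression is deterministic only if it and all of its
    # children are (e.g. `col = rand()` is non-deterministic).
    return e.deterministic and all(is_deterministic(c) for c in e.children)

def split_filters(filters):
    """Partition filters the way V1 sources do: only deterministic
    predicates are candidates for pushdown; the rest must stay in
    the Spark-side Filter node, where they are evaluated after
    proper initialization and exactly once per surviving row."""
    pushable = [f for f in filters if is_deterministic(f)]
    post_scan = [f for f in filters if not is_deterministic(f)]
    return pushable, post_scan
```

Both failure modes in the issue follow from skipping this split: an uninitialized `Rand` evaluated inside the scan throws, and a metric-collecting UDF evaluated during pushdown counts rows it should never have seen.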
[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master
[ https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YuNing Liu updated SPARK-40564: --- Remaining Estimate: (was: 672h) Original Estimate: (was: 672h) > The distributed runtime has one more identical process with a small amount of > data on the master > > > Key: SPARK-40564 > URL: https://issues.apache.org/jira/browse/SPARK-40564 > Project: Spark > Issue Type: Question > Components: PySpark >Affects Versions: 3.3.0 > Environment: Hadoop 3.3.1 > Python 3.8 > Spark 3.3.0 > pyspark 3.3.0 > ubuntu 20.04 >Reporter: YuNing Liu >Priority: Blocker > > When I ran my program with the Dataframe structure in pyspark.pandas, there > was an extra exception in the master node. My dataframe contains three > columns named "ID", "Path", and "Category". It contains more than 300,000 > pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I > use "groupBy('id').apply(func)", my four nodes run normally, but there is > an abnormal process in the master, which contains 1001 pieces of data. This > process also executes the code in "func" and is divided into four parts, each > part contains more than 200 pieces of data. When I collect the results in > each node, I can only collect the results of 1001 data points, and the > results of 300,000 data points are lost. When I tried to reduce the number of > data to about 20,000, this problem still occurred and the data volume was > still 1001. I suspect there is a problem with the implementation of this > API. I tried setting the number of data partitions to 4, but the problem > didn't go away. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master
YuNing Liu created SPARK-40564: -- Summary: The distributed runtime has one more identical process with a small amount of data on the master Key: SPARK-40564 URL: https://issues.apache.org/jira/browse/SPARK-40564 Project: Spark Issue Type: Question Components: PySpark Affects Versions: 3.3.0 Environment: Hadoop 3.3.1 Python 3.8 Spark 3.3.0 pyspark 3.3.0 ubuntu 20.04 Reporter: YuNing Liu When I ran my program with the Dataframe structure in pyspark.pandas, there was an extra exception in the master node. My dataframe contains three columns named "ID", "Path", and "Category". It contains more than 300,000 pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I use "groupBy('id').apply(func)", my four nodes run normally, but there is an abnormal process in the master, which contains 1001 pieces of data. This process also executes the code in "func" and is divided into four parts, each part contains more than 200 pieces of data. When I collect the results in each node, I can only collect the results of 1001 data points, and the results of 300,000 data points are lost. When I tried to reduce the number of data to about 20,000, this problem still occurred and the data volume was still 1001. I suspect there is a problem with the implementation of this API. I tried setting the number of data partitions to 4, but the problem didn't go away. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40560) Rename message to messageFormat in the STANDARD format of errors
[ https://issues.apache.org/jira/browse/SPARK-40560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40560. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37997 [https://github.com/apache/spark/pull/37997] > Rename message to messageFormat in the STANDARD format of errors > > Key: SPARK-40560 > URL: https://issues.apache.org/jira/browse/SPARK-40560 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Max Gekk > Assignee: Max Gekk > Priority: Major > Fix For: 3.4.0 > > > Rename the field in the JSON format `STANDARD` because it actually contains a message format.
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: java-code-example.txt > Error at where clause, when sql case executes by else branch > > Key: SPARK-40563 > URL: https://issues.apache.org/jira/browse/SPARK-40563 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.3.0 > Reporter: Vadim > Priority: Major > Fix For: 3.3.0 > > Attachments: java-code-example.txt, sql.txt, stack-trace.txt > > > Hello! > The Spark SQL phase optimization failed with an internal error. Please, fill a bug report in, and provide the full stack trace. > - Spark version 3.3.0 > - Scala version 2.12 > - DataSourceV2 > - Postgres > - Postgres JDBC driver: 42+ > - Java 8 > Case: > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'foo'; -> works as expected* > *--* > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'default'; -> query throws an exception* > In the where clause, when we try to find rows via the else branch, Spark throws an exception: > The Spark SQL phase optimization failed with an internal error. Please, fill a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:208) > org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589) > In the debugger, in def unapply of PushablePredicate: > when the SQL case returns 'foo', unapply accepts (t_name = 'foo') as an instance of Predicate; > when the SQL case returns 'default', unapply accepts COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion fails with an error.
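Both of the reporter's queries are standard SQL; only Spark's DataSourceV2 predicate pushdown rejects the else-branch form. A minimal sqlite3 sketch (the table name `t` and the sample rows are invented for illustration) shows the semantics both queries should have:

```python
import sqlite3

# In-memory table mimicking the reporter's table t with a t_name column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (t_name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("foo",), ("bar",), ("baz",)])

query = """
    SELECT CASE WHEN t_name = 'foo' THEN 'foo' ELSE 'default' END AS case_when
    FROM t
    WHERE CASE WHEN t_name = 'foo' THEN 'foo' ELSE 'default' END = ?
"""

# Filtering on the THEN branch matches the single 'foo' row.
then_rows = conn.execute(query, ("foo",)).fetchall()
assert then_rows == [("foo",)]

# Filtering on the ELSE branch matches the two remaining rows; this is the
# query shape whose optimized form (COALESCE(t_name = 'foo', FALSE)) trips
# the assertion in Spark's DataSourceV2 pushdown.
else_rows = conn.execute(query, ("default",)).fetchall()
assert len(else_rows) == 2 and set(else_rows) == {("default",)}
```

The sketch only demonstrates the expected result set, not the Spark failure itself, which requires a DataSourceV2 source such as the reporter's Postgres JDBC setup.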
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: (was: java-code-example.txt)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: sql.txt
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Description: (edited)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: (was: sql.txt)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Description: (edited)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Description: (edited)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Description: (edited)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Description: (edited)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: (was: sql.txt)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: sql.txt
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: java-code-example.txt
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Description: updated (the select-branch else literal now reads 'else_will_throw_ex' instead of 'else_stmt_failed'; the report text otherwise matches the description quoted above)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: (was: java-code-example.txt)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Description: updated (wording tweak in the debugger note; the report text otherwise matches the description quoted above)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: (was: test.sql)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: java-code-example-1.txt
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: java-code-example.txt
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: test.txt
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: (was: test.txt)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: sql.txt
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: (was: java-code-example-1.txt)
[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vadim updated SPARK-40563: -- Attachment: (was: java-code-example.txt)