[jira] [Commented] (SPARK-40574) Add PURGE to DROP TABLE doc

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609810#comment-17609810
 ] 

Apache Spark commented on SPARK-40574:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/38011

> Add PURGE to DROP TABLE doc
> ---
>
> Key: SPARK-40574
> URL: https://issues.apache.org/jira/browse/SPARK-40574
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-40574) Add PURGE to DROP TABLE doc

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40574:


Assignee: Apache Spark

> Add PURGE to DROP TABLE doc
> ---
>
> Key: SPARK-40574
> URL: https://issues.apache.org/jira/browse/SPARK-40574
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-40574) Add PURGE to DROP TABLE doc

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40574:


Assignee: (was: Apache Spark)

> Add PURGE to DROP TABLE doc
> ---
>
> Key: SPARK-40574
> URL: https://issues.apache.org/jira/browse/SPARK-40574
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Commented] (SPARK-40574) Add PURGE to DROP TABLE doc

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609809#comment-17609809
 ] 

Apache Spark commented on SPARK-40574:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/38011

> Add PURGE to DROP TABLE doc
> ---
>
> Key: SPARK-40574
> URL: https://issues.apache.org/jira/browse/SPARK-40574
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Priority: Minor  (was: Major)

> Executor ID sorted as lexicographical order in Task Table of Stage Tab
> --
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Minor
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the figure shows, Executor ID is sorted in lexicographical order in the UI 
> Stages Tab. It would be better to sort in numeric order.






[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Issue Type: Improvement  (was: Bug)

> Executor ID sorted as lexicographical order in Task Table of Stage Tab
> --
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Minor
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the figure shows, Executor ID is sorted in lexicographical order in the UI 
> Stages Tab. It would be better to sort in numeric order.






[jira] [Created] (SPARK-40574) Add PURGE to DROP TABLE doc

2022-09-26 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-40574:
---

 Summary: Add PURGE to DROP TABLE doc
 Key: SPARK-40574
 URL: https://issues.apache.org/jira/browse/SPARK-40574
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.4.0
Reporter: Yuming Wang
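
For reference, the syntax the documentation change should cover is
DROP TABLE [IF EXISTS] table_name [PURGE]. A minimal sketch (the table name
`sales` is hypothetical); for Hive tables, PURGE deletes the underlying data
immediately instead of moving it to the filesystem trash:

    from pyspark.sql import SparkSession

    # Hive support is needed for PURGE to take effect on Hive-backed tables.
    spark = (SparkSession.builder
             .appName("drop-table-purge")
             .enableHiveSupport()
             .getOrCreate())

    # Plain DROP TABLE: the data of a Hive table may be moved to the trash.
    spark.sql("DROP TABLE IF EXISTS sales")

    # DROP TABLE ... PURGE: the data is removed immediately, bypassing the
    # trash, so it cannot be recovered.
    spark.sql("DROP TABLE IF EXISTS sales PURGE")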









[jira] [Commented] (SPARK-26254) Move delegation token providers into a separate project

2022-09-26 Thread forrest lv (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609802#comment-17609802
 ] 

forrest lv commented on SPARK-26254:


nice job

> Move delegation token providers into a separate project
> ---
>
> Key: SPARK-26254
> URL: https://issues.apache.org/jira/browse/SPARK-26254
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> There was a discussion in 
> [PR#22598|https://github.com/apache/spark/pull/22598] that there are several 
> provided dependencies inside the core project which shouldn't be there (e.g. 
> Hive and Kafka). This jira is to solve that problem.






[jira] [Assigned] (SPARK-40573) Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40573:


Assignee: Apache Spark

> Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary 
> integers
> --
>
> Key: SPARK-40573
> URL: https://issues.apache.org/jira/browse/SPARK-40573
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-40573) Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40573:


Assignee: (was: Apache Spark)

> Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary 
> integers
> --
>
> Key: SPARK-40573
> URL: https://issues.apache.org/jira/browse/SPARK-40573
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-40573) Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609791#comment-17609791
 ] 

Apache Spark commented on SPARK-40573:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38009

> Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary 
> integers
> --
>
> Key: SPARK-40573
> URL: https://issues.apache.org/jira/browse/SPARK-40573
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-40573) Make `ddof` in `GroupBy.std`, `GroupBy.var` and `GroupBy.sem` accept arbitrary integers

2022-09-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40573:
-

 Summary: Make `ddof` in `GroupBy.std`, `GroupBy.var` and 
`GroupBy.sem` accept arbitrary integers
 Key: SPARK-40573
 URL: https://issues.apache.org/jira/browse/SPARK-40573
 Project: Spark
  Issue Type: Sub-task
  Components: ps
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
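
To illustrate what the change enables: `ddof` is the delta degrees of freedom,
i.e. the divisor used is N - ddof, where N is the number of values in the
group. A minimal pandas-on-Spark sketch, assuming the API lands as the title
describes (data values are made up):

    import pyspark.pandas as ps

    psdf = ps.DataFrame({"g": ["a", "a", "a", "b", "b", "b"],
                         "x": [1.0, 2.0, 4.0, 3.0, 5.0, 9.0]})

    # ddof=1 (the default) gives the sample statistic, ddof=0 the population
    # statistic; with this change any integer should be accepted.
    print(psdf.groupby("g").var(ddof=0))
    print(psdf.groupby("g").std(ddof=2))
    print(psdf.groupby("g").sem(ddof=2))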









[jira] [Commented] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab

2022-09-26 Thread Qian Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609785#comment-17609785
 ] 

Qian Sun commented on SPARK-40572:
--

I think the root cause is that [executorId is a string in 
TaskDataWrapper|https://github.com/apache/spark/blob/072575c9e6fc304f09e01ad0ee180c8f309ede91/core/src/main/scala/org/apache/spark/status/storeTypes.scala#L174-L175].
Executor ID is a string throughout Apache Spark, and changing its type would 
introduce a huge number of changes across the codebase.
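
A small illustration of the symptom, plus one possible UI-side workaround that
keeps the stored type a string (plain Python, IDs chosen arbitrarily; note that
"driver" is also a legal executor ID, so a blind int() conversion is not
enough):

    ids = ["1", "10", "2", "11", "driver"]

    # Lexicographic order, as the Task Table currently sorts:
    print(sorted(ids))  # ['1', '10', '11', '2', 'driver']

    # Numeric-aware sort key: numeric IDs in numeric order first, then
    # non-numeric IDs such as "driver".
    print(sorted(ids, key=lambda s: (not s.isdigit(),
                                     int(s) if s.isdigit() else s)))
    # ['1', '2', '10', '11', 'driver']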

> Executor ID sorted as lexicographical order in UI Stages Tab
> 
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the figure shows, Executor ID is sorted in lexicographical order in the UI 
> Stages Tab. It would be better to sort in numeric order.






[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in Task Table of Stage Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Summary: Executor ID sorted as lexicographical order in Task Table of Stage 
Tab  (was: Executor ID sorted as lexicographical order in UI Stages Tab)

> Executor ID sorted as lexicographical order in Task Table of Stage Tab
> --
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the figure shows, Executor ID is sorted in lexicographical order in the UI 
> Stages Tab. It would be better to sort in numeric order.






[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master

2022-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40564:
-
Priority: Major  (was: Blocker)

> The distributed runtime has one more identical process with a small amount of 
> data on the master
> 
>
> Key: SPARK-40564
> URL: https://issues.apache.org/jira/browse/SPARK-40564
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Hadoop 3.3.1
> Python 3.8
> Spark 3.3.0
> pyspark 3.3.0
> ubuntu 20.04
>Reporter: YuNing Liu
>Priority: Major
> Attachments: Part of the code.png, The output of the abnormal 
> process.png, Value of df.png
>
>
> When I run my program with the DataFrame structure in pyspark.pandas, an 
> abnormal extra process appears on the master. My dataframe contains three 
> columns named "id", "path", and "category". It contains more than 300,000 
> rows in total, and the "id" values are only 1, 2, 3, and 4. When I use 
> groupby("id").apply(func), my four nodes run normally, but there is an 
> abnormal process on the master that holds 1001 rows. This process also 
> executes the code in func and is split into four parts, each containing more 
> than 200 rows. When I collect the results from each node, I can only collect 
> the results for those 1001 rows; the results for the 300,000 rows are lost. 
> When I reduced the data to about 20,000 rows, the problem still occurred and 
> the row count was still 1001. I suspect there is a problem with the 
> implementation of this API. I tried setting the number of data partitions to 
> 4, but the problem didn't go away. The value of the dataframe, part of the 
> code, and the output of the abnormal process are attached.
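
One possible explanation (a guess, not confirmed in this ticket): when the
applied function has no return-type annotation, pandas-on-Spark's
GroupBy.apply runs it once against a limited sample of rows to infer the
output schema, which can look like an extra short run over a small slice of
the data. If that is the cause, annotating func with an explicit return type
should skip the inference pass. A hedged sketch, assuming int/str/str columns
(the data below is made up):

    import pandas as pd
    import pyspark.pandas as ps

    psdf = ps.DataFrame({"id": [1, 2, 3, 4] * 3,
                         "path": ["/data/x"] * 12,
                         "category": ["c"] * 12})

    # With an explicit return-type annotation, apply() does not need to run
    # the function on a sample of rows to infer the output schema.
    def func(pdf: pd.DataFrame) -> ps.DataFrame[int, str, str]:
        return pdf  # stand-in for the real per-group logic

    result = psdf.groupby("id").apply(func)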






[jira] [Commented] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609784#comment-17609784
 ] 

Hyukjin Kwon commented on SPARK-40563:
--

Please go ahead.

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DataSource V2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java 8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end = 'foo';  -- works as expected
> --
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end = 'default'; -- query throws an exception
> When the WHERE clause matches rows via the ELSE branch, Spark throws:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name = 'foo') as an 
> instance of Predicate;
> when the SQL case returns 'default', unapply receives COALESCE(t_name = 
> 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion 
> fails with the error above.
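
A hedged sketch of a self-contained reproduction through the DataSource V2
JDBC catalog (the catalog name `pg`, connection details, and table schema are
all hypothetical):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.sql.catalog.pg",
                     "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
             .config("spark.sql.catalog.pg.url",
                     "jdbc:postgresql://localhost:5432/db")
             .config("spark.sql.catalog.pg.driver", "org.postgresql.Driver")
             .getOrCreate())

    # Matching rows via the ELSE branch makes the optimizer produce
    # COALESCE(t_name = 'foo', FALSE), which trips the pushdown assertion.
    spark.sql("""
        SELECT CASE WHEN t_name = 'foo' THEN 'foo' ELSE 'default' END AS case_when
        FROM pg.public.t
        WHERE CASE WHEN t_name = 'foo' THEN 'foo' ELSE 'default' END = 'default'
    """).show()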






[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40563:
-
Fix Version/s: (was: 3.3.0)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DataSource V2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java 8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end = 'foo';  -- works as expected
> --
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end = 'default'; -- query throws an exception
> When the WHERE clause matches rows via the ELSE branch, Spark throws:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name = 'foo') as an 
> instance of Predicate;
> when the SQL case returns 'default', unapply receives COALESCE(t_name = 
> 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion 
> fails with the error above.






[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-40563:
-
Target Version/s:   (was: 3.3.0)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DataSource V2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java 8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end = 'foo';  -- works as expected
> --
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end = 'default'; -- query throws an exception
> When the WHERE clause matches rows via the ELSE branch, Spark throws:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name = 'foo') as an 
> instance of Predicate;
> when the SQL case returns 'default', unapply receives COALESCE(t_name = 
> 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion 
> fails with the error above.






[jira] [Commented] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609780#comment-17609780
 ] 

Apache Spark commented on SPARK-40571:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/38008

> Construct a test case to verify fault-tolerance semantic with random python 
> worker failures
> ---
>
> Key: SPARK-40571
> URL: https://issues.apache.org/jira/browse/SPARK-40571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We'd like to make sure the fault-tolerance semantics are respected with 
> random failures of the Python worker.






[jira] [Assigned] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40571:


Assignee: (was: Apache Spark)

> Construct a test case to verify fault-tolerance semantic with random python 
> worker failures
> ---
>
> Key: SPARK-40571
> URL: https://issues.apache.org/jira/browse/SPARK-40571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We'd like to make sure the fault-tolerance semantics are respected with 
> random failures of the Python worker.






[jira] [Assigned] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40571:


Assignee: Apache Spark

> Construct a test case to verify fault-tolerance semantic with random python 
> worker failures
> ---
>
> Key: SPARK-40571
> URL: https://issues.apache.org/jira/browse/SPARK-40571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Apache Spark
>Priority: Major
>
> We'd like to make sure the fault-tolerance semantics are respected with 
> random failures of the Python worker.






[jira] [Commented] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609779#comment-17609779
 ] 

Apache Spark commented on SPARK-40571:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/38008

> Construct a test case to verify fault-tolerance semantic with random python 
> worker failures
> ---
>
> Key: SPARK-40571
> URL: https://issues.apache.org/jira/browse/SPARK-40571
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> We'd like to make sure the fault-tolerance semantics are respected with 
> random failures of the Python worker.






[jira] [Resolved] (SPARK-40557) Re-generate Spark Connect Python protos

2022-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40557.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37993
[https://github.com/apache/spark/pull/37993]

> Re-generate Spark Connect Python protos
> ---
>
> Key: SPARK-40557
> URL: https://issues.apache.org/jira/browse/SPARK-40557
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.4.0
>
>
> The existing protos have references to Databricks-specific Go package names 
> that have been removed.






[jira] [Assigned] (SPARK-40557) Re-generate Spark Connect Python protos

2022-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40557:


Assignee: Martin Grund

> Re-generate Spark Connect Python protos
> ---
>
> Key: SPARK-40557
> URL: https://issues.apache.org/jira/browse/SPARK-40557
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>
> The existing protos have references to Databricks-specific Go package names 
> that have been removed.






[jira] [Resolved] (SPARK-40561) Implement `min_count` in GroupBy.min

2022-09-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40561.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37998
[https://github.com/apache/spark/pull/37998]

> Implement `min_count` in GroupBy.min
> 
>
> Key: SPARK-40561
> URL: https://issues.apache.org/jira/browse/SPARK-40561
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
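
For reference, `min_count` follows the pandas semantics: the minimum number of
valid values required to perform the operation; a group with fewer valid
values yields NA. A minimal sketch, assuming the new parameter mirrors pandas
(data values are made up):

    import pyspark.pandas as ps

    psdf = ps.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3]})

    # Group "a" has two values and yields 1; group "b" has only one value,
    # fewer than min_count, so it yields NaN.
    print(psdf.groupby("g").min(min_count=2))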







[jira] [Assigned] (SPARK-40561) Implement `min_count` in GroupBy.min

2022-09-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40561:
-

Assignee: Ruifeng Zheng

> Implement `min_count` in GroupBy.min
> 
>
> Key: SPARK-40561
> URL: https://issues.apache.org/jira/browse/SPARK-40561
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-40566) Add showIndex function

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609771#comment-17609771
 ] 

Apache Spark commented on SPARK-40566:
--

User 'huleilei' has created a pull request for this issue:
https://github.com/apache/spark/pull/38007

> Add showIndex function 
> ---
>
> Key: SPARK-40566
> URL: https://issues.apache.org/jira/browse/SPARK-40566
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: KaiXinXIaoLei
>Priority: Major
>
> I find there isn't a showIndex function.






[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Attachment: Executor_ID_IN_STAGES_TAB.png

> Executor ID sorted as lexicographical order in UI Stages Tab
> 
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
> Attachments: Executor_ID_IN_STAGES_TAB.png
>
>
> As the figure shows, Executor ID is sorted in lexicographical order in the UI 
> Stages Tab. It would be better to sort in numeric order.






[jira] [Assigned] (SPARK-40566) Add showIndex function

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40566:


Assignee: (was: Apache Spark)

> Add showIndex function 
> ---
>
> Key: SPARK-40566
> URL: https://issues.apache.org/jira/browse/SPARK-40566
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: KaiXinXIaoLei
>Priority: Major
>
> I find there isn't a showIndex function.






[jira] [Assigned] (SPARK-40566) Add showIndex function

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40566:


Assignee: Apache Spark

> Add showIndex function 
> ---
>
> Key: SPARK-40566
> URL: https://issues.apache.org/jira/browse/SPARK-40566
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: KaiXinXIaoLei
>Assignee: Apache Spark
>Priority: Major
>
> I find there isn't a showIndex function.






[jira] [Commented] (SPARK-40566) Add showIndex function

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609770#comment-17609770
 ] 

Apache Spark commented on SPARK-40566:
--

User 'huleilei' has created a pull request for this issue:
https://github.com/apache/spark/pull/38007

> Add showIndex function 
> ---
>
> Key: SPARK-40566
> URL: https://issues.apache.org/jira/browse/SPARK-40566
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: KaiXinXIaoLei
>Priority: Major
>
> I find there isn't a showIndex function.






[jira] [Created] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab

2022-09-26 Thread Qian Sun (Jira)
Qian Sun created SPARK-40572:


 Summary: Executor ID sorted as lexicographical order in UI Stages 
Tab
 Key: SPARK-40572
 URL: https://issues.apache.org/jira/browse/SPARK-40572
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.3.0
Reporter: Qian Sun


As the figure shows, Executor ID is sorted in lexicographical order in the UI 
Stages Tab. It would be better to sort in numeric order.

!image-2022-09-27-09-26-46-755.png!






[jira] [Updated] (SPARK-40572) Executor ID sorted as lexicographical order in UI Stages Tab

2022-09-26 Thread Qian Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Sun updated SPARK-40572:
-
Description: As the figure shows, Executor ID is sorted in lexicographical 
order in the UI Stages Tab. It would be better to sort in numeric order.  
(was: As the figure shows, Executor ID is sorted in lexicographical order in 
the UI Stages Tab. It would be better to sort in numeric order.

!image-2022-09-27-09-26-46-755.png!)

> Executor ID sorted as lexicographical order in UI Stages Tab
> 
>
> Key: SPARK-40572
> URL: https://issues.apache.org/jira/browse/SPARK-40572
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.3.0
>Reporter: Qian Sun
>Priority: Major
>
> As the figure shows, Executor ID is sorted in lexicographical order in the UI 
> Stages Tab. It would be better to sort in numeric order.






[jira] [Created] (SPARK-40571) Construct a test case to verify fault-tolerance semantic with random python worker failures

2022-09-26 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-40571:


 Summary: Construct a test case to verify fault-tolerance semantic 
with random python worker failures
 Key: SPARK-40571
 URL: https://issues.apache.org/jira/browse/SPARK-40571
 Project: Spark
  Issue Type: Sub-task
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Jungtaek Lim


We'd like to make sure the fault-tolerance semantics are respected with random 
failures of the Python worker.






[jira] [Created] (SPARK-40570) Add doc for Docker Setup in standalone mode

2022-09-26 Thread Qian Sun (Jira)
Qian Sun created SPARK-40570:


 Summary: Add doc for Docker Setup in standalone mode
 Key: SPARK-40570
 URL: https://issues.apache.org/jira/browse/SPARK-40570
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Created] (SPARK-40569) Expose port for spark standalone mode

2022-09-26 Thread Qian Sun (Jira)
Qian Sun created SPARK-40569:


 Summary: Expose port for spark standalone mode
 Key: SPARK-40569
 URL: https://issues.apache.org/jira/browse/SPARK-40569
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Qian Sun









[jira] [Assigned] (SPARK-35242) Support change catalog default database for spark

2022-09-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-35242:
---

Assignee: Gabor Roczei

> Support change catalog default database for spark
> -
>
> Key: SPARK-35242
> URL: https://issues.apache.org/jira/browse/SPARK-35242
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: hong dongdong
>Assignee: Gabor Roczei
>Priority: Major
> Fix For: 3.4.0
>
>
> The Spark catalog default database can only be 'default'. When we cannot 
> access 'default', we get a 'Permission denied' exception. We should support 
> changing the default database for the catalog, as 'jdbc/thrift' does.






[jira] [Resolved] (SPARK-35242) Support change catalog default database for spark

2022-09-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-35242.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37679
[https://github.com/apache/spark/pull/37679]

> Support change catalog default database for spark
> -
>
> Key: SPARK-35242
> URL: https://issues.apache.org/jira/browse/SPARK-35242
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: hong dongdong
>Priority: Major
> Fix For: 3.4.0
>
>
> The Spark catalog default database can only be 'default'. When we cannot 
> access 'default', we get a 'Permission denied' exception. We should support 
> changing the default database for the catalog, as 'jdbc/thrift' does.






[jira] [Assigned] (SPARK-40536) Make Spark Connect port configurable.

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40536:


Assignee: Apache Spark

> Make Spark Connect port configurable.
> -
>
> Key: SPARK-40536
> URL: https://issues.apache.org/jira/browse/SPARK-40536
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Minor
>
> Make Spark Connect port configurable.






[jira] [Commented] (SPARK-40536) Make Spark Connect port configurable.

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609718#comment-17609718
 ] 

Apache Spark commented on SPARK-40536:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38006

> Make Spark Connect port configurable.
> -
>
> Key: SPARK-40536
> URL: https://issues.apache.org/jira/browse/SPARK-40536
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Minor
>
> Make Spark Connect port configurable.






[jira] [Assigned] (SPARK-40536) Make Spark Connect port configurable.

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40536:


Assignee: (was: Apache Spark)

> Make Spark Connect port configurable.
> -
>
> Key: SPARK-40536
> URL: https://issues.apache.org/jira/browse/SPARK-40536
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Minor
>
> Make Spark Connect port configurable.






[jira] [Commented] (SPARK-40550) DataSource V2: Handle DELETE commands for delta-based sources

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609691#comment-17609691
 ] 

Apache Spark commented on SPARK-40550:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/38005

> DataSource V2: Handle DELETE commands for delta-based sources
> -
>
> Key: SPARK-40550
> URL: https://issues.apache.org/jira/browse/SPARK-40550
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> We need to support DELETE operations for delta-based sources per approved 
> SPIP.






[jira] [Commented] (SPARK-40550) DataSource V2: Handle DELETE commands for delta-based sources

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609690#comment-17609690
 ] 

Apache Spark commented on SPARK-40550:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/38005

> DataSource V2: Handle DELETE commands for delta-based sources
> -
>
> Key: SPARK-40550
> URL: https://issues.apache.org/jira/browse/SPARK-40550
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> We need to support DELETE operations for delta-based sources per approved 
> SPIP.






[jira] [Assigned] (SPARK-40550) DataSource V2: Handle DELETE commands for delta-based sources

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40550:


Assignee: (was: Apache Spark)

> DataSource V2: Handle DELETE commands for delta-based sources
> -
>
> Key: SPARK-40550
> URL: https://issues.apache.org/jira/browse/SPARK-40550
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> We need to support DELETE operations for delta-based sources per approved 
> SPIP.






[jira] [Assigned] (SPARK-40550) DataSource V2: Handle DELETE commands for delta-based sources

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40550:


Assignee: Apache Spark

> DataSource V2: Handle DELETE commands for delta-based sources
> -
>
> Key: SPARK-40550
> URL: https://issues.apache.org/jira/browse/SPARK-40550
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Assignee: Apache Spark
>Priority: Major
>
> We need to support DELETE operations for delta-based sources per approved 
> SPIP.






[jira] [Assigned] (SPARK-40551) DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40551:


Assignee: Apache Spark

> DataSource V2: Add APIs for delta-based row-level operations
> 
>
> Key: SPARK-40551
> URL: https://issues.apache.org/jira/browse/SPARK-40551
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Assignee: Apache Spark
>Priority: Major
>
> Add DataSource V2 APIs for handling delta-based row-level operations per 
> approved SPIP.






[jira] [Commented] (SPARK-40551) DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609688#comment-17609688
 ] 

Apache Spark commented on SPARK-40551:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/38004

> DataSource V2: Add APIs for delta-based row-level operations
> 
>
> Key: SPARK-40551
> URL: https://issues.apache.org/jira/browse/SPARK-40551
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> Add DataSource V2 APIs for handling delta-based row-level operations per 
> approved SPIP.






[jira] [Assigned] (SPARK-40551) DataSource V2: Add APIs for delta-based row-level operations

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40551:


Assignee: (was: Apache Spark)

> DataSource V2: Add APIs for delta-based row-level operations
> 
>
> Key: SPARK-40551
> URL: https://issues.apache.org/jira/browse/SPARK-40551
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> Add DataSource V2 APIs for handling delta-based row-level operations per 
> approved SPIP.






[jira] [Commented] (SPARK-40358) Migrate collection type check failures onto error classes

2022-09-26 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609592#comment-17609592
 ] 

Max Gekk commented on SPARK-40358:
--

[~lvshaokang] Sure, go ahead.

> Migrate collection type check failures onto error classes
> -
>
> Key: SPARK-40358
> URL: https://issues.apache.org/jira/browse/SPARK-40358
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in collection 
> expressions:
> 1. BinaryArrayExpressionWithImplicitCast (1): 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69]
> 2. MapContainsKey (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237
> 3. MapConcat (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663
> 4. MapFromEntries (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801






[jira] [Commented] (SPARK-40358) Migrate collection type check failures onto error classes

2022-09-26 Thread Shaokang Lv (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609584#comment-17609584
 ] 

Shaokang Lv commented on SPARK-40358:
-

Hi [~maxgekk], I'd like to pick this up if possible.

> Migrate collection type check failures onto error classes
> -
>
> Key: SPARK-40358
> URL: https://issues.apache.org/jira/browse/SPARK-40358
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in collection 
> expressions:
> 1. BinaryArrayExpressionWithImplicitCast (1): 
> [https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L69]
> 2. MapContainsKey (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L231-L237
> 3. MapConcat (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L663
> 4. MapFromEntries (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L801






[jira] [Assigned] (SPARK-40357) Migrate window type check failures onto error classes

2022-09-26 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-40357:


Assignee: Shaokang Lv

> Migrate window type check failures onto error classes
> -
>
> Key: SPARK-40357
> URL: https://issues.apache.org/jira/browse/SPARK-40357
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Shaokang Lv
>Priority: Major
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in window 
> expressions:
> 1. WindowSpecDefinition (4): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L68-L85
> 2. SpecifiedWindowFrame (3): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L216-L231
> 3. checkBoundary (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L264-L269
> 4. FrameLessOffsetWindowFunction (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L424






[jira] [Resolved] (SPARK-40357) Migrate window type check failures onto error classes

2022-09-26 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40357.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37986
[https://github.com/apache/spark/pull/37986]

> Migrate window type check failures onto error classes
> -
>
> Key: SPARK-40357
> URL: https://issues.apache.org/jira/browse/SPARK-40357
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Shaokang Lv
>Priority: Major
> Fix For: 3.4.0
>
>
> Replace TypeCheckFailure by DataTypeMismatch in type checks in window 
> expressions:
> 1. WindowSpecDefinition (4): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L68-L85
> 2. SpecifiedWindowFrame (3): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L216-L231
> 3. checkBoundary (2): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L264-L269
> 4. FrameLessOffsetWindowFunction (1): 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L424






[jira] [Commented] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609536#comment-17609536
 ] 

zzzzming95 commented on SPARK-40563:


I can reproduce this problem. I can try to fix this issue if no one else is 
working on it. :)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark verison 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postrgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'foo';  -> works as expected*
> *--*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'default'; -> query throw ex;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the sql case returns 'foo' -> unapply receives (t_name = 'foo') as an 
> instance of Predicate
> when the sql case returns 'default' -> unapply receives COALESCE(t_name = 
> 'foo', FALSE) as an instance of GeneralScalarExpression and the assertion 
> fails with an error
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40568) Spark Streaming support Debezium

2022-09-26 Thread melin (Jira)
melin created SPARK-40568:
-

 Summary: Spark Streaming support Debezium
 Key: SPARK-40568
 URL: https://issues.apache.org/jira/browse/SPARK-40568
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: melin


Debezium is a very popular CDC technology. If Spark Structured Streaming 
supported Debezium, it would make writing change data to data lakes much easier.

The most commonly used solution today is Flink CDC; hopefully Spark can support 
this as well.
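
For what it's worth, Debezium change events can already be consumed today by pairing the Kafka source with manual parsing of the Debezium JSON envelope; native support would remove this boilerplate. A minimal sketch, where the topic, servers and row schema are illustrative and the payload is assumed to be schema-less Debezium JSON (before/after/op fields):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("debezium-sketch").getOrCreate()

// Debezium envelope: "before"/"after" row images plus an "op" code
// (c = create, u = update, d = delete). Only the fields we need are modeled.
val row = new StructType().add("id", LongType).add("name", StringType)
val envelope = new StructType()
  .add("before", row)
  .add("after", row)
  .add("op", StringType)

val changes = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // illustrative
  .option("subscribe", "dbserver1.inventory.customers") // illustrative topic
  .load()
  .select(from_json(col("value").cast("string"), envelope).as("e"))
  .select("e.op", "e.after.*")

changes.writeStream.format("console").start().awaitTermination()
{code}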



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40552) Upgrade protobuf-python from 4.21.5 to 4.21.6

2022-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-40552:
-
Priority: Minor  (was: Major)

> Upgrade protobuf-python from 4.21.5 to 4.21.6
> -
>
> Key: SPARK-40552
> URL: https://issues.apache.org/jira/browse/SPARK-40552
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Minor
> Fix For: 3.4.0
>
>
> [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941]
> [Github|https://github.com/advisories/GHSA-8gq9-2x98-w8hf]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40552) Upgrade protobuf-python from 4.21.5 to 4.21.6

2022-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40552.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37991
[https://github.com/apache/spark/pull/37991]

> Upgrade protobuf-python from 4.21.5 to 4.21.6
> -
>
> Key: SPARK-40552
> URL: https://issues.apache.org/jira/browse/SPARK-40552
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
> Fix For: 3.4.0
>
>
> [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941]
> [Github|https://github.com/advisories/GHSA-8gq9-2x98-w8hf]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40552) Upgrade protobuf-python from 4.21.5 to 4.21.6

2022-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-40552:
-
Component/s: Build

> Upgrade protobuf-python from 4.21.5 to 4.21.6
> -
>
> Key: SPARK-40552
> URL: https://issues.apache.org/jira/browse/SPARK-40552
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build, Connect
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Minor
> Fix For: 3.4.0
>
>
> [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941]
> [Github|https://github.com/advisories/GHSA-8gq9-2x98-w8hf]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40552) Upgrade protobuf-python from 4.21.5 to 4.21.6

2022-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40552:


Assignee: Bjørn Jørgensen

> Upgrade protobuf-python from 4.21.5 to 4.21.6
> -
>
> Key: SPARK-40552
> URL: https://issues.apache.org/jira/browse/SPARK-40552
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
>
> [CVE-2022-1941|https://nvd.nist.gov/vuln/detail/CVE-2022-1941]
> [Github|https://github.com/advisories/GHSA-8gq9-2x98-w8hf]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40478) Add create datasource table options docs

2022-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-40478.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37919
[https://github.com/apache/spark/pull/37919]

> Add create datasource table options docs
> 
>
> Key: SPARK-40478
> URL: https://issues.apache.org/jira/browse/SPARK-40478
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40478) Add create datasource table options docs

2022-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-40478:


Assignee: XiDuo You

> Add create datasource table options docs
> 
>
> Key: SPARK-40478
> URL: https://issues.apache.org/jira/browse/SPARK-40478
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2022-09-26 Thread John Pellman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609507#comment-17609507
 ] 

John Pellman edited comment on SPARK-12216 at 9/26/22 1:42 PM:
---

Just as another data point, it appears that a variant of this issue also rears 
its head on GNU/Linux (Debian 10, Spark 3.1.2, Scala 2.12.14) if you set your temp 
directory to be on an NFS mount:

{code}
22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception 
while deleting Spark temp dir: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3
java.io.IOException: Failed to delete: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1141)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4$adapted(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$2(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}

The problem in this case seems to be that {{spark-shell}} is attempting to do a 
recursive unlink while files are still open (NFS client-side [silly 
renames|http://nfs.sourceforge.net/#faq_d2]).  It looks like this overall issue 
might be less of a "weird Windows thing" and more of an issue with spark-shell 
not waiting until all file handles are closed before attempting to remove the 
temp dir.  This behavior cannot be reproduced consistently and appears to be 
non-deterministic.

The obvious workaround here is to not put temp directories on NFS, but it does 
seem like you're relying upon file handling behavior that is specific to how 
Linux behaves using non-NFS volumes rather than doing a sanity check within 
spark-shell/scala (which might not be a bad idea).


was (Author: jpellman):
Just as another data point, it appears that a variant of this issue also rears 
its head on GNU/Linux (Debian 10, Spark 3.1.2, Scala 2.12.14) if you set your temp 
directory to be on an NFS mount:

{code}
22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception 
while deleting Spark temp dir: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3
java.io.IOException: Failed to delete: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2022-09-26 Thread John Pellman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609507#comment-17609507
 ] 

John Pellman edited comment on SPARK-12216 at 9/26/22 1:40 PM:
---

Just as another data point, it appears that a variant of this issue also rears 
its head on GNU/Linux (Debian 10, Spark 3.1.2, Scala 2.12.14) if you set your temp 
directory to be on an NFS mount:

{code}
22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception 
while deleting Spark temp dir: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3
java.io.IOException: Failed to delete: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1141)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4$adapted(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$2(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}

The problem in this case seems to be that {{spark-shell}} is attempting to do a 
recursive unlink while files are still open (NFS client-side [silly 
renames|http://nfs.sourceforge.net/#faq_d2]).  It looks like this overall issue 
might be less of a "weird Windows thing" and more of an issue with spark-shell 
not waiting until all file handles are closed before attempting to remove the 
temp dir.  This behavior cannot be reproduced consistently and appears to be 
non-deterministic.

The obvious workaround here is to not put temp directories on NFS, but it does 
seem like you're relying upon file handling behavior that is specific to Linux 
rather than doing a sanity check within spark-shell/scala (which might not be a 
bad idea).


was (Author: jpellman):
Just as another data point, it appears that a variant of this issue also rears 
its head on GNU/Linux (Debian 10, Spark 3.1.2, Scala 2.12.14) if you set your temp 
directory to be on an NFS mount:

{code}
22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception 
while deleting Spark temp dir: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3
java.io.IOException: Failed to delete: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 

[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2022-09-26 Thread John Pellman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609507#comment-17609507
 ] 

John Pellman commented on SPARK-12216:
--

Just as another data point, it appears that a variant of this issue also rears 
its head on GNU/Linux (Debian 10, Spark 3.1.2, Scala 2.12.14) if you set your temp 
directory to be on an NFS mount:

{code}
22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception 
while deleting Spark temp dir: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3
java.io.IOException: Failed to delete: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1141)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4$adapted(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$2(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}

The problem in this case seems to be that {{spark-shell}} is attempting to do a 
recursive unlink while files are still open (NFS client-side [silly 
renames|http://nfs.sourceforge.net/#faq_d2]).  It looks like this overall issue 
might be less of a "weird Windows thing" and more of an issue with spark-shell 
not waiting until all file handles are closed before attempting to remove the 
temp dir.  This behavior cannot be reproduced consistently and appears to be 
non-deterministic.

The obvious workaround here is to not put temp directories on NFS, but it does 
seem like you're relying upon Linux to block the recursive unlink until all 
file handles are closed rather than doing a sanity check within 
spark-shell/scala (which might not be a bad idea).
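
As a stopgap, keeping Spark's scratch space off NFS sidesteps the race entirely. A minimal sketch of that workaround, with an assumed local path:

{code}
import org.apache.spark.sql.SparkSession

// Keep Spark's temp/scratch directories on a local filesystem so the
// shutdown hook's recursive delete never races NFS ".nfsXXXX" silly-rename
// files left behind by still-open handles.
val spark = SparkSession.builder()
  .appName("local-scratch")
  .config("spark.local.dir", "/var/tmp/spark-scratch") // assumed local, non-NFS path
  .getOrCreate()
{code}

The same can be done for spark-shell with --conf spark.local.dir=/var/tmp/spark-scratch.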

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR 

[jira] [Created] (SPARK-40567) SharedState to redact secrets when propagating them to HadoopConf

2022-09-26 Thread Steve Loughran (Jira)
Steve Loughran created SPARK-40567:
--

 Summary: SharedState to redact secrets when propagating them to 
HadoopConf
 Key: SPARK-40567
 URL: https://issues.apache.org/jira/browse/SPARK-40567
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Steve Loughran



When SharedState propagates (key, value) pairs from initialConfigs to 
HadoopConf, it logs the values at debug level.

If the config contains secrets (cloud credentials, etc.), the log will contain 
them.

The org.apache.hadoop.conf.ConfigRedactor class will redact values of all keys 
matching a pattern in "hadoop.security.sensitive-config-keys"; this is 
configured by default to be


{code}
  "secret$",
  "password$",
  "ssl.keystore.pass$",
  "fs.s3.*[Ss]ecret.?[Kk]ey",
  "fs.s3a.*.server-side-encryption.key",
  "fs.s3a.encryption.algorithm",
  "fs.s3a.encryption.key",
  "fs.azure\\.account.key.*",
  "credential$",
  "oauth.*secret",
  "oauth.*password",
  "oauth.*token",
"hadoop.security.sensitive-config-keys"
{code}

It may also be extended in site configs or future Hadoop releases.

Spark should use the redactor here for log hygiene and security.
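
A minimal sketch of the idea, assuming Hadoop's ConfigRedactor API (constructor taking a Configuration, and redact(key, value) returning the masked value for sensitive keys); initialConfigs and the println stand in for SharedState's member and logger:

{code}
import org.apache.hadoop.conf.{Configuration, ConfigRedactor}

// Stand-in for SharedState.initialConfigs (illustrative values).
val initialConfigs = Map(
  "fs.s3a.secret.key" -> "SUPER-SECRET",
  "spark.sql.warehouse.dir" -> "/tmp/warehouse")

val hadoopConf = new Configuration()
val redactor = new ConfigRedactor(hadoopConf)

initialConfigs.foreach { case (k, v) =>
  hadoopConf.set(k, v)
  // Log the redacted value, never the raw one; "fs.s3a.secret.key" matches
  // the default "fs.s3.*[Ss]ecret.?[Kk]ey" pattern and gets masked.
  println(s"Applying initial config: $k = ${redactor.redact(k, v)}")
}
{code}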





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40566) Add showIndex function

2022-09-26 Thread KaiXinXIaoLei (Jira)
KaiXinXIaoLei created SPARK-40566:
-

 Summary: Add showIndex function 
 Key: SPARK-40566
 URL: https://issues.apache.org/jira/browse/SPARK-40566
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.3.0
Reporter: KaiXinXIaoLei


I find that there isn't a showIndex function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40565) Non-deterministic filters shouldn't get pushed to V2 file sources

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40565:


Assignee: Apache Spark

> Non-deterministic filters shouldn't get pushed to V2 file sources
> -
>
> Key: SPARK-40565
> URL: https://issues.apache.org/jira/browse/SPARK-40565
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Adam Binford
>Assignee: Apache Spark
>Priority: Major
>
> Currently non-deterministic filters can be pushed down to V2 file sources, 
> which is different from V1, which prevents non-deterministic filters from 
> being pushed down.
> Main consequences:
>  * Things like doing a rand filter on a partition column will throw an 
> exception:
>  ** {{IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.}}
>  * {{Using a non-deterministic UDF to collect metrics via accumulators gets 
> pushed down and gives the wrong metrics}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40565) Non-deterministic filters shouldn't get pushed to V2 file sources

2022-09-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609489#comment-17609489
 ] 

Apache Spark commented on SPARK-40565:
--

User 'Kimahriman' has created a pull request for this issue:
https://github.com/apache/spark/pull/38003

> Non-deterministic filters shouldn't get pushed to V2 file sources
> -
>
> Key: SPARK-40565
> URL: https://issues.apache.org/jira/browse/SPARK-40565
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Adam Binford
>Priority: Major
>
> Currently non-deterministic filters can be pushed down to V2 file sources, 
> which is different from V1, which prevents non-deterministic filters from 
> being pushed down.
> Main consequences:
>  * Things like doing a rand filter on a partition column will throw an 
> exception:
>  ** {{IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.}}
>  * {{Using a non-deterministic UDF to collect metrics via accumulators gets 
> pushed down and gives the wrong metrics}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40565) Non-deterministic filters shouldn't get pushed to V2 file sources

2022-09-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40565:


Assignee: (was: Apache Spark)

> Non-deterministic filters shouldn't get pushed to V2 file sources
> -
>
> Key: SPARK-40565
> URL: https://issues.apache.org/jira/browse/SPARK-40565
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Adam Binford
>Priority: Major
>
> Currently non-deterministic filters can be pushed down to V2 file sources, 
> which is different from V1, which prevents non-deterministic filters from 
> being pushed down.
> Main consequences:
>  * Things like doing a rand filter on a partition column will throw an 
> exception:
>  ** {{IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.}}
>  * {{Using a non-deterministic UDF to collect metrics via accumulators gets 
> pushed down and gives the wrong metrics}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master

2022-09-26 Thread YuNing Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuNing Liu updated SPARK-40564:
---
Description: When I ran my program with the Dataframe structure in 
Pyspark.PANDAS, there is an abnormal extra process on the master. My dataframe 
contains three columns named "id", "path", and "category". It contains more 
than 300,000 pieces of data in total, and the "id" values are only 1, 2, 3, and 
4. When I use "groupBy (" id ").apply(func)", my four nodes run normally, but 
there is an abnormal process in the master, which contains 1001 pieces of data. 
This process also executes the code in "func" and is divided into four parts, 
each part contains more than 200 pieces of data. When I collect the results in 
each node, I can only collect the results of 1001 data points, and the results 
of 300,000 data points are lost. When I tried to reduce the number of data to 
about 20,000, this problem still occurred and the data volume was still 1001. I 
suspect there is a problem with the implementation of this API. I tried setting 
the number of data partitions to 4, but the problem didn't go away. The value of 
the dataframe, part of the code, and the output of the exception process are 
attached.  (was: When I ran my program with the Dataframe structure in 
Pyspark.PANDAS, there is an abnormal extra process on the master. My dataframe 
contains three columns named "id", "path", and "category". It contains more 
than 300,000 pieces of data in total, and the "id" values are only 1, 2, 3, and 
4. When I use "groupBy (" id ").apply(func)", my four nodes run normally, but 
there is an abnormal process in the master, which contains 1001 pieces of data. 
This process also executes the code in "func" and is divided into four parts, 
each part contains more than 200 pieces of data. When I collect the results in 
each node, I can only collect the results of 1001 data points, and the results 
of 300,000 data points are lost. When I tried to reduce the number of data to 
about 20,000, this problem still occurred and the data volume was still 1001. I 
suspect there is a problem with the implementation of this API. I tried setting 
the number of data partitions to 4, but the problem didn't go away.)

> The distributed runtime has one more identical process with a small amount of 
> data on the master
> 
>
> Key: SPARK-40564
> URL: https://issues.apache.org/jira/browse/SPARK-40564
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Hadoop 3.3.1
> Python 3.8
> Spark 3.3.0
> pyspark 3.3.0
> ubuntu 20.04
>Reporter: YuNing Liu
>Priority: Blocker
> Attachments: Part of the code.png, The output of the abnormal 
> process.png, Value of df.png
>
>
> When I ran my program with the Dataframe structure in Pyspark.PANDAS, there 
> is an abnormal extra process on the master. My dataframe contains three 
> columns named "id", "path", and "category". It contains more than 300,000 
> pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I 
> use "groupBy (" id ").apply(func)", my four nodes run normally, but there is 
> an abnormal process in the master, which contains 1001 pieces of data. This 
> process also executes the code in "func" and is divided into four parts, each 
> part contains more than 200 pieces of data. When I collect the results in 
> each node, I can only collect the results of 1001 data points, and the 
> results of 300,000 data points are lost. When I tried to reduce the number of 
> data to about 20,000, this problem still occurred and the data volume was 
> still 1001. I suspect there is a problem with the implementation of this 
> API. I tried setting the number of data partitions to 4, but the problem 
> didn't go away. The value of the dataframe, part of the code, and the output 
> of the exception process are attached.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master

2022-09-26 Thread YuNing Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuNing Liu updated SPARK-40564:
---
Attachment: Part of the code.png

> The distributed runtime has one more identical process with a small amount of 
> data on the master
> 
>
> Key: SPARK-40564
> URL: https://issues.apache.org/jira/browse/SPARK-40564
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Hadoop 3.3.1
> Python 3.8
> Spark 3.3.0
> pyspark 3.3.0
> ubuntu 20.04
>Reporter: YuNing Liu
>Priority: Blocker
> Attachments: Part of the code.png, The output of the abnormal 
> process.png, Value of df.png
>
>
> When I ran my program with the Dataframe structure in Pyspark.PANDAS, there 
> is an abnormal extra process on the master. My dataframe contains three 
> columns named "id", "path", and "category". It contains more than 300,000 
> pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I 
> use "groupBy (" id ").apply(func)", my four nodes run normally, but there is 
> an abnormal process in the master, which contains 1001 pieces of data. This 
> process also executes the code in "func" and is divided into four parts, each 
> part contains more than 200 pieces of data. When I collect the results in 
> each node, I can only collect the results of 1001 data points, and the 
> results of 300,000 data points are lost. When I tried to reduce the number of 
> data to about 20,000, this problem still occurred and the data volume was 
> still 1001. I suspect there is a problem with the implementation of this 
> API. I tried setting the number of data partitions to 4, but the problem 
> didn't go away.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master

2022-09-26 Thread YuNing Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuNing Liu updated SPARK-40564:
---
Attachment: Value of df.png

> The distributed runtime has one more identical process with a small amount of 
> data on the master
> 
>
> Key: SPARK-40564
> URL: https://issues.apache.org/jira/browse/SPARK-40564
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Hadoop 3.3.1
> Python 3.8
> Spark 3.3.0
> pyspark 3.3.0
> ubuntu 20.04
>Reporter: YuNing Liu
>Priority: Blocker
> Attachments: Part of the code.png, The output of the abnormal 
> process.png, Value of df.png
>
>
> When I ran my program with the Dataframe structure in Pyspark.PANDAS, there 
> is an abnormal extra process on the master. My dataframe contains three 
> columns named "id", "path", and "category". It contains more than 300,000 
> pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I 
> use "groupBy (" id ").apply(func)", my four nodes run normally, but there is 
> an abnormal process in the master, which contains 1001 pieces of data. This 
> process also executes the code in "func" and is divided into four parts, each 
> part contains more than 200 pieces of data. When I collect the results in 
> each node, I can only collect the results of 1001 data points, and the 
> results of 300,000 data points are lost. When I tried to reduce the number of 
> data to about 20,000, this problem still occurred and the data volume was 
> still 1001. I suspect there is a problem with the implementation of this 
> API. I tried setting the number of data partitions to 4, but the problem 
> didn't go away.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master

2022-09-26 Thread YuNing Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuNing Liu updated SPARK-40564:
---
Attachment: The output of the abnormal process.png

> The distributed runtime has one more identical process with a small amount of 
> data on the master
> 
>
> Key: SPARK-40564
> URL: https://issues.apache.org/jira/browse/SPARK-40564
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Hadoop 3.3.1
> Python 3.8
> Spark 3.3.0
> pyspark 3.3.0
> ubuntu 20.04
>Reporter: YuNing Liu
>Priority: Blocker
> Attachments: Part of the code.png, The output of the abnormal 
> process.png, Value of df.png
>
>
> When I ran my program with the Dataframe structure in Pyspark.PANDAS, there 
> is an abnormal extra process on the master. My dataframe contains three 
> columns named "id", "path", and "category". It contains more than 300,000 
> pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I 
> use "groupBy (" id ").apply(func)", my four nodes run normally, but there is 
> an abnormal process in the master, which contains 1001 pieces of data. This 
> process also executes the code in "func" and is divided into four parts, each 
> part contains more than 200 pieces of data. When I collect the results in 
> each node, I can only collect the results of 1001 data points, and the 
> results of 300,000 data points are lost. When I tried to reduce the number of 
> data to about 20,000, this problem still occurred and the data volume was 
> still 1001. I suspect there is a problem with the implementation of this 
> API. I tried setting the number of data partitions to 4, but the problem 
> didn't go away.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master

2022-09-26 Thread YuNing Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuNing Liu updated SPARK-40564:
---
Attachment: (was: 2022-09-26 20-54-37 的屏幕截图.png)

> The distributed runtime has one more identical process with a small amount of 
> data on the master
> 
>
> Key: SPARK-40564
> URL: https://issues.apache.org/jira/browse/SPARK-40564
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Hadoop 3.3.1
> Python 3.8
> Spark 3.3.0
> pyspark 3.3.0
> ubuntu 20.04
>Reporter: YuNing Liu
>Priority: Blocker
>
> When I ran my program with the Dataframe structure in Pyspark.PANDAS, there 
> is an abnormal extra process on the master. My dataframe contains three 
> columns named "id", "path", and "category". It contains more than 300,000 
> pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I 
> use "groupBy (" id ").apply(func)", my four nodes run normally, but there is 
> an abnormal process in the master, which contains 1001 pieces of data. This 
> process also executes the code in "func" and is divided into four parts, each 
> part contains more than 200 pieces of data. When I collect the results in 
> each node, I can only collect the results of 1001 data points, and the 
> results of 300,000 data points are lost. When I tried to reduce the number of 
> data to about 20,000, this problem still occurred and the data volume was 
> still 1001. I suspect there is a problem with the implementation of this 
> API. I tried setting the number of data partitions to 4, but the problem 
> didn't go away.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40565) Non-deterministic filters shouldn't get pushed to V2 file sources

2022-09-26 Thread Adam Binford (Jira)
Adam Binford created SPARK-40565:


 Summary: Non-deterministic filters shouldn't get pushed to V2 file 
sources
 Key: SPARK-40565
 URL: https://issues.apache.org/jira/browse/SPARK-40565
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Adam Binford


Currently non-deterministic filters can be pushed down to V2 file sources, 
which is different from V1, which prevents non-deterministic filters from 
being pushed down.

Main consequences:
 * Things like doing a rand filter on a partition column will throw an 
exception:
 ** {{IllegalArgumentException: requirement failed: Nondeterministic expression 
org.apache.spark.sql.catalyst.expressions.Rand should be initialized before 
eval.}}
 * {{Using a non-deterministic UDF to collect metrics via accumulators gets 
pushed down and gives the wrong metrics}}
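
A minimal sketch that reproduces the partition-column case, assuming that clearing spark.sql.sources.useV1SourceList forces the V2 parquet path; the output path is illustrative:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.rand

val spark = SparkSession.builder()
  .appName("spark-40565-repro")
  .master("local[*]")
  // Empty list => no sources fall back to V1, so parquet uses the V2 path.
  .config("spark.sql.sources.useV1SourceList", "")
  .getOrCreate()
import spark.implicits._

// Write a small dataset partitioned by `part`.
Seq((1, "a"), (2, "b"), (3, "a")).toDF("id", "part")
  .write.partitionBy("part").mode("overwrite").parquet("/tmp/spark-40565")

// A sampling filter alongside a partition-column filter: per the report,
// under V2 the non-deterministic conjunct is pushed into partition pruning
// and fails with "Nondeterministic expression ... Rand should be
// initialized before eval"; V1 keeps it in the plan and this works.
spark.read.parquet("/tmp/spark-40565")
  .filter($"part" === "a" && rand() < 0.5)
  .show()
{code}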



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master

2022-09-26 Thread YuNing Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YuNing Liu updated SPARK-40564:
---
Remaining Estimate: (was: 672h)
 Original Estimate: (was: 672h)

> The distributed runtime has one more identical process with a small amount of 
> data on the master
> 
>
> Key: SPARK-40564
> URL: https://issues.apache.org/jira/browse/SPARK-40564
> Project: Spark
>  Issue Type: Question
>  Components: PySpark
>Affects Versions: 3.3.0
> Environment: Hadoop 3.3.1
> Python 3.8
> Spark 3.3.0
> pyspark 3.3.0
> ubuntu 20.04
>Reporter: YuNing Liu
>Priority: Blocker
>
> When I ran my program with the Dataframe structure in Pyspark.PANDAS, there 
> was an extra exception in the master node. My dataframe contains three 
> columns named "ID", "Path", and "Category". It contains more than 300,000 
> pieces of data in total, and the "id" values are only 1, 2, 3, and 4. When I 
> use "groupBy (" id ").apply(func)", my four nodes run normally, but there is 
> an abnormal process in the master, which contains 1001 pieces of data. This 
> process also executes the code in "func" and is divided into four parts, each 
> part contains more than 200 pieces of data. When I collect the results in 
> each node, I can only collect the results of 1001 data points, and the 
> results of 300,000 data points are lost. When I tried to reduce the number of 
> data to about 20,000, this problem still occurred and the data volume was 
> still 1001. I suspect there is a problem with the implementation of this 
> API. I tried setting the number of data partitions to 4, but the problem 
> didn't go away.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40564) The distributed runtime has one more identical process with a small amount of data on the master

2022-09-26 Thread YuNing Liu (Jira)
YuNing Liu created SPARK-40564:
--

 Summary: The distributed runtime has one more identical process 
with a small amount of data on the master
 Key: SPARK-40564
 URL: https://issues.apache.org/jira/browse/SPARK-40564
 Project: Spark
  Issue Type: Question
  Components: PySpark
Affects Versions: 3.3.0
 Environment: Hadoop 3.3.1

Python 3.8

Spark 3.3.0

pyspark 3.3.0

ubuntu 20.04
Reporter: YuNing Liu


When I ran my program with the Dataframe structure in Pyspark.PANDAS, there was 
an extra exception in the master node. My dataframe contains three columns 
named "ID", "Path", and "Category". It contains more than 300,000 pieces of 
data in total, and the "id" values are only 1, 2, 3, and 4. When I use "groupBy 
(" id ").apply(func)", my four nodes run normally, but there is an abnormal 
process in the master, which contains 1001 pieces of data. This process also 
executes the code in "func" and is divided into four parts, each part contains 
more than 200 pieces of data. When I collect the results in each node, I can 
only collect the results of 1001 data points, and the results of 300,000 data 
points are lost. When I tried to reduce the number of data to about 20,000, 
this problem still occurred and the data volume was still 1001. I suspect there 
is a problem with the implementation of this API.I tried setting the number of 
data partitions to 4, but the problem didn't go away.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40560) Rename message to messageFormat in the STANDARD format of errors

2022-09-26 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40560.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37997
[https://github.com/apache/spark/pull/37997]

> Rename message to messageFormat in the STANDARD format of errors
> 
>
> Key: SPARK-40560
> URL: https://issues.apache.org/jira/browse/SPARK-40560
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Rename the field in the JSON format `STANDARD` because it contains a format 
> actually.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: java-code-example.txt

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java 8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'foo';  -> works as expected*
> *--*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'default'; -> query throw ex;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the sql case returns 'foo' -> unapply receives (t_name = 'foo') as an 
> instance of Predicate
> when the sql case returns 'default' -> unapply receives COALESCE(t_name = 
> 'foo', FALSE) as an instance of GeneralScalarExpression and the assertion 
> fails with an error
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: (was: java-code-example.txt)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java 8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'foo';  -> works as expected*
> *--*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'default'; -> query throw ex;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the sql case returns 'foo' -> unapply receives (t_name = 'foo') as an 
> instance of Predicate
> when the sql case returns 'default' -> unapply receives COALESCE(t_name = 
> 'foo', FALSE) as an instance of GeneralScalarExpression and the assertion 
> fails with an error
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: sql.txt

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java 8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'foo';  -> works as expected*
> *--*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'default'; -> query throw ex;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the sql case returns 'foo' -> unapply receives (t_name = 'foo') as an 
> instance of Predicate
> when the sql case returns 'default' -> unapply receives COALESCE(t_name = 
> 'foo', FALSE) as an instance of GeneralScalarExpression and the assertion 
> fails with an error
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Description: 
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java 8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'default'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'default'
    end *= 'foo';  -> works as expected*

*--*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'default'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'default'
    end *= 'default'; -> query throw ex;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the sql case returns 'foo' -> unapply receives (t_name = 'foo') as an 
instance of Predicate
when the sql case returns 'default' -> unapply receives COALESCE(t_name = 
'foo', FALSE) as an instance of GeneralScalarExpression and the assertion 
fails with an error

 

  was:
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java 8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'defualt_name'
    end *= 'foo';  -> works as expected*

*--*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'defualt_name'
    end *= 'defualt_name'; -> query throw ex;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the sql case returns 'foo' -> unapply receives (t_name#1 = foo) as an 
instance of Predicate
when the sql case returns 'else_will_throw_ex' -> unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression and 
the assertion fails with an error

 


> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'foo';  -> works as expected*
> *--*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'default'; -> query throws an exception;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at 

[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: (was: sql.txt)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'foo';  -> works as expected*
> *--*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'default'
>     end *= 'default'; -> query throws an exception;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name = 'foo') as an 
> instance of Predicate;
> when the SQL case returns 'default', unapply receives COALESCE(t_name = 
> 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion 
> fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Description: 
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'defualt_name'
    end *= 'foo';  -> works as expected*

*--*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'defualt_name'
    end *= 'defualt_name'; -> query throws an exception;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.
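
For readers without the Spark sources at hand: the assert named in the stack 
trace sits in PushablePredicate in DataSourceV2Strategy.scala. The extractor 
looks roughly like the sketch below (a paraphrase, not a verbatim copy of the 
Spark 3.3 code), which shows why an expression that builds to a 
GeneralScalarExpression instead of a Predicate crashes the optimizer rather 
than simply not being pushed down:

// Paraphrased sketch of Spark 3.3's PushablePredicate; illustrative only.
object PushablePredicate {
  def unapply(e: Expression): Option[Predicate] =
    new V2ExpressionBuilder(e, true).build().map { v =>
      // COALESCE(t_name = 'foo', FALSE) builds as a GeneralScalarExpression,
      // not a Predicate, so this assert fails instead of returning None.
      assert(v.isInstanceOf[Predicate])
      v.asInstanceOf[Predicate]
    }
}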

 

  was:
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end *= 'foo';  -> works as expected*

*--*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end *= 'else_will_throw_ex'; -> query throws an exception;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 


> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'defualt_name'
>     end *= 'foo';  -> works as expected*
> *--*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'defualt_name'
>     end *= 'defualt_name'; -> query throws an exception;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack 

[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Description: 
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    *end = 'foo';  -> works as expected*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    *end = 'else_will_throw_ex'; -> query throws an exception;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 

  was:
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end = 'else_will_throw_ex'

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 


> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     *end = 'foo';  -> works as expected*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     *end = 'else_will_throw_ex'; -> query throws an exception;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex' 

[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Description: 
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end *= 'foo';  -> works as expected*

*--*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end *= 'else_will_throw_ex'; -> query throws an exception;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 

  was:
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end *= 'foo';  -> works as expected*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end *= 'else_will_throw_ex'; -> query throws an exception;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 


> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end *= 'foo';  -> works as expected*
> *--*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end *= 'else_will_throw_ex'; -> query throws an exception;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: 

[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Description: 
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end *= 'foo';  -> works as expected*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end *= 'else_will_throw_ex'; -> query throws an exception;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 

  was:
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    *end = 'foo';  -> works as expected*

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    *end = 'else_will_throw_ex'; -> query throws an exception;*

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 


> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end *= 'foo';  -> works as expected*
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end *= 'else_will_throw_ex'; -> query throws an exception;*
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> 

[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: (was: sql.txt)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: sql.txt

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: java-code-example.txt

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Description: 
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end = 'else_will_throw_ex'

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.
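
A side note on reproducing this cheaply (an editorial addition, reusing the 
placeholder catalog and table from the sketch earlier in this thread): the 
assertion fires during the optimization phase, so even a plain EXPLAIN of the 
failing query hits it, and no data needs to be read at all:

// EXPLAIN is enough to trigger the failure, since the assert fires while
// the optimizer tries to push the filter down to the JDBC source.
spark.sql(
  """explain extended
    |select case when (t_name = 'foo') then 'foo' else 'else_will_throw_ex' end
    |from pg.public.t
    |where case when (t_name = 'foo') then 'foo'
    |      else 'else_will_throw_ex' end = 'else_will_throw_ex'
    |""".stripMargin).show(truncate = false)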

 

  was:
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_stmt_failed'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end = 'else_will_throw_ex'

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 


> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: (was: java-code-example.txt)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Description: 
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_stmt_failed'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end = 'else_will_throw_ex'

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 

  was:
Hello!

The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.
 - Spark version 3.3.0
 - Scala version 2.12
 - DatasourceV2
 - Postgres
 - Postgres JDBC Driver: 42+
 - Java8

Case:

select
    case
        when (t_name = 'foo') then 'foo'
        else 'else_stmt_failed'
    end as case_when
from
    t
where
    case
        when (t_name = 'foo') then 'foo'
        else 'else_will_throw_ex'
    end = 'else_will_throw_ex'

In the where clause, when we try to find rows via the else branch, Spark 
throws an exception:
The Spark SQL phase optimization failed with an internal error. Please, fill a 
bug report in, and provide the full stack trace.

Caused by: java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:208)

 
org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)

In the debugger, in def unapply in PushablePredicate.class:
when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
instance of Predicate;
when the SQL case returns 'else_will_throw_ex', unapply receives 
COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
and the assertion fails with an error.

 


> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: (was: test.sql)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: java-code-example-1.txt

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, stack-trace.txt, test.sql
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: java-code-example.txt

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, stack-trace.txt, test.sql
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: test.txt

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: (was: test.txt)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: sql.txt

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, sql.txt, stack-trace.txt
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: (was: java-code-example-1.txt)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, stack-trace.txt, test.sql
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40563) Error at where clause, when sql case executes by else branch

2022-09-26 Thread Vadim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim updated SPARK-40563:
--
Attachment: (was: java-code-example.txt)

> Error at where clause, when sql case executes by else branch
> 
>
> Key: SPARK-40563
> URL: https://issues.apache.org/jira/browse/SPARK-40563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vadim
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: java-code-example.txt, stack-trace.txt, test.sql
>
>
> Hello!
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
>  - Spark version 3.3.0
>  - Scala version 2.12
>  - DatasourceV2
>  - Postgres
>  - Postgres JDBC Driver: 42+
>  - Java8
> Case:
> select
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_stmt_failed'
>     end as case_when
> from
>     t
> where
>     case
>         when (t_name = 'foo') then 'foo'
>         else 'else_will_throw_ex'
>     end = 'else_will_throw_ex'
> In the where clause, when we try to find rows via the else branch, Spark 
> throws an exception:
> The Spark SQL phase optimization failed with an internal error. Please, fill 
> a bug report in, and provide the full stack trace.
> Caused by: java.lang.AssertionError: assertion failed
>     at scala.Predef$.assert(Predef.scala:208)
>  
> org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589)
> In the debugger, in def unapply in PushablePredicate.class:
> when the SQL case returns 'foo', unapply receives (t_name#1 = foo) as an 
> instance of Predicate;
> when the SQL case returns 'else_will_throw_ex', unapply receives 
> COALESCE(t_name = 'foo', FALSE) as an instance of GeneralScalarExpression, 
> and the assertion fails with an error.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


