[jira] [Created] (SPARK-39175) Provide runtime error query context for Cast when WSCG is off

2022-05-12 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-39175:
--

 Summary: Provide runtime error query context for Cast when WSCG is 
off
 Key: SPARK-39175
 URL: https://issues.apache.org/jira/browse/SPARK-39175
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang









[jira] [Created] (SPARK-39174) Catalogs loading swallows missing classname for ClassNotFoundException

2022-05-12 Thread Kent Yao (Jira)
Kent Yao created SPARK-39174:


 Summary: Catalogs loading swallows missing classname for 
ClassNotFoundException
 Key: SPARK-39174
 URL: https://issues.apache.org/jira/browse/SPARK-39174
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.1, 3.1.2, 3.3.0
Reporter: Kent Yao









[jira] [Resolved] (SPARK-39166) Provide runtime error query context for Binary Arithmetic when WSCG is off

2022-05-12 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-39166.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 36525
[https://github.com/apache/spark/pull/36525]

> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --
>
> Key: SPARK-39166
> URL: https://issues.apache.org/jira/browse/SPARK-39166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, in most cases, the project
> https://issues.apache.org/jira/browse/SPARK-38615 is able to show where the
> runtime errors happen within the original query.
> However, after trying it in production, I found that the following queries
> won't show where the divide-by-zero error happens:
> {code:java}
> create table aggTest(i int, j int, k int, d date) using parquet
> insert into aggTest values(1, 2, 0, date'2022-01-01')
> select sum(j)/sum(k),percentile(i, 0.9) from aggTest group by d{code}
> With the `percentile` function in the query, the plan can't execute with
> whole-stage codegen. Thus the child plan of the `Project` is serialized to
> executors for execution, as seen in `ProjectExec`:
> {code:java}
>   protected override def doExecute(): RDD[InternalRow] = {
>     child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
>       val project = UnsafeProjection.create(projectList, child.output)
>       project.initialize(index)
>       iter.map(project)
>     }
>   }{code}
> Note that `TreeNode.origin` is not serialized to executors since `TreeNode`
> doesn't extend the trait `Serializable`, which results in an empty query
> context on errors. For more details, please read
> https://issues.apache.org/jira/browse/SPARK-39140
> A naive fix is to make `TreeNode` extend the trait `Serializable`. However,
> that could cause a performance regression if the query text is long (every
> `TreeNode` would carry it for serialization).
> A better fix is to introduce a new trait `SupportQueryContext` and
> materialize the truncated query context for special expressions. This Jira
> targets binary arithmetic expressions only. I will create follow-ups for the
> remaining expressions which support runtime error query context.
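
Below is a minimal Scala sketch of that idea. The trait name
`SupportQueryContext` comes from the description above, but its members and
the example expression are illustrative assumptions, not Spark's actual API;
the point is that the context string is rendered on the driver, where
`Origin` is still available, and only that small string is serialized to
executors.
{code:scala}
// Sketch only: materialize a serializable, truncated query context at
// construction time instead of shipping the whole TreeNode.origin.
trait SupportQueryContext extends Serializable {
  // Rendered eagerly on the driver and serialized with the expression.
  val queryContext: String = initQueryContext()

  // Subclasses describe where they appear in the original query text.
  protected def initQueryContext(): String
}

// Hypothetical expression carrying its SQL fragment for error reporting.
case class DivideWithContext(left: Long, right: Long, fragment: String)
    extends SupportQueryContext {
  override protected def initQueryContext(): String =
    fragment.take(128) // truncate so long query texts don't bloat tasks

  def eval(): Long =
    if (right == 0)
      throw new ArithmeticException(s"divide by zero in: $queryContext")
    else left / right
}
{code}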






[jira] [Assigned] (SPARK-39172) Remove outer join if all output come from streamed side and buffered side keys exist unique key

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39172:


Assignee: (was: Apache Spark)

> Remove outer join if all output come from streamed side and buffered side 
> keys exist unique key
> ---
>
> Key: SPARK-39172
> URL: https://issues.apache.org/jira/browse/SPARK-39172
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Improve this optimization case using the distinct keys framework.
> For example:
> {code:java}
> SELECT t1.* FROM t1 LEFT JOIN (SELECT distinct c1 as c1 FROM t)t2 ON t1.c1 = 
> t2.c1
> ==>
> SELECT t1.* FROM t1 {code}
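
A standalone Scala model of the precondition (these are illustrative
stand-ins, not Spark's Catalyst classes): the left outer join can be removed
when everything the query needs comes from the streamed (left) side and the
buffered (right) side's join keys are known to form a unique key, so each
left row matches at most one right row.
{code:scala}
object RemoveRedundantLeftJoinDemo {
  // Minimal stand-in for a logical plan: its output columns plus any unique
  // keys the distinct-keys framework would derive (e.g. from DISTINCT).
  case class Plan(output: Set[String], uniqueKeys: Set[Set[String]])

  def canRemoveLeftOuterJoin(
      requiredOutput: Set[String],
      left: Plan,
      right: Plan,
      rightJoinKeys: Set[String]): Boolean = {
    // 1. All required output comes from the streamed (left) side.
    val onlyLeftOutput = requiredOutput.subsetOf(left.output)
    // 2. The buffered (right) side's join keys contain a unique key, so
    //    each left row matches at most one right row.
    val rightKeysUnique = right.uniqueKeys.exists(_.subsetOf(rightJoinKeys))
    onlyLeftOutput && rightKeysUnique
  }

  def main(args: Array[String]): Unit = {
    // SELECT t1.* FROM t1 LEFT JOIN (SELECT DISTINCT c1 FROM t) t2
    // ON t1.c1 = t2.c1  -- only t1's columns are needed, t2.c1 is unique.
    val t1 = Plan(output = Set("t1.c1", "t1.c2"), uniqueKeys = Set.empty)
    val t2 = Plan(output = Set("t2.c1"), uniqueKeys = Set(Set("t2.c1")))
    println(canRemoveLeftOuterJoin(Set("t1.c1", "t1.c2"), t1, t2, Set("t2.c1")))
  }
}
{code}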






[jira] [Commented] (SPARK-39172) Remove outer join if all output come from streamed side and buffered side keys exist unique key

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536428#comment-17536428
 ] 

Apache Spark commented on SPARK-39172:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/36530

> Remove outer join if all output come from streamed side and buffered side 
> keys exist unique key
> ---
>
> Key: SPARK-39172
> URL: https://issues.apache.org/jira/browse/SPARK-39172
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Improve this optimization case using the distinct keys framework.
> For example:
> {code:java}
> SELECT t1.* FROM t1 LEFT JOIN (SELECT distinct c1 as c1 FROM t)t2 ON t1.c1 = 
> t2.c1
> ==>
> SELECT t1.* FROM t1 {code}






[jira] [Assigned] (SPARK-39172) Remove outer join if all output come from streamed side and buffered side keys exist unique key

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39172:


Assignee: Apache Spark

> Remove outer join if all output come from streamed side and buffered side 
> keys exist unique key
> ---
>
> Key: SPARK-39172
> URL: https://issues.apache.org/jira/browse/SPARK-39172
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> Improve this optimization case using the distinct keys framework.
> For example:
> {code:java}
> SELECT t1.* FROM t1 LEFT JOIN (SELECT distinct c1 as c1 FROM t)t2 ON t1.c1 = 
> t2.c1
> ==>
> SELECT t1.* FROM t1 {code}






[jira] [Updated] (SPARK-39173) The error message is different if disable broadcast join

2022-05-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-39173:

Description: 
How to reproduce this issue:
{code:scala}
Seq(-1, 10L).foreach { broadcastThreshold =>
  withSQLConf(
SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> broadcastThreshold.toString,
SQLConf.ANSI_ENABLED.key -> "true") {
val df = sql(
  """
|SELECT
|  item.i_brand_id brand_id,
|  avg(ss_ext_sales_price) avg_agg
|FROM store_sales, item
|WHERE store_sales.ss_item_sk = item.i_item_sk
|GROUP BY item.i_brand_id
  """.stripMargin)
val error = intercept[SparkException] {
  df.collect()
}
println("Error message: " + error.getMessage)
  }
}
{code}

{noformat}
Error message: org.apache.spark.SparkArithmeticException: 
[CANNOT_CHANGE_DECIMAL_PRECISION] 
Decimal(expanded,9.28175,38,5}) cannot be 
represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to 
false to bypass this error.
Error message: org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] 
Overflow in sum of decimals. If necessary set spark.sql.ansi.enabled to false 
(except for ANSI interval type) to bypass this error.
{noformat}


  was:
How to reproduce this issue:
{code:scala}
Seq(-1, 10L).foreach { broadcastThreshold =>
  withSQLConf(
SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> broadcastThreshold.toString,
SQLConf.ANSI_ENABLED.key -> "true") {
val df = sql(
  """
|SELECT
|  item.i_brand_id brand_id,
|  avg(ss_ext_sales_price) avg_agg
|FROM store_sales, item
|WHERE store_sales.ss_item_sk = item.i_item_sk
|GROUP BY item.i_brand_id
  """.stripMargin)
df.collect()
  }
}
{code}

{noformat}
Error message: org.apache.spark.SparkArithmeticException: 
[CANNOT_CHANGE_DECIMAL_PRECISION] 
Decimal(expanded,9.28175,38,5}) cannot be 
represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to 
false to bypass this error.
Error message: org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] 
Overflow in sum of decimals. If necessary set spark.sql.ansi.enabled to false 
(except for ANSI interval type) to bypass this error.
{noformat}



> The error message is different if disable broadcast join
> 
>
> Key: SPARK-39173
> URL: https://issues.apache.org/jira/browse/SPARK-39173
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:scala}
> Seq(-1, 10L).foreach { broadcastThreshold =>
>   withSQLConf(
> SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> broadcastThreshold.toString,
> SQLConf.ANSI_ENABLED.key -> "true") {
> val df = sql(
>   """
> |SELECT
> |  item.i_brand_id brand_id,
> |  avg(ss_ext_sales_price) avg_agg
> |FROM store_sales, item
> |WHERE store_sales.ss_item_sk = item.i_item_sk
> |GROUP BY item.i_brand_id
>   """.stripMargin)
> val error = intercept[SparkException] {
>   df.collect()
> }
> println("Error message: " + error.getMessage)
>   }
> }
> {code}
> {noformat}
> Error message: org.apache.spark.SparkArithmeticException: 
> [CANNOT_CHANGE_DECIMAL_PRECISION] 
> Decimal(expanded,9.28175,38,5}) cannot be 
> represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to 
> false to bypass this error.
> Error message: org.apache.spark.SparkArithmeticException: 
> [ARITHMETIC_OVERFLOW] Overflow in sum of decimals. If necessary set 
> spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass 
> this error.
> {noformat}






[jira] [Updated] (SPARK-39173) The error message is different if disable broadcast join

2022-05-12 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-39173:

Description: 
How to reproduce this issue:
{code:scala}
Seq(-1, 10L).foreach { broadcastThreshold =>
  withSQLConf(
SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> broadcastThreshold.toString,
SQLConf.ANSI_ENABLED.key -> "true") {
val df = sql(
  """
|SELECT
|  item.i_brand_id brand_id,
|  avg(ss_ext_sales_price) avg_agg
|FROM store_sales, item
|WHERE store_sales.ss_item_sk = item.i_item_sk
|GROUP BY item.i_brand_id
  """.stripMargin)
df.collect()
  }
}
{code}

{noformat}
Error message: org.apache.spark.SparkArithmeticException: 
[CANNOT_CHANGE_DECIMAL_PRECISION] 
Decimal(expanded,9.28175,38,5}) cannot be 
represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to 
false to bypass this error.
Error message: org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] 
Overflow in sum of decimals. If necessary set spark.sql.ansi.enabled to false 
(except for ANSI interval type) to bypass this error.
{noformat}


  was:
How to reproduce this issue:
{code:scala}
Seq(-1, 10L).foreach { broadcastThreshold =>
  withSQLConf(
SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> broadcastThreshold.toString,
SQLConf.ANSI_ENABLED.key -> "true") {
val df = sql(
  """
|SELECT
|  item.i_brand_id brand_id,
|  avg(ss_ext_sales_price) avg_agg
|FROM store_sales, item
|WHERE store_sales.ss_item_sk = item.i_item_sk
|GROUP BY item.i_brand_id
  """.stripMargin)
df.collect()
  }
}
{code}

{noformat}
Error message: Job aborted due to stage failure: Task 0 in stage 10.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 10.0 (TID 9) (localhost 
executor driver): org.apache.spark.SparkArithmeticException: 
[CANNOT_CHANGE_DECIMAL_PRECISION] 
Decimal(expanded,9.28175,38,5}) cannot be 
represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to 
false to bypass this error.
Error message: Job aborted due to stage failure: Task 0 in stage 14.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 14.0 (TID 14) (localhost 
executor driver): org.apache.spark.SparkArithmeticException: 
[ARITHMETIC_OVERFLOW] Overflow in sum of decimals. If necessary set 
spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass this 
error.
{noformat}



> The error message is different if disable broadcast join
> 
>
> Key: SPARK-39173
> URL: https://issues.apache.org/jira/browse/SPARK-39173
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:scala}
> Seq(-1, 10L).foreach { broadcastThreshold =>
>   withSQLConf(
> SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> broadcastThreshold.toString,
> SQLConf.ANSI_ENABLED.key -> "true") {
> val df = sql(
>   """
> |SELECT
> |  item.i_brand_id brand_id,
> |  avg(ss_ext_sales_price) avg_agg
> |FROM store_sales, item
> |WHERE store_sales.ss_item_sk = item.i_item_sk
> |GROUP BY item.i_brand_id
>   """.stripMargin)
> df.collect()
>   }
> }
> {code}
> {noformat}
> Error message: org.apache.spark.SparkArithmeticException: 
> [CANNOT_CHANGE_DECIMAL_PRECISION] 
> Decimal(expanded,9.28175,38,5}) cannot be 
> represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to 
> false to bypass this error.
> Error message: org.apache.spark.SparkArithmeticException: 
> [ARITHMETIC_OVERFLOW] Overflow in sum of decimals. If necessary set 
> spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass 
> this error.
> {noformat}






[jira] [Created] (SPARK-39173) The error message is different if disable broadcast join

2022-05-12 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-39173:
---

 Summary: The error message is different if disable broadcast join
 Key: SPARK-39173
 URL: https://issues.apache.org/jira/browse/SPARK-39173
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yuming Wang


How to reproduce this issue:
{code:scala}
Seq(-1, 10L).foreach { broadcastThreshold =>
  withSQLConf(
SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> broadcastThreshold.toString,
SQLConf.ANSI_ENABLED.key -> "true") {
val df = sql(
  """
|SELECT
|  item.i_brand_id brand_id,
|  avg(ss_ext_sales_price) avg_agg
|FROM store_sales, item
|WHERE store_sales.ss_item_sk = item.i_item_sk
|GROUP BY item.i_brand_id
  """.stripMargin)
df.collect()
  }
}
{code}

{noformat}
Error message: Job aborted due to stage failure: Task 0 in stage 10.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 10.0 (TID 9) (localhost 
executor driver): org.apache.spark.SparkArithmeticException: 
[CANNOT_CHANGE_DECIMAL_PRECISION] 
Decimal(expanded,9.28175,38,5}) cannot be 
represented as Decimal(38, 6). If necessary set "spark.sql.ansi.enabled" to 
false to bypass this error.
Error message: Job aborted due to stage failure: Task 0 in stage 14.0 failed 1 
times, most recent failure: Lost task 0.0 in stage 14.0 (TID 14) (localhost 
executor driver): org.apache.spark.SparkArithmeticException: 
[ARITHMETIC_OVERFLOW] Overflow in sum of decimals. If necessary set 
spark.sql.ansi.enabled to false (except for ANSI interval type) to bypass this 
error.
{noformat}







[jira] [Assigned] (SPARK-28516) Data Type Formatting Functions: `to_char`

2022-05-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-28516:
---

Assignee: Daniel

> Data Type Formatting Functions: `to_char`
> -
>
> Key: SPARK-28516
> URL: https://issues.apache.org/jira/browse/SPARK-28516
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Assignee: Daniel
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, Spark does not have support for `to_char`. PgSQL, however, 
> [does|https://www.postgresql.org/docs/12/functions-formatting.html]:
> Query example: 
> {code:sql}
> SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 
> FOLLOWING),'9D9')
> {code}
> ||Function||Return Type||Description||Example||
> |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to 
> string|{{to_char(current_timestamp, 'HH12:MI:SS')}}|
> |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to 
> string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}|
> |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to 
> string|{{to_char(125, '999')}}|
> |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert 
> real/double precision to string|{{to_char(125.8::real, '999D9')}}|
> |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to 
> string|{{to_char(-125.8, '999D99S')}}|
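
A hedged usage sketch, assuming Spark adopts the PostgreSQL-style numeric
patterns from the table above ('9' digit position, 'D' decimal point, 'S'
sign); the exact format grammar Spark ends up accepting may differ:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Format -125.8 with three integer digits, two decimals, trailing sign.
// Under PostgreSQL semantics this renders as "125.80-".
spark.sql("SELECT to_char(-125.8, '999D99S')").show()
{code}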






[jira] [Resolved] (SPARK-28516) Data Type Formatting Functions: `to_char`

2022-05-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-28516.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36365
[https://github.com/apache/spark/pull/36365]

> Data Type Formatting Functions: `to_char`
> -
>
> Key: SPARK-28516
> URL: https://issues.apache.org/jira/browse/SPARK-28516
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, Spark does not have support for `to_char`. PgSQL, however, 
> [does|https://www.postgresql.org/docs/12/functions-formatting.html]:
> Query example: 
> {code:sql}
> SELECT to_char(SUM(n) OVER (ORDER BY i ROWS BETWEEN CURRENT ROW AND 1 
> FOLLOWING),'9D9')
> {code}
> ||Function||Return Type||Description||Example||
> |{{to_char(}}{{timestamp}}{{, }}{{text}}{{)}}|{{text}}|convert time stamp to 
> string|{{to_char(current_timestamp, 'HH12:MI:SS')}}|
> |{{to_char(}}{{interval}}{{, }}{{text}}{{)}}|{{text}}|convert interval to 
> string|{{to_char(interval '15h 2m 12s', 'HH24:MI:SS')}}|
> |{{to_char(}}{{int}}{{, }}{{text}}{{)}}|{{text}}|convert integer to 
> string|{{to_char(125, '999')}}|
> |{{to_char}}{{(}}{{double precision}}{{, }}{{text}}{{)}}|{{text}}|convert 
> real/double precision to string|{{to_char(125.8::real, '999D9')}}|
> |{{to_char(}}{{numeric}}{{, }}{{text}}{{)}}|{{text}}|convert numeric to 
> string|{{to_char(-125.8, '999D99S')}}|






[jira] [Updated] (SPARK-39172) Remove outer join if all output come from streamed side and buffered side keys exist unique key

2022-05-12 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-39172:
--
Summary: Remove outer join if all output come from streamed side and 
buffered side keys exist unique key  (was: Remove outer join if all output come 
from streamed side and buffered side keys exist unique)

> Remove outer join if all output come from streamed side and buffered side 
> keys exist unique key
> ---
>
> Key: SPARK-39172
> URL: https://issues.apache.org/jira/browse/SPARK-39172
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> Improve this optimization case using the distinct keys framework.
> For example:
> {code:java}
> SELECT t1.* FROM t1 LEFT JOIN (SELECT distinct c1 as c1 FROM t)t2 ON t1.c1 = 
> t2.c1
> ==>
> SELECT t1.* FROM t1 {code}






[jira] [Assigned] (SPARK-39102) Replace the usage of guava's Files.createTempDir() with java.nio.file.Files.createTempDirectory()

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39102:


Assignee: Apache Spark

> Replace the usage of  guava's Files.createTempDir() with 
> java.nio.file.Files.createTempDirectory()
> --
>
> Key: SPARK-39102
> URL: https://issues.apache.org/jira/browse/SPARK-39102
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1, 3.4.0
>Reporter: pralabhkumar
>Assignee: Apache Spark
>Priority: Minor
>
> Hi 
> There are several classes where Spark uses Guava's Files.createTempDir(),
> which has known vulnerabilities. I think it's better to move to
> java.nio.file.Files.createTempDirectory() for those classes.
> Classes 
> Java8RDDAPISuite
> JavaAPISuite.java
> RPackageUtilsSuite
> StreamTestHelper
> TestShuffleDataContext
> ExternalBlockHandlerSuite
>  
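
A short Scala sketch of the replacement (the cleanup call is an illustrative
assumption; the affected test suites may delete the directory differently):
{code:scala}
import java.nio.file.Files

// Before: com.google.common.io.Files.createTempDir(), which is affected by
// CVE-2020-8908 (the directory is created under the shared temp dir with
// permissions that can let other local users read it).
// After: java.nio.file.Files.createTempDirectory(), which creates the
// directory atomically and with owner-only permissions on POSIX systems.
val tempDir = Files.createTempDirectory("spark-test-").toFile
tempDir.deleteOnExit() // illustrative cleanup only
{code}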






[jira] [Assigned] (SPARK-39102) Replace the usage of guava's Files.createTempDir() with java.nio.file.Files.createTempDirectory()

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39102:


Assignee: (was: Apache Spark)

> Replace the usage of  guava's Files.createTempDir() with 
> java.nio.file.Files.createTempDirectory()
> --
>
> Key: SPARK-39102
> URL: https://issues.apache.org/jira/browse/SPARK-39102
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1, 3.4.0
>Reporter: pralabhkumar
>Priority: Minor
>
> Hi 
> There are several classes where Spark uses Guava's Files.createTempDir(),
> which has known vulnerabilities. I think it's better to move to
> java.nio.file.Files.createTempDirectory() for those classes.
> Classes 
> Java8RDDAPISuite
> JavaAPISuite.java
> RPackageUtilsSuite
> StreamTestHelper
> TestShuffleDataContext
> ExternalBlockHandlerSuite
>  






[jira] [Created] (SPARK-39172) Remove outer join if all output come from streamed side and buffered side keys exist unique

2022-05-12 Thread XiDuo You (Jira)
XiDuo You created SPARK-39172:
-

 Summary: Remove outer join if all output come from streamed side 
and buffered side keys exist unique
 Key: SPARK-39172
 URL: https://issues.apache.org/jira/browse/SPARK-39172
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You


Improve this optimization case using the distinct keys framework.

For example:
{code:java}
SELECT t1.* FROM t1 LEFT JOIN (SELECT distinct c1 as c1 FROM t)t2 ON t1.c1 = 
t2.c1
==>
SELECT t1.* FROM t1 {code}






[jira] [Commented] (SPARK-39102) Replace the usage of guava's Files.createTempDir() with java.nio.file.Files.createTempDirectory()

2022-05-12 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536413#comment-17536413
 ] 

Yang Jie commented on SPARK-39102:
--

Opened a PR: https://github.com/apache/spark/pull/36529

> Replace the usage of  guava's Files.createTempDir() with 
> java.nio.file.Files.createTempDirectory()
> --
>
> Key: SPARK-39102
> URL: https://issues.apache.org/jira/browse/SPARK-39102
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1, 3.4.0
>Reporter: pralabhkumar
>Priority: Minor
>
> Hi 
> There are several classes where Spark uses Guava's Files.createTempDir(),
> which has known vulnerabilities. I think it's better to move to
> java.nio.file.Files.createTempDirectory() for those classes.
> Classes 
> Java8RDDAPISuite
> JavaAPISuite.java
> RPackageUtilsSuite
> StreamTestHelper
> TestShuffleDataContext
> ExternalBlockHandlerSuite
>  






[jira] [Updated] (SPARK-39171) Unify the Cast expression

2022-05-12 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-39171:
---
Issue Type: Improvement  (was: New Feature)

> Unify the Cast expression
> -
>
> Key: SPARK-39171
> URL: https://issues.apache.org/jira/browse/SPARK-39171
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Created] (SPARK-39171) Unify the Cast expression

2022-05-12 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-39171:
--

 Summary: Unify the Cast expression
 Key: SPARK-39171
 URL: https://issues.apache.org/jira/browse/SPARK-39171
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng









[jira] [Resolved] (SPARK-39041) Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly

2022-05-12 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-39041.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36373
[https://github.com/apache/spark/pull/36373]

> Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly
> -
>
> Key: SPARK-39041
> URL: https://issues.apache.org/jira/browse/SPARK-39041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-39041) Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly

2022-05-12 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-39041:


Assignee: Kent Yao

> Mapping Spark Query ResultSet/Schema to TRowSet/TTableSchema directly
> -
>
> Key: SPARK-39041
> URL: https://issues.apache.org/jira/browse/SPARK-39041
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>







[jira] [Resolved] (SPARK-38850) Upgrade Kafka to 3.2.0

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38850.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36526
[https://github.com/apache/spark/pull/36526]

> Upgrade Kafka to 3.2.0
> --
>
> Key: SPARK-38850
> URL: https://issues.apache.org/jira/browse/SPARK-38850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-38850) Upgrade Kafka to 3.2.0

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38850:
-

Assignee: Dongjoon Hyun

> Upgrade Kafka to 3.2.0
> --
>
> Key: SPARK-38850
> URL: https://issues.apache.org/jira/browse/SPARK-38850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-39170) ImportError when creating pyspark.pandas document "Supported APIs" if pandas version is low.

2022-05-12 Thread Hyunwoo Park (Jira)
Hyunwoo Park created SPARK-39170:


 Summary: ImportError when creating pyspark.pandas document 
"Supported APIs" if pandas version is low.
 Key: SPARK-39170
 URL: https://issues.apache.org/jira/browse/SPARK-39170
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Hyunwoo Park


The pyspark.pandas documentation "Supported APIs" will be auto-generated
([SPARK-38961|https://issues.apache.org/jira/browse/SPARK-38961]).

For that, we need to verify the version of pandas. This can be applied after
the Docker image used in GitHub Actions is upgraded and republished at
https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage.

Related: https://github.com/apache/spark/pull/36509






[jira] [Updated] (SPARK-39144) Nested subquery expressions deduplicate relations should be done bottom up

2022-05-12 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated SPARK-39144:
-
Description: When we have nested subquery expressions, there is a chance that
relation deduplication could replace an attribute with a wrong one. This is
because the attribute replacement is done top down rather than bottom up. It
can happen when a subplan has its relations deduplicated first (leaving two
instances of the same relation with different attribute IDs), and a more
complex plan built on top of that subplan (e.g. a UNION of queries with
nested subquery expressions) then triggers the wrong attribute replacement.

> Nested subquery expressions deduplicate relations should be done bottom up
> --
>
> Key: SPARK-39144
> URL: https://issues.apache.org/jira/browse/SPARK-39144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Rui Wang
>Priority: Major
>
> When we have nested subquery expressions, there is a chance that relation
> deduplication could replace an attribute with a wrong one. This is because
> the attribute replacement is done top down rather than bottom up. It can
> happen when a subplan has its relations deduplicated first (leaving two
> instances of the same relation with different attribute IDs), and a more
> complex plan built on top of that subplan (e.g. a UNION of queries with
> nested subquery expressions) then triggers the wrong attribute replacement.
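
A hypothetical query shape of the kind described (purely illustrative, not
taken from the ticket): every leg references the same relation `t`, so
deduplication has to fix the innermost subqueries before the outer UNION is
stitched together.
{code:scala}
// Assumes a SparkSession `spark` and a table `t(a INT)` already exist.
// Both UNION legs, and their nested scalar subqueries, scan the same
// relation t, forcing the analyzer to deduplicate t's attribute IDs.
spark.sql("""
  SELECT (SELECT max(a) FROM t WHERE a = (SELECT min(a) FROM t)) FROM t
  UNION ALL
  SELECT (SELECT max(a) FROM t WHERE a = (SELECT min(a) FROM t)) FROM t
""").show()
{code}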






[jira] [Updated] (SPARK-39144) Nested subquery expressions deduplicate relations should be done bottom up

2022-05-12 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated SPARK-39144:
-
Summary: Nested subquery expressions deduplicate relations should be done 
bottom up  (was: Spark SQL replace wrong attributes for nested subquery 
expression in which all tables are the same relation)

> Nested subquery expressions deduplicate relations should be done bottom up
> --
>
> Key: SPARK-39144
> URL: https://issues.apache.org/jira/browse/SPARK-39144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-39169) Optimize FIRST when used as a single aggregate function

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39169:


Assignee: Apache Spark

> Optimize FIRST when used as a single aggregate function
> ---
>
> Key: SPARK-39169
> URL: https://issues.apache.org/jira/browse/SPARK-39169
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Assignee: Apache Spark
>Priority: Major
>
> When `FIRST` is a single aggregate function in `Aggregate`, we could either
> rewrite the whole query or optimize the execution logic.
>  * Plan => `SELECT FIRST(<col>) FROM <table>` => `SELECT <col> FROM <table>
> LIMIT 1`. Note that setting `ignoreNulls` to `true` should block such a
> rewrite, since the results could differ if all values of <col> are `NULL`.
>  * Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <keys>` =>
> short-circuit the iteration per key once a value for `FIRST` is set.
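
A Scala sketch of the first bullet's rewrite (`tbl` and `col` are
placeholders; the equivalence is as proposed in the ticket):
{code:scala}
// Assumes a SparkSession `spark` and a table `tbl(col)` exist.
// With ignoreNulls = false, FIRST over the whole table only needs one row:
val viaFirst  = spark.sql("SELECT first(col) FROM tbl")
val rewritten = spark.sql("SELECT col FROM tbl LIMIT 1")
// With ignoreNulls = true the rewrite is unsafe: if the first row's col is
// NULL but a later row's is not, the two queries disagree.
{code}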






[jira] [Commented] (SPARK-39169) Optimize FIRST when used as a single aggregate function

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536390#comment-17536390
 ] 

Apache Spark commented on SPARK-39169:
--

User 'vli-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/36527

> Optimize FIRST when used as a single aggregate function
> ---
>
> Key: SPARK-39169
> URL: https://issues.apache.org/jira/browse/SPARK-39169
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Priority: Major
>
> When `FIRST` is a single aggregate function in `Aggregate`, we could either
> rewrite the whole query or optimize the execution logic.
>  * Plan => `SELECT FIRST(<col>) FROM <table>` => `SELECT <col> FROM <table>
> LIMIT 1`. Note that setting `ignoreNulls` to `true` should block such a
> rewrite, since the results could differ if all values of <col> are `NULL`.
>  * Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <keys>` =>
> short-circuit the iteration per key once a value for `FIRST` is set.






[jira] [Assigned] (SPARK-39169) Optimize FIRST when used as a single aggregate function

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39169:


Assignee: (was: Apache Spark)

> Optimize FIRST when used as a single aggregate function
> ---
>
> Key: SPARK-39169
> URL: https://issues.apache.org/jira/browse/SPARK-39169
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Priority: Major
>
> When `FIRST` is a single aggregate function in `Aggregate`, we could either
> rewrite the whole query or optimize the execution logic.
>  * Plan => `SELECT FIRST(<col>) FROM <table>` => `SELECT <col> FROM <table>
> LIMIT 1`. Note that setting `ignoreNulls` to `true` should block such a
> rewrite, since the results could differ if all values of <col> are `NULL`.
>  * Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <keys>` =>
> short-circuit the iteration per key once a value for `FIRST` is set.






[jira] [Commented] (SPARK-39142) Type overloads in `pandas_udf`

2022-05-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536388#comment-17536388
 ] 

Hyukjin Kwon commented on SPARK-39142:
--

[~tigerhawkvok], Are you interested in submitting a PR? 

cc [~zero323] FYI

> Type overloads in `pandas_udf` 
> ---
>
> Key: SPARK-39142
> URL: https://issues.apache.org/jira/browse/SPARK-39142
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Philip Kahn
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It seems that the `returnType` in the type overloads for `pandas_udf` never
> specifies a generic for PySpark SQL types or explicitly lists those types:
>  
> [https://github.com/apache/spark/blob/f84018a4810867afa84658fec76494aaae6d57fc/python/pyspark/sql/pandas/functions.pyi]
>  
> This results in static type checkers flagging the type of the decorated 
> functions (and their parameters) as incorrect, see 
> [https://github.com/microsoft/pylance-release/issues/2789] as an example.
>  
> For someone familiar with the code base, this should be a very fast patch.






[jira] [Updated] (SPARK-38397) Support Kueue: K8s-native Job Queueing

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38397:
--
Description: 
There are several ways to run Spark on K8s including vanilla `spark-submit` 
with built-in  `KubernetesClusterManager`, `spark-submit` with custom 
`ExternalClusterManager`, CRD-based operators (like spark-on-k8s-operator), 
custom K8s `schedulers`, custom `standalone pod definitions`, and so on.

This issue is tracking K8s-native Job Queueing related work.
 * [https://github.com/kubernetes-sigs/kueue]
{code}
metadata:
  generateName: sample-job-
  annotations:
kueue.k8s.io/queue-name: main
{code}

In the best case, Apache Spark users can use it in the future via pod
templates or existing configuration. In other words, we wouldn't need to do
anything and could close this JIRA without any patches.

*Documentation*
- https://github.com/kubernetes-sigs/kueue/tree/main/docs

*Release History*
- https://github.com/kubernetes-sigs/kueue/releases/tag/v0.1.0

  was:
There are several ways to run Spark on K8s including vanilla `spark-submit` 
with built-in  `KubernetesClusterManager`, `spark-submit` with custom 
`ExternalClusterManager`, CRD-based operators (like spark-on-k8s-operator), 
custom K8s `schedulers`, custom `standalone pod definitions`, and so on.

This issue is tracking K8s-native Job Queueing related work.
 * [https://github.com/kubernetes-sigs/kueue]
{code}
metadata:
  generateName: sample-job-
  annotations:
kueue.k8s.io/queue-name: main
{code}

In the best case, Apache Spark users can use it in the future via pod
templates or existing configuration. In other words, we wouldn't need to do
anything and could close this JIRA without any patches.


> Support Kueue: K8s-native Job Queueing
> --
>
> Key: SPARK-38397
> URL: https://issues.apache.org/jira/browse/SPARK-38397
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> There are several ways to run Spark on K8s including vanilla `spark-submit` 
> with built-in  `KubernetesClusterManager`, `spark-submit` with custom 
> `ExternalClusterManager`, CRD-based operators (like spark-on-k8s-operator), 
> custom K8s `schedulers`, custom `standalone pod definitions`, and so on.
> This issue is tracking K8s-native Job Queueing related work.
>  * [https://github.com/kubernetes-sigs/kueue]
> {code}
> metadata:
>   generateName: sample-job-
>   annotations:
> kueue.k8s.io/queue-name: main
> {code}
> In the best case, Apache Spark users can use it in the future via pod
> templates or existing configuration. In other words, we wouldn't need to do
> anything and could close this JIRA without any patches.
> *Documentation*
> - https://github.com/kubernetes-sigs/kueue/tree/main/docs
> *Release History*
> - https://github.com/kubernetes-sigs/kueue/releases/tag/v0.1.0






[jira] [Updated] (SPARK-38850) Upgrade Kafka to 3.2.0

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38850:
--
Affects Version/s: 3.4.0
   (was: 3.3.0)

> Upgrade Kafka to 3.2.0
> --
>
> Key: SPARK-38850
> URL: https://issues.apache.org/jira/browse/SPARK-38850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-39168) Consider all values in a python list when inferring schema

2022-05-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536385#comment-17536385
 ] 

Hyukjin Kwon commented on SPARK-39168:
--

Sounds reasonable to me. Are you interested in submitting a PR?

cc [~itholic] who wrote {{spark.sql.pyspark.inferNestedDictAsStruct.enabled}} 
option.

> Consider all values in a python list when inferring schema
> --
>
> Key: SPARK-39168
> URL: https://issues.apache.org/jira/browse/SPARK-39168
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Brian Schaefer
>Priority: Major
>
> Schema inference fails on the following case:
> {code:python}
> >>> data = [{"a": [1, None], "b": [None, 2]}]
> >>> spark.createDataFrame(data)
> ValueError: Some of types cannot be determined after inferring
> {code}
> This is because only the first value in the array is used to infer the 
> element type for the array: 
> [https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/types.py#L1260].
>  The element type of the "b" array is inferred as {{NullType}} but I think it
> makes sense to infer the element type as {{LongType}}.
> One approach to address the above would be to infer the type from the first 
> non-null value in the array. However, consider a case with structs:
> {code:python}
> >>> spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled",  True)
> >>> data = [{"a": [{"b": 1}, {"c": 2}]}]
> >>> spark.createDataFrame(data).schema
> StructType([StructField('a', ArrayType(StructType([StructField('b', 
> LongType(), True)]), True), True)])
> {code}
> The element type of the "a" array is inferred as a struct with one field, 
> "b". However, it would be convenient to infer the element type as a struct 
> with both fields "b" and "c". Omitted fields from each dictionary would 
> become null values in each struct:
> {code:java}
> +----------------------+
> |                     a|
> +----------------------+
> |[{1, null}, {null, 2}]|
> +----------------------+
> {code}
> To support both of these cases, the type of each array element could be 
> inferred, and those types could be merged, similar to the approach 
> [here|https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/session.py#L574-L576].
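
A toy Scala model of the merge idea (illustrative only; PySpark's actual
merging lives in the helpers linked above): struct element types merge
field-wise, a null type yields to any concrete type, and matching types pass
through, so `{"b": 1}` and `{"c": 2}` merge into a struct with both fields.
{code:scala}
// Toy model of merging inferred element types; not PySpark's real classes.
sealed trait DType
case object NullT extends DType
case object LongT extends DType
case class StructT(fields: Map[String, DType]) extends DType

def merge(a: DType, b: DType): DType = (a, b) match {
  case (NullT, t) => t
  case (t, NullT) => t
  case (StructT(fa), StructT(fb)) =>
    // Fields present in only one side stay; shared fields merge recursively.
    StructT((fa.keySet ++ fb.keySet).map { k =>
      k -> merge(fa.getOrElse(k, NullT), fb.getOrElse(k, NullT))
    }.toMap)
  case (t1, t2) if t1 == t2 => t1
  case (t1, t2) => sys.error(s"cannot merge $t1 and $t2")
}

// [{"b": 1}, {"c": 2}] => a struct with both fields b and c.
val merged = merge(StructT(Map("b" -> LongT)), StructT(Map("c" -> LongT)))
// merged == StructT(Map("b" -> LongT, "c" -> LongT))
{code}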






[jira] [Updated] (SPARK-39169) Optimize FIRST when used as a single aggregate function

2022-05-12 Thread Vitalii Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Li updated SPARK-39169:
---
Summary: Optimize FIRST when used as a single aggregate function  (was: 
Optimize FIRST when used as non-aggregate)

> Optimize FIRST when used as a single aggregate function
> ---
>
> Key: SPARK-39169
> URL: https://issues.apache.org/jira/browse/SPARK-39169
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Priority: Major
>
> When `FIRST` is a single aggregate function in `Aggregate`, we could either
> rewrite the whole query or optimize the execution logic.
>  * Plan => `SELECT FIRST(<col>) FROM <table>` => `SELECT <col> FROM <table>
> LIMIT 1`. Note that setting `ignoreNulls` to `true` should block such a
> rewrite, since the results could differ if all values of <col> are `NULL`.
>  * Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <keys>` =>
> short-circuit the iteration per key once a value for `FIRST` is set.






[jira] [Updated] (SPARK-39169) Optimize FIRST when used as non-aggregate

2022-05-12 Thread Vitalii Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Li updated SPARK-39169:
---
Description: 
When `FIRST` is a single aggregate function in `Aggregate`, we could either
rewrite the whole query or optimize the execution logic.
 * Plan => `SELECT FIRST(<col>) FROM <table>` => `SELECT <col> FROM <table>
LIMIT 1`. Note that setting `ignoreNulls` to `true` should block such a
rewrite, since the results could differ if all values of <col> are `NULL`.
 * Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <keys>` =>
short-circuit the iteration per key once a value for `FIRST` is set.

  was:
When `FIRST` is a single aggregate function in `Aggregate`, we could either
rewrite the whole query or optimize the execution logic.
 * Plan => `SELECT FIRST(<col>) FROM <table> [GROUP BY <keys>]` => `SELECT
<col> FROM <table> LIMIT 1`. Note that setting `ignoreNulls` to `true` should
block such a rewrite, since the results could differ if all values of <col>
are `NULL`.
 * Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <keys>` =>
short-circuit the iteration per key once a value for `FIRST` is set.


> Optimize FIRST when used as non-aggregate
> -
>
> Key: SPARK-39169
> URL: https://issues.apache.org/jira/browse/SPARK-39169
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Vitalii Li
>Priority: Major
>
> When `FIRST` is a single aggregate function in `Aggregate`, we could either
> rewrite the whole query or optimize the execution logic.
>  * Plan => `SELECT FIRST(<col>) FROM <table>` => `SELECT <col> FROM <table>
> LIMIT 1`. Note that setting `ignoreNulls` to `true` should block such a
> rewrite, since the results could differ if all values of <col> are `NULL`.
>  * Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <keys>` =>
> short-circuit the iteration per key once a value for `FIRST` is set.






[jira] [Commented] (SPARK-38850) Upgrade Kafka to 3.2.0

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536381#comment-17536381
 ] 

Apache Spark commented on SPARK-38850:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36526

> Upgrade Kafka to 3.2.0
> --
>
> Key: SPARK-38850
> URL: https://issues.apache.org/jira/browse/SPARK-38850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Created] (SPARK-39169) Optimize FIRST when used as non-aggregate

2022-05-12 Thread Vitalii Li (Jira)
Vitalii Li created SPARK-39169:
--

 Summary: Optimize FIRST when used as non-aggregate
 Key: SPARK-39169
 URL: https://issues.apache.org/jira/browse/SPARK-39169
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Vitalii Li


When `FIRST` is a single aggregate function in `Aggregate`, we could either
rewrite the whole query or optimize the execution logic.
 * Plan => `SELECT FIRST(<col>) FROM <table> [GROUP BY <keys>]` => `SELECT
<col> FROM <table> LIMIT 1`. Note that setting `ignoreNulls` to `true` should
block such a rewrite, since the results could differ if all values of <col>
are `NULL`.
 * Execution => `SELECT FIRST(<col>) FROM <table> GROUP BY <keys>` =>
short-circuit the iteration per key once a value for `FIRST` is set.






[jira] [Resolved] (SPARK-34930) Install PyArrow and pandas on Jenkins

2022-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-34930.
--
Resolution: Invalid

We dropped Jenkins.

> Install PyArrow and pandas on Jenkins
> -
>
> Key: SPARK-34930
> URL: https://issues.apache.org/jira/browse/SPARK-34930
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Hyukjin Kwon
>Assignee: Shane Knapp
>Priority: Critical
>
> Looks like the Jenkins machines don't have pandas and PyArrow (ever since
> they got upgraded?), which results in skipping the related tests in PySpark;
> see also https://github.com/apache/spark/pull/31470#issuecomment-811618571
> It would be great if we could install both in Python 3.6 on Jenkins.






[jira] [Updated] (SPARK-38850) Upgrade Kafka to 3.1.1

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38850:
--
Issue Type: Improvement  (was: Bug)

> Upgrade Kafka to 3.1.1
> --
>
> Key: SPARK-38850
> URL: https://issues.apache.org/jira/browse/SPARK-38850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Updated] (SPARK-38850) Upgrade Kafka to 3.2.0

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38850:
--
Summary: Upgrade Kafka to 3.2.0  (was: Upgrade Kafka to 3.1.1)

> Upgrade Kafka to 3.2.0
> --
>
> Key: SPARK-38850
> URL: https://issues.apache.org/jira/browse/SPARK-38850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-39160) Remove workaround for ARROW-1948

2022-05-12 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-39160.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36518
[https://github.com/apache/spark/pull/36518]

> Remove workaround for ARROW-1948
> 
>
> Key: SPARK-39160
> URL: https://issues.apache.org/jira/browse/SPARK-39160
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-39160) Remove workaround for ARROW-1948

2022-05-12 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned SPARK-39160:


Assignee: Cheng Pan

> Remove workaround for ARROW-1948
> 
>
> Key: SPARK-39160
> URL: https://issues.apache.org/jira/browse/SPARK-39160
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
>







[jira] [Resolved] (SPARK-39164) Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in actions

2022-05-12 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-39164.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36500
[https://github.com/apache/spark/pull/36500]

> Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in 
> actions
> 
>
> Key: SPARK-39164
> URL: https://issues.apache.org/jira/browse/SPARK-39164
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Catch exceptions from asserts and IllegalStateException raised from actions, 
> and replace them with SparkException w/ the INTERNAL_ERROR error class.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39168) Consider all values in a python list when inferring schema

2022-05-12 Thread Brian Schaefer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Schaefer updated SPARK-39168:
---
Description: 
Schema inference fails on the following case:
{code:python}
>>> data = [{"a": [1, None], "b": [None, 2]}]
>>> spark.createDataFrame(data)
ValueError: Some of types cannot be determined after inferring
{code}
This is because only the first value in the array is used to infer the element 
type for the array: 
[https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/types.py#L1260].
 The element type of the "b" array is inferred as {{NullType}}, but I think it 
makes sense to infer the element type as {{LongType}}.

One approach to address the above would be to infer the type from the first 
non-null value in the array. However, consider a case with structs:
{code:python}
>>> spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled",  True)
>>> data = [{"a": [{"b": 1}, {"c": 2}]}]
>>> spark.createDataFrame(data).schema
StructType([StructField('a', ArrayType(StructType([StructField('b', LongType(), 
True)]), True), True)])
{code}
The element type of the "a" array is inferred as a struct with one field, "b". 
However, it would be convenient to infer the element type as a struct with both 
fields "b" and "c". Omitted fields from each dictionary would become null 
values in each struct:
{code:java}
+----------------------+
|                     a|
+----------------------+
|[{1, null}, {null, 2}]|
+----------------------+
{code}
To support both of these cases, the type of each array element could be 
inferred, and those types could be merged, similar to the approach 
[here|https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/session.py#L574-L576].

  was:
Schema inference fails on the following case:
{code:python}
>>> data = [{"a": [1, None], "b": [None, 2]}]
>>> spark.createDataFrame(data)
ValueError: Some of types cannot be determined after inferring
{code}
This is because only the first value in the array is used to infer the element 
type for the array: 
[https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/types.py#L1260].
 The element type of the "b" array is inferred as {{NullType}}, but I think it 
makes sense to infer the element type as {{LongType}}.

One approach to address the above would be to infer the type from the first 
non-null value in the array. However, consider a case with structs:
{code:python}
>>> spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled",  True)
>>> data = [{"a": [{"b": 1}, {"c": 2}]}]
>>> spark.createDataFrame(data).schema
StructType([StructField('a', ArrayType(StructType([StructField('b', LongType(), 
True)]), True), True)])
{code}
The element type of the "a" array is inferred as a struct with one field, "b". 
However, it would be convenient to infer the element type as a struct with both 
fields "b" and "c". Omitted fields from each dictionary would become null 
values in each struct:
{code:java}
+----------------------+
|                     a|
+----------------------+
|[{1, null}, {null, 1}]|
+----------------------+
{code}
To support both of these cases, the type of each array element could be 
inferred, and those types could be merged, similar to the approach 
[here|https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/session.py#L574-L576].


> Consider all values in a python list when inferring schema
> --
>
> Key: SPARK-39168
> URL: https://issues.apache.org/jira/browse/SPARK-39168
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Brian Schaefer
>Priority: Major
>
> Schema inference fails on the following case:
> {code:python}
> >>> data = [{"a": [1, None], "b": [None, 2]}]
> >>> spark.createDataFrame(data)
> ValueError: Some of types cannot be determined after inferring
> {code}
> This is because only the first value in the array is used to infer the 
> element type for the array: 
> [https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/types.py#L1260].
>  The element type of the "b" array is inferred as {{NullType}}, but I think it 
> makes sense to infer the element type as {{LongType}}.
> One approach to address the above would be to infer the type from the first 
> non-null value in the array. However, consider a case with structs:
> {code:python}
> >>> spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled",  True)
> >>> data = [{"a": [{"b": 1}, {"c": 2}]}]
> >>> spark.createDataFrame(data).schema
> StructType([StructField('a', ArrayType(StructType([StructField('b', 
> LongType(), True)]), True), Tru

[jira] [Resolved] (SPARK-39145) CLONE - SPIP: Public APIs for extended Columnar Processing Support

2022-05-12 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen resolved SPARK-39145.
-
Resolution: Duplicate

Closing this as a duplicate.

> CLONE - SPIP: Public APIs for extended Columnar Processing Support
> --
>
> Key: SPARK-39145
> URL: https://issues.apache.org/jira/browse/SPARK-39145
> Project: Spark
>  Issue Type: Epic
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Abhi Shah
>Assignee: Robert Joseph Evans
>Priority: Major
>
> *SPIP: Columnar Processing Without Arrow Formatting Guarantees.*
>  
> *Q1.* What are you trying to do? Articulate your objectives using absolutely 
> no jargon.
> The Dataset/DataFrame API in Spark currently exposes data to users only one 
> row at a time during processing.  The goals of this SPIP are to:
>  # Add to the current SQL extensions mechanism so advanced users can have 
> access to the physical SparkPlan and manipulate it to provide columnar 
> processing for existing operators, including shuffle.  This will allow them 
> to implement their own cost based optimizers to decide when processing should 
> be columnar and when it should not.
>  # Make any transitions between the columnar memory layout and a row based 
> layout transparent to the users so operations that are not columnar see the 
> data as rows, and operations that are columnar see the data as columns.
>  
> Not Requirements, but things that would be nice to have.
>  # Transition the existing in memory columnar layouts to be compatible with 
> Apache Arrow.  This would make the transformations to Apache Arrow format a 
> no-op. The existing formats are already very close to those layouts in many 
> cases.  This would not be using the Apache Arrow java library, but instead 
> being compatible with the memory 
> [layout|https://arrow.apache.org/docs/format/Layout.html] and possibly only a 
> subset of that layout.
>  
> *Q2.* What problem is this proposal NOT designed to solve? 
> The goal of this is not for ML/AI but to provide APIs for accelerated 
> computing in Spark primarily targeting SQL/ETL like workloads.  ML/AI already 
> have several mechanisms to get data into/out of them. These can be improved 
> but will be covered in a separate SPIP.
> This is not trying to implement any of the processing itself in a columnar 
> way, with the exception of examples for documentation.
> This does not cover exposing the underlying format of the data.  The only way 
> to get at the data in a ColumnVector is through the public APIs.  Exposing 
> the underlying format to improve efficiency will be covered in a separate 
> SPIP.
> This is not trying to implement new ways of transferring data to external 
> ML/AI applications.  That is covered by separate SPIPs already.
> This is not trying to add in generic code generation for columnar processing. 
>  Currently code generation for columnar processing is only supported when 
> translating columns to rows.  We will continue to support this, but will not 
> extend it as a general solution. That will be covered in a separate SPIP if 
> we find it is helpful.  For now columnar processing will be interpreted.
> This is not trying to expose a way to get columnar data into Spark through 
> DataSource V2 or any other similar API.  That would be covered by a separate 
> SPIP if we find it is needed.
>  
> *Q3.* How is it done today, and what are the limits of current practice?
> The current columnar support is limited to 3 areas.
>  # Internal implementations of FileFormats can optionally return a 
> ColumnarBatch instead of rows.  The code generation phase knows how to take 
> that columnar data and iterate through it as rows for stages that want rows, 
> which currently is almost everything.  The limitations here are mostly 
> implementation-specific. The current standard is to abuse Scala's type 
> erasure to return ColumnarBatches as the elements of an RDD[InternalRow]. The 
> code generation can handle this because it is generating Java code, so it 
> bypasses Scala's type checking and just casts the InternalRow to the desired 
> ColumnarBatch.  This makes it difficult for others to implement the same 
> functionality for different processing because they can only do it through 
> code generation. There really is no clean separate path in the code 
> generation for columnar vs row based. Additionally, because it is only 
> supported through code generation if for any reason code generation would 
> fail there is no backup.  This is typically fine for input formats but can be 
> problematic when we get into more extensive processing.
>  # When caching data it can optionally be cached in a columnar format if the 
> input is also columnar.  This is similar to the first area and has t
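To make the row/column transition described in Q3 concrete, here is a minimal 
Scala sketch of the same data viewed both as columns and as rows, assuming the 
public ColumnarBatch/ColumnVector API (OnHeapColumnVector is the built-in 
writable implementation). This is an illustration, not code from the SPIP:

{code:scala}
// A minimal sketch of the row/column duality described above, using the
// public vectorized API (ColumnarBatch, ColumnVector). Illustrative only.
import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

val col = new OnHeapColumnVector(3, IntegerType)
(0 until 3).foreach(i => col.putInt(i, i * 10))
val batch = new ColumnarBatch(Array[ColumnVector](col))
batch.setNumRows(3)

// Columnar access: operate directly on the vector.
val sum = (0 until batch.numRows()).map(batch.column(0).getInt).sum  // 0 + 10 + 20

// Row-based access: the same data viewed as InternalRows, which is what
// non-columnar operators see after the transparent transition.
val it = batch.rowIterator()
while (it.hasNext) println(it.next().getInt(0))
{code}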

[jira] [Resolved] (SPARK-39161) Upgrade rocksdbjni to 7.2.2

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-39161.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36522
[https://github.com/apache/spark/pull/36522]

> Upgrade rocksdbjni to 7.2.2
> ---
>
> Key: SPARK-39161
> URL: https://issues.apache.org/jira/browse/SPARK-39161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39161) Upgrade rocksdbjni to 7.2.2

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-39161:
-

Assignee: Yang Jie

> Upgrade rocksdbjni to 7.2.2
> ---
>
> Key: SPARK-39161
> URL: https://issues.apache.org/jira/browse/SPARK-39161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36837) Upgrade Kafka to 3.1.0

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36837:
--
Parent: (was: SPARK-33772)
Issue Type: Improvement  (was: Sub-task)

> Upgrade Kafka to 3.1.0
> --
>
> Key: SPARK-36837
> URL: https://issues.apache.org/jira/browse/SPARK-36837
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.4.0
>
>
> Kafka 3.1.0 has official Java 17 support. We should align with it.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36837) Upgrade Kafka to 3.1.0

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36837:
--
Fix Version/s: 3.4.0
   (was: 3.3.0)

> Upgrade Kafka to 3.1.0
> --
>
> Key: SPARK-36837
> URL: https://issues.apache.org/jira/browse/SPARK-36837
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.4.0
>
>
> Kafka 3.1.0 has official Java 17 support. We should align with it.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36837) Upgrade Kafka to 3.1.0

2022-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36837:
--
Affects Version/s: 3.4.0
   (was: 3.3.0)

> Upgrade Kafka to 3.1.0
> --
>
> Key: SPARK-36837
> URL: https://issues.apache.org/jira/browse/SPARK-36837
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: releasenotes
> Fix For: 3.3.0
>
>
> Kafka 3.1.0 has official Java 17 support. We should align with it.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39167) Throw an exception w/ an error class for multiple rows from a subquery used as an expression

2022-05-12 Thread Max Gekk (Jira)
Max Gekk created SPARK-39167:


 Summary: Throw an exception w/ an error class for multiple rows 
from a subquery used as an expression
 Key: SPARK-39167
 URL: https://issues.apache.org/jira/browse/SPARK-39167
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Users can trigger an illegal state exception with the following SQL statement:

{code:sql}
> select (select a from (select 1 as a union all select 2 as a) t) as b
{code}

{code:java}
Caused by: java.lang.IllegalStateException: more than one row returned by a 
subquery used as an expression:
Subquery subquery#242, [id=#100]
+- AdaptiveSparkPlan isFinalPlan=true
   +- == Final Plan ==
  Union
  :- *(1) Project [1 AS a#240]
  :  +- *(1) Scan OneRowRelation[]
  +- *(2) Project [2 AS a#241]
 +- *(2) Scan OneRowRelation[]
   +- == Initial Plan ==
  Union
  :- Project [1 AS a#240]
  :  +- Scan OneRowRelation[]
  +- Project [2 AS a#241]
 +- Scan OneRowRelation[]

at 
org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:83)
{code}
but such exceptions are not supposed to be visible to users. We need to 
introduce an error class (or reuse an existing one) and replace the 
IllegalStateException.
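
A hypothetical sketch of the direction (the error-class name and helper below 
are illustrative, not the actual Spark change): raise a SparkException that 
carries an error class instead of the bare IllegalStateException:

{code:scala}
// Hypothetical sketch only: the error-class name and helper are made up
// for illustration; the real fix wires this through Spark's error framework.
import org.apache.spark.SparkException

def requireSingleRow(rows: Seq[Any], planString: String): Any = {
  if (rows.length > 1) {
    throw new SparkException(
      "[MULTI_ROW_SUBQUERY] more than one row returned by a subquery " +
        s"used as an expression:\n$planString")
  }
  rows.headOption.orNull
}
{code}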



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39168) Consider all values in a python list when inferring schema

2022-05-12 Thread Brian Schaefer (Jira)
Brian Schaefer created SPARK-39168:
--

 Summary: Consider all values in a python list when inferring schema
 Key: SPARK-39168
 URL: https://issues.apache.org/jira/browse/SPARK-39168
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 3.2.1
Reporter: Brian Schaefer


Schema inference fails on the following case:
{code:python}
>>> data = [{"a": [1, None], "b": [None, 2]}]
>>> spark.createDataFrame(data)
ValueError: Some of types cannot be determined after inferring
{code}
This is because only the first value in the array is used to infer the element 
type for the array: 
[https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/types.py#L1260].
 The element type of the "b" array is inferred as {{NullType}}, but I think it 
makes sense to infer the element type as {{LongType}}.

One approach to address the above would be to infer the type from the first 
non-null value in the array. However, consider a case with structs:
{code:python}
>>> spark.conf.set("spark.sql.pyspark.inferNestedDictAsStruct.enabled",  True)
>>> data = [{"a": [{"b": 1}, {"c": 2}]}]
>>> spark.createDataFrame(data).schema
StructType([StructField('a', ArrayType(StructType([StructField('b', LongType(), 
True)]), True), True)])
{code}
The element type of the "a" array is inferred as a struct with one field, "b". 
However, it would be convenient to infer the element type as a struct with both 
fields "b" and "c". Omitted fields from each dictionary would become null 
values in each struct:
{code:java}
+----------------------+
|                     a|
+----------------------+
|[{1, null}, {null, 1}]|
+----------------------+
{code}
To support both of these cases, the type of each array element could be 
inferred, and those types could be merged, similar to the approach 
[here|https://github.com/apache/spark/blob/b63674ea5f746306a96ab8c39c23a230a6cb9566/python/pyspark/sql/session.py#L574-L576].



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39165) Replace sys.error by IllegalStateException in Spark SQL

2022-05-12 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-39165.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 36524
[https://github.com/apache/spark/pull/36524]

> Replace sys.error by IllegalStateException in Spark SQL
> ---
>
> Key: SPARK-39165
> URL: https://issues.apache.org/jira/browse/SPARK-39165
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Replace all sys.error calls with IllegalStateException. sys.error throws 
> RuntimeException, which is hard to distinguish from Spark's own exceptions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39165) Replace sys.error by IllegalStateException in Spark SQL

2022-05-12 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-39165:


Assignee: Max Gekk

> Replace sys.error by IllegalStateException in Spark SQL
> ---
>
> Key: SPARK-39165
> URL: https://issues.apache.org/jira/browse/SPARK-39165
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Replace all sys.error calls with IllegalStateException. sys.error throws 
> RuntimeException, which is hard to distinguish from Spark's own exceptions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39166) Provide runtime error query context for Binary Arithmetic when WSCG is off

2022-05-12 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-39166:
---
Description: 
Currently, in most cases, the project 
https://issues.apache.org/jira/browse/SPARK-38615 is able to show where 
runtime errors happen within the original query.
However, after trying it in production, I found that the following queries 
won't show where the divide-by-0 error happens


{code:java}
create table aggTest(i int, j int, k int, d date) using parquet
insert into aggTest values(1, 2, 0, date'2022-01-01')
select sum(j)/sum(k),percentile(i, 0.9) from aggTest group by d{code}

With the `percentile` function in the query, the plan can't execute with 
whole-stage codegen. Thus the child plan of `Project` is serialized to 
executors for execution, as this snippet from ProjectExec shows:


{code:java}
  protected override def doExecute(): RDD[InternalRow] = {
    child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
      val project = UnsafeProjection.create(projectList, child.output)
      project.initialize(index)
      iter.map(project)
    }
  }{code}
Note that the `TreeNode.origin` is not serialized to executors since `TreeNode` 
doesn't extend the trait `Serializable`, which results in an empty query 
context on errors. For more details, please read 
https://issues.apache.org/jira/browse/SPARK-39140

A naive fix is to make `TreeNode` extend the trait `Serializable`. However, it 
can cause a performance regression if the query text is long (every `TreeNode` 
carries it for serialization). 
A better fix is to introduce a new trait `SupportQueryContext` and materialize 
the truncated query context for special expressions. This Jira targets binary 
arithmetic expressions only. I will create follow-ups for the remaining 
expressions that support runtime error query context.
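
A minimal sketch of that idea, assuming the trait name from this Jira (the 
real implementation may differ): the context string is computed eagerly on the 
driver, so it survives serialization even though TreeNode.origin does not:

{code:scala}
// Minimal sketch, assuming the trait name from this Jira; details differ
// from the real implementation. The key point: the context is materialized
// at construction time on the driver, before the expression is shipped.
trait SupportQueryContext extends Serializable {
  // Computed once, eagerly, so it is serialized along with the expression.
  protected val queryContext: String = initQueryContext()
  protected def initQueryContext(): String
}

case class DivideSketch(left: Any, right: Any, sqlText: String, maxLen: Int = 128)
    extends SupportQueryContext {
  // Truncate to avoid shipping a very long query text to every task.
  override protected def initQueryContext(): String = sqlText.take(maxLen)
}
{code}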

> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --
>
> Key: SPARK-39166
> URL: https://issues.apache.org/jira/browse/SPARK-39166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Currently, in most cases, the project 
> https://issues.apache.org/jira/browse/SPARK-38615 is able to show where 
> runtime errors happen within the original query.
> However, after trying it in production, I found that the following queries 
> won't show where the divide-by-0 error happens
> {code:java}
> create table aggTest(i int, j int, k int, d date) using parquet
> insert into aggTest values(1, 2, 0, date'2022-01-01')
> select sum(j)/sum(k),percentile(i, 0.9) from aggTest group by d{code}
> With the `percentile` function in the query, the plan can't execute with 
> whole-stage codegen. Thus the child plan of `Project` is serialized to 
> executors for execution, as this snippet from ProjectExec shows:
> {code:java}
>   protected override def doExecute(): RDD[InternalRow] = {
>     child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
>       val project = UnsafeProjection.create(projectList, child.output)
>       project.initialize(index)
>       iter.map(project)
>     }
>   }{code}
> Note that the `TreeNode.origin` is not serialized to executors since 
> `TreeNode` doesn't extend the trait `Serializable`, which results in an empty 
> query context on errors. For more details, please read 
> https://issues.apache.org/jira/browse/SPARK-39140
> A naive fix is to make `TreeNode` extend the trait `Serializable`. However, 
> it can cause a performance regression if the query text is long (every 
> `TreeNode` carries it for serialization). 
> A better fix is to introduce a new trait `SupportQueryContext` and 
> materialize the truncated query context for special expressions. This Jira 
> targets binary arithmetic expressions only. I will create follow-ups for the 
> remaining expressions that support runtime error query context.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39166) Provide runtime error query context for Binary Arithmetic when WSCG is off

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536175#comment-17536175
 ] 

Apache Spark commented on SPARK-39166:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36525

> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --
>
> Key: SPARK-39166
> URL: https://issues.apache.org/jira/browse/SPARK-39166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39166) Provide runtime error query context for Binary Arithmetic when WSCG is off

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536174#comment-17536174
 ] 

Apache Spark commented on SPARK-39166:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36525

> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --
>
> Key: SPARK-39166
> URL: https://issues.apache.org/jira/browse/SPARK-39166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39166) Provide runtime error query context for Binary Arithmetic when WSCG is off

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39166:


Assignee: Apache Spark  (was: Gengliang Wang)

> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --
>
> Key: SPARK-39166
> URL: https://issues.apache.org/jira/browse/SPARK-39166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39166) Provide runtime error query context for Binary Arithmetic when WSCG is off

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39166:


Assignee: Gengliang Wang  (was: Apache Spark)

> Provide runtime error query context for Binary Arithmetic when WSCG is off
> --
>
> Key: SPARK-39166
> URL: https://issues.apache.org/jira/browse/SPARK-39166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39166) Provide runtime error query context for Binary Arithmetic when WSCG is off

2022-05-12 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-39166:
--

 Summary: Provide runtime error query context for Binary Arithmetic 
when WSCG is off
 Key: SPARK-39166
 URL: https://issues.apache.org/jira/browse/SPARK-39166
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39165) Replace sys.error by IllegalStateException in Spark SQL

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39165:


Assignee: (was: Apache Spark)

> Replace sys.error by IllegalStateException in Spark SQL
> ---
>
> Key: SPARK-39165
> URL: https://issues.apache.org/jira/browse/SPARK-39165
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace all sys.error calls with IllegalStateException. sys.error throws 
> RuntimeException, which is hard to distinguish from Spark's own exceptions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39165) Replace sys.error by IllegalStateException in Spark SQL

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39165:


Assignee: Apache Spark

> Replace sys.error by IllegalStateException in Spark SQL
> ---
>
> Key: SPARK-39165
> URL: https://issues.apache.org/jira/browse/SPARK-39165
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Replace all sys.error calls with IllegalStateException. sys.error throws 
> RuntimeException, which is hard to distinguish from Spark's own exceptions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39165) Replace sys.error by IllegalStateException in Spark SQL

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536101#comment-17536101
 ] 

Apache Spark commented on SPARK-39165:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36524

> Replace sys.error by IllegalStateException in Spark SQL
> ---
>
> Key: SPARK-39165
> URL: https://issues.apache.org/jira/browse/SPARK-39165
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace all sys.error calls with IllegalStateException. sys.error throws 
> RuntimeException, which is hard to distinguish from Spark's own exceptions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39165) Replace sys.error by IllegalStateException in Spark SQL

2022-05-12 Thread Max Gekk (Jira)
Max Gekk created SPARK-39165:


 Summary: Replace sys.error by IllegalStateException in Spark SQL
 Key: SPARK-39165
 URL: https://issues.apache.org/jira/browse/SPARK-39165
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Replace all sys.error calls with IllegalStateException. sys.error throws 
RuntimeException, which is hard to distinguish from Spark's own exceptions.
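
A before/after sketch of the change (an illustrative call site, not actual 
Spark code):

{code:scala}
// Illustrative only; the real targets are sys.error call sites in Spark SQL.

// Before: sys.error throws a bare RuntimeException.
def plan(mode: String): Nothing =
  sys.error(s"unexpected mode: $mode")

// After: IllegalStateException marks a broken internal invariant and is
// easier to tell apart from user-facing Spark exceptions.
def planFixed(mode: String): Nothing =
  throw new IllegalStateException(s"unexpected mode: $mode")
{code}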



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39164) Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in actions

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536047#comment-17536047
 ] 

Apache Spark commented on SPARK-39164:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/36500

> Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in 
> actions
> 
>
> Key: SPARK-39164
> URL: https://issues.apache.org/jira/browse/SPARK-39164
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Catch exceptions from asserts and IllegalStateException raised from actions, 
> and replace them with SparkException w/ the INTERNAL_ERROR error class.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39164) Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in actions

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39164:


Assignee: Apache Spark  (was: Max Gekk)

> Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in 
> actions
> 
>
> Key: SPARK-39164
> URL: https://issues.apache.org/jira/browse/SPARK-39164
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Catch exceptions from asserts and IllegalStateException raised from actions, 
> and replace them with SparkException w/ the INTERNAL_ERROR error class.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39164) Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in actions

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39164:


Assignee: Max Gekk  (was: Apache Spark)

> Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in 
> actions
> 
>
> Key: SPARK-39164
> URL: https://issues.apache.org/jira/browse/SPARK-39164
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Catch exceptions from asserts and IllegalStateException raised from actions, 
> and replace them with SparkException w/ the INTERNAL_ERROR error class.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37956) Add Java and Python examples to the Parquet encryption feature documentation

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536046#comment-17536046
 ] 

Apache Spark commented on SPARK-37956:
--

User 'andersonm-ibm' has created a pull request for this issue:
https://github.com/apache/spark/pull/36523

> Add Java and Python examples to the Parquet encryption feature documentation 
> -
>
> Key: SPARK-37956
> URL: https://issues.apache.org/jira/browse/SPARK-37956
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Maya Anderson
>Priority: Minor
>
> Add Java and Python examples to the Parquet encryption feature documentation, 
> based on the Scala example in [SPARK-35658].
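
For reference, the kind of Scala example being ported (the configuration keys 
follow the documented Parquet modular encryption setup; the in-memory KMS is a 
test-only mock, and {{sc}}, {{df}}, and the column names are assumed):

{code:scala}
// Assumes an existing SparkContext `sc` and DataFrame `df`; the key names
// follow the documented Parquet modular encryption configuration, and
// InMemoryKMS is a test-only mock KMS client.
sc.hadoopConfiguration.set("parquet.crypto.factory.class",
  "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
sc.hadoopConfiguration.set("parquet.encryption.kms.client.class",
  "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
// Master keys (base64-encoded) for the mock KMS.
sc.hadoopConfiguration.set("parquet.encryption.key.list",
  "keyA:AAECAwQFBgcICQoLDA0ODw==, keyB:AAECAAECAAECAAECAAECAA==")

df.write
  .option("parquet.encryption.column.keys", "keyA:square") // encrypt column "square"
  .option("parquet.encryption.footer.key", "keyB")         // key for the footer
  .parquet("/path/to/table.parquet.encrypted")
{code}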



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37956) Add Java and Python examples to the Parquet encryption feature documentation

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536045#comment-17536045
 ] 

Apache Spark commented on SPARK-37956:
--

User 'andersonm-ibm' has created a pull request for this issue:
https://github.com/apache/spark/pull/36523

> Add Java and Python examples to the Parquet encryption feature documentation 
> -
>
> Key: SPARK-37956
> URL: https://issues.apache.org/jira/browse/SPARK-37956
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Maya Anderson
>Priority: Minor
>
> Add Java and Python examples to the Parquet encryption feature documentation, 
> based on the Scala example in [SPARK-35658].



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39164) Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in actions

2022-05-12 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-39164:
-
Summary: Wrap asserts/illegal state exceptions by the INTERNAL_ERROR 
exception in actions  (was: Wrap asserts/illegal state exceptions by the 
INTERNAL_ERROR exception)

> Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception in 
> actions
> 
>
> Key: SPARK-39164
> URL: https://issues.apache.org/jira/browse/SPARK-39164
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> Catch exceptions from asserts and IllegalStateException raised from actions, 
> and replace them with SparkException w/ the INTERNAL_ERROR error class.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39164) Wrap asserts/illegal state exceptions by the INTERNAL_ERROR exception

2022-05-12 Thread Max Gekk (Jira)
Max Gekk created SPARK-39164:


 Summary: Wrap asserts/illegal state exceptions by the 
INTERNAL_ERROR exception
 Key: SPARK-39164
 URL: https://issues.apache.org/jira/browse/SPARK-39164
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk
Assignee: Max Gekk


Catch exceptions from asserts and IllegalStateException raised from actions, 
and replace them with SparkException w/ the INTERNAL_ERROR error class.
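
A minimal sketch of the wrapping, assuming a plain SparkException(message, 
cause) constructor; the actual change routes through Spark's error-class 
framework:

{code:scala}
// Minimal sketch; the real change goes through the error-class framework.
import org.apache.spark.SparkException

def withInternalError[T](action: => T): T =
  try action catch {
    // AssertionError covers failed asserts; IllegalStateException covers
    // broken internal invariants. Both surface as INTERNAL_ERROR to users.
    case e @ (_: AssertionError | _: IllegalStateException) =>
      throw new SparkException(s"[INTERNAL_ERROR] ${e.getMessage}", e)
  }
{code}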



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39163) Throw an exception w/ error class for an invalid bucket file

2022-05-12 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-39163:
-
Description: 
Replace IllegalStateException by Spark's exception w/ an error class there 
[https://github.com/apache/spark/blob/ee6ea3c68694e35c36ad006a7762297800d1e463/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L621.|https://github.com/apache/spark/blob/ee6ea3c68694e35c36ad006a7762297800d1e463/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L621]

Move related tests to an Query.*ErrorsSuite.

  was:Replace the IllegalStateException with Spark's exception w/ an error 
class at 
https://github.com/apache/spark/blob/ee6ea3c68694e35c36ad006a7762297800d1e463/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L621


> Throw an exception w/ error class for an invalid bucket file
> 
>
> Key: SPARK-39163
> URL: https://issues.apache.org/jira/browse/SPARK-39163
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Major
>
> Replace the IllegalStateException with Spark's exception w/ an error class at 
> https://github.com/apache/spark/blob/ee6ea3c68694e35c36ad006a7762297800d1e463/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L621
> Move related tests to a Query.*ErrorsSuite.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39163) Throw an exception w/ error class for an invalid bucket file

2022-05-12 Thread Max Gekk (Jira)
Max Gekk created SPARK-39163:


 Summary: Throw an exception w/ error class for an invalid bucket 
file
 Key: SPARK-39163
 URL: https://issues.apache.org/jira/browse/SPARK-39163
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Max Gekk


Replace the IllegalStateException with Spark's exception w/ an error class at 
https://github.com/apache/spark/blob/ee6ea3c68694e35c36ad006a7762297800d1e463/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L621
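
A hypothetical sketch of the replacement at the linked line (the error-class 
name is illustrative, not necessarily the one Spark settles on):

{code:scala}
// Hypothetical sketch; the error-class name is illustrative.
import org.apache.spark.SparkException

def invalidBucketFile(path: String): Throwable =
  new SparkException(s"[INVALID_BUCKET_FILE] Invalid bucket file: $path")
{code}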



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39161) Upgrade rocksdbjni to 7.2.2

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39161:


Assignee: (was: Apache Spark)

> Upgrade rocksdbjni to 7.2.2
> ---
>
> Key: SPARK-39161
> URL: https://issues.apache.org/jira/browse/SPARK-39161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39161) Upgrade rocksdbjni to 7.2.2

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536020#comment-17536020
 ] 

Apache Spark commented on SPARK-39161:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/36522

> Upgrade rocksdbjni to 7.2.2
> ---
>
> Key: SPARK-39161
> URL: https://issues.apache.org/jira/browse/SPARK-39161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39161) Upgrade rocksdbjni to 7.2.2

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39161:


Assignee: Apache Spark

> Upgrade rocksdbjni to 7.2.2
> ---
>
> Key: SPARK-39161
> URL: https://issues.apache.org/jira/browse/SPARK-39161
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39162) Jdbc dialect should decide which function could be pushed down.

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39162:


Assignee: Apache Spark

> Jdbc dialect should decide which function could be pushed down.
> ---
>
> Key: SPARK-39162
> URL: https://issues.apache.org/jira/browse/SPARK-39162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Regardless of whether the functions are ANSI or not, we cannot assume that 
> every database supports them.
> So we should add a new API to JdbcDialect so that each JDBC dialect can 
> decide which functions can be pushed down.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-39162) Jdbc dialect should decide which function could be pushed down.

2022-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-39162:


Assignee: (was: Apache Spark)

> Jdbc dialect should decide which function could be pushed down.
> ---
>
> Key: SPARK-39162
> URL: https://issues.apache.org/jira/browse/SPARK-39162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Regardless of whether the functions are ANSI or not, we cannot assume that 
> every database supports them.
> So we should add a new API to JdbcDialect so that each JDBC dialect can 
> decide which functions can be pushed down.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39162) Jdbc dialect should decide which function could be pushed down.

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536016#comment-17536016
 ] 

Apache Spark commented on SPARK-39162:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36521

> Jdbc dialect should decide which function could be pushed down.
> ---
>
> Key: SPARK-39162
> URL: https://issues.apache.org/jira/browse/SPARK-39162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Regardless of whether the functions are ANSI or not, we cannot assume that 
> every database supports them.
> So we should add a new API to JdbcDialect so that each JDBC dialect can 
> decide which functions can be pushed down.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39162) Jdbc dialect should decide which function could be pushed down.

2022-05-12 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-39162:
---
Description: 
Regardless of whether the functions are ANSI or not, most databases are 
actually unsure of their support.
So we should add a new API into JdbcDialect so that Jdbc dialect could decide 
which function could be pushed down.

  was:
Regardless of whether the functions are ANSI or not, most databases are 
actually unsure of their support.
So we should add a new API into JdbcDialect so that Jdbc dialect should decide 
which function could be pushed down.


> Jdbc dialect should decide which function could be pushed down.
> ---
>
> Key: SPARK-39162
> URL: https://issues.apache.org/jira/browse/SPARK-39162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Regardless of whether the functions are ANSI or not, we cannot assume that 
> every database supports them.
> So we should add a new API to JdbcDialect so that each JDBC dialect can 
> decide which functions can be pushed down.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39162) Jdbc dialect should decide which function could be pushed down.

2022-05-12 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-39162:
--

 Summary: Jdbc dialect should decide which function could be pushed 
down.
 Key: SPARK-39162
 URL: https://issues.apache.org/jira/browse/SPARK-39162
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: jiaan.geng


Regardless of whether the functions are ANSI or not, we cannot assume that 
every database supports them.
So we should add a new API to JdbcDialect so that the JDBC dialect itself 
decides which functions can be pushed down.
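
A minimal sketch of the proposed API shape (names assumed from the Jira 
title; the merged signature may differ): each dialect overrides a predicate 
that whitelists the functions it can compile:

{code:scala}
// Sketch of the proposed API shape; names are assumed, not the merged code.
abstract class JdbcDialectSketch {
  // Conservative default: push nothing down unless the dialect opts in.
  def isSupportedFunction(funcName: String): Boolean = false
}

object H2DialectSketch extends JdbcDialectSketch {
  private val supported = Set("ABS", "COALESCE", "LN", "SQRT", "UPPER")
  override def isSupportedFunction(funcName: String): Boolean =
    supported.contains(funcName)
}
{code}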



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39161) Upgrade rocksdbjni to 7.2.2

2022-05-12 Thread Yang Jie (Jira)
Yang Jie created SPARK-39161:


 Summary: Upgrade rocksdbjni to 7.2.2
 Key: SPARK-39161
 URL: https://issues.apache.org/jira/browse/SPARK-39161
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.4.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38633) Support push down Cast to JDBC data source V2

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535941#comment-17535941
 ] 

Apache Spark commented on SPARK-38633:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36520

> Support push down Cast to JDBC data source V2
> -
>
> Key: SPARK-38633
> URL: https://issues.apache.org/jira/browse/SPARK-38633
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.3.0
>
>
> Cast is very useful, and Spark always uses Cast to convert data types 
> automatically.
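
As an illustration of the kind of query this enables (the catalog, table name, 
and schema below are assumed, in the style of the JDBC V2 test suites):

{code:scala}
// Illustrative only: assumes a JDBC V2 catalog `h2` with table test.employee.
val df = spark.sql(
  "SELECT * FROM h2.test.employee WHERE CAST(bonus AS string) LIKE '12%'")
// With cast push-down, the CAST appears inside the filter sent to the
// database instead of being evaluated by Spark after a full scan.
df.explain()
{code}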



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-39159) Add new Dataset API for Offset

2022-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17535909#comment-17535909
 ] 

Apache Spark commented on SPARK-39159:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/36519

> Add new Dataset API for Offset
> --
>
> Key: SPARK-39159
> URL: https://issues.apache.org/jira/browse/SPARK-39159
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
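
A sketch of the intended usage (the method name follows the Jira title; the 
merged signature may differ): a transformation that skips the first n rows, 
complementing limit():

{code:scala}
// Usage sketch; the method name follows the Jira title.
val ds = spark.range(10)
ds.offset(3).show()           // rows 3..9: skips the first 3 rows
ds.offset(3).limit(2).show()  // rows 3 and 4: OFFSET + LIMIT paging
{code}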




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org