[jira] [Assigned] (SPARK-38521) Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38521:


Assignee: (was: Apache Spark)

> Throw Exception if overwriting hive partition table with dynamic and 
> staticPartitionOverwriteMode
> -
>
> Key: SPARK-38521
> URL: https://issues.apache.org/jira/browse/SPARK-38521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Jackey Lee
>Priority: Major
>
> The `spark.sql.sources.partitionOverwriteMode` setting allows us to overwrite 
> the existing data of a table in static mode, but for Hive tables it can be 
> disastrous: it may delete all data in a Hive partitioned table when writing 
> with dynamic overwrite while `partitionOverwriteMode=STATIC`.
> Here we add a check for this and throw an exception if it happens.






[jira] [Commented] (SPARK-38521) Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504771#comment-17504771
 ] 

Apache Spark commented on SPARK-38521:
--

User 'jackylee-ch' has created a pull request for this issue:
https://github.com/apache/spark/pull/35815

> Throw Exception if overwriting hive partition table with dynamic and 
> staticPartitionOverwriteMode
> -
>
> Key: SPARK-38521
> URL: https://issues.apache.org/jira/browse/SPARK-38521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Jackey Lee
>Priority: Major
>
> The `spark.sql.sources.partitionOverwriteMode` setting allows us to overwrite 
> the existing data of a table in static mode, but for Hive tables it can be 
> disastrous: it may delete all data in a Hive partitioned table when writing 
> with dynamic overwrite while `partitionOverwriteMode=STATIC`.
> Here we add a check for this and throw an exception if it happens.






[jira] [Assigned] (SPARK-38521) Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38521:


Assignee: Apache Spark

> Throw Exception if overwriting hive partition table with dynamic and 
> staticPartitionOverwriteMode
> -
>
> Key: SPARK-38521
> URL: https://issues.apache.org/jira/browse/SPARK-38521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Jackey Lee
>Assignee: Apache Spark
>Priority: Major
>
> The `spark.sql.sources.partitionOverwriteMode` setting allows us to overwrite 
> the existing data of a table in static mode, but for Hive tables it can be 
> disastrous: it may delete all data in a Hive partitioned table when writing 
> with dynamic overwrite while `partitionOverwriteMode=STATIC`.
> Here we add a check for this and throw an exception if it happens.






[jira] [Created] (SPARK-38522) Strengthen the contract on iterator method in StateStore

2022-03-10 Thread Jungtaek Lim (Jira)
Jungtaek Lim created SPARK-38522:


 Summary: Strengthen the contract on iterator method in StateStore
 Key: SPARK-38522
 URL: https://issues.apache.org/jira/browse/SPARK-38522
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.3.0
Reporter: Jungtaek Lim


The root cause of SPARK-38320 was that the logic obtained the iterator first, 
then performed some updates against the state store, and then iterated, 
expecting that all updates made in between would be visible through the iterator.

That is not guaranteed by the RocksDB state store, and the contract of Java's 
ConcurrentHashMap, which backs HDFSBackedStateStore, does not guarantee it either.

It would be clearer if we updated the contract to draw a line on the behavioral 
guarantees given to callers, so that callers do not rely on such an expectation.
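
As a hedged illustration (plain Scala against java.util.concurrent, not Spark internals), the sketch below shows why iterating after further updates gives no visibility guarantee on a ConcurrentHashMap-backed store:

{code:scala}
import java.util.concurrent.ConcurrentHashMap

val store = new ConcurrentHashMap[String, Int]()
store.put("a", 1)

// Obtain the iterator first, as the buggy logic behind SPARK-38320 did.
val it = store.entrySet().iterator()

// Update made after the iterator was created: ConcurrentHashMap iterators are
// only weakly consistent, so this entry may or may not be observed below.
store.put("b", 2)

while (it.hasNext) {
  val e = it.next()
  println(s"${e.getKey} -> ${e.getValue}")
}
{code}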






[jira] [Resolved] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF

2022-03-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-38509.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35805
[https://github.com/apache/spark/pull/35805]

> Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
> ---
>
> Key: SPARK-38509
> URL: https://issues.apache.org/jira/browse/SPARK-38509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> 1. Unregister the functions `timestampadd()` and `timestampdiff()` in 
> `FunctionRegistry.expressions`.
> 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for 
> `timestampdiff()`.
> 3. Align tests (regenerate golden files) to the syntax rules
> where the first parameter `unit` can be one of the identifiers:
>- YEAR
>- QUARTER
>- MONTH
>- WEEK
>- DAY, DAYOFYEAR (valid for timestampadd)
>- HOUR
>- MINUTE
>- SECOND
>- MILLISECOND
>- MICROSECOND
> h4. Why are the changes needed?
> 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with 
> an arbitrary string column as the first parameter are not required by any standard.
> 2. Removing the functions and aliases should reduce the maintenance cost.
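
For reference, a hedged example of the identifier-based syntax the golden files are aligned to (the literal values are illustrative only):

{code:scala}
// The unit is given as an identifier (HOUR, DAY, ...) rather than an arbitrary string column.
spark.sql("SELECT timestampadd(HOUR, 3, TIMESTAMP '2022-03-10 00:00:00')").show()
spark.sql("SELECT timestampdiff(DAY, DATE '2022-03-01', DATE '2022-03-10')").show()
{code}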






[jira] [Assigned] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF

2022-03-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-38509:


Assignee: Max Gekk

> Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
> ---
>
> Key: SPARK-38509
> URL: https://issues.apache.org/jira/browse/SPARK-38509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> 1. Unregister the functions `timestampadd()` and `timestampdiff()` in 
> `FunctionRegistry.expressions`.
> 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for 
> `timestampdiff()`.
> 3. Align tests (regenerate golden files) to the syntax rules
> where the first parameter `unit` can be one of the identifiers:
>- YEAR
>- QUARTER
>- MONTH
>- WEEK
>- DAY, DAYOFYEAR (valid for timestampadd)
>- HOUR
>- MINUTE
>- SECOND
>- MILLISECOND
>- MICROSECOND
> h4. Why are the changes needed?
> 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with 
> an arbitrary string column as the first parameter are not required by any standard.
> 2. Removing the functions and aliases should reduce the maintenance cost.






[jira] [Created] (SPARK-38521) Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode

2022-03-10 Thread Jackey Lee (Jira)
Jackey Lee created SPARK-38521:
--

 Summary: Throw Exception if overwriting hive partition table with 
dynamic and staticPartitionOverwriteMode
 Key: SPARK-38521
 URL: https://issues.apache.org/jira/browse/SPARK-38521
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Jackey Lee


The `spark.sql.sources.partitionOverwriteMode` setting allows us to overwrite 
the existing data of a table in static mode, but for Hive tables it can be 
disastrous: it may delete all data in a Hive partitioned table when writing 
with dynamic overwrite while `partitionOverwriteMode=STATIC`.

Here we add a check for this and throw an exception if it happens.
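
For context, a minimal hypothetical sketch of the kind of write the description warns about (the table, columns, and values below are made up; the exact code path that triggers the deletion is covered in the pull request):

{code:scala}
// Dynamic partition spec (no value given for dt) combined with STATIC overwrite mode.
// According to the report, this combination can clear every partition of the
// Hive table instead of only the partitions present in the incoming data.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "STATIC")

spark.sql("CREATE TABLE IF NOT EXISTS sales (amount INT) PARTITIONED BY (dt STRING)")
spark.sql(
  """INSERT OVERWRITE TABLE sales PARTITION (dt)
    |SELECT 10 AS amount, '2022-03-10' AS dt""".stripMargin)
{code}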






[jira] [Resolved] (SPARK-38502) Distribution with hadoop-provided is missing log4j2

2022-03-10 Thread Emil Ejbyfeldt (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ejbyfeldt resolved SPARK-38502.

Resolution: Duplicate

> Distribution with hadoop-provided is missing log4j2
> ---
>
> Key: SPARK-38502
> URL: https://issues.apache.org/jira/browse/SPARK-38502
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Emil Ejbyfeldt
>Priority: Major
>
> Currently, building Spark 3.3.0-SNAPSHOT with the `./dev/make-distribution.sh 
> --tgz --name hadoop-provided-test -Phadoop-provided -Pyarn` script produces 
> a package that does not include log4j2. Trying to run spark-submit with the 
> latest Hadoop release 3.3.2 and such a build results in
> {code:java}
> $ spark-submit run-example org.apache.spark.examples.SparkPi
> Error: Unable to initialize main class org.apache.spark.deploy.SparkSubmit
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
> {code}
> because log4j2 is not found. So I believe the Maven build settings need to be 
> tweaked so that log4j2 is included in the Spark distribution.






[jira] [Commented] (SPARK-38502) Distribution with hadoop-provided is missing log4j2

2022-03-10 Thread Emil Ejbyfeldt (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504749#comment-17504749
 ] 

Emil Ejbyfeldt commented on SPARK-38502:


Duplicate of SPARK-38516

> Distribution with hadoop-provided is missing log4j2
> ---
>
> Key: SPARK-38502
> URL: https://issues.apache.org/jira/browse/SPARK-38502
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Emil Ejbyfeldt
>Priority: Major
>
> Currently, building Spark 3.3.0-SNAPSHOT with the `./dev/make-distribution.sh 
> --tgz --name hadoop-provided-test -Phadoop-provided -Pyarn` script produces 
> a package that does not include log4j2. Trying to run spark-submit with the 
> latest Hadoop release 3.3.2 and such a build results in
> {code:java}
> $ spark-submit run-example org.apache.spark.examples.SparkPi
> Error: Unable to initialize main class org.apache.spark.deploy.SparkSubmit
> Caused by: java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
> {code}
> because log4j2 is not found. So I believe the Maven build settings need to be 
> tweaked so that log4j2 is included in the Spark distribution.






[jira] [Created] (SPARK-38520) Overflow occurs when reading ANSI day time interval from CSV file

2022-03-10 Thread chong (Jira)
chong created SPARK-38520:
-

 Summary: Overflow occurs when reading ANSI day time interval from 
CSV file
 Key: SPARK-38520
 URL: https://issues.apache.org/jira/browse/SPARK-38520
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: chong


*Problem:*

Overflow occurs when reading the following positive intervals from a CSV file; 
the results become negative:

interval '106751992' day     => INTERVAL '-106751990' DAY

INTERVAL +'+2562047789' hour => INTERVAL '-2562047787' HOUR

interval '153722867281' minute => INTERVAL '-153722867280' MINUTE

 

*Reproduce:*
```
scala> import org.apache.spark.sql.types._

// days overflow; path points to a CSV file containing the interval literal shown below
scala> val schema = StructType(Seq(StructField("c1",
  DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.DAY))))
scala> spark.read.csv(path).show(false)
+------------------------+
|_c0                     |
+------------------------+
|interval '106751992' day|
+------------------------+
scala> spark.read.schema(schema).csv(path).show(false)
+-------------------------+
|c1                       |
+-------------------------+
|INTERVAL '-106751990' DAY|
+-------------------------+

// hour overflow
scala> val schema = StructType(Seq(StructField("c1",
  DayTimeIntervalType(DayTimeIntervalType.HOUR, DayTimeIntervalType.HOUR))))
scala> spark.read.csv(path).show(false)
+----------------------------+
|_c0                         |
+----------------------------+
|INTERVAL +'+2562047789' hour|
+----------------------------+
scala> spark.read.schema(schema).csv(path).show(false)
+---------------------------+
|c1                         |
+---------------------------+
|INTERVAL '-2562047787' HOUR|
+---------------------------+

// minute overflow
scala> val schema = StructType(Seq(StructField("c1",
  DayTimeIntervalType(DayTimeIntervalType.MINUTE, DayTimeIntervalType.MINUTE))))
scala> spark.read.csv(path).show(false)
+------------------------------+
|_c0                           |
+------------------------------+
|interval '153722867281' minute|
+------------------------------+
scala> spark.read.schema(schema).csv(path).show(false)
+-------------------------------+
|c1                             |
+-------------------------------+
|INTERVAL '-153722867280' MINUTE|
+-------------------------------+
```
 

*Others:*

Also check that negative values near the boundary are not incorrectly read back as positive.
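
A hedged back-of-the-envelope check of where the boundary sits (assuming day-time intervals are stored as signed 64-bit microseconds, which is why the inputs above wrap around to negative values instead of failing):

{code:scala}
// Largest representable day-time interval in each unit.
val maxMicros = Long.MaxValue
println(maxMicros / (24L * 60 * 60 * 1000 * 1000)) // 106751991 days
println(maxMicros / (60L * 60 * 1000 * 1000))      // 2562047788 hours
println(maxMicros / (60L * 1000 * 1000))           // 153722867280 minutes
{code}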






[jira] [Assigned] (SPARK-38519) AQE throw exception should respect SparkFatalException

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38519:


Assignee: (was: Apache Spark)

> AQE throw exception should respect SparkFatalException
> --
>
> Key: SPARK-38519
> URL: https://issues.apache.org/jira/browse/SPARK-38519
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> BroadcastExchangeExec will wrap a fatal exception inside SparkFatalException 
> and unwrap it before throwing.
> AQE should also respect SparkFatalException and throw the original error.
> {code:java}
> Caused by: org.apache.spark.util.SparkFatalException
>   at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Commented] (SPARK-38519) AQE throw exception should respect SparkFatalException

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504734#comment-17504734
 ] 

Apache Spark commented on SPARK-38519:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/35814

> AQE throw exception should respect SparkFatalException
> --
>
> Key: SPARK-38519
> URL: https://issues.apache.org/jira/browse/SPARK-38519
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> BroadcastExchangeExec will wrap a fatal exception inside SparkFatalException 
> and unwrap it before throwing.
> AQE should also respect SparkFatalException and throw the original error.
> {code:java}
> Caused by: org.apache.spark.util.SparkFatalException
>   at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Assigned] (SPARK-38519) AQE throw exception should respect SparkFatalException

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38519:


Assignee: Apache Spark

> AQE throw exception should respect SparkFatalException
> --
>
> Key: SPARK-38519
> URL: https://issues.apache.org/jira/browse/SPARK-38519
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> BroadcastExchangeExec will wrap a fatal exception inside SparkFatalException 
> and unwrap it before throwing.
> AQE should also respect SparkFatalException and throw the original error.
> {code:java}
> Caused by: org.apache.spark.util.SparkFatalException
>   at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Updated] (SPARK-38519) AQE throw exception should respect SparkFatalException

2022-03-10 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-38519:
--
Description: 
BroadcastExchangeExec will wrap a fatal exception inside SparkFatalException and 
unwrap it before throwing.

AQE should also respect SparkFatalException and throw the original error.
{code:java}
Caused by: org.apache.spark.util.SparkFatalException
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}

  was:
BroadcastExchangeExec will wrap a fatal exception in SparkFatalException and 
unwrap it before throwing.

AQE should also respect SparkFatalException and throw the original error.
{code:java}
Caused by: org.apache.spark.util.SparkFatalException
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}


> AQE throw exception should respect SparkFatalException
> --
>
> Key: SPARK-38519
> URL: https://issues.apache.org/jira/browse/SPARK-38519
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> BroadcastExchangeExec will wrap a fatal exception inside SparkFatalException 
> and unwrap it before throwing.
> AQE should also respect SparkFatalException and throw the original error.
> {code:java}
> Caused by: org.apache.spark.util.SparkFatalException
>   at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Updated] (SPARK-38519) AQE throw exception should respect SparkFatalException

2022-03-10 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-38519:
--
Description: 
BroadcastExchangeExec will wrap a fatal exception in SparkFatalException and 
unwrap it before throwing.

AQE should also respect SparkFatalException and throw the original error.
{code:java}
Caused by: org.apache.spark.util.SparkFatalException
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}

  was:
BroadcastExchangeExec will wrap a fatal exception in SparkFatalException and 
unwrap it in some places.

AQE should also respect SparkFatalException and throw the original error.
{code:java}
Caused by: org.apache.spark.util.SparkFatalException
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}


> AQE throw exception should respect SparkFatalException
> --
>
> Key: SPARK-38519
> URL: https://issues.apache.org/jira/browse/SPARK-38519
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> BroadcastExchangeExec will wrap a fatal exception in SparkFatalException and 
> unwrap it before throwing.
> AQE should also respect SparkFatalException and throw the original error.
> {code:java}
> Caused by: org.apache.spark.util.SparkFatalException
>   at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Updated] (SPARK-38519) AQE throw exception should respect SparkFatalException

2022-03-10 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-38519:
--
Description: 
BroadcastExchangeExec will wrap a fatal exception in SparkFatalException and 
unwrap it in some places.

AQE should also respect SparkFatalException and throw the original error.
{code:java}
Caused by: org.apache.spark.util.SparkFatalException
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}

  was:
BroadcastExchangeExec will wrap a fatal exception in SparkFatalException and 
unwrap it in some places when catching SparkFatalException.

AQE should also respect SparkFatalException and throw the original error.

 
{code:java}
Caused by: org.apache.spark.util.SparkFatalException
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}


> AQE throw exception should respect SparkFatalException
> --
>
> Key: SPARK-38519
> URL: https://issues.apache.org/jira/browse/SPARK-38519
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> BroadcastExchangeExec will wrap a fatal exception in SparkFatalException and 
> unwrap it in some places.
> AQE should also respect SparkFatalException and throw the original error.
> {code:java}
> Caused by: org.apache.spark.util.SparkFatalException
>   at 
> org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748) {code}






[jira] [Created] (SPARK-38519) AQE throw exception should respect SparkFatalException

2022-03-10 Thread XiDuo You (Jira)
XiDuo You created SPARK-38519:
-

 Summary: AQE throw exception should respect SparkFatalException
 Key: SPARK-38519
 URL: https://issues.apache.org/jira/browse/SPARK-38519
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: XiDuo You


BroadcastExchangeExec will wrap a fatal exception in SparkFatalException and 
unwrap it in some places when catching SparkFatalException.

AQE should also respect SparkFatalException and throw the original error.

 
{code:java}
Caused by: org.apache.spark.util.SparkFatalException
at 
org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) {code}
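
As a rough, hedged sketch of the requested behavior (not the actual AQE patch; it assumes SparkFatalException exposes the wrapped throwable, as its use in BroadcastExchangeExec suggests):

{code:scala}
import org.apache.spark.util.SparkFatalException

// Rethrow the underlying fatal cause instead of the wrapper, so callers of
// AQE see the original error.
def unwrapFatal[T](body: => T): T =
  try body
  catch {
    case e: SparkFatalException => throw e.throwable
  }
{code}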






[jira] [Updated] (SPARK-37273) Hidden File Metadata Support for Spark SQL

2022-03-10 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-37273:

Labels: release-notes  (was: )

> Hidden File Metadata Support for Spark SQL
> --
>
> Key: SPARK-37273
> URL: https://issues.apache.org/jira/browse/SPARK-37273
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yaohua Zhao
>Assignee: Yaohua Zhao
>Priority: Major
>  Labels: release-notes
> Fix For: 3.3.0
>
>
> Provide a new interface in Spark SQL that allows users to query the metadata 
> of the input files for all file formats, expose them as *built-in hidden 
> columns* meaning *users can only see them when they explicitly reference 
> them* (e.g. file path, file name)






[jira] [Commented] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504721#comment-17504721
 ] 

Apache Spark commented on SPARK-38518:
--

User 'xinrong-databricks' has created a pull request for this issue:
https://github.com/apache/spark/pull/35813

> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
> --
>
> Key: SPARK-38518
> URL: https://issues.apache.org/jira/browse/SPARK-38518
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.






[jira] [Assigned] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38518:


Assignee: (was: Apache Spark)

> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
> --
>
> Key: SPARK-38518
> URL: https://issues.apache.org/jira/browse/SPARK-38518
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.






[jira] [Assigned] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38518:


Assignee: Apache Spark

> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
> --
>
> Key: SPARK-38518
> URL: https://issues.apache.org/jira/browse/SPARK-38518
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.






[jira] [Created] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values

2022-03-10 Thread Xinrong Meng (Jira)
Xinrong Meng created SPARK-38518:


 Summary: Implement `skipna` of `Series.all/Index.all` to exclude 
NA/null values
 Key: SPARK-38518
 URL: https://issues.apache.org/jira/browse/SPARK-38518
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Xinrong Meng


Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.






[jira] [Updated] (SPARK-38516) Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided

2022-03-10 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-38516:

Summary: Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if 
active hadoop-provided  (was: Add log4j-core and log4j-api to classpath if 
active hadoop-provided)

> Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active 
> hadoop-provided
> -
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.core.Filter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more{noformat}
> {noformat}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/LogManager
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver.(SparkRackResolver.scala:42)
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
>   at 
> org.apache.spark.scheduler.cluster.YarnScheduler.(YarnScheduler.scala:31)
>   at 
> org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
>   at org.apache.spark.SparkContext.(SparkContext.scala:563)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:327)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 26 more
> {noformat}






[jira] [Commented] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-10 Thread qian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504703#comment-17504703
 ] 

qian commented on SPARK-38507:
--

Hi [~amavrommatis]
This happens because you aliased the dataframe *df* as *df*, resulting in a 
schema conflict when column references such as "df.field2" are resolved.

You can try this command:

{code:scala}
df.withColumn("field3", lit(0)).select("field3").show(2)
{code}

The following command also runs, but the result is not right:
{code:scala}
df.withColumn("df.field2", lit(0)).select("df.field2").show(2) 
{code}

The result is the original column *field2*, not your new column *df.field2*, 
whose value is 0.
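
A hedged follow-up sketch (behavior assumed from Spark's identifier-quoting rules, not re-verified against this report): wrapping the dotted name in backticks makes the resolver treat it as a single column name, so the newly added column is returned.

{code:scala}
// Backticks prevent "df.field2" from being parsed as alias "df" + column "field2".
df.withColumn("df.field2", lit(0)).select("`df.field2`").show(2)
{code}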

> DataFrame withColumn method not adding or replacing columns when alias is used
> --
>
> Key: SPARK-38507
> URL: https://issues.apache.org/jira/browse/SPARK-38507
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Alexandros Mavrommatis
>Priority: Major
>  Labels: SQL, catalyst
>
> I have an input DataFrame *df* created as follows:
> {code:java}
> import spark.implicits._
> val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
> When I execute either this command:
> {code:java}
> df.select("df.field2").show(2) {code}
> or that one:
> {code:java}
> df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
> I get the same result:
> {code:java}
> +--+
> |field2|
> +--+
> |    10|
> |    20|
> +--+ {code}
> Additionally, when I execute the following command:
> {code:java}
> df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
> I get this exception:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
> input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
> Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       
> +- Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation 
> [_1#2, _2#3]  at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)  
>  at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
>    at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)   
> at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)  
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)   at 
> scala.collection.TraversableLike.map(TraversableLike.scala:238)   at 
> scala.collection.TraversableLike.map$(TraversableLike.scala:231)   at 
> scala.collection.AbstractTraversable.map(Traversable.scala:108)   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:152)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:93)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184)   
> at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:93)
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAn

[jira] [Assigned] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38511:
-

Assignee: Dongjoon Hyun

> Remove priorityClassName propagation in favor of explicit settings
> --
>
> Key: SPARK-38511
> URL: https://issues.apache.org/jira/browse/SPARK-38511
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>







[jira] [Resolved] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38511.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35807
[https://github.com/apache/spark/pull/35807]

> Remove priorityClassName propagation in favor of explicit settings
> --
>
> Key: SPARK-38511
> URL: https://issues.apache.org/jira/browse/SPARK-38511
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Assigned] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)

2022-03-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-38517:


Assignee: Hyukjin Kwon

> Fix PySpark documentation generation (missing ipython_genutils)
> ---
>
> Key: SPARK-38517
> URL: https://issues.apache.org/jira/browse/SPARK-38517
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> {code}
> Extension error:
> Could not import extension nbsphinx (exception: No module named 
> 'ipython_genutils')
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
> Python doc generation failed (RuntimeError)
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
> {code}
> https://github.com/apache/spark/runs/5504729423?check_suite_focus=true






[jira] [Resolved] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)

2022-03-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-38517.
--
Fix Version/s: 3.3.0
   3.2.2
   Resolution: Fixed

Issue resolved by pull request 35812
[https://github.com/apache/spark/pull/35812]

> Fix PySpark documentation generation (missing ipython_genutils)
> ---
>
> Key: SPARK-38517
> URL: https://issues.apache.org/jira/browse/SPARK-38517
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>
> {code}
> Extension error:
> Could not import extension nbsphinx (exception: No module named 
> 'ipython_genutils')
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
> Python doc generation failed (RuntimeError)
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
> {code}
> https://github.com/apache/spark/runs/5504729423?check_suite_focus=true






[jira] [Commented] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504685#comment-17504685
 ] 

Apache Spark commented on SPARK-38517:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/35812

> Fix PySpark documentation generation (missing ipython_genutils)
> ---
>
> Key: SPARK-38517
> URL: https://issues.apache.org/jira/browse/SPARK-38517
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> Extension error:
> Could not import extension nbsphinx (exception: No module named 
> 'ipython_genutils')
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
> Python doc generation failed (RuntimeError)
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
> {code}
> https://github.com/apache/spark/runs/5504729423?check_suite_focus=true






[jira] [Assigned] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38517:


Assignee: (was: Apache Spark)

> Fix PySpark documentation generation (missing ipython_genutils)
> ---
>
> Key: SPARK-38517
> URL: https://issues.apache.org/jira/browse/SPARK-38517
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> Extension error:
> Could not import extension nbsphinx (exception: No module named 
> 'ipython_genutils')
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': 
> Python doc generation failed (RuntimeError)
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
> {code}
> https://github.com/apache/spark/runs/5504729423?check_suite_focus=true






[jira] [Assigned] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38517:


Assignee: Apache Spark

> Fix PySpark documentation generation (missing ipython_genutils)
> ---
>
> Key: SPARK-38517
> URL: https://issues.apache.org/jira/browse/SPARK-38517
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> Extension error:
> Could not import extension nbsphinx (exception: No module named 
> 'ipython_genutils')
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `<top (required)>': 
> Python doc generation failed (RuntimeError)
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
> {code}
> https://github.com/apache/spark/runs/5504729423?check_suite_focus=true



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504684#comment-17504684
 ] 

Apache Spark commented on SPARK-38517:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/35812

> Fix PySpark documentation generation (missing ipython_genutils)
> ---
>
> Key: SPARK-38517
> URL: https://issues.apache.org/jira/browse/SPARK-38517
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> Extension error:
> Could not import extension nbsphinx (exception: No module named 
> 'ipython_genutils')
> make: *** [Makefile:35: html] Error 2
> 
>   Jekyll 4.2.1   Please append `--trace` to the `build` command 
>  for any additional information or backtrace. 
> 
> /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `<top (required)>': 
> Python doc generation failed (RuntimeError)
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
>  `block in require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
>  `require_with_graceful_fail'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
>  `block in require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `each'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
>  `require_plugin_files'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
>  `conscientious_require'
>   from 
> /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
>  `setup'
> {code}
> https://github.com/apache/spark/runs/5504729423?check_suite_focus=true



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504682#comment-17504682
 ] 

Apache Spark commented on SPARK-38516:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/35811

> Add log4j-core and log4j-api to classpath if active hadoop-provided
> ---
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.core.Filter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more{noformat}
> {noformat}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/LogManager
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
>   at 
> org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
>   at 
> org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 26 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38516:


Assignee: (was: Apache Spark)

> Add log4j-core and log4j-api to classpath if active hadoop-provided
> ---
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.core.Filter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more{noformat}
> {noformat}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/LogManager
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
>   at 
> org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
>   at 
> org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 26 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)

2022-03-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-38517:


 Summary: Fix PySpark documentation generation (missing 
ipython_genutils)
 Key: SPARK-38517
 URL: https://issues.apache.org/jira/browse/SPARK-38517
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.2.1, 3.3.0
Reporter: Hyukjin Kwon


{code}
Extension error:
Could not import extension nbsphinx (exception: No module named 
'ipython_genutils')
make: *** [Makefile:35: html] Error 2

  Jekyll 4.2.1   Please append `--trace` to the `build` command 
 for any additional information or backtrace. 

/__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `<top (required)>': 
Python doc generation failed (RuntimeError)
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
 `require'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in
 `block in require_with_graceful_fail'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
 `each'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in
 `require_with_graceful_fail'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in
 `block in require_plugin_files'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
 `each'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in
 `require_plugin_files'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in
 `conscientious_require'
from 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in
 `setup'
{code}

https://github.com/apache/spark/runs/5504729423?check_suite_focus=true



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38516:


Assignee: Apache Spark

> Add log4j-core and log4j-api to classpath if active hadoop-provided
> ---
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> {noformat}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.core.Filter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more{noformat}
> {noformat}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/LogManager
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
>   at 
> org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
>   at 
> org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 26 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504683#comment-17504683
 ] 

Apache Spark commented on SPARK-38516:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/35811

> Add log4j-core and log4j-api to classpath if active hadoop-provided
> ---
>
> Key: SPARK-38516
> URL: https://issues.apache.org/jira/browse/SPARK-38516
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> Error: A JNI error has occurred, please check your installation and try again
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/core/Filter
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
>     at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
>     at java.lang.Class.getMethod0(Class.java:3018)
>     at java.lang.Class.getMethod(Class.java:1784)
>     at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.core.Filter
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 7 more{noformat}
> {noformat}
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/apache/logging/log4j/LogManager
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
>   at 
> org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
>   at 
> org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
>   at 
> org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
>   at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.logging.log4j.LogManager
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>   ... 26 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided

2022-03-10 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-38516:
---

 Summary: Add log4j-core and log4j-api to classpath if active 
hadoop-provided
 Key: SPARK-38516
 URL: https://issues.apache.org/jira/browse/SPARK-38516
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.3.0
Reporter: Yuming Wang


{noformat}
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/logging/log4j/core/Filter
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
    at java.lang.Class.getMethod0(Class.java:3018)
    at java.lang.Class.getMethod(Class.java:1784)
    at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: 
org.apache.logging.log4j.core.Filter
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 7 more{noformat}

{noformat}
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/logging/log4j/LogManager
at 
org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
at 
org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
at 
org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
at 
org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
at 
org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
at scala.Option.getOrElse(Option.scala:189)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
at 
org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.LogManager
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 26 more
{noformat}
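
For context, the two classes in the stack traces above come from the log4j 2 artifacts that a hadoop-provided build no longer puts on the classpath: `org.apache.logging.log4j.LogManager` lives in log4j-api and `org.apache.logging.log4j.core.Filter` in log4j-core. Below is a minimal diagnostic sketch (an editor's illustration, not the proposed fix; the object name is made up) for checking whether both artifacts are visible before launching Spark:

{code:scala}
object Log4jClasspathCheck {
  // Representative classes from each artifact, taken from the stack traces above.
  private val required = Seq(
    "org.apache.logging.log4j.LogManager",  // provided by log4j-api
    "org.apache.logging.log4j.core.Filter") // provided by log4j-core

  def main(args: Array[String]): Unit = {
    required.foreach { cls =>
      try {
        Class.forName(cls)
        println(s"found   $cls")
      } catch {
        case _: ClassNotFoundException =>
          // Missing jars can be supplied manually, e.g. via --jars or
          // SPARK_DIST_CLASSPATH, until the build itself includes them.
          println(s"missing $cls")
      }
    }
  }
}
{code}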



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38515) Volcano queue is not deleted

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38515:
--
Priority: Critical  (was: Blocker)

> Volcano queue is not deleted
> 
>
> Key: SPARK-38515
> URL: https://issues.apache.org/jira/browse/SPARK-38515
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> {code}
> $ k delete queue queue0
> Error from server: admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue0` state 
> is `Open`
> {code}
> {code}
> [info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED 
> *** (7 minutes, 40 seconds)
> [info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: DELETE at: 
> https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
>  Message: admission webhook "validatequeue.volcano.sh" denied the request: 
> only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is 
> `Open`. Received status: Status(apiVersion=v1, code=400, details=null, 
> kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` 
> state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
> status=Failure, additionalProperties={}).
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38515) Volcano test fails at deleting queue

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38515:
--
Description: 
{code}
$ k delete queue queue0
Error from server: admission webhook "validatequeue.volcano.sh" denied the 
request: only queue with state `Closed` can be deleted, queue `queue0` state is 
`Open`
{code}

{code}
[info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED *** 
(7 minutes, 40 seconds)
[info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
executing: DELETE at: 
https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
 Message: admission webhook "validatequeue.volcano.sh" denied the request: only 
queue with state `Closed` can be deleted, queue `queue-2u-3g` state is `Open`. 
Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, 
message=admission webhook "validatequeue.volcano.sh" denied the request: only 
queue with state `Closed` can be deleted, queue `queue-2u-3g` state is `Open`, 
metadata=ListMeta(_continue=null, remainingItemCount=null, 
resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
status=Failure, additionalProperties={}).
{code}

  was:
{code}
[info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED *** 
(7 minutes, 40 seconds)
[info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
executing: DELETE at: 
https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
 Message: admission webhook "validatequeue.volcano.sh" denied the request: only 
queue with state `Closed` can be deleted, queue `queue-2u-3g` state is `Open`. 
Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, 
message=admission webhook "validatequeue.volcano.sh" denied the request: only 
queue with state `Closed` can be deleted, queue `queue-2u-3g` state is `Open`, 
metadata=ListMeta(_continue=null, remainingItemCount=null, 
resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
status=Failure, additionalProperties={}).
{code}


> Volcano test fails at deleting queue
> 
>
> Key: SPARK-38515
> URL: https://issues.apache.org/jira/browse/SPARK-38515
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> $ k delete queue queue0
> Error from server: admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue0` state 
> is `Open`
> {code}
> {code}
> [info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED 
> *** (7 minutes, 40 seconds)
> [info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: DELETE at: 
> https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
>  Message: admission webhook "validatequeue.volcano.sh" denied the request: 
> only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is 
> `Open`. Received status: Status(apiVersion=v1, code=400, details=null, 
> kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` 
> state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
> status=Failure, additionalProperties={}).
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38515) Volcano queue is not deleted

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38515:
--
Priority: Blocker  (was: Major)

> Volcano queue is not deleted
> 
>
> Key: SPARK-38515
> URL: https://issues.apache.org/jira/browse/SPARK-38515
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> {code}
> $ k delete queue queue0
> Error from server: admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue0` state 
> is `Open`
> {code}
> {code}
> [info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED 
> *** (7 minutes, 40 seconds)
> [info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: DELETE at: 
> https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
>  Message: admission webhook "validatequeue.volcano.sh" denied the request: 
> only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is 
> `Open`. Received status: Status(apiVersion=v1, code=400, details=null, 
> kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` 
> state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
> status=Failure, additionalProperties={}).
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38515) Volcano test fails at deleting queue

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38515:
--
Component/s: (was: Tests)

> Volcano test fails at deleting queue
> 
>
> Key: SPARK-38515
> URL: https://issues.apache.org/jira/browse/SPARK-38515
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> [info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED 
> *** (7 minutes, 40 seconds)
> [info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: DELETE at: 
> https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
>  Message: admission webhook "validatequeue.volcano.sh" denied the request: 
> only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is 
> `Open`. Received status: Status(apiVersion=v1, code=400, details=null, 
> kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` 
> state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
> status=Failure, additionalProperties={}).
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38515) Volcano queue is not deleted

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38515:
--
Summary: Volcano queue is not deleted  (was: Volcano test fails at deleting 
queue)

> Volcano queue is not deleted
> 
>
> Key: SPARK-38515
> URL: https://issues.apache.org/jira/browse/SPARK-38515
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> $ k delete queue queue0
> Error from server: admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue0` state 
> is `Open`
> {code}
> {code}
> [info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED 
> *** (7 minutes, 40 seconds)
> [info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: DELETE at: 
> https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
>  Message: admission webhook "validatequeue.volcano.sh" denied the request: 
> only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is 
> `Open`. Received status: Status(apiVersion=v1, code=400, details=null, 
> kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` 
> state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
> status=Failure, additionalProperties={}).
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38515) Volcano test fails at deleting queue

2022-03-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504670#comment-17504670
 ] 

Dongjoon Hyun commented on SPARK-38515:
---

cc [~yikunkero] this is happening on Intel-architecture EKS.

> Volcano test fails at deleting queue
> 
>
> Key: SPARK-38515
> URL: https://issues.apache.org/jira/browse/SPARK-38515
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> [info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED 
> *** (7 minutes, 40 seconds)
> [info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
> executing: DELETE at: 
> https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
>  Message: admission webhook "validatequeue.volcano.sh" denied the request: 
> only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is 
> `Open`. Received status: Status(apiVersion=v1, code=400, details=null, 
> kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the 
> request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` 
> state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
> status=Failure, additionalProperties={}).
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38515) Volcano test fails at deleting queue

2022-03-10 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-38515:
-

 Summary: Volcano test fails at deleting queue
 Key: SPARK-38515
 URL: https://issues.apache.org/jira/browse/SPARK-38515
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes, Tests
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun


{code}
[info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED *** 
(7 minutes, 40 seconds)
[info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure 
executing: DELETE at: 
https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g.
 Message: admission webhook "validatequeue.volcano.sh" denied the request: only 
queue with state `Closed` can be deleted, queue `queue-2u-3g` state is `Open`. 
Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, 
message=admission webhook "validatequeue.volcano.sh" denied the request: only 
queue with state `Closed` can be deleted, queue `queue-2u-3g` state is `Open`, 
metadata=ListMeta(_continue=null, remainingItemCount=null, 
resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, 
status=Failure, additionalProperties={}).
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-38513.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35809
[https://github.com/apache/spark/pull/35809]

> Move custom scheduler-specific configs to under 
> `spark.kubernetes.scheduler.NAME` prefix
> 
>
> Key: SPARK-38513
> URL: https://issues.apache.org/jira/browse/SPARK-38513
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-38513:
-

Assignee: Dongjoon Hyun

> Move custom scheduler-specific configs to under 
> `spark.kubernetes.scheduler.NAME` prefix
> 
>
> Key: SPARK-38513
> URL: https://issues.apache.org/jira/browse/SPARK-38513
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38514) Download link on spark 3.2.1 for hadoop 3.2 is wrong

2022-03-10 Thread Brett Ryan (Jira)
Brett Ryan created SPARK-38514:
--

 Summary: Download link on spark 3.2.1 for hadoop 3.2 is wrong
 Key: SPARK-38514
 URL: https://issues.apache.org/jira/browse/SPARK-38514
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 3.2.1
Reporter: Brett Ryan


When downloading Spark 3.2.1 pre-built for Hadoop, the dropdown reads:

{quote}
Pre-built for Apache Hadoop *3.3 and later*
{quote}

However, the linked filename reads

{quote}
spark-3.2.1-bin-*hadoop3.2*.tgz
{quote}

When downloaded, the contents actually have Hadoop 3.3.1 dependencies, 
indicating the filename is incorrect.

https://spark.apache.org/downloads.html




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38320:


Assignee: (was: Apache Spark)

> (flat)MapGroupsWithState can timeout groups which just received inputs in the 
> same microbatch
> -
>
> Key: SPARK-38320
> URL: https://issues.apache.org/jira/browse/SPARK-38320
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Alex Balikov
>Priority: Major
>
> We have identified an issue where the RocksDB state store iterator will not 
> pick up store updates made after its creation. As a result of this, the 
> _timeoutProcessorIter_ in
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala]
> will not pick up state changes made during _newDataProcessorIter_ input 
> processing. The user-observed behavior is that a group state may receive 
> input records and also be called with a timeout in the same micro batch. This 
> contradicts the public documentation for GroupState -
> [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html]
>  * The timeout is reset every time the function is called on a group, that 
> is, when the group has new data, or the group has timed out. So the user has 
> to set the timeout duration every time the function is called, otherwise, 
> there will not be any timeout set.
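
To make the quoted contract concrete, here is a minimal sketch of a state update function that re-registers its timeout on every invocation; the Event and SessionInfo case classes and their field names are invented for illustration and are not taken from the linked change.

{code:scala}
import java.sql.Timestamp
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Event(user: String, ts: Timestamp)
case class SessionInfo(user: String, count: Long)

def updateSession(
    user: String,
    events: Iterator[Event],
    state: GroupState[SessionInfo]): Iterator[SessionInfo] = {
  if (state.hasTimedOut) {
    // Per the GroupState documentation, a group that times out should not also
    // have received input in the same micro batch; SPARK-38320 reports it can.
    val expired = state.get
    state.remove()
    Iterator(expired)
  } else {
    val current = state.getOption.getOrElse(SessionInfo(user, 0L))
    state.update(current.copy(count = current.count + events.size))
    // The timeout is reset on every call, so it must be registered again here;
    // otherwise no timeout is set for this group.
    state.setTimeoutDuration("10 seconds")
    Iterator.empty
  }
}

// Usage on a streaming Dataset[Event] named `events` (hypothetical):
// events.groupByKey(_.user)
//   .flatMapGroupsWithState(OutputMode.Update(),
//     GroupStateTimeout.ProcessingTimeTimeout)(updateSession)
{code}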



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504642#comment-17504642
 ] 

Apache Spark commented on SPARK-38320:
--

User 'alex-balikov' has created a pull request for this issue:
https://github.com/apache/spark/pull/35810

> (flat)MapGroupsWithState can timeout groups which just received inputs in the 
> same microbatch
> -
>
> Key: SPARK-38320
> URL: https://issues.apache.org/jira/browse/SPARK-38320
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Alex Balikov
>Priority: Major
>
> We have identified an issue where the RocksDB state store iterator will not 
> pick up store updates made after its creation. As a result of this, the 
> _timeoutProcessorIter_ in
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala]
> will not pick up state changes made during _newDataProcessorIter_ input 
> processing. The user-observed behavior is that a group state may receive 
> input records and also be called with a timeout in the same micro batch. This 
> contradicts the public documentation for GroupState -
> [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html]
>  * The timeout is reset every time the function is called on a group, that 
> is, when the group has new data, or the group has timed out. So the user has 
> to set the timeout duration every time the function is called, otherwise, 
> there will not be any timeout set.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38320:


Assignee: Apache Spark

> (flat)MapGroupsWithState can timeout groups which just received inputs in the 
> same microbatch
> -
>
> Key: SPARK-38320
> URL: https://issues.apache.org/jira/browse/SPARK-38320
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.2.1
>Reporter: Alex Balikov
>Assignee: Apache Spark
>Priority: Major
>
> We have identified an issue where the RocksDB state store iterator will not 
> pick up store updates made after its creation. As a result of this, the 
> _timeoutProcessorIter_ in
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala]
> will not pick up state changes made during _newDataProcessorIter_ input 
> processing. The user-observed behavior is that a group state may receive 
> input records and also be called with a timeout in the same micro batch. This 
> contradicts the public documentation for GroupState -
> [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html]
>  * The timeout is reset every time the function is called on a group, that 
> is, when the group has new data, or the group has timed out. So the user has 
> to set the timeout duration every time the function is called, otherwise, 
> there will not be any timeout set.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504619#comment-17504619
 ] 

Apache Spark commented on SPARK-38513:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35809

> Move custom scheduler-specific configs to under 
> `spark.kubernetes.scheduler.NAME` prefix
> 
>
> Key: SPARK-38513
> URL: https://issues.apache.org/jira/browse/SPARK-38513
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38513:


Assignee: (was: Apache Spark)

> Move custom scheduler-specific configs to under 
> `spark.kubernetes.scheduler.NAME` prefix
> 
>
> Key: SPARK-38513
> URL: https://issues.apache.org/jira/browse/SPARK-38513
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38513:


Assignee: Apache Spark

> Move custom scheduler-specific configs to under 
> `spark.kubernetes.scheduler.NAME` prefix
> 
>
> Key: SPARK-38513
> URL: https://issues.apache.org/jira/browse/SPARK-38513
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504618#comment-17504618
 ] 

Apache Spark commented on SPARK-38513:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35809

> Move custom scheduler-specific configs to under 
> `spark.kubernetes.scheduler.NAME` prefix
> 
>
> Key: SPARK-38513
> URL: https://issues.apache.org/jira/browse/SPARK-38513
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix

2022-03-10 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-38513:
-

 Summary: Move custom scheduler-specific configs to under 
`spark.kubernetes.scheduler.NAME` prefix
 Key: SPARK-38513
 URL: https://issues.apache.org/jira/browse/SPARK-38513
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38512:


Assignee: (was: Apache Spark)

> ResolveFunctions implemented incorrectly requiring multiple passes to Resolve 
> Nested Expressions 
> -
>
> Key: SPARK-38512
> URL: https://issues.apache.org/jira/browse/SPARK-38512
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1
>Reporter: Alexey Kudinkin
>Priority: Critical
>
> The ResolveFunctions rule is implemented incorrectly, requiring multiple passes
> to resolve nested expressions:
> While the Plan object is traversed correctly in post-order (bottom-up, via
> `plan.resolveOperatorsUpWithPruning`), internally each plan node's expressions
> are traversed incorrectly in pre-order (top-down, using
> `transformExpressionsWithPruning`):
>  
> {code:java}
> case q: LogicalPlan =>
>   q.transformExpressionsWithPruning(...) { ... } {code}
>  
> Traversing in pre-order means an attempt is made to resolve the current node
> before its children are resolved, which is incorrect, since the node itself
> cannot be resolved before its children are.
> While this does not lead to failures yet, it is taxing on performance: most
> expressions in Spark should be resolvable in a *single pass* (if resolved
> bottom-up; see the reproducible sample at the bottom). Instead, it currently
> takes Spark at least *N* iterations to resolve such expressions, where N is
> proportional to the depth of the expression tree.
>  
> Example to reproduce: 
>  
> {code:java}
> def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: 
> StructType): Expression = {
>   val expr = spark.sessionState.sqlParser.parseExpression(exprStr)
>   val analyzer = spark.sessionState.analyzer
>   val schemaFields = tableSchema.fields
>   val resolvedExpr = {
> val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, 
> schemaFields.drop(1): _*))
> val rules: Seq[Rule[LogicalPlan]] = {
>   analyzer.ResolveFunctions ::
>   analyzer.ResolveReferences ::
>   Nil
> }
> rules.foldRight(plan)((rule, plan) => rule.apply(plan))
>   .asInstanceOf[Filter]
>   .condition
>   }
>   resolvedExpr
> }
> // Invoke with
> resolveExpr(spark, "date_format(to_timestamp(B, '-MM-dd'), 
> 'MM/dd/')", StructType(StructField("B", StringType))){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38512:


Assignee: Apache Spark

> ResolveFunctions implemented incorrectly requiring multiple passes to Resolve 
> Nested Expressions 
> -
>
> Key: SPARK-38512
> URL: https://issues.apache.org/jira/browse/SPARK-38512
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1
>Reporter: Alexey Kudinkin
>Assignee: Apache Spark
>Priority: Critical
>
> The ResolveFunctions rule is implemented incorrectly, requiring multiple passes
> to resolve nested expressions:
> While the Plan object is traversed correctly in post-order (bottom-up, via
> `plan.resolveOperatorsUpWithPruning`), internally each plan node's expressions
> are traversed incorrectly in pre-order (top-down, using
> `transformExpressionsWithPruning`):
>  
> {code:java}
> case q: LogicalPlan =>
>   q.transformExpressionsWithPruning(...) { ... } {code}
>  
> Traversing in pre-order means an attempt is made to resolve the current node
> before its children are resolved, which is incorrect, since the node itself
> cannot be resolved before its children are.
> While this does not lead to failures yet, it is taxing on performance: most
> expressions in Spark should be resolvable in a *single pass* (if resolved
> bottom-up; see the reproducible sample at the bottom). Instead, it currently
> takes Spark at least *N* iterations to resolve such expressions, where N is
> proportional to the depth of the expression tree.
>  
> Example to reproduce: 
>  
> {code:java}
> def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: 
> StructType): Expression = {
>   val expr = spark.sessionState.sqlParser.parseExpression(exprStr)
>   val analyzer = spark.sessionState.analyzer
>   val schemaFields = tableSchema.fields
>   val resolvedExpr = {
> val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, 
> schemaFields.drop(1): _*))
> val rules: Seq[Rule[LogicalPlan]] = {
>   analyzer.ResolveFunctions ::
>   analyzer.ResolveReferences ::
>   Nil
> }
> rules.foldRight(plan)((rule, plan) => rule.apply(plan))
>   .asInstanceOf[Filter]
>   .condition
>   }
>   resolvedExpr
> }
> // Invoke with
> resolveExpr(spark, "date_format(to_timestamp(B, '-MM-dd'), 
> 'MM/dd/')", StructType(StructField("B", StringType))){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504617#comment-17504617
 ] 

Apache Spark commented on SPARK-38512:
--

User 'alexeykudinkin' has created a pull request for this issue:
https://github.com/apache/spark/pull/35808

> ResolveFunctions implemented incorrectly requiring multiple passes to Resolve 
> Nested Expressions 
> -
>
> Key: SPARK-38512
> URL: https://issues.apache.org/jira/browse/SPARK-38512
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0, 3.2.1
>Reporter: Alexey Kudinkin
>Priority: Critical
>
> The ResolveFunctions rule is implemented incorrectly, requiring multiple passes
> to resolve nested expressions:
> While the Plan object is traversed correctly in post-order (bottom-up, via
> `plan.resolveOperatorsUpWithPruning`), internally each plan node's expressions
> are traversed incorrectly in pre-order (top-down, using
> `transformExpressionsWithPruning`):
>  
> {code:java}
> case q: LogicalPlan =>
>   q.transformExpressionsWithPruning(...) { ... } {code}
>  
> Traversing in pre-order means an attempt is made to resolve the current node
> before its children are resolved, which is incorrect, since the node itself
> cannot be resolved before its children are.
> While this does not lead to failures yet, it is taxing on performance: most
> expressions in Spark should be resolvable in a *single pass* (if resolved
> bottom-up; see the reproducible sample at the bottom). Instead, it currently
> takes Spark at least *N* iterations to resolve such expressions, where N is
> proportional to the depth of the expression tree.
>  
> Example to reproduce: 
>  
> {code:java}
> def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: 
> StructType): Expression = {
>   val expr = spark.sessionState.sqlParser.parseExpression(exprStr)
>   val analyzer = spark.sessionState.analyzer
>   val schemaFields = tableSchema.fields
>   val resolvedExpr = {
> val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, 
> schemaFields.drop(1): _*))
> val rules: Seq[Rule[LogicalPlan]] = {
>   analyzer.ResolveFunctions ::
>   analyzer.ResolveReferences ::
>   Nil
> }
> rules.foldRight(plan)((rule, plan) => rule.apply(plan))
>   .asInstanceOf[Filter]
>   .condition
>   }
>   resolvedExpr
> }
> // Invoke with
> resolveExpr(spark, "date_format(to_timestamp(B, '-MM-dd'), 
> 'MM/dd/')", StructType(StructField("B", StringType))){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38492) Improve the test coverage for PySpark

2022-03-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-38492:

Description: 
Currently, PySpark test coverage is around 91% according to codecov report: 
[https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark]

Since about 9% of the code still lacks test coverage, I think it would be great 
to improve our test coverage.

Of course we might not target 100%, but we should cover as much as possible, up 
to the level that we can currently cover with CI.

  was:
Currently, PySpark test coverage is around 91% according to codecov report: 
[https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark).]

Since about 9% of the code still lacks test coverage, I think it would be great 
to improve our test coverage.

Of course we might not target 100%, but we should cover as much as possible, up 
to the level that we can currently cover with CI.


> Improve the test coverage for PySpark
> -
>
> Key: SPARK-38492
> URL: https://issues.apache.org/jira/browse/SPARK-38492
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, PySpark test coverage is around 91% according to codecov report: 
> [https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark]
> Since about 9% of the code still lacks test coverage, I think it would be 
> great to improve our test coverage.
> Of course we might not target 100%, but we should cover as much as possible, 
> up to the level that we can currently cover with CI.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions

2022-03-10 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created SPARK-38512:
---

 Summary: ResolveFunctions implemented incorrectly requiring 
multiple passes to Resolve Nested Expressions 
 Key: SPARK-38512
 URL: https://issues.apache.org/jira/browse/SPARK-38512
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.1, 3.2.0
Reporter: Alexey Kudinkin


The ResolveFunctions rule is implemented incorrectly, requiring multiple passes
to resolve nested expressions:

While the Plan object is traversed correctly in post-order (bottom-up, via
`plan.resolveOperatorsUpWithPruning`), internally each plan node's expressions
are traversed incorrectly in pre-order (top-down, using
`transformExpressionsWithPruning`):

 
{code:java}
case q: LogicalPlan =>
  q.transformExpressionsWithPruning(...) { ... } {code}
 

Traversing in pre-order means an attempt is made to resolve the current node
before its children are resolved, which is incorrect, since the node itself
cannot be resolved before its children are.

While this does not lead to failures yet, it is taxing on performance: most
expressions in Spark should be resolvable in a *single pass* (if resolved
bottom-up; see the reproducible sample at the bottom). Instead, it currently
takes Spark at least *N* iterations to resolve such expressions, where N is
proportional to the depth of the expression tree.

 

Example to reproduce: 

 
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{StringType, StructField, StructType}

def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: StructType): Expression = {
  val expr = spark.sessionState.sqlParser.parseExpression(exprStr)
  val analyzer = spark.sessionState.analyzer
  val schemaFields = tableSchema.fields

  val resolvedExpr = {
    val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, schemaFields.drop(1): _*))
    val rules: Seq[Rule[LogicalPlan]] = {
      analyzer.ResolveFunctions ::
      analyzer.ResolveReferences ::
      Nil
    }

    // Apply each rule once; foldRight runs ResolveReferences first, then ResolveFunctions.
    rules.foldRight(plan)((rule, plan) => rule.apply(plan))
      .asInstanceOf[Filter]
      .condition
  }
  resolvedExpr
}

// Invoke with
resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), 'MM/dd/yyyy')",
  StructType(Seq(StructField("B", StringType)))){code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38492) Improve the test coverage for PySpark

2022-03-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-38492:

Description: 
Currently, PySpark test coverage is around 91% according to codecov report: 
[https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark).]

Since about 9% of the code still lacks test coverage, I think it would be great 
to improve our test coverage.

Of course we might not target 100%, but we should cover as much as possible, up 
to the level that we can currently cover with CI.

  was:
Currently, PySpark test coverage is around 91% according to codecov report: 
[https://app.codecov.io/gh/apache/spark.|https://app.codecov.io/gh/apache/spark).]

Since about 9% of the code still lacks test coverage, I think it would be great 
to improve our test coverage.

Of course we might not target 100%, but we should cover as much as possible, up 
to the level that we can currently cover with CI.


> Improve the test coverage for PySpark
> -
>
> Key: SPARK-38492
> URL: https://issues.apache.org/jira/browse/SPARK-38492
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, PySpark test coverage is around 91% according to codecov report: 
> [https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark).]
> Since about 9% of the code still lacks test coverage, I think it would be 
> great to improve our test coverage.
> Of course we might not target 100%, but we should cover as much as possible, 
> up to the level that we can currently cover with CI.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38492) Improve the test coverage for PySpark

2022-03-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-38492:

Description: 
Currently, PySpark test coverage is around 91% according to codecov report: 
[https://app.codecov.io/gh/apache/spark.|https://app.codecov.io/gh/apache/spark).]

Since about 9% of the code still lacks test coverage, I think it would be great 
to improve our test coverage.

Of course we might not target 100%, but we should cover as much as possible, up 
to the level that we can currently cover with CI.

  was:
Currently, PySpark test coverage is around 91% according to codecov report 
([https://app.codecov.io/gh/apache/spark).]

Since about 9% of the code still lacks test coverage, I think it would be great 
to improve our test coverage.

Of course we might not target 100%, but we should cover as much as possible, up 
to the level that we can currently cover with CI.


> Improve the test coverage for PySpark
> -
>
> Key: SPARK-38492
> URL: https://issues.apache.org/jira/browse/SPARK-38492
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, PySpark test coverage is around 91% according to codecov report: 
> [https://app.codecov.io/gh/apache/spark.|https://app.codecov.io/gh/apache/spark).]
> Since about 9% of the code still lacks test coverage, I think it would be 
> great to improve our test coverage.
> Of course we might not target 100%, but we should cover as much as possible, 
> up to the level that we can currently cover with CI.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504552#comment-17504552
 ] 

Apache Spark commented on SPARK-38511:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35807

> Remove priorityClassName propagation in favor of explicit settings
> --
>
> Key: SPARK-38511
> URL: https://issues.apache.org/jira/browse/SPARK-38511
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38511:


Assignee: (was: Apache Spark)

> Remove priorityClassName propagation in favor of explicit settings
> --
>
> Key: SPARK-38511
> URL: https://issues.apache.org/jira/browse/SPARK-38511
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504550#comment-17504550
 ] 

Apache Spark commented on SPARK-38511:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/35807

> Remove priorityClassName propagation in favor of explicit settings
> --
>
> Key: SPARK-38511
> URL: https://issues.apache.org/jira/browse/SPARK-38511
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38511:


Assignee: Apache Spark

> Remove priorityClassName propagation in favor of explicit settings
> --
>
> Key: SPARK-38511
> URL: https://issues.apache.org/jira/browse/SPARK-38511
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings

2022-03-10 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-38511:
-

 Summary: Remove priorityClassName propagation in favor of explicit 
settings
 Key: SPARK-38511
 URL: https://issues.apache.org/jira/browse/SPARK-38511
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38510) Failure fetching JSON representation of Spark plans with Hive UDFs

2022-03-10 Thread Shardul Mahadik (Jira)
Shardul Mahadik created SPARK-38510:
---

 Summary: Failure fetching JSON representation of Spark plans with 
Hive UDFs
 Key: SPARK-38510
 URL: https://issues.apache.org/jira/browse/SPARK-38510
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Shardul Mahadik


Repro:
{code:java}
scala> spark.sql("CREATE TEMPORARY FUNCTION test_udf AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAesEncrypt'")


scala> spark.sql("SELECT test_udf('a', 'b')").queryExecution.analyzed.toJSON
scala.reflect.internal.Symbols$CyclicReference: illegal cyclic reference 
involving class InterfaceAudience
java.lang.RuntimeException: error reading Scala signature of 
org.apache.spark.sql.hive.HiveGenericUDF: illegal cyclic reference involving 
class InterfaceAudience
  at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:51)
  at 
scala.reflect.runtime.JavaMirrors$JavaMirror.unpickleClass(JavaMirrors.scala:660)
  at 
scala.reflect.runtime.SymbolLoaders$TopClassCompleter.$anonfun$complete$2(SymbolLoaders.scala:37)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at 
scala.reflect.internal.SymbolTable.slowButSafeEnteringPhaseNotLaterThan(SymbolTable.scala:333)
  at 
scala.reflect.runtime.SymbolLoaders$TopClassCompleter.complete(SymbolLoaders.scala:34)
  at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1551)
  at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514)
  at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$7.scala$reflect$runtime$SynchronizedSymbols$SynchronizedSymbol$$super$info(SynchronizedSymbols.scala:203)
  at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.$anonfun$info$1(SynchronizedSymbols.scala:158)
  at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info(SynchronizedSymbols.scala:149)
  at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info$(SynchronizedSymbols.scala:158)
  at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$7.info(SynchronizedSymbols.scala:203)
  at scala.reflect.internal.Symbols$Symbol.initialize(Symbols.scala:1698)
  at 
scala.reflect.internal.Symbols$SymbolContextApiImpl.selfType(Symbols.scala:151)
  at scala.reflect.internal.Symbols$ClassSymbol.selfType(Symbols.scala:3287)
  at 
org.apache.spark.sql.catalyst.ScalaReflection$.getConstructorParameterNames(ScalaReflection.scala:656)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.jsonFields(TreeNode.scala:1019)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.collectJsonValue$1(TreeNode.scala:1009)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$jsonValue$1(TreeNode.scala:1011)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$jsonValue$1$adapted(TreeNode.scala:1011)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.collectJsonValue$1(TreeNode.scala:1011)
  at org.apache.spark.sql.catalyst.trees.TreeNode.jsonValue(TreeNode.scala:1014)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.parseToJson(TreeNode.scala:1057)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$parseToJson$11(TreeNode.scala:1063)
  at scala.collection.immutable.List.map(List.scala:293)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.parseToJson(TreeNode.scala:1063)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$jsonFields$2(TreeNode.scala:1033)
  at scala.collection.immutable.List.map(List.scala:293)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.jsonFields(TreeNode.scala:1024)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.collectJsonValue$1(TreeNode.scala:1009)
  at org.apache.spark.sql.catalyst.trees.TreeNode.jsonValue(TreeNode.scala:1014)
  at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:1000)
  ... 47 elided
{code}
This issue is due to [bug#12190 in 
Scala|https://github.com/scala/bug/issues/12190], which does not handle cyclic 
references in Java annotations correctly. The cyclic reference in this case 
comes from the InterfaceAudience annotation, which [annotates 
itself|https://github.com/apache/hadoop/blob/db8ae4b65448c506c9234641b2c1f9b8e894dc18/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/InterfaceAudience.java#L45].
 This annotation class is present in the type hierarchy of 
{{{}HiveGenericUDF{}}}.

A simple workaround for this issue is to just retry the operation. It will 
succeed on the retry, probably because the annotation is partially resolved from
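A minimal sketch of that retry workaround (my own addition, not code from the 
report; it simply calls toJSON again if the first attempt throws):

{code:scala}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import scala.util.Try

// Retry plan.toJSON a few times; the first attempt may hit the cyclic
// reference error, while a later attempt tends to succeed.
def planToJsonWithRetry(plan: LogicalPlan, attempts: Int = 3): String =
  (1 until attempts).foldLeft(Try(plan.toJSON)) { (acc, _) =>
    acc.recoverWith { case _ => Try(plan.toJSON) }
  }.get

planToJsonWithRetry(spark.sql("SELECT test_udf('a', 'b')").queryExecution.analyzed)
{code}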

[jira] [Comment Edited] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class

2022-03-10 Thread Brian Schaefer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504529#comment-17504529
 ] 

Brian Schaefer edited comment on SPARK-38483 at 3/10/22, 7:13 PM:
--

The column name does differ between the two when selecting a struct field. 
However, I think it makes sense to print out the name that the column _would_ 
take if it were selected. It seems like this should be fairly straightforward 
to handle:
{code:python}
>>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": 
>>> 1}}}])
>>> values = F.col("struct.outer_field.inner_field")
>>> print(df.select(values).schema[0].name)
inner_field
>>> print(values._jc.toString())
struct.outer_field.inner_field
>>> print(values._jc.toString().split(".")[-1]) 
inner_field{code}
 


was (Author: JIRAUSER286367):
The column name does differ between the two when selecting a struct field, but 
handling that case seems fairly straightforward.
{code:python}
>>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": 
>>> 1}}}])
>>> values = F.col("struct.outer_field.inner_field")
>>> print(df.select(values).schema[0].name)
inner_field
>>> print(values._jc.toString())
struct.outer_field.inner_field
>>> print(values._jc.toString().split(".")[-1]) 
inner_field{code}
 

> Column name or alias as an attribute of the PySpark Column class
> 
>
> Key: SPARK-38483
> URL: https://issues.apache.org/jira/browse/SPARK-38483
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Brian Schaefer
>Priority: Minor
>  Labels: starter
>
> Having the name of a column as an attribute of PySpark {{Column}} class 
> instances can enable some convenient patterns, for example:
> Applying a function to a column and aliasing with the original name:
> {code:java}
> values = F.col("values")
> # repeating the column name as an alias
> distinct_values = F.array_distinct(values).alias("values")
> # re-using the existing column name
> distinct_values = F.array_distinct(values).alias(values._name){code}
> Checking the column name inside a custom function and applying conditional 
> logic on the name:
> {code:java}
> def custom_function(col: Column) -> Column:
> if col._name == "my_column":
> return col.astype("int")
> return col.astype("string"){code}
> The proposal in this issue is to add a property {{Column.\_name}} that 
> obtains the name or alias of a column in a similar way as currently done in 
> the {{Column.\_\_repr\_\_}} method: 
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.]
>  The choice of {{_name}} intentionally avoids collision with the existing 
> {{Column.name}} method, which is an alias for {{{}Column.alias{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class

2022-03-10 Thread Brian Schaefer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504529#comment-17504529
 ] 

Brian Schaefer commented on SPARK-38483:


The column name does differ between the two when selecting a struct field, but 
handling that case seems fairly straightforward.
{code:python}
>>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": 
>>> 1}}}])
>>> values = F.col("struct.outer_field.inner_field")
>>> print(df.select(values).schema[0].name)
inner_field
>>> print(values._jc.toString())
struct.outer_field.inner_field
>>> print(values._jc.toString().split(".")[-1]) 
inner_field{code}
 

> Column name or alias as an attribute of the PySpark Column class
> 
>
> Key: SPARK-38483
> URL: https://issues.apache.org/jira/browse/SPARK-38483
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Brian Schaefer
>Priority: Minor
>  Labels: starter
>
> Having the name of a column as an attribute of PySpark {{Column}} class 
> instances can enable some convenient patterns, for example:
> Applying a function to a column and aliasing with the original name:
> {code:java}
> values = F.col("values")
> # repeating the column name as an alias
> distinct_values = F.array_distinct(values).alias("values")
> # re-using the existing column name
> distinct_values = F.array_distinct(values).alias(values._name){code}
> Checking the column name inside a custom function and applying conditional 
> logic on the name:
> {code:java}
> def custom_function(col: Column) -> Column:
> if col._name == "my_column":
> return col.astype("int")
> return col.astype("string"){code}
> The proposal in this issue is to add a property {{Column.\_name}} that 
> obtains the name or alias of a column in a similar way as currently done in 
> the {{Column.\_\_repr\_\_}} method: 
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.]
>  The choice of {{_name}} intentionally avoids collision with the existing 
> {{Column.name}} method, which is an alias for {{{}Column.alias{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504522#comment-17504522
 ] 

Apache Spark commented on SPARK-38509:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35805

> Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
> ---
>
> Key: SPARK-38509
> URL: https://issues.apache.org/jira/browse/SPARK-38509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> 1. Unregister the functions `timestampadd()` and `timestampdiff()` in 
> `FunctionRegistry.expressions`.
> 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for 
> `timestampdiff()`.
> 3. Align tests (regenerate golden files) to the syntax rules
> where the first parameter `unit` can have one of the identifiers:
>- YEAR
>- QUARTER
>- MONTH
>- WEEK
>- DAY, DAYOFYEAR (valid for timestampadd)
>- HOUR
>- MINUTE
>- SECOND
>- MILLISECOND
>- MICROSECOND
> h4. Why are the changes needed?
> 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with 
> an arbitrary string column as the first parameter are not required by any 
> standard.
> 2. Removing the functions and aliases should reduce the maintenance cost.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504521#comment-17504521
 ] 

Apache Spark commented on SPARK-38509:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/35805

> Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
> ---
>
> Key: SPARK-38509
> URL: https://issues.apache.org/jira/browse/SPARK-38509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> 1. Unregister the functions `timestampadd()` and `timestampdiff()` in 
> `FunctionRegistry.expressions`.
> 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for 
> `timestampdiff()`.
> 3. Align tests (regenerate golden files) to the syntax rules
> where the first parameter `unit` can have one of the identifiers:
>- YEAR
>- QUARTER
>- MONTH
>- WEEK
>- DAY, DAYOFYEAR (valid for timestampadd)
>- HOUR
>- MINUTE
>- SECOND
>- MILLISECOND
>- MICROSECOND
> h4. Why are the changes needed?
> 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with 
> an arbitrary string column as the first parameter are not required by any 
> standard.
> 2. Removing the functions and aliases should reduce the maintenance cost.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38509:


Assignee: Apache Spark

> Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
> ---
>
> Key: SPARK-38509
> URL: https://issues.apache.org/jira/browse/SPARK-38509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> 1. Unregister the functions `timestampadd()` and `timestampdiff()` in 
> `FunctionRegistry.expressions`.
> 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for 
> `timestampdiff()`.
> 3. Align tests (regenerate golden files) to the syntax rules
> where the first parameter `unit` can have one of the identifiers:
>- YEAR
>- QUARTER
>- MONTH
>- WEEK
>- DAY, DAYOFYEAR (valid for timestampadd)
>- HOUR
>- MINUTE
>- SECOND
>- MILLISECOND
>- MICROSECOND
> h4. Why are the changes needed?
> 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with 
> an arbitrary string column as the first parameter are not required by any 
> standard.
> 2. Removing the functions and aliases should reduce the maintenance cost.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38509:


Assignee: (was: Apache Spark)

> Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
> ---
>
> Key: SPARK-38509
> URL: https://issues.apache.org/jira/browse/SPARK-38509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Priority: Major
>
> 1. Unregister the functions `timestampadd()` and `timestampdiff()` in 
> `FunctionRegistry.expressions`.
> 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for 
> `timestampdiff()`.
> 3. Align tests (regenerate golden files) to the syntax rules
> where the first parameter `unit` can have one of the identifiers:
>- YEAR
>- QUARTER
>- MONTH
>- WEEK
>- DAY, DAYOFYEAR (valid for timestampadd)
>- HOUR
>- MINUTE
>- SECOND
>- MILLISECOND
>- MICROSECOND
> h4. Why are the changes needed?
> 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with 
> an arbitrary string column as the first parameter are not required by any 
> standard.
> 2. Removing the functions and aliases should reduce the maintenance cost.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF

2022-03-10 Thread Max Gekk (Jira)
Max Gekk created SPARK-38509:


 Summary: Unregister the TIMESTAMPADD/DIFF functions and remove 
DATE_ADD/DIFF
 Key: SPARK-38509
 URL: https://issues.apache.org/jira/browse/SPARK-38509
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Max Gekk


1. Unregister the functions `timestampadd()` and `timestampdiff()` in 
`FunctionRegistry.expressions`.
2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for 
`timestampdiff()`.
3. Align tests (regenerate golden files) to the syntax rules

where the first parameter `unit` can have one of the identifiers:
   - YEAR
   - QUARTER
   - MONTH
   - WEEK
   - DAY, DAYOFYEAR (valid for timestampadd)
   - HOUR
   - MINUTE
   - SECOND
   - MILLISECOND
   - MICROSECOND

h4. Why are the changes needed?
1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with an 
arbitrary string column as the first parameter are not required by any standard.
2. Removing the functions and aliases should reduce the maintenance cost.
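For illustration, a hedged example of the syntax that remains after this change 
(my own sketch; it assumes the TIMESTAMPADD/TIMESTAMPDIFF syntax rules with the 
unit keywords listed above, so it should be double-checked against the 
regenerated golden files):

{code:scala}
// The unit is a keyword (e.g. HOUR, DAY), not an arbitrary string column.
spark.sql("SELECT timestampadd(HOUR, 3, TIMESTAMP'2022-03-10 00:00:00')").show()
spark.sql("SELECT timestampdiff(DAY, TIMESTAMP'2022-03-01 00:00:00', TIMESTAMP'2022-03-10 00:00:00')").show()
{code}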



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38508) Volcano feature doesn't work on EKS graviton instances

2022-03-10 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-38508:
-

 Summary: Volcano feature doesn't work on EKS graviton instances
 Key: SPARK-38508
 URL: https://issues.apache.org/jira/browse/SPARK-38508
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2022-03-10 Thread Cheng Su (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504500#comment-17504500
 ] 

Cheng Su commented on SPARK-34960:
--

Thanks [~tgraves] and [~ahussein] for commenting. Yes, if any ORC file of the 
table is missing statistics in its file footer, a Spark query with aggregate 
push down will fail loudly. I agree this is not a good user experience, and we 
are planning to work on a runtime fallback that reads the real rows from the 
ORC file when statistics are missing.

For now, if you have any concerns about the feature, feel free not to enable it 
in your environment; that is the reason we disable the feature by default, so 
that no existing Spark workload fails.

For now I will create a PR to add more documentation describing the behavior, 
i.e. the query fails if any file is missing statistics. The runtime fallback 
logic will probably be added in Spark 3.4 (the next release after 3.3), as the 
schedule is too tight to work on it for Spark 3.3 (the branch cut is this 
month), and we have a similar problem for Parquet aggregate push down as well.
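For reference, a hedged sketch of how the opt-in looks today (the config key 
and the table name are my own reading of the SQLConf entry and docs, so please 
double-check them):

{code:scala}
// ORC aggregate push down is disabled by default; it only applies when enabled
// explicitly, and it currently fails loudly if any file lacks footer statistics.
spark.conf.set("spark.sql.orc.aggregatePushdown", "true")
spark.sql("SELECT MIN(id), MAX(id), COUNT(*) FROM orc_table").show()
{code}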

> Aggregate (Min/Max/Count) push down for ORC
> ---
>
> Key: SPARK-34960
> URL: https://issues.apache.org/jira/browse/SPARK-34960
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: file_no_stats-orc.tar.gz
>
>
> Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we 
> can also push down certain aggregations into ORC. ORC exposes column 
> statistics in interface `org.apache.orc.Reader` 
> ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118]
>  ), where Spark can utilize for aggregation push down.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38379:
--
Fix Version/s: 3.2.2
   (was: 3.3.0)

> Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes 
> --
>
> Key: SPARK-38379
> URL: https://issues.apache.org/jira/browse/SPARK-38379
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.2.2
>
>
> I'm using Spark 3.2.1 on a Kubernetes cluster and starting a spark-shell in 
> client mode. I'm using persistent local volumes to mount NVMe under /data in 
> the executors, and on startup the driver always throws the warning below.
> using these options:
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
>  
>  
> {code:java}
> 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> java.util.NoSuchElementException: spark.app.id
>         at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.SparkConf.get(SparkConf.scala:245)
>         at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
>         at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>         at scala.collection.Iterator.foreach(Iterator.scala:943)
>         at scala.collection.Iterator.foreach$(Iterator.scala:943)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>         at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>         at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>         at scala.collection.immutable.List.foldLeft(List.scala:91)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
>         at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$ad

[jira] [Updated] (SPARK-37735) Add appId interface to KubernetesConf

2022-03-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37735:
--
Fix Version/s: 3.2.2

> Add appId interface to KubernetesConf
> -
>
> Key: SPARK-37735
> URL: https://issues.apache.org/jira/browse/SPARK-37735
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>
> The appId currently can only be accessed in KubernetesDriverConf and 
> KubernetesExecutorConf, but cannot be accessed in KubernetesConf.
>  
> Some user feature steps take KubernetesConf as a constructor parameter in 
> order to share the same feature step between driver and executor. So we'd 
> better add appId to KubernetesConf so that such feature steps can access the 
> appId.
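For illustration, a minimal sketch of such a shared feature step (my own 
example: the class and label names are hypothetical, and it assumes the 
KubernetesFeatureConfigStep API plus the appId accessor this ticket adds to 
KubernetesConf):

{code:scala}
import io.fabric8.kubernetes.api.model.PodBuilder
import org.apache.spark.deploy.k8s.{KubernetesConf, SparkPod}
import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep

// A single feature step class usable for both driver and executor pods,
// because it only depends on the common KubernetesConf.
class AppIdLabelFeatureStep(conf: KubernetesConf) extends KubernetesFeatureConfigStep {
  override def configurePod(pod: SparkPod): SparkPod = {
    val labeled = new PodBuilder(pod.pod)
      .editOrNewMetadata()
        .addToLabels("example.com/spark-app-id", conf.appId)
      .endMetadata()
      .build()
    SparkPod(labeled, pod.container)
  }
}
{code}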



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used

2022-03-10 Thread Alexandros Mavrommatis (Jira)
Alexandros Mavrommatis created SPARK-38507:
--

 Summary: DataFrame withColumn method not adding or replacing 
columns when alias is used
 Key: SPARK-38507
 URL: https://issues.apache.org/jira/browse/SPARK-38507
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2
Reporter: Alexandros Mavrommatis


I have an input DataFrame *df* created as follows:
{code:java}
import spark.implicits._
val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code}
When I execute either this command:
{code:java}
df.select("df.field2").show(2) {code}
or that one:
{code:java}
df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code}
I get the same result:
{code:java}
+------+
|field2|
+------+
|    10|
|    20|
+------+ {code}
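A short illustration of what is happening here (my own sketch, not part of the 
original report): withColumn("df.field2", ...) adds a new top-level column 
whose name literally contains a dot, while select("df.field2") resolves the dot 
against the df alias and therefore still returns the original field2. Backticks 
make select treat the dotted name literally:

{code:scala}
import org.apache.spark.sql.functions.lit

// Selects the literally named "df.field2" column added by withColumn,
// so this shows 0 for both rows instead of the original 10 and 20.
df.withColumn("df.field2", lit(0)).select("`df.field2`").show(2)
{code}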
Additionally, when I execute the following command:
{code:java}
df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code}
I get this exception:
{code:java}
org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given 
input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- 
Project [field1#7, field2#8, 0 AS df.field3#31]    +- SubqueryAlias df       +- 
Project [_1#2 AS field1#7, _2#3 AS field2#8]          +- LocalRelation [_1#2, 
_2#3]  at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
   at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155)
   at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342)
   at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342)   
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104)
   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116)
   at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116)
   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127)
   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132)
   at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)   at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)   at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)   at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)   at 
scala.collection.TraversableLike.map(TraversableLike.scala:238)   at 
scala.collection.TraversableLike.map$(TraversableLike.scala:231)   at 
scala.collection.AbstractTraversable.map(Traversable.scala:108)   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132)
   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244)
   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137)
   at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104)
   at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:152)
   at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:93)
   at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184)   at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:93)
   at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:90)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:155)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:176)
   at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228)
   at 
org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:173)
   at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73)
   at 
org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
   at 
org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)   at 
org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
   at 
org.apache.spar

[jira] [Resolved] (SPARK-38501) Fix thriftserver test failures under ANSI mode

2022-03-10 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-38501.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35802
[https://github.com/apache/spark/pull/35802]

> Fix thriftserver test failures under ANSI mode
> --
>
> Key: SPARK-38501
> URL: https://issues.apache.org/jira/browse/SPARK-38501
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class

2022-03-10 Thread Brian Schaefer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504448#comment-17504448
 ] 

Brian Schaefer commented on SPARK-38483:


Could you provide an example of when the real column names would be different?

At least for basic examples, it looks like the real column names match those 
found using {{{}Column._jc.toString(){}}}. With some careful regex it may also 
be possible to catch aliases.
{code:python}
>>> df = spark.createDataFrame([{"values": [1,2,3]}])
>>> values = F.col("values")
>>> print(df.select(values).schema[0].name)
values
>>> print(values._jc.toString())
values

>>> import re
>>> aliased_values = F.col("values").alias("aliased")
>>> print(df.select(aliased_values).schema[0].name)
aliased
>>> print(re.match(".*`(.*)`", aliased_values._jc.toString())[1])
aliased
{code}

> Column name or alias as an attribute of the PySpark Column class
> 
>
> Key: SPARK-38483
> URL: https://issues.apache.org/jira/browse/SPARK-38483
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Brian Schaefer
>Priority: Minor
>  Labels: starter
>
> Having the name of a column as an attribute of PySpark {{Column}} class 
> instances can enable some convenient patterns, for example:
> Applying a function to a column and aliasing with the original name:
> {code:java}
> values = F.col("values")
> # repeating the column name as an alias
> distinct_values = F.array_distinct(values).alias("values")
> # re-using the existing column name
> distinct_values = F.array_distinct(values).alias(values._name){code}
> Checking the column name inside a custom function and applying conditional 
> logic on the name:
> {code:java}
> def custom_function(col: Column) -> Column:
> if col._name == "my_column":
> return col.astype("int")
> return col.astype("string"){code}
> The proposal in this issue is to add a property {{Column.\_name}} that 
> obtains the name or alias of a column in a similar way as currently done in 
> the {{Column.\_\_repr\_\_}} method: 
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.]
>  The choice of {{_name}} intentionally avoids collision with the existing 
> {{Column.name}} method, which is an alias for {{{}Column.alias{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class

2022-03-10 Thread Brian Schaefer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503782#comment-17503782
 ] 

Brian Schaefer edited comment on SPARK-38483 at 3/10/22, 5:08 PM:
--

Extracting the column name from the {{Column.\_\_repr\_\_}} method has been 
discussed on Stack Overflow: [https://stackoverflow.com/a/43150264]. However, it 
would be useful to have the column name more easily accessible.


was (Author: JIRAUSER286367):
Extracting the column name from the {{Column.__repr__}} method has been 
discussed on Stack Overflow: [https://stackoverflow.com/a/43150264]. However, it 
would be useful to have the column name more easily accessible.

> Column name or alias as an attribute of the PySpark Column class
> 
>
> Key: SPARK-38483
> URL: https://issues.apache.org/jira/browse/SPARK-38483
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Brian Schaefer
>Priority: Minor
>  Labels: starter
>
> Having the name of a column as an attribute of PySpark {{Column}} class 
> instances can enable some convenient patterns, for example:
> Applying a function to a column and aliasing with the original name:
> {code:python}
> values = F.col("values")
> # repeating the column name as an alias
> distinct_values = F.array_distinct(values).alias("values")
> # re-using the existing column name
> distinct_values = F.array_distinct(values).alias(values._name){code}
> Checking the column name inside a custom function and applying conditional 
> logic on the name:
> {code:python}
> def custom_function(col: Column) -> Column:
>     if col._name == "my_column":
>         return col.astype("int")
>     return col.astype("string"){code}
> The proposal in this issue is to add a property {{Column.\_name}} that 
> obtains the name or alias of a column in a similar way as currently done in 
> the {{Column.\_\_repr\_\_}} method: 
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062].
>  The choice of {{_name}} intentionally avoids collision with the existing 
> {{Column.name}} method, which is an alias for {{{}Column.alias{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38451) Fix R tests under ANSI mode

2022-03-10 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-38451.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 35798
[https://github.com/apache/spark/pull/35798]

> Fix R tests under ANSI mode
> ---
>
> Key: SPARK-38451
> URL: https://issues.apache.org/jira/browse/SPARK-38451
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.3.0
>
>
> [https://github.com/gengliangwang/spark/runs/5461227887?check_suite_focus=true]
>  
> {quote}1. Error (test_sparkSQL.R:2064:3): SPARK-37108: expose make_date 
> expression i
> 2022-03-08T10:06:54.9600113Z Error in `handleErrors(returnStatus, conn)`: 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 661.0 failed 1 times, most recent failure: Lost task 0.0 in stage 661.0 
> (TID 570) (localhost executor driver): java.time.DateTimeException: Invalid 
> value for MonthOfYear (valid values 1 - 12): 13. If necessary set 
> spark.sql.ansi.enabled to false to bypass this error.
> {quote}
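
For context, the failure above is ANSI mode rejecting an out-of-range month in 
{{make_date}}. A hedged PySpark sketch of the same behavior (using the SQL 
function directly; assumes Spark 3.3 defaults for everything else):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# With ANSI mode on, an out-of-range month raises java.time.DateTimeException,
# as in the R test failure quoted above.
spark.conf.set("spark.sql.ansi.enabled", "true")
# spark.sql("SELECT make_date(2019, 13, 1)").show()  # would fail

# With ANSI mode off, the same expression returns NULL instead of failing.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT make_date(2019, 13, 1)").show()
{code}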



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2022-03-10 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504374#comment-17504374
 ] 

Ahmed Hussein edited comment on SPARK-34960 at 3/10/22, 3:57 PM:
-

Thanks [~chengsu] for putting up the optimization on pushed aggregates.
I am concerned that the changes introduced in this Jira lead to inconsistent 
behavior in the following scenario:
 * Assume an ORC file with empty column statistics 
([^file_no_stats-orc.tar.gz]).
 * Run a read job such as {{spark.read.orc(path).selectExpr('count(p)')}} with 
the default configuration. This works fine.
 * Now enable {{'spark.sql.orc.aggregatePushdown': 'true'}} and re-run. The job 
fails with an exception, because the new code assumes that every ORC file 
carries file statistics.

In other words, enabling {{spark.sql.orc.aggregatePushdown}} will cause read 
jobs to fail on any ORC file with empty statistics.
This is problematic for users, because they would have to identify every such 
ORC file up front or risk failing their jobs at runtime.

Note that, according to the [ORC spec|https://orc.apache.org/specification], 
the statistics are optional even for the upcoming ORCv2.

I second [~tgraves] that there should be a way to recover safely when those 
fields are missing.


was (Author: ahussein):
Thanks [~chengsu] for putting up the optimization on pushed aggregates.
I am concerned that the changes introduced in this Jira lead to inconsistent 
behavior in the following scenario:
 * Assume an ORC file with empty column statistics (no_col_stats.orc).
 * Run a read job such as {{spark.read.orc(path).selectExpr('count(p)')}} with 
the default configuration. This works fine.
 * Now enable {{'spark.sql.orc.aggregatePushdown': 'true'}} and re-run. The job 
fails with an exception, because the new code assumes that every ORC file 
carries file statistics.

In other words, enabling {{spark.sql.orc.aggregatePushdown}} will cause read 
jobs to fail on any ORC file with empty statistics.
This is problematic for users, because they would have to identify every such 
ORC file up front or risk failing their jobs at runtime.

Note that, according to the [ORC spec|https://orc.apache.org/specification], 
the statistics are optional even for the upcoming ORCv2.

I second [~tgraves] that there should be a way to recover safely when those 
fields are missing.

> Aggregate (Min/Max/Count) push down for ORC
> ---
>
> Key: SPARK-34960
> URL: https://issues.apache.org/jira/browse/SPARK-34960
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: file_no_stats-orc.tar.gz
>
>
> Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we 
> can also push down certain aggregations into ORC. ORC exposes column 
> statistics in interface `org.apache.orc.Reader` 
> ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118]
>  ), which Spark can use for aggregate push down.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2022-03-10 Thread Ahmed Hussein (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504374#comment-17504374
 ] 

Ahmed Hussein commented on SPARK-34960:
---

Thanks [~chengsu] for putting up the optimization on pushed aggregates.
I am concerned that the changes introduced in this Jira lead to inconsistent 
behavior in the following scenario (a hedged repro sketch follows below):
 * Assume an ORC file with empty column statistics (no_col_stats.orc).
 * Run a read job such as {{spark.read.orc(path).selectExpr('count(p)')}} with 
the default configuration. This works fine.
 * Now enable {{'spark.sql.orc.aggregatePushdown': 'true'}} and re-run. The job 
fails with an exception, because the new code assumes that every ORC file 
carries file statistics.

In other words, enabling {{spark.sql.orc.aggregatePushdown}} will cause read 
jobs to fail on any ORC file with empty statistics.
This is problematic for users, because they would have to identify every such 
ORC file up front or risk failing their jobs at runtime.

Note that, according to the [ORC spec|https://orc.apache.org/specification], 
the statistics are optional even for the upcoming ORCv2.

I second [~tgraves] that there should be a way to recover safely when those 
fields are missing.
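
For reference, a minimal repro sketch of the scenario above (assumes Spark 
3.3.0 with ORC aggregate push down and an ORC file without column statistics at 
{{path}}, e.g. the attached [^file_no_stats-orc.tar.gz] once extracted; the 
exact exception may vary):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/tmp/file_no_stats.orc"  # hypothetical path to an ORC file lacking column statistics

# Default configuration: count(p) is computed from the data, so this succeeds.
spark.conf.set("spark.sql.orc.aggregatePushdown", "false")
spark.read.orc(path).selectExpr("count(p)").show()

# With aggregate push down enabled, the count is answered from file statistics,
# so a file with empty statistics is expected to fail at runtime.
spark.conf.set("spark.sql.orc.aggregatePushdown", "true")
spark.read.orc(path).selectExpr("count(p)").show()
{code}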

> Aggregate (Min/Max/Count) push down for ORC
> ---
>
> Key: SPARK-34960
> URL: https://issues.apache.org/jira/browse/SPARK-34960
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: file_no_stats-orc.tar.gz
>
>
> Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we 
> can also push down certain aggregations into ORC. ORC exposes column 
> statistics in interface `org.apache.orc.Reader` 
> ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118]
>  ), which Spark can use for aggregate push down.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC

2022-03-10 Thread Ahmed Hussein (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Hussein updated SPARK-34960:
--
Attachment: file_no_stats-orc.tar.gz

> Aggregate (Min/Max/Count) push down for ORC
> ---
>
> Key: SPARK-34960
> URL: https://issues.apache.org/jira/browse/SPARK-34960
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Su
>Assignee: Cheng Su
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: file_no_stats-orc.tar.gz
>
>
> Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we 
> can also push down certain aggregations into ORC. ORC exposes column 
> statistics in interface `org.apache.orc.Reader` 
> ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118]
>  ), which Spark can use for aggregate push down.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38505) Make partial aggregation adaptive

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504349#comment-17504349
 ] 

Apache Spark commented on SPARK-38505:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/35806

> Make partial aggregation adaptive
> -
>
> Key: SPARK-38505
> URL: https://issues.apache.org/jira/browse/SPARK-38505
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> We can skip the partial aggregation step, and thereby avoid spilling, when it 
> does not significantly reduce the number of rows.
> https://github.com/trinodb/trino/pull/11011



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38505) Make partial aggregation adaptive

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38505:


Assignee: Apache Spark

> Make partial aggregation adaptive
> -
>
> Key: SPARK-38505
> URL: https://issues.apache.org/jira/browse/SPARK-38505
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> We can skip the partial aggregation step, and thereby avoid spilling, when it 
> does not significantly reduce the number of rows.
> https://github.com/trinodb/trino/pull/11011



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38505) Make partial aggregation adaptive

2022-03-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-38505:


Assignee: (was: Apache Spark)

> Make partial aggregation adaptive
> -
>
> Key: SPARK-38505
> URL: https://issues.apache.org/jira/browse/SPARK-38505
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> We can skip the partial aggregation step, and thereby avoid spilling, when it 
> does not significantly reduce the number of rows.
> https://github.com/trinodb/trino/pull/11011



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38505) Make partial aggregation adaptive

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504350#comment-17504350
 ] 

Apache Spark commented on SPARK-38505:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/35806

> Make partial aggregation adaptive
> -
>
> Key: SPARK-38505
> URL: https://issues.apache.org/jira/browse/SPARK-38505
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>
> We can skip the partial aggregation step, and thereby avoid spilling, when it 
> does not significantly reduce the number of rows.
> https://github.com/trinodb/trino/pull/11011



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38506) Push partial aggregation through join

2022-03-10 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-38506:
---

 Summary: Push partial aggregation through join
 Key: SPARK-38506
 URL: https://issues.apache.org/jira/browse/SPARK-38506
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Yuming Wang


Please see 
https://docs.teradata.com/r/Teradata-VantageTM-SQL-Request-and-Transaction-Processing/March-2019/Join-Planning-and-Optimization/Partial-GROUP-BY-Block-Optimization
 for more details.
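
For illustration, a hedged sketch of the kind of query this optimization 
targets (hypothetical {{orders}}/{{customers}} tables; the idea is that a 
partial aggregation pushed below the join can shrink the join input, with the 
final aggregation still applied after the join):
{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical tables, for illustration only.
orders = spark.table("orders")        # (customer_id, amount), many rows per customer
customers = spark.table("customers")  # (customer_id, region), one row per customer

# Written naturally, the aggregation only runs after the join:
result = (
    orders.join(customers, "customer_id")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_amount"))
)

# The proposal is for the optimizer to pre-aggregate `orders` below the join
# automatically (valid here because `customers` has one row per key), which is
# roughly what this manual rewrite does today:
pre_aggregated = orders.groupBy("customer_id").agg(F.sum("amount").alias("partial_amount"))
manual = (
    pre_aggregated.join(customers, "customer_id")
                  .groupBy("region")
                  .agg(F.sum("partial_amount").alias("total_amount"))
)
{code}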



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37735) Add appId interface to KubernetesConf

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504336#comment-17504336
 ] 

Apache Spark commented on SPARK-37735:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/35804

> Add appId interface to KubernetesConf
> -
>
> Key: SPARK-37735
> URL: https://issues.apache.org/jira/browse/SPARK-37735
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.3.0
>
>
> The appId can currently only be accessed in KubernetesDriverConf and 
> KubernetesExecutorConf, but not in KubernetesConf.
>  
> Some user feature steps take KubernetesConf as a constructor parameter so 
> that the same feature step can be shared between driver and executor. We 
> should therefore add appId to KubernetesConf so that such feature steps can 
> access the appId.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37735) Add appId interface to KubernetesConf

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504337#comment-17504337
 ] 

Apache Spark commented on SPARK-37735:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/35804

> Add appId interface to KubernetesConf
> -
>
> Key: SPARK-37735
> URL: https://issues.apache.org/jira/browse/SPARK-37735
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.3.0
>
>
> The appId can currently only be accessed in KubernetesDriverConf and 
> KubernetesExecutorConf, but not in KubernetesConf.
>  
> Some user feature steps take KubernetesConf as a constructor parameter so 
> that the same feature step can be shared between driver and executor. We 
> should therefore add appId to KubernetesConf so that such feature steps can 
> access the appId.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38505) Make partial aggregation adaptive

2022-03-10 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-38505:
---

 Summary: Make partial aggregation adaptive
 Key: SPARK-38505
 URL: https://issues.apache.org/jira/browse/SPARK-38505
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Yuming Wang


We can skip the partial aggregation step, and thereby avoid spilling, when it 
does not significantly reduce the number of rows.

https://github.com/trinodb/trino/pull/11011
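
For context, a hedged sketch of the case this targets: when the grouping key is 
nearly unique, the partial (map-side) aggregation barely reduces the row count 
but still hashes every row and may spill; an adaptive planner could detect this 
at runtime and skip the partial step. Illustrative only:
{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Nearly-unique grouping key: partial aggregation turns ~100M rows into ~100M
# groups, so the map-side hash aggregate does a lot of work for little benefit.
df = spark.range(0, 100_000_000).withColumn("key", F.col("id"))
high_cardinality = df.groupBy("key").count()

# Low-cardinality key: partial aggregation collapses the input to ~100 groups
# per task, which is exactly where the partial step pays off.
low_cardinality = df.withColumn("key", F.col("id") % 100).groupBy("key").count()
{code}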





--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37735) Add appId interface to KubernetesConf

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504335#comment-17504335
 ] 

Apache Spark commented on SPARK-37735:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/35804

> Add appId interface to KubernetesConf
> -
>
> Key: SPARK-37735
> URL: https://issues.apache.org/jira/browse/SPARK-37735
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.3.0
>Reporter: Yikun Jiang
>Assignee: Yikun Jiang
>Priority: Major
> Fix For: 3.3.0
>
>
> The appId can currently only be accessed in KubernetesDriverConf and 
> KubernetesExecutorConf, but not in KubernetesConf.
>  
> Some user feature steps take KubernetesConf as a constructor parameter so 
> that the same feature step can be shared between driver and executor. We 
> should therefore add appId to KubernetesConf so that such feature steps can 
> access the appId.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504334#comment-17504334
 ] 

Apache Spark commented on SPARK-38379:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/35804

> Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes 
> --
>
> Key: SPARK-38379
> URL: https://issues.apache.org/jira/browse/SPARK-38379
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.3.0
>
>
> I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in 
> client mode.  I'm using persistent local volumes to mount nvme under /data in 
> the executors and on startup the driver always throws the warning below.
> using these options:
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
>  
>  
> {code:java}
> 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> java.util.NoSuchElementException: spark.app.id
>         at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.SparkConf.get(SparkConf.scala:245)
>         at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
>         at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>         at scala.collection.Iterator.foreach(Iterator.scala:943)
>         at scala.collection.Iterator.foreach$(Iterator.scala:943)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>         at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>         at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>         at scala.collection.immutable.List.foldLeft(List.scala:91)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
>         at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117

[jira] [Commented] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes

2022-03-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504333#comment-17504333
 ] 

Apache Spark commented on SPARK-38379:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/35804

> Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes 
> --
>
> Key: SPARK-38379
> URL: https://issues.apache.org/jira/browse/SPARK-38379
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.2.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.3.0
>
>
> I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in 
> client mode.  I'm using persistent local volumes to mount nvme under /data in 
> the executors and on startup the driver always throws the warning below.
> using these options:
> --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data
>  \
>      --conf 
> spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false
>  
>  
> {code:java}
> 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when 
> notifying snapshot subscriber.
> java.util.NoSuchElementException: spark.app.id
>         at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.SparkConf.get(SparkConf.scala:245)
>         at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88)
>         at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>         at scala.collection.Iterator.foreach(Iterator.scala:943)
>         at scala.collection.Iterator.foreach$(Iterator.scala:943)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>         at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>         at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57)
>         at 
> org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
>         at 
> scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
>         at scala.collection.immutable.List.foldLeft(List.scala:91)
>         at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339)
>         at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>         at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339)
>         at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117

[jira] [Commented] (SPARK-38330) Certificate doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]

2022-03-10 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504286#comment-17504286
 ] 

Steve Loughran commented on SPARK-38330:


This is a Hadoop issue - please create a JIRA there and link it to this one as 
a cause.

 
 # The AWS SDK bundle jar ships its own shaded httpclient, so upgrading the SDK 
may fix it.
 # Recent Hadoop releases also let you switch S3A to OpenSSL, if it is available 
on the system, so that OpenSSL handles the TLS certificates instead (a hedged 
config sketch follows below).
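
For reference, a hedged sketch of switching S3A to OpenSSL from a Spark 
session. It assumes Hadoop 3.3+ with the {{fs.s3a.ssl.channel.mode}} option and 
an OpenSSL binding available on the system; verify the property and its 
supported values against your Hadoop version before relying on it:
{code:python}
from pyspark.sql import SparkSession

# spark.hadoop.* properties are forwarded to the Hadoop configuration.
spark = (
    SparkSession.builder
    .config("spark.hadoop.fs.s3a.ssl.channel.mode", "openssl")
    .getOrCreate()
)

# Hypothetical bucket/path, for illustration only.
df = spark.read.parquet("s3a://my-bucket/path/to/data/")
{code}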

> Certificate doesn't match any of the subject alternative names: 
> [*.s3.amazonaws.com, s3.amazonaws.com]
> --
>
> Key: SPARK-38330
> URL: https://issues.apache.org/jira/browse/SPARK-38330
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 3.2.1
> Environment: Spark 3.2.1 built with `hadoop-cloud` flag.
> Direct access to s3 using default file committer.
> JDK8.
>  
>Reporter: André F.
>Priority: Major
>
> Trying to run any job after bumping our Spark version from 3.1.2 to 3.2.1 
> leads to the following exception while reading files on S3:
> {code:java}
> org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on 
> s3a:///.parquet: com.amazonaws.SdkClientException: Unable to 
> execute HTTP request: Certificate for  doesn't match 
> any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: 
> Unable to execute HTTP request: Certificate for  doesn't match any of 
> the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com] at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208) at 
> org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170) at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277) 
> at 
> org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274) 
> at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245)
>  at scala.Option.getOrElse(Option.scala:189) at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245) at 
> org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:596) {code}
>  
> {code:java}
> Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for 
>  doesn't match any of the subject alternative names: 
> [*.s3.amazonaws.com, s3.amazonaws.com]
>   at 
> com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507)
>   at 
> com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437)
>   at 
> com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
>   at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>   at com.amazonaws.http.conn.$Proxy16.connect(Unknown Source)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>   at 
> com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>   at 
> com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>  
