[jira] [Assigned] (SPARK-38521) Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode
[ https://issues.apache.org/jira/browse/SPARK-38521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38521: Assignee: (was: Apache Spark) > Throw Exception if overwriting hive partition table with dynamic and > staticPartitionOverwriteMode > - > > Key: SPARK-38521 > URL: https://issues.apache.org/jira/browse/SPARK-38521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Jackey Lee >Priority: Major > > The `spark.sql.sources.partitionOverwriteMode` allows us to overwrite the > existing data of the table through static mode, but for a Hive table this is > disastrous: it may delete all data in a Hive partitioned table when writing > with dynamic overwrite and `partitionOverwriteMode=STATIC`. > Here we add a check for this and throw an exception if it happens. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
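The check the ticket proposes can be sketched in plain Python. This is a hypothetical illustration of the guard's logic, not Spark's actual implementation; the function name and parameters are invented for the example.

```python
# Hypothetical sketch of the proposed guard: refuse a dynamic-partition
# overwrite of a Hive table when partitionOverwriteMode is STATIC, instead
# of silently deleting all existing partitions.

def check_hive_overwrite(is_hive_table: bool, dynamic_partition_cols: list,
                         partition_overwrite_mode: str) -> None:
    """Raise instead of letting STATIC mode wipe a Hive partitioned table."""
    if (is_hive_table and dynamic_partition_cols
            and partition_overwrite_mode.upper() == "STATIC"):
        raise ValueError(
            "Dynamic overwrite of a Hive partitioned table with "
            "partitionOverwriteMode=STATIC would delete all existing "
            "partitions; set partitionOverwriteMode=DYNAMIC instead.")

# STATIC mode with dynamic partition columns on a Hive table is rejected:
try:
    check_hive_overwrite(True, ["dt"], "STATIC")
except ValueError as e:
    print("rejected:", e)

# DYNAMIC mode, or a non-Hive table, passes the check unchanged:
check_hive_overwrite(True, ["dt"], "DYNAMIC")
check_hive_overwrite(False, ["dt"], "STATIC")
```

Failing fast here trades a hard error for the silent data loss the ticket describes, which is the safer default for a destructive write path.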
[jira] [Commented] (SPARK-38521) Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode
[ https://issues.apache.org/jira/browse/SPARK-38521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504771#comment-17504771 ] Apache Spark commented on SPARK-38521: -- User 'jackylee-ch' has created a pull request for this issue: https://github.com/apache/spark/pull/35815 > Throw Exception if overwriting hive partition table with dynamic and > staticPartitionOverwriteMode > - > > Key: SPARK-38521 > URL: https://issues.apache.org/jira/browse/SPARK-38521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Jackey Lee >Priority: Major > > The `spark.sql.sources.partitionOverwriteMode` allows us to overwrite the > existing data of the table through static mode, but for a Hive table this is > disastrous: it may delete all data in a Hive partitioned table when writing > with dynamic overwrite and `partitionOverwriteMode=STATIC`. > Here we add a check for this and throw an exception if it happens.
[jira] [Assigned] (SPARK-38521) Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode
[ https://issues.apache.org/jira/browse/SPARK-38521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38521: Assignee: Apache Spark > Throw Exception if overwriting hive partition table with dynamic and > staticPartitionOverwriteMode > - > > Key: SPARK-38521 > URL: https://issues.apache.org/jira/browse/SPARK-38521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Jackey Lee >Assignee: Apache Spark >Priority: Major > > The `spark.sql.sources.partitionOverwriteMode` allows us to overwrite the > existing data of the table through static mode, but for a Hive table this is > disastrous: it may delete all data in a Hive partitioned table when writing > with dynamic overwrite and `partitionOverwriteMode=STATIC`. > Here we add a check for this and throw an exception if it happens.
[jira] [Created] (SPARK-38522) Strengthen the contract on iterator method in StateStore
Jungtaek Lim created SPARK-38522: Summary: Strengthen the contract on iterator method in StateStore Key: SPARK-38522 URL: https://issues.apache.org/jira/browse/SPARK-38522 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.3.0 Reporter: Jungtaek Lim The root cause of SPARK-38320 was that the logic initialized the iterator first, then performed some updates against the state store, and iterated through the iterator expecting all updates made in between to be visible. That is not guaranteed by the RocksDB state store, and the contract of Java's ConcurrentHashMap, which is used in HDFSBackedStateStore, does not guarantee it either. It would be clearer if we updated the contract to draw a line on the behavioral guarantee to callers, so that callers don't form such an expectation.
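The weaker contract being argued for can be modeled with a toy store in plain Python (this is an illustration, not Spark code): an iterator reflects only the state at its creation time, and updates made afterwards are not guaranteed to be visible.

```python
# Toy model of a state store whose iterator has snapshot semantics: callers
# must not expect updates made after iterator creation to appear in the
# iteration, which is the behavioral line the ticket wants drawn.

class ToyStateStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def iterator(self):
        # Snapshot semantics: copy the entries at creation time, so later
        # put() calls are not visible through this iterator.
        return iter(list(self._data.items()))

store = ToyStateStore()
store.put("a", 1)
it = store.iterator()   # snapshot taken here
store.put("b", 2)       # update after iterator creation
seen = dict(it)
print(seen)             # {'a': 1} -- "b" is not visible
```

A caller that needs to see its own writes must create a fresh iterator after updating, rather than relying on a live view.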
[jira] [Resolved] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
[ https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-38509. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35805 [https://github.com/apache/spark/pull/35805] > Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF > --- > > Key: SPARK-38509 > URL: https://issues.apache.org/jira/browse/SPARK-38509 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > 1. Unregister the functions `timestampadd()` and `timestampdiff()` in > `FunctionRegistry.expressions`. > 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for > `timestampdiff()`. > 3. Align tests (regenerate golden files) to the syntax rules > where the first parameter `unit` can have one of the identifiers: >- YEAR >- QUARTER >- MONTH >- WEEK >- DAY, DAYOFYEAR (valid for timestampadd) >- HOUR >- MINUTE >- SECOND >- MILLISECOND >- MICROSECOND > h4. Why are the changes needed? > 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with > an arbitrary string column as the first parameter are not required by any standard. > 2. Removing the functions and aliases should reduce maintenance cost.
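The unit rule described above can be sketched as a simple validator. This is a hypothetical helper, not Spark's parser; it reads the list above as making DAYOFYEAR valid only for timestampadd, which is an assumption about the intended semantics.

```python
# Illustrative validator for the fixed unit-identifier list: after the change,
# timestampadd()/timestampdiff() take a unit identifier from a closed set
# rather than an arbitrary string column. Assumption: DAYOFYEAR is accepted
# only by timestampadd, per the annotation in the list above.

TIMESTAMPADD_UNITS = {"YEAR", "QUARTER", "MONTH", "WEEK", "DAY", "DAYOFYEAR",
                      "HOUR", "MINUTE", "SECOND", "MILLISECOND", "MICROSECOND"}
TIMESTAMPDIFF_UNITS = TIMESTAMPADD_UNITS - {"DAYOFYEAR"}

def validate_unit(func: str, unit: str) -> str:
    """Return the normalized unit, or raise if it is not in the closed set."""
    allowed = TIMESTAMPADD_UNITS if func == "timestampadd" else TIMESTAMPDIFF_UNITS
    u = unit.upper()
    if u not in allowed:
        raise ValueError(f"{func}: invalid unit {unit!r}")
    return u

print(validate_unit("timestampadd", "dayofyear"))  # accepted and normalized
```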
[jira] [Assigned] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
[ https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-38509: Assignee: Max Gekk > Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF > --- > > Key: SPARK-38509 > URL: https://issues.apache.org/jira/browse/SPARK-38509 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > 1. Unregister the functions `timestampadd()` and `timestampdiff()` in > `FunctionRegistry.expressions`. > 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for > `timestampdiff()`. > 3. Align tests (regenerate golden files) to the syntax rules > where the first parameter `unit` can have one of the identifiers: >- YEAR >- QUARTER >- MONTH >- WEEK >- DAY, DAYOFYEAR (valid for timestampadd) >- HOUR >- MINUTE >- SECOND >- MILLISECOND >- MICROSECOND > h4. Why are the changes needed? > 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with > an arbitrary string column as the first parameter are not required by any standard. > 2. Removing the functions and aliases should reduce maintenance cost.
[jira] [Created] (SPARK-38521) Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode
Jackey Lee created SPARK-38521: -- Summary: Throw Exception if overwriting hive partition table with dynamic and staticPartitionOverwriteMode Key: SPARK-38521 URL: https://issues.apache.org/jira/browse/SPARK-38521 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Jackey Lee The `spark.sql.sources.partitionOverwriteMode` allows us to overwrite the existing data of the table through static mode, but for a Hive table this is disastrous: it may delete all data in a Hive partitioned table when writing with dynamic overwrite and `partitionOverwriteMode=STATIC`. Here we add a check for this and throw an exception if it happens.
[jira] [Resolved] (SPARK-38502) Distribution with hadoop-provided is missing log4j2
[ https://issues.apache.org/jira/browse/SPARK-38502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emil Ejbyfeldt resolved SPARK-38502. Resolution: Duplicate > Distribution with hadoop-provided is missing log4j2 > --- > > Key: SPARK-38502 > URL: https://issues.apache.org/jira/browse/SPARK-38502 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Emil Ejbyfeldt >Priority: Major > > Currently, building spark 3.3.0-SNAPSHOT using the `./dev/make-distribution.sh > --tgz --name hadoop-provided-test -Phadoop-provided -Pyarn` script will build > a package that does not include log4j2. Trying to run spark-submit with the > latest hadoop release 3.3.2 and this build will result in > {code:java} > $ spark-submit run-example org.apache.spark.examples.SparkPi > Error: Unable to initialize main class org.apache.spark.deploy.SparkSubmit > Caused by: java.lang.NoClassDefFoundError: > org/apache/logging/log4j/core/Filter > {code} > since log4j2 is not found. I believe the maven build settings need to be > tweaked so that log4j2 is included in the spark distribution.
[jira] [Commented] (SPARK-38502) Distribution with hadoop-provided is missing log4j2
[ https://issues.apache.org/jira/browse/SPARK-38502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504749#comment-17504749 ] Emil Ejbyfeldt commented on SPARK-38502: Duplicate of SPARK-38516 > Distribution with hadoop-provided is missing log4j2 > --- > > Key: SPARK-38502 > URL: https://issues.apache.org/jira/browse/SPARK-38502 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Emil Ejbyfeldt >Priority: Major > > Currently, building spark 3.3.0-SNAPSHOT using the `./dev/make-distribution.sh > --tgz --name hadoop-provided-test -Phadoop-provided -Pyarn` script will build > a package that does not include log4j2. Trying to run spark-submit with the > latest hadoop release 3.3.2 and this build will result in > {code:java} > $ spark-submit run-example org.apache.spark.examples.SparkPi > Error: Unable to initialize main class org.apache.spark.deploy.SparkSubmit > Caused by: java.lang.NoClassDefFoundError: > org/apache/logging/log4j/core/Filter > {code} > since log4j2 is not found. I believe the maven build settings need to be > tweaked so that log4j2 is included in the spark distribution.
[jira] [Created] (SPARK-38520) Overflow occurs when reading ANSI day time interval from CSV file
chong created SPARK-38520: - Summary: Overflow occurs when reading ANSI day time interval from CSV file Key: SPARK-38520 URL: https://issues.apache.org/jira/browse/SPARK-38520 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.3.0 Reporter: chong

*Problem:* Overflow occurs when reading the following positive intervals; the results become negative intervals:

interval '106751992' day => INTERVAL '-106751990' DAY
interval '+2562047789' hour => INTERVAL '-2562047787' HOUR
interval '153722867281' minute => INTERVAL '-153722867280' MINUTE

*Reproduce:*
```
// days overflow
scala> val schema = StructType(Seq(StructField("c1", DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.DAY))))
scala> spark.read.csv(path).show(false)
+------------------------+
|_c0                     |
+------------------------+
|interval '106751992' day|
+------------------------+

scala> spark.read.schema(schema).csv(path).show(false)
+-------------------------+
|c1                       |
+-------------------------+
|INTERVAL '-106751990' DAY|
+-------------------------+

// hour overflow
scala> val schema = StructType(Seq(StructField("c1", DayTimeIntervalType(DayTimeIntervalType.HOUR, DayTimeIntervalType.HOUR))))
scala> spark.read.csv(path).show(false)
+----------------------------+
|_c0                         |
+----------------------------+
|INTERVAL +'+2562047789' hour|
+----------------------------+

scala> spark.read.schema(schema).csv(path).show(false)
+---------------------------+
|c1                         |
+---------------------------+
|INTERVAL '-2562047787' HOUR|
+---------------------------+

// minute overflow
scala> val schema = StructType(Seq(StructField("c1", DayTimeIntervalType(DayTimeIntervalType.MINUTE, DayTimeIntervalType.MINUTE))))
scala> spark.read.csv(path).show(false)
+------------------------------+
|_c0                           |
+------------------------------+
|interval '153722867281' minute|
+------------------------------+

scala> spark.read.schema(schema).csv(path).show(false)
+-------------------------------+
|c1                             |
+-------------------------------+
|INTERVAL '-153722867280' MINUTE|
+-------------------------------+
```

*Others:* also check that negative values are not read back as positive.
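The reported values are consistent with signed 64-bit microsecond arithmetic wrapping around: a day-time interval is stored as a signed 64-bit count of microseconds, and 106751992 days exceeds Long.MaxValue microseconds. The sketch below reproduces the wrap with plain integer arithmetic (an illustration of the arithmetic, not Spark's CSV parser).

```python
# Simulate two's-complement wraparound of a signed 64-bit microsecond count,
# matching the negative intervals reported above.

MICROS_PER_DAY = 86_400_000_000

def to_int64(x: int) -> int:
    """Wrap x into a signed 64-bit integer (two's complement)."""
    return (x + 2**63) % 2**64 - 2**63

micros = 106751992 * MICROS_PER_DAY   # 9223372108800000000 > 2**63 - 1
wrapped = to_int64(micros)            # wraps to a large negative value
# Truncate toward zero, like Java integer division of the wrapped count:
days = (abs(wrapped) // MICROS_PER_DAY) * (-1 if wrapped < 0 else 1)
print(days)  # -106751990, matching INTERVAL '-106751990' DAY above
```

The same arithmetic applied to 2562047789 hours and 153722867281 minutes yields -2562047787 and -153722867280, matching the other two reported results.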
[jira] [Assigned] (SPARK-38519) AQE throw exception should respect SparkFatalException
[ https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38519: Assignee: (was: Apache Spark) > AQE throw exception should respect SparkFatalException > -- > > Key: SPARK-38519 > URL: https://issues.apache.org/jira/browse/SPARK-38519 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > > BroadcastExchangeExec wraps fatal exceptions inside SparkFatalException > and unwraps them before throwing. > AQE should also respect SparkFatalException and throw the original error. > {code:java} > Caused by: org.apache.spark.util.SparkFatalException > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code}
[jira] [Commented] (SPARK-38519) AQE throw exception should respect SparkFatalException
[ https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504734#comment-17504734 ] Apache Spark commented on SPARK-38519: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/35814 > AQE throw exception should respect SparkFatalException > -- > > Key: SPARK-38519 > URL: https://issues.apache.org/jira/browse/SPARK-38519 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > > BroadcastExchangeExec wraps fatal exceptions inside SparkFatalException > and unwraps them before throwing. > AQE should also respect SparkFatalException and throw the original error. > {code:java} > Caused by: org.apache.spark.util.SparkFatalException > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code}
[jira] [Assigned] (SPARK-38519) AQE throw exception should respect SparkFatalException
[ https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38519: Assignee: Apache Spark > AQE throw exception should respect SparkFatalException > -- > > Key: SPARK-38519 > URL: https://issues.apache.org/jira/browse/SPARK-38519 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > BroadcastExchangeExec wraps fatal exceptions inside SparkFatalException > and unwraps them before throwing. > AQE should also respect SparkFatalException and throw the original error. > {code:java} > Caused by: org.apache.spark.util.SparkFatalException > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code}
[jira] [Updated] (SPARK-38519) AQE throw exception should respect SparkFatalException
[ https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-38519: -- Description: BroadcastExchangeExec wraps fatal exceptions inside SparkFatalException and unwraps them before throwing. AQE should also respect SparkFatalException and throw the original error. {code:java} Caused by: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} was: BroadcastExchangeExec wraps fatal exceptions in SparkFatalException and unwraps them before throwing. AQE should also respect SparkFatalException and throw the original error. 
{code:java} Caused by: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} > AQE throw exception should respect SparkFatalException > -- > > Key: SPARK-38519 > URL: https://issues.apache.org/jira/browse/SPARK-38519 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > > BroadcastExchangeExec wraps fatal exceptions inside SparkFatalException > and unwraps them before throwing. > AQE should also respect SparkFatalException and throw the original error. > {code:java} > Caused by: org.apache.spark.util.SparkFatalException > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code}
[jira] [Updated] (SPARK-38519) AQE throw exception should respect SparkFatalException
[ https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-38519: -- Description: BroadcastExchangeExec wraps fatal exceptions in SparkFatalException and unwraps them before throwing. AQE should also respect SparkFatalException and throw the original error. {code:java} Caused by: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} was: BroadcastExchangeExec wraps fatal exceptions in SparkFatalException and unwraps them in some places. AQE should also respect SparkFatalException and throw the original error. 
{code:java} Caused by: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} > AQE throw exception should respect SparkFatalException > -- > > Key: SPARK-38519 > URL: https://issues.apache.org/jira/browse/SPARK-38519 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > > BroadcastExchangeExec wraps fatal exceptions in SparkFatalException and > unwraps them before throwing. > AQE should also respect SparkFatalException and throw the original error. > {code:java} > Caused by: org.apache.spark.util.SparkFatalException > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code}
[jira] [Updated] (SPARK-38519) AQE throw exception should respect SparkFatalException
[ https://issues.apache.org/jira/browse/SPARK-38519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-38519: -- Description: BroadcastExchangeExec wraps fatal exceptions in SparkFatalException and unwraps them in some places. AQE should also respect SparkFatalException and throw the original error. {code:java} Caused by: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} was: BroadcastExchangeExec wraps fatal exceptions in SparkFatalException and unwraps them where SparkFatalException is caught. AQE should also respect SparkFatalException and throw the original error. 
{code:java} Caused by: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} > AQE throw exception should respect SparkFatalException > -- > > Key: SPARK-38519 > URL: https://issues.apache.org/jira/browse/SPARK-38519 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Priority: Major > > BroadcastExchangeExec wraps fatal exceptions in SparkFatalException and > unwraps them in some places. > AQE should also respect SparkFatalException and throw the original error. > {code:java} > Caused by: org.apache.spark.util.SparkFatalException > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) {code}
[jira] [Created] (SPARK-38519) AQE throw exception should respect SparkFatalException
XiDuo You created SPARK-38519: - Summary: AQE throw exception should respect SparkFatalException Key: SPARK-38519 URL: https://issues.apache.org/jira/browse/SPARK-38519 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: XiDuo You BroadcastExchangeExec wraps fatal exceptions in SparkFatalException and unwraps them where SparkFatalException is caught. AQE should also respect SparkFatalException and throw the original error. {code:java} Caused by: org.apache.spark.util.SparkFatalException at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.$anonfun$relationFuture$1(BroadcastExchangeExec.scala:168) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$1(SQLExecution.scala:191) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code}
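The unwrap behavior being asked of AQE can be sketched in plain Python (Spark's actual code is Scala; the class and function names below are stand-ins invented for illustration): when a failure is a SparkFatalException-style wrapper, rethrow its cause so callers see the original error.

```python
# Minimal sketch of rethrowing the original error instead of the wrapper.

class SparkFatalExceptionLike(Exception):
    """Stand-in for org.apache.spark.util.SparkFatalException."""
    def __init__(self, cause: BaseException):
        super().__init__(str(cause))
        self.cause = cause

def rethrow_original(exc: BaseException):
    if isinstance(exc, SparkFatalExceptionLike):
        raise exc.cause        # surface the wrapped fatal error
    raise exc                  # anything else propagates unchanged

try:
    rethrow_original(SparkFatalExceptionLike(MemoryError("OOM in broadcast")))
except MemoryError as e:
    print("unwrapped:", e)
```

Without the unwrap step, callers would see only the opaque wrapper type in the stack trace, as in the {code} block above.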
[jira] [Updated] (SPARK-37273) Hidden File Metadata Support for Spark SQL
[ https://issues.apache.org/jira/browse/SPARK-37273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-37273: Labels: release-notes (was: ) > Hidden File Metadata Support for Spark SQL > -- > > Key: SPARK-37273 > URL: https://issues.apache.org/jira/browse/SPARK-37273 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yaohua Zhao >Assignee: Yaohua Zhao >Priority: Major > Labels: release-notes > Fix For: 3.3.0 > > > Provide a new interface in Spark SQL that allows users to query the metadata > of the input files for all file formats, exposing it as *built-in hidden > columns*, meaning *users can only see them when they explicitly reference > them* (e.g. file path, file name)
[jira] [Commented] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
[ https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504721#comment-17504721 ] Apache Spark commented on SPARK-38518: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/35813 > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values > -- > > Key: SPARK-38518 > URL: https://issues.apache.org/jira/browse/SPARK-38518 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.
[jira] [Assigned] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
[ https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38518: Assignee: (was: Apache Spark) > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values > -- > > Key: SPARK-38518 > URL: https://issues.apache.org/jira/browse/SPARK-38518 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.
[jira] [Assigned] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
[ https://issues.apache.org/jira/browse/SPARK-38518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38518: Assignee: Apache Spark > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values > -- > > Key: SPARK-38518 > URL: https://issues.apache.org/jira/browse/SPARK-38518 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement `skipna` of `Series.all/Index.all` to exclude NA/null values.
[jira] [Created] (SPARK-38518) Implement `skipna` of `Series.all/Index.all` to exclude NA/null values
Xinrong Meng created SPARK-38518: Summary: Implement `skipna` of `Series.all/Index.all` to exclude NA/null values Key: SPARK-38518 URL: https://issues.apache.org/jira/browse/SPARK-38518 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng Implement `skipna` of `Series.all/Index.all` to exclude NA/null values. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
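The `skipna` semantics requested for `Series.all`/`Index.all` can be sketched in plain Python. This is an illustrative model of pandas' nullable-boolean behavior, not the actual PySpark implementation, and `all_with_skipna` is a hypothetical helper:

```python
def all_with_skipna(values, skipna=True):
    """Model the proposed `skipna` semantics for Series.all/Index.all.

    skipna=True  -> NA/None entries are excluded before the check.
    skipna=False -> a definite False still dominates; otherwise a present
                    NA makes the result indeterminate (None here), following
                    Kleene three-valued logic as pandas nullable booleans do.
    """
    non_null = [v for v in values if v is not None]
    if skipna:
        return all(non_null)          # empty input yields True, as in pandas
    if not all(non_null):
        return False                  # False wins regardless of NA entries
    return True if len(non_null) == len(values) else None
```

Under this model, `[True, True, None]` is all-true when NAs are skipped, but indeterminate when they are not.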
[jira] [Updated] (SPARK-38516) Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided
[ https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-38516: Summary: Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active hadoop-provided (was: Add log4j-core and log4j-api to classpath if active hadoop-provided) > Add log4j-core, log4j-api and log4j-slf4j-impl to classpath if active > hadoop-provided > - > > Key: SPARK-38516 > URL: https://issues.apache.org/jira/browse/SPARK-38516 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/logging/log4j/core/Filter > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.logging.log4j.core.Filter > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 
7 more{noformat} > {noformat} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/logging/log4j/LogManager > at > org.apache.spark.deploy.yarn.SparkRackResolver.(SparkRackResolver.scala:42) > at > org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114) > at > org.apache.spark.scheduler.cluster.YarnScheduler.(YarnScheduler.scala:31) > at > org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35) > at > org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985) > at org.apache.spark.SparkContext.(SparkContext.scala:563) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:327) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.logging.log4j.LogManager > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 26 more > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
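Until the build change lands, one workaround sketch for the `NoClassDefFoundError` above is to put the log4j 2 jars on the driver and executor classpaths explicitly. The paths and the 2.17.2 version below are assumptions; match them to the jars actually shipped with your Hadoop/Spark layout:

```properties
# spark-defaults.conf -- hypothetical jar locations and version
spark.driver.extraClassPath   /opt/log4j/log4j-api-2.17.2.jar:/opt/log4j/log4j-core-2.17.2.jar:/opt/log4j/log4j-slf4j-impl-2.17.2.jar
spark.executor.extraClassPath /opt/log4j/log4j-api-2.17.2.jar:/opt/log4j/log4j-core-2.17.2.jar:/opt/log4j/log4j-slf4j-impl-2.17.2.jar
```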
[jira] [Commented] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used
[ https://issues.apache.org/jira/browse/SPARK-38507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504703#comment-17504703 ] qian commented on SPARK-38507: -- Hi [~amavrommatis] This happens because you aliased the DataFrame as *df*, so a dotted name such as *df.field2* is resolved through the alias first, which causes a schema-name conflict. You can try this command: {code:scala} df.withColumn("field3", lit(0)).select("field3").show(2) {code} While the following command also runs, its result is not what you expect: {code:scala} df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code} The result is the original column *field2*, not your new column *df.field2* whose value is 0. > DataFrame withColumn method not adding or replacing columns when alias is used > -- > > Key: SPARK-38507 > URL: https://issues.apache.org/jira/browse/SPARK-38507 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Alexandros Mavrommatis >Priority: Major > Labels: SQL, catalyst > > I have an input DataFrame *df* created as follows: > {code:java} > import spark.implicits._ > val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code} > When I execute either this command: > {code:java} > df.select("df.field2").show(2) {code} > or that one: > {code:java} > df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code} > I get the same result: > {code:java} > +--+ > |field2| > +--+ > | 10| > | 20| > +--+ {code} > Additionally, when I execute the following command: > {code:java} > df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code} > I get this exception: > {code:java} > org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given > input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- > Project [field1#7, field2#8, 0 AS df.field3#31] +- SubqueryAlias df > +- Project [_1#2 AS field1#7, _2#3 AS field2#8] +- LocalRelation > [_1#2, _2#3] at > 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > scala.collection.TraversableLike.map(TraversableLike.scala:238) at > scala.collection.TraversableLike.map$(TraversableLike.scala:231) at > scala.collection.AbstractTraversable.map(Traversable.scala:108) at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:152) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:93) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:93) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAn
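The resolution behavior behind this report can be modeled with a toy resolver. This is a simplified sketch, not Catalyst's actual algorithm (the real logic lives in Spark's analyzer), and `resolve` is a hypothetical illustration: a dotted name is tried as `<alias>.<column>` first, so a column whose literal name contains a dot is reachable only when backtick-quoted.

```python
def resolve(name, columns, alias):
    """Toy model of resolving a column reference against a DataFrame
    aliased as `alias` whose literal column names are `columns`."""
    # Backtick-quoted names are taken literally, dots included.
    if name.startswith("`") and name.endswith("`"):
        literal = name[1:-1]
        return literal if literal in columns else None
    # An unquoted dotted name is interpreted as <qualifier>.<column>.
    if "." in name:
        qualifier, col = name.split(".", 1)
        return col if qualifier == alias and col in columns else None
    return name if name in columns else None
```

Under this model, `resolve("df.field2", ...)` finds the pre-existing *field2* through the alias (the report's first observation), an unquoted `df.field3` resolves to nothing (the `AnalysisException`), and only a backtick-quoted `` `df.field3` `` reaches the column literally named `df.field3`.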
[jira] [Assigned] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings
[ https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38511: - Assignee: Dongjoon Hyun > Remove priorityClassName propagation in favor of explicit settings > -- > > Key: SPARK-38511 > URL: https://issues.apache.org/jira/browse/SPARK-38511 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings
[ https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38511. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35807 [https://github.com/apache/spark/pull/35807] > Remove priorityClassName propagation in favor of explicit settings > -- > > Key: SPARK-38511 > URL: https://issues.apache.org/jira/browse/SPARK-38511 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)
[ https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-38517: Assignee: Hyukjin Kwon > Fix PySpark documentation generation (missing ipython_genutils) > --- > > Key: SPARK-38517 > URL: https://issues.apache.org/jira/browse/SPARK-38517 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > {code} > Extension error: > Could not import extension nbsphinx (exception: No module named > 'ipython_genutils') > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > {code} > https://github.com/apache/spark/runs/5504729423?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
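A workaround sketch for the missing module is to add it to the Python requirements used by the docs build. The exact requirements file consumed by the Spark docs workflow is an assumption here:

```
# docs build requirements addition (hypothetical location):
# nbsphinx's dependency chain no longer pulls this in transitively,
# but it is still imported at docs-build time, so list it explicitly.
ipython_genutils
```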
[jira] [Resolved] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)
[ https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-38517. -- Fix Version/s: 3.3.0 3.2.2 Resolution: Fixed Issue resolved by pull request 35812 [https://github.com/apache/spark/pull/35812] > Fix PySpark documentation generation (missing ipython_genutils) > --- > > Key: SPARK-38517 > URL: https://issues.apache.org/jira/browse/SPARK-38517 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0, 3.2.2 > > > {code} > Extension error: > Could not import extension nbsphinx (exception: No module named > 'ipython_genutils') > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `require_plugin_files' > from > 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > {code} > https://github.com/apache/spark/runs/5504729423?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)
[ https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504685#comment-17504685 ] Apache Spark commented on SPARK-38517: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/35812 > Fix PySpark documentation generation (missing ipython_genutils) > --- > > Key: SPARK-38517 > URL: https://issues.apache.org/jira/browse/SPARK-38517 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > Extension error: > Could not import extension nbsphinx (exception: No module named > 'ipython_genutils') > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `require_plugin_files' > from > 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > {code} > https://github.com/apache/spark/runs/5504729423?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)
[ https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38517: Assignee: (was: Apache Spark) > Fix PySpark documentation generation (missing ipython_genutils) > --- > > Key: SPARK-38517 > URL: https://issues.apache.org/jira/browse/SPARK-38517 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > Extension error: > Could not import extension nbsphinx (exception: No module named > 'ipython_genutils') > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > {code} > https://github.com/apache/spark/runs/5504729423?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)
[ https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38517: Assignee: Apache Spark > Fix PySpark documentation generation (missing ipython_genutils) > --- > > Key: SPARK-38517 > URL: https://issues.apache.org/jira/browse/SPARK-38517 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > {code} > Extension error: > Could not import extension nbsphinx (exception: No module named > 'ipython_genutils') > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > {code} > https://github.com/apache/spark/runs/5504729423?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)
[ https://issues.apache.org/jira/browse/SPARK-38517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504684#comment-17504684 ] Apache Spark commented on SPARK-38517: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/35812 > Fix PySpark documentation generation (missing ipython_genutils) > --- > > Key: SPARK-38517 > URL: https://issues.apache.org/jira/browse/SPARK-38517 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.1, 3.3.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > Extension error: > Could not import extension nbsphinx (exception: No module named > 'ipython_genutils') > make: *** [Makefile:35: html] Error 2 > > Jekyll 4.2.1 Please append `--trace` to the `build` command > for any additional information or backtrace. > > /__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `': > Python doc generation failed (RuntimeError) > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in > `block in require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in > `require_with_graceful_fail' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in > `block in require_plugin_files' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `each' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in > `require_plugin_files' > from > 
/__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in > `conscientious_require' > from > /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in > `setup' > {code} > https://github.com/apache/spark/runs/5504729423?check_suite_focus=true -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided
[ https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504682#comment-17504682 ] Apache Spark commented on SPARK-38516: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/35811 > Add log4j-core and log4j-api to classpath if active hadoop-provided > --- > > Key: SPARK-38516 > URL: https://issues.apache.org/jira/browse/SPARK-38516 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > Error: A JNI error has occurred, please check your installation and try again > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/logging/log4j/core/Filter > at java.lang.Class.getDeclaredMethods0(Native Method) > at java.lang.Class.privateGetDeclaredMethods(Class.java:2701) > at java.lang.Class.privateGetMethodRecursive(Class.java:3048) > at java.lang.Class.getMethod0(Class.java:3018) > at java.lang.Class.getMethod(Class.java:1784) > at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544) > at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526) > Caused by: java.lang.ClassNotFoundException: > org.apache.logging.log4j.core.Filter > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 
7 more{noformat} > {noformat} > Exception in thread "main" java.lang.NoClassDefFoundError: > org/apache/logging/log4j/LogManager > at > org.apache.spark.deploy.yarn.SparkRackResolver.(SparkRackResolver.scala:42) > at > org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114) > at > org.apache.spark.scheduler.cluster.YarnScheduler.(YarnScheduler.scala:31) > at > org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35) > at > org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985) > at org.apache.spark.SparkContext.(SparkContext.scala:563) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704) > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.(SparkSQLCLIDriver.scala:327) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at 
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.lang.ClassNotFoundException: > org.apache.logging.log4j.LogManager > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 26 more > {noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided
[ https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38516: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-38517) Fix PySpark documentation generation (missing ipython_genutils)
Hyukjin Kwon created SPARK-38517:
------------------------------------

             Summary: Fix PySpark documentation generation (missing ipython_genutils)
                 Key: SPARK-38517
                 URL: https://issues.apache.org/jira/browse/SPARK-38517
             Project: Spark
          Issue Type: Bug
          Components: Project Infra
    Affects Versions: 3.2.1, 3.3.0
            Reporter: Hyukjin Kwon

{code}
Extension error:
Could not import extension nbsphinx (exception: No module named 'ipython_genutils')
make: *** [Makefile:35: html] Error 2
Jekyll 4.2.1  Please append `--trace` to the `build` command for any additional information or backtrace.
/__w/spark/spark/docs/_plugins/copy_api_dirs.rb:130:in `<main>': Python doc generation failed (RuntimeError)
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `require'
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:60:in `block in require_with_graceful_fail'
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `each'
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/external.rb:57:in `require_with_graceful_fail'
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:89:in `block in require_plugin_files'
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `each'
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:87:in `require_plugin_files'
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/plugin_manager.rb:21:in `conscientious_require'
	from /__w/spark/spark/docs/.local_ruby_bundle/ruby/2.7.0/gems/jekyll-4.2.1/lib/jekyll/site.rb:131:in `setup'
{code}

https://github.com/apache/spark/runs/5504729423?check_suite_focus=true
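A likely workaround, pending a proper fix in the doc-build scripts, is to install the missing module explicitly in the docs environment. A sketch; that `ipython_genutils` was dropped as a transitive dependency of newer Jupyter releases is an assumption about the root cause:

```shell
# nbsphinx imports through nbconvert/traitlets, which used to pull in
# ipython_genutils transitively; once it stopped being installed, the
# Sphinx extension fails at import time. Installing it directly
# unblocks the build.
pip install ipython_genutils
python -c "import ipython_genutils"   # verifies the import now resolves
```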
[jira] [Assigned] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided
[ https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38516: Assignee: Apache Spark
[jira] [Commented] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided
[ https://issues.apache.org/jira/browse/SPARK-38516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504683#comment-17504683 ] Apache Spark commented on SPARK-38516: User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/35811
[jira] [Created] (SPARK-38516) Add log4j-core and log4j-api to classpath if active hadoop-provided
Yuming Wang created SPARK-38516:
-----------------------------------

             Summary: Add log4j-core and log4j-api to classpath if active hadoop-provided
                 Key: SPARK-38516
                 URL: https://issues.apache.org/jira/browse/SPARK-38516
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 3.3.0
            Reporter: Yuming Wang

{noformat}
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/core/Filter
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.core.Filter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
{noformat}
{noformat}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/logging/log4j/LogManager
	at org.apache.spark.deploy.yarn.SparkRackResolver.<init>(SparkRackResolver.scala:42)
	at org.apache.spark.deploy.yarn.SparkRackResolver$.get(SparkRackResolver.scala:114)
	at org.apache.spark.scheduler.cluster.YarnScheduler.<init>(YarnScheduler.scala:31)
	at org.apache.spark.scheduler.cluster.YarnClusterManager.createTaskScheduler(YarnClusterManager.scala:35)
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2985)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2704)
	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:54)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:327)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:159)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.logging.log4j.LogManager
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 26 more
{noformat}
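Until the distribution itself bundles the jars, one workaround for a `hadoop-provided` build is to put the Log4j 2 artifacts on the driver and executor classpaths at submit time. A sketch; the jar locations, versions, and the application class are assumptions:

```shell
# -Phadoop-provided builds rely on the cluster's Hadoop classpath for
# shared jars, but Hadoop supplies Log4j 1.x at best -- Spark 3.3 logs
# through Log4j 2, so log4j-api and log4j-core must be added explicitly.
LOG4J_DIR=/opt/log4j   # assumed location of the Log4j 2 jars
CP="$LOG4J_DIR/log4j-api-2.17.2.jar:$LOG4J_DIR/log4j-core-2.17.2.jar:$LOG4J_DIR/log4j-1.2-api-2.17.2.jar"
spark-submit \
  --conf spark.driver.extraClassPath="$CP" \
  --conf spark.executor.extraClassPath="$CP" \
  --class com.example.Main app.jar   # hypothetical application
```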
[jira] [Updated] (SPARK-38515) Volcano queue is not deleted
[ https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38515: Priority: Critical (was: Blocker)
[jira] [Updated] (SPARK-38515) Volcano test fails at deleting queue
[ https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38515: Description updated to add the reproduction below to the existing VolcanoSuite failure:
{code}
$ k delete queue queue0
Error from server: admission webhook "validatequeue.volcano.sh" denied the request: only queue with state `Closed` can be deleted, queue `queue0` state is `Open`
{code}
[jira] [Updated] (SPARK-38515) Volcano queue is not deleted
[ https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38515: Priority: Blocker (was: Major)
[jira] [Updated] (SPARK-38515) Volcano test fails at deleting queue
[ https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38515: Component/s: (was: Tests)
[jira] [Updated] (SPARK-38515) Volcano queue is not deleted
[ https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38515: Summary: Volcano queue is not deleted (was: Volcano test fails at deleting queue)
[jira] [Commented] (SPARK-38515) Volcano test fails at deleting queue
[ https://issues.apache.org/jira/browse/SPARK-38515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504670#comment-17504670 ] Dongjoon Hyun commented on SPARK-38515: cc [~yikunkero] this is happening on Intel-architecture EKS.
[jira] [Created] (SPARK-38515) Volcano test fails at deleting queue
Dongjoon Hyun created SPARK-38515:
-------------------------------------

             Summary: Volcano test fails at deleting queue
                 Key: SPARK-38515
                 URL: https://issues.apache.org/jira/browse/SPARK-38515
             Project: Spark
          Issue Type: Sub-task
          Components: Kubernetes, Tests
    Affects Versions: 3.3.0
            Reporter: Dongjoon Hyun

{code}
[info] org.apache.spark.deploy.k8s.integrationtest.VolcanoSuite *** ABORTED *** (7 minutes, 40 seconds)
[info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: DELETE at: https://44bea09e70a5147f6b5b347ec26de85f.gr7.us-west-2.eks.amazonaws.com/apis/scheduling.volcano.sh/v1beta1/queues/queue-2u-3g. Message: admission webhook "validatequeue.volcano.sh" denied the request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is `Open`. Received status: Status(apiVersion=v1, code=400, details=null, kind=Status, message=admission webhook "validatequeue.volcano.sh" denied the request: only queue with state `Closed` can be deleted, queue `queue-2u-3g` state is `Open`, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=null, status=Failure, additionalProperties={}).
{code}
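Since the admission webhook only permits deleting queues whose state is `Closed`, the test teardown presumably has to close the queue and wait for the state transition before deleting it. A sketch using Volcano's `vcctl` CLI and `kubectl`; the exact flags and the 60s timeout are assumptions:

```shell
# Close the queue, wait until Volcano reports it Closed, then delete it.
vcctl queue operate --name queue-2u-3g --action close
kubectl wait --for=jsonpath='{.status.state}'=Closed \
  queue/queue-2u-3g --timeout=60s
kubectl delete queue queue-2u-3g
```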
[jira] [Resolved] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
[ https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-38513. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35809 [https://github.com/apache/spark/pull/35809] > Move custom scheduler-specific configs to under > `spark.kubernetes.scheduler.NAME` prefix > > > Key: SPARK-38513 > URL: https://issues.apache.org/jira/browse/SPARK-38513 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
[ https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-38513: - Assignee: Dongjoon Hyun > Move custom scheduler-specific configs to under > `spark.kubernetes.scheduler.NAME` prefix > > > Key: SPARK-38513 > URL: https://issues.apache.org/jira/browse/SPARK-38513 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38514) Download link on spark 3.2.1 for hadoop 3.2 is wrong
Brett Ryan created SPARK-38514: -- Summary: Download link on spark 3.2.1 for hadoop 3.2 is wrong Key: SPARK-38514 URL: https://issues.apache.org/jira/browse/SPARK-38514 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 3.2.1 Reporter: Brett Ryan When downloading spark 3.2.1 pre-built for hadoop, the dropdown reads: {quote} Pre-built for Apache Hadoop *3.3 and later* {quote} However, the filename link reads {quote} spark-3.2.1-bin-*hadoop3.2*.tgz {quote} When downloaded, the contents actually have Hadoop 3.3.1 dependencies, indicating that the filename is incorrect. https://spark.apache.org/downloads.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
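The mismatch described above can be confirmed by listing the tarball instead of trusting the filename. The sketch below is an illustrative helper (not part of the Jira issue): it infers the bundled Hadoop version from jar names in a distribution listing, assuming the usual `hadoop-client-api-<version>.jar` naming that Spark distributions use.

```python
import re

def bundled_hadoop_version(member_names):
    """Infer the bundled Hadoop version from a Spark distribution's
    jar listing, or return None if no hadoop-client jar is present.
    Illustrative helper; the jar-name pattern is an assumption based
    on how Spark distributions typically bundle hadoop-client jars."""
    pattern = re.compile(r"hadoop-client(?:-api|-runtime)?-(\d+\.\d+\.\d+)\.jar$")
    for name in member_names:
        m = pattern.search(name)
        if m:
            return m.group(1)
    return None

# A listing like the mislabeled tarball's would report 3.3.1 even though
# the archive name says hadoop3.2 (member path is hypothetical):
listing = ["spark-3.2.1-bin-hadoop3.2/jars/hadoop-client-api-3.3.1.jar"]
print(bundled_hadoop_version(listing))  # 3.3.1
```

In practice one would feed this the output of `tarfile.open(...).getnames()` on the downloaded archive.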
[jira] [Assigned] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
[ https://issues.apache.org/jira/browse/SPARK-38320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38320: Assignee: (was: Apache Spark) > (flat)MapGroupsWithState can timeout groups which just received inputs in the > same microbatch > - > > Key: SPARK-38320 > URL: https://issues.apache.org/jira/browse/SPARK-38320 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1 >Reporter: Alex Balikov >Priority: Major > > We have identified an issue where the RocksDB state store iterator will not > pick up store updates made after its creation. As a result of this, the > _timeoutProcessorIter_ in > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala] > will not pick up state changes made during _newDataProcessorIter_ input > processing. The user-observed behavior is that a group state may receive > input records and also be called with timeout in the same micro batch. This > contradicts the public documentation for GroupState - > [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html] > * The timeout is reset every time the function is called on a group, that > is, when the group has new data, or the group has timed out. So the user has > to set the timeout duration every time the function is called, otherwise, > there will not be any timeout set. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
[ https://issues.apache.org/jira/browse/SPARK-38320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504642#comment-17504642 ] Apache Spark commented on SPARK-38320: -- User 'alex-balikov' has created a pull request for this issue: https://github.com/apache/spark/pull/35810 > (flat)MapGroupsWithState can timeout groups which just received inputs in the > same microbatch > - > > Key: SPARK-38320 > URL: https://issues.apache.org/jira/browse/SPARK-38320 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1 >Reporter: Alex Balikov >Priority: Major > > We have identified an issue where the RocksDB state store iterator will not > pick up store updates made after its creation. As a result of this, the > _timeoutProcessorIter_ in > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala] > will not pick up state changes made during _newDataProcessorIter_ input > processing. The user-observed behavior is that a group state may receive > input records and also be called with timeout in the same micro batch. This > contradicts the public documentation for GroupState - > [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html] > * The timeout is reset every time the function is called on a group, that > is, when the group has new data, or the group has timed out. So the user has > to set the timeout duration every time the function is called, otherwise, > there will not be any timeout set. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
[ https://issues.apache.org/jira/browse/SPARK-38320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38320: Assignee: Apache Spark > (flat)MapGroupsWithState can timeout groups which just received inputs in the > same microbatch > - > > Key: SPARK-38320 > URL: https://issues.apache.org/jira/browse/SPARK-38320 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.2.1 >Reporter: Alex Balikov >Assignee: Apache Spark >Priority: Major > > We have identified an issue where the RocksDB state store iterator will not > pick up store updates made after its creation. As a result of this, the > _timeoutProcessorIter_ in > [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala] > will not pick up state changes made during _newDataProcessorIter_ input > processing. The user-observed behavior is that a group state may receive > input records and also be called with timeout in the same micro batch. This > contradicts the public documentation for GroupState - > [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html] > * The timeout is reset every time the function is called on a group, that > is, when the group has new data, or the group has timed out. So the user has > to set the timeout duration every time the function is called, otherwise, > there will not be any timeout set. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
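The stale-iterator behavior SPARK-38320 describes is generic: an iterator created over a snapshot of the state does not observe updates made after its creation, so timeouts computed from the snapshot can fire for groups that just received input. The sketch below demonstrates this with a plain Python dict copy; it is not the RocksDB state store API, and the group names and timestamps are made up for illustration.

```python
# State keyed by group, as a streaming state store would hold it.
state = {"group-a": {"last_seen": 0}, "group-b": {"last_seen": 0}}

# An iterator over a snapshot taken at the start of the microbatch.
snapshot = list(state.items())

# New input for group-a arrives in the same microbatch and refreshes it,
# but the snapshot does not see the update.
state["group-a"] = {"last_seen": 10}

now, timeout = 10, 5

# Computing timeouts from the stale snapshot wrongly times out group-a...
stale_timed_out = [k for k, v in snapshot if now - v["last_seen"] >= timeout]
# ...while reading the live state times out only group-b.
live_timed_out = [k for k, v in state.items() if now - v["last_seen"] >= timeout]

print(stale_timed_out)  # ['group-a', 'group-b']
print(live_timed_out)   # ['group-b']
```

The fix direction implied by the issue is to make timeout processing observe updates made during input processing, i.e. behave like the `live_timed_out` computation.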
[jira] [Commented] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
[ https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504619#comment-17504619 ] Apache Spark commented on SPARK-38513: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/35809 > Move custom scheduler-specific configs to under > `spark.kubernetes.scheduler.NAME` prefix > > > Key: SPARK-38513 > URL: https://issues.apache.org/jira/browse/SPARK-38513 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
[ https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38513: Assignee: (was: Apache Spark) > Move custom scheduler-specific configs to under > `spark.kubernetes.scheduler.NAME` prefix > > > Key: SPARK-38513 > URL: https://issues.apache.org/jira/browse/SPARK-38513 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
[ https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38513: Assignee: Apache Spark > Move custom scheduler-specific configs to under > `spark.kubernetes.scheduler.NAME` prefix > > > Key: SPARK-38513 > URL: https://issues.apache.org/jira/browse/SPARK-38513 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
[ https://issues.apache.org/jira/browse/SPARK-38513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504618#comment-17504618 ] Apache Spark commented on SPARK-38513: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/35809 > Move custom scheduler-specific configs to under > `spark.kubernetes.scheduler.NAME` prefix > > > Key: SPARK-38513 > URL: https://issues.apache.org/jira/browse/SPARK-38513 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38513) Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix
Dongjoon Hyun created SPARK-38513: - Summary: Move custom scheduler-specific configs to under `spark.kubernetes.scheduler.NAME` prefix Key: SPARK-38513 URL: https://issues.apache.org/jira/browse/SPARK-38513 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions
[ https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38512: Assignee: (was: Apache Spark) > ResolveFunctions implemented incorrectly requiring multiple passes to Resolve > Nested Expressions > - > > Key: SPARK-38512 > URL: https://issues.apache.org/jira/browse/SPARK-38512 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.2.1 >Reporter: Alexey Kudinkin >Priority: Critical > > The ResolveFunctions Rule is implemented incorrectly, requiring multiple passes to > Resolve Nested Expressions: > While the Plan object is traversed correctly in post-order (bottom-up, > `plan.resolveOperatorsUpWithPruning`), internally each plan node's expressions are > traversed incorrectly in pre-order (top-down, using > `transformExpressionsWithPruning`): > > {code:java} > case q: LogicalPlan => > q.transformExpressionsWithPruning(...) { ... } {code} > > Traversing in pre-order means that an attempt is made to resolve the current > node before its children are resolved, which is incorrect, since the node > itself cannot be resolved before its children are. > While this does not lead to failures yet, it is taxing on performance – > most of the expressions in Spark should be resolvable in a *single > pass* (if resolved bottom-up; see the reproducible sample at the bottom). > Instead, it currently takes Spark at least *N* iterations to resolve such > expressions, where N is proportional to the depth of the Expression tree.
> > Example to reproduce: > > {code:java} > def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: > StructType): Expression = { > val expr = spark.sessionState.sqlParser.parseExpression(exprStr) > val analyzer = spark.sessionState.analyzer > val schemaFields = tableSchema.fields > val resolvedExpr = { > val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, > schemaFields.drop(1): _*)) > val rules: Seq[Rule[LogicalPlan]] = { > analyzer.ResolveFunctions :: > analyzer.ResolveReferences :: > Nil > } > rules.foldRight(plan)((rule, plan) => rule.apply(plan)) > .asInstanceOf[Filter] > .condition > } > resolvedExpr > } > // Invoke with > resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), > 'MM/dd/yyyy')", StructType(StructField("B", StringType) :: Nil)){code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions
[ https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38512: Assignee: Apache Spark > ResolveFunctions implemented incorrectly requiring multiple passes to Resolve > Nested Expressions > - > > Key: SPARK-38512 > URL: https://issues.apache.org/jira/browse/SPARK-38512 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.2.1 >Reporter: Alexey Kudinkin >Assignee: Apache Spark >Priority: Critical > > The ResolveFunctions Rule is implemented incorrectly, requiring multiple passes to > Resolve Nested Expressions: > While the Plan object is traversed correctly in post-order (bottom-up, > `plan.resolveOperatorsUpWithPruning`), internally each plan node's expressions are > traversed incorrectly in pre-order (top-down, using > `transformExpressionsWithPruning`): > > {code:java} > case q: LogicalPlan => > q.transformExpressionsWithPruning(...) { ... } {code} > > Traversing in pre-order means that an attempt is made to resolve the current > node before its children are resolved, which is incorrect, since the node > itself cannot be resolved before its children are. > While this does not lead to failures yet, it is taxing on performance – > most of the expressions in Spark should be resolvable in a *single > pass* (if resolved bottom-up; see the reproducible sample at the bottom). > Instead, it currently takes Spark at least *N* iterations to resolve such > expressions, where N is proportional to the depth of the Expression tree.
> > Example to reproduce: > > {code:java} > def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: > StructType): Expression = { > val expr = spark.sessionState.sqlParser.parseExpression(exprStr) > val analyzer = spark.sessionState.analyzer > val schemaFields = tableSchema.fields > val resolvedExpr = { > val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, > schemaFields.drop(1): _*)) > val rules: Seq[Rule[LogicalPlan]] = { > analyzer.ResolveFunctions :: > analyzer.ResolveReferences :: > Nil > } > rules.foldRight(plan)((rule, plan) => rule.apply(plan)) > .asInstanceOf[Filter] > .condition > } > resolvedExpr > } > // Invoke with > resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), > 'MM/dd/yyyy')", StructType(StructField("B", StringType) :: Nil)){code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions
[ https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504617#comment-17504617 ] Apache Spark commented on SPARK-38512: -- User 'alexeykudinkin' has created a pull request for this issue: https://github.com/apache/spark/pull/35808 > ResolveFunctions implemented incorrectly requiring multiple passes to Resolve > Nested Expressions > - > > Key: SPARK-38512 > URL: https://issues.apache.org/jira/browse/SPARK-38512 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.2.1 >Reporter: Alexey Kudinkin >Priority: Critical > > The ResolveFunctions Rule is implemented incorrectly, requiring multiple passes to > Resolve Nested Expressions: > While the Plan object is traversed correctly in post-order (bottom-up, > `plan.resolveOperatorsUpWithPruning`), internally each plan node's expressions are > traversed incorrectly in pre-order (top-down, using > `transformExpressionsWithPruning`): > > {code:java} > case q: LogicalPlan => > q.transformExpressionsWithPruning(...) { ... } {code} > > Traversing in pre-order means that an attempt is made to resolve the current > node before its children are resolved, which is incorrect, since the node > itself cannot be resolved before its children are. > While this does not lead to failures yet, it is taxing on performance – > most of the expressions in Spark should be resolvable in a *single > pass* (if resolved bottom-up; see the reproducible sample at the bottom). > Instead, it currently takes Spark at least *N* iterations to resolve such > expressions, where N is proportional to the depth of the Expression tree.
> > Example to reproduce: > > {code:java} > def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: > StructType): Expression = { > val expr = spark.sessionState.sqlParser.parseExpression(exprStr) > val analyzer = spark.sessionState.analyzer > val schemaFields = tableSchema.fields > val resolvedExpr = { > val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, > schemaFields.drop(1): _*)) > val rules: Seq[Rule[LogicalPlan]] = { > analyzer.ResolveFunctions :: > analyzer.ResolveReferences :: > Nil > } > rules.foldRight(plan)((rule, plan) => rule.apply(plan)) > .asInstanceOf[Filter] > .condition > } > resolvedExpr > } > // Invoke with > resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), > 'MM/dd/yyyy')", StructType(StructField("B", StringType) :: Nil)){code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38492) Improve the test coverage for PySpark
[ https://issues.apache.org/jira/browse/SPARK-38492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-38492: Description: Currently, PySpark test coverage is around 91% according to codecov report: [https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark] Since there are still 9% missing tests, I think it would be great to improve our test coverage. Of course we might not target 100%, but as much as possible, to the level that we can currently cover with CI. was: Currently, PySpark test coverage is around 91% according to codecov report: [https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark).] Since there are still 9% missing tests, so I think it would be great to improve our test coverage. Of course we might not target to 100%, but as much as possible, to the level that we can currently cover with CI. > Improve the test coverage for PySpark > - > > Key: SPARK-38492 > URL: https://issues.apache.org/jira/browse/SPARK-38492 > Project: Spark > Issue Type: Umbrella > Components: PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Currently, PySpark test coverage is around 91% according to codecov report: > [https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark] > Since there are still 9% missing tests, I think it would be great to > improve our test coverage. > Of course we might not target 100%, but as much as possible, to the level > that we can currently cover with CI. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions
Alexey Kudinkin created SPARK-38512: --- Summary: ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions Key: SPARK-38512 URL: https://issues.apache.org/jira/browse/SPARK-38512 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.1, 3.2.0 Reporter: Alexey Kudinkin The ResolveFunctions Rule is implemented incorrectly, requiring multiple passes to Resolve Nested Expressions: While the Plan object is traversed correctly in post-order (bottom-up, `plan.resolveOperatorsUpWithPruning`), internally each plan node's expressions are traversed incorrectly in pre-order (top-down, using `transformExpressionsWithPruning`): {code:java} case q: LogicalPlan => q.transformExpressionsWithPruning(...) { ... } {code} Traversing in pre-order means that an attempt is made to resolve the current node before its children are resolved, which is incorrect, since the node itself cannot be resolved before its children are. While this does not lead to failures yet, it is taxing on performance – most of the expressions in Spark should be resolvable in a *single pass* (if resolved bottom-up; see the reproducible sample at the bottom). Instead, it currently takes Spark at least *N* iterations to resolve such expressions, where N is proportional to the depth of the Expression tree.
Example to reproduce: {code:java} def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: StructType): Expression = { val expr = spark.sessionState.sqlParser.parseExpression(exprStr) val analyzer = spark.sessionState.analyzer val schemaFields = tableSchema.fields val resolvedExpr = { val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, schemaFields.drop(1): _*)) val rules: Seq[Rule[LogicalPlan]] = { analyzer.ResolveFunctions :: analyzer.ResolveReferences :: Nil } rules.foldRight(plan)((rule, plan) => rule.apply(plan)) .asInstanceOf[Filter] .condition } resolvedExpr } // Invoke with resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), 'MM/dd/yyyy')", StructType(StructField("B", StringType) :: Nil)){code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
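The single-pass-versus-N-passes claim in SPARK-38512 can be illustrated with a language-agnostic sketch (plain Python, not Spark's analyzer): a node "resolves" only once all of its children are resolved, so attempting parents before children resolves just one level of a chain per pass, while a bottom-up pass finishes in one. All names here are illustrative, not Spark APIs.

```python
class Node:
    """A minimal expression-tree node; leaves start out resolved."""
    def __init__(self, *children):
        self.children = list(children)
        self.resolved = not self.children

def try_resolve(node):
    # A node resolves only when every child is already resolved.
    if all(c.resolved for c in node.children):
        node.resolved = True

def top_down_pass(node):
    try_resolve(node)           # parent attempted before its children
    for c in node.children:
        top_down_pass(c)

def bottom_up_pass(node):
    for c in node.children:     # children resolved first
        bottom_up_pass(c)
    try_resolve(node)

def passes_until_resolved(root, one_pass):
    n = 0
    while not root.resolved:
        one_pass(root)
        n += 1
    return n

def chain(depth):
    """A linear chain of `depth` unresolved nodes over one leaf."""
    node = Node()
    for _ in range(depth):
        node = Node(node)
    return node

print(passes_until_resolved(chain(4), top_down_pass))   # 4: one level per pass
print(passes_until_resolved(chain(4), bottom_up_pass))  # 1: single pass
```

This mirrors the issue's point that the pre-order traversal costs a number of rule iterations proportional to expression-tree depth.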
[jira] [Updated] (SPARK-38492) Improve the test coverage for PySpark
[ https://issues.apache.org/jira/browse/SPARK-38492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-38492: Description: Currently, PySpark test coverage is around 91% according to codecov report: [https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark).] Since there are still 9% missing tests, I think it would be great to improve our test coverage. Of course we might not target 100%, but as much as possible, to the level that we can currently cover with CI. was: Currently, PySpark test coverage is around 91% according to codecov report: [https://app.codecov.io/gh/apache/spark.|https://app.codecov.io/gh/apache/spark).] Since there are still 9% missing tests, so I think it would be great to improve our test coverage. Of course we might not target to 100%, but as much as possible, to the level that we can currently cover with CI. > Improve the test coverage for PySpark > - > > Key: SPARK-38492 > URL: https://issues.apache.org/jira/browse/SPARK-38492 > Project: Spark > Issue Type: Umbrella > Components: PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Currently, PySpark test coverage is around 91% according to codecov report: > [https://app.codecov.io/gh/apache/spark|https://app.codecov.io/gh/apache/spark).] > Since there are still 9% missing tests, I think it would be great to > improve our test coverage. > Of course we might not target 100%, but as much as possible, to the level > that we can currently cover with CI. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38492) Improve the test coverage for PySpark
[ https://issues.apache.org/jira/browse/SPARK-38492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-38492: Description: Currently, PySpark test coverage is around 91% according to codecov report: [https://app.codecov.io/gh/apache/spark.|https://app.codecov.io/gh/apache/spark).] Since there are still 9% missing tests, I think it would be great to improve our test coverage. Of course we might not target 100%, but as much as possible, to the level that we can currently cover with CI. was: Currently, PySpark test coverage is around 91% according to codecov report ([https://app.codecov.io/gh/apache/spark).] Since there are still 9% missing tests, so I think it would be great to improve our test coverage. Of course we might not target to 100%, but as much as possible, to the level that we can currently cover with CI. > Improve the test coverage for PySpark > - > > Key: SPARK-38492 > URL: https://issues.apache.org/jira/browse/SPARK-38492 > Project: Spark > Issue Type: Umbrella > Components: PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Currently, PySpark test coverage is around 91% according to codecov report: > [https://app.codecov.io/gh/apache/spark.|https://app.codecov.io/gh/apache/spark).] > Since there are still 9% missing tests, I think it would be great to > improve our test coverage. > Of course we might not target 100%, but as much as possible, to the level > that we can currently cover with CI. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings
[ https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504552#comment-17504552 ] Apache Spark commented on SPARK-38511: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/35807 > Remove priorityClassName propagation in favor of explicit settings > -- > > Key: SPARK-38511 > URL: https://issues.apache.org/jira/browse/SPARK-38511 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings
[ https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38511: Assignee: (was: Apache Spark) > Remove priorityClassName propagation in favor of explicit settings > -- > > Key: SPARK-38511 > URL: https://issues.apache.org/jira/browse/SPARK-38511 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings
[ https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504550#comment-17504550 ] Apache Spark commented on SPARK-38511: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/35807 > Remove priorityClassName propagation in favor of explicit settings > -- > > Key: SPARK-38511 > URL: https://issues.apache.org/jira/browse/SPARK-38511 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings
[ https://issues.apache.org/jira/browse/SPARK-38511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38511: Assignee: Apache Spark > Remove priorityClassName propagation in favor of explicit settings > -- > > Key: SPARK-38511 > URL: https://issues.apache.org/jira/browse/SPARK-38511 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38511) Remove priorityClassName propagation in favor of explicit settings
Dongjoon Hyun created SPARK-38511: - Summary: Remove priorityClassName propagation in favor of explicit settings Key: SPARK-38511 URL: https://issues.apache.org/jira/browse/SPARK-38511 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38510) Failure fetching JSON representation of Spark plans with Hive UDFs
Shardul Mahadik created SPARK-38510: --- Summary: Failure fetching JSON representation of Spark plans with Hive UDFs Key: SPARK-38510 URL: https://issues.apache.org/jira/browse/SPARK-38510 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Shardul Mahadik Repro: {code:java} scala> spark.sql("CREATE TEMPORARY FUNCTION test_udf AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFAesEncrypt'") scala> spark.sql("SELECT test_udf('a', 'b')").queryExecution.analyzed.toJSON scala.reflect.internal.Symbols$CyclicReference: illegal cyclic reference involving class InterfaceAudience java.lang.RuntimeException: error reading Scala signature of org.apache.spark.sql.hive.HiveGenericUDF: illegal cyclic reference involving class InterfaceAudience at scala.reflect.internal.pickling.UnPickler.unpickle(UnPickler.scala:51) at scala.reflect.runtime.JavaMirrors$JavaMirror.unpickleClass(JavaMirrors.scala:660) at scala.reflect.runtime.SymbolLoaders$TopClassCompleter.$anonfun$complete$2(SymbolLoaders.scala:37) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at scala.reflect.internal.SymbolTable.slowButSafeEnteringPhaseNotLaterThan(SymbolTable.scala:333) at scala.reflect.runtime.SymbolLoaders$TopClassCompleter.complete(SymbolLoaders.scala:34) at scala.reflect.internal.Symbols$Symbol.completeInfo(Symbols.scala:1551) at scala.reflect.internal.Symbols$Symbol.info(Symbols.scala:1514) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$7.scala$reflect$runtime$SynchronizedSymbols$SynchronizedSymbol$$super$info(SynchronizedSymbols.scala:203) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.$anonfun$info$1(SynchronizedSymbols.scala:158) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info(SynchronizedSymbols.scala:149) at scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol.info$(SynchronizedSymbols.scala:158) at 
scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$7.info(SynchronizedSymbols.scala:203) at scala.reflect.internal.Symbols$Symbol.initialize(Symbols.scala:1698) at scala.reflect.internal.Symbols$SymbolContextApiImpl.selfType(Symbols.scala:151) at scala.reflect.internal.Symbols$ClassSymbol.selfType(Symbols.scala:3287) at org.apache.spark.sql.catalyst.ScalaReflection$.getConstructorParameterNames(ScalaReflection.scala:656) at org.apache.spark.sql.catalyst.trees.TreeNode.jsonFields(TreeNode.scala:1019) at org.apache.spark.sql.catalyst.trees.TreeNode.collectJsonValue$1(TreeNode.scala:1009) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$jsonValue$1(TreeNode.scala:1011) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$jsonValue$1$adapted(TreeNode.scala:1011) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at scala.collection.IterableLike.foreach(IterableLike.scala:74) at scala.collection.IterableLike.foreach$(IterableLike.scala:73) at scala.collection.AbstractIterable.foreach(Iterable.scala:56) at org.apache.spark.sql.catalyst.trees.TreeNode.collectJsonValue$1(TreeNode.scala:1011) at org.apache.spark.sql.catalyst.trees.TreeNode.jsonValue(TreeNode.scala:1014) at org.apache.spark.sql.catalyst.trees.TreeNode.parseToJson(TreeNode.scala:1057) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$parseToJson$11(TreeNode.scala:1063) at scala.collection.immutable.List.map(List.scala:293) at org.apache.spark.sql.catalyst.trees.TreeNode.parseToJson(TreeNode.scala:1063) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$jsonFields$2(TreeNode.scala:1033) at scala.collection.immutable.List.map(List.scala:293) at org.apache.spark.sql.catalyst.trees.TreeNode.jsonFields(TreeNode.scala:1024) at org.apache.spark.sql.catalyst.trees.TreeNode.collectJsonValue$1(TreeNode.scala:1009) at 
org.apache.spark.sql.catalyst.trees.TreeNode.jsonValue(TreeNode.scala:1014) at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:1000) ... 47 elided {code} This issue is due to [bug#12190 in Scala|https://github.com/scala/bug/issues/12190], which does not handle cyclic references in Java annotations correctly. The cyclic reference in this case comes from the InterfaceAudience annotation, which [annotates itself|https://github.com/apache/hadoop/blob/db8ae4b65448c506c9234641b2c1f9b8e894dc18/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/InterfaceAudience.java#L45]. This annotation class is present in the type hierarchy of {{{}HiveGenericUDF{}}}. A simple workaround for this issue is to retry the operation. It will likely succeed on the retry because the annotation is partially resolved from
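The retry workaround described above can be sketched as a small helper. This is a minimal illustration of the pattern only, not a Spark API; the function names below are hypothetical, and the transient exception type stands in for the Scala-reflection `CyclicReference` failure.

```python
def call_with_retry(fn, retries=1, transient=(RuntimeError,)):
    """Call fn(), retrying up to `retries` extra times on transient errors.

    Mirrors the workaround above: the first call may fail with a transient
    reflection error, while a second attempt usually succeeds because the
    annotation is already partially resolved by then.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except transient:
            if attempt == retries:
                raise
```

In the Scala shell, the equivalent is simply re-running the failing `.toJSON` call after the first failure.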
[jira] [Comment Edited] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class
[ https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504529#comment-17504529 ] Brian Schaefer edited comment on SPARK-38483 at 3/10/22, 7:13 PM: -- The column name does differ between the two when selecting a struct field. However I think it makes sense to print out the name that the column _would_ take if it were selected. Seems like this should be fairly straightforward to handle: {code:python} >>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": >>> 1}}}]) >>> values = F.col("struct.outer_field.inner_field") >>> print(df.select(values).schema[0].name) inner_field >>> print(values._jc.toString()) struct.outer_field.inner_field >>> print(values._jc.toString().split(".")[-1]) inner_field{code} was (Author: JIRAUSER286367): The column name does differ between the two when selecting a struct field, but handling that case seems fairly straightforward. {code:python} >>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": >>> 1}}}]) >>> values = F.col("struct.outer_field.inner_field") >>> print(df.select(values).schema[0].name) inner_field >>> print(values._jc.toString()) struct.outer_field.inner_field >>> print(values._jc.toString().split(".")[-1]) inner_field{code} > Column name or alias as an attribute of the PySpark Column class > > > Key: SPARK-38483 > URL: https://issues.apache.org/jira/browse/SPARK-38483 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Brian Schaefer >Priority: Minor > Labels: starter > > Having the name of a column as an attribute of PySpark {{Column}} class > instances can enable some convenient patterns, for example: > Applying a function to a column and aliasing with the original name: > {code:java} > values = F.col("values") > # repeating the column name as an alias > distinct_values = F.array_distinct(values).alias("values") > # re-using the existing column name 
> distinct_values = F.array_distinct(values).alias(values._name){code} > Checking the column name inside a custom function and applying conditional > logic on the name: > {code:java} > def custom_function(col: Column) -> Column: > if col._name == "my_column": > return col.astype("int") > return col.astype("string"){code} > The proposal in this issue is to add a property {{Column.\_name}} that > obtains the name or alias of a column in a similar way as currently done in > the {{Column.\_\_repr\_\_}} method: > [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.] > The choice of {{_name}} intentionally avoids collision with the existing > {{Column.name}} method, which is an alias for {{{}Column.alias{}}}. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class
[ https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504529#comment-17504529 ] Brian Schaefer commented on SPARK-38483: The column name does differ between the two when selecting a struct field, but handling that case seems fairly straightforward. {code:python} >>> df = spark.createDataFrame([{"struct": {"outer_field": {"inner_field": >>> 1}}}]) >>> values = F.col("struct.outer_field.inner_field") >>> print(df.select(values).schema[0].name) inner_field >>> print(values._jc.toString()) struct.outer_field.inner_field >>> print(values._jc.toString().split(".")[-1]) inner_field{code} > Column name or alias as an attribute of the PySpark Column class > > > Key: SPARK-38483 > URL: https://issues.apache.org/jira/browse/SPARK-38483 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Brian Schaefer >Priority: Minor > Labels: starter > > Having the name of a column as an attribute of PySpark {{Column}} class > instances can enable some convenient patterns, for example: > Applying a function to a column and aliasing with the original name: > {code:java} > values = F.col("values") > # repeating the column name as an alias > distinct_values = F.array_distinct(values).alias("values") > # re-using the existing column name > distinct_values = F.array_distinct(values).alias(values._name){code} > Checking the column name inside a custom function and applying conditional > logic on the name: > {code:java} > def custom_function(col: Column) -> Column: > if col._name == "my_column": > return col.astype("int") > return col.astype("string"){code} > The proposal in this issue is to add a property {{Column.\_name}} that > obtains the name or alias of a column in a similar way as currently done in > the {{Column.\_\_repr\_\_}} method: > [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.] 
> The choice of {{_name}} intentionally avoids collision with the existing > {{Column.name}} method, which is an alias for {{{}Column.alias{}}}. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
[ https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504522#comment-17504522 ] Apache Spark commented on SPARK-38509: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/35805 > Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF > --- > > Key: SPARK-38509 > URL: https://issues.apache.org/jira/browse/SPARK-38509 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > 1. Unregister the functions `timestampadd()` and `timestampdiff()` in > `FunctionRegistry.expressions`. > 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for > `timestampdiff()`. > 3. Align tests (regenerate golden files) to the syntax rules > where the first parameter `unit` can have one of the identifiers: >- YEAR >- QUARTER >- MONTH >- WEEK >- DAY, DAYOFYEAR (valid for timestampadd) >- HOUR >- MINUTE >- SECOND >- MILLISECOND >- MICROSECOND > h4. Why are the changes needed? > 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with > arbitrary string column as the first parameter is not require by any standard. > 2. Remove the functions and aliases should reduce maintenance cost. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
[ https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38509: Assignee: Apache Spark > Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF > --- > > Key: SPARK-38509 > URL: https://issues.apache.org/jira/browse/SPARK-38509 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > 1. Unregister the functions `timestampadd()` and `timestampdiff()` in > `FunctionRegistry.expressions`. > 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for > `timestampdiff()`. > 3. Align tests (regenerate golden files) to the syntax rules > where the first parameter `unit` can have one of the identifiers: >- YEAR >- QUARTER >- MONTH >- WEEK >- DAY, DAYOFYEAR (valid for timestampadd) >- HOUR >- MINUTE >- SECOND >- MILLISECOND >- MICROSECOND > h4. Why are the changes needed? > 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with > arbitrary string column as the first parameter is not require by any standard. > 2. Remove the functions and aliases should reduce maintenance cost. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
[ https://issues.apache.org/jira/browse/SPARK-38509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38509: Assignee: (was: Apache Spark) > Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF > --- > > Key: SPARK-38509 > URL: https://issues.apache.org/jira/browse/SPARK-38509 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > 1. Unregister the functions `timestampadd()` and `timestampdiff()` in > `FunctionRegistry.expressions`. > 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for > `timestampdiff()`. > 3. Align tests (regenerate golden files) to the syntax rules > where the first parameter `unit` can have one of the identifiers: >- YEAR >- QUARTER >- MONTH >- WEEK >- DAY, DAYOFYEAR (valid for timestampadd) >- HOUR >- MINUTE >- SECOND >- MILLISECOND >- MICROSECOND > h4. Why are the changes needed? > 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with > arbitrary string column as the first parameter is not require by any standard. > 2. Remove the functions and aliases should reduce maintenance cost. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38509) Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF
Max Gekk created SPARK-38509: Summary: Unregister the TIMESTAMPADD/DIFF functions and remove DATE_ADD/DIFF Key: SPARK-38509 URL: https://issues.apache.org/jira/browse/SPARK-38509 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk 1. Unregister the functions `timestampadd()` and `timestampdiff()` in `FunctionRegistry.expressions`. 2. Remove the aliases `date_add` for `timestampadd()` and `date_diff` for `timestampdiff()`. 3. Align tests (regenerate golden files) to the syntax rules where the first parameter `unit` can have one of the identifiers: - YEAR - QUARTER - MONTH - WEEK - DAY, DAYOFYEAR (valid for timestampadd) - HOUR - MINUTE - SECOND - MILLISECOND - MICROSECOND h4. Why are the changes needed? 1. The `timestampadd()`/`timestampdiff()` functions (and their aliases) with an arbitrary string column as the first parameter are not required by any standard. 2. Removing the functions and aliases should reduce maintenance cost. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
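For the fixed-length units in the list above, the semantics of `timestampadd(unit, quantity, ts)` can be illustrated with plain Python datetimes. This is a sketch of the behavior only, not Spark's implementation; YEAR/QUARTER/MONTH arithmetic is calendar-based and intentionally omitted here.

```python
from datetime import datetime, timedelta

# Fixed-length units from the identifier list above, as timedeltas.
# (YEAR, QUARTER, and MONTH are calendar-based and omitted in this sketch.)
UNIT_DELTAS = {
    "WEEK": timedelta(weeks=1),
    "DAY": timedelta(days=1),
    "HOUR": timedelta(hours=1),
    "MINUTE": timedelta(minutes=1),
    "SECOND": timedelta(seconds=1),
    "MILLISECOND": timedelta(milliseconds=1),
    "MICROSECOND": timedelta(microseconds=1),
}

def timestamp_add(unit, quantity, ts):
    """Sketch of timestampadd(unit, quantity, ts) for fixed-length units."""
    return ts + quantity * UNIT_DELTAS[unit.upper()]
```

The point of the JIRA is that `unit` is a keyword from a closed list, not an arbitrary string column, which is why the string-column form is being unregistered.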
[jira] [Created] (SPARK-38508) Volcano feature doesn't work on EKS graviton instances
Dongjoon Hyun created SPARK-38508: - Summary: Volcano feature doesn't work on EKS graviton instances Key: SPARK-38508 URL: https://issues.apache.org/jira/browse/SPARK-38508 Project: Spark Issue Type: Sub-task Components: Kubernetes Affects Versions: 3.3.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC
[ https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504500#comment-17504500 ] Cheng Su commented on SPARK-34960: -- Thanks [~tgraves] and [~ahussein] for commenting. Yes, if any ORC file of a table is missing statistics in its file footer, a Spark query with aggregate push down will fail loudly. I agree this is not a good user experience, and we plan to add a runtime fallback that reads the real rows from the ORC file when statistics are missing. For now, if you have any concerns about the feature, feel free to leave it disabled in your environment; that is why the feature is disabled by default, to avoid failing any existing Spark workload. I will create a PR to add more documentation describing the behavior, i.e. the query fails if any file is missing statistics. The runtime fallback logic will probably land in Spark 3.4 (the release after next), as the schedule is too tight for Spark 3.3 (the branch cut is this month), and we have a similar problem for Parquet aggregate push down as well. > Aggregate (Min/Max/Count) push down for ORC > --- > > Key: SPARK-34960 > URL: https://issues.apache.org/jira/browse/SPARK-34960 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.3.0 > > Attachments: file_no_stats-orc.tar.gz > > > Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we > can also push down certain aggregations into ORC. ORC exposes column > statistics in interface `org.apache.orc.Reader` > ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118] > ), where Spark can utilize for aggregation push down. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
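The pushdown-plus-fallback strategy discussed in this comment can be sketched outside Spark: answer MIN from per-file footer statistics when every file carries them, and fall back to scanning rows otherwise. This is an illustration of the strategy only, not Spark's ORC reader; the `files` data structure is hypothetical.

```python
def min_from_files(files):
    """Hypothetical sketch: files is a list of dicts, each with 'rows'
    (the actual values) and an optional 'stats' dict from the file footer.

    Use footer statistics only when every file has them; otherwise fall
    back to scanning the real rows (the runtime fallback discussed above,
    instead of failing the query loudly)."""
    if all(f.get("stats") is not None for f in files):
        # Statistics-only path: no row data is read.
        return min(f["stats"]["min"] for f in files)
    # Row-scan fallback: at least one file is missing footer statistics.
    return min(v for f in files for v in f["rows"])
```

MAX and COUNT follow the same shape, aggregating `stats["max"]` and `stats["count"]` respectively.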
[jira] [Updated] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes
[ https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-38379: -- Fix Version/s: 3.2.2 (was: 3.3.0) > Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes > -- > > Key: SPARK-38379 > URL: https://issues.apache.org/jira/browse/SPARK-38379 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.2.2 > > > I'm using Spark 3.2.1 on a kubernetes cluster and starting a spark-shell in > client mode. I'm using persistent local volumes to mount nvme under /data in > the executors and on startup the driver always throws the warning below. > using these options: > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false > > > {code:java} > 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. 
> java.util.NoSuchElementException: spark.app.id > at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.SparkConf.get(SparkConf.scala:245) > at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:91) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391) > at 
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$ad
[jira] [Updated] (SPARK-37735) Add appId interface to KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37735: -- Fix Version/s: 3.2.2 > Add appId interface to KubernetesConf > - > > Key: SPARK-37735 > URL: https://issues.apache.org/jira/browse/SPARK-37735 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0, 3.2.2 > > > The appId can currently only be accessed in KubernetesDriverConf and > KubernetesExecutorConf, but cannot be accessed in KubernetesConf. > > Some user feature steps use KubernetesConf as an init constructor parameter > in order to share the feature step between driver and executor. So, we'd > better add appId to KubernetesConf to help such feature steps access the appId. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38507) DataFrame withColumn method not adding or replacing columns when alias is used
Alexandros Mavrommatis created SPARK-38507: -- Summary: DataFrame withColumn method not adding or replacing columns when alias is used Key: SPARK-38507 URL: https://issues.apache.org/jira/browse/SPARK-38507 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Alexandros Mavrommatis I have an input DataFrame *df* created as follows: {code:java} import spark.implicits._ val df = List((5, 10), (6, 20)).toDF("field1", "field2").alias("df") {code} When I execute either this command: {code:java} df.select("df.field2").show(2) {code} or that one: {code:java} df.withColumn("df.field2", lit(0)).select("df.field2").show(2) {code} I get the same result: {code:java} +--+ |field2| +--+ | 10| | 20| +--+ {code} Additionally, when I execute the following command: {code:java} df.withColumn("df.field3", lit(0)).select("df.field3").show(2){code} I get this exception: {code:java} org.apache.spark.sql.AnalysisException: cannot resolve '`df.field3`' given input columns: [df.field3, df.field1, df.field2]; 'Project ['df.field3] +- Project [field1#7, field2#8, 0 AS df.field3#31] +- SubqueryAlias df +- Project [_1#2 AS field1#7, _2#3 AS field2#8] +- LocalRelation [_1#2, _2#3] at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:155) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:152) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUp$2(TreeNode.scala:342) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:342) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUp$1(QueryPlan.scala:104) at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:116) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:116) at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:127) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$3(QueryPlan.scala:132) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at scala.collection.TraversableLike.map(TraversableLike.scala:238) at scala.collection.TraversableLike.map$(TraversableLike.scala:231) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:132) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:137) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:244) at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:137) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:104) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:152) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:93) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:93) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:90) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:155) at 
org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:176) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:173) at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143) at org.apache.spar
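The behavior reported in SPARK-38507 follows from how the analyzer reads unquoted dots: `df.field2` is parsed as qualifier `df` plus column `field2`, so `withColumn("df.field2", lit(0))` adds a new column whose literal name contains a dot, while `select("df.field2")` still resolves the original `field2`. Backticks (`` `df.field3` ``) make the dotted name literal. A rough pure-Python sketch of that resolution rule (not Spark's analyzer; struct-field navigation is ignored, and the function name is hypothetical):

```python
def resolve(name, qualifier, columns):
    """Sketch of column-name resolution with dots vs backticks.

    columns: literal column names present in the plan's output.
    qualifier: the DataFrame alias (e.g. "df" from .alias("df")).
    """
    # Backticks make the whole name literal, dots and all.
    if name.startswith("`") and name.endswith("`"):
        stripped = name[1:-1]
        return stripped if stripped in columns else None
    # An unquoted dot is read as <qualifier>.<column>, never as a
    # literal column name containing a dot.
    if "." in name:
        prefix, _, rest = name.partition(".")
        if prefix == qualifier and rest in columns:
            return rest
        return None
    return name if name in columns else None
```

Under this rule, the reported `AnalysisException` is expected: the added column is literally named `df.field3`, but the unquoted selector `df.field3` looks for a column `field3` under qualifier `df`, which does not exist.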
[jira] [Resolved] (SPARK-38501) Fix thriftserver test failures under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-38501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-38501. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35802 [https://github.com/apache/spark/pull/35802] > Fix thriftserver test failures under ANSI mode > -- > > Key: SPARK-38501 > URL: https://issues.apache.org/jira/browse/SPARK-38501 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class
[ https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504448#comment-17504448 ] Brian Schaefer commented on SPARK-38483: Could you provide an example of when the real column names would be different? At least for basic examples, it looks like the real column names match those found using {{{}Column._jc.toString(){}}}. With some careful regex it may also be possible to catch aliases. {code:python} >>> df = spark.createDataFrame([{"values": [1,2,3]}]) >>> values = F.col("values") >>> print(df.select(values).schema[0].name) values >>> print(values._jc.toString()) values >>> import re >>> aliased_values = F.col("values").alias("aliased") >>> print(df.select(aliased_values).schema[0].name) aliased >>> print(re.match(".*`(.*)`", aliased_values._jc.toString())[1]) aliased {code} > Column name or alias as an attribute of the PySpark Column class > > > Key: SPARK-38483 > URL: https://issues.apache.org/jira/browse/SPARK-38483 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Brian Schaefer >Priority: Minor > Labels: starter > > Having the name of a column as an attribute of PySpark {{Column}} class > instances can enable some convenient patterns, for example: > Applying a function to a column and aliasing with the original name: > {code:java} > values = F.col("values") > # repeating the column name as an alias > distinct_values = F.array_distinct(values).alias("values") > # re-using the existing column name > distinct_values = F.array_distinct(values).alias(values._name){code} > Checking the column name inside a custom function and applying conditional > logic on the name: > {code:java} > def custom_function(col: Column) -> Column: > if col._name == "my_column": > return col.astype("int") > return col.astype("string"){code} > The proposal in this issue is to add a property {{Column.\_name}} that > obtains the name or alias of a column in a 
similar way as currently done in > the {{Column.\_\_repr\_\_}} method: > [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.] > The choice of {{_name}} intentionally avoids collision with the existing > {{Column.name}} method, which is an alias for {{{}Column.alias{}}}. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
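The name-extraction trick shown in the comment above can be captured as a small plain-Python helper. This is an illustrative sketch, not part of PySpark: `column_label` is a hypothetical name, and it assumes the two string shapes shown in the comment (a bare column name, or an expression whose alias is rendered in backticks by `Column._jc.toString()`).

```python
import re


def column_label(jc_str: str) -> str:
    """Best-effort name extraction from a Column's JVM string form.

    Returns the backtick-quoted alias when one is present
    (e.g. "array_distinct(values) AS `aliased`"), otherwise the
    string itself (a bare column name such as "values").
    """
    match = re.match(r".*`(.*)`", jc_str)
    return match[1] if match else jc_str
```

As the comment notes, this relies on the unstable `_jc` string representation, which is exactly why a first-class `Column._name` property would be preferable.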
[jira] [Comment Edited] (SPARK-38483) Column name or alias as an attribute of the PySpark Column class
[ https://issues.apache.org/jira/browse/SPARK-38483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503782#comment-17503782 ] Brian Schaefer edited comment on SPARK-38483 at 3/10/22, 5:08 PM: -- Extracting the column name from the {{Column.\_\_repr\_\_}} method has been discussed on StackExchange: [https://stackoverflow.com/a/43150264.] However, it would be useful to have the column name more easily accessible. was (Author: JIRAUSER286367): Extracting the column name from the {{Column.__repr__}} method has been discussed on StackExchange: [https://stackoverflow.com/a/43150264.] However, it would be useful to have the column name more easily accessible. > Column name or alias as an attribute of the PySpark Column class > > > Key: SPARK-38483 > URL: https://issues.apache.org/jira/browse/SPARK-38483 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Brian Schaefer >Priority: Minor > Labels: starter > > Having the name of a column as an attribute of PySpark {{Column}} class > instances can enable some convenient patterns, for example: > Applying a function to a column and aliasing with the original name: > {code:java} > values = F.col("values") > # repeating the column name as an alias > distinct_values = F.array_distinct(values).alias("values") > # re-using the existing column name > distinct_values = F.array_distinct(values).alias(values._name){code} > Checking the column name inside a custom function and applying conditional > logic on the name: > {code:java} > def custom_function(col: Column) -> Column: > if col._name == "my_column": > return col.astype("int") > return col.astype("string"){code} > The proposal in this issue is to add a property {{Column.\_name}} that > obtains the name or alias of a column in a similar way as currently done in > the {{Column.\_\_repr\_\_}} method: > [https://github.com/apache/spark/blob/master/python/pyspark/sql/column.py#L1062.] 
> The choice of {{_name}} intentionally avoids collision with the existing > {{Column.name}} method, which is an alias for {{{}Column.alias{}}}. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38451) Fix R tests under ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-38451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-38451. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35798 [https://github.com/apache/spark/pull/35798] > Fix R tests under ANSI mode > --- > > Key: SPARK-38451 > URL: https://issues.apache.org/jira/browse/SPARK-38451 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > [https://github.com/gengliangwang/spark/runs/5461227887?check_suite_focus=true] > > {quote}1. Error (test_sparkSQL.R:2064:3): SPARK-37108: expose make_date > expression i > 2022-03-08T10:06:54.9600113Z Error in `handleErrors(returnStatus, conn)`: > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 661.0 failed 1 times, most recent failure: Lost task 0.0 in stage 661.0 > (TID 570) (localhost executor driver): java.time.DateTimeException: Invalid > value for MonthOfYear (valid values 1 - 12): 13. If necessary set > spark.sql.ansi.enabled to false to bypass this error. > {quote} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC
[ https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504374#comment-17504374 ] Ahmed Hussein edited comment on SPARK-34960 at 3/10/22, 3:57 PM: - Thanks [~chengsu] for putting up the optimization on pushed aggregates. I am concerned that the changes introduced in this jira lead to inconsistent behavior in the following scenario: * Assume an ORC file with empty column statistics ([^file_no_stats-orc.tar.gz]). * Run a read job as {{spark.read.orc(path).selectExpr('count(p)')}} with the default configuration. This will be fine. * Now, enable {{'spark.sql.orc.aggregatePushdown': 'true'}} and re-run. There will be an exception because the new code assumes that an ORC file must have file statistics. In other words, enabling {{spark.sql.orc.aggregatePushdown}} will cause read jobs to fail on any ORC file with empty statistics. This is going to be problematic for users because they would have to identify all such ORC files or risk failing their jobs at runtime. Note that according to the [ORC-specs|https://orc.apache.org/specification], the statistics are optional even for the future ORCv2. I second [~tgraves] that there should be a way to recover safely if those fields are missing. was (Author: ahussein): Thanks [~chengsu] for putting up the optimization on pushed aggregates. I am concerned that the changes introduced in this jira lead to inconsistent behavior in the following scenario: * Assume an ORC file with empty column statistics (no_col_stats.orc). * Run a read job as {{spark.read.orc(path).selectExpr('count(p)')}} with the default configuration. This will be fine. * Now, enable {{'spark.sql.orc.aggregatePushdown': 'true'}} and re-run. There will be an exception because the new code assumes that an ORC file must have file statistics. In other words, enabling {{spark.sql.orc.aggregatePushdown}} will cause read jobs to fail on any ORC file with empty statistics. 
This is going to be problematic for users because they would have to identify all such ORC files or risk failing their jobs at runtime. Note that according to the [ORC-specs|https://orc.apache.org/specification], the statistics are optional even for the future ORCv2. I second [~tgraves] that there should be a way to recover safely if those fields are missing. > Aggregate (Min/Max/Count) push down for ORC > --- > > Key: SPARK-34960 > URL: https://issues.apache.org/jira/browse/SPARK-34960 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.3.0 > > Attachments: file_no_stats-orc.tar.gz > > > Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we > can also push down certain aggregations into ORC. ORC exposes column > statistics in the interface `org.apache.orc.Reader` > ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118] > ), which Spark can utilize for aggregation push down. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
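The safe-recovery behavior suggested above can be sketched in plain Python. This is illustrative only, not Spark's ORC reader; the function and argument names are invented. The idea: consult file-level statistics when present, otherwise fall back to scanning the rows instead of throwing.

```python
def aggregate_with_fallback(stats, rows, agg):
    """Answer count/min/max from file statistics when available;
    otherwise recover by scanning the rows, since the ORC spec makes
    column statistics optional."""
    if stats and agg in stats:
        return stats[agg]  # fast path: aggregate answered from metadata
    # safe fallback: the writer omitted statistics, so scan the rows
    if agg == "count":
        return len(rows)
    if agg == "min":
        return min(rows)
    if agg == "max":
        return max(rows)
    raise ValueError(f"unsupported aggregate: {agg}")
```

The fallback trades the pushed-down speedup for correctness on stats-less files, rather than failing the job at runtime.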
[jira] [Commented] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC
[ https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504374#comment-17504374 ] Ahmed Hussein commented on SPARK-34960: --- Thanks [~chengsu] for putting up the optimization on pushed aggregates. I am concerned that the changes introduced in this jira lead to inconsistent behavior in the following scenario: * Assume an ORC file with empty column statistics (no_col_stats.orc). * Run a read job as {{spark.read.orc(path).selectExpr('count(p)')}} with the default configuration. This will be fine. * Now, enable {{'spark.sql.orc.aggregatePushdown': 'true'}} and re-run. There will be an exception because the new code assumes that an ORC file must have file statistics. In other words, enabling {{spark.sql.orc.aggregatePushdown}} will cause read jobs to fail on any ORC file with empty statistics. This is going to be problematic for users because they would have to identify all such ORC files or risk failing their jobs at runtime. Note that according to the [ORC-specs|https://orc.apache.org/specification], the statistics are optional even for the future ORCv2. I second [~tgraves] that there should be a way to recover safely if those fields are missing. > Aggregate (Min/Max/Count) push down for ORC > --- > > Key: SPARK-34960 > URL: https://issues.apache.org/jira/browse/SPARK-34960 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.3.0 > > Attachments: file_no_stats-orc.tar.gz > > > Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we > can also push down certain aggregations into ORC. ORC exposes column > statistics in the interface `org.apache.orc.Reader` > ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118] > ), which Spark can utilize for aggregation push down. 
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34960) Aggregate (Min/Max/Count) push down for ORC
[ https://issues.apache.org/jira/browse/SPARK-34960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated SPARK-34960: -- Attachment: file_no_stats-orc.tar.gz > Aggregate (Min/Max/Count) push down for ORC > --- > > Key: SPARK-34960 > URL: https://issues.apache.org/jira/browse/SPARK-34960 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Cheng Su >Assignee: Cheng Su >Priority: Minor > Fix For: 3.3.0 > > Attachments: file_no_stats-orc.tar.gz > > > Similar to Parquet (https://issues.apache.org/jira/browse/SPARK-34952), we > can also push down certain aggregations into ORC. ORC exposes column > statistics in the interface `org.apache.orc.Reader` > ([https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/Reader.java#L118] > ), which Spark can utilize for aggregation push down. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38505) Make partial aggregation adaptive
[ https://issues.apache.org/jira/browse/SPARK-38505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504349#comment-17504349 ] Apache Spark commented on SPARK-38505: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/35806 > Make partial aggregation adaptive > - > > Key: SPARK-38505 > URL: https://issues.apache.org/jira/browse/SPARK-38505 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > > We can skip partial aggregation to avoid spilling if this step does not > reduce the number of rows much. > https://github.com/trinodb/trino/pull/11011 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38505) Make partial aggregation adaptive
[ https://issues.apache.org/jira/browse/SPARK-38505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38505: Assignee: Apache Spark > Make partial aggregation adaptive > - > > Key: SPARK-38505 > URL: https://issues.apache.org/jira/browse/SPARK-38505 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > We can skip partial aggregation to avoid spilling if this step does not > reduce the number of rows much. > https://github.com/trinodb/trino/pull/11011 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38505) Make partial aggregation adaptive
[ https://issues.apache.org/jira/browse/SPARK-38505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-38505: Assignee: (was: Apache Spark) > Make partial aggregation adaptive > - > > Key: SPARK-38505 > URL: https://issues.apache.org/jira/browse/SPARK-38505 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > > We can skip partial aggregation to avoid spilling if this step does not > reduce the number of rows much. > https://github.com/trinodb/trino/pull/11011 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38505) Make partial aggregation adaptive
[ https://issues.apache.org/jira/browse/SPARK-38505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504350#comment-17504350 ] Apache Spark commented on SPARK-38505: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/35806 > Make partial aggregation adaptive > - > > Key: SPARK-38505 > URL: https://issues.apache.org/jira/browse/SPARK-38505 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > > We can skip partial aggregation to avoid spilling if this step does not > reduce the number of rows much. > https://github.com/trinodb/trino/pull/11011 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38506) Push partial aggregation through join
Yuming Wang created SPARK-38506: --- Summary: Push partial aggregation through join Key: SPARK-38506 URL: https://issues.apache.org/jira/browse/SPARK-38506 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yuming Wang Please see https://docs.teradata.com/r/Teradata-VantageTM-SQL-Request-and-Transaction-Processing/March-2019/Join-Planning-and-Optimization/Partial-GROUP-BY-Block-Optimization for more details. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
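The rewrite referenced above (Teradata's Partial GROUP BY block optimization) can be illustrated with a toy plain-Python model, assuming the simplest case of COUNT over an inner equi-join; the function names here are invented for illustration. Pre-aggregating each side to (key, count) before the join shrinks the join input, and the partial counts of matching keys multiply:

```python
from collections import Counter


def count_join_then_group(left_keys, right_keys):
    # Baseline plan: perform the join first, then GROUP BY key with COUNT(*).
    joined = [lk for lk in left_keys for rk in right_keys if lk == rk]
    return Counter(joined)


def count_group_pushed_below(left_keys, right_keys):
    # Partial GROUP BY pushed below the join: each side is pre-aggregated
    # to (key, count); the join then runs on the shrunken inputs, and the
    # partial counts of matching keys are multiplied.
    lc, rc = Counter(left_keys), Counter(right_keys)
    return Counter({k: lc[k] * rc[k] for k in lc if k in rc})
```

Both plans produce the same counts, but the pushed-down variant joins at most one row per distinct key per side, which is the payoff the optimization targets.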
[jira] [Commented] (SPARK-37735) Add appId interface to KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504336#comment-17504336 ] Apache Spark commented on SPARK-37735: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/35804 > Add appId interface to KubernetesConf > - > > Key: SPARK-37735 > URL: https://issues.apache.org/jira/browse/SPARK-37735 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > > The appId can currently only be accessed in KubernetesDriverConf and > KubernetesExecutorConf, but cannot be accessed in KubernetesConf. > > Some user feature steps use KubernetesConf as an init constructor parameter > in order to share the feature step between driver and executor. So, we'd > better add appId to KubernetesConf so that such feature steps can access the appId. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37735) Add appId interface to KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504337#comment-17504337 ] Apache Spark commented on SPARK-37735: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/35804 > Add appId interface to KubernetesConf > - > > Key: SPARK-37735 > URL: https://issues.apache.org/jira/browse/SPARK-37735 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > > The appId can currently only be accessed in KubernetesDriverConf and > KubernetesExecutorConf, but cannot be accessed in KubernetesConf. > > Some user feature steps use KubernetesConf as an init constructor parameter > in order to share the feature step between driver and executor. So, we'd > better add appId to KubernetesConf so that such feature steps can access the appId. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38505) Make partial aggregation adaptive
Yuming Wang created SPARK-38505: --- Summary: Make partial aggregation adaptive Key: SPARK-38505 URL: https://issues.apache.org/jira/browse/SPARK-38505 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yuming Wang We can skip partial aggregation to avoid spilling if this step does not reduce the number of rows much. https://github.com/trinodb/trino/pull/11011 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
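The idea in the issue description above can be sketched as a toy plain-Python partial aggregator. This is illustrative only (Spark and Trino implement this at the operator level; the name and thresholds here are invented): inspect a sample of the input, and if grouping would barely reduce the row count, emit rows unaggregated for the final aggregation to handle, instead of building a hash table that may spill.

```python
from collections import Counter


def adaptive_partial_count(rows, sample_size=1000, max_unique_ratio=0.5):
    # Decide from a prefix sample whether partial aggregation pays off.
    sample = rows[:sample_size]
    if sample and len(set(sample)) / len(sample) > max_unique_ratio:
        # High cardinality: partial aggregation would hardly shrink the
        # data, so pass rows through as (key, 1) partial results.
        return [(key, 1) for key in rows]
    # Low cardinality: normal partial aggregation, one row per group.
    return list(Counter(rows).items())
```

Either output shape is a valid set of partial results: the final (merging) aggregation sums the counts per key, so skipping the partial step changes cost, not correctness.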
[jira] [Commented] (SPARK-37735) Add appId interface to KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504335#comment-17504335 ] Apache Spark commented on SPARK-37735: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/35804 > Add appId interface to KubernetesConf > - > > Key: SPARK-37735 > URL: https://issues.apache.org/jira/browse/SPARK-37735 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > > The appId can currently only be accessed in KubernetesDriverConf and > KubernetesExecutorConf, but cannot be accessed in KubernetesConf. > > Some user feature steps use KubernetesConf as an init constructor parameter > in order to share the feature step between driver and executor. So, we'd > better add appId to KubernetesConf so that such feature steps can access the appId. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes
[ https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504334#comment-17504334 ] Apache Spark commented on SPARK-38379: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/35804 > Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes > -- > > Key: SPARK-38379 > URL: https://issues.apache.org/jira/browse/SPARK-38379 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.3.0 > > > I'm using Spark 3.2.1 on a Kubernetes cluster and starting a spark-shell in > client mode. I'm using persistent local volumes to mount NVMe under /data in > the executors, and on startup the driver always throws the warning below when > using these options: > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false > > > {code:java} > 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. 
> java.util.NoSuchElementException: spark.app.id > at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.SparkConf.get(SparkConf.scala:245) > at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:91) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391) > at 
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117
[jira] [Commented] (SPARK-38379) Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes
[ https://issues.apache.org/jira/browse/SPARK-38379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504333#comment-17504333 ] Apache Spark commented on SPARK-38379: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/35804 > Kubernetes: NoSuchElementException: spark.app.id when using PersistentVolumes > -- > > Key: SPARK-38379 > URL: https://issues.apache.org/jira/browse/SPARK-38379 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.3.0 > > > I'm using Spark 3.2.1 on a Kubernetes cluster and starting a spark-shell in > client mode. I'm using persistent local volumes to mount NVMe under /data in > the executors, and on startup the driver always throws the warning below when > using these options: > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=OnDemand > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.storageClass=fast-disks > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.sizeLimit=500Gi > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data > \ > --conf > spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false > > > {code:java} > 22/03/01 20:21:22 WARN ExecutorPodsSnapshotsStoreImpl: Exception when > notifying snapshot subscriber. 
> java.util.NoSuchElementException: spark.app.id > at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.SparkConf.get(SparkConf.scala:245) > at org.apache.spark.SparkConf.getAppId(SparkConf.scala:450) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.$anonfun$constructVolumes$4(MountVolumesFeatureStep.scala:88) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.constructVolumes(MountVolumesFeatureStep.scala:57) > at > org.apache.spark.deploy.k8s.features.MountVolumesFeatureStep.configurePod(MountVolumesFeatureStep.scala:34) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.$anonfun$buildFromFeatures$4(KubernetesExecutorBuilder.scala:64) > at > scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126) > at > scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122) > at scala.collection.immutable.List.foldLeft(List.scala:91) > at > org.apache.spark.scheduler.cluster.k8s.KubernetesExecutorBuilder.buildFromFeatures(KubernetesExecutorBuilder.scala:63) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:391) > at 
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:382) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:346) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:339) > at > scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at > scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:339) > at > org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:117
[jira] [Commented] (SPARK-38330) Certificate doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]
[ https://issues.apache.org/jira/browse/SPARK-38330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504286#comment-17504286 ] Steve Loughran commented on SPARK-38330: this is a Hadoop issue - create a JIRA there and link it as a cause of this one. # the AWS SDK bundle jar has its own httpclient, so upgrading that may fix it # recent Hadoop releases also let you switch to OpenSSL if it is on the system, so it handles the certs instead > Certificate doesn't match any of the subject alternative names: > [*.s3.amazonaws.com, s3.amazonaws.com] > -- > > Key: SPARK-38330 > URL: https://issues.apache.org/jira/browse/SPARK-38330 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 3.2.1 > Environment: Spark 3.2.1 built with `hadoop-cloud` flag. > Direct access to s3 using default file committer. > JDK8. > >Reporter: André F. >Priority: Major > > Trying to run any job after bumping our Spark version from 3.1.2 to 3.2.1 > led us to the following exception while reading files on s3: > {code:java} > org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on > s3a:///.parquet: com.amazonaws.SdkClientException: Unable to > execute HTTP request: Certificate for doesn't match > any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]: > Unable to execute HTTP request: Certificate for doesn't match any of > the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com] at > org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:208) at > org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170) at > org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3351) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277) > at > org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54) > at > 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245) > at scala.Option.getOrElse(Option.scala:189) at > org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245) at > org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:596) {code} > > {code:java} > Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for > doesn't match any of the subject alternative names: > [*.s3.amazonaws.com, s3.amazonaws.com] > at > com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:507) > at > com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:437) > at > com.amazonaws.thirdparty.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384) > at > com.amazonaws.thirdparty.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142) > at > com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376) > at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76) > at com.amazonaws.http.conn.$Proxy16.connect(Unknown Source) > at > com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393) > at > com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236) > at > 
com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) > at > com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) > at > com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) > at > com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) > at > com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) >