[jira] [Created] (SPARK-48631) Fix test case "error during accessing host local dirs for executors"
Bo Zhang created SPARK-48631:
---------------------------------

             Summary: Fix test case "error during accessing host local dirs for executors"
                 Key: SPARK-48631
                 URL: https://issues.apache.org/jira/browse/SPARK-48631
             Project: Spark
          Issue Type: Test
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Bo Zhang


There is a logical error in the test case "error during accessing host local dirs for executors" in ShuffleBlockFetcherIteratorSuite. It tries to test fetching host-local blocks, but the host-local BlockManagerId is configured incorrectly, so ShuffleBlockFetcherIterator treats those blocks as remote blocks instead.
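For illustration, a minimal sketch of the classification the test depends on: a block can only be treated as host-local when its BlockManagerId carries the same host as the local block manager, otherwise it is fetched as a remote block. The host names, executor IDs and port below are hypothetical.

{code:scala}
import org.apache.spark.storage.BlockManagerId

// Hypothetical values: same host as the local block manager -> host-local,
// different host -> remote.
val localHost   = "host-a"
val hostLocalId = BlockManagerId("exec-1", localHost, 7337)
val remoteId    = BlockManagerId("exec-2", "host-b", 7337)

def wouldBeHostLocal(id: BlockManagerId): Boolean = id.host == localHost
assert(wouldBeHostLocal(hostLocalId) && !wouldBeHostLocal(remoteId))
{code}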
[jira] [Created] (SPARK-48325) Always specify messages in ExecutorRunner.killProcess
Bo Zhang created SPARK-48325:
---------------------------------

             Summary: Always specify messages in ExecutorRunner.killProcess
                 Key: SPARK-48325
                 URL: https://issues.apache.org/jira/browse/SPARK-48325
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Bo Zhang


In some of the cases in ExecutorRunner.killProcess, the argument `message` is `None`. We should always specify the message so that we can track the occurrence rate of the different cases when analyzing executor stability.
[jira] [Created] (SPARK-47764) Cleanup shuffle dependencies for Spark Connect SQL executions
Bo Zhang created SPARK-47764:
---------------------------------

             Summary: Cleanup shuffle dependencies for Spark Connect SQL executions
                 Key: SPARK-47764
                 URL: https://issues.apache.org/jira/browse/SPARK-47764
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
    Affects Versions: 4.0.0
            Reporter: Bo Zhang


Shuffle dependencies are created by shuffle map stages and consist of files on disk plus the corresponding references in Spark JVM heap memory. Currently Spark cleans up unused shuffle dependencies through JVM GC, with a periodic GC triggered once every 30 minutes (see ContextCleaner). However, we still found cases in which the shuffle data files grow too large, which makes shuffle data migration slow.

We do have chances to clean up shuffle dependencies, especially for SQL queries created by Spark Connect, since we have better control of the DataFrame instances there. Even if DataFrame instances are reused on the client side, the instances are still recreated on the server side.

We might also provide the options to 1. clean up eagerly after each query execution, or 2. only mark such shuffle dependencies and not migrate them at node decommissions.
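For context, the 30-minute periodic GC that ContextCleaner relies on is configurable via spark.cleaner.periodicGC.interval. A minimal sketch of tightening it as a stopgap (the 5-minute value is arbitrary); this is not the eager cleanup the ticket proposes:

{code:scala}
import org.apache.spark.sql.SparkSession

// ContextCleaner triggers a periodic System.gc() controlled by
// spark.cleaner.periodicGC.interval (default: 30min). Lowering it makes
// unreferenced shuffle dependencies eligible for cleanup sooner, at the
// cost of more frequent full GCs.
val spark = SparkSession.builder()
  .appName("shuffle-cleanup-sketch")
  .config("spark.cleaner.periodicGC.interval", "5min")
  .getOrCreate()
{code}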
[jira] [Updated] (SPARK-44635) Handle shuffle fetch failures in decommissions
     [ https://issues.apache.org/jira/browse/SPARK-44635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang updated SPARK-44635:
-----------------------------
    Description: 
Spark's decommission feature supports migration of shuffle data. However, the shuffle data fetcher only looks at the block locations (`BlockManagerId`) known when it is initialized. This can lead to shuffle fetch failures when the shuffle read tasks are long-running.

To mitigate this, shuffle data fetchers should be able to look up the updated locations after decommissions and fetch from there instead.

  was: Spark's decommission feature supports migration of shuffle data. However, the shuffle data fetcher only looks at the block locations (`BlockManagerId`) known when it is initialized. This can lead to shuffle fetch failures when the shuffle read tasks are long-running.


> Handle shuffle fetch failures in decommissions
> ----------------------------------------------
>
>                 Key: SPARK-44635
>                 URL: https://issues.apache.org/jira/browse/SPARK-44635
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 4.0.0
>            Reporter: Bo Zhang
>            Priority: Major
>
> Spark's decommission feature supports migration of shuffle data. However, the shuffle data fetcher only looks at the block locations (`BlockManagerId`) known when it is initialized. This can lead to shuffle fetch failures when the shuffle read tasks are long-running.
>
> To mitigate this, shuffle data fetchers should be able to look up the updated locations after decommissions and fetch from there instead.
[jira] [Created] (SPARK-44635) Handle shuffle fetch failures in decommissions
Bo Zhang created SPARK-44635:
---------------------------------

             Summary: Handle shuffle fetch failures in decommissions
                 Key: SPARK-44635
                 URL: https://issues.apache.org/jira/browse/SPARK-44635
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Bo Zhang


Spark's decommission feature supports migration of shuffle data. However, the shuffle data fetcher only looks at the block locations (`BlockManagerId`) known when it is initialized. This can lead to shuffle fetch failures when the shuffle read tasks are long-running.
[jira] [Updated] (SPARK-38476) Use error classes in org.apache.spark.storage
     [ https://issues.apache.org/jira/browse/SPARK-38476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang updated SPARK-38476:
-----------------------------
    Summary: Use error classes in org.apache.spark.storage  (was: Use error classes in org.apache.spark.shuffle)


> Use error classes in org.apache.spark.storage
> ---------------------------------------------
>
>                 Key: SPARK-38476
>                 URL: https://issues.apache.org/jira/browse/SPARK-38476
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>
[jira] [Updated] (SPARK-38477) Use error classes in org.apache.spark.shuffle
     [ https://issues.apache.org/jira/browse/SPARK-38477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang updated SPARK-38477:
-----------------------------
    Summary: Use error classes in org.apache.spark.shuffle  (was: Use error classes in org.apache.spark.storage)


> Use error classes in org.apache.spark.shuffle
> ---------------------------------------------
>
>                 Key: SPARK-38477
>                 URL: https://issues.apache.org/jira/browse/SPARK-38477
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Assignee: Bo Zhang
>            Priority: Major
>             Fix For: 3.5.0
>
[jira] [Commented] (SPARK-38471) Use error classes in org.apache.spark.rdd
     [ https://issues.apache.org/jira/browse/SPARK-38471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720380#comment-17720380 ]

Bo Zhang commented on SPARK-38471:
----------------------------------

Hi [~blindcat], thanks for working on this! I don't have permission to assign this ticket to you; maybe we need to ask a committer for that.

For the code you posted in the link, I think we might not need to migrate that to an error class, since it is a catch-and-rethrow rather than the construction of a new exception.


> Use error classes in org.apache.spark.rdd
> ------------------------------------------
>
>                 Key: SPARK-38471
>                 URL: https://issues.apache.org/jira/browse/SPARK-38471
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>
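A generic illustration (not the Spark code referenced above) of the distinction drawn in this comment: a catch-and-rethrow propagates an existing exception unchanged, so there is no new error message to migrate to an error class; only newly constructed exceptions need one.

{code:scala}
import java.io.IOException

def readWithCleanup(read: () => String, cleanup: () => Unit): String =
  try {
    read()
  } catch {
    case e: IOException =>
      cleanup()
      throw e // re-thrown as-is: no new message, hence no error class needed
  }
{code}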
[jira] [Created] (SPARK-43309) Extend INTERNAL_ERROR with category
Bo Zhang created SPARK-43309:
---------------------------------

             Summary: Extend INTERNAL_ERROR with category
                 Key: SPARK-43309
                 URL: https://issues.apache.org/jira/browse/SPARK-43309
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Bo Zhang


This is to extend INTERNAL_ERROR with different categories / areas / modules (e.g. INTERNAL_ERROR_BROADCAST) so that we can better differentiate them.
[jira] [Commented] (SPARK-38478) Use error classes in org.apache.spark.ui
     [ https://issues.apache.org/jira/browse/SPARK-38478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707610#comment-17707610 ]

Bo Zhang commented on SPARK-38478:
----------------------------------

Thanks! [~Wencong Liu], please feel free to submit a PR for this.


> Use error classes in org.apache.spark.ui
> -----------------------------------------
>
>                 Key: SPARK-38478
>                 URL: https://issues.apache.org/jira/browse/SPARK-38478
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>
[jira] [Updated] (SPARK-42602) Provide more details in TaskEndReasons for tasks killed by TaskScheduler.cancelTasks
     [ https://issues.apache.org/jira/browse/SPARK-42602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang updated SPARK-42602:
-----------------------------
    Summary: Provide more details in TaskEndReasons for tasks killed by TaskScheduler.cancelTasks  (was: Add reason string as an argument to TaskScheduler.cancelTasks)


> Provide more details in TaskEndReasons for tasks killed by TaskScheduler.cancelTasks
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-42602
>                 URL: https://issues.apache.org/jira/browse/SPARK-42602
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.0
>            Reporter: Bo Zhang
>            Priority: Major
>
> Currently tasks killed by `TaskScheduler.cancelTasks` get a `TaskEndReason` of "TaskKilled (Stage cancelled)". We should do better at differentiating the reasons for stage cancellations (e.g. user-initiated, or caused by task failures in the stage).
[jira] [Created] (SPARK-42602) Add reason string as an argument to TaskScheduler.cancelTasks
Bo Zhang created SPARK-42602:
---------------------------------

             Summary: Add reason string as an argument to TaskScheduler.cancelTasks
                 Key: SPARK-42602
                 URL: https://issues.apache.org/jira/browse/SPARK-42602
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.5.0
            Reporter: Bo Zhang


Currently tasks killed by `TaskScheduler.cancelTasks` get a `TaskEndReason` of "TaskKilled (Stage cancelled)". We should do better at differentiating the reasons for stage cancellations (e.g. user-initiated, or caused by task failures in the stage).
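For illustration, a minimal sketch of the direction suggested here; the reason strings are hypothetical, only the TaskKilled end reason and its toErrorString rendering come from Spark:

{code:scala}
import org.apache.spark.TaskKilled

// Threading a caller-supplied reason through to the TaskEndReason would let
// the two cancellation causes be told apart in logs and listener events.
val byUser     = TaskKilled("Stage cancelled: cancelled by user")
val byFailures = TaskKilled("Stage cancelled: task failures in stage")

assert(byUser.toErrorString != byFailures.toErrorString)
{code}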
[jira] [Created] (SPARK-42358) Provide more details in ExecutorUpdated sent in Master.removeWorker
Bo Zhang created SPARK-42358:
---------------------------------

             Summary: Provide more details in ExecutorUpdated sent in Master.removeWorker
                 Key: SPARK-42358
                 URL: https://issues.apache.org/jira/browse/SPARK-42358
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.3.1
            Reporter: Bo Zhang


Currently the field `message` in the `ExecutorUpdated` sent in Master.removeWorker is always `Some("worker lost")`. We should provide more information in the message to better differentiate the causes of the worker removal.
[jira] [Created] (SPARK-41463) Ensure error class (and subclass) names contain only capital letters, numbers and underscores
Bo Zhang created SPARK-41463:
---------------------------------

             Summary: Ensure error class (and subclass) names contain only capital letters, numbers and underscores
                 Key: SPARK-41463
                 URL: https://issues.apache.org/jira/browse/SPARK-41463
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Bo Zhang


We should add a unit test to ensure that error class (and subclass) names contain only capital letters, numbers and underscores.
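A minimal sketch of such a check; `errorClassNames` and the helper name stand in for however the actual test would enumerate the names from the error-class definitions:

{code:scala}
// Hypothetical helper: validates that every name uses only capital letters,
// numbers and underscores, as the ticket requires.
def checkErrorClassNames(errorClassNames: Seq[String]): Unit =
  errorClassNames.foreach { name =>
    assert(name.matches("[A-Z0-9_]+"), s"Invalid error class name: $name")
  }

checkErrorClassNames(Seq("INTERNAL_ERROR", "DIVIDE_BY_ZERO")) // passes
// checkErrorClassNames(Seq("internalError"))                 // would fail
{code}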
[jira] [Closed] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
     [ https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang closed SPARK-41099.
----------------------------

> Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-41099
>                 URL: https://issues.apache.org/jira/browse/SPARK-41099
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bo Zhang
>            Priority: Major
>
> This is similar to https://issues.apache.org/jira/browse/SPARK-40488.
> Exceptions thrown in SparkHadoopWriter.write are wrapped with SparkException("Job aborted."). This wrapping provides little extra information but generates a long stack trace, which hinders debugging when errors happen.
[jira] [Comment Edited] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
     [ https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17632048#comment-17632048 ]

Bo Zhang edited comment on SPARK-41099 at 11/11/22 3:08 AM:
------------------------------------------------------------

To keep the exceptions exposed to users who use the RDD APIs, we will not change this. See https://github.com/apache/spark/pull/38602#issuecomment-1310755154

was (Author: bozhang):
To keep the exceptions exposed to users who use the RDD APIs, we will not change this.


> Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-41099
>                 URL: https://issues.apache.org/jira/browse/SPARK-41099
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bo Zhang
>            Priority: Major
>
> This is similar to https://issues.apache.org/jira/browse/SPARK-40488.
> Exceptions thrown in SparkHadoopWriter.write are wrapped with SparkException("Job aborted."). This wrapping provides little extra information but generates a long stack trace, which hinders debugging when errors happen.
[jira] [Resolved] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
     [ https://issues.apache.org/jira/browse/SPARK-41099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang resolved SPARK-41099.
------------------------------
    Resolution: Won't Fix

To keep the exceptions exposed to users who use the RDD APIs, we will not change this.

> Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-41099
>                 URL: https://issues.apache.org/jira/browse/SPARK-41099
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bo Zhang
>            Priority: Major
>
> This is similar to https://issues.apache.org/jira/browse/SPARK-40488.
> Exceptions thrown in SparkHadoopWriter.write are wrapped with SparkException("Job aborted."). This wrapping provides little extra information but generates a long stack trace, which hinders debugging when errors happen.
[jira] [Created] (SPARK-41099) Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
Bo Zhang created SPARK-41099:
---------------------------------

             Summary: Do not wrap exceptions thrown in SparkHadoopWriter.write with SparkException
                 Key: SPARK-41099
                 URL: https://issues.apache.org/jira/browse/SPARK-41099
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Bo Zhang


This is similar to https://issues.apache.org/jira/browse/SPARK-40488.

Exceptions thrown in SparkHadoopWriter.write are wrapped with SparkException("Job aborted."). This wrapping provides little extra information but generates a long stack trace, which hinders debugging when errors happen.
[jira] [Created] (SPARK-40596) Populate ExecutorDecommission with more informative messages
Bo Zhang created SPARK-40596:
---------------------------------

             Summary: Populate ExecutorDecommission with more informative messages
                 Key: SPARK-40596
                 URL: https://issues.apache.org/jira/browse/SPARK-40596
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Bo Zhang


Currently the message in {{ExecutorDecommission}} is a fixed value, {{"Executor decommission."}}, and it is the same for all cases, including spot instance interruptions and auto-scaling down. We should put a detailed message in {{ExecutorDecommission}} to better differentiate those cases.
[jira] [Created] (SPARK-40488) Do not wrap exceptions thrown in FileFormatWriter.write with SparkException
Bo Zhang created SPARK-40488:
---------------------------------

             Summary: Do not wrap exceptions thrown in FileFormatWriter.write with SparkException
                 Key: SPARK-40488
                 URL: https://issues.apache.org/jira/browse/SPARK-40488
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Bo Zhang


Exceptions thrown in FileFormatWriter.write are wrapped with SparkException("Job aborted."). This wrapping provides little extra information but generates a long stack trace, which hinders debugging when errors happen.
[jira] [Commented] (SPARK-38468) Use error classes in org.apache.spark.metrics
     [ https://issues.apache.org/jira/browse/SPARK-38468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506925#comment-17506925 ]

Bo Zhang commented on SPARK-38468:
----------------------------------

Yes, this is duplicated. Please close this one. Thanks [~Ngone51]!


> Use error classes in org.apache.spark.metrics
> ---------------------------------------------
>
>                 Key: SPARK-38468
>                 URL: https://issues.apache.org/jira/browse/SPARK-38468
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>
[jira] [Resolved] (SPARK-38468) Use error classes in org.apache.spark.metrics
     [ https://issues.apache.org/jira/browse/SPARK-38468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang resolved SPARK-38468.
------------------------------
    Resolution: Duplicate

> Use error classes in org.apache.spark.metrics
> ---------------------------------------------
>
>                 Key: SPARK-38468
>                 URL: https://issues.apache.org/jira/browse/SPARK-38468
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>
[jira] [Comment Edited] (SPARK-38468) Use error classes in org.apache.spark.metrics
     [ https://issues.apache.org/jira/browse/SPARK-38468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506925#comment-17506925 ]

Bo Zhang edited comment on SPARK-38468 at 3/15/22, 2:12 PM:
------------------------------------------------------------

Yes, this is duplicated. Thanks [~Ngone51]!

was (Author: bozhang):
Yes, this is duplicated. Please close this one. Thanks [~Ngone51]!


> Use error classes in org.apache.spark.metrics
> ---------------------------------------------
>
>                 Key: SPARK-38468
>                 URL: https://issues.apache.org/jira/browse/SPARK-38468
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>
[jira] [Created] (SPARK-38477) Use error classes in org.apache.spark.storage
Bo Zhang created SPARK-38477:
---------------------------------

             Summary: Use error classes in org.apache.spark.storage
                 Key: SPARK-38477
                 URL: https://issues.apache.org/jira/browse/SPARK-38477
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38474) Use error classes in org.apache.spark.security
Bo Zhang created SPARK-38474:
---------------------------------

             Summary: Use error classes in org.apache.spark.security
                 Key: SPARK-38474
                 URL: https://issues.apache.org/jira/browse/SPARK-38474
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38475) Use error classes in org.apache.spark.serializer
Bo Zhang created SPARK-38475:
---------------------------------

             Summary: Use error classes in org.apache.spark.serializer
                 Key: SPARK-38475
                 URL: https://issues.apache.org/jira/browse/SPARK-38475
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38476) Use error classes in org.apache.spark.shuffle
Bo Zhang created SPARK-38476:
---------------------------------

             Summary: Use error classes in org.apache.spark.shuffle
                 Key: SPARK-38476
                 URL: https://issues.apache.org/jira/browse/SPARK-38476
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38478) Use error classes in org.apache.spark.ui
Bo Zhang created SPARK-38478:
---------------------------------

             Summary: Use error classes in org.apache.spark.ui
                 Key: SPARK-38478
                 URL: https://issues.apache.org/jira/browse/SPARK-38478
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38473) Use error classes in org.apache.spark.scheduler
Bo Zhang created SPARK-38473:
---------------------------------

             Summary: Use error classes in org.apache.spark.scheduler
                 Key: SPARK-38473
                 URL: https://issues.apache.org/jira/browse/SPARK-38473
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38472) Use error classes in org.apache.spark.rpc
Bo Zhang created SPARK-38472:
---------------------------------

             Summary: Use error classes in org.apache.spark.rpc
                 Key: SPARK-38472
                 URL: https://issues.apache.org/jira/browse/SPARK-38472
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38471) Use error classes in org.apache.spark.rdd
Bo Zhang created SPARK-38471:
---------------------------------

             Summary: Use error classes in org.apache.spark.rdd
                 Key: SPARK-38471
                 URL: https://issues.apache.org/jira/browse/SPARK-38471
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38470) Use error classes in org.apache.spark.partial
Bo Zhang created SPARK-38470:
---------------------------------

             Summary: Use error classes in org.apache.spark.partial
                 Key: SPARK-38470
                 URL: https://issues.apache.org/jira/browse/SPARK-38470
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38467) Use error classes in org.apache.spark.memory
Bo Zhang created SPARK-38467:
---------------------------------

             Summary: Use error classes in org.apache.spark.memory
                 Key: SPARK-38467
                 URL: https://issues.apache.org/jira/browse/SPARK-38467
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38468) Use error classes in org.apache.spark.metrics
Bo Zhang created SPARK-38468:
---------------------------------

             Summary: Use error classes in org.apache.spark.metrics
                 Key: SPARK-38468
                 URL: https://issues.apache.org/jira/browse/SPARK-38468
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38469) Use error classes in org.apache.spark.network
Bo Zhang created SPARK-38469:
---------------------------------

             Summary: Use error classes in org.apache.spark.network
                 Key: SPARK-38469
                 URL: https://issues.apache.org/jira/browse/SPARK-38469
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38465) Use error classes in org.apache.spark.launcher
Bo Zhang created SPARK-38465:
---------------------------------

             Summary: Use error classes in org.apache.spark.launcher
                 Key: SPARK-38465
                 URL: https://issues.apache.org/jira/browse/SPARK-38465
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38466) Use error classes in org.apache.spark.mapred
Bo Zhang created SPARK-38466:
---------------------------------

             Summary: Use error classes in org.apache.spark.mapred
                 Key: SPARK-38466
                 URL: https://issues.apache.org/jira/browse/SPARK-38466
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38464) Use error classes in org.apache.spark.io
Bo Zhang created SPARK-38464:
---------------------------------

             Summary: Use error classes in org.apache.spark.io
                 Key: SPARK-38464
                 URL: https://issues.apache.org/jira/browse/SPARK-38464
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38463) Use error classes in org.apache.spark.input
Bo Zhang created SPARK-38463:
---------------------------------

             Summary: Use error classes in org.apache.spark.input
                 Key: SPARK-38463
                 URL: https://issues.apache.org/jira/browse/SPARK-38463
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38462) Use error classes in org.apache.spark.executor
Bo Zhang created SPARK-38462:
---------------------------------

             Summary: Use error classes in org.apache.spark.executor
                 Key: SPARK-38462
                 URL: https://issues.apache.org/jira/browse/SPARK-38462
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38461) Use error classes in org.apache.spark.broadcast
Bo Zhang created SPARK-38461:
---------------------------------

             Summary: Use error classes in org.apache.spark.broadcast
                 Key: SPARK-38461
                 URL: https://issues.apache.org/jira/browse/SPARK-38461
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Updated] (SPARK-38312) Use error classes in org.apache.spark.metrics
     [ https://issues.apache.org/jira/browse/SPARK-38312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang updated SPARK-38312:
-----------------------------
    Summary: Use error classes in org.apache.spark.metrics  (was: Use error classes in spark-core)


> Use error classes in org.apache.spark.metrics
> ---------------------------------------------
>
>                 Key: SPARK-38312
>                 URL: https://issues.apache.org/jira/browse/SPARK-38312
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Assignee: Bo Zhang
>            Priority: Major
>             Fix For: 3.3.0
>
[jira] [Created] (SPARK-38312) Use error classes in spark-core
Bo Zhang created SPARK-38312:
---------------------------------

             Summary: Use error classes in spark-core
                 Key: SPARK-38312
                 URL: https://issues.apache.org/jira/browse/SPARK-38312
             Project: Spark
          Issue Type: Sub-task
          Components: Spark Core
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
[jira] [Created] (SPARK-38236) Absolute file paths specified in create/alter table are treated as relative
Bo Zhang created SPARK-38236:
---------------------------------

             Summary: Absolute file paths specified in create/alter table are treated as relative
                 Key: SPARK-38236
                 URL: https://issues.apache.org/jira/browse/SPARK-38236
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.1, 3.2.0, 3.1.2, 3.1.1
            Reporter: Bo Zhang


After https://github.com/apache/spark/pull/28527 we changed to creating tables under the database location when the specified table location is relative. However, the criterion used to determine whether a table location is relative or absolute is URI.isAbsolute, which basically checks whether the table location URI has a scheme defined. So table URIs like /table/path are treated as relative, and the scheme and authority of the database location URI are used to create the table. For example, when the database location URI is s3a://bucket/db, the table will be created at s3a://bucket/table/path, while it should be created under the file system defined in SessionCatalog.hadoopConf instead.

This also applies to alter table.
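The check described above is easy to demonstrate with java.net.URI directly: isAbsolute is true only when a scheme is present, so a filesystem-absolute path still counts as "relative".

{code:scala}
import java.net.URI

// No scheme -> isAbsolute is false, so Spark treats this path as "relative".
assert(!new URI("/table/path").isAbsolute)
// Scheme present -> isAbsolute is true.
assert(new URI("s3a://bucket/table/path").isAbsolute)
{code}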
[jira] [Created] (SPARK-38202) Invalid URL in SparkContext.addedJars will constantly fail Executor.run()
Bo Zhang created SPARK-38202:
---------------------------------

             Summary: Invalid URL in SparkContext.addedJars will constantly fail Executor.run()
                 Key: SPARK-38202
                 URL: https://issues.apache.org/jira/browse/SPARK-38202
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.2.1
            Reporter: Bo Zhang


When an invalid URL is used in SparkContext.addJar(), all subsequent query executions will fail, since downloading the jar is on the critical path of Executor.run(), even when the query has nothing to do with the jar.

A simple reproduction of the issue:
{code:java}
sc.addJar("http://invalid/library.jar")
(0 to 1).toDF.count
{code}
[jira] [Comment Edited] (SPARK-37626) Upgrade libthrift to 0.15.0
     [ https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458178#comment-17458178 ]

Bo Zhang edited comment on SPARK-37626 at 12/13/21, 7:26 AM:
-------------------------------------------------------------

0.16.0 has not been released yet. Could we upgrade to 0.15.0 first?

was (Author: bozhang):
0.16.0 has not been released yet. Could we upgrade to 0.15.0 first? Here is the PR for that: https://github.com/apache/spark/pull/34878


> Upgrade libthrift to 0.15.0
> ---------------------------
>
>                 Key: SPARK-37626
>                 URL: https://issues.apache.org/jira/browse/SPARK-37626
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>             Fix For: 3.3.0
>
> Upgrade libthrift to 1.15.0 in order to avoid https://nvd.nist.gov/vuln/detail/CVE-2020-13949.
[jira] [Commented] (SPARK-37626) Upgrade libthrift to 0.15.0
     [ https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458178#comment-17458178 ]

Bo Zhang commented on SPARK-37626:
----------------------------------

0.16.0 has not been released yet. Could we upgrade to 0.15.0 first? Here is the PR for that: https://github.com/apache/spark/pull/34878


> Upgrade libthrift to 0.15.0
> ---------------------------
>
>                 Key: SPARK-37626
>                 URL: https://issues.apache.org/jira/browse/SPARK-37626
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>             Fix For: 3.3.0
>
> Upgrade libthrift to 1.15.0 in order to avoid https://nvd.nist.gov/vuln/detail/CVE-2020-13949.
[jira] [Updated] (SPARK-37626) Upgrade libthrift to 0.15.0
     [ https://issues.apache.org/jira/browse/SPARK-37626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bo Zhang updated SPARK-37626:
-----------------------------
    Summary: Upgrade libthrift to 0.15.0  (was: Upgrade libthrift to 1.15.0)


> Upgrade libthrift to 0.15.0
> ---------------------------
>
>                 Key: SPARK-37626
>                 URL: https://issues.apache.org/jira/browse/SPARK-37626
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.3.0
>            Reporter: Bo Zhang
>            Priority: Major
>             Fix For: 3.3.0
>
> Upgrade libthrift to 1.15.0 in order to avoid https://nvd.nist.gov/vuln/detail/CVE-2020-13949.
[jira] [Created] (SPARK-37626) Upgrade libthrift to 1.15.0
Bo Zhang created SPARK-37626:
---------------------------------

             Summary: Upgrade libthrift to 1.15.0
                 Key: SPARK-37626
                 URL: https://issues.apache.org/jira/browse/SPARK-37626
             Project: Spark
          Issue Type: Bug
          Components: Build
    Affects Versions: 3.3.0
            Reporter: Bo Zhang
             Fix For: 3.3.0


Upgrade libthrift to 1.15.0 in order to avoid https://nvd.nist.gov/vuln/detail/CVE-2020-13949.
[jira] [Created] (SPARK-36533) Allow streaming queries with Trigger.Once to run in multiple batches
Bo Zhang created SPARK-36533:
---------------------------------

             Summary: Allow streaming queries with Trigger.Once to run in multiple batches
                 Key: SPARK-36533
                 URL: https://issues.apache.org/jira/browse/SPARK-36533
             Project: Spark
          Issue Type: New Feature
          Components: Structured Streaming
    Affects Versions: 3.2.0
            Reporter: Bo Zhang


Currently streaming queries with Trigger.Once always load all of the available data in a single batch. Because of this, the amount of data the queries can process is limited, or the Spark driver will run out of memory. We should allow streaming queries with Trigger.Once to run in multiple batches.
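A minimal sketch of the single-batch behavior described above; `df`, the format and the paths are placeholders:

{code:scala}
import org.apache.spark.sql.streaming.Trigger

// Trigger.Once processes everything available in ONE micro-batch, then stops,
// which is what makes driver memory the limiting factor for large backlogs.
val query = df.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoints/once")
  .trigger(Trigger.Once())
  .start("/tmp/output")
query.awaitTermination()
{code}

Spark 3.3 eventually addressed this with Trigger.AvailableNow, which consumes all available data in multiple batches before stopping.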
[jira] [Created] (SPARK-35457) Update ANTLR runtime version to 4.8
Bo Zhang created SPARK-35457:
---------------------------------

             Summary: Update ANTLR runtime version to 4.8
                 Key: SPARK-35457
                 URL: https://issues.apache.org/jira/browse/SPARK-35457
             Project: Spark
          Issue Type: Task
          Components: Build
    Affects Versions: 3.1.1
            Reporter: Bo Zhang


As a follow-up of SPARK-33475, this is to change the antlr4-runtime version to 4.8, which is an official release version.
[jira] [Created] (SPARK-35227) Replace Bintray with the new repository service for the spark-packages resolver in SparkSubmit
Bo Zhang created SPARK-35227:
---------------------------------

             Summary: Replace Bintray with the new repository service for the spark-packages resolver in SparkSubmit
                 Key: SPARK-35227
                 URL: https://issues.apache.org/jira/browse/SPARK-35227
             Project: Spark
          Issue Type: Task
          Components: Build
    Affects Versions: 3.1.1, 3.1.0, 3.0.2, 3.0.1, 3.0.0, 3.0.3, 3.1.2, 3.2.0
            Reporter: Bo Zhang


As Bintray is being shut down, we have set up a new repository service at repos.spark-packages.org. We need to replace Bintray with the new service for the spark-packages resolver in SparkSubmit.
[jira] [Created] (SPARK-34757) Spark submit should ignore cache for SNAPSHOT dependencies
Bo Zhang created SPARK-34757:
---------------------------------

             Summary: Spark submit should ignore cache for SNAPSHOT dependencies
                 Key: SPARK-34757
                 URL: https://issues.apache.org/jira/browse/SPARK-34757
             Project: Spark
          Issue Type: Bug
          Components: Deploy, Spark Core
    Affects Versions: 3.1.1
            Reporter: Bo Zhang


When spark-submit is executed with --packages, it will not download the dependency jars when they are already available in the cache (e.g. the Ivy cache), even when the dependencies are SNAPSHOTs.

This might block developers who work on external modules in Spark (e.g. spark-avro), since they need to remove the cache manually every time they update the code during development (which generates SNAPSHOT jars). Without knowing this, they could be left wondering why their code changes are not reflected in spark-submit executions.
[jira] [Created] (SPARK-34471) Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
Bo Zhang created SPARK-34471:
---------------------------------

             Summary: Document DataStreamReader/Writer table APIs in Structured Streaming Programming Guide
                 Key: SPARK-34471
                 URL: https://issues.apache.org/jira/browse/SPARK-34471
             Project: Spark
          Issue Type: Documentation
          Components: Documentation, Structured Streaming
    Affects Versions: 3.1.1
            Reporter: Bo Zhang


We added APIs to enable reads/writes with tables in SPARK-32885, SPARK-32896 and SPARK-33836. We need to update the Structured Streaming Programming Guide with the changes above.
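A short sketch of the table APIs in question (DataStreamReader.table and DataStreamWriter.toTable, added in Spark 3.1); the table names and checkpoint path are placeholders:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read a streaming DataFrame from a table, then write it out to another table.
val events = spark.readStream.table("source_events")
events.writeStream
  .option("checkpointLocation", "/tmp/checkpoints/events")
  .toTable("processed_events")
{code}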
[jira] [Created] (SPARK-33316) Support nullable Avro schemas for non-nullable data in Avro writing
Bo Zhang created SPARK-33316:
---------------------------------

             Summary: Support nullable Avro schemas for non-nullable data in Avro writing
                 Key: SPARK-33316
                 URL: https://issues.apache.org/jira/browse/SPARK-33316
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.1, 3.0.0, 2.4.0
            Reporter: Bo Zhang


Currently, when users try to use nullable Avro schemas for non-nullable data in Avro writing, Spark throws an IncompatibleSchemaException.

There are cases in which users do not have full control over the nullability of their data, or over the nullability of the Avro schemas they have to use. We should support nullable Avro schemas for non-nullable data in Avro writing for better usability.
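A minimal sketch of the scenario: writing a non-nullable column with a user-provided Avro schema whose field type is a nullable union, via spark-avro's avroSchema option. The record/field names, schema and output path are illustrative; this assumes a SparkSession named `spark`.

{code:scala}
import spark.implicits._

// Nullable union type for a column that the DataFrame knows is non-nullable.
val avroSchema = """{
  "type": "record",
  "name": "Event",
  "fields": [ {"name": "id", "type": ["long", "null"]} ]
}"""

Seq(1L, 2L).toDF("id") // "id" is non-nullable here
  .write
  .format("avro")
  .option("avroSchema", avroSchema)
  .save("/tmp/events_avro")
{code}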