[jira] [Resolved] (SPARK-44548) Add support for pandas DataFrame assertDataFrameEqual
[ https://issues.apache.org/jira/browse/SPARK-44548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44548.
----------------------------------
    Resolution: Fixed

Issue resolved by pull request 42158
[https://github.com/apache/spark/pull/42158]

> Add support for pandas DataFrame assertDataFrameEqual
> -----------------------------------------------------
>
>                 Key: SPARK-44548
>                 URL: https://issues.apache.org/jira/browse/SPARK-44548
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 3.5.0
>            Reporter: Amanda Liu
>            Assignee: Amanda Liu
>            Priority: Major
>             Fix For: 3.5.0, 4.0.0
>
> SPIP: https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
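The semantics this ticket extends (tolerance-aware, optionally order-insensitive row comparison) can be illustrated with a pure-Python stand-in. This is only a toy sketch of the comparison behavior; `assert_rows_equal` and its parameters mirror `pyspark.testing.assertDataFrameEqual`'s `rtol`/`atol`/`checkRowOrder` options but are not the actual PySpark implementation.

```python
import math

def assert_rows_equal(actual, expected, rtol=1e-5, atol=1e-8, check_row_order=False):
    """Toy stand-in for assertDataFrameEqual's comparison semantics:
    rows are tuples; floats compare within relative/absolute tolerance."""
    if not check_row_order:
        # Like the real helper's default, compare rows irrespective of order.
        actual, expected = sorted(actual), sorted(expected)
    assert len(actual) == len(expected), "row count differs"
    for a_row, e_row in zip(actual, expected):
        for a, e in zip(a_row, e_row):
            if isinstance(a, float) and isinstance(e, float):
                assert math.isclose(a, e, rel_tol=rtol, abs_tol=atol), (a, e)
            else:
                assert a == e, (a, e)

# Unordered rows with a tiny float difference still compare equal.
assert_rows_equal([(1, 2.0000001), (2, 3.0)], [(2, 3.0), (1, 2.0)])
```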
[jira] [Created] (SPARK-44580) RocksDB crashed when testing in GitHub Actions
Yang Jie created SPARK-44580:
--------------------------------
             Summary: RocksDB crashed when testing in GitHub Actions
                 Key: SPARK-44580
                 URL: https://issues.apache.org/jira/browse/SPARK-44580
             Project: Spark
          Issue Type: Bug
          Components: SQL, Tests
    Affects Versions: 3.5.0, 4.0.0
            Reporter: Yang Jie

[https://github.com/LuciferYang/spark/actions/runs/5666554831/job/15395578871]

{code:java}
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7f8a077d2743, pid=4403, tid=0x7f89cadff640
#
# JRE version: OpenJDK Runtime Environment (8.0_372-b07) (build 1.8.0_372-b07)
# Java VM: OpenJDK 64-Bit Server VM (25.372-b07 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [librocksdbjni886380103972770161.so+0x3d2743]  rocksdb::DBImpl::FailIfCfHasTs(rocksdb::ColumnFamilyHandle const*) const+0x23
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/runner/work/spark/spark/sql/core/hs_err_pid4403.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
{code}

This is my first time encountering this problem, and I am not yet sure of the root cause.
[jira] [Assigned] (SPARK-42098) ResolveInlineTables should handle RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-42098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-42098:
-----------------------------------
    Assignee: Jia Fan

> ResolveInlineTables should handle RuntimeReplaceable
> ----------------------------------------------------
>
>                 Key: SPARK-42098
>                 URL: https://issues.apache.org/jira/browse/SPARK-42098
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Wenchen Fan
>            Assignee: Jia Fan
>            Priority: Major
>             Fix For: 3.5.0
>
> spark-sql> VALUES (try_divide(5, 0));
> cannot evaluate expression try_divide(5, 0) in inline table definition; line 1 pos 8
[jira] [Resolved] (SPARK-42098) ResolveInlineTables should handle RuntimeReplaceable
[ https://issues.apache.org/jira/browse/SPARK-42098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-42098.
-------------------------------------
    Fix Version/s: 3.5.0
       Resolution: Fixed

Issue resolved by pull request 42110
[https://github.com/apache/spark/pull/42110]

> ResolveInlineTables should handle RuntimeReplaceable
> ----------------------------------------------------
>
>                 Key: SPARK-42098
>                 URL: https://issues.apache.org/jira/browse/SPARK-42098
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.1
>            Reporter: Wenchen Fan
>            Priority: Major
>             Fix For: 3.5.0
>
> spark-sql> VALUES (try_divide(5, 0));
> cannot evaluate expression try_divide(5, 0) in inline table definition; line 1 pos 8
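The failure above happens because `ResolveInlineTables` eagerly evaluates the expressions in a `VALUES` clause, but `try_divide` is a RuntimeReplaceable expression (a wrapper that is only substituted with a concrete, evaluable form later in analysis). The toy sketch below illustrates the shape of the fix, replacing the wrapper with its concrete form before eager evaluation; all class and function names here are illustrative, not Spark's actual internals.

```python
# Toy expression tree: unwrap a "RuntimeReplaceable" node before eagerly
# evaluating an inline-table (VALUES ...) expression.

class Literal:
    def __init__(self, value): self.value = value
    def eval(self): return self.value

class Divide:
    def __init__(self, left, right): self.left, self.right = left, right
    def eval(self):
        r = self.right.eval()
        if r == 0:
            raise ZeroDivisionError
        return self.left.eval() / r

class TryDivide:
    """Stand-in for a RuntimeReplaceable wrapper: not directly evaluable."""
    def __init__(self, left, right):
        self.replacement = Divide(left, right)
    def eval(self):
        raise TypeError("cannot evaluate RuntimeReplaceable directly")

def resolve_inline_value(expr):
    # The fix, in miniature: substitute the replacement first. For the
    # try_* family, a runtime failure yields NULL (None) instead of an error.
    if isinstance(expr, TryDivide):
        try:
            return expr.replacement.eval()
        except ZeroDivisionError:
            return None
    return expr.eval()

assert resolve_inline_value(TryDivide(Literal(5), Literal(0))) is None  # NULL, no error
```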
[jira] [Commented] (SPARK-43242) diagnoseCorruption should not throw Unexpected type of BlockId for ShuffleBlockBatchId
[ https://issues.apache.org/jira/browse/SPARK-43242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748400#comment-17748400 ]

Snoot.io commented on SPARK-43242:
----------------------------------

User 'CavemanIV' has created a pull request for this issue:
https://github.com/apache/spark/pull/40921

> diagnoseCorruption should not throw Unexpected type of BlockId for ShuffleBlockBatchId
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-43242
>                 URL: https://issues.apache.org/jira/browse/SPARK-43242
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.4
>            Reporter: Zhang Liang
>            Priority: Minor
>
> Some of our Spark apps throw an "Unexpected type of BlockId" exception, as shown below.
> According to BlockId.scala, a block id such as *shuffle_12_5868_518_523* is a `ShuffleBlockBatchId`, which is not handled in `ShuffleBlockFetcherIterator.diagnoseCorruption`.
>
> Moreover, the exception thrown while diagnosing swallows the real exception in certain cases. Since diagnoseCorruption is only ever invoked from exception-handling paths, as a side dish, I think it shouldn't throw at all.
>
> {code:java}
> 23/03/07 03:01:24,485 [task-result-getter-1] WARN TaskSetManager: Lost task 104.0 in stage 36.0 (TID 6169): java.lang.IllegalArgumentException: Unexpected type of BlockId, shuffle_12_5868_518_523
>     at org.apache.spark.storage.ShuffleBlockFetcherIterator.diagnoseCorruption(ShuffleBlockFetcherIterator.scala:1079)
>     at org.apache.spark.storage.BufferReleasingInputStream.$anonfun$tryOrFetchFailedException$1(ShuffleBlockFetcherIterator.scala:1314)
>     at scala.Option.map(Option.scala:230)
>     at org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException(ShuffleBlockFetcherIterator.scala:1313)
>     at org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:1299)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>     at java.io.DataInputStream.read(DataInputStream.java:149)
>     at org.sparkproject.guava.io.ByteStreams.read(ByteStreams.java:899)
>     at org.sparkproject.guava.io.ByteStreams.readFully(ByteStreams.java:733)
>     at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:127)
>     at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:496)
>     at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>     at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
>     at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
>     at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.sort_addToSorter_0$(Unknown Source)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:82)
>     at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:1065)
>     at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextOuterJoinRows(SortMergeJoinExec.scala:1024)
>     at org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceStream(SortMergeJoinExec.scala:1201)
>     at org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceNext(SortMergeJoinExec.scala:1240)
>     at org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:67)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
>     at org.apache.spark.sql.execution.SortExec.$anonfun$doExecute$1(SortExec.scala:119)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
>     at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>     at
> {code}
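The crash above comes from a block-id dispatch that recognized the three-number `shuffle_<shuffleId>_<mapId>_<reduceId>` form but not the four-number batch form like `shuffle_12_5868_518_523`. The sketch below (function and constant names are mine, not Spark's) shows the shape of the fix: recognize both formats and return a "no diagnosis" result instead of raising from a path that is itself error handling.

```python
import re

# Two shuffle block-id formats involved in this issue: ShuffleBlockId has
# three numbers; ShuffleBlockBatchId has four (start/end reduce ids).
SHUFFLE = re.compile(r"shuffle_(\d+)_(\d+)_(\d+)$")
SHUFFLE_BATCH = re.compile(r"shuffle_(\d+)_(\d+)_(\d+)_(\d+)$")

def diagnose(block_id):
    m = SHUFFLE.match(block_id)
    if m:
        return ("ShuffleBlockId", tuple(int(g) for g in m.groups()))
    m = SHUFFLE_BATCH.match(block_id)
    if m:
        return ("ShuffleBlockBatchId", tuple(int(g) for g in m.groups()))
    # Instead of raising (which swallowed the real exception during
    # exception handling), report that no diagnosis is possible.
    return ("unknown", None)

assert diagnose("shuffle_12_5868_518_523")[0] == "ShuffleBlockBatchId"
assert diagnose("rdd_18_25")[0] == "unknown"
```

The `$` anchors matter: without them the three-number pattern would partially match the four-number id and misclassify it.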
[jira] [Commented] (SPARK-44579) Support Interrupt On Cancel in SQLExecution
[ https://issues.apache.org/jira/browse/SPARK-44579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748398#comment-17748398 ]

Snoot.io commented on SPARK-44579:
----------------------------------

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/42199

> Support Interrupt On Cancel in SQLExecution
> -------------------------------------------
>
>                 Key: SPARK-44579
>                 URL: https://issues.apache.org/jira/browse/SPARK-44579
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Kent Yao
>            Priority: Major
>
> Currently, we support interrupting task threads for users via 1) APIs in the Spark core module and 2) a Thrift config for the SQL module. Other Spark SQL applications cannot readily use this functionality. Specifically, the built-in spark-sql shell lacks a user-controlled knob for interrupting task threads.
[jira] [Commented] (SPARK-44239) Free memory allocated by large vectors when vectors are reset
[ https://issues.apache.org/jira/browse/SPARK-44239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748396#comment-17748396 ]

Snoot.io commented on SPARK-44239:
----------------------------------

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/41782

> Free memory allocated by large vectors when vectors are reset
> -------------------------------------------------------------
>
>                 Key: SPARK-44239
>                 URL: https://issues.apache.org/jira/browse/SPARK-44239
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Wan Kun
>            Priority: Major
>         Attachments: image-2023-06-29-12-58-12-256.png, image-2023-06-29-13-03-15-470.png
>
> When Spark reads a data file into a WritableColumnVector, the memory allocated by the WritableColumnVectors is not freed until the VectorizedColumnReader completes.
> Reusing the allocated array objects saves allocation time, but it also holds on to too much unused memory after a large vector batch has been read.
> Add a memory reserve policy for this scenario: reuse the allocated array object for small column vectors, and free the memory for huge column vectors.
> !image-2023-06-29-12-58-12-256.png!
> !image-2023-06-29-13-03-15-470.png!
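The reserve policy described above can be sketched in a few lines. This is only an illustration of the idea (the class name and threshold are assumptions, not Spark's actual `WritableColumnVector` code): on reset, keep a small buffer for reuse, but drop the reference to a huge buffer so its memory can be reclaimed rather than lingering for the reader's lifetime.

```python
HUGE_VECTOR_THRESHOLD = 1 << 20  # assumed cutoff, in elements

class ColumnVectorBuffer:
    """Toy column-vector buffer with a reuse-or-free reset policy."""
    def __init__(self):
        self.data = []

    def append(self, value):
        self.data.append(value)

    def reset(self):
        if len(self.data) > HUGE_VECTOR_THRESHOLD:
            self.data = []      # free: replace the huge buffer entirely
        else:
            self.data.clear()   # reuse: keep the allocation, drop contents

v = ColumnVectorBuffer()
for i in range(10):
    v.append(i)
v.reset()
assert v.data == []  # contents gone either way; only the allocation differs
```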
[jira] [Created] (SPARK-44579) Support Interrupt On Cancel in SQLExecution
Kent Yao created SPARK-44579:
--------------------------------
             Summary: Support Interrupt On Cancel in SQLExecution
                 Key: SPARK-44579
                 URL: https://issues.apache.org/jira/browse/SPARK-44579
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Kent Yao

Currently, we support interrupting task threads for users via 1) APIs in the Spark core module and 2) a Thrift config for the SQL module. Other Spark SQL applications cannot readily use this functionality. Specifically, the built-in spark-sql shell lacks a user-controlled knob for interrupting task threads.
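On the JVM, the interrupt-on-cancel behavior this issue wants to expose calls `Thread.interrupt` on task threads when a job group is cancelled (the core-module API it references is `SparkContext.setJobGroup(..., interruptOnCancel=True)`). Python threads cannot be interrupted that way, so the sketch below is only a cooperative analog of the same idea, with all names mine: the "driver" sets a cancel flag and the "task" loop observes it.

```python
import threading
import time

cancel = threading.Event()   # stands in for the cancel signal
progress = []

def task():
    # A real JVM interrupt would also break out of blocking calls;
    # here the task must poll the flag cooperatively.
    while not cancel.is_set():
        progress.append(1)
        time.sleep(0.01)

t = threading.Thread(target=task)
t.start()
time.sleep(0.05)
cancel.set()                 # "cancel the job group"
t.join(timeout=2)
assert not t.is_alive()      # the task observed the cancel and stopped
```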
[jira] [Commented] (SPARK-44554) Install different Python linter dependencies for daily testing of different Spark versions
[ https://issues.apache.org/jira/browse/SPARK-44554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748390#comment-17748390 ]

Snoot.io commented on SPARK-44554:
----------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/42167

> Install different Python linter dependencies for daily testing of different Spark versions
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44554
>                 URL: https://issues.apache.org/jira/browse/SPARK-44554
>             Project: Spark
>          Issue Type: Improvement
>          Components: Project Infra
>    Affects Versions: 4.0.0
>            Reporter: Yang Jie
>            Priority: Major
>
> Fix the daily-test Python lint check failure for branches 3.3 and 3.4.
>
> 3.4: https://github.com/apache/spark/actions/runs/5654787844/job/15318633266
> 3.3: https://github.com/apache/spark/actions/runs/5653655970/job/15315236052
[jira] [Commented] (SPARK-44287) Define the computing logic through PartitionEvaluator API and use it in RowToColumnarExec & ColumnarToRowExec SQL operators.
[ https://issues.apache.org/jira/browse/SPARK-44287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748389#comment-17748389 ]

Snoot.io commented on SPARK-44287:
----------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/42185

> Define the computing logic through PartitionEvaluator API and use it in RowToColumnarExec & ColumnarToRowExec SQL operators.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44287
>                 URL: https://issues.apache.org/jira/browse/SPARK-44287
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Vinod KC
>            Assignee: Vinod KC
>            Priority: Major
>             Fix For: 3.5.0
>
> Define the computing logic through PartitionEvaluator API and use it in RowToColumnarExec & ColumnarToRowExec SQL operators.
[jira] [Commented] (SPARK-44567) Daily GA for Maven testing
[ https://issues.apache.org/jira/browse/SPARK-44567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748387#comment-17748387 ]

Snoot.io commented on SPARK-44567:
----------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/42197

> Daily GA for Maven testing
> --------------------------
>
>                 Key: SPARK-44567
>                 URL: https://issues.apache.org/jira/browse/SPARK-44567
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Project Infra, Tests
>    Affects Versions: 4.0.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
[jira] [Resolved] (SPARK-44558) Export Pyspark's Spark Connect Log Level
[ https://issues.apache.org/jira/browse/SPARK-44558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44558.
--------------------------------------
    Fix Version/s: 3.5.0
                   4.0.0
       Resolution: Fixed

Issue resolved by pull request 42175
[https://github.com/apache/spark/pull/42175]

> Export Pyspark's Spark Connect Log Level
> ----------------------------------------
>
>                 Key: SPARK-44558
>                 URL: https://issues.apache.org/jira/browse/SPARK-44558
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.1
>            Reporter: Alice Sayutina
>            Assignee: Alice Sayutina
>            Priority: Minor
>             Fix For: 3.5.0, 4.0.0
>
> Export the Spark Connect log level as an API function.
[jira] [Assigned] (SPARK-44558) Export Pyspark's Spark Connect Log Level
[ https://issues.apache.org/jira/browse/SPARK-44558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-44558:
----------------------------------------
    Assignee: Alice Sayutina

> Export Pyspark's Spark Connect Log Level
> ----------------------------------------
>
>                 Key: SPARK-44558
>                 URL: https://issues.apache.org/jira/browse/SPARK-44558
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.4.1
>            Reporter: Alice Sayutina
>            Assignee: Alice Sayutina
>            Priority: Minor
>
> Export the Spark Connect log level as an API function.
[jira] [Updated] (SPARK-44542) eagerly load SparkExitCode class in SparkUncaughtExceptionHandler
[ https://issues.apache.org/jira/browse/SPARK-44542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

YE updated SPARK-44542:
-----------------------
    Summary: eagerly load SparkExitCode class in SparkUncaughtExceptionHandler  (was: easily load SparkExitCode class in SparkUncaughtExceptionHandler)

> eagerly load SparkExitCode class in SparkUncaughtExceptionHandler
> -----------------------------------------------------------------
>
>                 Key: SPARK-44542
>                 URL: https://issues.apache.org/jira/browse/SPARK-44542
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.3, 3.3.2, 3.4.1
>            Reporter: YE
>            Priority: Major
>         Attachments: image-2023-07-25-16-46-03-989.png, image-2023-07-25-16-46-28-158.png, image-2023-07-25-16-46-42-522.png
>
> There are two pieces of background for this improvement proposal:
> 1. When running Spark on YARN, a disk might become corrupted while the application is running. The corrupted disk might contain the Spark jars (the cached archive from spark.yarn.archive). In that case, the executor JVM can no longer load any Spark-related classes.
> 2. Spark uses the OutputCommitCoordinator to avoid data races between speculative tasks, so that no two tasks can commit the same partition at the same time. In other words, once a task's commit request is allowed, other commit requests are denied until the committing task fails.
>
> We encountered a corner case combining the two, which makes Spark hang. A short timeline:
> # task 5372 (tid: 21662) starts running at 21:55
> # the disk containing the Spark archive for that task/executor becomes corrupted, making the archive inaccessible from the executor JVM's perspective; this happened around 22:00
> # the task continues running; at 22:05 it requests a commit from the coordinator and performs the commit
> # however, due to the corrupted disk, an exception is raised in the executor JVM
> # the SparkUncaughtExceptionHandler kicks in, but as the jar/disk is corrupted, the handler itself throws an exception, and the halt process throws an exception too
> # the executor hangs with no more tasks running, yet the authorized commit request is still considered valid on the driver side
> # speculative tasks start to kick in, but with no commit permission, all of them are killed/denied
> # the job hangs until our SRE kills the container from outside
>
> Some screenshots are provided below.
> !image-2023-07-25-16-46-03-989.png!
> !image-2023-07-25-16-46-28-158.png!
> !image-2023-07-25-16-46-42-522.png!
>
> For this specific case, I'd like to propose eagerly loading the SparkExitCode class in the SparkUncaughtExceptionHandler, so that the halt process can execute rather than throw because SparkExitCode is not loadable in the scenario above.
[jira] [Created] (SPARK-44578) Support pushing down UDFs in DSv2
Holden Karau created SPARK-44578:
------------------------------------
             Summary: Support pushing down UDFs in DSv2
                 Key: SPARK-44578
                 URL: https://issues.apache.org/jira/browse/SPARK-44578
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0, 4.0.0
            Reporter: Holden Karau
            Assignee: Holden Karau

We should consider trying to add support for pushing down UDFs to the storage engine. While most of the time this might not make sense, some storage engines expose their own UDFs, like bucketing or day transforms, which we would ideally push down to them.
[jira] [Resolved] (SPARK-44198) Support propagation of the log level to the executors
[ https://issues.apache.org/jira/browse/SPARK-44198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Zsolt Piros resolved SPARK-44198.
--------------------------------------------
    Fix Version/s: 4.0.0
         Assignee: Vinod KC
       Resolution: Fixed

Issue resolved by pull request 41746
https://github.com/apache/spark/pull/41746

> Support propagation of the log level to the executors
> -----------------------------------------------------
>
>                 Key: SPARK-44198
>                 URL: https://issues.apache.org/jira/browse/SPARK-44198
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.0
>            Reporter: Vinod KC
>            Assignee: Vinod KC
>            Priority: Minor
>             Fix For: 4.0.0
>
> Currently, the *sc.setLogLevel()* method only sets the log level on the Spark driver, failing to reflect the desired log level on the executors. This inconsistency can make debugging and monitoring Spark applications difficult, as log messages from the executors may not align with the log level set in the user code.
[jira] [Created] (SPARK-44577) INSERT BY NAME returns non-sensical error message
Serge Rielau created SPARK-44577:
------------------------------------
             Summary: INSERT BY NAME returns non-sensical error message
                 Key: SPARK-44577
                 URL: https://issues.apache.org/jira/browse/SPARK-44577
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Serge Rielau

CREATE TABLE bug(c1 INT);
INSERT INTO bug BY NAME SELECT 1 AS c2;
==> Multi-part identifier cannot be empty.
[jira] [Updated] (SPARK-44425) Validate that session_id is an UUID
[ https://issues.apache.org/jira/browse/SPARK-44425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-44425:
-------------------------------------
    Fix Version/s: 4.0.0

> Validate that session_id is an UUID
> -----------------------------------
>
>                 Key: SPARK-44425
>                 URL: https://issues.apache.org/jira/browse/SPARK-44425
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Juliusz Sompolski
>            Assignee: Juliusz Sompolski
>            Priority: Major
>             Fix For: 3.5.0, 4.0.0
>
> Add validation that session_id is a UUID. This is currently the case in the clients, so we could make it a requirement.
[jira] [Resolved] (SPARK-44425) Validate that session_id is an UUID
[ https://issues.apache.org/jira/browse/SPARK-44425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44425.
--------------------------------------
    Fix Version/s: 3.5.0
         Assignee: Juliusz Sompolski
       Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/42150

> Validate that session_id is an UUID
> -----------------------------------
>
>                 Key: SPARK-44425
>                 URL: https://issues.apache.org/jira/browse/SPARK-44425
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect
>    Affects Versions: 3.5.0
>            Reporter: Juliusz Sompolski
>            Assignee: Juliusz Sompolski
>            Priority: Major
>             Fix For: 3.5.0
>
> Add validation that session_id is a UUID. This is currently the case in the clients, so we could make it a requirement.
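The validation this issue describes is straightforward with the standard library. The function name below is illustrative (not the actual PySpark helper); the check itself simply asks whether `uuid.UUID` can parse the string.

```python
import uuid

def validate_session_id(session_id: str) -> str:
    """Reject a Spark Connect session_id that is not a valid UUID string."""
    try:
        uuid.UUID(session_id)
    except ValueError:
        raise ValueError(f"session_id must be a UUID, got: {session_id!r}")
    return session_id

# A well-formed UUID passes through unchanged.
sid = "c02b48e6-06f8-4e49-968b-2b2e46f34a3f"
assert validate_session_id(sid) == sid
```

Note that `uuid.UUID` also accepts some variant spellings (e.g. without dashes), so a server wanting the canonical 8-4-4-4-12 form would need a stricter check than this sketch.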
[jira] [Commented] (SPARK-44547) BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks to fallback storage
[ https://issues.apache.org/jira/browse/SPARK-44547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748327#comment-17748327 ]

Ignite TC Bot commented on SPARK-44547:
---------------------------------------

User 'ukby1234' has created a pull request for this issue:
https://github.com/apache/spark/pull/42155

> BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks to fallback storage
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44547
>                 URL: https://issues.apache.org/jira/browse/SPARK-44547
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.1
>            Reporter: Frank Yin
>            Priority: Major
>         Attachments: spark-error.log
>
> Looks like the RDD cache doesn't support fallback storage, and we should stop the migration if the only viable peer is the fallback storage.
> [^spark-error.log]
>
> {code:java}
> 23/07/25 05:12:58 WARN BlockManager: Failed to replicate rdd_18_25 to BlockManagerId(fallback, remote, 7337, None), failure #0
> java.io.IOException: Failed to connect to remote:7337
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
>     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
>     at org.apache.spark.network.netty.NettyBlockTransferService.uploadBlock(NettyBlockTransferService.scala:168)
>     at org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:121)
>     at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1784)
>     at org.apache.spark.storage.BlockManager.$anonfun$replicateBlock$2(BlockManager.scala:1721)
>     at org.apache.spark.storage.BlockManager.$anonfun$replicateBlock$2$adapted(BlockManager.scala:1707)
>     at scala.Option.forall(Option.scala:390)
>     at org.apache.spark.storage.BlockManager.replicateBlock(BlockManager.scala:1707)
>     at org.apache.spark.storage.BlockManagerDecommissioner.migrateBlock(BlockManagerDecommissioner.scala:356)
>     at org.apache.spark.storage.BlockManagerDecommissioner.$anonfun$decommissionRddCacheBlocks$3(BlockManagerDecommissioner.scala:340)
>     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>     at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>     at org.apache.spark.storage.BlockManagerDecommissioner.decommissionRddCacheBlocks(BlockManagerDecommissioner.scala:339)
>     at org.apache.spark.storage.BlockManagerDecommissioner$$anon$1.run(BlockManagerDecommissioner.scala:214)
>     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>     at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.net.UnknownHostException: remote
>     at java.base/java.net.InetAddress$CachedAddresses.get(Unknown Source)
>     at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
>     at java.base/java.net.InetAddress.getAllByName(Unknown Source)
>     at java.base/java.net.InetAddress.getAllByName(Unknown Source)
>     at java.base/java.net.InetAddress.getByName(Unknown Source)
>     at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
>     at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
>     at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
>     at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
>     at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
>     at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
>     at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
>     at
> {code}
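The guard the reporter argues for can be sketched simply. All names below are mine, not Spark's: when selecting migration peers for RDD cached blocks on a decommissioning executor, exclude the fallback-storage pseudo-peer, and stop migration when nothing else remains.

```python
# Pseudo-peer id used for fallback storage in this sketch (Spark's actual
# fallback BlockManagerId differs; this is just illustrative).
FALLBACK_BLOCK_MANAGER_ID = "fallback"

def rdd_migration_peers(peers):
    """Return peers usable for RDD cached-block migration.

    An empty result means the caller should stop migrating RDD blocks
    rather than attempt (and fail) replication to fallback storage.
    """
    return [p for p in peers if p != FALLBACK_BLOCK_MANAGER_ID]

assert rdd_migration_peers(["exec-1", "fallback"]) == ["exec-1"]
assert rdd_migration_peers(["fallback"]) == []  # only fallback left: stop
```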
[jira] [Resolved] (SPARK-44560) Improve tests and documentation for Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-44560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-44560. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42178 [https://github.com/apache/spark/pull/42178] > Improve tests and documentation for Arrow Python UDF > > > Key: SPARK-44560 > URL: https://issues.apache.org/jira/browse/SPARK-44560 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > Test on complex return type > Remove complex return type constraints for Arrow Python UDF on Spark Connect > Update documentation of the related Spark conf -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44560) Improve tests and documentation for Arrow Python UDF
[ https://issues.apache.org/jira/browse/SPARK-44560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng reassigned SPARK-44560: Assignee: Xinrong Meng > Improve tests and documentation for Arrow Python UDF > > > Key: SPARK-44560 > URL: https://issues.apache.org/jira/browse/SPARK-44560 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > > Test on complex return type > Remove complex return type constraints for Arrow Python UDF on Spark Connect > Update documentation of the related Spark conf -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44576) Session Artifact update breaks XXWithState methods in KVGDS
Zhen Li created SPARK-44576: --- Summary: Session Artifact update breaks XXWithState methods in KVGDS Key: SPARK-44576 URL: https://issues.apache.org/jira/browse/SPARK-44576 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Zhen Li When changing the client test jar from system classloader to session classloader (https://github.com/apache/spark/compare/master...zhenlineo:spark:streaming-artifacts?expand=1), all XXWithState test suite failed with class loader errors: e.g. ``` 23/07/25 16:13:14 WARN TaskSetManager: Lost task 1.0 in stage 2.0 (TID 16) (10.8.132.125 executor driver): TaskKilled (Stage cancelled: Job aborted due to stage failure: Task 170 in stage 2.0 failed 1 times, most recent failure: Lost task 170.0 in stage 2.0 (TID 14) (10.8.132.125 executor driver): java.lang.ClassCastException: class org.apache.spark.sql.streaming.ClickState cannot be cast to class org.apache.spark.sql.streaming.ClickState (org.apache.spark.sql.streaming.ClickState is in unnamed module of loader org.apache.spark.util.MutableURLClassLoader @2c604965; org.apache.spark.sql.streaming.ClickState is in unnamed module of loader java.net.URLClassLoader @57751f4) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43) at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:441) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1514) at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486) at org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425) at 
org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491) at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:141) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:592) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:595) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Driver stacktrace:) 23/07/25 16:13:14 ERROR Utils: Aborting task java.lang.IllegalStateException: Error committing version 1 into HDFSStateStore[id=(op=0,part=5),dir=file:/private/var/folders/b0/f9jmmrrx5js7xsswxyf58nwrgp/T/temporary-02cca002-e189-4e32-afd8-964d6f8d5056/state/0/5] at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:148) at org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExecBase.$anonfun$processDataWithPartition$4(FlatMapGroupsWithStateExec.scala:183) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:611) at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:179) at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs$(statefulOperators.scala:179) at org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExec.timeTakenMs(FlatMapGroupsWithStateExec.scala:374) at 
org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExecBase.$anonfun$processDataWithPartition$3(FlatMapGroupsWithStateExec.scala:183) at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47) at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:36) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at
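The root cause of the `ClassCastException` in the log above ("ClickState cannot be cast to ClickState") is that the JVM identifies a class by the pair (name, classloader), so the same class loaded once by `MutableURLClassLoader` and once by `URLClassLoader` yields two distinct, incompatible types. A minimal plain-Python analogue of that identity problem (not Spark code; `load_clickstate_class` is invented purely for illustration):

```python
# In the JVM a type is (class name, classloader); loading the "same" class
# through two loaders produces two unrelated types. Python shows the
# analogous failure when a class object is manufactured twice.

def load_clickstate_class(loader_name):
    """Simulate a classloader: each call manufactures a distinct class
    object even though the class name is identical."""
    return type("ClickState", (), {"loader": loader_name})

ClickStateA = load_clickstate_class("MutableURLClassLoader")
ClickStateB = load_clickstate_class("URLClassLoader")

instance = ClickStateA()
print(isinstance(instance, ClickStateA))  # True
print(isinstance(instance, ClickStateB))  # False: same name, different loader
```

This is why the report points at the switch from the system classloader to the session classloader: once driver and executor resolve `ClickState` through different loaders, casts between the two copies fail even though the bytecode is identical.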
[jira] [Updated] (SPARK-44479) Support Python UDTFs with empty schema
[ https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-44479: -- Fix Version/s: 3.5.0 > Support Python UDTFs with empty schema > -- > > Key: SPARK-44479 > URL: https://issues.apache.org/jira/browse/SPARK-44479 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.5.0 > > > Support UDTFs with empty schema, for example: > {code:python} > >>> class TestUDTF: > ... def eval(self): > ... yield tuple() > {code} > Currently it fails with `useArrow=True`: > {code:python} > >>> udtf(TestUDTF, returnType=StructType())().collect() > Traceback (most recent call last): > ... > ValueError: not enough values to unpack (expected 2, got 0) > {code} > whereas without Arrow: > {code:python} > >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect() > [Row()] > {code} > Otherwise, we should raise an error without Arrow, too, to be consistent. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
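The `ValueError: not enough values to unpack (expected 2, got 0)` quoted above is the classic symptom of tuple-unpacking an empty transpose. A small standalone reproduction of that failure mode (plain Python; `convert_batch` is a hypothetical stand-in, not PySpark's actual Arrow serializer):

```python
# Sketch: a conversion step that transposes (name, column) pairs works for
# any non-empty schema but blows up on an empty one, because zip(*[]) yields
# nothing to unpack into the two expected variables.
def convert_batch(named_columns):
    names, columns = zip(*named_columns)  # breaks when named_columns == []
    return dict(zip(names, columns))

print(convert_batch([("id", [1, 2])]))  # {'id': [1, 2]}

try:
    convert_batch([])  # empty schema -> no (name, column) pairs at all
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 0)
```

The fix direction in the ticket is to treat the empty schema as a valid case (yielding `Row()`), or to reject it consistently on both the Arrow and non-Arrow paths.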
[jira] [Resolved] (SPARK-43968) Improve error messages for Python UDTFs with wrong number of outputs
[ https://issues.apache.org/jira/browse/SPARK-43968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-43968. --- Fix Version/s: 4.0.0 Assignee: Allison Wang Resolution: Fixed Issue resolved by pull request 42157 https://github.com/apache/spark/pull/42157 > Improve error messages for Python UDTFs with wrong number of outputs > > > Key: SPARK-43968 > URL: https://issues.apache.org/jira/browse/SPARK-43968 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 4.0.0 > > > Improve the error messages for Python UDTFs when the number of outputs > mismatches the number of outputs specified in the return type of the UDTFs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44575) Implement Error Translation
Yihong He created SPARK-44575: - Summary: Implement Error Translation Key: SPARK-44575 URL: https://issues.apache.org/jira/browse/SPARK-44575 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Yihong He -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44559) Improve error messages for Python UDTF arrow type casts
[ https://issues.apache.org/jira/browse/SPARK-44559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-44559: - Summary: Improve error messages for Python UDTF arrow type casts (was: Improve error messages for invalid Python UDTF arrow type casts) > Improve error messages for Python UDTF arrow type casts > --- > > Key: SPARK-44559 > URL: https://issues.apache.org/jira/browse/SPARK-44559 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Priority: Major > > Currently, if a Python UDTF outputs a type that is incompatible with the > specified output schema, Spark will throw the following confusing error > message: > {code:java} > File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas > File "pyarrow/array.pxi", line 316, in pyarrow.lib.array > File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array > File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Could not convert [1, 2] with type list: tried to > convert to int32{code} > We should improve this. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
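One way to make such failures readable is to validate each output value against the declared schema before handing it to Arrow, so the error names the offending column and both types instead of surfacing a raw pyarrow traceback. A hedged sketch of that idea (`check_output_value` is illustrative, not a PySpark API):

```python
# Sketch of a pre-conversion type check that produces a clearer message
# than pyarrow's "Could not convert [1, 2] with type list: tried to
# convert to int32". Names here are assumptions, not Spark's real code.
def check_output_value(value, expected_type, col_name):
    if not isinstance(value, expected_type):
        raise TypeError(
            f"UDTF output column '{col_name}' expected "
            f"{expected_type.__name__} but got {type(value).__name__}: {value!r}"
        )

check_output_value(1, int, "x")  # OK, returns None

try:
    check_output_value([1, 2], int, "x")  # mirrors the ticket's list-vs-int32 case
except TypeError as e:
    print(e)
```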
[jira] [Created] (SPARK-44574) Errors that moved into sql/api should also use Analysis
Rui Wang created SPARK-44574: Summary: Errors that moved into sql/api should also use Analysis Key: SPARK-44574 URL: https://issues.apache.org/jira/browse/SPARK-44574 Project: Spark Issue Type: Sub-task Components: Connect, SQL Affects Versions: 3.5.0 Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44507) SCSC does not depend on AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-44507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang resolved SPARK-44507. -- Resolution: Won't Fix > SCSC does not depend on AnalysisException > - > > Key: SPARK-44507 > URL: https://issues.apache.org/jira/browse/SPARK-44507 > Project: Spark > Issue Type: Sub-task > Components: Connect, SQL >Affects Versions: 3.5.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3
Siddaraju G C created SPARK-44573: - Summary: Couldn't submit Spark application to Kubernetes in version v1.27.3 Key: SPARK-44573 URL: https://issues.apache.org/jira/browse/SPARK-44573 Project: Spark Issue Type: Bug Components: Kubernetes, Spark Submit Affects Versions: 3.4.1 Reporter: Siddaraju G C Spark-submit ( cluster mode on Kubernetes ) results in the error *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s cluster. Steps followed: * Using IBM Cloud, created 3 instances * The 1st instance acts as the master node and the other two act as worker nodes {noformat} root@vsi-spark-master:/opt# kubectl get nodes NAME STATUS ROLES AGE VERSION vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 vsi-spark-worker-2 Ready 47h v1.27.3+k3s1{noformat} * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder * Ran spark-submit using the command below {noformat} root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master k8s://http://:6443 --conf spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf spark.executor.instances=5 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} * And got the error message below. {noformat} 23/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image. 
23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" first. It should be yes. Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.io.IOException: Connection reset at 
io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558) at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:349) at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:711) at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:93) at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42) ... 15 more Caused by: java.net.SocketException: Connection reset at
[jira] [Assigned] (SPARK-44505) DataSource v2 Scans should not require planning the input partitions on explain
[ https://issues.apache.org/jira/browse/SPARK-44505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-44505: --- Assignee: Martin Grund > DataSource v2 Scans should not require planning the input partitions on > explain > --- > > Key: SPARK-44505 > URL: https://issues.apache.org/jira/browse/SPARK-44505 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > > Right now, we will always call `planInputPartitions()` for a DSv2 > implementation even when no Spark job is run and the plan is only explained. > We should provide a way to avoid scanning all input partitions just to > determine whether the input is columnar or not. The scan should provide an > override. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44505) DataSource v2 Scans should not require planning the input partitions on explain
[ https://issues.apache.org/jira/browse/SPARK-44505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-44505. - Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 42099 [https://github.com/apache/spark/pull/42099] > DataSource v2 Scans should not require planning the input partitions on > explain > --- > > Key: SPARK-44505 > URL: https://issues.apache.org/jira/browse/SPARK-44505 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Fix For: 3.5.0 > > > Right now, we will always call `planInputPartitions()` for a DSv2 > implementation even when no Spark job is run and the plan is only explained. > We should provide a way to avoid scanning all input partitions just to > determine whether the input is columnar or not. The scan should provide an > override. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
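The shape of the change SPARK-44505 asks for can be sketched in plain Python (illustrative only, not the actual DSv2 interfaces): partition planning stays lazy and memoized, while explain asks a cheap columnar-support flag that never triggers the expensive planning step.

```python
# Hedged sketch of lazy partition planning for a DSv2-style scan.
# `Scan`, `plan_input_partitions`, and `supports_columnar` are illustrative
# names, not Spark's real API surface.
class Scan:
    def __init__(self):
        self._partitions = None
        self.planned = False  # instrumentation: did the expensive step run?

    def plan_input_partitions(self):
        # Expensive in real life (e.g. listing files, computing splits);
        # memoized so execution pays the cost at most once.
        if self._partitions is None:
            self.planned = True
            self._partitions = ["part-0", "part-1"]
        return self._partitions

    def supports_columnar(self):
        # Cheap override: explain can ask this without planning anything.
        return False

scan = Scan()
explain_info = f"columnar={scan.supports_columnar()}"  # explain: no planning
assert scan.planned is False
scan.plan_input_partitions()  # only actual execution plans partitions
assert scan.planned is True
```

The design point is simply that the "is this columnar?" question gets its own inexpensive method instead of being answered by inspecting the planned partitions.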
[jira] [Commented] (SPARK-44567) Daily GA for Maven testing
[ https://issues.apache.org/jira/browse/SPARK-44567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748109#comment-17748109 ] Yang Jie commented on SPARK-44567: -- I've tried this [before|https://github.com/apache/spark/pull/41529], let me take a look at this again > Daily GA for Maven testing > -- > > Key: SPARK-44567 > URL: https://issues.apache.org/jira/browse/SPARK-44567 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44566) Spark CI Improvement
[ https://issues.apache.org/jira/browse/SPARK-44566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748101#comment-17748101 ] Ruifeng Zheng commented on SPARK-44566: --- also cc [~panbingkun] [~dongjoon] [~yikunkero] > Spark CI Improvement > > > Key: SPARK-44566 > URL: https://issues.apache.org/jira/browse/SPARK-44566 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > > I had an offline discussion with [~gurwls223] and [~LuciferYang], and we > think that several points should be improved: > # it should be tested with Maven > # all supported Python versions should be tested > # clean up unused files ASAP, since the testing resource is quite limited > To avoid increasing the workload too much, we can add daily GA first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44566) Spark CI Improvement
[ https://issues.apache.org/jira/browse/SPARK-44566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-44566: -- Description: I had an offline discussion with [~gurwls223] and [~LuciferYang], and we think that several points should be improved: # it should be tested with Maven # all supported Python versions should be tested # clean up unused files ASAP, since the testing resource is quite limited To avoid increasing the workload too much, we can add daily GA first. was: I had an offline discussion with [~gurwls223] and [~LuciferYang], and we think that several points should be improved: # it should be tested with Maven # all supported Python versions should be tested To avoid increasing the workload too much, we can add daily GA first. > Spark CI Improvement > > > Key: SPARK-44566 > URL: https://issues.apache.org/jira/browse/SPARK-44566 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > > I had an offline discussion with [~gurwls223] and [~LuciferYang], and we > think that several points should be improved: > # it should be tested with Maven > # all supported Python versions should be tested > # clean up unused files ASAP, since the testing resource is quite limited > To avoid increasing the workload too much, we can add daily GA first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44572) Clean up unused files ASAP
Ruifeng Zheng created SPARK-44572: - Summary: Clean up unused files ASAP Key: SPARK-44572 URL: https://issues.apache.org/jira/browse/SPARK-44572 Project: Spark Issue Type: Sub-task Components: Project Infra, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44566) Spark CI Improvement
[ https://issues.apache.org/jira/browse/SPARK-44566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-44566: -- Description: I had an offline discussion with [~gurwls223] and [~LuciferYang], and we think that several points should be improved: # it should be tested with Maven # all supported Python versions should be tested To avoid increasing the workload too much, we can add daily GA first. > Spark CI Improvement > > > Key: SPARK-44566 > URL: https://issues.apache.org/jira/browse/SPARK-44566 > Project: Spark > Issue Type: Umbrella > Components: Build, Project Infra, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > > I had an offline discussion with [~gurwls223] and [~LuciferYang], and we > think that several points should be improved: > # it should be tested with Maven > # all supported Python versions should be tested > To avoid increasing the workload too much, we can add daily GA first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44571) Eliminate the Join by combine multiple Aggregates
[ https://issues.apache.org/jira/browse/SPARK-44571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-44571: --- Summary: Eliminate the Join by combine multiple Aggregates (was: Eliminate the Join by Combine multiple Aggregates) > Eliminate the Join by combine multiple Aggregates > - > > Key: SPARK-44571 > URL: https://issues.apache.org/jira/browse/SPARK-44571 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > Recently, I investigated the test case q28, which belongs to the TPC-DS > queries. > The query contains multiple scalar subqueries with aggregation, connected > with an inner join. > If we can merge the filters and aggregates, we can scan the data source only once > and eliminate the join so as to avoid the shuffle. Obviously, this change will > improve the performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44571) Eliminate the Join by Combine multiple Aggregates
[ https://issues.apache.org/jira/browse/SPARK-44571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748084#comment-17748084 ] jiaan.geng commented on SPARK-44571: I'm working on it. > Eliminate the Join by Combine multiple Aggregates > - > > Key: SPARK-44571 > URL: https://issues.apache.org/jira/browse/SPARK-44571 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > Recently, I investigated the test case q28, which belongs to the TPC-DS > queries. > The query contains multiple scalar subqueries with aggregation, connected > with an inner join. > If we can merge the filters and aggregates, we can scan the data source only once > and eliminate the join so as to avoid the shuffle. Obviously, this change will > improve the performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44571) Eliminate the Join by Combine multiple Aggregates
jiaan.geng created SPARK-44571: -- Summary: Eliminate the Join by Combine multiple Aggregates Key: SPARK-44571 URL: https://issues.apache.org/jira/browse/SPARK-44571 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.5.0 Reporter: jiaan.geng Recently, I investigated the test case q28, which belongs to the TPC-DS queries. The query contains multiple scalar subqueries with aggregation, connected with an inner join. If we can merge the filters and aggregates, we can scan the data source only once and eliminate the join so as to avoid the shuffle. Obviously, this change will improve the performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
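The rewrite proposed for SPARK-44571 can be illustrated in plain Python (a sketch of the idea only, not the optimizer rule): instead of several scalar subqueries that each scan the same table under a different filter and are then joined, one pass over the data feeds every aggregate at once.

```python
# Stand-in column values (e.g. something like ss_list_price in TPC-DS q28).
rows = [5, 15, 25, 35, 45, 55]

# Unmerged shape: each "scalar subquery" scans the data independently,
# and the single-row results would then be combined with an inner join.
low = [x for x in rows if x < 20]
high = [x for x in rows if x >= 20]
naive = (sum(low) / len(low), sum(high) / len(high))

# Merged shape: a single scan accumulates every aggregate together,
# so no join (and no extra shuffle) is needed to combine the results.
def one_pass_averages(rows):
    low_sum = low_cnt = high_sum = high_cnt = 0
    for x in rows:
        if x < 20:
            low_sum += x
            low_cnt += 1
        else:
            high_sum += x
            high_cnt += 1
    return (low_sum / low_cnt, high_sum / high_cnt)

assert one_pass_averages(rows) == naive  # same answers, one scan
```

The filters move into conditional accumulation, which is why the ticket talks about merging "the filters and aggregates" together.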
[jira] [Created] (SPARK-44570) Reuse Spark build among pyspark-* modules
Ruifeng Zheng created SPARK-44570: - Summary: Reuse Spark build among pyspark-* modules Key: SPARK-44570 URL: https://issues.apache.org/jira/browse/SPARK-44570 Project: Spark Issue Type: Sub-task Components: Project Infra, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng Every `pyspark-*` test module needs to build Spark with sbt/maven, which normally takes 20~30 minutes. Maybe we could build it once, and then reuse it in all related test modules. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44569) Daily GA for Python 3.11
Ruifeng Zheng created SPARK-44569: - Summary: Daily GA for Python 3.11 Key: SPARK-44569 URL: https://issues.apache.org/jira/browse/SPARK-44569 Project: Spark Issue Type: Sub-task Components: Project Infra, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44568) Daily GA for Python 3.10
Ruifeng Zheng created SPARK-44568: - Summary: Daily GA for Python 3.10 Key: SPARK-44568 URL: https://issues.apache.org/jira/browse/SPARK-44568 Project: Spark Issue Type: Sub-task Components: Project Infra, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44567) Daily GA for Maven testing
Ruifeng Zheng created SPARK-44567: - Summary: Daily GA for Maven testing Key: SPARK-44567 URL: https://issues.apache.org/jira/browse/SPARK-44567 Project: Spark Issue Type: Sub-task Components: Project Infra, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44566) Spark CI Improvement
Ruifeng Zheng created SPARK-44566: - Summary: Spark CI Improvement Key: SPARK-44566 URL: https://issues.apache.org/jira/browse/SPARK-44566 Project: Spark Issue Type: Umbrella Components: Build, Project Infra, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44454) HiveShim getTablesByType support fallback
[ https://issues.apache.org/jira/browse/SPARK-44454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-44454: --- Assignee: dzcxzl > HiveShim getTablesByType support fallback > - > > Key: SPARK-44454 > URL: https://issues.apache.org/jira/browse/SPARK-44454 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.1 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Minor > > When we use a high version of Hive Client to communicate with a low version > of Hive meta store, we may encounter Invalid method name: > 'get_tables_by_type'. > > {code:java} > 23/07/17 12:45:24,391 [main] DEBUG SparkSqlParser: Parsing command: show views > 23/07/17 12:45:24,489 [main] ERROR log: Got exception: > org.apache.thrift.TApplicationException Invalid method name: > 'get_tables_by_type' > org.apache.thrift.TApplicationException: Invalid method name: > 'get_tables_by_type' > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables_by_type(ThriftHiveMetastore.java:1433) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables_by_type(ThriftHiveMetastore.java:1418) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:1411) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173) > at com.sun.proxy.$Proxy23.getTables(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2344) > at com.sun.proxy.$Proxy23.getTables(Unknown Source) > at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByType(Hive.java:1427) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.sql.hive.client.Shim_v2_3.getTablesByType(HiveShim.scala:1408) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$listTablesByType$1(HiveClientImpl.scala:789) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:274) > at > org.apache.spark.sql.hive.client.HiveClientImpl.listTablesByType(HiveClientImpl.scala:785) > at > org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listViews$1(HiveExternalCatalog.scala:895) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108) > at > org.apache.spark.sql.hive.HiveExternalCatalog.listViews(HiveExternalCatalog.scala:893) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listViews(ExternalCatalogWithListener.scala:158) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.listViews(SessionCatalog.scala:1040) > at > org.apache.spark.sql.execution.command.ShowViewsCommand.$anonfun$run$5(views.scala:407) > at 
scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.command.ShowViewsCommand.run(views.scala:407) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
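The fallback described in SPARK-44454 (try the newer `get_tables_by_type` RPC, and degrade gracefully when an older metastore does not implement it) can be sketched as below. This is an illustrative sketch only: the `TableClient` interface, its method names, and the use of `UnsupportedOperationException` as the failure signal are assumptions made for this example; Spark's actual fix lives in `HiveShim.scala` and deals with the Thrift `TApplicationException` shown in the trace above.

```java
import java.util.ArrayList;
import java.util.List;

public class GetTablesByTypeFallback {

    // Hypothetical stand-in for a metastore client; not Spark's real HiveShim API.
    interface TableClient {
        List<String> getTablesByType(String db, String pattern, String type);
        List<String> getAllTables(String db, String pattern);
        String getTableType(String db, String table);
    }

    // Try the newer RPC first; if the old server rejects the unknown method
    // (modeled here as UnsupportedOperationException), fall back to listing
    // every table and filtering by type on the client side.
    static List<String> listTablesByType(TableClient client, String db,
                                         String pattern, String type) {
        try {
            return client.getTablesByType(db, pattern, type);
        } catch (UnsupportedOperationException e) {
            // Old metastore: emulate get_tables_by_type client-side.
            List<String> filtered = new ArrayList<>();
            for (String table : client.getAllTables(db, pattern)) {
                if (type.equals(client.getTableType(db, table))) {
                    filtered.add(table);
                }
            }
            return filtered;
        }
    }
}
```

The key design point is that the fallback is transparent to callers such as `SHOW VIEWS`: they always get a filtered list, whether the server or the client did the filtering.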
[jira] [Resolved] (SPARK-44454) HiveShim getTablesByType support fallback
[ https://issues.apache.org/jira/browse/SPARK-44454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-44454. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42033 [https://github.com/apache/spark/pull/42033] > HiveShim getTablesByType support fallback > - > > Key: SPARK-44454 > URL: https://issues.apache.org/jira/browse/SPARK-44454 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.1 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Minor > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-44536) Upgrade sbt to 1.9.3
[ https://issues.apache.org/jira/browse/SPARK-44536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-44536: Assignee: BingKun Pan > Upgrade sbt to 1.9.3 > > > Key: SPARK-44536 > URL: https://issues.apache.org/jira/browse/SPARK-44536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial >
[jira] [Resolved] (SPARK-44536) Upgrade sbt to 1.9.3
[ https://issues.apache.org/jira/browse/SPARK-44536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-44536. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42141 [https://github.com/apache/spark/pull/42141] > Upgrade sbt to 1.9.3 > > > Key: SPARK-44536 > URL: https://issues.apache.org/jira/browse/SPARK-44536 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-44482) Connect server should be able to specify the bind address
[ https://issues.apache.org/jira/browse/SPARK-44482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44482: Assignee: BingKun Pan > Connect server should be able to specify the bind address > -- > > Key: SPARK-44482 > URL: https://issues.apache.org/jira/browse/SPARK-44482 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor >
[jira] [Resolved] (SPARK-44482) Connect server should be able to specify the bind address
[ https://issues.apache.org/jira/browse/SPARK-44482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44482. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42073 [https://github.com/apache/spark/pull/42073] > Connect server should be able to specify the bind address > -- > > Key: SPARK-44482 > URL: https://issues.apache.org/jira/browse/SPARK-44482 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 4.0.0 > >
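For readers wondering what this feature looks like in practice, a minimal sketch of starting a Spark Connect server on a specific interface follows. The configuration key `spark.connect.grpc.binding.address` is an assumption inferred from the issue title and should be verified against the linked pull request (apache/spark#42073) before use; `start-connect-server.sh` and `spark.connect.grpc.binding.port` are existing Spark artifacts.

```shell
# Hedged sketch: bind the Spark Connect gRPC server to localhost only.
# NOTE: spark.connect.grpc.binding.address is assumed from the issue title,
# not confirmed here; check the PR before relying on this exact key.
./sbin/start-connect-server.sh \
  --packages org.apache.spark:spark-connect_2.12:3.5.0 \
  --conf spark.connect.grpc.binding.address=127.0.0.1 \
  --conf spark.connect.grpc.binding.port=15002
```

Binding to an explicit address (rather than all interfaces) is a common hardening step for services that should only be reachable from the local host or a private network.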
[jira] [Updated] (SPARK-44513) Upgrade snappy-java to 1.1.10.3
[ https://issues.apache.org/jira/browse/SPARK-44513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44513: Fix Version/s: 3.4.2 > Upgrade snappy-java to 1.1.10.3 > --- > > Key: SPARK-44513 > URL: https://issues.apache.org/jira/browse/SPARK-44513 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.1 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Fix For: 3.4.2, 3.5.0 > >
[jira] [Commented] (SPARK-44538) Remove ToJsonUtil
[ https://issues.apache.org/jira/browse/SPARK-44538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747779#comment-17747779 ] Nikita Awasthi commented on SPARK-44538: User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/42164 > Remove ToJsonUtil > - > > Key: SPARK-44538 > URL: https://issues.apache.org/jira/browse/SPARK-44538 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 3.4.1 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major >