[jira] [Resolved] (SPARK-44548) Add support for pandas DataFrame assertDataFrameEqual

2023-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44548.
--
Resolution: Fixed

Issue resolved by pull request 42158
[https://github.com/apache/spark/pull/42158]

> Add support for pandas DataFrame assertDataFrameEqual
> -
>
> Key: SPARK-44548
> URL: https://issues.apache.org/jira/browse/SPARK-44548
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
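A usage sketch of the feature (assuming the pyspark.testing API this change extends; the data is illustrative, not from the PR):

{code:python}
import pandas as pd
from pyspark.testing import assertDataFrameEqual

# With this change, both arguments may be plain pandas DataFrames.
actual = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]})
expected = pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]})

# Passes silently when the frames are equal; raises an assertion
# error with a readable diff otherwise.
assertDataFrameEqual(actual, expected)
{code}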






[jira] [Created] (SPARK-44580) RocksDB crashed when testing in GitHub Actions

2023-07-27 Thread Yang Jie (Jira)
Yang Jie created SPARK-44580:


 Summary: RocksDB crashed when testing in GitHub Actions
 Key: SPARK-44580
 URL: https://issues.apache.org/jira/browse/SPARK-44580
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 3.5.0, 4.0.0
Reporter: Yang Jie


[https://github.com/LuciferYang/spark/actions/runs/5666554831/job/15395578871]

 
{code:java}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f8a077d2743, pid=4403, tid=0x7f89cadff640
#
# JRE version: OpenJDK Runtime Environment (8.0_372-b07) (build 1.8.0_372-b07)
# Java VM: OpenJDK 64-Bit Server VM (25.372-b07 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [librocksdbjni886380103972770161.so+0x3d2743]  rocksdb::DBImpl::FailIfCfHasTs(rocksdb::ColumnFamilyHandle const*) const+0x23
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/runner/work/spark/spark/sql/core/hs_err_pid4403.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
{code}
 

This is my first time encountering this problem, and I am unsure of the root 
cause at the moment.

 

 






[jira] [Assigned] (SPARK-42098) ResolveInlineTables should handle RuntimeReplaceable

2023-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-42098:
---

Assignee: Jia Fan

> ResolveInlineTables should handle RuntimeReplaceable
> 
>
> Key: SPARK-42098
> URL: https://issues.apache.org/jira/browse/SPARK-42098
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Wenchen Fan
>Assignee: Jia Fan
>Priority: Major
> Fix For: 3.5.0
>
>
> spark-sql> VALUES (try_divide(5, 0));
> cannot evaluate expression try_divide(5, 0) in inline table definition; line 
> 1 pos 8






[jira] [Resolved] (SPARK-42098) ResolveInlineTables should handle RuntimeReplaceable

2023-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-42098.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 42110
[https://github.com/apache/spark/pull/42110]

> ResolveInlineTables should handle RuntimeReplaceable
> 
>
> Key: SPARK-42098
> URL: https://issues.apache.org/jira/browse/SPARK-42098
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.1
>Reporter: Wenchen Fan
>Priority: Major
> Fix For: 3.5.0
>
>
> spark-sql> VALUES (try_divide(5, 0));
> cannot evaluate expression try_divide(5, 0) in inline table definition; line 
> 1 pos 8
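A hedged PySpark repro of the quoted spark-sql session (before this fix the statement failed with the error above; afterwards it should return a single NULL row):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# try_divide is a RuntimeReplaceable expression; ResolveInlineTables
# previously refused to evaluate it inside an inline table definition.
spark.sql("VALUES (try_divide(5, 0))").show()
{code}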






[jira] [Commented] (SPARK-43242) diagnoseCorruption should not throw Unexpected type of BlockId for ShuffleBlockBatchId

2023-07-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748400#comment-17748400
 ] 

Snoot.io commented on SPARK-43242:
--

User 'CavemanIV' has created a pull request for this issue:
https://github.com/apache/spark/pull/40921

> diagnoseCorruption should not throw Unexpected type of BlockId for 
> ShuffleBlockBatchId
> --
>
> Key: SPARK-43242
> URL: https://issues.apache.org/jira/browse/SPARK-43242
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.4
>Reporter: Zhang Liang
>Priority: Minor
>
> Some of our Spark apps throw an "Unexpected type of BlockId" exception, as 
> shown below.
> According to BlockId.scala, a block ID such as *shuffle_12_5868_518_523* is a 
> `ShuffleBlockBatchId`, which is not handled properly in 
> `ShuffleBlockFetcherIterator.diagnoseCorruption`.
>  
> Moreover, the new exception thrown in `diagnose` swallows the real exception 
> in certain cases. Since diagnoseCorruption is only ever invoked as a 
> supplementary step during exception handling, I think it shouldn't throw 
> exceptions at all.
>  
> {code:java}
> 23/03/07 03:01:24,485 [task-result-getter-1] WARN TaskSetManager: Lost task 
> 104.0 in stage 36.0 (TID 6169): java.lang.IllegalArgumentException: 
> Unexpected type of BlockId, shuffle_12_5868_518_523 at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.diagnoseCorruption(ShuffleBlockFetcherIterator.scala:1079)at
>  
> org.apache.spark.storage.BufferReleasingInputStream.$anonfun$tryOrFetchFailedException$1(ShuffleBlockFetcherIterator.scala:1314)
>  at scala.Option.map(Option.scala:230)at 
> org.apache.spark.storage.BufferReleasingInputStream.tryOrFetchFailedException(ShuffleBlockFetcherIterator.scala:1313)
>  at 
> org.apache.spark.storage.BufferReleasingInputStream.read(ShuffleBlockFetcherIterator.scala:1299)
>  at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) at 
> java.io.BufferedInputStream.read1(BufferedInputStream.java:286) at 
> java.io.BufferedInputStream.read(BufferedInputStream.java:345) at 
> java.io.DataInputStream.read(DataInputStream.java:149) at 
> org.sparkproject.guava.io.ByteStreams.read(ByteStreams.java:899) at 
> org.sparkproject.guava.io.ByteStreams.readFully(ByteStreams.java:733) at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:127)
>  at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:496) at 
> scala.collection.Iterator$$anon$10.next(Iterator.scala:461) at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29) at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40) 
> at scala.collection.Iterator$$anon$10.next(Iterator.scala:461) at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.sort_addToSorter_0$(Unknown
>  Source) at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown
>  Source) at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>  at 
> org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:82)
>  at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:1065)
>  at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextOuterJoinRows(SortMergeJoinExec.scala:1024)
>  at 
> org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceStream(SortMergeJoinExec.scala:1201)
>  at 
> org.apache.spark.sql.execution.joins.OneSideOuterIterator.advanceNext(SortMergeJoinExec.scala:1240)
>  at 
> org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:67)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage9.processNext(Unknown
>  Source) at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
>  at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:225)
>  at 
> org.apache.spark.sql.execution.SortExec.$anonfun$doExecute$1(SortExec.scala:119)
>  at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898) 
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
>  at 
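To illustrate the block-naming schemes at issue (the regexes mirror the patterns in BlockId.scala; the snippet itself is illustrative only):

{code:python}
import re

# shuffle_<shuffleId>_<mapId>_<reduceId> -> ShuffleBlockId
SHUFFLE = re.compile(r"shuffle_(\d+)_(\d+)_(\d+)$")
# shuffle_<shuffleId>_<mapId>_<startReduceId>_<endReduceId> -> ShuffleBlockBatchId
SHUFFLE_BATCH = re.compile(r"shuffle_(\d+)_(\d+)_(\d+)_(\d+)$")

block = "shuffle_12_5868_518_523"
print(bool(SHUFFLE.match(block)))        # False
print(bool(SHUFFLE_BATCH.match(block)))  # True: the form diagnoseCorruption missed
{code}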

[jira] [Commented] (SPARK-44579) Support Interrupt On Cancel in SQLExecution

2023-07-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748398#comment-17748398
 ] 

Snoot.io commented on SPARK-44579:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/42199

> Support Interrupt On Cancel in SQLExecution
> ---
>
> Key: SPARK-44579
> URL: https://issues.apache.org/jira/browse/SPARK-44579
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Priority: Major
>
> Currently, we support interrupting task threads for users through 1) APIs in 
> the Spark core module and 2) a Thrift config for the SQL module. Other Spark 
> SQL applications have only limited access to this functionality; 
> specifically, the built-in spark-sql shell lacks a user-controlled knob for 
> interrupting task threads.






[jira] [Commented] (SPARK-44239) Free memory allocated by large vectors when vectors are reset

2023-07-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748396#comment-17748396
 ] 

Snoot.io commented on SPARK-44239:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/41782

> Free memory allocated by large vectors when vectors are reset
> -
>
> Key: SPARK-44239
> URL: https://issues.apache.org/jira/browse/SPARK-44239
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Wan Kun
>Priority: Major
> Attachments: image-2023-06-29-12-58-12-256.png, 
> image-2023-06-29-13-03-15-470.png
>
>
> When Spark reads a data file into a WritableColumnVector, the memory 
> allocated by the WritableColumnVectors is not freed until the 
> VectorizedColumnReader completes.
> Reusing the allocated array objects saves allocation time, but it also holds 
> on to too much unused memory after a large vector batch has been read.
> Add a memory reserve policy for this scenario that reuses the allocated 
> array objects for small column vectors and frees the memory for huge column 
> vectors.
> !image-2023-06-29-12-58-12-256.png!!image-2023-06-29-13-03-15-470.png!
>  
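A toy sketch of such a reserve policy (names and the threshold are hypothetical, not the actual WritableColumnVector internals):

{code:python}
HUGE_VECTOR_THRESHOLD = 1_000_000  # assumed cutoff, in bytes
DEFAULT_CAPACITY = 16

class ToyVector:
    def __init__(self):
        self._buf = bytearray(DEFAULT_CAPACITY)
        self._len = 0

    def reset(self):
        # Free a huge buffer so its memory can be reclaimed immediately;
        # keep a small buffer so its capacity is reused by the next batch.
        if len(self._buf) > HUGE_VECTOR_THRESHOLD:
            self._buf = bytearray(DEFAULT_CAPACITY)
        self._len = 0
{code}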






[jira] [Commented] (SPARK-44239) Free memory allocated by large vectors when vectors are reset

2023-07-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748395#comment-17748395
 ] 

Snoot.io commented on SPARK-44239:
--

User 'wankunde' has created a pull request for this issue:
https://github.com/apache/spark/pull/41782

> Free memory allocated by large vectors when vectors are reset
> -
>
> Key: SPARK-44239
> URL: https://issues.apache.org/jira/browse/SPARK-44239
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Wan Kun
>Priority: Major
> Attachments: image-2023-06-29-12-58-12-256.png, 
> image-2023-06-29-13-03-15-470.png
>
>
> When Spark reads a data file into a WritableColumnVector, the memory 
> allocated by the WritableColumnVectors is not freed until the 
> VectorizedColumnReader completes.
> Reusing the allocated array objects saves allocation time, but it also holds 
> on to too much unused memory after a large vector batch has been read.
> Add a memory reserve policy for this scenario that reuses the allocated 
> array objects for small column vectors and frees the memory for huge column 
> vectors.
> !image-2023-06-29-12-58-12-256.png!!image-2023-06-29-13-03-15-470.png!
>  






[jira] [Created] (SPARK-44579) Support Interrupt On Cancel in SQLExecution

2023-07-27 Thread Kent Yao (Jira)
Kent Yao created SPARK-44579:


 Summary: Support Interrupt On Cancel in SQLExecution
 Key: SPARK-44579
 URL: https://issues.apache.org/jira/browse/SPARK-44579
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Kent Yao


Currently, we support interrupting task threads for users through 1) APIs in 
the Spark core module and 2) a Thrift config for the SQL module. Other Spark 
SQL applications have only limited access to this functionality; specifically, 
the built-in spark-sql shell lacks a user-controlled knob for interrupting 
task threads.
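For reference, the core-module knob mentioned in 1) looks roughly like this from PySpark (a sketch of existing behavior, not of the proposed SQLExecution change):

{code:python}
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

# Task threads of jobs in this group are interrupted (Thread.interrupt())
# when the group is cancelled.
sc.setJobGroup("my-group", "interruptible work", interruptOnCancel=True)
# ... trigger actions here ...
sc.cancelJobGroup("my-group")
{code}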






[jira] [Commented] (SPARK-44554) Install different Python linter dependencies for daily testing of different Spark versions

2023-07-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748390#comment-17748390
 ] 

Snoot.io commented on SPARK-44554:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/42167

> Install different Python linter dependencies for daily testing of different 
> Spark versions
> --
>
> Key: SPARK-44554
> URL: https://issues.apache.org/jira/browse/SPARK-44554
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>
> Fix the daily Python lint check failures for branches 3.3 and 3.4
>  
> 3.4 : 
> [https://github.com/apache/spark/actions/runs/5654787844/job/15318633266]
> 3.3 : https://github.com/apache/spark/actions/runs/5653655970/job/15315236052






[jira] [Commented] (SPARK-44287) Define the computing logic through PartitionEvaluator API and use it in RowToColumnarExec & ColumnarToRowExec SQL operators.

2023-07-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748389#comment-17748389
 ] 

Snoot.io commented on SPARK-44287:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/42185

> Define the computing logic through  PartitionEvaluator API and use it in 
> RowToColumnarExec & ColumnarToRowExec SQL operators.
> -
>
> Key: SPARK-44287
> URL: https://issues.apache.org/jira/browse/SPARK-44287
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Assignee: Vinod KC
>Priority: Major
> Fix For: 3.5.0
>
>
>    
> Define the computing logic through PartitionEvaluator API and use it in 
> RowToColumnarExec & ColumnarToRowExec SQL operators.






[jira] [Commented] (SPARK-44567) Daily GA for Maven testing

2023-07-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748387#comment-17748387
 ] 

Snoot.io commented on SPARK-44567:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/42197

> Daily GA for Maven testing
> --
>
> Key: SPARK-44567
> URL: https://issues.apache.org/jira/browse/SPARK-44567
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-44567) Daily GA for Maven testing

2023-07-27 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748388#comment-17748388
 ] 

Snoot.io commented on SPARK-44567:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/42197

> Daily GA for Maven testing
> --
>
> Key: SPARK-44567
> URL: https://issues.apache.org/jira/browse/SPARK-44567
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Resolved] (SPARK-44558) Export Pyspark's Spark Connect Log Level

2023-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44558.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42175
[https://github.com/apache/spark/pull/42175]

> Export Pyspark's Spark Connect Log Level
> 
>
> Key: SPARK-44558
> URL: https://issues.apache.org/jira/browse/SPARK-44558
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.1
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Minor
> Fix For: 3.5.0, 4.0.0
>
>
> Export the Spark Connect log level as an API function






[jira] [Assigned] (SPARK-44558) Export Pyspark's Spark Connect Log Level

2023-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44558:


Assignee: Alice Sayutina

> Export Pyspark's Spark Connect Log Level
> 
>
> Key: SPARK-44558
> URL: https://issues.apache.org/jira/browse/SPARK-44558
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.1
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Minor
>
> Export the Spark Connect log level as an API function






[jira] [Updated] (SPARK-44542) eagerly load SparkExitCode class in SparkUncaughtExceptionHandler

2023-07-27 Thread YE (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

YE updated SPARK-44542:
---
Summary: eagerly load SparkExitCode class in SparkUncaughtExceptionHandler  
(was: easily load SparkExitCode class in SparkUncaughtExceptionHandler)

> eagerly load SparkExitCode class in SparkUncaughtExceptionHandler
> -
>
> Key: SPARK-44542
> URL: https://issues.apache.org/jira/browse/SPARK-44542
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.3, 3.3.2, 3.4.1
>Reporter: YE
>Priority: Major
> Attachments: image-2023-07-25-16-46-03-989.png, 
> image-2023-07-25-16-46-28-158.png, image-2023-07-25-16-46-42-522.png
>
>
> There are two pieces of background for this improvement proposal:
> 1. When running Spark on YARN, a disk might become corrupted while the 
> application is running. The corrupted disk might contain the Spark jars (the 
> cached archive from spark.yarn.archive). In that case, the executor JVM 
> cannot load any Spark-related classes any more.
> 2. Spark leverages the OutputCommitCoordinator to avoid data races between 
> speculative tasks, so that no two tasks can commit the same partition at the 
> same time. In other words, once a task's commit request is allowed, other 
> commit requests are denied until the committing task fails.
>  
> We encountered a corner case combining the two situations above, which makes 
> Spark hang. A short timeline:
>  # Task 5372 (tid: 21662) starts running at 21:55.
>  # The disk containing the Spark archive for that task/executor becomes 
> corrupted around 22:00, making the archive inaccessible from the executor 
> JVM's perspective.
>  # The task continues running; at 22:05 it requests a commit from the 
> coordinator and performs the commit.
>  # However, due to the corrupted disk, an exception is raised in the 
> executor JVM.
>  # The SparkUncaughtExceptionHandler kicks in; however, as the jar/disk is 
> corrupted, the handler itself throws an exception, and the halt process 
> throws an exception too.
>  # The executor hangs with no more tasks running, yet the authorized commit 
> request is still considered valid on the driver side.
>  # Speculative tasks start to kick in, but since commit permission is never 
> released, all speculative tasks are killed/denied.
>  # The job hangs until our SRE kills the container from outside.
> Some screenshots are provided below.
> !image-2023-07-25-16-46-03-989.png!
> !image-2023-07-25-16-46-28-158.png!
> !image-2023-07-25-16-46-42-522.png!
> For this specific case, I'd like to propose eagerly loading the SparkExitCode 
> class in the SparkUncaughtExceptionHandler, so that the halt process can be 
> executed rather than throwing an exception when SparkExitCode is not 
> loadable in the scenario above.






[jira] [Created] (SPARK-44578) Support pushing down UDFs in DSv2

2023-07-27 Thread Holden Karau (Jira)
Holden Karau created SPARK-44578:


 Summary: Support pushing down UDFs in DSv2
 Key: SPARK-44578
 URL: https://issues.apache.org/jira/browse/SPARK-44578
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0, 4.0.0
Reporter: Holden Karau
Assignee: Holden Karau


We should consider adding support for pushing down UDFs to the storage 
engine. While most of the time this might not make sense, some storage engines 
expose their own UDFs, such as bucketing or day transforms, which we would 
ideally push down to them.






[jira] [Resolved] (SPARK-44198) Support propagation of the log level to the executors

2023-07-27 Thread Attila Zsolt Piros (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Zsolt Piros resolved SPARK-44198.

Fix Version/s: 4.0.0
 Assignee: Vinod KC
   Resolution: Fixed

Issue resolved by pull request 41746
https://github.com/apache/spark/pull/41746

> Support propagation of the log level to the executors
> -
>
> Key: SPARK-44198
> URL: https://issues.apache.org/jira/browse/SPARK-44198
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Assignee: Vinod KC
>Priority: Minor
> Fix For: 4.0.0
>
>
> Currently, the *sc.setLogLevel()* method only sets the log level on the Spark 
> driver, failing to reflect the desired log level on the executors. This 
> inconsistency can lead to difficulties in debugging and monitoring Spark 
> applications, as log messages from the executors may not align with the 
> expected log level set in the user code.
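A minimal sketch of the call in question (an existing API; with this change the level should also take effect on the executors):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Previously only driver logging was affected; with SPARK-44198 the
# level is propagated to the executors as well.
spark.sparkContext.setLogLevel("DEBUG")
{code}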






[jira] [Created] (SPARK-44577) INSERT BY NAME returns nonsensical error message

2023-07-27 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-44577:


 Summary: INSERT BY NAME returns nonsensical error message
 Key: SPARK-44577
 URL: https://issues.apache.org/jira/browse/SPARK-44577
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


CREATE TABLE bug(c1 INT);

INSERT INTO bug BY NAME SELECT 1 AS c2;

==> Multi-part identifier cannot be empty.






[jira] [Updated] (SPARK-44425) Validate that session_id is a UUID

2023-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-44425:
-
Fix Version/s: 4.0.0

> Validate that session_id is a UUID
> ---
>
> Key: SPARK-44425
> URL: https://issues.apache.org/jira/browse/SPARK-44425
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Add validation that session_id is a UUID. This is currently the case in the 
> clients, so we could make it a requirement.
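A sketch of the kind of check being proposed (illustrative only; the actual validation lives in the Connect server code path):

{code:python}
import uuid

def validate_session_id(session_id: str) -> None:
    # Raises ValueError when the string is not a valid UUID.
    try:
        uuid.UUID(session_id)
    except ValueError:
        raise ValueError(f"session_id must be a UUID, got {session_id!r}")
{code}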






[jira] [Resolved] (SPARK-44425) Validate that session_id is a UUID

2023-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44425.
--
Fix Version/s: 3.5.0
 Assignee: Juliusz Sompolski
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/42150

> Validate that session_id is a UUID
> ---
>
> Key: SPARK-44425
> URL: https://issues.apache.org/jira/browse/SPARK-44425
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
> Fix For: 3.5.0
>
>
> Add validation that session_id is a UUID. This is currently the case in the 
> clients, so we could make it a requirement.






[jira] [Commented] (SPARK-44547) BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks to fallback storage

2023-07-27 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748327#comment-17748327
 ] 

Ignite TC Bot commented on SPARK-44547:
---

User 'ukby1234' has created a pull request for this issue:
https://github.com/apache/spark/pull/42155

> BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks 
> to fallback storage
> -
>
> Key: SPARK-44547
> URL: https://issues.apache.org/jira/browse/SPARK-44547
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Frank Yin
>Priority: Major
> Attachments: spark-error.log
>
>
> Looks like the RDD cache doesn't support fallback storage and we should stop 
> the migration if the only viable peer is the fallback storage. 
>   [^spark-error.log] 23/07/25 05:12:58 WARN BlockManager: Failed to replicate 
> rdd_18_25 to BlockManagerId(fallback, remote, 7337, None), failure #0
> java.io.IOException: Failed to connect to remote:7337
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
>   at 
> org.apache.spark.network.netty.NettyBlockTransferService.uploadBlock(NettyBlockTransferService.scala:168)
>   at 
> org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:121)
>   at 
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1784)
>   at 
> org.apache.spark.storage.BlockManager.$anonfun$replicateBlock$2(BlockManager.scala:1721)
>   at 
> org.apache.spark.storage.BlockManager.$anonfun$replicateBlock$2$adapted(BlockManager.scala:1707)
>   at scala.Option.forall(Option.scala:390)
>   at 
> org.apache.spark.storage.BlockManager.replicateBlock(BlockManager.scala:1707)
>   at 
> org.apache.spark.storage.BlockManagerDecommissioner.migrateBlock(BlockManagerDecommissioner.scala:356)
>   at 
> org.apache.spark.storage.BlockManagerDecommissioner.$anonfun$decommissionRddCacheBlocks$3(BlockManagerDecommissioner.scala:340)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.storage.BlockManagerDecommissioner.decommissionRddCacheBlocks(BlockManagerDecommissioner.scala:339)
>   at 
> org.apache.spark.storage.BlockManagerDecommissioner$$anon$1.run(BlockManagerDecommissioner.scala:214)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>   at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.net.UnknownHostException: remote
>   at java.base/java.net.InetAddress$CachedAddresses.get(Unknown Source)
>   at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
>   at java.base/java.net.InetAddress.getAllByName(Unknown Source)
>   at java.base/java.net.InetAddress.getAllByName(Unknown Source)
>   at java.base/java.net.InetAddress.getByName(Unknown Source)
>   at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
>   at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at 
> io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
>   at 
> io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
>   at 
> io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
>   at 
> io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
>   at 
> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
>   at 
> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
>   at 
> 

[jira] [Resolved] (SPARK-44560) Improve tests and documentation for Arrow Python UDF

2023-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng resolved SPARK-44560.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42178
[https://github.com/apache/spark/pull/42178]

> Improve tests and documentation for Arrow Python UDF
> 
>
> Key: SPARK-44560
> URL: https://issues.apache.org/jira/browse/SPARK-44560
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> Test on complex return type
> Remove complex return type constraints for Arrow Python UDF on Spark Connect
> Update documentation of the related Spark conf
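A hedged example of the case being tested: an Arrow-optimized Python UDF with a complex (struct) return type (illustrative names; requires PyArrow):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

@udf(returnType="struct<id:int,label:string>", useArrow=True)
def tag(i: int):
    return (i, f"label-{i}")

spark.range(3).select(tag("id").alias("tagged")).show()
{code}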






[jira] [Assigned] (SPARK-44560) Improve tests and documentation for Arrow Python UDF

2023-07-27 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng reassigned SPARK-44560:


Assignee: Xinrong Meng

> Improve tests and documentation for Arrow Python UDF
> 
>
> Key: SPARK-44560
> URL: https://issues.apache.org/jira/browse/SPARK-44560
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Test on complex return type
> Remove complex return type constraints for Arrow Python UDF on Spark Connect
> Update documentation of the related Spark conf






[jira] [Created] (SPARK-44576) Session Artifact update breaks XXWithState methods in KVGDS

2023-07-27 Thread Zhen Li (Jira)
Zhen Li created SPARK-44576:
---

 Summary: Session Artifact update breaks XXWithState methods in 
KVGDS
 Key: SPARK-44576
 URL: https://issues.apache.org/jira/browse/SPARK-44576
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.5.0
Reporter: Zhen Li


When changing the client test jar from the system classloader to the session 
classloader 
(https://github.com/apache/spark/compare/master...zhenlineo:spark:streaming-artifacts?expand=1),
all XXWithState test suites failed with class-loader errors, e.g.
```
23/07/25 16:13:14 WARN TaskSetManager: Lost task 1.0 in stage 2.0 (TID 16) 
(10.8.132.125 executor driver): TaskKilled (Stage cancelled: Job aborted due to 
stage failure: Task 170 in stage 2.0 failed 1 times, most recent failure: Lost 
task 170.0 in stage 2.0 (TID 14) (10.8.132.125 executor driver): 
java.lang.ClassCastException: class org.apache.spark.sql.streaming.ClickState 
cannot be cast to class org.apache.spark.sql.streaming.ClickState 
(org.apache.spark.sql.streaming.ClickState is in unnamed module of loader 
org.apache.spark.util.MutableURLClassLoader @2c604965; 
org.apache.spark.sql.streaming.ClickState is in unnamed module of loader 
java.net.URLClassLoader @57751f4)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.$anonfun$run$1(WriteToDataSourceV2Exec.scala:441)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1514)
at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run(WriteToDataSourceV2Exec.scala:486)
at 
org.apache.spark.sql.execution.datasources.v2.WritingSparkTask.run$(WriteToDataSourceV2Exec.scala:425)
at 
org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:491)
at 
org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.$anonfun$writeWithV2$2(WriteToDataSourceV2Exec.scala:388)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:592)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1480)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:595)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)

Driver stacktrace:)
23/07/25 16:13:14 ERROR Utils: Aborting task
java.lang.IllegalStateException: Error committing version 1 into 
HDFSStateStore[id=(op=0,part=5),dir=file:/private/var/folders/b0/f9jmmrrx5js7xsswxyf58nwrgp/T/temporary-02cca002-e189-4e32-afd8-964d6f8d5056/state/0/5]
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.commit(HDFSBackedStateStoreProvider.scala:148)
at 
org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExecBase.$anonfun$processDataWithPartition$4(FlatMapGroupsWithStateExec.scala:183)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:611)
at 
org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:179)
at 
org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs$(statefulOperators.scala:179)
at 
org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExec.timeTakenMs(FlatMapGroupsWithStateExec.scala:374)
at 
org.apache.spark.sql.execution.streaming.FlatMapGroupsWithStateExecBase.$anonfun$processDataWithPartition$3(FlatMapGroupsWithStateExec.scala:183)
at 
org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:47)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:36)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage4.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 

[jira] [Updated] (SPARK-44479) Support Python UDTFs with empty schema

2023-07-27 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44479:
--
Fix Version/s: 3.5.0

> Support Python UDTFs with empty schema
> --
>
> Key: SPARK-44479
> URL: https://issues.apache.org/jira/browse/SPARK-44479
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.5.0
>
>
> Support UDTFs with empty schema, for example:
> {code:python}
> >>> class TestUDTF:
> ...   def eval(self):
> ... yield tuple()
> {code}
> Currently it fails with `useArrow=True`:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType())().collect()
> Traceback (most recent call last):
> ...
> ValueError: not enough values to unpack (expected 2, got 0)
> {code}
> whereas without Arrow:
> {code:python}
> >>> udtf(TestUDTF, returnType=StructType(), useArrow=False)().collect()
> [Row()]
> {code}
> Otherwise, we should raise an error without Arrow, too, to be consistent.






[jira] [Resolved] (SPARK-43968) Improve error messages for Python UDTFs with wrong number of outputs

2023-07-27 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-43968.
---
Fix Version/s: 4.0.0
 Assignee: Allison Wang
   Resolution: Fixed

Issue resolved by pull request 42157
https://github.com/apache/spark/pull/42157

> Improve error messages for Python UDTFs with wrong number of outputs
> 
>
> Key: SPARK-43968
> URL: https://issues.apache.org/jira/browse/SPARK-43968
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> Improve the error messages for Python UDTFs when the number of outputs does 
> not match the number of outputs specified in the UDTF's return type.
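An illustrative repro of the mismatch (a hypothetical UDTF, not from the PR):

{code:python}
from pyspark.sql.functions import udtf

@udtf(returnType="a: int, b: int")
class TwoColumns:
    def eval(self):
        # Declares two output columns but yields one value per row; this
        # arity mismatch is what the improved error message targets.
        yield (1,)
{code}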






[jira] [Created] (SPARK-44575) Implement Error Translation

2023-07-27 Thread Yihong He (Jira)
Yihong He created SPARK-44575:
-

 Summary: Implement Error Translation
 Key: SPARK-44575
 URL: https://issues.apache.org/jira/browse/SPARK-44575
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.5.0
Reporter: Yihong He









[jira] [Updated] (SPARK-44559) Improve error messages for Python UDTF arrow type casts

2023-07-27 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-44559:
-
Summary: Improve error messages for Python UDTF arrow type casts  (was: 
Improve error messages for invalid Python UDTF arrow type casts)

> Improve error messages for Python UDTF arrow type casts
> ---
>
> Key: SPARK-44559
> URL: https://issues.apache.org/jira/browse/SPARK-44559
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
>
> Currently, if a Python UDTF outputs a type that is incompatible with the 
> specified output schema, Spark will throw the following confusing error 
> message:
> {code:java}
>   File "pyarrow/array.pxi", line 1044, in pyarrow.lib.Array.from_pandas
>   File "pyarrow/array.pxi", line 316, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
>   File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Could not convert [1, 2] with type list: tried to 
> convert to int32{code}
> We should improve this.
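A minimal repro sketch matching the error above (assumed shape: a list yielded where the schema declares an int):

{code:python}
from pyspark.sql.functions import udtf

@udtf(returnType="x: int", useArrow=True)
class BadCast:
    def eval(self):
        # A Python list cannot be cast to int32, triggering the opaque
        # pyarrow.lib.ArrowInvalid error quoted above.
        yield ([1, 2],)
{code}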



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44574) Errors that moved into sql/api should also use Analysis

2023-07-27 Thread Rui Wang (Jira)
Rui Wang created SPARK-44574:


 Summary: Errors that moved into sql/api should also use Analysis
 Key: SPARK-44574
 URL: https://issues.apache.org/jira/browse/SPARK-44574
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, SQL
Affects Versions: 3.5.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Resolved] (SPARK-44507) SCSC does not depend on AnalysisException

2023-07-27 Thread Rui Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang resolved SPARK-44507.
--
Resolution: Won't Fix

> SCSC does not depend on AnalysisException
> -
>
> Key: SPARK-44507
> URL: https://issues.apache.org/jira/browse/SPARK-44507
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Created] (SPARK-44573) Couldn't submit Spark application to Kubernetes in version v1.27.3

2023-07-27 Thread Siddaraju G C (Jira)
Siddaraju G C created SPARK-44573:
-

 Summary: Couldn't submit Spark application to Kubernetes in 
version v1.27.3
 Key: SPARK-44573
 URL: https://issues.apache.org/jira/browse/SPARK-44573
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Spark Submit
Affects Versions: 3.4.1
Reporter: Siddaraju G C


Spark-submit (cluster mode on Kubernetes) results in an 
*io.fabric8.kubernetes.client.KubernetesClientException* error on my 3-node 
k8s cluster.

Steps followed:
 * Using IBM Cloud, created 3 instances
 * The 1st instance acts as the master node and the other two act as worker nodes

 
{noformat}
root@vsi-spark-master:/opt# kubectl get nodes
NAME                 STATUS   ROLES                  AGE   VERSION
vsi-spark-master     Ready    control-plane,master   2d    v1.27.3+k3s1
vsi-spark-worker-1   Ready    <none>                 47h   v1.27.3+k3s1
vsi-spark-worker-2   Ready    <none>                 47h   v1.27.3+k3s1{noformat}
 * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder
 * Ran Spark using the command below

 
{noformat}
root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master 
k8s://http://:6443 --conf 
spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode 
cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf 
spark.executor.instances=5 --conf 
spark.kubernetes.authenticate.driver.serviceAccountName=spark  --conf 
spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB 
local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat}
 * And got the error message below.

{noformat}
23/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS.
23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
client using current context from users K8S config file
23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified a 
krb5.conf file locally or via a ConfigMap. Make sure that you have the 
krb5.conf locally on the driver image.
23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" 
first. It should be yes.
Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
    at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129)
    at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122)
    at 
io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44)
    at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
    at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
    at 
org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153)
    at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250)
    at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786)
    at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244)
    at 
org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216)
    at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
    at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Connection reset
    at 
io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535)
    at 
io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558)
    at 
io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:349)
    at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:711)
    at 
io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:93)
    at 
io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)
    ... 15 more
Caused by: java.net.SocketException: Connection reset
    at 

[jira] [Assigned] (SPARK-44505) DataSource v2 Scans should not require planning the input partitions on explain

2023-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-44505:
---

Assignee: Martin Grund

> DataSource v2 Scans should not require planning the input partitions on 
> explain
> ---
>
> Key: SPARK-44505
> URL: https://issues.apache.org/jira/browse/SPARK-44505
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>
> Right now, we always call `planInputPartitions()` for a DSv2 implementation 
> even when no Spark job is run and the plan is only explained.
> We should provide a way to avoid planning all input partitions just to 
> determine whether the input is columnar or not. The scan should provide an 
> override.






[jira] [Resolved] (SPARK-44505) DataSource v2 Scans should not require planning the input partitions on explain

2023-07-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44505.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 42099
[https://github.com/apache/spark/pull/42099]

> DataSource v2 Scans should not require planning the input partitions on 
> explain
> ---
>
> Key: SPARK-44505
> URL: https://issues.apache.org/jira/browse/SPARK-44505
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.5.0
>
>
> Right now, we always call `planInputPartitions()` for a DSv2 implementation 
> even when no Spark job is run and the plan is only explained.
> We should provide a way to avoid planning all input partitions just to 
> determine whether the input is columnar or not. The scan should provide an 
> override.






[jira] [Commented] (SPARK-44567) Daily GA for Maven testing

2023-07-27 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748109#comment-17748109
 ] 

Yang Jie commented on SPARK-44567:
--

I've tried this [before|https://github.com/apache/spark/pull/41529]; let me 
take a look at this again.

> Daily GA for Maven testing
> --
>
> Key: SPARK-44567
> URL: https://issues.apache.org/jira/browse/SPARK-44567
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-44566) Spark CI Improvement

2023-07-27 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748101#comment-17748101
 ] 

Ruifeng Zheng commented on SPARK-44566:
---

also cc [~panbingkun] [~dongjoon] [~yikunkero]

> Spark CI Improvement
> 
>
> Key: SPARK-44566
> URL: https://issues.apache.org/jira/browse/SPARK-44566
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> I had an offline discussion with [~gurwls223] and [~LuciferYang], and we 
> think that several points should be improved:
> # it should be tested with Maven
> # all supported Python versions should be tested
> # clean up unused files ASAP, since the testing resources are quite limited
> To avoid increasing the workload too much, we can add daily GA first.






[jira] [Updated] (SPARK-44566) Spark CI Improvement

2023-07-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-44566:
--
Description: 
I had an offline discussion with [~gurwls223] and [~LuciferYang], and we think 
that several points should be improved:

# it should be tested with Maven
# all supported Python versions should be tested
# clean up unused files ASAP, since the testing resources are quite limited


To avoid increasing the workload too much, we can add daily GA first.

  was:
I had an offline discussion with [~gurwls223] and [~LuciferYang], and we think 
that several points should be improved:

# it should be tested with Maven
# all supported Python versions should be tested


To avoid increasing the workload too much, we can add daily GA first.


> Spark CI Improvement
> 
>
> Key: SPARK-44566
> URL: https://issues.apache.org/jira/browse/SPARK-44566
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> I had an offline discussion with [~gurwls223] and [~LuciferYang], and we 
> think that several points should be improved:
> # it should be tested with Maven
> # all supported Python versions should be tested
> # clean up unused files ASAP, since the testing resources are quite limited
> To avoid increasing the workload too much, we can add daily GA first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44572) Clean up unused files ASAP

2023-07-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44572:
-

 Summary: Clean up unused files ASAP
 Key: SPARK-44572
 URL: https://issues.apache.org/jira/browse/SPARK-44572
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44566) Spark CI Improvement

2023-07-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-44566:
--
Description: 
I had an offline discussion with [~gurwls223] and [~LuciferYang], and we think 
that several points should be improved:

# Spark should also be tested with Maven
# all supported Python versions should be tested


To avoid increasing the workload too much, we can add daily GA jobs first.

> Spark CI Improvement
> 
>
> Key: SPARK-44566
> URL: https://issues.apache.org/jira/browse/SPARK-44566
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Project Infra, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> I had an offline discussion with [~gurwls223] and [~LuciferYang], and we 
> think that several points should be improved:
> # Spark should also be tested with Maven
> # all supported Python versions should be tested
> To avoid increasing the workload too much, we can add daily GA jobs first.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44571) Eliminate the Join by combining multiple Aggregates

2023-07-27 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44571:
---
Summary: Eliminate the Join by combining multiple Aggregates  (was: Eliminate 
the Join by Combining multiple Aggregates)

> Eliminate the Join by combining multiple Aggregates
> -
>
> Key: SPARK-44571
> URL: https://issues.apache.org/jira/browse/SPARK-44571
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Recently, I investigated test case q28, which belongs to the TPC-DS 
> queries.
> The query contains multiple scalar subqueries with aggregation, connected 
> by inner joins.
> If we can merge the filters and aggregates, we can scan the data source 
> only once and eliminate the join, avoiding a shuffle. Obviously, this 
> change will improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44571) Eliminate the Join by Combining multiple Aggregates

2023-07-27 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17748084#comment-17748084
 ] 

jiaan.geng commented on SPARK-44571:


I'm working on it.

> Eliminate the Join by Combining multiple Aggregates
> -
>
> Key: SPARK-44571
> URL: https://issues.apache.org/jira/browse/SPARK-44571
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Recently, I investigated test case q28, which belongs to the TPC-DS 
> queries.
> The query contains multiple scalar subqueries with aggregation, connected 
> by inner joins.
> If we can merge the filters and aggregates, we can scan the data source 
> only once and eliminate the join, avoiding a shuffle. Obviously, this 
> change will improve performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44571) Eliminate the Join by Combining multiple Aggregates

2023-07-27 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44571:
--

 Summary: Eliminate the Join by Combining multiple Aggregates
 Key: SPARK-44571
 URL: https://issues.apache.org/jira/browse/SPARK-44571
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


Recently, I investigated test case q28, which belongs to the TPC-DS queries.

The query contains multiple scalar subqueries with aggregation, connected by 
inner joins.
If we can merge the filters and aggregates, we can scan the data source only 
once and eliminate the join, avoiding a shuffle. Obviously, this change will 
improve performance.
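
To make the intent concrete, below is a minimal sketch of the rewrite (the 
table, data, and column names are illustrative assumptions only, not the 
actual optimizer rule): two scalar aggregates over the same table, first 
written as separate scans glued together by a join, then combined into a 
single scan with conditional aggregation.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CombineAggregatesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the table that q28 scans repeatedly.
    val items = Seq((1, 10.0), (2, 95.0), (3, 42.0)).toDF("id", "price")

    // q28-like shape: each scalar aggregate scans `items` on its own, and the
    // single-row results are glued together with a join.
    val low    = items.filter($"price" < 50).agg(avg($"price").as("avg_low"))
    val high   = items.filter($"price" >= 50).agg(avg($"price").as("avg_high"))
    val joined = low.crossJoin(high)

    // Merged shape: one scan and one aggregate. Each subquery's filter turns
    // into a when(...) guard; avg ignores the nulls the guard produces, so
    // the join and its shuffle disappear.
    val combined = items.agg(
      avg(when($"price" < 50, $"price")).as("avg_low"),
      avg(when($"price" >= 50, $"price")).as("avg_high"))

    joined.show()
    combined.show()
    spark.stop()
  }
}
{code}

Both plans return the same single row; the merged form simply reaches it with 
one pass over the data.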



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44570) Reuse Spark build among pyspark-* modules

2023-07-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44570:
-

 Summary: Reuse Spark build among pyspark-* modules
 Key: SPARK-44570
 URL: https://issues.apache.org/jira/browse/SPARK-44570
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng


Every `pyspark-*` test module needs to build Spark with sbt/maven first, which 
normally takes 20~30 minutes.

Maybe we could build Spark once and then reuse the build in all related test 
modules.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44569) Daily GA for Python 3.11

2023-07-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44569:
-

 Summary: Daily GA for Python 3.11
 Key: SPARK-44569
 URL: https://issues.apache.org/jira/browse/SPARK-44569
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44568) Daily GA for Python 3.10

2023-07-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44568:
-

 Summary: Daily GA for Python 3.10
 Key: SPARK-44568
 URL: https://issues.apache.org/jira/browse/SPARK-44568
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44567) Daily GA for Maven testing

2023-07-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44567:
-

 Summary: Daily GA for Maven testing
 Key: SPARK-44567
 URL: https://issues.apache.org/jira/browse/SPARK-44567
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44566) Spark CI Improvement

2023-07-27 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44566:
-

 Summary: Spark CI Improvement
 Key: SPARK-44566
 URL: https://issues.apache.org/jira/browse/SPARK-44566
 Project: Spark
  Issue Type: Umbrella
  Components: Build, Project Infra, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44454) HiveShim getTablesByType support fallback

2023-07-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-44454:
---

Assignee: dzcxzl

> HiveShim getTablesByType support fallback
> -
>
> Key: SPARK-44454
> URL: https://issues.apache.org/jira/browse/SPARK-44454
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
>
> When we use a high version of the Hive client to communicate with a low 
> version of the Hive metastore, we may encounter Invalid method name: 
> 'get_tables_by_type'.
>  
> {code:java}
> 23/07/17 12:45:24,391 [main] DEBUG SparkSqlParser: Parsing command: show views
> 23/07/17 12:45:24,489 [main] ERROR log: Got exception: 
> org.apache.thrift.TApplicationException Invalid method name: 
> 'get_tables_by_type'
> org.apache.thrift.TApplicationException: Invalid method name: 
> 'get_tables_by_type'
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables_by_type(ThriftHiveMetastore.java:1433)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables_by_type(ThriftHiveMetastore.java:1418)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:1411)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>     at com.sun.proxy.$Proxy23.getTables(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2344)
>     at com.sun.proxy.$Proxy23.getTables(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByType(Hive.java:1427)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.spark.sql.hive.client.Shim_v2_3.getTablesByType(HiveShim.scala:1408)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$listTablesByType$1(HiveClientImpl.scala:789)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:274)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.listTablesByType(HiveClientImpl.scala:785)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listViews$1(HiveExternalCatalog.scala:895)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listViews(HiveExternalCatalog.scala:893)
>     at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listViews(ExternalCatalogWithListener.scala:158)
>     at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listViews(SessionCatalog.scala:1040)
>     at 
> org.apache.spark.sql.execution.command.ShowViewsCommand.$anonfun$run$5(views.scala:407)
>     at scala.Option.getOrElse(Option.scala:189)
>     at 
> org.apache.spark.sql.execution.command.ShowViewsCommand.run(views.scala:407) 
> {code}
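
A minimal sketch of the fallback idea (a hypothetical listViews helper, not 
the actual HiveShim patch): try the newer get_tables_by_type RPC first, and if 
an old metastore rejects it with the "Invalid method name" error above, fall 
back to the older get_tables call and keep only the views on the client side.

{code:scala}
import org.apache.hadoop.hive.metastore.TableType
import org.apache.hadoop.hive.ql.metadata.Hive

import scala.jdk.CollectionConverters._  // Scala 2.13-style converters

object GetTablesByTypeFallback {
  // Hypothetical helper around the Hive client; error handling trimmed.
  def listViews(hive: Hive, db: String, pattern: String): Seq[String] =
    try {
      // Fast path: a single RPC on metastores that implement it.
      hive.getTablesByType(db, pattern, TableType.VIRTUAL_VIEW).asScala.toSeq
    } catch {
      case e: Exception if Option(e.getMessage)
          .exists(_.contains("Invalid method name: 'get_tables_by_type'")) =>
        // Old metastores lack get_tables_by_type: list all matching tables,
        // then keep only those whose table type is a view.
        hive.getTablesByPattern(db, pattern).asScala.toSeq.filter { name =>
          hive.getTable(db, name).getTableType == TableType.VIRTUAL_VIEW
        }
    }
}
{code}

The fallback costs extra round trips, but it only runs after the cheap 
single-RPC path has failed on an old metastore.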



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44454) HiveShim getTablesByType support fallback

2023-07-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-44454.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42033
[https://github.com/apache/spark/pull/42033]

> HiveShim getTablesByType support fallback
> -
>
> Key: SPARK-44454
> URL: https://issues.apache.org/jira/browse/SPARK-44454
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Minor
> Fix For: 4.0.0
>
>
> When we use a high version of the Hive client to communicate with a low 
> version of the Hive metastore, we may encounter Invalid method name: 
> 'get_tables_by_type'.
>  
> {code:java}
> 23/07/17 12:45:24,391 [main] DEBUG SparkSqlParser: Parsing command: show views
> 23/07/17 12:45:24,489 [main] ERROR log: Got exception: 
> org.apache.thrift.TApplicationException Invalid method name: 
> 'get_tables_by_type'
> org.apache.thrift.TApplicationException: Invalid method name: 
> 'get_tables_by_type'
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_tables_by_type(ThriftHiveMetastore.java:1433)
>     at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_tables_by_type(ThriftHiveMetastore.java:1418)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:1411)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>     at com.sun.proxy.$Proxy23.getTables(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2344)
>     at com.sun.proxy.$Proxy23.getTables(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByType(Hive.java:1427)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.spark.sql.hive.client.Shim_v2_3.getTablesByType(HiveShim.scala:1408)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$listTablesByType$1(HiveClientImpl.scala:789)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:274)
>     at 
> org.apache.spark.sql.hive.client.HiveClientImpl.listTablesByType(HiveClientImpl.scala:785)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listViews$1(HiveExternalCatalog.scala:895)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108)
>     at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listViews(HiveExternalCatalog.scala:893)
>     at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listViews(ExternalCatalogWithListener.scala:158)
>     at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listViews(SessionCatalog.scala:1040)
>     at 
> org.apache.spark.sql.execution.command.ShowViewsCommand.$anonfun$run$5(views.scala:407)
>     at scala.Option.getOrElse(Option.scala:189)
>     at 
> org.apache.spark.sql.execution.command.ShowViewsCommand.run(views.scala:407) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44536) Upgrade sbt to 1.9.3

2023-07-27 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44536:


Assignee: BingKun Pan

> Upgrade sbt to 1.9.3
> 
>
> Key: SPARK-44536
> URL: https://issues.apache.org/jira/browse/SPARK-44536
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44536) Upgrade sbt to 1.9.3

2023-07-27 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44536.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42141
[https://github.com/apache/spark/pull/42141]

> Upgrade sbt to 1.9.3
> 
>
> Key: SPARK-44536
> URL: https://issues.apache.org/jira/browse/SPARK-44536
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44482) Connect server should allow specifying the bind address

2023-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44482:


Assignee: BingKun Pan

> Connect server should allow specifying the bind address
> --
>
> Key: SPARK-44482
> URL: https://issues.apache.org/jira/browse/SPARK-44482
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44482) Connect server should allow specifying the bind address

2023-07-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44482.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42073
[https://github.com/apache/spark/pull/42073]

> Connect server should allow specifying the bind address
> --
>
> Key: SPARK-44482
> URL: https://issues.apache.org/jira/browse/SPARK-44482
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44513) Upgrade snappy-java to 1.1.10.3

2023-07-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44513:

Fix Version/s: 3.4.2

> Upgrade snappy-java to 1.1.10.3
> ---
>
> Key: SPARK-44513
> URL: https://issues.apache.org/jira/browse/SPARK-44513
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Trivial
> Fix For: 3.4.2, 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44538) Remove ToJsonUtil

2023-07-27 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747779#comment-17747779
 ] 

Nikita Awasthi commented on SPARK-44538:


User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/42164

> Remove ToJsonUtil
> -
>
> Key: SPARK-44538
> URL: https://issues.apache.org/jira/browse/SPARK-44538
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org