date:20231017

[spark] branch master updated: [SPARK-44649][SQL] Runtime Filter supports passing equivalent creation side expressions

2023-10-17 Thread wenchen

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 103de914a5f [SPARK-44649][SQL] Runtime Filter supports passing 
equivalent creation side expressions
103de914a5f is described below

commit 103de914a5f96fccbe722663ee69c8ee7d9c8135
Author: Jiaan Geng 
AuthorDate: Wed Oct 18 14:55:51 2023 +0800

[SPARK-44649][SQL] Runtime Filter supports passing equivalent creation side 
expressions

### What changes were proposed in this pull request?
Currently, Spark runtime filter supports multi level shuffle join side as 
filter creation side. Please see: https://github.com/apache/spark/pull/39170. 
Although this feature adds the adaptive scene and improves the performance, 
there are still need to support other case.

**Optimization of Expression Transitivity on the Creation Side of Spark 
Runtime Filter**

**Principle**
Association expressions are transitive in some Joins, such as:
`Tab1.col1A = Tab2.col2B` and `Tab2.col2B = Tab3.col3C`
Actually, it can be inferred that `Tab1.col1A = Tab3.col3C`.

**Optimization points**
Currently, the runtime filter's creation side expression only uses directly 
associated keys. If the transitivity of association conditions is utilized, 
runtime filters can be injected into many scenarios, such as:

```
SELECT *
FROM (
  SELECT *
  FROM tab1
JOIN tab2 ON tab1.c1 = tab2.c2
  WHERE tab2.a2 = 5
) AS a
  JOIN tab3 ON tab3.c3 = a.c1
```

The `tab3.c3` here is only associated with `tab1.c1` and not with 
`tab2.c2`. Although there is selective filtering on tab2 (`tab2.a2 = 5`), Spark 
is currently unable to inject a Runtime Filter.
As long as transitivity is considered, we can know that `tab3.c3` and 
`tab2.c2` are related, so we can still inject Runtime Filter and improve 
performance.

For the current implementation, Spark only inject runtime filter into tab1 
with bloom filter based on `bf2.a2 = 5`.
Because there is no the join between tab3 and tab2, so Spark can't inject 
runtime filter into tab3 with the same bloom filter.
But the above SQL have the join condition `tab3.c3 = a.c1(tab1.c1)` between 
tab3 and tab2, and also have the join condition `tab1.c1 = tab2.c2`. We can 
rely on the transitivity of the join condition to get the virtual join 
condition `tab3.c3 = tab2.c2`, then we can inject the bloom filter based on 
`bf2.a2 = 5` into tab3.

### Why are the changes needed?
Enhance the Spark runtime filter and improve performance.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update the inner implementation.

### How was this patch tested?
New tests.
Micro benchmark for q75 in TPC-DS.
**2TB TPC-DS**
| TPC-DS Query   | Before(Seconds)  | After(Seconds)  | Speedup(Percent)  |
|    |   |   |   |
| q75 | 129.664 | 81.562 | 58.98% |

Closes #42317 from beliefer/SPARK-44649.

Authored-by: Jiaan Geng 
Signed-off-by: Wenchen Fan 
---
 .../catalyst/optimizer/InjectRuntimeFilter.scala   | 64 +++---
 .../spark/sql/InjectRuntimeFilterSuite.scala   | 38 +++--
 2 files changed, 75 insertions(+), 27 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
index 8737082e571..30526bd8106 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
@@ -125,14 +125,14 @@ object InjectRuntimeFilter extends Rule[LogicalPlan] with 
PredicateHelper with J
*/
   private def extractSelectiveFilterOverScan(
   plan: LogicalPlan,
-  filterCreationSideKey: Expression): Option[LogicalPlan] = {
-@tailrec
+  filterCreationSideKey: Expression): Option[(Expression, LogicalPlan)] = {
 def extract(
 p: LogicalPlan,
 predicateReference: AttributeSet,
 hasHitFilter: Boolean,
 hasHitSelectiveFilter: Boolean,
-currentPlan: LogicalPlan): Option[LogicalPlan] = p match {
+currentPlan: LogicalPlan,
+targetKey: Expression): Option[(Expression, LogicalPlan)] = p match {
   case Project(projectList, child) if hasHitFilter =>
 // We need to make sure all expressions referenced by filter 
predicates are simple
 // expressions.
@@ -143,41 +143,62 @@ object InjectRuntimeFilter extends Rule[LogicalPlan] with 
PredicateHelper with J
 referencedExprs.map(_.references).foldLeft(AttributeSet.empty)(_ 
++ _),
 hasHitFilter,
 hasHitSel

[spark] branch master updated: [SPARK-45558][SS] Introduce a metadata file for streaming stateful operator

2023-10-17 Thread kabhwan

This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3005dc89084 [SPARK-45558][SS] Introduce a metadata file for streaming 
stateful operator
3005dc89084 is described below

commit 3005dc8908486f63a3e471cd05189881b833daf1
Author: Chaoqin Li 
AuthorDate: Wed Oct 18 15:49:43 2023 +0900

[SPARK-45558][SS] Introduce a metadata file for streaming stateful operator

### What changes were proposed in this pull request?
Introduce a metadata file for streaming stateful operator, write metadata 
for stateful operator during planning.
The information to store in the metadata file:
- operator name (no need to be unique among stateful operators in the query)
- state store name
- numColumnsPrefixKey: > 0 if prefix scan is enabled, 0 otherwise
The body of metadata file will be in json format. The metadata file will be 
versioned just as other streaming metadata file to be future proof.

### Why are the changes needed?
The metadata file will improve expose more information about the state 
store, improves debugability and facilitate the development of state related 
feature such as reading and writing state and state repartitioning.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Add unit test and integration tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43393 from chaoqin-li1123/state_metadata.

Authored-by: Chaoqin Li 
Signed-off-by: Jungtaek Lim 
---
 .../spark/sql/execution/QueryExecution.scala   |   2 +-
 .../execution/streaming/IncrementalExecution.scala |  22 ++-
 .../execution/streaming/MicroBatchExecution.scala  |   4 +-
 .../streaming/StreamingSymmetricHashJoinExec.scala |  10 ++
 .../streaming/continuous/ContinuousExecution.scala |   3 +-
 .../streaming/state/OperatorStateMetadata.scala| 136 
 .../execution/streaming/statefulOperators.scala|  21 ++-
 .../state/OperatorStateMetadataSuite.scala | 181 +
 8 files changed, 374 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
index b3c97a83970..3d35300773b 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
@@ -272,7 +272,7 @@ class QueryExecution(
   new IncrementalExecution(
 sparkSession, logical, OutputMode.Append(), "",
 UUID.randomUUID, UUID.randomUUID, 0, None, OffsetSeqMetadata(0, 0),
-WatermarkPropagator.noop())
+WatermarkPropagator.noop(), false)
 } else {
   this
 }
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala
index ebdb9caf09e..a67097f6e96 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala
@@ -20,6 +20,8 @@ package org.apache.spark.sql.execution.streaming
 import java.util.UUID
 import java.util.concurrent.atomic.AtomicInteger
 
+import org.apache.hadoop.fs.Path
+
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{SparkSession, Strategy}
 import org.apache.spark.sql.catalyst.QueryPlanningTracker
@@ -32,6 +34,7 @@ import 
org.apache.spark.sql.execution.aggregate.{HashAggregateExec, MergingSessi
 import org.apache.spark.sql.execution.exchange.ShuffleExchangeLike
 import org.apache.spark.sql.execution.python.FlatMapGroupsInPandasWithStateExec
 import 
org.apache.spark.sql.execution.streaming.sources.WriteToMicroBatchDataSourceV1
+import 
org.apache.spark.sql.execution.streaming.state.OperatorStateMetadataWriter
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.streaming.OutputMode
 import org.apache.spark.util.Utils
@@ -50,7 +53,8 @@ class IncrementalExecution(
 val currentBatchId: Long,
 val prevOffsetSeqMetadata: Option[OffsetSeqMetadata],
 val offsetSeqMetadata: OffsetSeqMetadata,
-val watermarkPropagator: WatermarkPropagator)
+val watermarkPropagator: WatermarkPropagator,
+val isFirstBatch: Boolean)
   extends QueryExecution(sparkSession, logicalPlan) with Logging {
 
   // Modified planner with stateful operations.
@@ -71,6 +75,8 @@ class IncrementalExecution(
   StreamingGlobalLimitStrategy(outputMode) :: Nil
   }
 
+  private lazy val hadoopConf = sparkSession.sessionState.newHadoopConf()
+
   private[sql] val numStateStores

[spark] branch master updated (b1e57a2b359 -> d7d38fbc184)

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b1e57a2b359 [SPARK-45534][CORE] Use java.lang.ref.Cleaner instead of 
finalize for RemoteBlockPushResolver
 add d7d38fbc184 [SPARK-45587][INFRA] Skip UNIDOC and MIMA in `build` 
GitHub Action job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 4 
 1 file changed, 4 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45534][CORE] Use java.lang.ref.Cleaner instead of finalize for RemoteBlockPushResolver

2023-10-17 Thread mridulm80

This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b1e57a2b359 [SPARK-45534][CORE] Use java.lang.ref.Cleaner instead of 
finalize for RemoteBlockPushResolver
b1e57a2b359 is described below

commit b1e57a2b359d7d9fbf07adfba10db97f38b99bde
Author: zhaomin 
AuthorDate: Wed Oct 18 01:20:08 2023 -0500

[SPARK-45534][CORE] Use java.lang.ref.Cleaner instead of finalize for 
RemoteBlockPushResolver

### What changes were proposed in this pull request?

use java.lang.ref.Cleaner instead of finalize() for RemoteBlockPushResolver

### Why are the changes needed?

The finalize() method has been marked as deprecated since Java 9 and will 
be removed in the future, java.lang.ref.Cleaner is the more recommended 
solution.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43371 from zhaomin1423/45315.

Authored-by: zhaomin 
Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../network/shuffle/RemoteBlockPushResolver.java   | 101 +
 .../network/shuffle/ShuffleTestAccessor.scala  |   2 +-
 2 files changed, 64 insertions(+), 39 deletions(-)

diff --git 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
 
b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
index a915d0eccb0..14fefebe089 100644
--- 
a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
+++ 
b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
@@ -21,6 +21,7 @@ import java.io.DataOutputStream;
 import java.io.File;
 import java.io.FileOutputStream;
 import java.io.IOException;
+import java.lang.ref.Cleaner;
 import java.nio.ByteBuffer;
 import java.nio.channels.FileChannel;
 import java.nio.charset.StandardCharsets;
@@ -94,6 +95,7 @@ import org.apache.spark.network.util.TransportConf;
  */
 public class RemoteBlockPushResolver implements MergedShuffleFileManager {
 
+  private static final Cleaner CLEANER = Cleaner.create();
   private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
 
   public static final String MERGED_SHUFFLE_FILE_NAME_PREFIX = "shuffleMerged";
@@ -481,7 +483,7 @@ public class RemoteBlockPushResolver implements 
MergedShuffleFileManager {
 appShuffleInfo.shuffles.forEach((shuffleId, shuffleInfo) -> 
shuffleInfo.shuffleMergePartitions
   .forEach((shuffleMergeId, partitionInfo) -> {
 synchronized (partitionInfo) {
-  partitionInfo.closeAllFilesAndDeleteIfNeeded(false);
+  partitionInfo.cleanable.clean();
 }
   }));
 if (cleanupLocalDirs) {
@@ -537,7 +539,8 @@ public class RemoteBlockPushResolver implements 
MergedShuffleFileManager {
 partitions
   .forEach((partitionId, partitionInfo) -> {
 synchronized (partitionInfo) {
-  partitionInfo.closeAllFilesAndDeleteIfNeeded(true);
+  partitionInfo.cleanable.clean();
+  partitionInfo.deleteAllFiles();
 }
   });
   }
@@ -822,7 +825,7 @@ public class RemoteBlockPushResolver implements 
MergedShuffleFileManager {
 msg.appAttemptId, msg.shuffleId, msg.shuffleMergeId, 
partition.reduceId,
 ioe.getMessage());
   } finally {
-partition.closeAllFilesAndDeleteIfNeeded(false);
+partition.cleanable.clean();
   }
 }
   }
@@ -1720,6 +1723,7 @@ public class RemoteBlockPushResolver implements 
MergedShuffleFileManager {
 // The meta file for a particular merged shuffle contains all the map 
indices that belong to
 // every chunk. The entry per chunk is a serialized bitmap.
 private final MergeShuffleFile metaFile;
+private final Cleaner.Cleanable cleanable;
 // Location offset of the last successfully merged block for this shuffle 
partition
 private long dataFilePos;
 // Track the map index whose block is being merged for this shuffle 
partition
@@ -1756,6 +1760,8 @@ public class RemoteBlockPushResolver implements 
MergedShuffleFileManager {
   this.dataFilePos = 0;
   this.mapTracker = new RoaringBitmap();
   this.chunkTracker = new RoaringBitmap();
+  this.cleanable = CLEANER.register(this, new ResourceCleaner(dataChannel, 
indexFile,
+metaFile, appAttemptShuffleMergeId, reduceId));
 }
 
 public long getDataFilePos() {
@@ -1864,36 +1870,13 @@ public class RemoteBlockPushResolver implements 
MergedShuffleFileManager {
   metaFile.getChannel().truncate(metaFile.getPos());

[spark] branch master updated: [SPARK-45035][SQL] Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error

2023-10-17 Thread maxgekk

This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 11e7ea4f11d [SPARK-45035][SQL] Fix 
ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will report error
11e7ea4f11d is described below

commit 11e7ea4f11df71e2942322b01fcaab57dac20c83
Author: Jia Fan 
AuthorDate: Wed Oct 18 11:06:43 2023 +0500

[SPARK-45035][SQL] Fix ignoreCorruptFiles/ignoreMissingFiles with multiline 
CSV/JSON will report error

### What changes were proposed in this pull request?
Fix ignoreCorruptFiles/ignoreMissingFiles with multiline CSV/JSON will 
report error, it would be like:
```log
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
in stage 4940.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
4940.0 (TID 4031) (10.68.177.106 executor 0): 
com.univocity.parsers.common.TextParsingException: 
java.lang.IllegalStateException - Error reading from input
Parser Configuration: CsvParserSettings:
Auto configuration enabled=true
Auto-closing enabled=true
Autodetect column delimiter=false
Autodetect quotes=false
Column reordering enabled=true
Delimiters for detection=null
Empty value=
Escape unquoted values=false
Header extraction enabled=null
Headers=null
Ignore leading whitespaces=false
Ignore leading whitespaces in quotes=false
Ignore trailing whitespaces=false
Ignore trailing whitespaces in quotes=false
Input buffer size=1048576
Input reading on separate thread=false
Keep escape sequences=false
Keep quotes=false
Length of content displayed on error=1000
Line separator detection enabled=true
Maximum number of characters per column=-1
Maximum number of columns=20480
Normalize escaped line separators=true
Null value=
Number of records to read=all
Processor=none
Restricting data in exceptions=false
RowProcessor error handler=null
Selected fields=none
Skip bits as whitespace=true
Skip empty lines=true
Unescaped quote handling=STOP_AT_DELIMITERFormat configuration:
CsvFormat:
Comment character=#
Field delimiter=,
Line separator (normalized)=\n
Line separator sequence=\n
Quote character="
Quote escape character=\
Quote escape escape character=null
Internal state when error was thrown: line=0, column=0, record=0
at 
com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:402)
at 
com.univocity.parsers.common.AbstractParser.beginParsing(AbstractParser.java:277)
at 
com.univocity.parsers.common.AbstractParser.beginParsing(AbstractParser.java:843)
at 
org.apache.spark.sql.catalyst.csv.UnivocityParser$$anon$1.(UnivocityParser.scala:463)
at 
org.apache.spark.sql.catalyst.csv.UnivocityParser$.convertStream(UnivocityParser.scala:46...
```
Because multiline CSV/JSON use `BinaryFileRDD` not `FileScanRDD`. Unlike 
`FileScanRDD`, when met corrupt files will check `ignoreCorruptFiles` config to 
avoid report IOException, `BinaryFileRDD` will not report error because it 
return normal `PortableDataStream`. So we should catch it when infer schema in 
lambda function. Also do same thing for `ignoreMissingFiles`.

### Why are the changes needed?
Fix the bug when use mulitline mode with 
ignoreCorruptFiles/ignoreMissingFiles config.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add new test.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #42979 from Hisoka-X/SPARK-45035_csv_multi_line.

Authored-by: Jia Fan 
Signed-off-by: Max Gekk 
---
 .../spark/sql/catalyst/json/JsonInferSchema.scala  | 18 +--
 .../execution/datasources/csv/CSVDataSource.scala  | 28 ---
 .../datasources/CommonFileDataSourceSuite.scala| 28 +++
 .../sql/execution/datasources/csv/CSVSuite.scala   | 58 +-
 .../sql/execution/datasources/json/JsonSuite.scala | 46 -
 5 files changed, 142 insertions(+), 36 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
index 4123c5290b6..4d04b34876c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catal

[spark] branch master updated: [SPARK-45576][CORE][FOLLOWUP] Remove unused imports to fix Java linter errors

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fbf8b7be090 [SPARK-45576][CORE][FOLLOWUP] Remove unused imports to fix 
Java linter errors
fbf8b7be090 is described below

commit fbf8b7be090bf20f7a6e81c62ba5fe3f9f9a801b
Author: Dongjoon Hyun 
AuthorDate: Tue Oct 17 22:23:42 2023 -0700

[SPARK-45576][CORE][FOLLOWUP] Remove unused imports to fix Java linter 
errors

### What changes were proposed in this pull request?

This PR aims to remove the unused imports.

### Why are the changes needed?

To recover `master` branch by fixing Java linter errors.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs. Or, manually checked like the following.
```
$ dev/lint-java
Using `mvn` from path: 
/Users/dongjoon/APACHE/spark-merge/build/apache-maven-3.9.5/bin/mvn
Using SPARK_LOCAL_IP=localhost
Checkstyle checks passed.
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43423 from dongjoon-hyun/SPARK-45576.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java| 2 --
 1 file changed, 2 deletions(-)

diff --git 
a/common/network-common/src/test/java/org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java
 
b/common/network-common/src/test/java/org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java
index dbd71f987d4..0526fcb11be 100644
--- 
a/common/network-common/src/test/java/org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java
+++ 
b/common/network-common/src/test/java/org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java
@@ -28,8 +28,6 @@ import java.util.HashMap;
 import java.util.Map;
 
 import org.junit.jupiter.api.Test;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 
 import static org.junit.jupiter.api.Assertions.*;
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45500][CORE][WEBUI][FOLLOWUP] Show `RELAUNCHING` drivers too

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 73f9f5296e3 [SPARK-45500][CORE][WEBUI][FOLLOWUP] Show `RELAUNCHING` 
drivers too
73f9f5296e3 is described below

commit 73f9f5296e36541db78ab10c4c01a56fbc17cca8
Author: Dongjoon Hyun 
AuthorDate: Tue Oct 17 21:56:30 2023 -0700

[SPARK-45500][CORE][WEBUI][FOLLOWUP] Show `RELAUNCHING` drivers too

### What changes were proposed in this pull request?

This is a follow-up of #43328 to show `RELAUNCHING` drivers too.

### Why are the changes needed?

When we submit with `--supervise` option, the abnormally-completed driver 
is in `RELAUNCHING` status and newly launched driver is in `SUBMITTED` status.


https://github.com/apache/spark/blob/0cb4a84f6ab0c1bd101e6bc72be82987bbc02e9b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala#L995

![Screenshot 2023-10-17 at 8 01 01 
PM](https://github.com/apache/spark/assets/9700541/22c614cb-3f5c-44a0-b5f5-8edf4a20c580)

### Does this PR introduce _any_ user-facing change?

Yes, but this is a new UI item.

### How was this patch tested?

Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43418 from dongjoon-hyun/SPARK-45500-2.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala 
b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
index 5c1887be5b8..48c0c9601c1 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/ui/MasterPage.scala
@@ -156,7 +156,8 @@ private[ui] class MasterPage(parent: MasterWebUI) extends 
WebUIPage("") {
 {state.completedDrivers.length} Completed
 ({state.completedDrivers.count(_.state == DriverState.KILLED)} 
Killed,
 {state.completedDrivers.count(_.state == DriverState.FAILED)} 
Failed,
-{state.completedDrivers.count(_.state == DriverState.ERROR)} 
Error)
+{state.completedDrivers.count(_.state == DriverState.ERROR)} 
Error,
+{state.completedDrivers.count(_.state == 
DriverState.RELAUNCHING)} Relaunching)
   
   Status: {state.status}
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (4901548d4c3 -> f4bd99da12f)

2023-10-17 Thread wenchen

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4901548d4c3 [SPARK-45576][CORE] Remove unnecessary debug logs in 
ReloadingX509TrustManagerSuite
 add f4bd99da12f [SPARK-45009][SQL][FOLLOW UP] Turn off decorrelation in 
join conditions for AQE InSubquery test

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala  | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45576][CORE] Remove unnecessary debug logs in ReloadingX509TrustManagerSuite

2023-10-17 Thread mridulm80

This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4901548d4c3 [SPARK-45576][CORE] Remove unnecessary debug logs in 
ReloadingX509TrustManagerSuite
4901548d4c3 is described below

commit 4901548d4c36ac5988bcc04501057de40712e66d
Author: Hasnain Lakhani 
AuthorDate: Tue Oct 17 23:48:19 2023 -0500

[SPARK-45576][CORE] Remove unnecessary debug logs in 
ReloadingX509TrustManagerSuite

### What changes were proposed in this pull request?

Remove debug logs that were left in by accident.

### Why are the changes needed?

These were not intended to be committed

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

CI

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43404 from hasnain-db/remove-logs.

Authored-by: Hasnain Lakhani 
Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java| 6 --
 1 file changed, 6 deletions(-)

diff --git 
a/common/network-common/src/test/java/org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java
 
b/common/network-common/src/test/java/org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java
index 7e2cc38e70b..dbd71f987d4 100644
--- 
a/common/network-common/src/test/java/org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java
+++ 
b/common/network-common/src/test/java/org/apache/spark/network/ssl/ReloadingX509TrustManagerSuite.java
@@ -37,8 +37,6 @@ import static org.apache.spark.network.ssl.SslSampleConfigs.*;
 
 public class ReloadingX509TrustManagerSuite {
 
-  private final Logger logger = 
LoggerFactory.getLogger(ReloadingX509TrustManagerSuite.class);
-
   /**
* Waits until reload count hits the requested value, sleeping 100ms at a 
time.
* If the maximum number of attempts is hit, throws a RuntimeException
@@ -280,8 +278,6 @@ public class ReloadingX509TrustManagerSuite {
 new ReloadingX509TrustManager("jks", trustStoreSymlink, 
"password", 1);
 assertEquals(1, tm.getReloadInterval());
 assertEquals(0, tm.reloadCount);
-logger.info("TRUST STORE 1 IS" + trustStore1);
-logger.info("TRUST STORE 2 IS " + trustStore2);
 try {
   tm.init();
   assertEquals(1, tm.getAcceptedIssuers().length);
@@ -289,10 +285,8 @@ public class ReloadingX509TrustManagerSuite {
   assertEquals(0, tm.reloadCount);
 
   // Repoint to trustStore2, which has another cert
-  logger.info("REPOINTING SYMLINK!!!");
   trustStoreSymlink.delete();
   Files.createSymbolicLink(trustStoreSymlink.toPath(), 
trustStore2.toPath());
-  logger.info("REPOINTED!!!");
 
   // Wait up to 5s until we reload
   waitForReloadCount(tm, 1, 50);


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45546][BUILD][FOLLOW-UP] Remove snapshot repo mistakenly added

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 74dc5a3d8c0 [SPARK-45546][BUILD][FOLLOW-UP] Remove snapshot repo 
mistakenly added
74dc5a3d8c0 is described below

commit 74dc5a3d8c0ffe425dfadec44e41615a8f3f8367
Author: Hyukjin Kwon 
AuthorDate: Tue Oct 17 20:11:46 2023 -0700

[SPARK-45546][BUILD][FOLLOW-UP] Remove snapshot repo mistakenly added

### What changes were proposed in this pull request?

This PR removes snapshot repo mistakenly added in `pom.xml`

### Why are the changes needed?

To clean up.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43415 from HyukjinKwon/SPARK-45546-followup.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 pom.xml | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/pom.xml b/pom.xml
index ade6537c2a1..824ae49f6da 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3840,11 +3840,4 @@
   
 
   
-  
-
-  internal.snapshot
-  Internal Snapshot Repository
-  http://localhost:8081/repository/maven-snapshots/
-
-  
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [MINOR][INFRA] Skip if JIRA ID is an empty string

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2189f9bf289 [MINOR][INFRA] Skip if JIRA ID is an empty string
2189f9bf289 is described below

commit 2189f9bf2894dfb9a6ae1e74b0863486aaf49621
Author: Hyukjin Kwon 
AuthorDate: Tue Oct 17 20:10:10 2023 -0700

[MINOR][INFRA] Skip if JIRA ID is an empty string

### What changes were proposed in this pull request?

This PR skips when an empty string is provided as a JIRA ID.

### Why are the changes needed?

When you merge a minor PR that does not have a JIRA ID, and you say `y` to 
update associated JIRA, you face an error as below:

```
...
Would you like to update an associated JIRA? (y/n): y
Enter a JIRA id []:
ASF JIRA could not find
JiraError HTTP 405 url: https://issues.apache.org/jira/rest/api/2/issue/

response headers = {'Date': 'Wed, 18 Oct 2023 02:44:24 GMT', 
'Server': 'Apache', 'X-AREQUESTID': '164x103019855x17', 'X-ASESSIONID': 
'1bxvboa', 'Referrer-Policy': 'strict-origin-when-cross-origin', 
'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 
'X-Frame-Options': 'SAMEORIGIN', 'Content-Security-Policy': "frame-ancestors 
'self'", 'Strict-Transport-Security': 'max-age=31536000', 
'X-Seraph-LoginReason': 'OK', 'X-AUSERNAME': 'gurwls223', 'Allow': 'POST,O [...]
response text =
...
```

After this PR, it doesn't fail but shows a warning `JIRA ID not found, 
skipping`.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

I am going to test this change against this PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43417 from HyukjinKwon/minor-merge-script.

Authored-by: Hyukjin Kwon 
Signed-off-by: Dongjoon Hyun 
---
 dev/merge_spark_pr.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/dev/merge_spark_pr.py b/dev/merge_spark_pr.py
index 4021999f19b..41ea921bb86 100755
--- a/dev/merge_spark_pr.py
+++ b/dev/merge_spark_pr.py
@@ -246,6 +246,9 @@ def resolve_jira_issue(merge_branches, comment, 
default_jira_id=""):
 jira_id = input("Enter a JIRA id [%s]: " % default_jira_id)
 if jira_id == "":
 jira_id = default_jira_id
+if jira_id == "":
+print("JIRA ID not found, skipping.")
+return
 
 try:
 issue = asf_jira.issue(jira_id)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (3ef18e2d00f -> 0cb4a84f6ab)

2023-10-17 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 3ef18e2d00f [SPARK-45546][BUILD][INFRA] Make `publish-snapshot` 
support `package` first then `deploy`
 add 0cb4a84f6ab [MINOR][DOCS] Update the docs for 
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning configuration

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.5 updated: [MINOR][DOCS] Update the docs for spark.sql.optimizer.canChangeCachedPlanOutputPartitioning configuration

2023-10-17 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new ed2a4cc6033 [MINOR][DOCS] Update the docs for 
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning configuration
ed2a4cc6033 is described below

commit ed2a4cc6033ac35faa7b19eb236a4c953543d519
Author: Hyukjin Kwon 
AuthorDate: Wed Oct 18 11:43:59 2023 +0900

[MINOR][DOCS] Update the docs for 
spark.sql.optimizer.canChangeCachedPlanOutputPartitioning configuration

### What changes were proposed in this pull request?

This PR fixes the documentation for 
`spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` configuration by 
saying this is enabled by default. This is a followup of 
https://github.com/apache/spark/pull/40390 (but did not use a JIRA due to fixed 
versions properties in the JIRA).

### Why are the changes needed?

To mention that this is enabled, to the end users.

### Does this PR introduce _any_ user-facing change?

No, it's an internal conf, not documented.

### How was this patch tested?

CI in this PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43411 from HyukjinKwon/fix-docs.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit 0cb4a84f6ab0c1bd101e6bc72be82987bbc02e9b)
Signed-off-by: Hyukjin Kwon 
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 427d0480190..4ea0cd5bcc1 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -1529,7 +1529,7 @@ object SQLConf {
   .doc("Whether to forcibly enable some optimization rules that can change 
the output " +
 "partitioning of a cached query when executing it for caching. If it 
is set to true, " +
 "queries may need an extra shuffle to read the cached data. This 
configuration is " +
-"disabled by default. Currently, the optimization rules enabled by 
this configuration " +
+"enabled by default. The optimization rules enabled by this 
configuration " +
 s"are ${ADAPTIVE_EXECUTION_ENABLED.key} and 
${AUTO_BUCKETED_SCAN_ENABLED.key}.")
   .version("3.2.0")
   .booleanConf


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45546][BUILD][INFRA] Make `publish-snapshot` support `package` first then `deploy`

2023-10-17 Thread yangjie01

This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3ef18e2d00f [SPARK-45546][BUILD][INFRA] Make `publish-snapshot` 
support `package` first then `deploy`
3ef18e2d00f is described below

commit 3ef18e2d00f386196292f0c768816626bc903d47
Author: yangjie01 
AuthorDate: Wed Oct 18 10:15:27 2023 +0800

[SPARK-45546][BUILD][INFRA] Make `publish-snapshot` support `package` first 
then `deploy`

### What changes were proposed in this pull request?
This pr adds an environment variable `PACKAGE_BEFORE_DEPLOY` to the 
`publish-snapshot` process. When `PACKAGE_BEFORE_DEPLOY` is true, the publish 
process will be split into two steps: the first step is to package with mvn 
package, and the second step is to deploy the packaged jar.

At the same time, this PR sets `PACKAGE_BEFORE_DEPLOY` to true in the 
`publish_snapshot.yml` configuration.

### Why are the changes needed?
Make the `publish-snapshot` task in GitHub Action to be divided into two 
steps, which can alleviate the resource pressure brought by direct deploy.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43378 from LuciferYang/no-doc-deploy.

Lead-authored-by: yangjie01 
Co-authored-by: YangJie 
Signed-off-by: yangjie01 
---
 .github/workflows/publish_snapshot.yml |  4 
 dev/create-release/release-build.sh| 14 --
 pom.xml|  7 +++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/publish_snapshot.yml 
b/.github/workflows/publish_snapshot.yml
index 7ed836f016b..476d41d0cf1 100644
--- a/.github/workflows/publish_snapshot.yml
+++ b/.github/workflows/publish_snapshot.yml
@@ -66,4 +66,8 @@ jobs:
 GPG_KEY: "not_used"
 GPG_PASSPHRASE: "not_used"
 GIT_REF: ${{ matrix.branch }}
+# SPARK-45546 adds this environment variable to split the publish 
snapshot process into two steps:
+# first package, then deploy. This is intended to reduce the resource 
pressure of deploy.
+# When PACKAGE_BEFORE_DEPLOY is not set to true, it will revert to the 
one-step deploy method.
+PACKAGE_BEFORE_DEPLOY: true
   run: ./dev/create-release/release-build.sh publish-snapshot
diff --git a/dev/create-release/release-build.sh 
b/dev/create-release/release-build.sh
index f3571c4e48c..3776c64e31e 100755
--- a/dev/create-release/release-build.sh
+++ b/dev/create-release/release-build.sh
@@ -432,14 +432,24 @@ if [[ "$1" == "publish-snapshot" ]]; then
   echo "" >> $tmp_settings
 
   if [[ $PUBLISH_SCALA_2_12 = 1 ]]; then
-$MVN --settings $tmp_settings -DskipTests $SCALA_2_12_PROFILES 
$PUBLISH_PROFILES clean deploy
+if [ "$PACKAGE_BEFORE_DEPLOY" = "true" ]; then
+  $MVN -DskipTests $SCALA_2_12_PROFILES $PUBLISH_PROFILES clean package
+  $MVN --settings $tmp_settings -DskipTests $SCALA_2_12_PROFILES 
$PUBLISH_PROFILES deploy
+else
+  $MVN --settings $tmp_settings -DskipTests $SCALA_2_12_PROFILES 
$PUBLISH_PROFILES clean deploy
+fi
   fi
 
   if [[ $PUBLISH_SCALA_2_13 = 1 ]]; then
 if [[ $SPARK_VERSION < "4.0" ]]; then
   ./dev/change-scala-version.sh 2.13
 fi
-$MVN --settings $tmp_settings -DskipTests $SCALA_2_13_PROFILES 
$PUBLISH_PROFILES clean deploy
+if [ "$PACKAGE_BEFORE_DEPLOY" = "true" ]; then
+  $MVN -DskipTests $SCALA_2_13_PROFILES $PUBLISH_PROFILES clean package
+  $MVN --settings $tmp_settings -DskipTests $SCALA_2_13_PROFILES 
$PUBLISH_PROFILES deploy
+else
+  $MVN --settings $tmp_settings -DskipTests $SCALA_2_13_PROFILES 
$PUBLISH_PROFILES clean deploy
+fi
   fi
 
   rm $tmp_settings
diff --git a/pom.xml b/pom.xml
index 824ae49f6da..ade6537c2a1 100644
--- a/pom.xml
+++ b/pom.xml
@@ -3840,4 +3840,11 @@
   
 
   
+  
+
+  internal.snapshot
+  Internal Snapshot Repository
+  http://localhost:8081/repository/maven-snapshots/
+
+  
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45578][CORE] Remove `InaccessibleObjectException` usage by using `trySetAccessible`

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3a3b8c14b8c [SPARK-45578][CORE] Remove `InaccessibleObjectException` 
usage by using `trySetAccessible`
3a3b8c14b8c is described below

commit 3a3b8c14b8c0e056554f11a37e31d8add3087e28
Author: Dongjoon Hyun 
AuthorDate: Tue Oct 17 18:36:05 2023 -0700

[SPARK-45578][CORE] Remove `InaccessibleObjectException` usage by using 
`trySetAccessible`

### What changes were proposed in this pull request?

This PR aims to remove `InaccessibleObjectException` usage by using 
`trySetAccessible` instead of `setAccessible`.

### Why are the changes needed?

`trySetAccessible` is available on Java 9+

- 
https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/reflect/AccessibleObject.html#trySetAccessible()

We can simplify the code for Apache Spark 4.0.0 because we support only 
Java 17 and 21 .

**BEFORE**
```
$ git grep InaccessibleObjectException
common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java:
if ("InaccessibleObjectException".equals(re.getClass().getSimpleName())) {
core/src/main/scala/org/apache/spark/util/SizeEstimator.scala:
// Java 9+ can throw InaccessibleObjectException but the class is Java 9+-only
core/src/main/scala/org/apache/spark/util/SizeEstimator.scala:  
  if re.getClass.getSimpleName == "InaccessibleObjectException" =>
```

**AFTER**
```
$ git grep InaccessibleObjectException
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43406 from dongjoon-hyun/SPARK-45578.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 .../main/java/org/apache/spark/unsafe/Platform.java| 18 +-
 .../scala/org/apache/spark/util/SizeEstimator.scala| 10 +++---
 2 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java 
b/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
index e02346c4773..dfa5734ccbc 100644
--- a/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
+++ b/common/unsafe/src/main/java/org/apache/spark/unsafe/Platform.java
@@ -72,19 +72,11 @@ public final class Platform {
 cls.getDeclaredConstructor(Long.TYPE, Integer.TYPE) :
 cls.getDeclaredConstructor(Long.TYPE, Long.TYPE);
   Field cleanerField = cls.getDeclaredField("cleaner");
-  try {
-constructor.setAccessible(true);
-cleanerField.setAccessible(true);
-  } catch (RuntimeException re) {
-// This is a Java 9+ exception, so needs to be handled without 
importing it
-if 
("InaccessibleObjectException".equals(re.getClass().getSimpleName())) {
-  // Continue, but the constructor/field are not available
-  // See comment below for more context
-  constructor = null;
-  cleanerField = null;
-} else {
-  throw re;
-}
+  if (!constructor.trySetAccessible()) {
+constructor = null;
+  }
+  if (!cleanerField.trySetAccessible()) {
+cleanerField = null;
   }
   // Have to set these values no matter what:
   DBB_CONSTRUCTOR = constructor;
diff --git a/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala 
b/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
index 39e071616f2..10ff80143b7 100644
--- a/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
+++ b/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
@@ -333,19 +333,15 @@ object SizeEstimator extends Logging {
 if (fieldClass.isPrimitive) {
   sizeCount(primitiveSize(fieldClass)) += 1
 } else {
-  // Note: in Java 9+ this would be better with trySetAccessible and 
canAccess
   try {
-field.setAccessible(true) // Enable future get()'s on this field
-pointerFields = field :: pointerFields
+if (field.trySetAccessible()) { // Enable future get()'s on this 
field
+  pointerFields = field :: pointerFields
+}
   } catch {
 // If the field isn't accessible, we can still record the pointer 
size
 // but can't know more about the field, so ignore it
 case _: SecurityException =>
   // do nothing
-// Java 9+ can throw InaccessibleObjectException but the class is 
Java 9+-only
-case re: RuntimeException
-if re.getClass.getSimpleName == "InaccessibleObjectException" 
=>
-

[spark] branch branch-3.3 updated: [MINOR][SQL] Remove signature from Hive thriftserver exception

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 746f936f4b5 [MINOR][SQL] Remove signature from Hive thriftserver 
exception
746f936f4b5 is described below

commit 746f936f4b5d233264ee31e4298074355bc28fda
Author: Sean Owen 
AuthorDate: Tue Oct 17 16:10:56 2023 -0700

[MINOR][SQL] Remove signature from Hive thriftserver exception

### What changes were proposed in this pull request?

Don't return expected signature to caller in Hive thriftserver exception

### Why are the changes needed?

Please see private discussion

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43402 from srowen/HiveCookieSigner.

Authored-by: Sean Owen 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit cf59b1f51c16301f689b4e0f17ba4dbd140e1b19)
Signed-off-by: Dongjoon Hyun 
---
 .../src/main/java/org/apache/hive/service/CookieSigner.java| 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
index 782e47a6cd9..4b8d2cb1536 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
@@ -81,8 +81,7 @@ public class CookieSigner {
   LOG.debug("Signature generated for " + rawValue + " inside verify is " + 
currentSignature);
 }
 if (!MessageDigest.isEqual(originalSignature.getBytes(), 
currentSignature.getBytes())) {
-  throw new IllegalArgumentException("Invalid sign, original = " + 
originalSignature +
-" current = " + currentSignature);
+  throw new IllegalArgumentException("Invalid sign");
 }
 return rawValue;
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.4 updated: [MINOR][SQL] Remove signature from Hive thriftserver exception

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new e2911e7c208 [MINOR][SQL] Remove signature from Hive thriftserver 
exception
e2911e7c208 is described below

commit e2911e7c208f49f4fb7575bdd33c92e0a3b645a2
Author: Sean Owen 
AuthorDate: Tue Oct 17 16:10:56 2023 -0700

[MINOR][SQL] Remove signature from Hive thriftserver exception

### What changes were proposed in this pull request?

Don't return expected signature to caller in Hive thriftserver exception

### Why are the changes needed?

Please see private discussion

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43402 from srowen/HiveCookieSigner.

Authored-by: Sean Owen 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit cf59b1f51c16301f689b4e0f17ba4dbd140e1b19)
Signed-off-by: Dongjoon Hyun 
---
 .../src/main/java/org/apache/hive/service/CookieSigner.java| 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
index 782e47a6cd9..4b8d2cb1536 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
@@ -81,8 +81,7 @@ public class CookieSigner {
   LOG.debug("Signature generated for " + rawValue + " inside verify is " + 
currentSignature);
 }
 if (!MessageDigest.isEqual(originalSignature.getBytes(), 
currentSignature.getBytes())) {
-  throw new IllegalArgumentException("Invalid sign, original = " + 
originalSignature +
-" current = " + currentSignature);
+  throw new IllegalArgumentException("Invalid sign");
 }
 return rawValue;
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.5 updated: [MINOR][SQL] Remove signature from Hive thriftserver exception

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 18599ea750f [MINOR][SQL] Remove signature from Hive thriftserver 
exception
18599ea750f is described below

commit 18599ea750f50e07a910487fb3a871ed69fb9cab
Author: Sean Owen 
AuthorDate: Tue Oct 17 16:10:56 2023 -0700

[MINOR][SQL] Remove signature from Hive thriftserver exception

### What changes were proposed in this pull request?

Don't return expected signature to caller in Hive thriftserver exception

### Why are the changes needed?

Please see private discussion

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43402 from srowen/HiveCookieSigner.

Authored-by: Sean Owen 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit cf59b1f51c16301f689b4e0f17ba4dbd140e1b19)
Signed-off-by: Dongjoon Hyun 
---
 .../src/main/java/org/apache/hive/service/CookieSigner.java| 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
index 782e47a6cd9..4b8d2cb1536 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
@@ -81,8 +81,7 @@ public class CookieSigner {
   LOG.debug("Signature generated for " + rawValue + " inside verify is " + 
currentSignature);
 }
 if (!MessageDigest.isEqual(originalSignature.getBytes(), 
currentSignature.getBytes())) {
-  throw new IllegalArgumentException("Invalid sign, original = " + 
originalSignature +
-" current = " + currentSignature);
+  throw new IllegalArgumentException("Invalid sign");
 }
 return rawValue;
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (4a0ed9cd725 -> cf59b1f51c1)

2023-10-17 Thread dongjoon

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 4a0ed9cd725 [SPARK-45577][PYTHON] Fix 
UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named 
arguments
 add cf59b1f51c1 [MINOR][SQL] Remove signature from Hive thriftserver 
exception

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/hive/service/CookieSigner.java| 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (b10fea96b5b -> 4a0ed9cd725)

2023-10-17 Thread ueshin

This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from b10fea96b5b [SPARK-45566][PS] Support Pandas-like testing utils for 
Pandas API on Spark
 add 4a0ed9cd725 [SPARK-45577][PYTHON] Fix 
UserDefinedPythonTableFunctionAnalyzeRunner to pass folded values from named 
arguments

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/test_udtf.py   | 17 +++--
 .../execution/python/UserDefinedPythonFunction.scala| 12 
 2 files changed, 23 insertions(+), 6 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45566][PS] Support Pandas-like testing utils for Pandas API on Spark

2023-10-17 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new b10fea96b5b [SPARK-45566][PS] Support Pandas-like testing utils for 
Pandas API on Spark
b10fea96b5b is described below

commit b10fea96b5b0fd6c3623b0463d17dc583de3e995
Author: Haejoon Lee 
AuthorDate: Wed Oct 18 06:59:24 2023 +0900

[SPARK-45566][PS] Support Pandas-like testing utils for Pandas API on Spark

### What changes were proposed in this pull request?

This PR proposes to support utility functions `assert_frame_equal`, 
`assert_series_equal`, and `assert_index_equal` in the Pandas API on Spark to 
aid users in testing.

See 
[pd.assert_frame_equal](https://pandas.pydata.org/docs/reference/api/pandas.testing.assert_frame_equal.html),
 
[pd.assert_series_equal](https://pandas.pydata.org/docs/reference/api/pandas.testing.assert_series_equal.html),
 
[pd.assert_index_equal](https://pandas.pydata.org/docs/reference/api/pandas.testing.assert_index_equal.html)
 for more detail.

### Why are the changes needed?

These utility functions allow users to efficiently test the equality of 
`DataFrames`, `Series`, and `Indexes` in the Pandas API on Spark. Ensuring 
accurate testing helps in maintaining code quality and user trust in the 
platform.

e.g.
```python
from pyspark.pandas.testing import assert_frame_equal

df1 = spark.createDataFrame([('Alice', 1), ('Bob', 2)], ["name", "age"])
df2 = spark.createDataFrame([('Alice', 1), ('Bob', 2)], ["name", "age"])

assert_frame_equal(df1, df2)
```

### Does this PR introduce _any_ user-facing change?

Yes. Users will now have access to `assert_frame_equal`, 
`assert_series_equal`, `and assert_index_equal` functions for testing purposes.

### How was this patch tested?

Added doctests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43398 from itholic/SPARK-45566.

Authored-by: Haejoon Lee 
Signed-off-by: Hyukjin Kwon 
---
 .../docs/source/reference/pyspark.pandas/index.rst |   1 +
 .../pyspark.pandas/{index.rst => testing.rst}  |  31 +-
 python/pyspark/pandas/testing.py   | 328 +
 3 files changed, 341 insertions(+), 19 deletions(-)

diff --git a/python/docs/source/reference/pyspark.pandas/index.rst 
b/python/docs/source/reference/pyspark.pandas/index.rst
index 31fc95e95f1..0d45ba64b4d 100644
--- a/python/docs/source/reference/pyspark.pandas/index.rst
+++ b/python/docs/source/reference/pyspark.pandas/index.rst
@@ -38,3 +38,4 @@ This page gives an overview of all public pandas API on Spark.
resampling
ml
extensions
+   testing
diff --git a/python/docs/source/reference/pyspark.pandas/index.rst 
b/python/docs/source/reference/pyspark.pandas/testing.rst
similarity index 69%
copy from python/docs/source/reference/pyspark.pandas/index.rst
copy to python/docs/source/reference/pyspark.pandas/testing.rst
index 31fc95e95f1..67589fb019a 100644
--- a/python/docs/source/reference/pyspark.pandas/index.rst
+++ b/python/docs/source/reference/pyspark.pandas/testing.rst
@@ -16,25 +16,18 @@
 under the License.
 
 
-===
-Pandas API on Spark
-===
+.. _api.testing:
 
-This page gives an overview of all public pandas API on Spark.
+===
+Testing
+===
+.. currentmodule:: pyspark.pandas
 
-.. note::
-   pandas API on Spark follows the API specifications of latest pandas release.
+Assertion functions
+---
+.. autosummary::
+   :toctree: api/
 
-.. toctree::
-   :maxdepth: 2
-
-   io
-   general_functions
-   series
-   frame
-   indexing
-   window
-   groupby
-   resampling
-   ml
-   extensions
+   testing.assert_frame_equal
+   testing.assert_series_equal
+   testing.assert_index_equal
diff --git a/python/pyspark/pandas/testing.py b/python/pyspark/pandas/testing.py
new file mode 100644
index 000..49ec6081338
--- /dev/null
+++ b/python/pyspark/pandas/testing.py
@@ -0,0 +1,328 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitat

[spark] branch master updated: [MINOR][DOCS] Fix one typo

2023-10-17 Thread srowen

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f1ae56b152b [MINOR][DOCS] Fix one typo
f1ae56b152b is described below

commit f1ae56b152bdf19246d698b65e553790ad54306b
Author: Ruifeng Zheng 
AuthorDate: Tue Oct 17 13:49:41 2023 -0500

[MINOR][DOCS] Fix one typo

### What changes were proposed in this pull request?
Fix one typo

### Why are the changes needed?
for doc

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
I didn't find other similar typos in this page, so only one fix

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #43401 from zhengruifeng/minor_typo_connect_overview.

Authored-by: Ruifeng Zheng 
Signed-off-by: Sean Owen 
---
 docs/spark-connect-overview.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/spark-connect-overview.md b/docs/spark-connect-overview.md
index 82d84f39ca1..c7bad0994a8 100644
--- a/docs/spark-connect-overview.md
+++ b/docs/spark-connect-overview.md
@@ -261,7 +261,7 @@ spark-connect-repl --host myhost.com --port 443 --token 
ABCDEFG
 
 The supported list of CLI arguments may be found 
[here](https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L48).
 
- Configure programmatically with a connection ctring
+ Configure programmatically with a connection string
 
 The connection may also be programmatically created using 
_SparkSession#builder_ as in this example:
 {% highlight scala %}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45564][SQL] Simplify 'DataFrameStatFunctions.bloomFilter' with 'BloomFilterAggregate' expression

2023-10-17 Thread srowen

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 922844fff65 [SPARK-45564][SQL] Simplify 
'DataFrameStatFunctions.bloomFilter' with 'BloomFilterAggregate' expression
922844fff65 is described below

commit 922844fff65ac38fd93bd0c914dcc7e5cf879996
Author: Ruifeng Zheng 
AuthorDate: Tue Oct 17 10:11:36 2023 -0500

[SPARK-45564][SQL] Simplify 'DataFrameStatFunctions.bloomFilter' with 
'BloomFilterAggregate' expression

### What changes were proposed in this pull request?
Simplify 'DataFrameStatFunctions.bloomFilter' function with 
'BloomFilterAggregate' expression

### Why are the changes needed?
existing implementation was based on RDD, and it can be simplified by 
dataframe operations

### Does this PR introduce _any_ user-facing change?
when the input parameters or datatypes are invalid, throw 
`AnalysisException` instead of `IllegalArgumentException`

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #43391 from zhengruifeng/sql_reimpl_stat_bloomFilter.

Authored-by: Ruifeng Zheng 
Signed-off-by: Sean Owen 
---
 .../apache/spark/sql/DataFrameStatFunctions.scala  | 68 +-
 1 file changed, 14 insertions(+), 54 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
index 9d4f83c53a3..de3b100cd6a 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -23,6 +23,8 @@ import scala.jdk.CollectionConverters._
 
 import org.apache.spark.annotation.Stable
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Literal
+import org.apache.spark.sql.catalyst.expressions.aggregate.BloomFilterAggregate
 import org.apache.spark.sql.execution.stat._
 import org.apache.spark.sql.functions.col
 import org.apache.spark.sql.types._
@@ -535,7 +537,7 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
* @since 2.0.0
*/
   def bloomFilter(colName: String, expectedNumItems: Long, fpp: Double): 
BloomFilter = {
-buildBloomFilter(Column(colName), expectedNumItems, -1L, fpp)
+bloomFilter(Column(colName), expectedNumItems, fpp)
   }
 
   /**
@@ -547,7 +549,8 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
* @since 2.0.0
*/
   def bloomFilter(col: Column, expectedNumItems: Long, fpp: Double): 
BloomFilter = {
-buildBloomFilter(col, expectedNumItems, -1L, fpp)
+val numBits = BloomFilter.optimalNumOfBits(expectedNumItems, fpp)
+bloomFilter(col, expectedNumItems, numBits)
   }
 
   /**
@@ -559,7 +562,7 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
* @since 2.0.0
*/
   def bloomFilter(colName: String, expectedNumItems: Long, numBits: Long): 
BloomFilter = {
-buildBloomFilter(Column(colName), expectedNumItems, numBits, Double.NaN)
+bloomFilter(Column(colName), expectedNumItems, numBits)
   }
 
   /**
@@ -571,57 +574,14 @@ final class DataFrameStatFunctions private[sql](df: 
DataFrame) {
* @since 2.0.0
*/
   def bloomFilter(col: Column, expectedNumItems: Long, numBits: Long): 
BloomFilter = {
-buildBloomFilter(col, expectedNumItems, numBits, Double.NaN)
-  }
-
-  private def buildBloomFilter(col: Column, expectedNumItems: Long,
-   numBits: Long,
-   fpp: Double): BloomFilter = {
-val singleCol = df.select(col)
-val colType = singleCol.schema.head.dataType
-
-require(colType == StringType || colType.isInstanceOf[IntegralType],
-  s"Bloom filter only supports string type and integral types, but got 
$colType.")
-
-val updater: (BloomFilter, InternalRow) => Unit = colType match {
-  // For string type, we can get bytes of our `UTF8String` directly, and 
call the `putBinary`
-  // instead of `putString` to avoid unnecessary conversion.
-  case StringType => (filter, row) => 
filter.putBinary(row.getUTF8String(0).getBytes)
-  case ByteType => (filter, row) => filter.putLong(row.getByte(0))
-  case ShortType => (filter, row) => filter.putLong(row.getShort(0))
-  case IntegerType => (filter, row) => filter.putLong(row.getInt(0))
-  case LongType => (filter, row) => filter.putLong(row.getLong(0))
-  case _ =>
-throw new IllegalArgumentException(
-  s"Bloom filter only supports string type and integral types, " +
-s"and does not support type $colType."
-)
-}
-
-
singleCol.queryExecution.toRdd.treeAggregate(null.asInstanceOf[BloomFilter]

[spark] branch branch-3.3 updated: [SPARK-45568][TESTS] Fix flaky WholeStageCodegenSparkSubmitSuite

2023-10-17 Thread yao

This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 8cd3c1a9c1c [SPARK-45568][TESTS] Fix flaky 
WholeStageCodegenSparkSubmitSuite
8cd3c1a9c1c is described below

commit 8cd3c1a9c1c336155fe09728171aba84ef55ef2d
Author: Kent Yao 
AuthorDate: Tue Oct 17 22:19:18 2023 +0800

[SPARK-45568][TESTS] Fix flaky WholeStageCodegenSparkSubmitSuite

### What changes were proposed in this pull request?

WholeStageCodegenSparkSubmitSuite is 
[flaky](https://github.com/apache/spark/actions/runs/6479534195/job/17593342589)
 because SHUFFLE_PARTITIONS(200) creates 200 reducers for one total core and 
improper stop progress causes executor launcher reties. The heavy load and 
reties might result in timeout test failures.

### Why are the changes needed?

CI robustness

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing WholeStageCodegenSparkSubmitSuite
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43394 from yaooqinn/SPARK-45568.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit f00ec39542a5f9ac75d8c24f0f04a7be703c8d7c)
Signed-off-by: Kent Yao 
---
 .../WholeStageCodegenSparkSubmitSuite.scala| 57 --
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
index 73c4e4c3e1e..06ba8fb772a 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
@@ -26,6 +26,7 @@ import org.apache.spark.deploy.SparkSubmitTestUtils
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{QueryTest, Row, SparkSession}
 import org.apache.spark.sql.functions.{array, col, count, lit}
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.IntegerType
 import org.apache.spark.unsafe.Platform
 import org.apache.spark.util.ResetSystemProperties
@@ -68,39 +69,41 @@ class WholeStageCodegenSparkSubmitSuite extends 
SparkSubmitTestUtils
 
 object WholeStageCodegenSparkSubmitSuite extends Assertions with Logging {
 
-  var spark: SparkSession = _
-
   def main(args: Array[String]): Unit = {
 TestUtils.configTestLog4j2("INFO")
 
-spark = SparkSession.builder().getOrCreate()
+val spark = SparkSession.builder()
+  .config(SQLConf.SHUFFLE_PARTITIONS.key, "2")
+  .getOrCreate()
+
+try {
+  // Make sure the test is run where the driver and the executors uses 
different object layouts
+  val driverArrayHeaderSize = Platform.BYTE_ARRAY_OFFSET
+  val executorArrayHeaderSize =
+spark.sparkContext.range(0, 1).map(_ => 
Platform.BYTE_ARRAY_OFFSET).collect().head
+  assert(driverArrayHeaderSize > executorArrayHeaderSize)
 
-// Make sure the test is run where the driver and the executors uses 
different object layouts
-val driverArrayHeaderSize = Platform.BYTE_ARRAY_OFFSET
-val executorArrayHeaderSize =
-  spark.sparkContext.range(0, 1).map(_ => 
Platform.BYTE_ARRAY_OFFSET).collect.head.toInt
-assert(driverArrayHeaderSize > executorArrayHeaderSize)
+  val df = spark.range(71773).select((col("id") % 
lit(10)).cast(IntegerType) as "v")
+.groupBy(array(col("v"))).agg(count(col("*")))
+  val plan = df.queryExecution.executedPlan
+  assert(plan.exists(_.isInstanceOf[WholeStageCodegenExec]))
 
-val df = spark.range(71773).select((col("id") % lit(10)).cast(IntegerType) 
as "v")
-  .groupBy(array(col("v"))).agg(count(col("*")))
-val plan = df.queryExecution.executedPlan
-assert(plan.exists(_.isInstanceOf[WholeStageCodegenExec]))
+  val expectedAnswer =
+Row(Array(0), 7178) ::
+  Row(Array(1), 7178) ::
+  Row(Array(2), 7178) ::
+  Row(Array(3), 7177) ::
+  Row(Array(4), 7177) ::
+  Row(Array(5), 7177) ::
+  Row(Array(6), 7177) ::
+  Row(Array(7), 7177) ::
+  Row(Array(8), 7177) ::
+  Row(Array(9), 7177) :: Nil
 
-val expectedAnswer =
-  Row(Array(0), 7178) ::
-Row(Array(1), 7178) ::
-Row(Array(2), 7178) ::
-Row(Array(3), 7177) ::
-Row(Array(4), 7177) ::
-Row(Array(5), 7177) ::
-Row(Array(6), 7177) ::
-Row(Array(7), 7177) ::
-Row(Array(8), 7177) ::
-Row(Array(9), 7177) :: Nil
-val result = df.collect
-QueryTest.sameRows(result.toSeq, expectedAnswer) match {
-  case Some(errMs

[spark] branch branch-3.4 updated: [SPARK-45568][TESTS] Fix flaky WholeStageCodegenSparkSubmitSuite

2023-10-17 Thread yao

This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
 new 03b7f7d71bf [SPARK-45568][TESTS] Fix flaky 
WholeStageCodegenSparkSubmitSuite
03b7f7d71bf is described below

commit 03b7f7d71bf638b470b119b09c882253d32945a5
Author: Kent Yao 
AuthorDate: Tue Oct 17 22:19:18 2023 +0800

[SPARK-45568][TESTS] Fix flaky WholeStageCodegenSparkSubmitSuite

### What changes were proposed in this pull request?

WholeStageCodegenSparkSubmitSuite is 
[flaky](https://github.com/apache/spark/actions/runs/6479534195/job/17593342589)
 because SHUFFLE_PARTITIONS(200) creates 200 reducers for one total core and 
improper stop progress causes executor launcher reties. The heavy load and 
reties might result in timeout test failures.

### Why are the changes needed?

CI robustness

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing WholeStageCodegenSparkSubmitSuite
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43394 from yaooqinn/SPARK-45568.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit f00ec39542a5f9ac75d8c24f0f04a7be703c8d7c)
Signed-off-by: Kent Yao 
---
 .../WholeStageCodegenSparkSubmitSuite.scala| 57 --
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
index 73c4e4c3e1e..06ba8fb772a 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
@@ -26,6 +26,7 @@ import org.apache.spark.deploy.SparkSubmitTestUtils
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{QueryTest, Row, SparkSession}
 import org.apache.spark.sql.functions.{array, col, count, lit}
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.IntegerType
 import org.apache.spark.unsafe.Platform
 import org.apache.spark.util.ResetSystemProperties
@@ -68,39 +69,41 @@ class WholeStageCodegenSparkSubmitSuite extends 
SparkSubmitTestUtils
 
 object WholeStageCodegenSparkSubmitSuite extends Assertions with Logging {
 
-  var spark: SparkSession = _
-
   def main(args: Array[String]): Unit = {
 TestUtils.configTestLog4j2("INFO")
 
-spark = SparkSession.builder().getOrCreate()
+val spark = SparkSession.builder()
+  .config(SQLConf.SHUFFLE_PARTITIONS.key, "2")
+  .getOrCreate()
+
+try {
+  // Make sure the test is run where the driver and the executors uses 
different object layouts
+  val driverArrayHeaderSize = Platform.BYTE_ARRAY_OFFSET
+  val executorArrayHeaderSize =
+spark.sparkContext.range(0, 1).map(_ => 
Platform.BYTE_ARRAY_OFFSET).collect().head
+  assert(driverArrayHeaderSize > executorArrayHeaderSize)
 
-// Make sure the test is run where the driver and the executors uses 
different object layouts
-val driverArrayHeaderSize = Platform.BYTE_ARRAY_OFFSET
-val executorArrayHeaderSize =
-  spark.sparkContext.range(0, 1).map(_ => 
Platform.BYTE_ARRAY_OFFSET).collect.head.toInt
-assert(driverArrayHeaderSize > executorArrayHeaderSize)
+  val df = spark.range(71773).select((col("id") % 
lit(10)).cast(IntegerType) as "v")
+.groupBy(array(col("v"))).agg(count(col("*")))
+  val plan = df.queryExecution.executedPlan
+  assert(plan.exists(_.isInstanceOf[WholeStageCodegenExec]))
 
-val df = spark.range(71773).select((col("id") % lit(10)).cast(IntegerType) 
as "v")
-  .groupBy(array(col("v"))).agg(count(col("*")))
-val plan = df.queryExecution.executedPlan
-assert(plan.exists(_.isInstanceOf[WholeStageCodegenExec]))
+  val expectedAnswer =
+Row(Array(0), 7178) ::
+  Row(Array(1), 7178) ::
+  Row(Array(2), 7178) ::
+  Row(Array(3), 7177) ::
+  Row(Array(4), 7177) ::
+  Row(Array(5), 7177) ::
+  Row(Array(6), 7177) ::
+  Row(Array(7), 7177) ::
+  Row(Array(8), 7177) ::
+  Row(Array(9), 7177) :: Nil
 
-val expectedAnswer =
-  Row(Array(0), 7178) ::
-Row(Array(1), 7178) ::
-Row(Array(2), 7178) ::
-Row(Array(3), 7177) ::
-Row(Array(4), 7177) ::
-Row(Array(5), 7177) ::
-Row(Array(6), 7177) ::
-Row(Array(7), 7177) ::
-Row(Array(8), 7177) ::
-Row(Array(9), 7177) :: Nil
-val result = df.collect
-QueryTest.sameRows(result.toSeq, expectedAnswer) match {
-  case Some(errMs

[spark] branch branch-3.5 updated: [SPARK-45568][TESTS] Fix flaky WholeStageCodegenSparkSubmitSuite

2023-10-17 Thread yao

This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 6a5747d66e5 [SPARK-45568][TESTS] Fix flaky 
WholeStageCodegenSparkSubmitSuite
6a5747d66e5 is described below

commit 6a5747d66e53ed0d934cdd9ca5c9bd9fde6868e6
Author: Kent Yao 
AuthorDate: Tue Oct 17 22:19:18 2023 +0800

[SPARK-45568][TESTS] Fix flaky WholeStageCodegenSparkSubmitSuite

### What changes were proposed in this pull request?

WholeStageCodegenSparkSubmitSuite is 
[flaky](https://github.com/apache/spark/actions/runs/6479534195/job/17593342589)
 because SHUFFLE_PARTITIONS(200) creates 200 reducers for one total core and 
improper stop progress causes executor launcher reties. The heavy load and 
reties might result in timeout test failures.

### Why are the changes needed?

CI robustness

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing WholeStageCodegenSparkSubmitSuite
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43394 from yaooqinn/SPARK-45568.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
(cherry picked from commit f00ec39542a5f9ac75d8c24f0f04a7be703c8d7c)
Signed-off-by: Kent Yao 
---
 .../WholeStageCodegenSparkSubmitSuite.scala| 57 --
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
index e253de76221..69145d890fc 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
@@ -26,6 +26,7 @@ import org.apache.spark.deploy.SparkSubmitTestUtils
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{QueryTest, Row, SparkSession}
 import org.apache.spark.sql.functions.{array, col, count, lit}
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.IntegerType
 import org.apache.spark.tags.ExtendedSQLTest
 import org.apache.spark.unsafe.Platform
@@ -70,39 +71,41 @@ class WholeStageCodegenSparkSubmitSuite extends 
SparkSubmitTestUtils
 
 object WholeStageCodegenSparkSubmitSuite extends Assertions with Logging {
 
-  var spark: SparkSession = _
-
   def main(args: Array[String]): Unit = {
 TestUtils.configTestLog4j2("INFO")
 
-spark = SparkSession.builder().getOrCreate()
+val spark = SparkSession.builder()
+  .config(SQLConf.SHUFFLE_PARTITIONS.key, "2")
+  .getOrCreate()
+
+try {
+  // Make sure the test is run where the driver and the executors uses 
different object layouts
+  val driverArrayHeaderSize = Platform.BYTE_ARRAY_OFFSET
+  val executorArrayHeaderSize =
+spark.sparkContext.range(0, 1).map(_ => 
Platform.BYTE_ARRAY_OFFSET).collect().head
+  assert(driverArrayHeaderSize > executorArrayHeaderSize)
 
-// Make sure the test is run where the driver and the executors uses 
different object layouts
-val driverArrayHeaderSize = Platform.BYTE_ARRAY_OFFSET
-val executorArrayHeaderSize =
-  spark.sparkContext.range(0, 1).map(_ => 
Platform.BYTE_ARRAY_OFFSET).collect.head.toInt
-assert(driverArrayHeaderSize > executorArrayHeaderSize)
+  val df = spark.range(71773).select((col("id") % 
lit(10)).cast(IntegerType) as "v")
+.groupBy(array(col("v"))).agg(count(col("*")))
+  val plan = df.queryExecution.executedPlan
+  assert(plan.exists(_.isInstanceOf[WholeStageCodegenExec]))
 
-val df = spark.range(71773).select((col("id") % lit(10)).cast(IntegerType) 
as "v")
-  .groupBy(array(col("v"))).agg(count(col("*")))
-val plan = df.queryExecution.executedPlan
-assert(plan.exists(_.isInstanceOf[WholeStageCodegenExec]))
+  val expectedAnswer =
+Row(Array(0), 7178) ::
+  Row(Array(1), 7178) ::
+  Row(Array(2), 7178) ::
+  Row(Array(3), 7177) ::
+  Row(Array(4), 7177) ::
+  Row(Array(5), 7177) ::
+  Row(Array(6), 7177) ::
+  Row(Array(7), 7177) ::
+  Row(Array(8), 7177) ::
+  Row(Array(9), 7177) :: Nil
 
-val expectedAnswer =
-  Row(Array(0), 7178) ::
-Row(Array(1), 7178) ::
-Row(Array(2), 7178) ::
-Row(Array(3), 7177) ::
-Row(Array(4), 7177) ::
-Row(Array(5), 7177) ::
-Row(Array(6), 7177) ::
-Row(Array(7), 7177) ::
-Row(Array(8), 7177) ::
-Row(Array(9), 7177) :: Nil
-val result = df.collect
-QueryTest.sameRows(result.toSeq, expectedAnswer) match {
-  case Some(errMsg) =>

[spark] branch master updated: [SPARK-45568][TESTS] Fix flaky WholeStageCodegenSparkSubmitSuite

2023-10-17 Thread yao

This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new f00ec39542a [SPARK-45568][TESTS] Fix flaky 
WholeStageCodegenSparkSubmitSuite
f00ec39542a is described below

commit f00ec39542a5f9ac75d8c24f0f04a7be703c8d7c
Author: Kent Yao 
AuthorDate: Tue Oct 17 22:19:18 2023 +0800

[SPARK-45568][TESTS] Fix flaky WholeStageCodegenSparkSubmitSuite

### What changes were proposed in this pull request?

WholeStageCodegenSparkSubmitSuite is 
[flaky](https://github.com/apache/spark/actions/runs/6479534195/job/17593342589)
 because SHUFFLE_PARTITIONS(200) creates 200 reducers for one total core and 
improper stop progress causes executor launcher reties. The heavy load and 
reties might result in timeout test failures.

### Why are the changes needed?

CI robustness

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing WholeStageCodegenSparkSubmitSuite
### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43394 from yaooqinn/SPARK-45568.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
---
 .../WholeStageCodegenSparkSubmitSuite.scala| 57 --
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
index e253de76221..69145d890fc 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSparkSubmitSuite.scala
@@ -26,6 +26,7 @@ import org.apache.spark.deploy.SparkSubmitTestUtils
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.{QueryTest, Row, SparkSession}
 import org.apache.spark.sql.functions.{array, col, count, lit}
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types.IntegerType
 import org.apache.spark.tags.ExtendedSQLTest
 import org.apache.spark.unsafe.Platform
@@ -70,39 +71,41 @@ class WholeStageCodegenSparkSubmitSuite extends 
SparkSubmitTestUtils
 
 object WholeStageCodegenSparkSubmitSuite extends Assertions with Logging {
 
-  var spark: SparkSession = _
-
   def main(args: Array[String]): Unit = {
 TestUtils.configTestLog4j2("INFO")
 
-spark = SparkSession.builder().getOrCreate()
+val spark = SparkSession.builder()
+  .config(SQLConf.SHUFFLE_PARTITIONS.key, "2")
+  .getOrCreate()
+
+try {
+  // Make sure the test is run where the driver and the executors uses 
different object layouts
+  val driverArrayHeaderSize = Platform.BYTE_ARRAY_OFFSET
+  val executorArrayHeaderSize =
+spark.sparkContext.range(0, 1).map(_ => 
Platform.BYTE_ARRAY_OFFSET).collect().head
+  assert(driverArrayHeaderSize > executorArrayHeaderSize)
 
-// Make sure the test is run where the driver and the executors uses 
different object layouts
-val driverArrayHeaderSize = Platform.BYTE_ARRAY_OFFSET
-val executorArrayHeaderSize =
-  spark.sparkContext.range(0, 1).map(_ => 
Platform.BYTE_ARRAY_OFFSET).collect.head.toInt
-assert(driverArrayHeaderSize > executorArrayHeaderSize)
+  val df = spark.range(71773).select((col("id") % 
lit(10)).cast(IntegerType) as "v")
+.groupBy(array(col("v"))).agg(count(col("*")))
+  val plan = df.queryExecution.executedPlan
+  assert(plan.exists(_.isInstanceOf[WholeStageCodegenExec]))
 
-val df = spark.range(71773).select((col("id") % lit(10)).cast(IntegerType) 
as "v")
-  .groupBy(array(col("v"))).agg(count(col("*")))
-val plan = df.queryExecution.executedPlan
-assert(plan.exists(_.isInstanceOf[WholeStageCodegenExec]))
+  val expectedAnswer =
+Row(Array(0), 7178) ::
+  Row(Array(1), 7178) ::
+  Row(Array(2), 7178) ::
+  Row(Array(3), 7177) ::
+  Row(Array(4), 7177) ::
+  Row(Array(5), 7177) ::
+  Row(Array(6), 7177) ::
+  Row(Array(7), 7177) ::
+  Row(Array(8), 7177) ::
+  Row(Array(9), 7177) :: Nil
 
-val expectedAnswer =
-  Row(Array(0), 7178) ::
-Row(Array(1), 7178) ::
-Row(Array(2), 7178) ::
-Row(Array(3), 7177) ::
-Row(Array(4), 7177) ::
-Row(Array(5), 7177) ::
-Row(Array(6), 7177) ::
-Row(Array(7), 7177) ::
-Row(Array(8), 7177) ::
-Row(Array(9), 7177) :: Nil
-val result = df.collect
-QueryTest.sameRows(result.toSeq, expectedAnswer) match {
-  case Some(errMsg) => fail(errMsg)
-  case _ =>
+  QueryTest.checkAnswer(df, expectedAnswer)
+} finally {
+  spark.s

[spark] branch master updated: [SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation warnings related to `other-nullary-override`

2023-10-17 Thread srowen

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3b46cc81614 [SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation 
warnings related to `other-nullary-override`
3b46cc81614 is described below

commit 3b46cc816143d5bb553e86e8b716c28982cb5748
Author: YangJie 
AuthorDate: Tue Oct 17 07:34:06 2023 -0500

[SPARK-45512][CORE][SQL][SS][DSTREAM] Fix compilation warnings related to 
`other-nullary-override`

### What changes were proposed in this pull request?
This PR fixes two compilation warnings related to `other-nullary-override`

```
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/CloseableIterator.scala:36:16:
 method with a single empty parameter list overrides method hasNext in trait 
Iterator defined without a parameter list [quickfixable]
[error] Applicable -Wconf / nowarn filters for this fatal warning: 
msg=, cat=other-nullary-override, 
site=org.apache.spark.sql.connect.client.WrappedCloseableIterator
[error]   override def hasNext(): Boolean = innerIterator.hasNext
[error]^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/ExecutePlanResponseReattachableIterator.scala:136:16:
 method without a parameter list overrides method hasNext in class 
WrappedCloseableIterator defined with a single empty parameter list 
[quickfixable]
[error] Applicable -Wconf / nowarn filters for this fatal warning: 
msg=, cat=other-nullary-override, 
site=org.apache.spark.sql.connect.client.ExecutePlanResponseReattachableIterator
[error]   override def hasNext: Boolean = synchronized {
[error]^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala:73:20:
 method without a parameter list overrides method hasNext in class 
WrappedCloseableIterator defined with a single empty parameter list 
[quickfixable]
[error] Applicable -Wconf / nowarn filters for this fatal warning: 
msg=, cat=other-nullary-override, 
site=org.apache.spark.sql.connect.client.GrpcExceptionConverter.convertIterator
[error]   override def hasNext: Boolean = {
[error]^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala:77:18:
 method without a parameter list overrides method next in class 
WrappedCloseableIterator defined with a single empty parameter list 
[quickfixable]
[error] Applicable -Wconf / nowarn filters for this fatal warning: 
msg=, cat=other-nullary-override, 
site=org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryIterator
[error] override def next: U = {
[error]  ^
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcRetryHandler.scala:81:18:
 method without a parameter list overrides method hasNext in class 
WrappedCloseableIterator defined with a single empty parameter list 
[quickfixable]
[error] Applicable -Wconf / nowarn filters for this fatal warning: 
msg=, cat=other-nullary-override, 
site=org.apache.spark.sql.connect.client.GrpcRetryHandler.RetryIterator
[error] override def hasNext: Boolean = {
[error]
```

and removes the corresponding suppression rules from the compilation options

```
"-Wconf:cat=other-nullary-override:wv",
```

On the other hand, the code corresponding to the following three 
suppression rules no longer exists, so the corresponding suppression rules were 
also cleaned up in this pr.

```
"-Wconf:cat=lint-multiarg-infix:wv",
"-Wconf:msg=method with a single empty parameter list overrides method 
without any parameter list:s",
"-Wconf:msg=method without a parameter list overrides a method with a 
single empty one:s",
```

### Why are the changes needed?
Code clean up.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43332 from LuciferYang/other-nullary-override.

Lead-authored-by: YangJie 
Co-authored-by: yangjie01 
Signed-off-by: Sean Owen 
---
 .../org/apache/spark/sql/avro/AvroRowReaderSuite.scala | 10 +-
 .../spark/sql/connect/client/CloseableIterator.scala   |  2 +-
 .../ExecutePlanResponseReattachableIterator.scala  |  4 ++--
 .../spark/sql/connect/client/GrpcRetryHandler.scala|  2 +-
 ..

Re: [PR] update the canonical link, due to a change in some addresses in the latest version of the document [spark-website]

2023-10-17 Thread via GitHub



panbingkun commented on PR #483:
URL: https://github.com/apache/spark-website/pull/483#issuecomment-1766288284

   cc @allanf-db @HyukjinKwon @zhengruifeng @allisonwang-db @srowen 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[PR] update the canonical link, due to a change in some addresses in the latest version of the document [spark-website]

2023-10-17 Thread via GitHub



panbingkun opened a new pull request, #483:
URL: https://github.com/apache/spark-website/pull/483

   The pr is followup https://github.com/apache/spark-website/pull/482.
   
   https://github.com/apache/spark-website/pull/482#issuecomment-1765322679
   As discussed above, due to changes in some document addresses after version 
`3.3.0`, `the canonical link` is incorrect. We are now correcting it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45572][PS][DOCS] Enable doctest of Frame.transpose

2023-10-17 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7af95b49b89 [SPARK-45572][PS][DOCS] Enable doctest of Frame.transpose
7af95b49b89 is described below

commit 7af95b49b89e556c57eb4a0b3ac476c8051c11de
Author: Ruifeng Zheng 
AuthorDate: Tue Oct 17 20:47:27 2023 +0900

[SPARK-45572][PS][DOCS] Enable doctest of Frame.transpose

### What changes were proposed in this pull request?
Enable doctest of Frame.transpose

### Why are the changes needed?
for better test coverage

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #43399 from zhengruifeng/ps_enable_transpose_doctest.

Authored-by: Ruifeng Zheng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/frame.py | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 7d93af0485f..8b20abf9652 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -2654,8 +2654,6 @@ defaultdict(, {'col..., 'col...})]
 psdf._to_internal_pandas(), self.to_latex, pd.DataFrame.to_latex, 
args
 )
 
-# TODO: enable doctests once we drop Spark 2.3.x (due to type coercion 
logic
-#  when creating arrays)
 def transpose(self) -> "DataFrame":
 """
 Transpose index and columns.
@@ -2707,8 +2705,8 @@ defaultdict(, {'col..., 'col...})]
 0 1 3
 1 2 4
 
->>> df1_transposed = df1.T.sort_index()  # doctest: +SKIP
->>> df1_transposed  # doctest: +SKIP
+>>> df1_transposed = df1.T.sort_index()
+>>> df1_transposed
   0  1
 col1  1  2
 col2  3  4
@@ -2720,7 +2718,7 @@ defaultdict(, {'col..., 'col...})]
 col1int64
 col2int64
 dtype: object
->>> df1_transposed.dtypes  # doctest: +SKIP
+>>> df1_transposed.dtypes
 0int64
 1int64
 dtype: object
@@ -2736,8 +2734,8 @@ defaultdict(, {'col..., 'col...})]
 09.5 0   12
 18.0 0   22
 
->>> df2_transposed = df2.T.sort_index()  # doctest: +SKIP
->>> df2_transposed  # doctest: +SKIP
+>>> df2_transposed = df2.T.sort_index()
+>>> df2_transposed
   0 1
 age12.0  22.0
 kids0.0   0.0
@@ -2752,7 +2750,7 @@ defaultdict(, {'col..., 'col...})]
 ageint64
 dtype: object
 
->>> df2_transposed.dtypes  # doctest: +SKIP
+>>> df2_transposed.dtypes
 0float64
 1float64
 dtype: object


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45550][PS] Remove deprecated APIs from Pandas API on Spark

2023-10-17 Thread ruifengz

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5280d492ad6 [SPARK-45550][PS] Remove deprecated APIs from Pandas API 
on Spark
5280d492ad6 is described below

commit 5280d492ad636782ca910a3c0bf0f0cb5bce2223
Author: Haejoon Lee 
AuthorDate: Tue Oct 17 19:40:12 2023 +0800

[SPARK-45550][PS] Remove deprecated APIs from Pandas API on Spark

### What changes were proposed in this pull request?

This PR proposes to remove deprecated APIs from Pandas API on Spark:

- Remove `DataFrame.to_spark_io`, use `DataFrame.spark.to_spark_io` instead.
- Remove `(Index|Series).is_monotonic`, use 
`(Index|Series).is_monotonic_increasing` instead.

### Why are the changes needed?

To cleanup API surface

### Does this PR introduce _any_ user-facing change?

Remove APIs no longer available from Spark 4.x.

### How was this patch tested?

The existing CI should pass

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #43384 from itholic/SPARK-45550.

Authored-by: Haejoon Lee 
Signed-off-by: Ruifeng Zheng 
---
 .../source/migration_guide/pyspark_upgrade.rst |  2 +
 .../docs/source/reference/pyspark.pandas/frame.rst |  1 -
 .../source/reference/pyspark.pandas/indexing.rst   |  1 -
 python/docs/source/reference/pyspark.pandas/io.rst |  2 +-
 .../source/reference/pyspark.pandas/series.rst |  1 -
 python/pyspark/pandas/base.py  | 83 --
 python/pyspark/pandas/frame.py | 23 --
 python/pyspark/pandas/generic.py   |  1 -
 python/pyspark/pandas/indexing.py  |  4 +-
 python/pyspark/pandas/namespace.py |  7 +-
 python/pyspark/pandas/spark/accessors.py   |  7 +-
 .../pandas/tests/test_dataframe_spark_io.py|  4 +-
 12 files changed, 13 insertions(+), 123 deletions(-)

diff --git a/python/docs/source/migration_guide/pyspark_upgrade.rst 
b/python/docs/source/migration_guide/pyspark_upgrade.rst
index d081275dc83..933fa936f70 100644
--- a/python/docs/source/migration_guide/pyspark_upgrade.rst
+++ b/python/docs/source/migration_guide/pyspark_upgrade.rst
@@ -51,6 +51,8 @@ Upgrading from PySpark 3.5 to 4.0
 * In Spark 4.0, ``Index.asi8`` has been removed from pandas API on Spark, use 
``Index.astype`` instead.
 * In Spark 4.0, ``Index.is_type_compatible`` has been removed from pandas API 
on Spark, use ``Index.isin`` instead.
 * In Spark 4.0, ``col_space`` parameter from ``DataFrame.to_latex`` and 
``Series.to_latex`` has been removed from pandas API on Spark.
+* In Spark 4.0, ``DataFrame.to_spark_io`` has been removed from pandas API on 
Spark, use ``DataFrame.spark.to_spark_io`` instead.
+* In Spark 4.0, ``Series.is_monotonic`` and ``Index.is_monotonic`` have been 
removed from pandas API on Spark, use ``Series.is_monotonic_increasing`` or 
``Index.is_monotonic_increasing`` instead respectively.
 
 
 Upgrading from PySpark 3.3 to 3.4
diff --git a/python/docs/source/reference/pyspark.pandas/frame.rst 
b/python/docs/source/reference/pyspark.pandas/frame.rst
index a22078f86e2..911999b56be 100644
--- a/python/docs/source/reference/pyspark.pandas/frame.rst
+++ b/python/docs/source/reference/pyspark.pandas/frame.rst
@@ -276,7 +276,6 @@ Serialization / IO / Conversion
DataFrame.to_table
DataFrame.to_delta
DataFrame.to_parquet
-   DataFrame.to_spark_io
DataFrame.to_csv
DataFrame.to_orc
DataFrame.to_pandas
diff --git a/python/docs/source/reference/pyspark.pandas/indexing.rst 
b/python/docs/source/reference/pyspark.pandas/indexing.rst
index d6be57ee9c8..08f5e224e06 100644
--- a/python/docs/source/reference/pyspark.pandas/indexing.rst
+++ b/python/docs/source/reference/pyspark.pandas/indexing.rst
@@ -36,7 +36,6 @@ Properties
 .. autosummary::
:toctree: api/
 
-   Index.is_monotonic
Index.is_monotonic_increasing
Index.is_monotonic_decreasing
Index.is_unique
diff --git a/python/docs/source/reference/pyspark.pandas/io.rst 
b/python/docs/source/reference/pyspark.pandas/io.rst
index b39a4e8778a..118dd49a4ad 100644
--- a/python/docs/source/reference/pyspark.pandas/io.rst
+++ b/python/docs/source/reference/pyspark.pandas/io.rst
@@ -69,7 +69,7 @@ Generic Spark I/O
:toctree: api/
 
read_spark_io
-   DataFrame.to_spark_io
+   DataFrame.spark.to_spark_io
 
 Flat File / CSV
 ---
diff --git a/python/docs/source/reference/pyspark.pandas/series.rst 
b/python/docs/source/reference/pyspark.pandas/series.rst
index 7b658d45d4b..eb4a499c054 100644
--- a/python/docs/source/reference/pyspark.pandas/series.rst
+++ b/python/docs/source/reference/pyspark.pandas/series.rst
@@ -170,7 +170,6 @@ Computations / Descriptive Stats
Series.value_co

[spark] branch master updated: [SPARK-45562][SQL] XML: Make 'rowTag' a required option

2023-10-17 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4d63ca6394f [SPARK-45562][SQL] XML: Make 'rowTag' a required option
4d63ca6394f is described below

commit 4d63ca6394fe8692e1f9bceb93606a86b88b5dc1
Author: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
AuthorDate: Tue Oct 17 20:38:02 2023 +0900

[SPARK-45562][SQL] XML: Make 'rowTag' a required option

### What changes were proposed in this pull request?
User can specify `rowTag` option that is the name of the XML element that 
maps to a `DataFrame Row`.  A non-existent `rowTag` will not infer any schema 
or generate any `DataFrame` rows. Currently, not specifying `rowTag` option 
results in picking up its default value of  `ROW`, which won't match a real XML 
element in most scenarios. This results in an empty dataframe and confuse new 
users.

This PR makes `rowTag` a required option for both read and write. XML 
built-in functions (from_xml/schema_of_xml) ignore `rowTag` option.

### Why are the changes needed?
See above

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New unit tests

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43389 from sandip-db/xml-rowTag.

Authored-by: Sandip Agarwala <131817656+sandip...@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon 
---
 .../apache/spark/sql/catalyst/xml/XmlOptions.scala |   4 +-
 .../execution/datasources/xml/XmlFileFormat.scala  |   2 +
 .../execution/datasources/xml/JavaXmlSuite.java|  10 +-
 .../sql/execution/datasources/xml/XmlSuite.scala   | 125 +++--
 .../xml/parsers/StaxXmlGeneratorSuite.scala|   4 +-
 5 files changed, 103 insertions(+), 42 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/XmlOptions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/XmlOptions.scala
index d0cfff87279..0dedbec58e1 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/XmlOptions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/XmlOptions.scala
@@ -63,8 +63,8 @@ private[sql] class XmlOptions(
   }
 
   val compressionCodec = 
parameters.get(COMPRESSION).map(CompressionCodecs.getCodecClassName)
-  val rowTag = parameters.getOrElse(ROW_TAG, XmlOptions.DEFAULT_ROW_TAG)
-  require(rowTag.nonEmpty, s"'$ROW_TAG' option should not be empty string.")
+  val rowTag = parameters.getOrElse(ROW_TAG, XmlOptions.DEFAULT_ROW_TAG).trim
+  require(rowTag.nonEmpty, s"'$ROW_TAG' option should not be an empty string.")
   require(!rowTag.startsWith("<") && !rowTag.endsWith(">"),
   s"'$ROW_TAG' should not include angle brackets")
   val rootTag = parameters.getOrElse(ROOT_TAG, XmlOptions.DEFAULT_ROOT_TAG)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
index baacf7f0748..4342711b00f 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XmlFileFormat.scala
@@ -42,6 +42,8 @@ class XmlFileFormat extends TextBasedFileFormat with 
DataSourceRegister {
   def getXmlOptions(
   sparkSession: SparkSession,
   parameters: Map[String, String]): XmlOptions = {
+val rowTagOpt = parameters.get(XmlOptions.ROW_TAG)
+require(rowTagOpt.isDefined, s"'${XmlOptions.ROW_TAG}' option is 
required.")
 new XmlOptions(parameters,
   sparkSession.sessionState.conf.sessionLocalTimeZone,
   sparkSession.sessionState.conf.columnNameOfCorruptRecord)
diff --git 
a/sql/core/src/test/java/test/org/apache/spark/sql/execution/datasources/xml/JavaXmlSuite.java
 
b/sql/core/src/test/java/test/org/apache/spark/sql/execution/datasources/xml/JavaXmlSuite.java
index b3f39180843..c773459dc4c 100644
--- 
a/sql/core/src/test/java/test/org/apache/spark/sql/execution/datasources/xml/JavaXmlSuite.java
+++ 
b/sql/core/src/test/java/test/org/apache/spark/sql/execution/datasources/xml/JavaXmlSuite.java
@@ -82,7 +82,7 @@ public final class JavaXmlSuite {
 public void testXmlParser() {
 Map options = new HashMap<>();
 options.put("rowTag", booksFileTag);
-Dataset df = 
spark.read().options(options).format("xml").load(booksFile);
+Dataset df = spark.read().options(options).xml(booksFile);
 String prefix = XmlOptions.DEFAULT_ATTRIBUTE_PREFIX();
 long result = df.select(prefix + "id").count();
 Assertions.assertEquals(result, numBooks);
@@ -92,7 +92,7 @@ public final class JavaXmlSuite {
 public voi

[spark] branch master updated: [SPARK-45485][CONNECT] User agent improvements: Use SPARK_CONNECT_USER_AGENT env variable and include environment specific attributes

2023-10-17 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bd627503f96 [SPARK-45485][CONNECT] User agent improvements: Use 
SPARK_CONNECT_USER_AGENT env variable and include environment specific 
attributes
bd627503f96 is described below

commit bd627503f96758edae028a269b7a6ac203a8d941
Author: Robert Dillitz 
AuthorDate: Tue Oct 17 19:32:29 2023 +0900

[SPARK-45485][CONNECT] User agent improvements: Use 
SPARK_CONNECT_USER_AGENT env variable and include environment specific 
attributes

### What changes were proposed in this pull request?
With this PR similar to the[ Python 
client](https://github.com/apache/spark/blob/2cc1ee4d3a05a641d7a245f015ef824d8f7bae8b/python/pyspark/sql/connect/client/core.py#L284)
 the Scala client's user agent now:

1. Uses the SPARK_CONNECT_USER_AGENT environment variable if set
2. Includes the OS, JVM version, Scala version, and Spark version

### Why are the changes needed?
Feature parity with the Python client. Better observability of Scala Spark 
Connect clients.

### Does this PR introduce _any_ user-facing change?
By default, the user agent string now contains more useful information.
Before:
`_SPARK_CONNECT_SCALA`
After:
`_SPARK_CONNECT_SCALA spark/4.0.0-SNAPSHOT scala/2.13.12 jvm/17.0.8.1 
os/darwin`

### How was this patch tested?
Tests added & adjusted.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #43313 from dillitz/user-agent-improvements.

Authored-by: Robert Dillitz 
Signed-off-by: Hyukjin Kwon 
---
 .../SparkConnectClientBuilderParseTestSuite.scala  |  8 +++---
 .../connect/client/SparkConnectClientSuite.scala   | 13 --
 .../sql/connect/client/SparkConnectClient.scala| 30 +++---
 3 files changed, 42 insertions(+), 9 deletions(-)

diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientBuilderParseTestSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientBuilderParseTestSuite.scala
index e1d4a18d0ff..68d2e86b19d 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientBuilderParseTestSuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientBuilderParseTestSuite.scala
@@ -47,7 +47,7 @@ class SparkConnectClientBuilderParseTestSuite extends 
ConnectFunSuite {
   argumentTest("token", "azbycxdwev1234567890", _.token.get)
   argumentTest("user_id", "U1238", _.userId.get)
   argumentTest("user_name", "alice", _.userName.get)
-  argumentTest("user_agent", "MY APP", _.userAgent)
+  argumentTest("user_agent", "robert", _.userAgent.split(" ")(0))
   argumentTest("session_id", UUID.randomUUID().toString, _.sessionId.get)
 
   test("Argument - remote") {
@@ -95,7 +95,7 @@ class SparkConnectClientBuilderParseTestSuite extends 
ConnectFunSuite {
 "Q12")
   assert(builder.host === "localhost")
   assert(builder.port === 1507)
-  assert(builder.userAgent === "U8912")
+  assert(builder.userAgent.contains("U8912"))
   assert(!builder.sslEnabled)
   assert(builder.token.isEmpty)
   assert(builder.userId.contains("Q12"))
@@ -113,7 +113,7 @@ class SparkConnectClientBuilderParseTestSuite extends 
ConnectFunSuite {
 "cluster=mycl")
   assert(builder.host === "localhost")
   assert(builder.port === 15002)
-  assert(builder.userAgent == "_SPARK_CONNECT_SCALA")
+  assert(builder.userAgent.contains("_SPARK_CONNECT_SCALA"))
   assert(builder.sslEnabled)
   assert(builder.token.isEmpty)
   assert(builder.userId.isEmpty)
@@ -124,7 +124,7 @@ class SparkConnectClientBuilderParseTestSuite extends 
ConnectFunSuite {
   val builder = build("--token", "thisismysecret")
   assert(builder.host === "localhost")
   assert(builder.port === 15002)
-  assert(builder.userAgent === "_SPARK_CONNECT_SCALA")
+  assert(builder.userAgent.contains("_SPARK_CONNECT_SCALA"))
   assert(builder.sslEnabled)
   assert(builder.token.contains("thisismysecret"))
   assert(builder.userId.isEmpty)
diff --git 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala
 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala
index a3df39da4a8..b3ff4eb0bb2 100644
--- 
a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala
+++ 
b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/SparkConnectClientSuite.scala
@@ -270,7 +270,7 @@ class SparkCon

[spark] branch master updated: [SPARK-45567][CONNECT] Remove redundant if in org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender#run

2023-10-17 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c7b20b5dbfb [SPARK-45567][CONNECT] Remove redundant if in 
org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender#run
c7b20b5dbfb is described below

commit c7b20b5dbfbe7eb89f77b3f49854c90b6640a9c3
Author: zhaomin 
AuthorDate: Tue Oct 17 19:31:16 2023 +0900

[SPARK-45567][CONNECT] Remove redundant if in 
org.apache.spark.sql.connect.execution.ExecuteGrpcResponseSender#run

### What changes were proposed in this pull request?

remove redundant ```if```

### Why are the changes needed?

it is redundant.

https://issues.apache.org/jira/browse/SPARK-45567?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

pass ci.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43395 from zhaomin1423/45567.

Authored-by: zhaomin 
Signed-off-by: Hyukjin Kwon 
---
 .../spark/sql/connect/execution/ExecuteGrpcResponseSender.scala   | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala
 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala
index ba5ecc7a045..115cedfe112 100644
--- 
a/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala
+++ 
b/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteGrpcResponseSender.scala
@@ -124,11 +124,9 @@ private[connect] class ExecuteGrpcResponseSender[T <: 
Message](
 execute(lastConsumedStreamIndex)
   } finally {
 executeHolder.removeGrpcResponseSender(this)
-if (!executeHolder.reattachable) {
-  // Non reattachable executions release here immediately.
-  // (Reattachable executions release with ReleaseExecute RPC.)
-  
SparkConnectService.executionManager.removeExecuteHolder(executeHolder.key)
-}
+// Non reattachable executions release here immediately.
+// (Reattachable executions release with ReleaseExecute RPC.)
+
SparkConnectService.executionManager.removeExecuteHolder(executeHolder.key)
   }
 }
   }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-45565][UI] Unnecessary JSON.stringify and JSON.parse loop for task list on stage detail

2023-10-17 Thread yao

This is an automated email from the ASF dual-hosted git repository.

yao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ac70daf7337 [SPARK-45565][UI] Unnecessary JSON.stringify and 
JSON.parse loop for task list on stage detail
ac70daf7337 is described below

commit ac70daf7337324d38742034c1d6afc2f0243b600
Author: Kent Yao 
AuthorDate: Tue Oct 17 18:17:31 2023 +0800

[SPARK-45565][UI] Unnecessary JSON.stringify and JSON.parse loop for task 
list on stage detail

### What changes were proposed in this pull request?

`dataSrc` returns a json value, we don't need to stringify it and parse it 
back

### Why are the changes needed?

performance improvements for UI rendering

### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

build and verify the stage page locally

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #43392 from yaooqinn/SPARK-45565.

Authored-by: Kent Yao 
Signed-off-by: Kent Yao 
---
 core/src/main/resources/org/apache/spark/ui/static/stagepage.js | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js 
b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
index ad3eca06a0c..4b6b7e219e1 100644
--- a/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
+++ b/core/src/main/resources/org/apache/spark/ui/static/stagepage.js
@@ -841,11 +841,7 @@ $(document).ready(function () {
 data.length = totalTasksToShow;
   }
 },
-"dataSrc": function (jsons) {
-  var jsonStr = JSON.stringify(jsons);
-  var tasksToShow = JSON.parse(jsonStr);
-  return tasksToShow.aaData;
-},
+"dataSrc": (jsons) => jsons.aaData,
 "error": function (_ignored_jqXHR, _ignored_textStatus, 
_ignored_errorThrown) {
   alert("Unable to connect to the server. Looks like the Spark " +
 "application must have ended. Please Switch to the history 
UI.");


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch branch-3.5 updated: [SPARK-45283][CORE][TESTS][3.5] Make StatusTrackerSuite less fragile

2023-10-17 Thread yangjie01

This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.5 by this push:
 new 22a83caa489 [SPARK-45283][CORE][TESTS][3.5] Make StatusTrackerSuite 
less fragile
22a83caa489 is described below

commit 22a83caa4896a8d03ec7e76b3e7a3bd08930adcb
Author: Bo Xiong 
AuthorDate: Tue Oct 17 18:05:23 2023 +0800

[SPARK-45283][CORE][TESTS][3.5] Make StatusTrackerSuite less fragile

### Why are the changes needed?

It's discovered from [Github 
Actions](https://github.com/xiongbo-sjtu/spark/actions/runs/6270601155/job/17028788767)
 that StatusTrackerSuite can run into random failures, as shown by the 
following error message.  The proposed fix is to update the unit test to remove 
the nondeterministic behavior.

The fix has been made to the master branch in 
https://github.com/apache/spark/pull/43194.  This PR is meant to patch 
branch-3.5 only.

```
[info] StatusTrackerSuite:
[info] - basic status API usage (99 milliseconds)
[info] - getJobIdsForGroup() (56 milliseconds)
[info] - getJobIdsForGroup() with takeAsync() (48 milliseconds)
[info] - getJobIdsForGroup() with takeAsync() across multiple 
partitions (58 milliseconds)
[info] - getJobIdsForTag() *** FAILED *** (10 seconds, 77 milliseconds)
[info] The code passed to eventually never returned normally.
   Attempted 651 times over 10.00505994401 seconds.
   Last failure message: Set(3, 2, 1) was not equal to Set(1, 2). 
(StatusTrackerSuite.scala:148)
```

Full trace can be found 
[here](https://issues.apache.org/jira/browse/SPARK-45283).

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
```
build/mvn package -DskipTests -pl core
build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.StatusTrackerSuite 
test
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43388 from xiongbo-sjtu/branch-3.5.

Authored-by: Bo Xiong 
Signed-off-by: yangjie01 
---
 core/src/test/scala/org/apache/spark/StatusTrackerSuite.scala | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/StatusTrackerSuite.scala 
b/core/src/test/scala/org/apache/spark/StatusTrackerSuite.scala
index 0817abbc6a3..9019ea484b3 100644
--- a/core/src/test/scala/org/apache/spark/StatusTrackerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/StatusTrackerSuite.scala
@@ -140,16 +140,19 @@ class StatusTrackerSuite extends SparkFunSuite with 
Matchers with LocalSparkCont
 }
 
 sc.removeJobTag("tag1")
+
 // takeAsync() across multiple partitions
 val thirdJobFuture = sc.parallelize(1 to 1000, 2).takeAsync(999)
-val thirdJobId = eventually(timeout(10.seconds)) {
-  thirdJobFuture.jobIds.head
+val thirdJobIds = eventually(timeout(10.seconds)) {
+  // Wait for the two jobs triggered by takeAsync
+  thirdJobFuture.jobIds.size should be(2)
+  thirdJobFuture.jobIds
 }
 eventually(timeout(10.seconds)) {
   sc.statusTracker.getJobIdsForTag("tag1").toSet should be (
 Set(firstJobId, secondJobId))
   sc.statusTracker.getJobIdsForTag("tag2").toSet should be (
-Set(secondJobId, thirdJobId))
+Set(secondJobId) ++ thirdJobIds)
 }
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

37 matches

Mail list logo