[GitHub] [spark] gaoyajun02 commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

2022-11-01 Thread GitBox


gaoyajun02 commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1298452168

   We have now located the cause of zero-size chunk loss, 
   We have located the cause of the zero-size chunk problem on the shuffle 
service node. and there is the following information in the system `dmesg -T`:
   ```
   e Nov  1 19:40:04 2022] EXT4-fs (sde1): Delayed block allocation failed for 
inode 25755946 at logical offset 0 with max blocks 15 with error 117
   [Tue Nov  1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed 
for inode 23266116 at logical offset 0 with max blocks 15 with error 117
   [Tue Nov  1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed 
for inode 23266116 at logical offset 0 with max blocks 15 with error 117
   ```
   Although this is not from the software layer, and the number of bad nodes 
that lose data is very low, I think it makes sense to support fallback here.
   
   cc  @otterc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on pull request #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries

2022-11-01 Thread GitBox


beliefer commented on PR #38461:
URL: https://github.com/apache/spark/pull/38461#issuecomment-1298447704

   ping @peter-toth cc @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on pull request #38458: [SPARK-40983][DOC] Remove Hadoop requirements for zstd mentioned in Parquet compression codec

2022-11-01 Thread GitBox


wangyum commented on PR #38458:
URL: https://github.com/apache/spark/pull/38458#issuecomment-1298446195

   Merged to master, branch-3.3 and branch-3.2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum closed pull request #38458: [SPARK-40983][DOC] Remove Hadoop requirements for zstd mentioned in Parquet compression codec

2022-11-01 Thread GitBox


wangyum closed pull request #38458: [SPARK-40983][DOC] Remove Hadoop 
requirements for zstd mentioned in Parquet compression codec
URL: https://github.com/apache/spark/pull/38458


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on pull request #38454: [SPARK-40978][SQL] Migrate `failAnalysis()` w/o a context onto error classes

2022-11-01 Thread GitBox


MaxGekk commented on PR #38454:
URL: https://github.com/apache/spark/pull/38454#issuecomment-1298428612

   @cloud-fan @srielau @itholic @LuciferYang @panbingkun Could you review this 
PR, please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] EnricoMi commented on pull request #38356: [SPARK-40885] `Sort` may not take effect when it is the last 'Transform' operator

2022-11-01 Thread GitBox


EnricoMi commented on PR #38356:
URL: https://github.com/apache/spark/pull/38356#issuecomment-1298425304

   @allisonwang-db can you elaborate on mapping `write.requiredOrdering` to the 
projected columns that you introduced in 
f98f9f8566243a8a01edcaad3b847bbd2f52305b? Was that existing code moved into 
`V1Writes`, or was that logic added.
   
   
https://github.com/apache/spark/blob/ff66adda3ca3762b8c71b14acbd6da00c5508a2e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala#L81-L84


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #38459: [SPARK-40980][CONNECT][TEST] Support session.sql in Connect DSL

2022-11-01 Thread GitBox


zhengruifeng commented on PR #38459:
URL: https://github.com/apache/spark/pull/38459#issuecomment-1298422766

   merged into master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng closed pull request #38459: [SPARK-40980][CONNECT][TEST] Support session.sql in Connect DSL

2022-11-01 Thread GitBox


zhengruifeng closed pull request #38459: [SPARK-40980][CONNECT][TEST] Support 
session.sql in Connect DSL
URL: https://github.com/apache/spark/pull/38459


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox


panbingkun commented on code in PR #38463:
URL: https://github.com/apache/spark/pull/38463#discussion_r1010356813


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala:
##
@@ -444,17 +460,32 @@ case class CreateNamedStruct(children: Seq[Expression]) 
extends Expression with
 
   override def checkInputDataTypes(): TypeCheckResult = {
 if (children.size % 2 != 0) {
-  TypeCheckResult.TypeCheckFailure(s"$prettyName expects an even number of 
arguments.")
+  DataTypeMismatch(
+errorSubClass = "WRONG_NUM_ARGS",
+messageParameters = Map(
+  "functionName" -> toSQLId(prettyName),
+  "expectedNum" -> "2n (n > 0)",
+  "actualNum" -> children.length.toString
+)
+  )
 } else {
   val invalidNames = nameExprs.filterNot(e => e.foldable && e.dataType == 
StringType)
   if (invalidNames.nonEmpty) {
-TypeCheckResult.TypeCheckFailure(
-  s"Only foldable ${StringType.catalogString} expressions are allowed 
to appear at odd" +
-s" position, got: ${invalidNames.mkString(",")}")
+DataTypeMismatch(
+  errorSubClass = "CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING",

Review Comment:
   https://user-images.githubusercontent.com/15246973/199229111-f39526ba-1f5e-46b0-b8b5-6abad300674b.png;>
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox


panbingkun commented on code in PR #38463:
URL: https://github.com/apache/spark/pull/38463#discussion_r1010355945


##
core/src/main/resources/error/error-classes.json:
##
@@ -155,6 +160,21 @@
   "To convert values from  to , you can use the 
functions  instead."
 ]
   },
+  "CREATE_MAP_KEY_DIFF_TYPES" : {

Review Comment:
   but
   https://user-images.githubusercontent.com/15246973/199228400-42c1759f-fc16-4205-a052-fc54b1f3d70b.png;>
   Very hard to reuse it.
   Maybe `functionName` pass to: `map's keys` or `map's values`, seems 
unreasonable



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox


panbingkun commented on code in PR #38463:
URL: https://github.com/apache/spark/pull/38463#discussion_r1010355945


##
core/src/main/resources/error/error-classes.json:
##
@@ -155,6 +160,21 @@
   "To convert values from  to , you can use the 
functions  instead."
 ]
   },
+  "CREATE_MAP_KEY_DIFF_TYPES" : {

Review Comment:
   but
   https://user-images.githubusercontent.com/15246973/199228400-42c1759f-fc16-4205-a052-fc54b1f3d70b.png;>
   Very hard to reuse it.
   Maybe `functionName` pass to: `map's key`, seems unreasonable



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] panbingkun commented on a diff in pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes

2022-11-01 Thread GitBox


panbingkun commented on code in PR #38438:
URL: https://github.com/apache/spark/pull/38438#discussion_r1010349649


##
sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java:
##
@@ -79,12 +80,8 @@ public void isInCollectionCheckExceptionMessage() {
   createStructField("a", IntegerType, false),
   createStructField("b", createArrayType(IntegerType, false), false)));
 Dataset df = spark.createDataFrame(rows, schema);
-Exception e = Assert.assertThrows(Exception.class,
+AnalysisException e = Assert.assertThrows(AnalysisException.class,
   () -> df.filter(df.col("a").isInCollection(Arrays.asList(new 
Column("b");
-Arrays.asList("cannot resolve",
-  "due to data type mismatch: Arguments must be same type but were")
-.forEach(s ->
-  Assert.assertTrue(e.getMessage().toLowerCase(Locale.ROOT)
-.contains(s.toLowerCase(Locale.ROOT;
+
Assert.assertTrue(e.message().startsWith("[DATATYPE_MISMATCH.DATA_DIFF_TYPES]"));

Review Comment:
   e is an AnalysisException
   https://user-images.githubusercontent.com/15246973/199227396-98442bc1-f5d6-4779-8e09-b21691b7ade5.png;>
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] eejbyfeldt commented on a diff in pull request #38428: [SPARK-40912][CORE][WIP] Overhead of Exceptions in DeserializationStream

2022-11-01 Thread GitBox


eejbyfeldt commented on code in PR #38428:
URL: https://github.com/apache/spark/pull/38428#discussion_r1010340881


##
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala:
##
@@ -301,15 +300,18 @@ class KryoDeserializationStream(
 
   private[this] var kryo: Kryo = serInstance.borrowKryo()
 
+  final private[this] def hasNext: Boolean = {

Review Comment:
   Yes, will fix.



##
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala:
##
@@ -324,6 +326,36 @@ class KryoDeserializationStream(
   }
 }
   }
+
+  final override def asIterator: Iterator[Any] = new NextIterator[Any] {
+override protected def getNext() = {
+  if (KryoDeserializationStream.this.hasNext) {
+readValue[Any]()
+  } else {
+finished = true
+null
+  }
+}
+
+override protected def close(): Unit = {
+  KryoDeserializationStream.this.close()
+}
+  }
+
+  final override def asKeyValueIterator: Iterator[(Any, Any)] = new 
NextIterator[(Any, Any)] {
+override protected def getNext() = {
+  if (KryoDeserializationStream.this.hasNext) {
+(readKey[Any](), readValue[Any]())

Review Comment:
   You mean that if only a key exist we just ignore it like the current 
implementation would?



##
core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala:
##
@@ -301,15 +300,18 @@ class KryoDeserializationStream(
 
   private[this] var kryo: Kryo = serInstance.borrowKryo()
 
+  final private[this] def hasNext: Boolean = {
+if (input == null) {
+  return false
+}
+
+val eof = input.eof()
+if (eof) close()
+!eof
+  }
+
   override def readObject[T: ClassTag](): T = {
-try {
   kryo.readClassAndObject(input).asInstanceOf[T]
-} catch {
-  // DeserializationStream uses the EOF exception to indicate stopping 
condition.
-  case e: KryoException
-if e.getMessage.toLowerCase(Locale.ROOT).contains("buffer underflow") 
=>
-throw new EOFException
-}

Review Comment:
   Sure will add it back. I think that catching and ignoring the exceptions 
here should be revisited in some other change as it seems to me like it could 
case dataloss that we just assume the exception here means EOF. 



##
core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala:
##
@@ -504,44 +505,31 @@ class ExternalAppendOnlyMap[K, V, C](
  * If no more pairs are left, return null.
  */
 private def readNextItem(): (K, C) = {
-  try {
-val k = deserializeStream.readKey().asInstanceOf[K]
-val c = deserializeStream.readValue().asInstanceOf[C]
-val item = (k, c)
-objectsRead += 1
-if (objectsRead == serializerBatchSize) {
-  objectsRead = 0
-  deserializeStream = nextBatchStream()
-}
-item
-  } catch {
-case e: EOFException =>
-  cleanup()
-  null
+  val next = batchIterator.next()
+  objectsRead += 1
+  if (objectsRead == serializerBatchSize) {
+objectsRead = 0
+batchIterator = nextBatchIterator()
   }
+  next
 }
 
 override def hasNext: Boolean = {
-  if (nextItem == null) {
-if (deserializeStream == null) {
-  // In case of deserializeStream has not been initialized
-  deserializeStream = nextBatchStream()
-  if (deserializeStream == null) {
-return false
-  }
+  if (batchIterator == null) {
+// In case of batchIterator has not been initialized
+batchIterator = nextBatchIterator()
+if (batchIterator == null) {
+  return false
 }
-nextItem = readNextItem()
   }
-  nextItem != null
+  batchIterator.hasNext
 }
 
 override def next(): (K, C) = {
-  if (!hasNext) {
+  if (batchIterator == null) {

Review Comment:
   In that case it will call next on the empty iterator and we should still 
throw a `NoSuchElementException`. But `!hasNext` should also have that behavior 
so can change back to that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng opened a new pull request, #38468: [WIP][CONNECT][PYTHON] Arrow-based collect

2022-11-01 Thread GitBox


zhengruifeng opened a new pull request, #38468:
URL: https://github.com/apache/spark/pull/38468

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cxzl25 opened a new pull request, #38467: [SPARK-40987][CORE] Avoid creating a directory when deleting a block, causing DAGScheduler to not work

2022-11-01 Thread GitBox


cxzl25 opened a new pull request, #38467:
URL: https://github.com/apache/spark/pull/38467

   ### What changes were proposed in this pull request?
   Avoid creating a directory when deleting a block.
   
   ### Why are the changes needed?
   When the driver submits a job, DAGScheduler calls 
sc.broadcast(taskBinaryBytes).
   TorrentBroadcast#writeBlocks may fail due to disk problems during 
blockManager#putBytes.
   BlockManager#doPut calls BlockManager#removeBlockInternal to clean up the 
block.
   BlockManager#removeBlockInternal calls DiskStore#remove to clean up blocks 
on disk.
   DiskStore#remove will try to create the directory because the directory does 
not exist, and an exception will be thrown at this time.
   BlockInfoManager#blockInfoWrappers block info and lock not removed.
   The catch block in TorrentBroadcast#writeBlocks will call 
blockManager.removeBroadcast to clean up the broadcast.
   Because the block lock in BlockInfoManager#blockInfoWrappers is not 
released, the dag-scheduler-event-loop thread of DAGScheduler will wait forever.
   
   ```
   22/11/01 18:27:48 WARN BlockManager: Putting block broadcast_0_piece0 failed 
due to exception java.io.IOException: X.
   22/11/01 18:27:48 ERROR TorrentBroadcast: Store broadcast broadcast_0 fail, 
remove all pieces of the broadcast 
   ```
   
   ```
   "dag-scheduler-event-loop" #54 daemon prio=5 os_prio=31 
tid=0x7fc98e3fa800 nid=0x7203 waiting on condition [0x78c1e000]
      java.lang.Thread.State: WAITING (parking)
       at sun.misc.Unsafe.park(Native Method)
       - parking to wait for  <0x0007add3d8c8> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
       at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
       at 
org.apache.spark.storage.BlockInfoManager.$anonfun$acquireLock$1(BlockInfoManager.scala:221)
       at 
org.apache.spark.storage.BlockInfoManager.$anonfun$acquireLock$1$adapted(BlockInfoManager.scala:214)
       at 
org.apache.spark.storage.BlockInfoManager$$Lambda$3038/1307533457.apply(Unknown 
Source)
       at 
org.apache.spark.storage.BlockInfoWrapper.withLock(BlockInfoManager.scala:105)
       at 
org.apache.spark.storage.BlockInfoManager.acquireLock(BlockInfoManager.scala:214)
       at 
org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:293)
       at 
org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1979)
       at 
org.apache.spark.storage.BlockManager.$anonfun$removeBroadcast$3(BlockManager.scala:1970)
       at 
org.apache.spark.storage.BlockManager.$anonfun$removeBroadcast$3$adapted(BlockManager.scala:1970)
       at 
org.apache.spark.storage.BlockManager$$Lambda$3092/1241801156.apply(Unknown 
Source)
       at scala.collection.Iterator.foreach(Iterator.scala:943)
       at scala.collection.Iterator.foreach$(Iterator.scala:943)
       at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
       at 
org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1970)
       at 
org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:179)
       at 
org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:99)
       at 
org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:38)
       at 
org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:78)
       at 
org.apache.spark.SparkContext.broadcastInternal(SparkContext.scala:1538)
       at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1520)
       at 
org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1539)
       at 
org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1355)
       at 
org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1297)
       at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2929)
       at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2921)
       at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2910)
       at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) 
   ```
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Throw an exception before `Files.createDirectory` to simulate disk problems.
   
   DiskBlockManager#getFile
   ```java
   if (filename.contains("piece")) {
 throw new java.io.IOException("disk issue")
   }
   Files.createDirectory(path)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For 

[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox


panbingkun commented on code in PR #38463:
URL: https://github.com/apache/spark/pull/38463#discussion_r1010317628


##
core/src/main/resources/error/error-classes.json:
##
@@ -155,6 +160,21 @@
   "To convert values from  to , you can use the 
functions  instead."
 ]
   },
+  "CREATE_MAP_KEY_DIFF_TYPES" : {

Review Comment:
   https://user-images.githubusercontent.com/15246973/199219681-f7089110-7701-42e5-82df-d772f9ed4304.png;>
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer opened a new pull request, #38466: [WIP][SPARK-40986][SQL] Using distinct to reduce the data size for bloom filter

2022-11-01 Thread GitBox


beliefer opened a new pull request, #38466:
URL: https://github.com/apache/spark/pull/38466

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-01 Thread GitBox


dongjoon-hyun commented on PR #38464:
URL: https://github.com/apache/spark/pull/38464#issuecomment-1298287656

   Thank you for pinging me, @wangyum .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang opened a new pull request, #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35

2022-11-01 Thread GitBox


LuciferYang opened a new pull request, #38465:
URL: https://github.com/apache/spark/pull/38465

   ### What changes were proposed in this pull request?
   This pr aims upgrade RoaringBitmap 0.9.35
   
   
   ### Why are the changes needed?
   This version bring some bug fix:
   
   - https://github.com/RoaringBitmap/RoaringBitmap/pull/587
   - https://github.com/RoaringBitmap/RoaringBitmap/issues/588
   
   other changes as follows:
   
   https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.32...0.9.35
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Pass GitHub Actions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-01 Thread GitBox


wangyum commented on PR #38464:
URL: https://github.com/apache/spark/pull/38464#issuecomment-1298235836

   cc @cloud-fan @sigmod @aokolnychyi @dongjoon-hyun @huaxingao @viirya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum opened a new pull request, #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning

2022-11-01 Thread GitBox


wangyum opened a new pull request, #38464:
URL: https://github.com/apache/spark/pull/38464

   ### What changes were proposed in this pull request?
   
   This PR enhances DPP to use bloom filters if 
`spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly` is disabled 
and build plan can't build broadcast by size and can reuse the existing shuffle 
exchanges.
   
   ### Why are the changes needed?
   
   Avoid job fail if 
`spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly` is disabled:
   ```sql
   select catalog_sales.* from  catalog_sales join catalog_returns  where 
cr_order_number = cs_sold_date_sk and cr_returned_time_sk < 4;
   ```
   ```
   20/08/16 06:44:42 ERROR TaskSetManager: Total size of serialized results of 
494 tasks (1225.3 MiB) is bigger than spark.driver.maxResultSize (1024.0 MiB)
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Unit test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support session.range in Python client

2022-11-01 Thread GitBox


zhengruifeng commented on PR #38460:
URL: https://github.com/apache/spark/pull/38460#issuecomment-1298229126

   merged into master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng closed pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support session.range in Python client

2022-11-01 Thread GitBox


zhengruifeng closed pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support 
session.range in Python client
URL: https://github.com/apache/spark/pull/38460


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk closed pull request #38439: [SPARK-40890][SQL][TESTS] Check error classes in DataSourceV2SQLSuite

2022-11-01 Thread GitBox


MaxGekk closed pull request #38439: [SPARK-40890][SQL][TESTS] Check error 
classes in DataSourceV2SQLSuite
URL: https://github.com/apache/spark/pull/38439


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on pull request #38439: [SPARK-40890][SQL][TESTS] Check error classes in DataSourceV2SQLSuite

2022-11-01 Thread GitBox


MaxGekk commented on PR #38439:
URL: https://github.com/apache/spark/pull/38439#issuecomment-1298219391

   +1, LGTM. Merging to master. I ran the test locally.
   Thank you, @panbingkun.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox


LuciferYang commented on code in PR #38463:
URL: https://github.com/apache/spark/pull/38463#discussion_r1010207329


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala:
##
@@ -444,17 +460,32 @@ case class CreateNamedStruct(children: Seq[Expression]) 
extends Expression with
 
   override def checkInputDataTypes(): TypeCheckResult = {
 if (children.size % 2 != 0) {
-  TypeCheckResult.TypeCheckFailure(s"$prettyName expects an even number of 
arguments.")
+  DataTypeMismatch(
+errorSubClass = "WRONG_NUM_ARGS",
+messageParameters = Map(
+  "functionName" -> toSQLId(prettyName),
+  "expectedNum" -> "2n (n > 0)",
+  "actualNum" -> children.length.toString
+)
+  )
 } else {
   val invalidNames = nameExprs.filterNot(e => e.foldable && e.dataType == 
StringType)
   if (invalidNames.nonEmpty) {
-TypeCheckResult.TypeCheckFailure(
-  s"Only foldable ${StringType.catalogString} expressions are allowed 
to appear at odd" +
-s" position, got: ${invalidNames.mkString(",")}")
+DataTypeMismatch(
+  errorSubClass = "CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING",

Review Comment:
   Can `NON_FOLDABLE_INPUT` be reused?
   




##
core/src/main/resources/error/error-classes.json:
##
@@ -155,6 +160,21 @@
   "To convert values from  to , you can use the 
functions  instead."
 ]
   },
+  "CREATE_MAP_KEY_DIFF_TYPES" : {

Review Comment:
   Cloud we make `DATA_DIFF_TYPES ` more general and reuse it?



##
core/src/main/resources/error/error-classes.json:
##
@@ -155,6 +160,21 @@
   "To convert values from  to , you can use the 
functions  instead."
 ]
   },
+  "CREATE_MAP_KEY_DIFF_TYPES" : {
+"message" : [
+  "The given keys of function  should all be the same 
type, but they are ."
+]
+  },
+  "CREATE_MAP_VALUE_DIFF_TYPES" : {

Review Comment:
   ditto



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala:
##
@@ -668,10 +699,19 @@ case class UpdateFields(structExpr: Expression, fieldOps: 
Seq[StructFieldsOperat
   override def checkInputDataTypes(): TypeCheckResult = {
 val dataType = structExpr.dataType
 if (!dataType.isInstanceOf[StructType]) {
-  TypeCheckResult.TypeCheckFailure("struct argument should be struct type, 
got: " +
-dataType.catalogString)
+  DataTypeMismatch(
+errorSubClass = "UNEXPECTED_INPUT_TYPE",
+messageParameters = Map(
+  "paramIndex" -> "1",
+  "requiredType" -> toSQLType(StructType),
+  "inputSql" -> toSQLExpr(structExpr),
+  "inputType" -> toSQLType(structExpr.dataType))
+  )
 } else if (newExprs.isEmpty) {
-  TypeCheckResult.TypeCheckFailure("cannot drop all fields in struct")
+  DataTypeMismatch(

Review Comment:
   It seems that the original message did not clearly describe this error, the 
message doesn't look like `DataTypeMismatch`
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes

2022-11-01 Thread GitBox


LuciferYang commented on PR #38457:
URL: https://github.com/apache/spark/pull/38457#issuecomment-1298209124

   Thanks @MaxGekk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk closed pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes

2022-11-01 Thread GitBox


MaxGekk closed pull request #38457: [SPARK-40371][SQL] Migrate type check 
failures of NthValue and NTile onto error classes
URL: https://github.com/apache/spark/pull/38457


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes

2022-11-01 Thread GitBox


MaxGekk commented on PR #38457:
URL: https://github.com/apache/spark/pull/38457#issuecomment-1298207749

   +1, LGTM. Merging to master.
   Thank you, @LuciferYang.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on a diff in pull request #38170: [WIP][SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225

2022-11-01 Thread GitBox


itholic commented on code in PR #38170:
URL: https://github.com/apache/spark/pull/38170#discussion_r1010170342


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##
@@ -2171,11 +2168,7 @@ private[sql] object QueryExecutionErrors extends 
QueryErrorsBase {
   }
 
   def noSuchElementExceptionError(key: String): Throwable = {
-new SparkException(
-  errorClass = "_LEGACY_ERROR_TEMP_2221",
-  messageParameters = Map(
-"key" -> key),
-  cause = null)
+new NoSuchElementException(key)

Review Comment:
   Seems like we cannot just simply treat `key` for `NoSuchElementException` as 
same as message in error class.
   
   I think maybe we need to introduce new Exception something like 
`SparkNoSuchElementException`.
   
   Let me just revert for now, and will open a PR for new Exception, and cover 
the missing cases here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on a diff in pull request #38170: [WIP][SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225

2022-11-01 Thread GitBox


itholic commented on code in PR #38170:
URL: https://github.com/apache/spark/pull/38170#discussion_r1010170342


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##
@@ -2171,11 +2168,7 @@ private[sql] object QueryExecutionErrors extends 
QueryErrorsBase {
   }
 
   def noSuchElementExceptionError(key: String): Throwable = {
-new SparkException(
-  errorClass = "_LEGACY_ERROR_TEMP_2221",
-  messageParameters = Map(
-"key" -> key),
-  cause = null)
+new NoSuchElementException(key)

Review Comment:
   Seems like we cannot just simply treat `key` for `NoSuchElementException` as 
same as message in error class.
   
   I think maybe we need to new Exception something like 
`SparkNoSuchElementException`.
   
   Let me just revert for now, and will open a PR for new Exception, and cover 
the missing cases here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] panbingkun opened a new pull request, #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes

2022-11-01 Thread GitBox


panbingkun opened a new pull request, #38463:
URL: https://github.com/apache/spark/pull/38463

   ### What changes were proposed in this pull request?
   This pr replaces TypeCheckFailure by DataTypeMismatch in type checks in the 
complex type creator expressions, includes:
   
   1. CreateMap (3): 
https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L205-L214
   2. CreateNamedStruct (3): 
https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L445-L457
   3. UpdateFields (2): 
https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L670-L673
   
   ### Why are the changes needed?
   Migration onto error classes unifies Spark SQL error messages.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. The PR changes user-facing error messages.
   
   ### How was this patch tested?
   1. Add new UT
   2. Update existed UT
   3. Pass GA


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-11-01 Thread GitBox


cloud-fan closed pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM 
should keep same exit code with client side
URL: https://github.com/apache/spark/pull/35594


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-11-01 Thread GitBox


cloud-fan commented on PR #35594:
URL: https://github.com/apache/spark/pull/35594#issuecomment-1298126483

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] grundprinzip opened a new pull request, #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect

2022-11-01 Thread GitBox


grundprinzip opened a new pull request, #38462:
URL: https://github.com/apache/spark/pull/38462

   ### What changes were proposed in this pull request?
   
   This PR implements the client-side serialization of most Python literals 
into Spark Connect literals. 
   
   ### Why are the changes needed?
   Expanding the Python client support.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   UT


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer opened a new pull request, #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries

2022-11-01 Thread GitBox


beliefer opened a new pull request, #38461:
URL: https://github.com/apache/spark/pull/38461

   ### What changes were proposed in this pull request?
   Recently, I read the `MergeScalarSubqueries` because it is a feature used 
for improve performance.
   I fount the parameters of ScalarSubqueryReference is hard to understand, so 
I want add some comments on it.
   
   Additionally, the private method `supportedAggregateMerge` of 
`MergeScalarSubqueries` looks redundant, this PR wants simplify the code.
   
   
   ### Why are the changes needed?
   Improve the readability and simplify the code for `MergeScalarSubqueries`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   Just improve the readability and simplify the code for 
`MergeScalarSubqueries`.
   
   
   ### How was this patch tested?
   Exists tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes

2022-11-01 Thread GitBox


MaxGekk commented on code in PR #38438:
URL: https://github.com/apache/spark/pull/38438#discussion_r1010145931


##
sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java:
##
@@ -79,12 +80,8 @@ public void isInCollectionCheckExceptionMessage() {
   createStructField("a", IntegerType, false),
   createStructField("b", createArrayType(IntegerType, false), false)));
 Dataset df = spark.createDataFrame(rows, schema);
-Exception e = Assert.assertThrows(Exception.class,
+AnalysisException e = Assert.assertThrows(AnalysisException.class,
   () -> df.filter(df.col("a").isInCollection(Arrays.asList(new 
Column("b");
-Arrays.asList("cannot resolve",
-  "due to data type mismatch: Arguments must be same type but were")
-.forEach(s ->
-  Assert.assertTrue(e.getMessage().toLowerCase(Locale.ROOT)
-.contains(s.toLowerCase(Locale.ROOT;
+
Assert.assertTrue(e.message().startsWith("[DATATYPE_MISMATCH.DATA_DIFF_TYPES]"));

Review Comment:
   e is an `AnalysisException`, correct? Can't you just check its `errorClass`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38447: [SPARK-40973][SQL] Rename `_LEGACY_ERROR_TEMP_0055` to `UNCLOSED_BRACKETED_COMMENT`

2022-11-01 Thread GitBox


MaxGekk commented on code in PR #38447:
URL: https://github.com/apache/spark/pull/38447#discussion_r1010126072


##
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala:
##
@@ -608,8 +608,12 @@ private[sql] object QueryParsingErrors extends 
QueryErrorsBase {
   }
 
   def unclosedBracketedCommentError(command: String, position: Origin): 
Throwable = {
-new ParseException(Some(command), "Unclosed bracketed comment", position, 
position,
-  Some("_LEGACY_ERROR_TEMP_0055"))
+new ParseException(
+  Some(command),
+  "Found an unclosed bracketed comment. Please, append */ at the end of 
the comment.",

Review Comment:
   Please, don't duplicate the error message here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on pull request #38170: [WIP][SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225

2022-11-01 Thread GitBox


MaxGekk commented on PR #38170:
URL: https://github.com/apache/spark/pull/38170#issuecomment-1298094912

   @itholic Can you fix GAs? Some test failures are related to your changes:
   ```
   2022-10-30 23:01:03.735 - stderr> org.apache.spark.SparkException: 
spark.sql.catalog.default
   2022-10-30 23:01:03.735 - stderr>at 
org.apache.spark.sql.errors.QueryExecutionErrors$.noSuchElementExceptionError(QueryExecutionErrors.scala:2178)
   2022-10-30 23:01:03.736 - stderr>at 
org.apache.spark.sql.internal.SQLConf.$anonfun$getConfString$3(SQLConf.scala:4839)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk closed pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox


MaxGekk closed pull request #38448: [SPARK-40975][SQL] Rename the error class 
`_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
URL: https://github.com/apache/spark/pull/38448


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox


MaxGekk commented on PR #38448:
URL: https://github.com/apache/spark/pull/38448#issuecomment-1298090370

   Merging to master. Thank you, @cloud-fan @LuciferYang @itholic @srielau for 
review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox


MaxGekk commented on code in PR #38448:
URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090


##
sql/core/src/test/resources/sql-tests/results/literals.sql.out:
##
@@ -442,9 +442,10 @@ struct<>
 -- !query output
 org.apache.spark.sql.catalyst.parser.ParseException
 {
-  "errorClass" : "_LEGACY_ERROR_TEMP_0021",
+  "errorClass" : "UNSUPPORTED_TYPED_LITERAL",
   "messageParameters" : {
-"valueType" : "GEO"
+"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", 
\"TIMESTAMP\", \"INTERVAL\", \"X\"",

Review Comment:
   It depends on implementation, and we might support more in the future. 
Having the list in the source code can give the following benefits:
   1. The list of parameters might be formatted in different ways ( as a 
foldable or drop-down list) by frontend tools. 
   2. The error classes can be re-used from another places with different list.
   3. More likely, devs will not forget to update the list in source code.
   4. If the list will be in the JSON file, tech editors might forget to edit 
it or accidentally modify it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox


MaxGekk commented on code in PR #38448:
URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090


##
sql/core/src/test/resources/sql-tests/results/literals.sql.out:
##
@@ -442,9 +442,10 @@ struct<>
 -- !query output
 org.apache.spark.sql.catalyst.parser.ParseException
 {
-  "errorClass" : "_LEGACY_ERROR_TEMP_0021",
+  "errorClass" : "UNSUPPORTED_TYPED_LITERAL",
   "messageParameters" : {
-"valueType" : "GEO"
+"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", 
\"TIMESTAMP\", \"INTERVAL\", \"X\"",

Review Comment:
   It depends on implementation, and we might support more in the future. 
Having the list in the source code can give the following benefits:
   1. The list of parameters might be formatted in different ways ( as a 
foldable or drop-down list) by frontend tools. 
   2. More likely, devs will not forget to update the list in source code.
   3. If the list will be in the JSON file, tech editors might forget to edit 
it or accidentally modify it.
   4. The error classes can be re-used from another places with different list.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox


LuciferYang commented on code in PR #38448:
URL: https://github.com/apache/spark/pull/38448#discussion_r1010108242


##
sql/core/src/test/resources/sql-tests/results/literals.sql.out:
##
@@ -442,9 +442,10 @@ struct<>
 -- !query output
 org.apache.spark.sql.catalyst.parser.ParseException
 {
-  "errorClass" : "_LEGACY_ERROR_TEMP_0021",
+  "errorClass" : "UNSUPPORTED_TYPED_LITERAL",
   "messageParameters" : {
-"valueType" : "GEO"
+"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", 
\"TIMESTAMP\", \"INTERVAL\", \"X\"",

Review Comment:
   I think it should be a parameter, unless `UNSUPPORTED_TYPED_LITERAL` will 
only be used in this scenario
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox


MaxGekk commented on code in PR #38448:
URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090


##
sql/core/src/test/resources/sql-tests/results/literals.sql.out:
##
@@ -442,9 +442,10 @@ struct<>
 -- !query output
 org.apache.spark.sql.catalyst.parser.ParseException
 {
-  "errorClass" : "_LEGACY_ERROR_TEMP_0021",
+  "errorClass" : "UNSUPPORTED_TYPED_LITERAL",
   "messageParameters" : {
-"valueType" : "GEO"
+"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", 
\"TIMESTAMP\", \"INTERVAL\", \"X\"",

Review Comment:
   It depends on implementation, and we might support more in the future. 
Having the list in the source code can give the following benefits:
   1. The list of parameters might be formatted in different ways ( as a 
foldable or drop-down list) by frontend tools. 
   2. More likely, devs will not forget to update the list in source code.
   3. If the list will be in the JSON file, tech editors might forget to edit 
it or accidentally modify it.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox


MaxGekk commented on code in PR #38448:
URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090


##
sql/core/src/test/resources/sql-tests/results/literals.sql.out:
##
@@ -442,9 +442,10 @@ struct<>
 -- !query output
 org.apache.spark.sql.catalyst.parser.ParseException
 {
-  "errorClass" : "_LEGACY_ERROR_TEMP_0021",
+  "errorClass" : "UNSUPPORTED_TYPED_LITERAL",
   "messageParameters" : {
-"valueType" : "GEO"
+"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", 
\"TIMESTAMP\", \"INTERVAL\", \"X\"",

Review Comment:
   It depends on implementation, and we might support more in the future. 
Having the list in the source code can give the following benefits:
   1. More likely, devs will not forget to update the list in source code.
   2. If the list will be in the JSON file, tech editors might forget to edit 
it or accidentally modify it.
   3. The list of parameters might be formatted in different ways ( as a 
foldable or drop-down list) by frontend tools. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon closed pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas.

2022-11-01 Thread GitBox


HyukjinKwon closed pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the 
DataFrame.corrwith test after fixing in future pandas.
URL: https://github.com/apache/spark/pull/38455


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas.

2022-11-01 Thread GitBox


HyukjinKwon commented on PR #38455:
URL: https://github.com/apache/spark/pull/38455#issuecomment-1298081042

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`

2022-11-01 Thread GitBox


cloud-fan commented on code in PR #38448:
URL: https://github.com/apache/spark/pull/38448#discussion_r1010103819


##
sql/core/src/test/resources/sql-tests/results/literals.sql.out:
##
@@ -442,9 +442,10 @@ struct<>
 -- !query output
 org.apache.spark.sql.catalyst.parser.ParseException
 {
-  "errorClass" : "_LEGACY_ERROR_TEMP_0021",
+  "errorClass" : "UNSUPPORTED_TYPED_LITERAL",
   "messageParameters" : {
-"valueType" : "GEO"
+"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", 
\"TIMESTAMP\", \"INTERVAL\", \"X\"",

Review Comment:
   isn't it a constant list? does it need to be a parameter?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #38263: [SPARK-40692][SQL] Support data masking built-in function 'mask_hash'

2022-11-01 Thread GitBox


cloud-fan commented on PR #38263:
URL: https://github.com/apache/spark/pull/38263#issuecomment-1298074997

   I'd like to reach a consensus on 
https://github.com/apache/spark/pull/38263#discussion_r1009055117 before moving 
forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on a diff in pull request #38347: [SPARK-40883][CONNECT] Support Range in Connect proto

2022-11-01 Thread GitBox


amaliujia commented on code in PR #38347:
URL: https://github.com/apache/spark/pull/38347#discussion_r1010097145


##
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##
@@ -217,3 +218,23 @@ message Sample {
 int64 seed = 1;
   }
 }
+
+// Relation of type [[Range]] that generates a sequence of integers.
+message Range {
+  // Optional. Default value = 0
+  int32 start = 1;
+  int32 end = 2;
+  // Optional. Default value = 1
+  Step step = 3;

Review Comment:
   Updating in https://github.com/apache/spark/pull/38460.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] amaliujia commented on pull request #38459: [SPARK-40980][CONNECT][DSL] Support session.sql in Connect DSL

2022-11-01 Thread GitBox


amaliujia commented on PR #38459:
URL: https://github.com/apache/spark/pull/38459#issuecomment-1298066340

   R: @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



<    1   2