[GitHub] [spark] gaoyajun02 commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size
gaoyajun02 commented on PR #38333: URL: https://github.com/apache/spark/pull/38333#issuecomment-1298452168 We have now located the cause of zero-size chunk loss, We have located the cause of the zero-size chunk problem on the shuffle service node. and there is the following information in the system `dmesg -T`: ``` e Nov 1 19:40:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 25755946 at logical offset 0 with max blocks 15 with error 117 [Tue Nov 1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 23266116 at logical offset 0 with max blocks 15 with error 117 [Tue Nov 1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 23266116 at logical offset 0 with max blocks 15 with error 117 ``` Although this is not from the software layer, and the number of bad nodes that lose data is very low, I think it makes sense to support fallback here. cc @otterc -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on pull request #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries
beliefer commented on PR #38461: URL: https://github.com/apache/spark/pull/38461#issuecomment-1298447704 ping @peter-toth cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] wangyum commented on pull request #38458: [SPARK-40983][DOC] Remove Hadoop requirements for zstd mentioned in Parquet compression codec
wangyum commented on PR #38458: URL: https://github.com/apache/spark/pull/38458#issuecomment-1298446195 Merged to master, branch-3.3 and branch-3.2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] wangyum closed pull request #38458: [SPARK-40983][DOC] Remove Hadoop requirements for zstd mentioned in Parquet compression codec
wangyum closed pull request #38458: [SPARK-40983][DOC] Remove Hadoop requirements for zstd mentioned in Parquet compression codec URL: https://github.com/apache/spark/pull/38458 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on pull request #38454: [SPARK-40978][SQL] Migrate `failAnalysis()` w/o a context onto error classes
MaxGekk commented on PR #38454: URL: https://github.com/apache/spark/pull/38454#issuecomment-1298428612 @cloud-fan @srielau @itholic @LuciferYang @panbingkun Could you review this PR, please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] EnricoMi commented on pull request #38356: [SPARK-40885] `Sort` may not take effect when it is the last 'Transform' operator
EnricoMi commented on PR #38356: URL: https://github.com/apache/spark/pull/38356#issuecomment-1298425304 @allisonwang-db can you elaborate on mapping `write.requiredOrdering` to the projected columns that you introduced in f98f9f8566243a8a01edcaad3b847bbd2f52305b? Was that existing code moved into `V1Writes`, or was that logic added. https://github.com/apache/spark/blob/ff66adda3ca3762b8c71b14acbd6da00c5508a2e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/V1Writes.scala#L81-L84 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #38459: [SPARK-40980][CONNECT][TEST] Support session.sql in Connect DSL
zhengruifeng commented on PR #38459: URL: https://github.com/apache/spark/pull/38459#issuecomment-1298422766 merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng closed pull request #38459: [SPARK-40980][CONNECT][TEST] Support session.sql in Connect DSL
zhengruifeng closed pull request #38459: [SPARK-40980][CONNECT][TEST] Support session.sql in Connect DSL URL: https://github.com/apache/spark/pull/38459 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes
panbingkun commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010356813 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala: ## @@ -444,17 +460,32 @@ case class CreateNamedStruct(children: Seq[Expression]) extends Expression with override def checkInputDataTypes(): TypeCheckResult = { if (children.size % 2 != 0) { - TypeCheckResult.TypeCheckFailure(s"$prettyName expects an even number of arguments.") + DataTypeMismatch( +errorSubClass = "WRONG_NUM_ARGS", +messageParameters = Map( + "functionName" -> toSQLId(prettyName), + "expectedNum" -> "2n (n > 0)", + "actualNum" -> children.length.toString +) + ) } else { val invalidNames = nameExprs.filterNot(e => e.foldable && e.dataType == StringType) if (invalidNames.nonEmpty) { -TypeCheckResult.TypeCheckFailure( - s"Only foldable ${StringType.catalogString} expressions are allowed to appear at odd" + -s" position, got: ${invalidNames.mkString(",")}") +DataTypeMismatch( + errorSubClass = "CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING", Review Comment: https://user-images.githubusercontent.com/15246973/199229111-f39526ba-1f5e-46b0-b8b5-6abad300674b.png;> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes
panbingkun commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010355945 ## core/src/main/resources/error/error-classes.json: ## @@ -155,6 +160,21 @@ "To convert values from to , you can use the functions instead." ] }, + "CREATE_MAP_KEY_DIFF_TYPES" : { Review Comment: but https://user-images.githubusercontent.com/15246973/199228400-42c1759f-fc16-4205-a052-fc54b1f3d70b.png;> Very hard to reuse it. Maybe `functionName` pass to: `map's keys` or `map's values`, seems unreasonable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes
panbingkun commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010355945 ## core/src/main/resources/error/error-classes.json: ## @@ -155,6 +160,21 @@ "To convert values from to , you can use the functions instead." ] }, + "CREATE_MAP_KEY_DIFF_TYPES" : { Review Comment: but https://user-images.githubusercontent.com/15246973/199228400-42c1759f-fc16-4205-a052-fc54b1f3d70b.png;> Very hard to reuse it. Maybe `functionName` pass to: `map's key`, seems unreasonable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] panbingkun commented on a diff in pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes
panbingkun commented on code in PR #38438: URL: https://github.com/apache/spark/pull/38438#discussion_r1010349649 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java: ## @@ -79,12 +80,8 @@ public void isInCollectionCheckExceptionMessage() { createStructField("a", IntegerType, false), createStructField("b", createArrayType(IntegerType, false), false))); Dataset df = spark.createDataFrame(rows, schema); -Exception e = Assert.assertThrows(Exception.class, +AnalysisException e = Assert.assertThrows(AnalysisException.class, () -> df.filter(df.col("a").isInCollection(Arrays.asList(new Column("b"); -Arrays.asList("cannot resolve", - "due to data type mismatch: Arguments must be same type but were") -.forEach(s -> - Assert.assertTrue(e.getMessage().toLowerCase(Locale.ROOT) -.contains(s.toLowerCase(Locale.ROOT; + Assert.assertTrue(e.message().startsWith("[DATATYPE_MISMATCH.DATA_DIFF_TYPES]")); Review Comment: e is an AnalysisException https://user-images.githubusercontent.com/15246973/199227396-98442bc1-f5d6-4779-8e09-b21691b7ade5.png;> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] eejbyfeldt commented on a diff in pull request #38428: [SPARK-40912][CORE][WIP] Overhead of Exceptions in DeserializationStream
eejbyfeldt commented on code in PR #38428: URL: https://github.com/apache/spark/pull/38428#discussion_r1010340881 ## core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala: ## @@ -301,15 +300,18 @@ class KryoDeserializationStream( private[this] var kryo: Kryo = serInstance.borrowKryo() + final private[this] def hasNext: Boolean = { Review Comment: Yes, will fix. ## core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala: ## @@ -324,6 +326,36 @@ class KryoDeserializationStream( } } } + + final override def asIterator: Iterator[Any] = new NextIterator[Any] { +override protected def getNext() = { + if (KryoDeserializationStream.this.hasNext) { +readValue[Any]() + } else { +finished = true +null + } +} + +override protected def close(): Unit = { + KryoDeserializationStream.this.close() +} + } + + final override def asKeyValueIterator: Iterator[(Any, Any)] = new NextIterator[(Any, Any)] { +override protected def getNext() = { + if (KryoDeserializationStream.this.hasNext) { +(readKey[Any](), readValue[Any]()) Review Comment: You mean that if only a key exist we just ignore it like the current implementation would? ## core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala: ## @@ -301,15 +300,18 @@ class KryoDeserializationStream( private[this] var kryo: Kryo = serInstance.borrowKryo() + final private[this] def hasNext: Boolean = { +if (input == null) { + return false +} + +val eof = input.eof() +if (eof) close() +!eof + } + override def readObject[T: ClassTag](): T = { -try { kryo.readClassAndObject(input).asInstanceOf[T] -} catch { - // DeserializationStream uses the EOF exception to indicate stopping condition. - case e: KryoException -if e.getMessage.toLowerCase(Locale.ROOT).contains("buffer underflow") => -throw new EOFException -} Review Comment: Sure will add it back. I think that catching and ignoring the exceptions here should be revisited in some other change as it seems to me like it could case dataloss that we just assume the exception here means EOF. ## core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala: ## @@ -504,44 +505,31 @@ class ExternalAppendOnlyMap[K, V, C]( * If no more pairs are left, return null. */ private def readNextItem(): (K, C) = { - try { -val k = deserializeStream.readKey().asInstanceOf[K] -val c = deserializeStream.readValue().asInstanceOf[C] -val item = (k, c) -objectsRead += 1 -if (objectsRead == serializerBatchSize) { - objectsRead = 0 - deserializeStream = nextBatchStream() -} -item - } catch { -case e: EOFException => - cleanup() - null + val next = batchIterator.next() + objectsRead += 1 + if (objectsRead == serializerBatchSize) { +objectsRead = 0 +batchIterator = nextBatchIterator() } + next } override def hasNext: Boolean = { - if (nextItem == null) { -if (deserializeStream == null) { - // In case of deserializeStream has not been initialized - deserializeStream = nextBatchStream() - if (deserializeStream == null) { -return false - } + if (batchIterator == null) { +// In case of batchIterator has not been initialized +batchIterator = nextBatchIterator() +if (batchIterator == null) { + return false } -nextItem = readNextItem() } - nextItem != null + batchIterator.hasNext } override def next(): (K, C) = { - if (!hasNext) { + if (batchIterator == null) { Review Comment: In that case it will call next on the empty iterator and we should still throw a `NoSuchElementException`. But `!hasNext` should also have that behavior so can change back to that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng opened a new pull request, #38468: [WIP][CONNECT][PYTHON] Arrow-based collect
zhengruifeng opened a new pull request, #38468: URL: https://github.com/apache/spark/pull/38468 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cxzl25 opened a new pull request, #38467: [SPARK-40987][CORE] Avoid creating a directory when deleting a block, causing DAGScheduler to not work
cxzl25 opened a new pull request, #38467: URL: https://github.com/apache/spark/pull/38467 ### What changes were proposed in this pull request? Avoid creating a directory when deleting a block. ### Why are the changes needed? When the driver submits a job, DAGScheduler calls sc.broadcast(taskBinaryBytes). TorrentBroadcast#writeBlocks may fail due to disk problems during blockManager#putBytes. BlockManager#doPut calls BlockManager#removeBlockInternal to clean up the block. BlockManager#removeBlockInternal calls DiskStore#remove to clean up blocks on disk. DiskStore#remove will try to create the directory because the directory does not exist, and an exception will be thrown at this time. BlockInfoManager#blockInfoWrappers block info and lock not removed. The catch block in TorrentBroadcast#writeBlocks will call blockManager.removeBroadcast to clean up the broadcast. Because the block lock in BlockInfoManager#blockInfoWrappers is not released, the dag-scheduler-event-loop thread of DAGScheduler will wait forever. ``` 22/11/01 18:27:48 WARN BlockManager: Putting block broadcast_0_piece0 failed due to exception java.io.IOException: X. 22/11/01 18:27:48 ERROR TorrentBroadcast: Store broadcast broadcast_0 fail, remove all pieces of the broadcast ``` ``` "dag-scheduler-event-loop" #54 daemon prio=5 os_prio=31 tid=0x7fc98e3fa800 nid=0x7203 waiting on condition [0x78c1e000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0007add3d8c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at org.apache.spark.storage.BlockInfoManager.$anonfun$acquireLock$1(BlockInfoManager.scala:221) at org.apache.spark.storage.BlockInfoManager.$anonfun$acquireLock$1$adapted(BlockInfoManager.scala:214) at org.apache.spark.storage.BlockInfoManager$$Lambda$3038/1307533457.apply(Unknown Source) at org.apache.spark.storage.BlockInfoWrapper.withLock(BlockInfoManager.scala:105) at org.apache.spark.storage.BlockInfoManager.acquireLock(BlockInfoManager.scala:214) at org.apache.spark.storage.BlockInfoManager.lockForWriting(BlockInfoManager.scala:293) at org.apache.spark.storage.BlockManager.removeBlock(BlockManager.scala:1979) at org.apache.spark.storage.BlockManager.$anonfun$removeBroadcast$3(BlockManager.scala:1970) at org.apache.spark.storage.BlockManager.$anonfun$removeBroadcast$3$adapted(BlockManager.scala:1970) at org.apache.spark.storage.BlockManager$$Lambda$3092/1241801156.apply(Unknown Source) at scala.collection.Iterator.foreach(Iterator.scala:943) at scala.collection.Iterator.foreach$(Iterator.scala:943) at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) at org.apache.spark.storage.BlockManager.removeBroadcast(BlockManager.scala:1970) at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:179) at org.apache.spark.broadcast.TorrentBroadcast.(TorrentBroadcast.scala:99) at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:38) at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:78) at org.apache.spark.SparkContext.broadcastInternal(SparkContext.scala:1538) at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1520) at org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1539) at org.apache.spark.scheduler.DAGScheduler.submitStage(DAGScheduler.scala:1355) at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:1297) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2929) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2921) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2910) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Throw an exception before `Files.createDirectory` to simulate disk problems. DiskBlockManager#getFile ```java if (filename.contains("piece")) { throw new java.io.IOException("disk issue") } Files.createDirectory(path) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For
[GitHub] [spark] panbingkun commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes
panbingkun commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010317628 ## core/src/main/resources/error/error-classes.json: ## @@ -155,6 +160,21 @@ "To convert values from to , you can use the functions instead." ] }, + "CREATE_MAP_KEY_DIFF_TYPES" : { Review Comment: https://user-images.githubusercontent.com/15246973/199219681-f7089110-7701-42e5-82df-d772f9ed4304.png;> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer opened a new pull request, #38466: [WIP][SPARK-40986][SQL] Using distinct to reduce the data size for bloom filter
beliefer opened a new pull request, #38466: URL: https://github.com/apache/spark/pull/38466 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning
dongjoon-hyun commented on PR #38464: URL: https://github.com/apache/spark/pull/38464#issuecomment-1298287656 Thank you for pinging me, @wangyum . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang opened a new pull request, #38465: [SPARK-40985][BUILD] Upgrade RoaringBitmap to 0.9.35
LuciferYang opened a new pull request, #38465: URL: https://github.com/apache/spark/pull/38465 ### What changes were proposed in this pull request? This pr aims upgrade RoaringBitmap 0.9.35 ### Why are the changes needed? This version bring some bug fix: - https://github.com/RoaringBitmap/RoaringBitmap/pull/587 - https://github.com/RoaringBitmap/RoaringBitmap/issues/588 other changes as follows: https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.32...0.9.35 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] wangyum commented on pull request #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning
wangyum commented on PR #38464: URL: https://github.com/apache/spark/pull/38464#issuecomment-1298235836 cc @cloud-fan @sigmod @aokolnychyi @dongjoon-hyun @huaxingao @viirya -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] wangyum opened a new pull request, #38464: [SPARK-32628][SQL] Use bloom filter to improve dynamic partition pruning
wangyum opened a new pull request, #38464: URL: https://github.com/apache/spark/pull/38464 ### What changes were proposed in this pull request? This PR enhances DPP to use bloom filters if `spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly` is disabled and build plan can't build broadcast by size and can reuse the existing shuffle exchanges. ### Why are the changes needed? Avoid job fail if `spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly` is disabled: ```sql select catalog_sales.* from catalog_sales join catalog_returns where cr_order_number = cs_sold_date_sk and cr_returned_time_sk < 4; ``` ``` 20/08/16 06:44:42 ERROR TaskSetManager: Total size of serialized results of 494 tasks (1225.3 MiB) is bigger than spark.driver.maxResultSize (1024.0 MiB) ``` ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support session.range in Python client
zhengruifeng commented on PR #38460: URL: https://github.com/apache/spark/pull/38460#issuecomment-1298229126 merged into master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng closed pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support session.range in Python client
zhengruifeng closed pull request #38460: [SPARK-40981][CONNECT][PYTHON] Support session.range in Python client URL: https://github.com/apache/spark/pull/38460 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk closed pull request #38439: [SPARK-40890][SQL][TESTS] Check error classes in DataSourceV2SQLSuite
MaxGekk closed pull request #38439: [SPARK-40890][SQL][TESTS] Check error classes in DataSourceV2SQLSuite URL: https://github.com/apache/spark/pull/38439 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on pull request #38439: [SPARK-40890][SQL][TESTS] Check error classes in DataSourceV2SQLSuite
MaxGekk commented on PR #38439: URL: https://github.com/apache/spark/pull/38439#issuecomment-1298219391 +1, LGTM. Merging to master. I ran the test locally. Thank you, @panbingkun. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes
LuciferYang commented on code in PR #38463: URL: https://github.com/apache/spark/pull/38463#discussion_r1010207329 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala: ## @@ -444,17 +460,32 @@ case class CreateNamedStruct(children: Seq[Expression]) extends Expression with override def checkInputDataTypes(): TypeCheckResult = { if (children.size % 2 != 0) { - TypeCheckResult.TypeCheckFailure(s"$prettyName expects an even number of arguments.") + DataTypeMismatch( +errorSubClass = "WRONG_NUM_ARGS", +messageParameters = Map( + "functionName" -> toSQLId(prettyName), + "expectedNum" -> "2n (n > 0)", + "actualNum" -> children.length.toString +) + ) } else { val invalidNames = nameExprs.filterNot(e => e.foldable && e.dataType == StringType) if (invalidNames.nonEmpty) { -TypeCheckResult.TypeCheckFailure( - s"Only foldable ${StringType.catalogString} expressions are allowed to appear at odd" + -s" position, got: ${invalidNames.mkString(",")}") +DataTypeMismatch( + errorSubClass = "CREATE_NAMED_STRUCT_WITHOUT_FOLDABLE_STRING", Review Comment: Can `NON_FOLDABLE_INPUT` be reused? ## core/src/main/resources/error/error-classes.json: ## @@ -155,6 +160,21 @@ "To convert values from to , you can use the functions instead." ] }, + "CREATE_MAP_KEY_DIFF_TYPES" : { Review Comment: Cloud we make `DATA_DIFF_TYPES ` more general and reuse it? ## core/src/main/resources/error/error-classes.json: ## @@ -155,6 +160,21 @@ "To convert values from to , you can use the functions instead." ] }, + "CREATE_MAP_KEY_DIFF_TYPES" : { +"message" : [ + "The given keys of function should all be the same type, but they are ." +] + }, + "CREATE_MAP_VALUE_DIFF_TYPES" : { Review Comment: ditto ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala: ## @@ -668,10 +699,19 @@ case class UpdateFields(structExpr: Expression, fieldOps: Seq[StructFieldsOperat override def checkInputDataTypes(): TypeCheckResult = { val dataType = structExpr.dataType if (!dataType.isInstanceOf[StructType]) { - TypeCheckResult.TypeCheckFailure("struct argument should be struct type, got: " + -dataType.catalogString) + DataTypeMismatch( +errorSubClass = "UNEXPECTED_INPUT_TYPE", +messageParameters = Map( + "paramIndex" -> "1", + "requiredType" -> toSQLType(StructType), + "inputSql" -> toSQLExpr(structExpr), + "inputType" -> toSQLType(structExpr.dataType)) + ) } else if (newExprs.isEmpty) { - TypeCheckResult.TypeCheckFailure("cannot drop all fields in struct") + DataTypeMismatch( Review Comment: It seems that the original message did not clearly describe this error, the message doesn't look like `DataTypeMismatch` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes
LuciferYang commented on PR #38457: URL: https://github.com/apache/spark/pull/38457#issuecomment-1298209124 Thanks @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk closed pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes
MaxGekk closed pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes URL: https://github.com/apache/spark/pull/38457 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on pull request #38457: [SPARK-40371][SQL] Migrate type check failures of NthValue and NTile onto error classes
MaxGekk commented on PR #38457: URL: https://github.com/apache/spark/pull/38457#issuecomment-1298207749 +1, LGTM. Merging to master. Thank you, @LuciferYang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] itholic commented on a diff in pull request #38170: [WIP][SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225
itholic commented on code in PR #38170: URL: https://github.com/apache/spark/pull/38170#discussion_r1010170342 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2171,11 +2168,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { } def noSuchElementExceptionError(key: String): Throwable = { -new SparkException( - errorClass = "_LEGACY_ERROR_TEMP_2221", - messageParameters = Map( -"key" -> key), - cause = null) +new NoSuchElementException(key) Review Comment: Seems like we cannot just simply treat `key` for `NoSuchElementException` as same as message in error class. I think maybe we need to introduce new Exception something like `SparkNoSuchElementException`. Let me just revert for now, and will open a PR for new Exception, and cover the missing cases here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] itholic commented on a diff in pull request #38170: [WIP][SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225
itholic commented on code in PR #38170: URL: https://github.com/apache/spark/pull/38170#discussion_r1010170342 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2171,11 +2168,7 @@ private[sql] object QueryExecutionErrors extends QueryErrorsBase { } def noSuchElementExceptionError(key: String): Throwable = { -new SparkException( - errorClass = "_LEGACY_ERROR_TEMP_2221", - messageParameters = Map( -"key" -> key), - cause = null) +new NoSuchElementException(key) Review Comment: Seems like we cannot just simply treat `key` for `NoSuchElementException` as same as message in error class. I think maybe we need to new Exception something like `SparkNoSuchElementException`. Let me just revert for now, and will open a PR for new Exception, and cover the missing cases here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] panbingkun opened a new pull request, #38463: [SPARK-40374][SQL] Migrate type check failures of type creators onto error classes
panbingkun opened a new pull request, #38463: URL: https://github.com/apache/spark/pull/38463 ### What changes were proposed in this pull request? This pr replaces TypeCheckFailure by DataTypeMismatch in type checks in the complex type creator expressions, includes: 1. CreateMap (3): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L205-L214 2. CreateNamedStruct (3): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L445-L457 3. UpdateFields (2): https://github.com/apache/spark/blob/1431975723d8df30a25b2333eddcfd0bb6c57677/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala#L670-L673 ### Why are the changes needed? Migration onto error classes unifies Spark SQL error messages. ### Does this PR introduce _any_ user-facing change? Yes. The PR changes user-facing error messages. ### How was this patch tested? 1. Add new UT 2. Update existed UT 3. Pass GA -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side
cloud-fan closed pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side URL: https://github.com/apache/spark/pull/35594 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side
cloud-fan commented on PR #35594: URL: https://github.com/apache/spark/pull/35594#issuecomment-1298126483 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] grundprinzip opened a new pull request, #38462: [SPARK-40533] [CONNECT] [PYTHON] Support most built-in literal types for Python in Spark Connect
grundprinzip opened a new pull request, #38462: URL: https://github.com/apache/spark/pull/38462 ### What changes were proposed in this pull request? This PR implements the client-side serialization of most Python literals into Spark Connect literals. ### Why are the changes needed? Expanding the Python client support. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer opened a new pull request, #38461: [SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries
beliefer opened a new pull request, #38461: URL: https://github.com/apache/spark/pull/38461 ### What changes were proposed in this pull request? Recently, I read the `MergeScalarSubqueries` because it is a feature used for improve performance. I fount the parameters of ScalarSubqueryReference is hard to understand, so I want add some comments on it. Additionally, the private method `supportedAggregateMerge` of `MergeScalarSubqueries` looks redundant, this PR wants simplify the code. ### Why are the changes needed? Improve the readability and simplify the code for `MergeScalarSubqueries`. ### Does this PR introduce _any_ user-facing change? 'No'. Just improve the readability and simplify the code for `MergeScalarSubqueries`. ### How was this patch tested? Exists tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38438: [SPARK-40748][SQL] Migrate type check failures of conditions onto error classes
MaxGekk commented on code in PR #38438: URL: https://github.com/apache/spark/pull/38438#discussion_r1010145931 ## sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java: ## @@ -79,12 +80,8 @@ public void isInCollectionCheckExceptionMessage() { createStructField("a", IntegerType, false), createStructField("b", createArrayType(IntegerType, false), false))); Dataset df = spark.createDataFrame(rows, schema); -Exception e = Assert.assertThrows(Exception.class, +AnalysisException e = Assert.assertThrows(AnalysisException.class, () -> df.filter(df.col("a").isInCollection(Arrays.asList(new Column("b"); -Arrays.asList("cannot resolve", - "due to data type mismatch: Arguments must be same type but were") -.forEach(s -> - Assert.assertTrue(e.getMessage().toLowerCase(Locale.ROOT) -.contains(s.toLowerCase(Locale.ROOT; + Assert.assertTrue(e.message().startsWith("[DATATYPE_MISMATCH.DATA_DIFF_TYPES]")); Review Comment: e is an `AnalysisException`, correct? Can't you just check its `errorClass`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38447: [SPARK-40973][SQL] Rename `_LEGACY_ERROR_TEMP_0055` to `UNCLOSED_BRACKETED_COMMENT`
MaxGekk commented on code in PR #38447: URL: https://github.com/apache/spark/pull/38447#discussion_r1010126072 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -608,8 +608,12 @@ private[sql] object QueryParsingErrors extends QueryErrorsBase { } def unclosedBracketedCommentError(command: String, position: Origin): Throwable = { -new ParseException(Some(command), "Unclosed bracketed comment", position, position, - Some("_LEGACY_ERROR_TEMP_0055")) +new ParseException( + Some(command), + "Found an unclosed bracketed comment. Please, append */ at the end of the comment.", Review Comment: Please, don't duplicate the error message here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on pull request #38170: [WIP][SPARK-40663][SQL] Migrate execution errors onto error classes: _LEGACY_ERROR_TEMP_2201-2225
MaxGekk commented on PR #38170: URL: https://github.com/apache/spark/pull/38170#issuecomment-1298094912 @itholic Can you fix GAs? Some test failures are related to your changes: ``` 2022-10-30 23:01:03.735 - stderr> org.apache.spark.SparkException: spark.sql.catalog.default 2022-10-30 23:01:03.735 - stderr>at org.apache.spark.sql.errors.QueryExecutionErrors$.noSuchElementExceptionError(QueryExecutionErrors.scala:2178) 2022-10-30 23:01:03.736 - stderr>at org.apache.spark.sql.internal.SQLConf.$anonfun$getConfString$3(SQLConf.scala:4839) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk closed pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
MaxGekk closed pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL` URL: https://github.com/apache/spark/pull/38448 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
MaxGekk commented on PR #38448: URL: https://github.com/apache/spark/pull/38448#issuecomment-1298090370 Merging to master. Thank you, @cloud-fan @LuciferYang @itholic @srielau for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
MaxGekk commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException { - "errorClass" : "_LEGACY_ERROR_TEMP_0021", + "errorClass" : "UNSUPPORTED_TYPED_LITERAL", "messageParameters" : { -"valueType" : "GEO" +"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", \"TIMESTAMP\", \"INTERVAL\", \"X\"", Review Comment: It depends on implementation, and we might support more in the future. Having the list in the source code can give the following benefits: 1. The list of parameters might be formatted in different ways ( as a foldable or drop-down list) by frontend tools. 2. The error classes can be re-used from another places with different list. 3. More likely, devs will not forget to update the list in source code. 4. If the list will be in the JSON file, tech editors might forget to edit it or accidentally modify it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
MaxGekk commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException { - "errorClass" : "_LEGACY_ERROR_TEMP_0021", + "errorClass" : "UNSUPPORTED_TYPED_LITERAL", "messageParameters" : { -"valueType" : "GEO" +"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", \"TIMESTAMP\", \"INTERVAL\", \"X\"", Review Comment: It depends on implementation, and we might support more in the future. Having the list in the source code can give the following benefits: 1. The list of parameters might be formatted in different ways ( as a foldable or drop-down list) by frontend tools. 2. More likely, devs will not forget to update the list in source code. 3. If the list will be in the JSON file, tech editors might forget to edit it or accidentally modify it. 4. The error classes can be re-used from another places with different list. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
LuciferYang commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010108242 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException { - "errorClass" : "_LEGACY_ERROR_TEMP_0021", + "errorClass" : "UNSUPPORTED_TYPED_LITERAL", "messageParameters" : { -"valueType" : "GEO" +"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", \"TIMESTAMP\", \"INTERVAL\", \"X\"", Review Comment: I think it should be a parameter, unless `UNSUPPORTED_TYPED_LITERAL` will only be used in this scenario -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
MaxGekk commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException { - "errorClass" : "_LEGACY_ERROR_TEMP_0021", + "errorClass" : "UNSUPPORTED_TYPED_LITERAL", "messageParameters" : { -"valueType" : "GEO" +"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", \"TIMESTAMP\", \"INTERVAL\", \"X\"", Review Comment: It depends on implementation, and we might support more in the future. Having the list in the source code can give the following benefits: 1. The list of parameters might be formatted in different ways ( as a foldable or drop-down list) by frontend tools. 2. More likely, devs will not forget to update the list in source code. 3. If the list will be in the JSON file, tech editors might forget to edit it or accidentally modify it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
MaxGekk commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010108090 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException { - "errorClass" : "_LEGACY_ERROR_TEMP_0021", + "errorClass" : "UNSUPPORTED_TYPED_LITERAL", "messageParameters" : { -"valueType" : "GEO" +"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", \"TIMESTAMP\", \"INTERVAL\", \"X\"", Review Comment: It depends on implementation, and we might support more in the future. Having the list in the source code can give the following benefits: 1. More likely, devs will not forget to update the list in source code. 2. If the list will be in the JSON file, tech editors might forget to edit it or accidentally modify it. 3. The list of parameters might be formatted in different ways ( as a foldable or drop-down list) by frontend tools. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas.
HyukjinKwon closed pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas. URL: https://github.com/apache/spark/pull/38455 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #38455: [SPARK-40827][PS][TESTS] Re-enable the DataFrame.corrwith test after fixing in future pandas.
HyukjinKwon commented on PR #38455: URL: https://github.com/apache/spark/pull/38455#issuecomment-1298081042 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a diff in pull request #38448: [SPARK-40975][SQL] Rename the error class `_LEGACY_ERROR_TEMP_0021` to `UNSUPPORTED_TYPED_LITERAL`
cloud-fan commented on code in PR #38448: URL: https://github.com/apache/spark/pull/38448#discussion_r1010103819 ## sql/core/src/test/resources/sql-tests/results/literals.sql.out: ## @@ -442,9 +442,10 @@ struct<> -- !query output org.apache.spark.sql.catalyst.parser.ParseException { - "errorClass" : "_LEGACY_ERROR_TEMP_0021", + "errorClass" : "UNSUPPORTED_TYPED_LITERAL", "messageParameters" : { -"valueType" : "GEO" +"supportedTypes" : "\"DATE\", \"TIMESTAMP_NTZ\", \"TIMESTAMP_LTZ\", \"TIMESTAMP\", \"INTERVAL\", \"X\"", Review Comment: isn't it a constant list? does it need to be a parameter? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #38263: [SPARK-40692][SQL] Support data masking built-in function 'mask_hash'
cloud-fan commented on PR #38263: URL: https://github.com/apache/spark/pull/38263#issuecomment-1298074997 I'd like to reach a consensus on https://github.com/apache/spark/pull/38263#discussion_r1009055117 before moving forward. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amaliujia commented on a diff in pull request #38347: [SPARK-40883][CONNECT] Support Range in Connect proto
amaliujia commented on code in PR #38347: URL: https://github.com/apache/spark/pull/38347#discussion_r1010097145 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -217,3 +218,23 @@ message Sample { int64 seed = 1; } } + +// Relation of type [[Range]] that generates a sequence of integers. +message Range { + // Optional. Default value = 0 + int32 start = 1; + int32 end = 2; + // Optional. Default value = 1 + Step step = 3; Review Comment: Updating in https://github.com/apache/spark/pull/38460. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amaliujia commented on pull request #38459: [SPARK-40980][CONNECT][DSL] Support session.sql in Connect DSL
amaliujia commented on PR #38459: URL: https://github.com/apache/spark/pull/38459#issuecomment-1298066340 R: @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org