[GitHub] [spark] SparkQA commented on pull request #29844: [SPARK-27872][K8s] Fix executor service account inconsistency for branch-2.4
SparkQA commented on pull request #29844: URL: https://github.com/apache/spark/pull/29844#issuecomment-698424261 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29863: [SPARK-32877][SQL][TEST] Add test for Hive UDF complex decimal type
SparkQA commented on pull request #29863: URL: https://github.com/apache/spark/pull/29863#issuecomment-698261937
[GitHub] [spark] sarutak commented on pull request #29677: [SPARK-32820][SQL] Remove redundant shuffle exchanges inserted by EnsureRequirements
sarutak commented on pull request #29677: URL: https://github.com/apache/spark/pull/29677#issuecomment-698678194 @c21 @imback82 @maropu @HyukjinKwon Any other feedback for this change?
[GitHub] [spark] AmplabJenkins commented on pull request #29806: [SPARK-32187][PYTHON][DOCS] Doc on Python packaging
AmplabJenkins commented on pull request #29806: URL: https://github.com/apache/spark/pull/29806#issuecomment-698110509
[GitHub] [spark] holdenk commented on a change in pull request #29817: [SPARK-32850][CORE][K8S] Simplify the RPC message flow of decommission
holdenk commented on a change in pull request #29817: URL: https://github.com/apache/spark/pull/29817#discussion_r494434701

## File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala ##
@@ -166,17 +171,6 @@ private[spark] class CoarseGrainedExecutorBackend(
       if (executor == null) {
         exitExecutor(1, "Received LaunchTask command but executor was null")
       } else {
-        if (decommissioned) {
-          val msg = "Asked to launch a task while decommissioned."
-          logError(msg)
-          driver match {
-            case Some(endpoint) =>
-              logInfo("Sending DecommissionExecutor to driver.")
-              endpoint.send(DecommissionExecutor(executorId, ExecutorDecommissionInfo(msg)))
-            case _ =>
-              logError("No registered driver to send Decommission to.")
-          }
-        }

Review comment: Right, so we should resend the notice then, right?

## File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala ##
@@ -213,9 +207,17 @@ private[spark] class CoarseGrainedExecutorBackend(
       logInfo(s"Received tokens of ${tokenBytes.length} bytes")
       SparkHadoopUtil.get.addDelegationTokens(tokenBytes, env.conf)
-    case DecommissionSelf =>
-      logInfo("Received decommission self")
+    case DecommissionExecutor =>
       decommissionSelf()
+
+    case ExecutorSigPWRReceived =>
+      decommissionSelf()
+      if (driver.nonEmpty) {

Review comment: So we don’t ask the driver to stop scheduling jobs on us first, and the driver could ask us to run a job while we are part way through decommissioning. This won’t result in a failure, because we’ll accept the job, but it will slow down the decommissioning. So swap the order of these two.

## File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ##
@@ -1809,7 +1809,9 @@ private[spark] class BlockManager(
     blocksToRemove.size
   }
-  def decommissionBlockManager(): Unit = synchronized {
+  def decommissionBlockManager(): Unit = storageEndpoint.ask(DecommissionBlockManager)

Review comment: Why did you make this change?

## File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ##
@@ -70,7 +70,10 @@ private[deploy] class Worker(
   if (conf.get(config.DECOMMISSION_ENABLED)) {
     logInfo("Registering SIGPWR handler to trigger decommissioning.")
     SignalUtils.register("PWR", "Failed to register SIGPWR handler - " +
-      "disabling worker decommission feature.")(decommissionSelf)
+      "disabling worker decommission feature.") {
+      self.send(WorkerSigPWRReceived)

Review comment: Can you look into what this difference in behavior might cause at the system level, and then tell me whether that’s a desired change? I’m OK with us making changes here; I just want us to be intentional and to know whether we need to test the change, and it seems like this change was incidental.
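The ordering concern raised in the second review comment above can be illustrated with a small simulation. This is only a sketch in Python, not Spark code: `Driver`, `Executor`, and all of their methods are hypothetical stand-ins. It shows one thing: if the executor tells the driver first, the driver has already stopped scheduling on it by the time local decommissioning begins.

```python
class Driver:
    """Hypothetical stand-in for the scheduler side (not a Spark class)."""

    def __init__(self):
        self.decommissioning = set()

    def mark_decommissioning(self, executor_id):
        self.decommissioning.add(executor_id)

    def may_schedule_on(self, executor_id):
        return executor_id not in self.decommissioning


class Executor:
    """Hypothetical executor that reacts to a SIGPWR-style signal."""

    def __init__(self, executor_id, driver):
        self.executor_id = executor_id
        self.driver = driver
        self.events = []

    def _notify_driver(self):
        self.driver.mark_decommissioning(self.executor_id)
        self.events.append("notified driver")

    def _decommission_self(self):
        # Record whether the driver could still hand us a task at this moment.
        if self.driver.may_schedule_on(self.executor_id):
            self.events.append("decommissioning (driver may still schedule tasks here)")
        else:
            self.events.append("decommissioning (driver has stopped scheduling)")

    def on_sigpwr(self, notify_first):
        if notify_first:
            # Order requested in the review: tell the driver, then decommission.
            self._notify_driver()
            self._decommission_self()
        else:
            # Order in the quoted diff: decommission, then tell the driver.
            self._decommission_self()
            self._notify_driver()
        return self.events
```

Calling `Executor("1", Driver()).on_sigpwr(notify_first=False)` records that decommissioning started while the driver was still free to schedule work on the executor, which is the slowdown the comment describes.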
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
HeartSaVioR commented on a change in pull request #28841: URL: https://github.com/apache/spark/pull/28841#discussion_r494268346

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ##
@@ -467,6 +467,12 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * `pathGlobFilter`: an optional glob pattern to only include files with paths matching
    * the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
    * It does not change the behavior of partition discovery.
+   * `modifiedBefore`: an optional timestamp to only include files with
+   * modification times occurring before the specified Time. The provided timestamp
+   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
+   * `modifiedAfter`: an optional timestamp to only include files with

Review comment: ditto

## File path: python/pyspark/sql/readwriter.py ##
@@ -184,7 +196,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
              mode=None, columnNameOfCorruptRecord=None, dateFormat=None, timestampFormat=None,
              multiLine=None, allowUnquotedControlChars=None, lineSep=None, samplingRatio=None,
              dropFieldIfAllNull=None, encoding=None, locale=None, pathGlobFilter=None,
-             recursiveFileLookup=None, allowNonNumericNumbers=None):
+             recursiveFileLookup=None, modifiedBefore=None, modifiedAfter=None,

Review comment: Probably better not to change the order. With such a huge number of parameters, I think end users will use named parameters almost every time, but just to be sure.

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ##
@@ -752,6 +764,12 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * `pathGlobFilter`: an optional glob pattern to only include files with paths matching
    * the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
    * It does not change the behavior of partition discovery.
+   * `modifiedBefore`: an optional timestamp to only include files with
+   * modification times occurring before the specified Time. The provided timestamp
+   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
+   * `modifiedAfter`: an optional timestamp to only include files with

Review comment: ditto

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ##
@@ -785,6 +803,12 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * `pathGlobFilter`: an optional glob pattern to only include files with paths matching
    * the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
    * It does not change the behavior of partition discovery.
+   * `modifiedBefore`: an optional timestamp to only include files with

Review comment: ditto

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ##
@@ -785,6 +803,12 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
    * `pathGlobFilter`: an optional glob pattern to only include files with paths matching
    * the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
    * It does not change the behavior of partition discovery.
+   * `modifiedBefore`: an optional timestamp to only include files with
+   * modification times occurring before the specified Time. The provided timestamp
+   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
+   * `modifiedAfter`: an optional timestamp to only include files with

Review comment: ditto

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/pathFilters.scala ##
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.util.{Locale, TimeZone}
+
+import org.apache.hadoop.fs.{FileStatus, GlobFilter}
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap,
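The `modifiedBefore` / `modifiedAfter` semantics described in the quoted Scaladoc can be sketched in a few lines of plain Python. This is not the PR's implementation: `parse_bound` and `filter_by_mtime` are hypothetical helpers, time zones are ignored, and treating both bounds as strict (exclusive) is an assumption drawn from the words "before" and "after" in the doc text.

```python
from datetime import datetime

# Timestamp form quoted in the Scaladoc, e.g. 2020-06-01T13:00:00
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_bound(value):
    """Parse an optional bound; raises ValueError if the form does not match."""
    return None if value is None else datetime.strptime(value, TIMESTAMP_FORMAT)


def filter_by_mtime(files, modified_before=None, modified_after=None):
    """Keep paths whose modification time lies strictly inside the given bounds.

    `files` is an iterable of (path, modification datetime) pairs.
    """
    before = parse_bound(modified_before)
    after = parse_bound(modified_after)
    kept = []
    for path, mtime in files:
        if before is not None and not mtime < before:
            continue  # not modified before the upper bound
        if after is not None and not mtime > after:
            continue  # not modified after the lower bound
        kept.append(path)
    return kept
```

Both options are independent and optional, so passing neither keeps every file, and passing both selects a time window.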
[GitHub] [spark] SparkQA removed a comment on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s
SparkQA removed a comment on pull request #29533: URL: https://github.com/apache/spark/pull/29533#issuecomment-698523837 **[Test build #129084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129084/testReport)** for PR 29533 at commit [`6449efa`](https://github.com/apache/spark/commit/6449efa72b2f7ff2aea53139520a04ef37b72f18).
[GitHub] [spark] AmplabJenkins commented on pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API
AmplabJenkins commented on pull request #29756: URL: https://github.com/apache/spark/pull/29756#issuecomment-698101187
[GitHub] [spark] SparkQA removed a comment on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments num
SparkQA removed a comment on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-698077548 **[Test build #129057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129057/testReport)** for PR 29054 at commit [`918aea4`](https://github.com/apache/spark/commit/918aea452c8e9c7d98574726e8e6ddde8c05624c).
[GitHub] [spark] github-actions[bot] closed pull request #27604: [SPARK-30849][CORE][SHUFFLE]Fix application failed due to failed to get MapStatuses broadcast block
github-actions[bot] closed pull request #27604: URL: https://github.com/apache/spark/pull/27604
[GitHub] [spark] SparkQA removed a comment on pull request #29863: [SPARK-32877][SQL][TEST] Add test for Hive UDF complex decimal type
SparkQA removed a comment on pull request #29863: URL: https://github.com/apache/spark/pull/29863#issuecomment-698261937
[GitHub] [spark] Victsm commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
Victsm commented on a change in pull request #29855: URL: https://github.com/apache/spark/pull/29855#discussion_r494487660

## File path: common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java ##
@@ -44,6 +51,71 @@ public static String decode(ByteBuf buf) {
     }
   }
+  /** Bitmaps are encoded with their serialization length followed by the serialization bytes. */
+  public static class Bitmaps {
+    public static int encodedLength(RoaringBitmap b) {
+      // Compress the bitmap before serializing it
+      b.trim();
+      b.runOptimize();
+      return 4 + b.serializedSizeInBytes();
+    }
+
+    public static void encode(ByteBuf buf, RoaringBitmap b) {
+      ByteBuffer outBuffer = ByteBuffer.allocate(b.serializedSizeInBytes());
+      try {
+        b.serialize(new DataOutputStream(new OutputStream() {
+          ByteBuffer buffer;
+
+          OutputStream init(ByteBuffer buffer) {
+            this.buffer = buffer;
+            return this;
+          }
+
+          @Override
+          public void close() {
+          }
+
+          @Override
+          public void flush() {
+          }
+
+          @Override
+          public void write(int b) {
+            buffer.put((byte) b);
+          }
+
+          @Override
+          public void write(byte[] b) {
+            buffer.put(b);
+          }
+
+          @Override
+          public void write(byte[] b, int off, int l) {
+            buffer.put(b, off, l);
+          }
+        }.init(outBuffer)));
+      } catch (IOException e) {
+        throw new RuntimeException("Exception while encoding bitmap", e);
+      }
+      byte[] bytes = outBuffer.array();
+      buf.writeInt(bytes.length);
+      buf.writeBytes(bytes);
+    }
+
+    public static RoaringBitmap decode(ByteBuf buf) {
+      int length = buf.readInt();
+      byte[] bytes = new byte[length];
+      buf.readBytes(bytes);

Review comment: This would require using ByteArrays.encode to encode the original byte arrays. I think @Ngone51's recommendation earlier makes sense, that we should use roaringbitmap#serialize(ByteBuffer) to avoid the one additional memory copy during encoding. By doing that, we would directly serialize into the ByteBuf, and it won't be possible to use ByteArrays.encode to encode the corresponding byte arrays.

## File path: common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java ##
@@ -44,6 +51,71 @@ public static String decode(ByteBuf buf) {
     }
   }
+  /** Bitmaps are encoded with their serialization length followed by the serialization bytes. */
+  public static class Bitmaps {
+    public static int encodedLength(RoaringBitmap b) {
+      // Compress the bitmap before serializing it
+      b.trim();
+      b.runOptimize();
+      return 4 + b.serializedSizeInBytes();
+    }
+
+    public static void encode(ByteBuf buf, RoaringBitmap b) {
+      ByteBuffer outBuffer = ByteBuffer.allocate(b.serializedSizeInBytes());
+      try {
+        b.serialize(new DataOutputStream(new OutputStream() {

Review comment: Good point, I think this also avoids one more memory copy.

## File path: common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java ##
@@ -44,6 +51,71 @@ public static String decode(ByteBuf buf) {
     }
   }
+  /** Bitmaps are encoded with their serialization length followed by the serialization bytes. */
+  public static class Bitmaps {
+    public static int encodedLength(RoaringBitmap b) {
+      // Compress the bitmap before serializing it
+      b.trim();
+      b.runOptimize();
+      return 4 + b.serializedSizeInBytes();
+    }
+
+    public static void encode(ByteBuf buf, RoaringBitmap b) {
+      ByteBuffer outBuffer = ByteBuffer.allocate(b.serializedSizeInBytes());

Review comment: Yes, BlockTransferMessage.toByteBuffer ensures that. Need to know the encodedLength in order to create the encoding ByteBuf in the first place. Will add a comment to clarify this.

## File path: common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java ##
@@ -44,6 +51,71 @@ public static String decode(ByteBuf buf) {
     }
   }
+  /** Bitmaps are encoded with their serialization length followed by the serialization bytes. */
+  public static class Bitmaps {
+    public static int encodedLength(RoaringBitmap b) {
+      // Compress the bitmap before serializing it
+      b.trim();
+      b.runOptimize();

Review comment: It should be invoked only once. BlockTransferMessage.toByteBuffer is where the initial call to encodedLength happens. It's only called once for each RoaringBitmap in the bitmap array.

## File path: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java ##
@@ -209,12 +225,17 @@ public void onData(String streamId, ByteBuffer buf) throws IOException {
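The wire layout named in the quoted Javadoc (a serialization length followed by the serialization bytes) is an ordinary length-prefixed frame, and it can be sketched without RoaringBitmap or Netty. In the Python sketch below, `payload` is an opaque stand-in for the already serialized bitmap, the function names only mirror the Java ones, and the 4-byte big-endian prefix is an assumption chosen to match the `4 +` in `encodedLength` and the default byte order of Netty's `ByteBuf.writeInt`.

```python
import struct

LENGTH_PREFIX_BYTES = 4  # size of the int written ahead of the payload


def encoded_length(payload: bytes) -> int:
    """Bytes needed on the wire: the prefix plus the payload itself."""
    return LENGTH_PREFIX_BYTES + len(payload)


def encode(buf: bytearray, payload: bytes) -> None:
    """Append the payload length as a big-endian int32, then the payload."""
    buf.extend(struct.pack(">i", len(payload)))
    buf.extend(payload)


def decode(buf, offset: int = 0):
    """Read one frame starting at `offset`; return (payload, next offset)."""
    (length,) = struct.unpack_from(">i", buf, offset)
    start = offset + LENGTH_PREFIX_BYTES
    payload = bytes(buf[start:start + length])
    if len(payload) != length:
        raise ValueError("buffer ends before the declared payload length")
    return payload, start + length
```

Because the receiver must read the length before the bytes, the sender has to know `encoded_length` up front to size its buffer, which is the point made in the third review comment.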
[GitHub] [spark] SparkQA removed a comment on pull request #29859: [SPARK-32971][K8S][FOLLOWUP] Add `.toSeq` for Scala 2.13 compilation
SparkQA removed a comment on pull request #29859: URL: https://github.com/apache/spark/pull/29859#issuecomment-698075792 **[Test build #129056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129056/testReport)** for PR 29859 at commit [`19d9a2f`](https://github.com/apache/spark/commit/19d9a2f302baf0cf9c9382f28622b83355103d7e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols
AmplabJenkins removed a comment on pull request #29868: URL: https://github.com/apache/spark/pull/29868#issuecomment-698707878
[GitHub] [spark] holdenk commented on pull request #29471: [SPARK-32381][CORE][SQL] Move and refactor parallel listing & non-location sensitive listing to core
holdenk commented on pull request #29471: URL: https://github.com/apache/spark/pull/29471#issuecomment-698493851
[GitHub] [spark] sunchao commented on a change in pull request #29843: [WIP][SPARK-29250] Upgrade to Hadoop 3.2.1 and move to shaded client
sunchao commented on a change in pull request #29843: URL: https://github.com/apache/spark/pull/29843#discussion_r494467897

## File path: external/kafka-0-10-sql/pom.xml ##
@@ -79,6 +79,10 @@
 kafka-clients
 ${kafka.version}
+
+ com.google.code.findbugs

Review comment: Thanks. Yes, will do after making all tests pass.

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ##
@@ -118,11 +118,15 @@ private[hive] object IsolatedClientLoader extends Logging {
       hadoopVersion: String,
       ivyPath: Option[String],
       remoteRepos: String): Seq[URL] = {
+    val hadoopJarName = if (hadoopVersion.startsWith("3")) {

Review comment: Yes, I think so. These modules should be available in any production Hadoop 3.x release, I think. See https://issues.apache.org/jira/browse/HADOOP-11804; it was fixed in 3.0.0-alpha2.

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala ##
@@ -118,11 +118,15 @@ private[hive] object IsolatedClientLoader extends Logging {
       hadoopVersion: String,
       ivyPath: Option[String],
       remoteRepos: String): Seq[URL] = {
+    val hadoopJarName = if (hadoopVersion.startsWith("3")) {

Review comment: Yes, I believe so. These modules should be available in any production Hadoop 3.x release, I think. See https://issues.apache.org/jira/browse/HADOOP-11804; it was fixed in 3.0.0-alpha2.
[GitHub] [spark] AmplabJenkins commented on pull request #29862: [SPARK-32956][SQL] Ensure that the generated and existing headers are not duplicated in CSV DataSource
AmplabJenkins commented on pull request #29862: URL: https://github.com/apache/spark/pull/29862#issuecomment-698251280
[GitHub] [spark] dongjoon-hyun commented on pull request #29533: [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s
dongjoon-hyun commented on pull request #29533: URL: https://github.com/apache/spark/pull/29533#issuecomment-698512589 It seems that the `SparkR` test fails.
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- Run SparkR on simple dataframe.R example *** FAILED ***
```
[GitHub] [spark] cloud-fan commented on pull request #29798: [SPARK-32931][SQL] Unevaluable Expressions are not Foldable
cloud-fan commented on pull request #29798: URL: https://github.com/apache/spark/pull/29798#issuecomment-698770919 thanks, merging to master!
[GitHub] [spark] AmplabJenkins commented on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader on RepartitionByExpression when coalescing disabled
AmplabJenkins commented on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-698907651
[GitHub] [spark] srowen closed pull request #29833: [SPARK-32886][SPARK-31882][WEBUI][2.4] fix 'undefined' link in event timeline view
srowen closed pull request #29833: URL: https://github.com/apache/spark/pull/29833
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29024: [SPARK-32001][SQL]Create JDBC authentication provider developer API
AmplabJenkins removed a comment on pull request #29024: URL: https://github.com/apache/spark/pull/29024#issuecomment-698852755
[GitHub] [spark] srowen commented on a change in pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols
srowen commented on a change in pull request #29868: URL: https://github.com/apache/spark/pull/29868#discussion_r494947145

## File path: mllib/src/main/scala/org/apache/spark/ml/feature/FeatureHasher.scala ##
@@ -91,8 +91,7 @@ class FeatureHasher(@Since("2.3.0") override val uid: String) extends Transforme
   /**
    * Numeric columns to treat as categorical features. By default only string and boolean
    * columns are treated as categorical, so this param can be used to explicitly specify the
-   * numerical columns to treat as categorical. Note, the relevant columns must also be set in
-   * `inputCols`.
+   * numerical columns to treat as categorical.

Review comment: This is still 'required', right? We're not making it an error, but a column won't have any effect if it is not in `inputCols`.
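The point of the comment above, that a categorical column outside `inputCols` is silently ignored rather than rejected, can be made concrete with a tiny check. This is a plain-Python sketch and not part of Spark ML: `ineffective_categorical_cols` is a hypothetical helper that a caller could run before configuring a hasher.

```python
def ineffective_categorical_cols(input_cols, categorical_cols):
    """Return the categorical columns that are not listed among the inputs.

    Per the review comment, such columns raise no error; they simply have
    no effect, so a caller may want to surface them explicitly.
    """
    input_set = set(input_cols)
    return [col for col in categorical_cols if col not in input_set]
```

An empty result means every requested categorical column will actually be used.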
[GitHub] [spark] SparkQA commented on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader at final stage on DataWritingCommand
SparkQA commented on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-698914035 **[Test build #129111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129111/testReport)** for PR 29797 at commit [`84134b0`](https://github.com/apache/spark/commit/84134b09ef5295818a32d9dc4612141fe93fa05c).
[GitHub] [spark] c21 commented on a change in pull request #29804: [SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically
c21 commented on a change in pull request #29804: URL: https://github.com/apache/spark/pull/29804#discussion_r494083795 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/DisableUnnecessaryBucketedScan.scala ## @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.expressions.aggregate.{Partial, PartialMerge} +import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, HashClusteredDistribution} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SortExec, SparkPlan} +import org.apache.spark.sql.execution.aggregate.BaseAggregateExec +import org.apache.spark.sql.execution.exchange.Exchange +import org.apache.spark.sql.internal.SQLConf + +/** + * Disable unnecessary bucketed table scan based on actual physical query plan. + * NOTE: this rule is designed to be applied right after [[EnsureRequirements]], + * where all [[ShuffleExchangeExec]] and [[SortExec]] have been added to plan properly. 
+ * + * When BUCKETING_ENABLED and AUTO_BUCKETED_SCAN_ENABLED are set to true, go through + * query plan to check where bucketed table scan is unnecessary, and disable bucketed table + * scan if needed. + * + * For all operators which [[hasInterestingPartition]] (i.e., require [[ClusteredDistribution]] + * or [[HashClusteredDistribution]]), check if the sub-plan for operator has [[Exchange]] and + * bucketed table scan. If yes, disable the bucketed table scan in the sub-plan. + * Only allow certain operators in sub-plan, which guarantees each sub-plan is single lineage + * (i.e., each operator has only one child). See details in + * [[disableBucketWithInterestingPartition]]). + * + * Examples: + * (1).join: + * SortMergeJoin(t1.i = t2.j) + */\ + *Sort(i)Sort(j) + * / \ + * Shuffle(i) Scan(t2: i, j) + */ (bucketed on column j, enable bucketed scan) + * Scan(t1: i, j) + * (bucketed on column j, DISABLE bucketed scan) + * + * (2).aggregate: + * HashAggregate(i, ..., Final) + * | + * Shuffle(i) + * | + * HashAggregate(i, ..., Partial) + * | + *Filter + * | + * Scan(t1: i, j) + * (bucketed on column j, DISABLE bucketed scan) + * + * The idea of [[hasInterestingPartition]] is inspired from "interesting order" in + * the paper "Access Path Selection in a Relational Database Management System" + * (http://www.inf.ed.ac.uk/teaching/courses/adbs/AccessPath.pdf). + */ +case class DisableUnnecessaryBucketedScan(conf: SQLConf) extends Rule[SparkPlan] { + + /** + * Disable bucketed table scan with pre-order traversal of plan. + * + * @param withInterestingPartition The traversed plan has operator with interesting partition. + * @param withExchange The traversed plan has [[Exchange]] operator. 
+ */ + private def disableBucketWithInterestingPartition( + plan: SparkPlan, + withInterestingPartition: Boolean, + withExchange: Boolean): SparkPlan = { +plan match { + case p if hasInterestingPartition(p) => +// Operators with interesting partition, propagates `withInterestingPartition` as true +// to its children. +p.mapChildren(disableBucketWithInterestingPartition(_, true, false)) + case exchange: Exchange if withInterestingPartition => +// Exchange operator propagates `withExchange` as true to its child +// if the plan has interesting partition. +exchange.mapChildren(disableBucketWithInterestingPartition( + _, withInterestingPartition, true)) + case scan: FileSourceScanExec + if withInterestingPartition && withExchange && isBucketedScanWithoutFilter(scan) => +// Disable bucketed table scan if the plan has interesting partition, +// and [[Exchange]] in the plan. +scan.copy(disableBucketedScan = true) + case o => +if
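The traversal described in the doc comment above can be sketched on a toy plan tree. This is only an illustration under stated assumptions, not the code from the pull request: `Node`, `Kind` and `visit` are hypothetical names, the `isBucketedScanWithoutFilter` check is left out, and because the `case o =>` branch is cut off in the quoted diff, the sketch assumes that single-child pass-through operators (such as Sort, Filter, Project) forward both flags while any other operator resets them.

```java
import java.util.Arrays;
import java.util.List;

public class BucketedScanSketch {
    // Hypothetical operator categories for the toy plan.
    enum Kind { INTERESTING, EXCHANGE, PASS_THROUGH, BUCKETED_SCAN, OTHER }

    static final class Node {
        final String name;
        final Kind kind;
        final List<Node> children;
        boolean bucketedScanDisabled = false;

        Node(String name, Kind kind, Node... children) {
            this.name = name;
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Pre-order traversal that carries two flags down the tree.
    static void visit(Node n, boolean withInterestingPartition, boolean withExchange) {
        if (n.kind == Kind.INTERESTING) {
            // Operator with interesting partition: start a fresh sub-plan.
            for (Node c : n.children) visit(c, true, false);
        } else if (n.kind == Kind.EXCHANGE && withInterestingPartition) {
            // Exchange below an interesting operator: remember it.
            for (Node c : n.children) visit(c, true, true);
        } else if (n.kind == Kind.BUCKETED_SCAN && withInterestingPartition && withExchange) {
            // The data is shuffled anyway, so the bucketed scan buys nothing.
            n.bucketedScanDisabled = true;
        } else if (n.kind == Kind.PASS_THROUGH) {
            // Assumption: allowed single-child operators forward both flags.
            for (Node c : n.children) visit(c, withInterestingPartition, withExchange);
        } else {
            // Assumption: any other operator resets both flags.
            for (Node c : n.children) visit(c, false, false);
        }
    }

    public static void main(String[] args) {
        Node t1 = new Node("Scan(t1)", Kind.BUCKETED_SCAN);
        Node t2 = new Node("Scan(t2)", Kind.BUCKETED_SCAN);
        Node join = new Node("SortMergeJoin", Kind.INTERESTING,
            new Node("Sort(i)", Kind.PASS_THROUGH, new Node("Shuffle(i)", Kind.EXCHANGE, t1)),
            new Node("Sort(j)", Kind.PASS_THROUGH, t2));
        visit(join, false, false);
        System.out.println(t1.name + " disabled: " + t1.bucketedScanDisabled);
        System.out.println(t2.name + " disabled: " + t2.bucketedScanDisabled);
    }
}
```

In the join example from the doc comment, the scan of t1 sits below a shuffle and is disabled, while the scan of t2 has no exchange above it and stays bucketed.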
[GitHub] [spark] dongjoon-hyun closed pull request #29853: [SPARK-32977][SQL][DOCS] Fix JavaDoc on Default Save Mode
dongjoon-hyun closed pull request #29853: URL: https://github.com/apache/spark/pull/29853 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29862: [SPARK-32956][SQL] Ensure that the generated and existing headers are not duplicated in CSV DataSource
HyukjinKwon commented on a change in pull request #29862: URL: https://github.com/apache/spark/pull/29862#discussion_r49425 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ## @@ -93,6 +93,12 @@ object CSVUtils { value } } + if (header.sameElements(row)) { +header + } else { +// Ensure that the newly generated and existing headers are not duplicated. +makeSafeHeader(header, caseSensitive, options) + } Review comment: Can you check how R's `read_csv` works in this case? That patch was inspired by R's one. ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ## @@ -93,6 +93,12 @@ object CSVUtils { value } } + if (header.sameElements(row)) { +header + } else { +// Ensure that the newly generated and existing headers are not duplicated. +makeSafeHeader(header, caseSensitive, options) + } Review comment: Can you check how R's `read_csv` works in this case? This behaviour was inspired by R's one. ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ## @@ -93,6 +93,12 @@ object CSVUtils { value } } + if (header.sameElements(row)) { +header + } else { +// Ensure that the newly generated and existing headers are not duplicated. +makeSafeHeader(header, caseSensitive, options) + } Review comment: Can we follow this behaviour? ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala ## @@ -93,6 +93,12 @@ object CSVUtils { value } } + if (header.sameElements(row)) { +header + } else { +// Ensure that the newly generated and existing headers are not duplicated. +makeSafeHeader(header, caseSensitive, options) + } Review comment: I mean the numbering. Can we create a name like `a1 a3 a4 a2` for `a, a, a, a, a.2`?
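The concern in this review, that a generated header name can collide with a name already present in the header, can be illustrated with a small sketch. It is neither Spark's `makeSafeHeader` nor R's `read_csv` behaviour: `makeUnique` is a hypothetical helper, and the rule it uses (suffix a duplicated name with its column index, then bump the suffix until the result is unused) is an assumption chosen only to show the collision being avoided.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HeaderDedupSketch {
    // Hypothetical helper: returns a header in which every name is unique.
    static List<String> makeUnique(List<String> header) {
        // Count how often each name occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String name : header) counts.merge(name, 1, Integer::sum);

        // Names that occur exactly once are kept as-is, so reserve them first.
        Set<String> used = new HashSet<>();
        for (String name : header) {
            if (counts.get(name) == 1) used.add(name);
        }

        List<String> result = new ArrayList<>();
        for (int i = 0; i < header.size(); i++) {
            String name = header.get(i);
            if (counts.get(name) == 1) {
                result.add(name);
                continue;
            }
            // Assumed rule: start from the column index and skip taken names.
            int suffix = i;
            String candidate = name + suffix;
            while (!used.add(candidate)) {
                suffix++;
                candidate = name + suffix;
            }
            result.add(candidate);
        }
        return result;
    }
}
```

For the header `a, a, a1`, plain index suffixing would produce `a0, a1, a1`; the sketch skips the taken `a1` and yields `a0, a2, a1` instead.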
[GitHub] [spark] dongjoon-hyun commented on pull request #29861: [SPARK-32971][K8S][FOLLOWUP] Fix k8s-core module compilation in Scala 2.13
dongjoon-hyun commented on pull request #29861: URL: https://github.com/apache/spark/pull/29861#issuecomment-698111394
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29843: [WIP][SPARK-29250] Upgrade to Hadoop 3.2.1 and move to shaded client
AmplabJenkins removed a comment on pull request #29843: URL: https://github.com/apache/spark/pull/29843#issuecomment-698120109
[GitHub] [spark] cloud-fan closed pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API
cloud-fan closed pull request #29756: URL: https://github.com/apache/spark/pull/29756
[GitHub] [spark] AmplabJenkins commented on pull request #29867: [SPARK-32889][SQL][TESTS][FOLLOWUP][test-hadoop2.7][test-hive1.2] Skip special column names test in Hive 1.2
AmplabJenkins commented on pull request #29867: URL: https://github.com/apache/spark/pull/29867#issuecomment-698623561
[GitHub] [spark] dongjoon-hyun commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
dongjoon-hyun commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698112690
[GitHub] [spark] maropu commented on a change in pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain
maropu commented on a change in pull request #29828: URL: https://github.com/apache/spark/pull/29828#discussion_r494010395 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JsonSuite.scala ## @@ -0,0 +1,85 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.dsl.expressions._ +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, LogicalPlan} +import org.apache.spark.sql.catalyst.rules.RuleExecutor +import org.apache.spark.sql.types._ + +class JsonSuite extends PlanTest with ExpressionEvalHelper { + + object Optimizer extends RuleExecutor[LogicalPlan] { +val batches = Batch("Json optimization", FixedPoint(10), OptimizeJsonExprs) :: Nil + } + + val schema = StructType.fromDDL("a int, b int") + + private val structAtt = 'struct.struct(schema).notNull + + private val testRelation = LocalRelation(structAtt) + + test("SPARK-32948: optimize from_json + to_json") { +val options = Map.empty[String, String] + +val query1 = testRelation + .select(JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct")) +val optimized1 = Optimizer.execute(query1.analyze) + +val expected = testRelation.select('struct.as("struct")).analyze +comparePlans(optimized1, expected) + +val query2 = testRelation + .select( +JsonToStructs(schema, options, + StructsToJson(options, +JsonToStructs(schema, options, + StructsToJson(options, 'struct.as("struct")) +val optimized2 = Optimizer.execute(query2.analyze) + +comparePlans(optimized2, expected) + } + + test("SPARK-32948: not optimize from_json + to_json if schema is different") { +val options = Map.empty[String, String] +val schema = StructType.fromDDL("a int") + +val query = testRelation + .select(JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct")) +val optimized = Optimizer.execute(query.analyze) + +val expected = testRelation.select( + JsonToStructs(schema, options, StructsToJson(options, 'struct)).as("struct")).analyze +comparePlans(optimized, expected) + } + + test("SPARK-32948: not optimize 
from_json + to_json if option is not empty") { Review comment: Could you add tests with different timezone cases, too?
[GitHub] [spark] LuciferYang removed a comment on pull request #29864: [SPARK-32987][MESOS] Pass all `mllib` module UTs in Scala 2.13
LuciferYang removed a comment on pull request #29864: URL: https://github.com/apache/spark/pull/29864#issuecomment-698276483 cc @srowen @dongjoon-hyun to review this patch ~ thx
[GitHub] [spark] zhengruifeng commented on pull request #29868: [SPARK-32973][ML][DOC] FeatureHasher does not check categoricalCols in inputCols
zhengruifeng commented on pull request #29868: URL: https://github.com/apache/spark/pull/29868#issuecomment-698693191
[GitHub] [spark] juliuszsompolski commented on a change in pull request #29834: [SPARK-32963][SQL] empty string should be consistent for schema name in SparkGetSchemasOperation
juliuszsompolski commented on a change in pull request #29834: URL: https://github.com/apache/spark/pull/29834#discussion_r494156528 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala ## @@ -77,7 +77,8 @@ private[hive] class SparkGetSchemasOperation( val globalTempViewDb = sqlContext.sessionState.catalog.globalTempViewManager.database val databasePattern = Pattern.compile(CLIServiceUtils.patternToRegex(schemaName)) - if (databasePattern.matcher(globalTempViewDb).matches()) { + if (schemaName == null || schemaName.isEmpty || Review comment: https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getSchemas(java.lang.String,%20java.lang.String) `schemaPattern - a schema name; must match the schema name as it is stored in the database; null means schema name should not be used to narrow down the search.` This doc doesn't mention empty string, but if it's treated as a pattern, it should default to empty string not matching anything. schemaName == null is already handled to match everything in patternToRegex: ``` public static String patternToRegex(String pattern) { if (pattern == null) { return ".*"; } else { ``` So the current behaviour seems to be consistent with JDBC documentation? 
## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetSchemasOperation.scala ## @@ -77,7 +77,8 @@ private[hive] class SparkGetSchemasOperation( val globalTempViewDb = sqlContext.sessionState.catalog.globalTempViewManager.database val databasePattern = Pattern.compile(CLIServiceUtils.patternToRegex(schemaName)) - if (databasePattern.matcher(globalTempViewDb).matches()) { + if (schemaName == null || schemaName.isEmpty || Review comment: https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getTables(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String[]) `schemaPattern - a schema name pattern; must match the schema name as it is stored in the database; "" retrieves those without a schema; null means that the schema name should not be used to narrow the search` The behaviour for getTables treats "" as no schema (e.g. local temp views), not all schemas, so it seems consistent that getSchemas wouldn't treat "" as "all schemas".
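The difference between a null and an empty schema pattern discussed in this review can be sketched as follows. Only the `null` to `.*` branch comes from the snippet quoted above; the rest of `patternToRegex` here is a hypothetical, simplified stand-in (it maps the JDBC wildcards `%` and `_` and quotes every other character) and should not be read as the Hive implementation.

```java
import java.util.regex.Pattern;

public class SchemaPatternSketch {
    // Hypothetical simplified stand-in for CLIServiceUtils.patternToRegex.
    static String patternToRegex(String pattern) {
        if (pattern == null) {
            return ".*"; // as in the quoted snippet: null matches everything
        }
        StringBuilder sb = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '%') {
                sb.append(".*");      // JDBC wildcard: any run of characters
            } else if (c == '_') {
                sb.append('.');       // JDBC wildcard: exactly one character
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    static boolean matches(String schemaPattern, String schemaName) {
        return Pattern.compile(patternToRegex(schemaPattern)).matcher(schemaName).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(null, "global_temp"));      // null pattern
        System.out.println(matches("", "global_temp"));        // empty pattern
        System.out.println(matches("global%", "global_temp")); // wildcard pattern
    }
}
```

Under these assumptions an empty pattern compiles to an empty regex, which matches only the empty string, so it selects no named schema, whereas a null pattern selects all of them.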
[GitHub] [spark] SparkQA commented on pull request #29795: [SPARK-32511][SQL] Add dropFields method to Column class
SparkQA commented on pull request #29795: URL: https://github.com/apache/spark/pull/29795#issuecomment-698674623
[GitHub] [spark] AmplabJenkins commented on pull request #29866: [SPARK-32990][SQL] Migrate REFRESH TABLE to use UnresolvedTableOrView to resolve the identifier
AmplabJenkins commented on pull request #29866: URL: https://github.com/apache/spark/pull/29866#issuecomment-698617283
[GitHub] [spark] HeartSaVioR commented on pull request #29425: [SPARK-32350][FOLLOW-UP] Fix count update issue and partition the value list to a set of small batches for LevelDB writeAll
HeartSaVioR commented on pull request #29425: URL: https://github.com/apache/spark/pull/29425#issuecomment-698303116 Sorry, I still have several things on my plate and have been struggling with them. You'd better ping @mridulm, as he'd understand the patch well. @mridulm I'd appreciate it if you have time to look into this. Thanks.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29844: [SPARK-27872][K8s] Fix executor service account inconsistency for branch-2.4
AmplabJenkins removed a comment on pull request #29844: URL: https://github.com/apache/spark/pull/29844#issuecomment-697043404
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API
AmplabJenkins removed a comment on pull request #29756: URL: https://github.com/apache/spark/pull/29756#issuecomment-698101187
[GitHub] [spark] SparkQA removed a comment on pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain
SparkQA removed a comment on pull request #29828: URL: https://github.com/apache/spark/pull/29828#issuecomment-698088612
[GitHub] [spark] MLnick commented on pull request #29850: [SPARK-32974][ML] FeatureHasher transform optimization
MLnick commented on pull request #29850: URL: https://github.com/apache/spark/pull/29850#issuecomment-698112434
[GitHub] [spark] dongjoon-hyun commented on pull request #29844: [SPARK-27872][K8s] Fix executor service account inconsistency for branch-2.4
dongjoon-hyun commented on pull request #29844: URL: https://github.com/apache/spark/pull/29844#issuecomment-698422594 ok to test
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29867: [SPARK-32889][SQL][TESTS][FOLLOWUP][test-hadoop2.7][test-hive1.2] Skip special column names test in Hive 1.2
AmplabJenkins removed a comment on pull request #29867: URL: https://github.com/apache/spark/pull/29867#issuecomment-698623561
[GitHub] [spark] cloud-fan commented on a change in pull request #29795: [SPARK-32511][SQL] Add dropFields method to Column class
cloud-fan commented on a change in pull request #29795: URL: https://github.com/apache/spark/pull/29795#discussion_r494745836 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ## @@ -901,39 +901,125 @@ class Column(val expr: Expression) extends Logging { * // result: org.apache.spark.sql.AnalysisException: Ambiguous reference to fields * }}} * + * This method supports adding/replacing nested fields directly e.g. + * + * {{{ + * val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col") + * df.select($"struct_col".withField("a.c", lit(3)).withField("a.d", lit(4))) + * // result: {"a":{"a":1,"b":2,"c":3,"d":4}} + * }}} + * + * However, if you are going to add/replace multiple nested fields, it is more optimal to extract + * out the nested struct before adding/replacing multiple fields e.g. + * + * {{{ + * val df = sql("SELECT named_struct('a', named_struct('a', 1, 'b', 2)) struct_col") + * df.select($"struct_col".withField("a", $"struct_col.a".withField("c", lit(3)).withField("d", lit(4 + * // result: {"a":{"a":1,"b":2,"c":3,"d":4}} + * }}} + * * @group expr_ops * @since 3.1.0 */ // scalastyle:on line.size.limit def withField(fieldName: String, col: Column): Column = withExpr { require(fieldName != null, "fieldName cannot be null") require(col != null, "col cannot be null") +updateFieldsHelper(expr, nameParts(fieldName), name => WithField(name, col.expr)) + } -val nameParts = if (fieldName.isEmpty) { + // scalastyle:off line.size.limit + /** + * An expression that drops fields in `StructType` by name. Review comment: It's semantically noop. We can optimize away the struct reconstructing later. ## File path: sql/core/src/test/scala/org/apache/spark/sql/UpdateFieldsBenchmark.scala ## @@ -0,0 +1,310 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.benchmark.Benchmark +import org.apache.spark.sql.execution.benchmark.SqlBasedBenchmark +import org.apache.spark.sql.functions.{col, lit} +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.{IntegerType, StructField, StructType} + +/** + * Benchmark to measure Spark's performance analyzing and optimizing long UpdateFields chains. + * + * {{{ + * To run this benchmark: + * 1. without sbt: + * bin/spark-submit --class + * 2. with sbt: + * build/sbt "sql/test:runMain " + * 3. generate result: + * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain " + * Results will be written to "benchmarks/UpdateFieldsBenchmark-results.txt". 
+ * }}} + */ +object UpdateFieldsBenchmark extends SqlBasedBenchmark { + + private def nestedColName(d: Int, colNum: Int): String = s"nested${d}Col$colNum" + + private def nestedStructType( + colNums: Seq[Int], + nullable: Boolean, + maxDepth: Int, + currDepth: Int = 1): StructType = { + +if (currDepth == maxDepth) { + val fields = colNums.map { colNum => +val name = nestedColName(currDepth, colNum) +StructField(name, IntegerType, nullable = false) + } + StructType(fields) +} else { + val fields = colNums.foldLeft(Seq.empty[StructField]) { +case (structFields, colNum) if colNum == 0 => + val nested = nestedStructType(colNums, nullable, maxDepth, currDepth + 1) + structFields :+ StructField(nestedColName(currDepth, colNum), nested, nullable) +case (structFields, colNum) => + val name = nestedColName(currDepth, colNum) + structFields :+ StructField(name, IntegerType, nullable = false) + } + StructType(fields) +} + } + + private def nestedRow(colNums: Seq[Int], maxDepth: Int, currDepth: Int = 1): Row = { +if (currDepth == maxDepth) { + Row.fromSeq(colNums) +} else { + val values = colNums.foldLeft(Seq.empty[Any]) { +case (values, colNum) if colNum == 0 => + values :+ nestedRow(colNums, maxDepth, currDepth + 1) +case (values, colNum) => + values :+ colNum + } +
[GitHub] [spark] srowen commented on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
srowen commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698479253
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29833: [SPARK-32886][SPARK-31882][WEBUI][2.4] fix 'undefined' link in event timeline view
AmplabJenkins removed a comment on pull request #29833: URL: https://github.com/apache/spark/pull/29833#issuecomment-698402321
[GitHub] [spark] srowen commented on pull request #29833: [SPARK-32886][SPARK-31882][WEBUI][2.4] fix 'undefined' link in event timeline view
srowen commented on pull request #29833: URL: https://github.com/apache/spark/pull/29833#issuecomment-698397395 Jenkins retest this please
[GitHub] [spark] mridulm commented on a change in pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
mridulm commented on a change in pull request #29855: URL: https://github.com/apache/spark/pull/29855#discussion_r494003389 ## File path: common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java ## @@ -44,6 +51,71 @@ public static String decode(ByteBuf buf) { } } + /** Bitmaps are encoded with their serialization length followed by the serialization bytes. */ + public static class Bitmaps { +public static int encodedLength(RoaringBitmap b) { + // Compress the bitmap before serializing it + b.trim(); + b.runOptimize(); + return 4 + b.serializedSizeInBytes(); +} + +public static void encode(ByteBuf buf, RoaringBitmap b) { + ByteBuffer outBuffer = ByteBuffer.allocate(b.serializedSizeInBytes()); + try { +b.serialize(new DataOutputStream(new OutputStream() { + ByteBuffer buffer; + + OutputStream init(ByteBuffer buffer) { +this.buffer = buffer; +return this; + } + + @Override + public void close() { + } + + @Override + public void flush() { + } + + @Override + public void write(int b) { +buffer.put((byte) b); + } + + @Override + public void write(byte[] b) { +buffer.put(b); + } + + @Override + public void write(byte[] b, int off, int l) { +buffer.put(b, off, l); + } +}.init(outBuffer))); + } catch (IOException e) { +throw new RuntimeException("Exception while encoding bitmap", e); + } Review comment: Replace this with something more concise - for example see `UnsafeShuffleWriter.MyByteArrayOutputStream`. 
To illustrate, something like: ``` MyBaos out = new MyBaos(b.serializedSizeInBytes()); b.serialize(new DataOutputStream(out)); int size = out.size(); buf.writeInt(size); buf.writeBytes(out.getBuf(), 0, size); ``` The last part could also be moved as `ByteArrays.encode(byte[] arr, int offset, int len)` ## File path: common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java ## @@ -44,6 +51,71 @@ public static String decode(ByteBuf buf) { } } + /** Bitmaps are encoded with their serialization length followed by the serialization bytes. */ + public static class Bitmaps { +public static int encodedLength(RoaringBitmap b) { + // Compress the bitmap before serializing it + b.trim(); + b.runOptimize(); Review comment: `BitmapArrays` results in calling `trim` and `runOptimize` twice - refactor so that it is only done once for this codepath ? ## File path: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java ## @@ -209,12 +225,17 @@ public void onData(String streamId, ByteBuffer buf) throws IOException { public void onComplete(String streamId) throws IOException { try { streamHandler.onComplete(streamId); - callback.onSuccess(ByteBuffer.allocate(0)); + callback.onSuccess(meta.duplicate()); Review comment: Can you add a comment on why we are making this change ? From sending empty buffer to meta. ## File path: common/network-common/src/main/java/org/apache/spark/network/server/TransportRequestHandler.java ## @@ -181,6 +182,17 @@ public void onFailure(Throwable e) { private void processStreamUpload(final UploadStream req) { assert (req.body() == null); try { + // Retain the original metadata buffer, since it will be used during the invocation of + // this method. Will be released later. + req.meta.retain(); + // Make a copy of the original metadata buffer. 
In benchmark, we noticed that + // we cannot respond the original metadata buffer back to the client, otherwise + // in cases where multiple concurrent shuffles are present, a wrong metadata might + // be sent back to client. This is related to the eager release of the metadata buffer, + // i.e., we always release the original buffer by the time the invocation of this + // method ends, instead of by the time we respond it to the client. This is necessary, + // otherwise we start seeing memory issues very quickly in benchmarks. + ByteBuffer meta = cloneBuffer(req.meta.nioByteBuffer()); Review comment: Since we are always making a copy of meta here, can we remove the `retain` + `release` below and instead always release it here and only rely on the cloned buffer within this method? ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockPusher.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file
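The suggestion earlier in this review, to replace the anonymous `OutputStream` with a `ByteArrayOutputStream` subclass that exposes its buffer, can be sketched with the standard library alone. The names here are assumptions made for illustration: `ExposedByteArrayOutputStream` plays the role of the `MyBaos` mentioned in the comment, `Payload` is a hypothetical stand-in for an object with a `serialize(DataOutput)`-style method such as the bitmap, and `java.nio.ByteBuffer` stands in for Netty's `ByteBuf`.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public class LengthPrefixedEncodeSketch {
    /** Exposes the internal array so the bytes need not be copied via toByteArray(). */
    static final class ExposedByteArrayOutputStream extends ByteArrayOutputStream {
        ExposedByteArrayOutputStream(int size) {
            super(size);
        }

        byte[] getBuf() {
            return buf; // protected field inherited from ByteArrayOutputStream
        }
    }

    /** Hypothetical stand-in for anything that can serialize itself to a stream. */
    interface Payload {
        void serialize(DataOutputStream out) throws IOException;
    }

    /** Encodes the payload as a 4-byte length followed by the serialized bytes. */
    static ByteBuffer encode(Payload payload, int sizeHint) {
        ExposedByteArrayOutputStream out = new ExposedByteArrayOutputStream(sizeHint);
        try {
            DataOutputStream data = new DataOutputStream(out);
            payload.serialize(data);
            data.flush();
        } catch (IOException e) {
            throw new UncheckedIOException("Exception while encoding payload", e);
        }
        int size = out.size();
        ByteBuffer target = ByteBuffer.allocate(4 + size);
        target.putInt(size);
        target.put(out.getBuf(), 0, size); // only the first `size` bytes are valid
        target.flip();
        return target;
    }
}
```

Compared with the anonymous stream in the quoted diff, the subclass needs one overridden constructor and one accessor, and the length prefix is taken from what was actually written rather than from a separately computed size.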
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29828: [SPARK-32948][SQL] Optimize to_json and from_json expression chain
AmplabJenkins removed a comment on pull request #29828: URL: https://github.com/apache/spark/pull/29828#issuecomment-698089064 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29869: [WIP][SPARK-32994][CORE] Update external accumulators before they entering into Spark listener event loop
AmplabJenkins removed a comment on pull request #29869: URL: https://github.com/apache/spark/pull/29869#issuecomment-698726138
[GitHub] [spark] HyukjinKwon commented on pull request #29591: [SPARK-32714][PYTHON] Initial pyspark-stubs port.
HyukjinKwon commented on pull request #29591: URL: https://github.com/apache/spark/pull/29591#issuecomment-698117104
[GitHub] [spark] zhengruifeng commented on pull request #29852: [SPARK-21481][ML][FOLLOWUP][Trivial] HashingTF use util.collection.OpenHashMap instead of mutable.HashMap
zhengruifeng commented on pull request #29852: URL: https://github.com/apache/spark/pull/29852#issuecomment-698162567 ping @huaxingao
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29857: [SPARK-32972][ML] Fix UTs of `mllib` module in Scala 2.13 except RandomForestRegressorSuite
AmplabJenkins removed a comment on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698087732
[GitHub] [spark] nssalian commented on pull request #29844: [SPARK-27872][K8s][2.4] Fix executor service account inconsistency
nssalian commented on pull request #29844: URL: https://github.com/apache/spark/pull/29844#issuecomment-698610691
[GitHub] [spark] srowen commented on a change in pull request #29852: [SPARK-21481][ML][FOLLOWUP][Trivial] HashingTF use util.collection.OpenHashMap instead of mutable.HashMap
srowen commented on a change in pull request #29852: URL: https://github.com/apache/spark/pull/29852#discussion_r494390254 ## File path: mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala ## @@ -91,20 +90,13 @@ class HashingTF @Since("3.0.0") private[ml] ( @Since("2.0.0") override def transform(dataset: Dataset[_]): DataFrame = { val outputSchema = transformSchema(dataset.schema) -val localNumFeatures = $(numFeatures) -val localBinary = $(binary) +val n = $(numFeatures) +val updateFunc = if ($(binary)) (v: Double) => 1.0 else (v: Double) => v + 1.0 val hashUDF = udf { terms: Seq[_] => - val termFrequencies = mutable.HashMap.empty[Int, Double].withDefaultValue(0.0) - terms.foreach { term => -val i = indexOf(term) -if (localBinary) { - termFrequencies(i) = 1.0 -} else { - termFrequencies(i) += 1.0 -} - } - Vectors.sparse(localNumFeatures, termFrequencies.toSeq) + val map = new OpenHashMap[Int, Double]() Review comment: This seems fine, but is it faster than Scala's Map? The comment refers to the Java HashMap.
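For readers following the HashingTF change, the term-frequency step under discussion can be sketched in plain Java. This is only an illustration under stated assumptions: `indexOf` here is a hypothetical stand-in that buckets by `hashCode`, whereas the real HashingTF uses its own hash function, and a `java.util.HashMap` replaces both the Scala map and `OpenHashMap`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and hashing are assumptions, not Spark's implementation.
class TermFrequencySketch {
    /** Hypothetical hashing step: maps a term to a bucket in [0, numFeatures). */
    static int indexOf(Object term, int numFeatures) {
        return Math.floorMod(term.hashCode(), numFeatures);
    }

    /** Counts terms per bucket, or records only presence (1.0) when `binary` is set. */
    static Map<Integer, Double> termFrequencies(List<?> terms, int numFeatures, boolean binary) {
        Map<Integer, Double> tf = new HashMap<>();
        for (Object term : terms) {
            int i = indexOf(term, numFeatures);
            if (binary) {
                tf.put(i, 1.0);                // presence only
            } else {
                tf.merge(i, 1.0, Double::sum); // increment, starting from 0.0
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("a", "b", "a");
        System.out.println(termFrequencies(terms, 16, false));
        System.out.println(termFrequencies(terms, 16, true));
    }
}
```

Whether a specialized map is faster than this kind of general-purpose map is exactly the question raised in the review, and this sketch does not answer it. A benchmark on realistic term lists would be needed.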
[GitHub] [spark] AmplabJenkins commented on pull request #29860: [SPARK-32984][TESTS][SQL] Improve showing the differences between approved and actual plans of PlanStabilitySuite
AmplabJenkins commented on pull request #29860: URL: https://github.com/apache/spark/pull/29860#issuecomment-698090732
[GitHub] [spark] cloud-fan commented on pull request #29756: [SPARK-32885][SS] Add DataStreamReader.table API
cloud-fan commented on pull request #29756: URL: https://github.com/apache/spark/pull/29756#issuecomment-698755963 thanks, merging to master!
[GitHub] [spark] HyukjinKwon commented on pull request #29806: [SPARK-32187][PYTHON][DOCS] Doc on Python packaging
HyukjinKwon commented on pull request #29806: URL: https://github.com/apache/spark/pull/29806#issuecomment-698110090
[GitHub] [spark] fqaiser94 commented on a change in pull request #29795: [SPARK-32511][SQL] Add dropFields method to Column class
fqaiser94 commented on a change in pull request #29795: URL: https://github.com/apache/spark/pull/29795#discussion_r494698625 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -541,57 +541,105 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E } /** - * Adds/replaces field in struct by name. + * Represents an operation to be applied to the fields of a struct. */ -case class WithFields( -structExpr: Expression, -names: Seq[String], -valExprs: Seq[Expression]) extends Unevaluable { +trait StructFieldsOperation { - assert(names.length == valExprs.length) + val resolver: Resolver = SQLConf.get.resolver + + /** + * Returns an updated list of StructFields and Expressions that will ultimately be used + * as the fields argument for [[StructType]] and as the children argument for + * [[CreateNamedStruct]] respectively inside of [[UpdateFields]]. + */ + def apply(values: Seq[(StructField, Expression)]): Seq[(StructField, Expression)] +} + +/** + * Add or replace a field by name. + * + * We extend [[Unevaluable]] here to ensure that [[UpdateFields]] can include it as part of its + * children, and thereby enable the analyzer to resolve and transform valExpr as necessary. 
+ */ +case class WithField(name: String, valExpr: Expression) + extends Unevaluable with StructFieldsOperation { + + override def apply(values: Seq[(StructField, Expression)]): Seq[(StructField, Expression)] = { +val newFieldExpr = (StructField(name, valExpr.dataType, valExpr.nullable), valExpr) +if (values.exists { case (field, _) => resolver(field.name, name) }) { + values.map { +case (field, _) if resolver(field.name, name) => newFieldExpr +case x => x + } +} else { + values :+ newFieldExpr +} + } + + override def children: Seq[Expression] = valExpr :: Nil + + override def dataType: DataType = throw new UnresolvedException(this, "dataType") Review comment: done ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -541,57 +541,105 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E } /** - * Adds/replaces field in struct by name. + * Represents an operation to be applied to the fields of a struct. */ -case class WithFields( -structExpr: Expression, -names: Seq[String], -valExprs: Seq[Expression]) extends Unevaluable { +trait StructFieldsOperation { - assert(names.length == valExprs.length) + val resolver: Resolver = SQLConf.get.resolver + + /** + * Returns an updated list of StructFields and Expressions that will ultimately be used + * as the fields argument for [[StructType]] and as the children argument for + * [[CreateNamedStruct]] respectively inside of [[UpdateFields]]. + */ + def apply(values: Seq[(StructField, Expression)]): Seq[(StructField, Expression)] +} + +/** + * Add or replace a field by name. + * + * We extend [[Unevaluable]] here to ensure that [[UpdateFields]] can include it as part of its + * children, and thereby enable the analyzer to resolve and transform valExpr as necessary. 
+ */ +case class WithField(name: String, valExpr: Expression) + extends Unevaluable with StructFieldsOperation { + + override def apply(values: Seq[(StructField, Expression)]): Seq[(StructField, Expression)] = { +val newFieldExpr = (StructField(name, valExpr.dataType, valExpr.nullable), valExpr) +if (values.exists { case (field, _) => resolver(field.name, name) }) { Review comment: thanks for sharing the code, done ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -541,57 +541,105 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E } /** - * Adds/replaces field in struct by name. + * Represents an operation to be applied to the fields of a struct. */ -case class WithFields( -structExpr: Expression, -names: Seq[String], -valExprs: Seq[Expression]) extends Unevaluable { +trait StructFieldsOperation { - assert(names.length == valExprs.length) + val resolver: Resolver = SQLConf.get.resolver + + /** + * Returns an updated list of StructFields and Expressions that will ultimately be used + * as the fields argument for [[StructType]] and as the children argument for + * [[CreateNamedStruct]] respectively inside of [[UpdateFields]]. + */ + def apply(values: Seq[(StructField, Expression)]): Seq[(StructField, Expression)] +} + +/** + * Add or replace a field by name. + * + * We extend [[Unevaluable]] here to ensure that [[UpdateFields]] can include it as part of its + * children, and thereby enable the analyzer to resolve and transform valExpr as necessary. + */ +case class WithField(name: String, valExpr: Expression) + extends
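The add-or-replace behaviour of `WithField.apply` quoted above can be sketched in Java as follows. This is a simplified illustration: fields are modelled as name/value pairs instead of `(StructField, Expression)` tuples, and a `caseSensitive` flag is a hypothetical stand-in for the `Resolver` taken from `SQLConf`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

// Simplified illustration of "add or replace a field by name"; not Spark code.
class WithFieldSketch {
    static List<Entry<String, Object>> withField(
            List<Entry<String, Object>> fields, String name, Object value, boolean caseSensitive) {
        Entry<String, Object> replacement = new SimpleEntry<>(name, value);
        List<Entry<String, Object>> result = new ArrayList<>();
        boolean found = false;
        for (Entry<String, Object> field : fields) {
            boolean matches = caseSensitive
                ? field.getKey().equals(name)
                : field.getKey().equalsIgnoreCase(name);
            if (matches) {
                result.add(replacement); // every matching field is replaced in place
                found = true;
            } else {
                result.add(field);
            }
        }
        if (!found) {
            result.add(replacement);     // no match: append at the end
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry<String, Object>> fields = new ArrayList<>();
        fields.add(new SimpleEntry<>("a", 1));
        fields.add(new SimpleEntry<>("b", 2));
        System.out.println(withField(fields, "B", 20, false)); // prints [a=1, B=20]
        System.out.println(withField(fields, "c", 3, false));  // prints [a=1, b=2, c=3]
    }
}
```

As in the quoted Scala, a replaced field keeps its position in the struct, while a new field goes last.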
[GitHub] [spark] cloud-fan commented on a change in pull request #29860: [SPARK-32984][TESTS][SQL] Improve showing the differences between approved and actual plans of PlanStabilitySuite
cloud-fan commented on a change in pull request #29860: URL: https://github.com/apache/spark/pull/29860#discussion_r494832113 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -153,23 +154,93 @@ trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite { // write out for debugging FileUtils.writeStringToFile(actualSimplifiedFile, actualSimplified, StandardCharsets.UTF_8) FileUtils.writeStringToFile(actualExplainFile, explain, StandardCharsets.UTF_8) + val (approvedSimplifiedWithHint, actualSimplifiedWithHint) = +addDiffHint(approvedSimplified, actualSimplified) fail( s""" |Plans did not match: |last approved simplified plan: ${approvedSimplifiedFile.getAbsolutePath} |last approved explain plan: ${approvedExplainFile.getAbsolutePath} | - |$approvedSimplified + |$approvedSimplifiedWithHint | |actual simplified plan: ${actualSimplifiedFile.getAbsolutePath} |actual explain plan: ${actualExplainFile.getAbsolutePath} | - |$actualSimplified + |$actualSimplifiedWithHint """.stripMargin) } } + /** + * Add the hint to the simplified plans where they first become different. + */ + private def addDiffHint(approvedSimplified: String, actualSimplified: String) +: (String, String) = { +// reverse the plan so we can compare the node from the bottom to top Review comment: One hard problem is how to match the lines from both sides. It's possible that the left side has one more node in the middle, so simply matching lines bottom-up may not work. It's like git diff: we should match with respect to the content, which can be very complicated. Maybe we should just recommend some online text diff tools in the comment and ask people to use them.
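To make the concern about bottom-up matching concrete, here is a small Java sketch of the naive approach. It is not the code from the pull request. The class name and the plan strings are made up for illustration, and indentation of plan nodes is ignored.

```java
// Illustrative sketch of naive bottom-up line matching; not the PlanStabilitySuite code.
class PlanDiffHintSketch {
    /** Counts how many trailing lines are identical when comparing from the bottom up. */
    static int matchingSuffixLines(String approved, String actual) {
        String[] a = approved.split("\n");
        String[] b = actual.split("\n");
        int n = 0;
        while (n < a.length && n < b.length
                && a[a.length - 1 - n].equals(b[b.length - 1 - n])) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String approved = "Project\nFilter\nScan";
        String actual = "Project\nSort\nFilter\nScan"; // one extra node in the middle
        // "Scan" and "Filter" match; the next pair compared is "Project" vs "Sort".
        System.out.println(matchingSuffixLines(approved, actual)); // prints 2
    }
}
```

After the two matching lines, the comparison pairs `Project` with `Sort` and reports a mismatch there, even though `Project` is present on both sides. A content-based alignment, as a diff tool performs, would instead report `Sort` as the single inserted line.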
[GitHub] [spark] gatorsmile commented on a change in pull request #29056: [SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
gatorsmile commented on a change in pull request #29056: URL: https://github.com/apache/spark/pull/29056#discussion_r494808052 ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -36,6 +36,14 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ LOCATION path ] Review comment: The bucketSpec is still missing in CREATE HIVE FORMAT table, right? ``` [ CLUSTERED BY ( col_name3, col_name4, ... ) [ SORTED BY ( col_name [ ASC | DESC ], ... ) ] INTO num_buckets BUCKETS ] ``` ## File path: docs/sql-ref-syntax-ddl-create-table-hiveformat.md ## @@ -36,6 +36,14 @@ CREATE [ EXTERNAL ] TABLE [ IF NOT EXISTS ] table_identifier [ LOCATION path ] Review comment: Any reason we did not add it? @huaxingao @GuoPhilipse
[GitHub] [spark] SparkQA commented on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader on RepartitionByExpression when coalescing disabled
SparkQA commented on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-698906945
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader on RepartitionByExpression when coalescing disabled
AmplabJenkins removed a comment on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-698907651
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #29024: [SPARK-32001][SQL]Create JDBC authentication provider developer API
gaborgsomogyi commented on a change in pull request #29024: URL: https://github.com/apache/spark/pull/29024#discussion_r494903467 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ## @@ -23,12 +23,15 @@ import java.util.{Locale, Properties} import org.apache.commons.io.FilenameUtils import org.apache.spark.SparkFiles +import org.apache.spark.annotation.DeveloperApi import org.apache.spark.internal.Logging import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap /** + * ::DeveloperApi:: * Options for the JDBC data source. */ +@DeveloperApi Review comment: @HyukjinKwon thanks for having a look! I agree that `JDBCOptions` mustn't be exposed. Let me change the code to show `option 1`. As said, passing only `keytab: String, principal: String` is not enough because some of the providers need further configuration. I've started to work on this change (unless anybody has a better option).
[GitHub] [spark] viirya commented on pull request #29831: [SPARK-32351][SQL] Show partially pushed down partition filters in explain()
viirya commented on pull request #29831: URL: https://github.com/apache/spark/pull/29831#issuecomment-699265145 cc @maropu too
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid argumen
AmplabJenkins removed a comment on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699272519 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid argumen
AmplabJenkins removed a comment on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699272523 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129127/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments numbe
AmplabJenkins commented on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699272519
[GitHub] [spark] SparkQA removed a comment on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments num
SparkQA removed a comment on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699231076 **[Test build #129127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129127/testReport)** for PR 29054 at commit [`766c931`](https://github.com/apache/spark/commit/766c931975821781b91e49013caa3c39a35f2cb2).
[GitHub] [spark] SparkQA commented on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments number erro
SparkQA commented on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699272432 **[Test build #129127 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129127/testReport)** for PR 29054 at commit [`766c931`](https://github.com/apache/spark/commit/766c931975821781b91e49013caa3c39a35f2cb2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments number erro
SparkQA commented on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699287405 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33746/
[GitHub] [spark] SparkQA commented on pull request #29872: [SPARK-32996][Web-UI] Handle empty ExecutorMetrics in ExecutorMetricsJsonSerializer
SparkQA commented on pull request #29872: URL: https://github.com/apache/spark/pull/29872#issuecomment-699266269 **[Test build #129125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129125/testReport)** for PR 29872 at commit [`c27a699`](https://github.com/apache/spark/commit/c27a6994be6f580e331d49aeedfab2ca4c427e30). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29872: [SPARK-32996][Web-UI] Handle empty ExecutorMetrics in ExecutorMetricsJsonSerializer
SparkQA removed a comment on pull request #29872: URL: https://github.com/apache/spark/pull/29872#issuecomment-699197271 **[Test build #129125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129125/testReport)** for PR 29872 at commit [`c27a699`](https://github.com/apache/spark/commit/c27a6994be6f580e331d49aeedfab2ca4c427e30).
[GitHub] [spark] SparkQA removed a comment on pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
SparkQA removed a comment on pull request #29855: URL: https://github.com/apache/spark/pull/29855#issuecomment-699207624 **[Test build #129126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129126/testReport)** for PR 29855 at commit [`85b0de8`](https://github.com/apache/spark/commit/85b0de8f48c8f998a41e794cf0a32c8bea35f237).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
AmplabJenkins removed a comment on pull request #29855: URL: https://github.com/apache/spark/pull/29855#issuecomment-699271839
[GitHub] [spark] AmplabJenkins commented on pull request #29855: [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks
AmplabJenkins commented on pull request #29855: URL: https://github.com/apache/spark/pull/29855#issuecomment-699271839
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid argumen
AmplabJenkins removed a comment on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699257263 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/33745/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments numbe
AmplabJenkins commented on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699257257
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid argumen
AmplabJenkins removed a comment on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699257257 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29875: [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
SparkQA removed a comment on pull request #29875: URL: https://github.com/apache/spark/pull/29875#issuecomment-699168581 **[Test build #129124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129124/testReport)** for PR 29875 at commit [`3f14f68`](https://github.com/apache/spark/commit/3f14f6842e04342297ac671bf9791a21ff7ec258).
[GitHub] [spark] SparkQA commented on pull request #29875: [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
SparkQA commented on pull request #29875: URL: https://github.com/apache/spark/pull/29875#issuecomment-699274191 **[Test build #129124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129124/testReport)** for PR 29875 at commit [`3f14f68`](https://github.com/apache/spark/commit/3f14f6842e04342297ac671bf9791a21ff7ec258). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29875: [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
SparkQA commented on pull request #29875: URL: https://github.com/apache/spark/pull/29875#issuecomment-699296378 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33747/
[GitHub] [spark] HeartSaVioR commented on pull request #29729: [SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsumer.poll(long) API
HeartSaVioR commented on pull request #29729: URL: https://github.com/apache/spark/pull/29729#issuecomment-699269552 Worth noting that the issue does not occur only in theory; I've seen it multiple times in community reports, from customers, etc. We should probably document the change from a security viewpoint (in the release notes as well?) to notify end users, but I hope the change in security requirements doesn't block resolving a "real world" issue.
[GitHub] [spark] AmplabJenkins commented on pull request #29875: [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
AmplabJenkins commented on pull request #29875: URL: https://github.com/apache/spark/pull/29875#issuecomment-699274719
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29875: [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
AmplabJenkins removed a comment on pull request #29875: URL: https://github.com/apache/spark/pull/29875#issuecomment-699274719
[GitHub] [spark] SparkQA commented on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid arguments number erro
SparkQA commented on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699281169 **[Test build #129130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129130/testReport)** for PR 29054 at commit [`95cfebe`](https://github.com/apache/spark/commit/95cfebeff7b0eb2b696e9882d8040ff635aeb68b).
[GitHub] [spark] SparkQA commented on pull request #29875: [SPARK-32999][SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode
SparkQA commented on pull request #29875: URL: https://github.com/apache/spark/pull/29875#issuecomment-699281142 **[Test build #129129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129129/testReport)** for PR 29875 at commit [`d7aeded`](https://github.com/apache/spark/commit/d7aeded2141a45ac770fb2926a3ed1ef55420fec).
[GitHub] [spark] SparkQA commented on pull request #29872: [SPARK-32996][Web-UI] Handle empty ExecutorMetrics in ExecutorMetricsJsonSerializer
SparkQA commented on pull request #29872: URL: https://github.com/apache/spark/pull/29872#issuecomment-699402597 **[Test build #129131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129131/testReport)** for PR 29872 at commit [`2967673`](https://github.com/apache/spark/commit/29676739bbb2ef6db17cd170da7fb1ed24ffa769).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29054: [SPARK-32243][SQL]HiveSessionCatalog call super.makeFunctionExpression should throw earlier when got Spark UDAF Invalid argumen
AmplabJenkins removed a comment on pull request #29054: URL: https://github.com/apache/spark/pull/29054#issuecomment-699400125 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/129128/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29872: [SPARK-32996][Web-UI] Handle empty ExecutorMetrics in ExecutorMetricsJsonSerializer
AmplabJenkins commented on pull request #29872: URL: https://github.com/apache/spark/pull/29872#issuecomment-699266831
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29872: [SPARK-32996][Web-UI] Handle empty ExecutorMetrics in ExecutorMetricsJsonSerializer
AmplabJenkins removed a comment on pull request #29872: URL: https://github.com/apache/spark/pull/29872#issuecomment-699266831
[GitHub] [spark] cloud-fan closed pull request #29798: [SPARK-32931][SQL] Unevaluable Expressions are not Foldable
cloud-fan closed pull request #29798: URL: https://github.com/apache/spark/pull/29798
[GitHub] [spark] fhoering commented on pull request #29806: [SPARK-32187][PYTHON][DOCS] Doc on Python packaging
fhoering commented on pull request #29806: URL: https://github.com/apache/spark/pull/29806#issuecomment-698877710 It would indeed be nice to have K8s here, but I have never deployed to K8s. So I will only make the small changes from above and let you open another JIRA ticket for someone else to write about K8s.
[GitHub] [spark] SparkQA removed a comment on pull request #29797: [SPARK-32932][SQL] Do not use local shuffle reader on RepartitionByExpression when coalescing disabled
SparkQA removed a comment on pull request #29797: URL: https://github.com/apache/spark/pull/29797#issuecomment-698906945 **[Test build #129110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129110/testReport)** for PR 29797 at commit [`7e0d766`](https://github.com/apache/spark/commit/7e0d766b424cdcac27f4bb3b08e325886daf92b2).
[GitHub] [spark] SparkQA commented on pull request #29857: [SPARK-32972][ML] Pass all UTs of `mllib` module in Scala 2.13
SparkQA commented on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698928101
[GitHub] [spark] wangyum commented on a change in pull request #29790: [SPARK-32914][SQL] Avoid calling dataType multiple times for each expression
wangyum commented on a change in pull request #29790: URL: https://github.com/apache/spark/pull/29790#discussion_r494810795 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ## @@ -3498,13 +3500,15 @@ object ArrayUnion { since = "2.4.0") case class ArrayIntersect(left: Expression, right: Expression) extends ArrayBinaryLike with ComplexTypeMergingExpression { - override def dataType: DataType = { -dataTypeCheck Review comment: Do you mean to add it back?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29857: [SPARK-32972][ML] Pass all UTs of `mllib` module in Scala 2.13
AmplabJenkins removed a comment on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698937917
[GitHub] [spark] SparkQA removed a comment on pull request #29857: [SPARK-32972][ML] Pass all UTs of `mllib` module in Scala 2.13
SparkQA removed a comment on pull request #29857: URL: https://github.com/apache/spark/pull/29857#issuecomment-698906903 **[Test build #129109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/129109/testReport)** for PR 29857 at commit [`f2a26c5`](https://github.com/apache/spark/commit/f2a26c571b37b6f8c3ad169c27e73a38a67160f2).