[GitHub] [spark] chenzhx commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-06-07 Thread GitBox


chenzhx commented on code in PR #36663:
URL: https://github.com/apache/spark/pull/36663#discussion_r890253947


##
sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala:
##
@@ -259,6 +259,55 @@ class V2ExpressionBuilder(
      } else {
        None
      }
+    case date: DateAdd =>
+      val childrenExpressions = date.children.flatMap(generateExpression(_))
+      if (childrenExpressions.length == date.children.length) {
+        Some(new GeneralScalarExpression("DATE_ADD",
+          childrenExpressions.toArray[V2Expression]))
+      } else {
+        None
+      }
+    case date: DateDiff =>
+      val childrenExpressions = date.children.flatMap(generateExpression(_))
+      if (childrenExpressions.length == date.children.length) {
+        Some(new GeneralScalarExpression("DATE_DIFF",
+          childrenExpressions.toArray[V2Expression]))
+      } else {
+        None
+      }
+    case date: TruncDate =>
+      val childrenExpressions = date.children.flatMap(generateExpression(_))
+      if (childrenExpressions.length == date.children.length) {
+        Some(new GeneralScalarExpression("TRUNC",
+          childrenExpressions.toArray[V2Expression]))
+      } else {
+        None
+      }
+    case Second(child, _) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("SECOND", Array[V2Expression](v)))
+    case Minute(child, _) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("MINUTE", Array[V2Expression](v)))
+    case Hour(child, _) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("HOUR", Array[V2Expression](v)))
+    case Month(child) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("MONTH", Array[V2Expression](v)))
+    case Quarter(child) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("QUARTER", Array[V2Expression](v)))
+    case Year(child) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("YEAR", Array[V2Expression](v)))
+    // The DAY_OF_WEEK function in Spark returns the day of the week for date/timestamp.
+    // Database dialects do not need to follow ISO semantics when handling DAY_OF_WEEK.

Review Comment:
   Yes. In ISO semantics, 1 represents Monday, but in Spark's `DayOfWeek` function, 1 represents Sunday.
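   For a concrete illustration of that difference (assuming a `SparkSession` named `spark`; 2022-06-05 is a Sunday):
   ```scala
   import java.time.LocalDate

   // Spark's dayofweek: 1 = Sunday, ..., 7 = Saturday
   spark.sql("SELECT dayofweek(DATE'2022-06-05')").show()        // shows 1 (Sunday)

   // ISO-8601 day of week (java.time): 1 = Monday, ..., 7 = Sunday
   println(LocalDate.parse("2022-06-05").getDayOfWeek.getValue)  // prints 7 (Sunday)
   ```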






[GitHub] [spark] wangyum commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'

2022-06-07 Thread GitBox


wangyum commented on PR #36784:
URL: https://github.com/apache/spark/pull/36784#issuecomment-1148459572

   cc @yaooqinn @pan3793





[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster

2022-06-07 Thread GitBox


Ngone51 commented on code in PR #36716:
URL: https://github.com/apache/spark/pull/36716#discussion_r890982027


##
core/src/main/scala/org/apache/spark/deploy/master/Master.scala:
##
@@ -725,26 +729,38 @@ private[deploy] class Master(
    */
   private def startExecutorsOnWorkers(): Unit = {
     // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
-    // in the queue, then the second app, etc.
+    // in the queue, then the second app, etc. And for each app, we will schedule base on
+    // resource profiles also with a simple FIFO scheduler, resource profile with smaller id
+    // first.

Review Comment:
   I'd suggest scheduling in the order of the resource profile requests instead of the smaller id first. In the case where a resource profile is reused for a later RDD computation, the smaller id doesn't seem to have priority over other resource profiles. WDYT?






[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster

2022-06-07 Thread GitBox


Ngone51 commented on code in PR #36716:
URL: https://github.com/apache/spark/pull/36716#discussion_r890973724


##
core/src/main/scala/org/apache/spark/deploy/master/ExecutorResourceDescription.scala:
##
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.master
+
+import org.apache.spark.resource.ResourceRequirement
+
+/**
+ * Describe resource requests for different resource profiles. Used for executor schedule.
+ *
+ * @param coresPerExecutor cores for each executor.
+ * @param memoryMbPerExecutor memory for each executor.
+ * @param customResourcesPerExecutor custom resource requests for each executor.

Review Comment:
   nit: "resource requests" -> "resource requirements"
   
   (I think we also have `ExecutorResourceRequest`, so it's good to distinguish them carefully.)






[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster

2022-06-07 Thread GitBox


Ngone51 commented on code in PR #36716:
URL: https://github.com/apache/spark/pull/36716#discussion_r890949860


##
core/src/main/scala/org/apache/spark/deploy/master/ApplicationInfo.scala:
##
@@ -65,7 +66,70 @@ private[spark] class ApplicationInfo(
     appSource = new ApplicationSource(this)
     nextExecutorId = 0
     removedExecutors = new ArrayBuffer[ExecutorDesc]
-    executorLimit = desc.initialExecutorLimit.getOrElse(Integer.MAX_VALUE)
+    val initialExecutorLimit = desc.initialExecutorLimit.getOrElse(Integer.MAX_VALUE)
+
+    rpIdToResourceProfile = new mutable.HashMap[Int, ResourceProfile]()
+    rpIdToResourceProfile(DEFAULT_RESOURCE_PROFILE_ID) = desc.defaultProfile
+    rpIdToResourceDesc = new mutable.HashMap[Int, ExecutorResourceDescription]()
+    createResourceDescForResourceProfile(desc.defaultProfile)
+
+    targetNumExecutorsPerResourceProfileId = new mutable.HashMap[Int, Int]()
+    targetNumExecutorsPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) = initialExecutorLimit
+
+    executorsPerResourceProfileId = new mutable.HashMap[Int, mutable.Set[Int]]()
+  }
+
+  private[deploy] def getOrUpdateExecutorsForRPId(rpId: Int): mutable.Set[Int] = {
+    executorsPerResourceProfileId.getOrElseUpdate(rpId, mutable.HashSet[Int]())
+  }
+
+  private[deploy] def getTargetExecutorNumForRPId(rpId: Int): Int = {
+    targetNumExecutorsPerResourceProfileId.getOrElse(rpId, 0)
+  }
+
+  private[deploy] def getRequestedRPIds(): Seq[Int] = {
+    rpIdToResourceProfile.keys.toSeq.sorted
+  }
+
+  private def createResourceDescForResourceProfile(resourceProfile: ResourceProfile): Unit = {
+    if (!rpIdToResourceDesc.contains(resourceProfile.id)) {
+      val defaultMemoryMbPerExecutor = desc.memoryPerExecutorMB
+      val defaultCoresPerExecutor = desc.coresPerExecutor
+      val coresPerExecutor = resourceProfile.getExecutorCores
+        .orElse(defaultCoresPerExecutor)
+      val memoryMbPerExecutor = resourceProfile.getExecutorMemory
+        .map(_.toInt)
+        .getOrElse(defaultMemoryMbPerExecutor)
+      val customResources = ResourceUtils.executorResourceRequestToRequirement(
+        getCustomExecutorResources(resourceProfile).values.toSeq)
+
+      rpIdToResourceDesc(resourceProfile.id) =
+        ExecutorResourceDescription(coresPerExecutor, memoryMbPerExecutor, customResources)
+    }
+  }
+
+  // Get resources required for schedule.
+  private[deploy] def getResourceDescriptionForRpId(rpId: Int): ExecutorResourceDescription = {
+    rpIdToResourceDesc(rpId)
+  }
+
+  private[deploy] def requestExecutors(
+      resourceProfileToTotalExecs: Map[ResourceProfile, Int]): Unit = {
+    resourceProfileToTotalExecs.foreach { case (rp, num) =>
+      createResourceDescForResourceProfile(rp)
+
+      if (!rpIdToResourceProfile.contains(rp.id)) {
+        rpIdToResourceProfile(rp.id) = rp
+      }
+
+      if (!targetNumExecutorsPerResourceProfileId.get(rp.id).contains(num)) {
+        targetNumExecutorsPerResourceProfileId(rp.id) = num
+      }

Review Comment:
   How about:
   ```suggestion
 targetNumExecutorsPerResourceProfileId(rp.id) = num
   ```






[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster

2022-06-07 Thread GitBox


Ngone51 commented on code in PR #36716:
URL: https://github.com/apache/spark/pull/36716#discussion_r890938856


##
core/src/main/scala/org/apache/spark/deploy/ExecutorDescription.scala:
##
@@ -25,10 +25,13 @@ package org.apache.spark.deploy
 private[deploy] class ExecutorDescription(
     val appId: String,
     val execId: Int,
+    val rpId: Int,
     val cores: Int,
+    val memoryMb: Int,

Review Comment:
   > And in master, we can only reconstruct the resource profile information in ApplicationInfo after client send resource request RequestExecutors
   
   Does it mean we can't launch new executors with the specific `rpId` until the client sends the request with the corresponding resource profile? For example, if the number of executors with the specific `rpId` hasn't reached the target number, it seems like we can't schedule new executors for it until we know the resource profile from `RequestExecutors`, right?






[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster

2022-06-07 Thread GitBox


Ngone51 commented on code in PR #36716:
URL: https://github.com/apache/spark/pull/36716#discussion_r890912848


##
core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala:
##
@@ -19,23 +19,28 @@ package org.apache.spark.deploy
 
 import java.net.URI
 
-import org.apache.spark.resource.ResourceRequirement
+import org.apache.spark.resource.{ResourceProfile, ResourceRequirement, ResourceUtils}
+import org.apache.spark.resource.ResourceProfile.getCustomExecutorResources
 
 private[spark] case class ApplicationDescription(
     name: String,
     maxCores: Option[Int],
-    memoryPerExecutorMB: Int,
     command: Command,
     appUiUrl: String,
+    defaultProfile: ResourceProfile,
    eventLogDir: Option[URI] = None,
     // short name of compression codec used when writing event logs, if any (e.g. lzf)
     eventLogCodec: Option[String] = None,
-    coresPerExecutor: Option[Int] = None,
     // number of executors this application wants to start with,
     // only used if dynamic allocation is enabled
     initialExecutorLimit: Option[Int] = None,
-    user: String = System.getProperty("user.name", ""),
-    resourceReqsPerExecutor: Seq[ResourceRequirement] = Seq.empty) {
+    user: String = System.getProperty("user.name", "")) {
+
+  def memoryPerExecutorMB: Int = defaultProfile.getExecutorMemory.map(_.toInt).getOrElse(1024)
+  def coresPerExecutor: Option[Int] = defaultProfile.getExecutorCores
+  def resourceReqsPerExecutor: Seq[ResourceRequirement] =
+    ResourceUtils.executorResourceRequestToRequirement(
+      getCustomExecutorResources(defaultProfile).values.toSeq.sortBy(_.resourceName))

Review Comment:
   Is `sortBy(_.resourceName)` necessary? I don't see it being sorted in `ApplicationInfo`.






[GitHub] [spark] cloud-fan commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #36745:
URL: https://github.com/apache/spark/pull/36745#discussion_r890902366


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -2881,6 +2881,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DEFAULT_COLUMN_ALLOWED_PROVIDERS =
+    buildConf("spark.sql.defaultColumn.allowedProviders")
+      .internal()
+      .doc("List of table providers wherein SQL commands are permitted to assign DEFAULT column " +
+        "values. Comma-separated list, whitespace ignored, case-insensitive.")
+      .version("3.4.0")
+      .stringConf
+      .createWithDefault("csv,json,orc,parquet")

Review Comment:
   This is fine for now. But eventually, I think we need a data source API for a source to report whether it supports default values, instead of asking end-users to set this conf.
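   For reference, a minimal sketch of how this conf can be adjusted today (assuming a running `SparkSession` named `spark`; the provider list shown is illustrative, not a recommendation):
   ```scala
   // Per the doc string above, whitespace is ignored and matching is case-insensitive.
   spark.conf.set("spark.sql.defaultColumn.allowedProviders", "csv, JSON, parquet")
   ```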






[GitHub] [spark] cloud-fan closed pull request #36662: [SPARK-39286][DOC] Update documentation for the decode function

2022-06-07 Thread GitBox


cloud-fan closed pull request #36662: [SPARK-39286][DOC] Update documentation for the decode function
URL: https://github.com/apache/spark/pull/36662





[GitHub] [spark] cloud-fan commented on pull request #36662: [SPARK-39286][DOC] Update documentation for the decode function

2022-06-07 Thread GitBox


cloud-fan commented on PR #36662:
URL: https://github.com/apache/spark/pull/36662#issuecomment-1148337126

   thanks, merging to master/3.3!





[GitHub] [spark] Ngone51 commented on pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster

2022-06-07 Thread GitBox


Ngone51 commented on PR #36716:
URL: https://github.com/apache/spark/pull/36716#issuecomment-1148335486

   cc @tgravescs for review





[GitHub] [spark] LuciferYang closed pull request #36732: [SPARK-39345][CORE][SQL][DSTREAM][ML][MESOS][SS] Replace `filter(!condition)` with `filterNot(condition)`

2022-06-07 Thread GitBox


LuciferYang closed pull request #36732: [SPARK-39345][CORE][SQL][DSTREAM][ML][MESOS][SS] Replace `filter(!condition)` with `filterNot(condition)`
URL: https://github.com/apache/spark/pull/36732





[GitHub] [spark] ulysses-you opened a new pull request, #36785: [SPARK-39397][SQL] Relax AliasAwareOutputExpression to support alias with expression

2022-06-07 Thread GitBox


ulysses-you opened a new pull request, #36785:
URL: https://github.com/apache/spark/pull/36785

   
   
   ### What changes were proposed in this pull request?
   
   Change AliasAwareOutputExpression to use expressions rather than attributes to track what can be normalized, so that aliased expressions can also preserve the output partitioning and ordering.
   
   ### Why are the changes needed?
   
   We pull out complex join keys from grouping expressions, so a project can hold an alias of an expression. Unfortunately, we may lose the output partitioning, since the current AliasAwareOutputExpression only supports preserving aliases of attributes.
   
   For example, the following query will introduce three exchanges instead of two.
   ```SQL
   SELECT c1 + 1, count(*)
   FROM t1
   JOIN t2 ON c1 + 1 = c2
   GROUP BY c1 + 1
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, this only improves performance.
   
   ### How was this patch tested?
   
   add new test





[GitHub] [spark] HyukjinKwon commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'

2022-06-07 Thread GitBox


HyukjinKwon commented on PR #36784:
URL: https://github.com/apache/spark/pull/36784#issuecomment-1148308085

   cc @wangyum FYI





[GitHub] [spark] Resol1992 commented on pull request #30317: [SPARK-33409][SQL] And task killed check in BroadcastNestedLoopJoin to interrupt it after the job is killed

2022-06-07 Thread GitBox


Resol1992 commented on PR #30317:
URL: https://github.com/apache/spark/pull/30317#issuecomment-1148301891

   hi, @constzhou
   Recently, the same issue also occurred to me; could I talk with you about this issue?





[GitHub] [spark] cloud-fan commented on pull request #35612: [SPARK-38289][SQL] Refactor SQL CLI exit code to make it more clear

2022-06-07 Thread GitBox


cloud-fan commented on PR #35612:
URL: https://github.com/apache/spark/pull/35612#issuecomment-1148292841

   @AngersZh can you retrigger the tests?





[GitHub] [spark] cloud-fan commented on a diff in pull request #35612: [SPARK-38289][SQL] Refactor SQL CLI exit code to make it more clear

2022-06-07 Thread GitBox


cloud-fan commented on code in PR #35612:
URL: https://github.com/apache/spark/pull/35612#discussion_r890851508


##
sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:
##
@@ -259,7 +260,7 @@ static class HelpOptionExecutor implements ServerOptionsExecutor {
     @Override
     public void execute() {
       new HelpFormatter().printHelp(serverName, options);
-      System.exit(0);
+      System.exit(SparkExitCode.EXIT_SUCCESS());

Review Comment:
   ah it's java...






[GitHub] [spark] Borjianamin98 commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-07 Thread GitBox


Borjianamin98 commented on PR #36781:
URL: https://github.com/apache/spark/pull/36781#issuecomment-1148284070

   > @Borjianamin98 Could you please add a test?
   
   I agree. I added a test for this. This is my first experience participating 
in the Spark project and I hope I did well. :)





[GitHub] [spark] ArvinZheng commented on a diff in pull request #35484: [SPARK-38181][SS][DOCS] Update comments in KafkaDataConsumer.scala

2022-06-07 Thread GitBox


ArvinZheng commented on code in PR #35484:
URL: https://github.com/apache/spark/pull/35484#discussion_r890843318


##
connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/KafkaDataConsumer.scala:
##
@@ -298,9 +296,10 @@ private[kafka010] class KafkaDataConsumer(
       s"requested $offset")
 
     // The following loop is basically for `failOnDataLoss = false`. When `failOnDataLoss` is
-    // `false`, first, we will try to fetch the record at `offset`. If no such record exists, then
-    // we will move to the next available offset within `[offset, untilOffset)` and retry.
-    // If `failOnDataLoss` is `true`, the loop body will be executed only once.
+    // `false`, we will try to fetch the record at `offset`, if the record does not exist, we will
+    // try to fetch next available record within [offset, untilOffset).
+    // If `failOnDataLoss` is `true`, the loop body will be executed only once, either return the
+    // record at `offset` or throw an exception when the record does not exist

Review Comment:
   thanks, updated.
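   As a rough illustration of the loop behavior the updated comment describes (a simplified sketch only, not the actual `KafkaDataConsumer` code; `Record` and `fetchRecordAt` are hypothetical stand-ins):
   ```scala
   // Hypothetical record type and fetch helper standing in for the Kafka consumer.
   case class Record(offset: Long, value: String)

   // Pretend fetch: returns None when no record exists at exactly this offset.
   def fetchRecordAt(offset: Long): Option[Record] =
     if (offset % 2 == 0) Some(Record(offset, s"value-$offset")) else None

   def getRecord(offset: Long, untilOffset: Long, failOnDataLoss: Boolean): Record = {
     var current = offset
     while (current < untilOffset) {
       fetchRecordAt(current) match {
         case Some(record) =>
           return record                    // found a record, done
         case None if failOnDataLoss =>
           // Effectively a single iteration: the record at `offset` is missing, so fail fast.
           throw new IllegalStateException(s"Record at offset $offset does not exist")
         case None =>
           current += 1                     // failOnDataLoss = false: try the next offset
       }
     }
     throw new NoSuchElementException(s"No record available in [$offset, $untilOffset)")
   }
   ```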


