[GitHub] [spark] chenzhx commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions
chenzhx commented on code in PR #36663: URL: https://github.com/apache/spark/pull/36663#discussion_r890253947

## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala:

@@ -259,6 +259,55 @@ class V2ExpressionBuilder(
      } else {
        None
      }
+    case date: DateAdd =>
+      val childrenExpressions = date.children.flatMap(generateExpression(_))
+      if (childrenExpressions.length == date.children.length) {
+        Some(new GeneralScalarExpression("DATE_ADD", childrenExpressions.toArray[V2Expression]))
+      } else {
+        None
+      }
+    case date: DateDiff =>
+      val childrenExpressions = date.children.flatMap(generateExpression(_))
+      if (childrenExpressions.length == date.children.length) {
+        Some(new GeneralScalarExpression("DATE_DIFF", childrenExpressions.toArray[V2Expression]))
+      } else {
+        None
+      }
+    case date: TruncDate =>
+      val childrenExpressions = date.children.flatMap(generateExpression(_))
+      if (childrenExpressions.length == date.children.length) {
+        Some(new GeneralScalarExpression("TRUNC", childrenExpressions.toArray[V2Expression]))
+      } else {
+        None
+      }
+    case Second(child, _) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("SECOND", Array[V2Expression](v)))
+    case Minute(child, _) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("MINUTE", Array[V2Expression](v)))
+    case Hour(child, _) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("HOUR", Array[V2Expression](v)))
+    case Month(child) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("MONTH", Array[V2Expression](v)))
+    case Quarter(child) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("QUARTER", Array[V2Expression](v)))
+    case Year(child) => generateExpression(child)
+      .map(v => new GeneralScalarExpression("YEAR", Array[V2Expression](v)))
+    // The DAY_OF_WEEK function in Spark returns the day of the week for date/timestamp.
+    // Database dialects do not need to follow ISO semantics when handling DAY_OF_WEEK.

Review Comment: Yes. In ISO semantics, 1 represents Monday, but in Spark's DayOfWeek function, 1 represents Sunday.
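A minimal Scala sketch of the numbering difference discussed in the review comment (illustrative only, not code from the PR; `DayOfWeekSemantics` and `sparkToIso` are hypothetical names). A dialect that wants ISO semantics would have to apply this kind of shift to Spark's Sunday-based DAY_OF_WEEK result:

```scala
// Spark's DAY_OF_WEEK: 1 = Sunday .. 7 = Saturday.
// ISO-8601 day-of-week: 1 = Monday .. 7 = Sunday.
object DayOfWeekSemantics {
  // Convert a Spark (Sunday-based) day-of-week value to the ISO (Monday-based) value.
  def sparkToIso(sparkDow: Int): Int = ((sparkDow + 5) % 7) + 1

  def main(args: Array[String]): Unit = {
    assert(sparkToIso(1) == 7) // Sunday: Spark 1 -> ISO 7
    assert(sparkToIso(2) == 1) // Monday: Spark 2 -> ISO 1
    assert(sparkToIso(7) == 6) // Saturday: Spark 7 -> ISO 6
    println("day-of-week conversion checks passed")
  }
}
```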
[GitHub] [spark] wangyum commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'
wangyum commented on PR #36784: URL: https://github.com/apache/spark/pull/36784#issuecomment-1148459572

cc @yaooqinn @pan3793
[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster
Ngone51 commented on code in PR #36716: URL: https://github.com/apache/spark/pull/36716#discussion_r890982027

## core/src/main/scala/org/apache/spark/deploy/master/Master.scala:

@@ -725,26 +729,38 @@ private[deploy] class Master(
   */
  private def startExecutorsOnWorkers(): Unit = {
    // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
-   // in the queue, then the second app, etc.
+   // in the queue, then the second app, etc. And for each app, we will schedule based on
+   // resource profiles, also with a simple FIFO scheduler: resource profiles with smaller
+   // ids first.

Review Comment: I'd suggest scheduling in the order of the resource profile requests instead of the smaller id first. In the case where a resource profile is reused for a later RDD computation, the smaller id doesn't seem to have priority over the other resource profiles. WDYT?
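A toy Scala sketch of the two orderings being debated above (illustrative only; `ProfileRequest`, `byId`, and `byRequestOrder` are hypothetical names, not code from the PR):

```scala
// Two possible orders for deciding which resource profile's executors to schedule first.
case class ProfileRequest(rpId: Int, requestedAtNanos: Long)

object ProfileOrdering {
  // Ordering used in the diff: smaller resource profile id first.
  def byId(reqs: Seq[ProfileRequest]): Seq[ProfileRequest] = reqs.sortBy(_.rpId)

  // Ordering suggested in the review: first-come-first-served by request time.
  def byRequestOrder(reqs: Seq[ProfileRequest]): Seq[ProfileRequest] =
    reqs.sortBy(_.requestedAtNanos)

  def main(args: Array[String]): Unit = {
    // A profile with a larger id (2) that was requested earlier wins under request
    // ordering but loses under id ordering.
    val reqs = Seq(ProfileRequest(rpId = 2, requestedAtNanos = 100L),
      ProfileRequest(rpId = 1, requestedAtNanos = 200L))
    println(byId(reqs).map(_.rpId))           // List(1, 2)
    println(byRequestOrder(reqs).map(_.rpId)) // List(2, 1)
  }
}
```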
[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster
Ngone51 commented on code in PR #36716: URL: https://github.com/apache/spark/pull/36716#discussion_r890973724

## core/src/main/scala/org/apache/spark/deploy/master/ExecutorResourceDescription.scala:

@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.master
+
+import org.apache.spark.resource.ResourceRequirement
+
+/**
+ * Describes resource requests for different resource profiles. Used for executor scheduling.
+ *
+ * @param coresPerExecutor cores for each executor.
+ * @param memoryMbPerExecutor memory for each executor.
+ * @param customResourcesPerExecutor custom resource requests for each executor.

Review Comment: nit: "resource requests" -> "resource requirements" (I think we also have `ExecutorResourceRequest`, so it's good to distinguish them carefully.)
[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster
Ngone51 commented on code in PR #36716: URL: https://github.com/apache/spark/pull/36716#discussion_r890949860

## core/src/main/scala/org/apache/spark/deploy/master/ApplicationInfo.scala:

@@ -65,7 +66,70 @@ private[spark] class ApplicationInfo(
    appSource = new ApplicationSource(this)
    nextExecutorId = 0
    removedExecutors = new ArrayBuffer[ExecutorDesc]
-   executorLimit = desc.initialExecutorLimit.getOrElse(Integer.MAX_VALUE)
+   val initialExecutorLimit = desc.initialExecutorLimit.getOrElse(Integer.MAX_VALUE)
+
+   rpIdToResourceProfile = new mutable.HashMap[Int, ResourceProfile]()
+   rpIdToResourceProfile(DEFAULT_RESOURCE_PROFILE_ID) = desc.defaultProfile
+   rpIdToResourceDesc = new mutable.HashMap[Int, ExecutorResourceDescription]()
+   createResourceDescForResourceProfile(desc.defaultProfile)
+
+   targetNumExecutorsPerResourceProfileId = new mutable.HashMap[Int, Int]()
+   targetNumExecutorsPerResourceProfileId(DEFAULT_RESOURCE_PROFILE_ID) = initialExecutorLimit
+
+   executorsPerResourceProfileId = new mutable.HashMap[Int, mutable.Set[Int]]()
+ }
+
+ private[deploy] def getOrUpdateExecutorsForRPId(rpId: Int): mutable.Set[Int] = {
+   executorsPerResourceProfileId.getOrElseUpdate(rpId, mutable.HashSet[Int]())
+ }
+
+ private[deploy] def getTargetExecutorNumForRPId(rpId: Int): Int = {
+   targetNumExecutorsPerResourceProfileId.getOrElse(rpId, 0)
+ }
+
+ private[deploy] def getRequestedRPIds(): Seq[Int] = {
+   rpIdToResourceProfile.keys.toSeq.sorted
+ }
+
+ private def createResourceDescForResourceProfile(resourceProfile: ResourceProfile): Unit = {
+   if (!rpIdToResourceDesc.contains(resourceProfile.id)) {
+     val defaultMemoryMbPerExecutor = desc.memoryPerExecutorMB
+     val defaultCoresPerExecutor = desc.coresPerExecutor
+     val coresPerExecutor = resourceProfile.getExecutorCores
+       .orElse(defaultCoresPerExecutor)
+     val memoryMbPerExecutor = resourceProfile.getExecutorMemory
+       .map(_.toInt)
+       .getOrElse(defaultMemoryMbPerExecutor)
+     val customResources = ResourceUtils.executorResourceRequestToRequirement(
+       getCustomExecutorResources(resourceProfile).values.toSeq)
+
+     rpIdToResourceDesc(resourceProfile.id) =
+       ExecutorResourceDescription(coresPerExecutor, memoryMbPerExecutor, customResources)
+   }
+ }
+
+ // Get resources required for schedule.
+ private[deploy] def getResourceDescriptionForRpId(rpId: Int): ExecutorResourceDescription = {
+   rpIdToResourceDesc(rpId)
+ }
+
+ private[deploy] def requestExecutors(
+     resourceProfileToTotalExecs: Map[ResourceProfile, Int]): Unit = {
+   resourceProfileToTotalExecs.foreach { case (rp, num) =>
+     createResourceDescForResourceProfile(rp)
+
+     if (!rpIdToResourceProfile.contains(rp.id)) {
+       rpIdToResourceProfile(rp.id) = rp
+     }
+
+     if (!targetNumExecutorsPerResourceProfileId.get(rp.id).contains(num)) {
+       targetNumExecutorsPerResourceProfileId(rp.id) = num
+     }

Review Comment: How about:

```suggestion
      targetNumExecutorsPerResourceProfileId(rp.id) = num
```
[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster
Ngone51 commented on code in PR #36716: URL: https://github.com/apache/spark/pull/36716#discussion_r890938856

## core/src/main/scala/org/apache/spark/deploy/ExecutorDescription.scala:

@@ -25,10 +25,13 @@ package org.apache.spark.deploy
 private[deploy] class ExecutorDescription(
     val appId: String,
     val execId: Int,
+    val rpId: Int,
     val cores: Int,
+    val memoryMb: Int,

Review Comment:

> And in master, we can only reconstruct the resource profile information in ApplicationInfo after the client sends the resource request RequestExecutors

Does it mean we can't launch new executors with the specific `rpId` until the client sends the request with the corresponding resource profile? For example, if the number of executors with the specific `rpId` hasn't reached the target number, it seems like we can't schedule new executors for it until we know the resource profile from `RequestExecutors`, right?
[GitHub] [spark] Ngone51 commented on a diff in pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster
Ngone51 commented on code in PR #36716: URL: https://github.com/apache/spark/pull/36716#discussion_r890912848

## core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala:

@@ -19,23 +19,28 @@ package org.apache.spark.deploy

 import java.net.URI

-import org.apache.spark.resource.ResourceRequirement
+import org.apache.spark.resource.{ResourceProfile, ResourceRequirement, ResourceUtils}
+import org.apache.spark.resource.ResourceProfile.getCustomExecutorResources

 private[spark] case class ApplicationDescription(
     name: String,
     maxCores: Option[Int],
-    memoryPerExecutorMB: Int,
     command: Command,
     appUiUrl: String,
+    defaultProfile: ResourceProfile,
     eventLogDir: Option[URI] = None,
     // short name of compression codec used when writing event logs, if any (e.g. lzf)
     eventLogCodec: Option[String] = None,
-    coresPerExecutor: Option[Int] = None,
     // number of executors this application wants to start with,
     // only used if dynamic allocation is enabled
     initialExecutorLimit: Option[Int] = None,
-    user: String = System.getProperty("user.name", ""),
-    resourceReqsPerExecutor: Seq[ResourceRequirement] = Seq.empty) {
+    user: String = System.getProperty("user.name", "")) {
+
+  def memoryPerExecutorMB: Int = defaultProfile.getExecutorMemory.map(_.toInt).getOrElse(1024)
+  def coresPerExecutor: Option[Int] = defaultProfile.getExecutorCores
+  def resourceReqsPerExecutor: Seq[ResourceRequirement] =
+    ResourceUtils.executorResourceRequestToRequirement(
+      getCustomExecutorResources(defaultProfile).values.toSeq.sortBy(_.resourceName))

Review Comment: Is `sortBy(_.resourceName)` necessary? I don't see it being sorted in `ApplicationInfo`.
[GitHub] [spark] cloud-fan commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types
cloud-fan commented on code in PR #36745: URL: https://github.com/apache/spark/pull/36745#discussion_r890902366

## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:

@@ -2881,6 +2881,15 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)

+  val DEFAULT_COLUMN_ALLOWED_PROVIDERS =
+    buildConf("spark.sql.defaultColumn.allowedProviders")
+      .internal()
+      .doc("List of table providers wherein SQL commands are permitted to assign DEFAULT column " +
+        "values. Comma-separated list, whitespace ignored, case-insensitive.")
+      .version("3.4.0")
+      .stringConf
+      .createWithDefault("csv,json,orc,parquet")

Review Comment: This is fine for now. But eventually, I think we need a data source API for a source to report whether it supports default values, instead of asking end users to set this conf.
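A hedged usage sketch of the conf introduced above (it assumes a Spark build where DEFAULT column support and this conf exist as described in the diff; the table names and the failing example are illustrative, not taken from the PR's tests):

```scala
import org.apache.spark.sql.SparkSession

object DefaultColumnProviders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Only providers named in the allowlist may declare DEFAULT column values.
    spark.conf.set("spark.sql.defaultColumn.allowedProviders", "csv,json,orc,parquet")

    // Allowed: parquet is in the allowlist, so the DEFAULT clause is accepted.
    spark.sql("CREATE TABLE t_ok (id INT, flag INT DEFAULT 42) USING parquet")

    // A provider outside the allowlist would be rejected at analysis time, e.g. (hypothetical):
    // spark.sql("CREATE TABLE t_bad (id INT, flag INT DEFAULT 42) USING some_other_source")

    spark.stop()
  }
}
```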
[GitHub] [spark] cloud-fan closed pull request #36662: [SPARK-39286][DOC] Update documentation for the decode function
cloud-fan closed pull request #36662: [SPARK-39286][DOC] Update documentation for the decode function URL: https://github.com/apache/spark/pull/36662
[GitHub] [spark] cloud-fan commented on pull request #36662: [SPARK-39286][DOC] Update documentation for the decode function
cloud-fan commented on PR #36662: URL: https://github.com/apache/spark/pull/36662#issuecomment-1148337126

thanks, merging to master/3.3!
[GitHub] [spark] Ngone51 commented on pull request #36716: [SPARK-39062][CORE] Add stage level resource scheduling support for standalone cluster
Ngone51 commented on PR #36716: URL: https://github.com/apache/spark/pull/36716#issuecomment-1148335486

cc @tgravescs for review
[GitHub] [spark] LuciferYang closed pull request #36732: [SPARK-39345][CORE][SQL][DSTREAM][ML][MESOS][SS] Replace `filter(!condition)` with `filterNot(condition)`
LuciferYang closed pull request #36732: [SPARK-39345][CORE][SQL][DSTREAM][ML][MESOS][SS] Replace `filter(!condition)` with `filterNot(condition)` URL: https://github.com/apache/spark/pull/36732
[GitHub] [spark] ulysses-you opened a new pull request, #36785: [SPARK-39397][SQL] Relax AliasAwareOutputExpression to support alias with expression
ulysses-you opened a new pull request, #36785: URL: https://github.com/apache/spark/pull/36785

### What changes were proposed in this pull request?

Change AliasAwareOutputExpression to use expressions rather than attributes to track whether we can normalize, so that an aliased expression can also preserve the output partitioning and ordering.

### Why are the changes needed?

We pull out complex join keys from grouping expressions, so the project can hold an alias of an expression. Unfortunately, we may then lose the output partitioning, since the current AliasAwareOutputExpression only supports preserving aliases of attributes. For example, the following query introduces three exchanges instead of two:

```SQL
SELECT c1 + 1, count(*)
FROM t1 JOIN t2 ON c1 + 1 = c2
GROUP BY c1 + 1
```

### Does this PR introduce _any_ user-facing change?

No, this is a performance improvement.

### How was this patch tested?

Added a new test.
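A sketch of how one might observe the extra shuffle described above (illustrative only; the table names t1/t2 and columns c1/c2 follow the SQL in the PR description, and broadcast joins are disabled so that the join actually shuffles):

```scala
import org.apache.spark.sql.SparkSession

object AliasPartitioningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Force a shuffle join; otherwise the tiny tables would be broadcast.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    Seq(1, 2, 3).toDF("c1").createOrReplaceTempView("t1")
    Seq(2, 3, 4).toDF("c2").createOrReplaceTempView("t2")

    val df = spark.sql(
      "SELECT c1 + 1, count(*) FROM t1 JOIN t2 ON c1 + 1 = c2 GROUP BY c1 + 1")

    // Inspect the physical plan and count the Exchange operators: without the change
    // described above there is an extra shuffle before the aggregate, because the alias
    // of `c1 + 1` does not preserve the join's output partitioning.
    df.explain()

    spark.stop()
  }
}
```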
[GitHub] [spark] HyukjinKwon commented on pull request #36784: [SPARK-39396][SQL] Fix LDAP login exception 'error code 49 - invalid credentials'
HyukjinKwon commented on PR #36784: URL: https://github.com/apache/spark/pull/36784#issuecomment-1148308085

cc @wangyum FYI
[GitHub] [spark] Resol1992 commented on pull request #30317: [SPARK-33409][SQL] And task killed check in BroadcastNestedLoopJoin to interrupt it after the job is killed
Resol1992 commented on PR #30317: URL: https://github.com/apache/spark/pull/30317#issuecomment-1148301891

Hi @constzhou, the same issue has recently also occurred for me. Could I talk with you about it?
[GitHub] [spark] cloud-fan commented on pull request #35612: [SPARK-38289][SQL] Refactor SQL CLI exit code to make it more clear
cloud-fan commented on PR #35612: URL: https://github.com/apache/spark/pull/35612#issuecomment-1148292841

@AngersZh can you retrigger the tests?
[GitHub] [spark] cloud-fan commented on a diff in pull request #35612: [SPARK-38289][SQL] Refactor SQL CLI exit code to make it more clear
cloud-fan commented on code in PR #35612: URL: https://github.com/apache/spark/pull/35612#discussion_r890851508

## sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java:

@@ -259,7 +260,7 @@ static class HelpOptionExecutor implements ServerOptionsExecutor {
     @Override
     public void execute() {
       new HelpFormatter().printHelp(serverName, options);
-      System.exit(0);
+      System.exit(SparkExitCode.EXIT_SUCCESS());

Review Comment: ah, it's Java...
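Background for the "ah, it's Java" remark: constants defined as vals on a Scala object are exposed to Java callers as accessor methods (via static forwarders), which is why the Java file has to write `SparkExitCode.EXIT_SUCCESS()` with parentheses. A simplified Scala sketch of the shape of such an object (the names below are illustrative stand-ins, not the real SparkExitCode in Spark core):

```scala
// A Scala object holding exit-code constants.
object SparkExitCodeSketch {
  val EXIT_SUCCESS: Int = 0
  val EXIT_FAILURE: Int = 1
}

// From Scala:  SparkExitCodeSketch.EXIT_SUCCESS
// From Java:   SparkExitCodeSketch.EXIT_SUCCESS()  (the val compiles to a getter method)
```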
[GitHub] [spark] Borjianamin98 commented on pull request #36781: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types
Borjianamin98 commented on PR #36781: URL: https://github.com/apache/spark/pull/36781#issuecomment-1148284070

> @Borjianamin98 Could you please add a test?

I agree. I added a test for this. This is my first experience participating in the Spark project and I hope I did well. :)
[GitHub] [spark] ArvinZheng commented on a diff in pull request #35484: [SPARK-38181][SS][DOCS] Update comments in KafkaDataConsumer.scala
ArvinZheng commented on code in PR #35484: URL: https://github.com/apache/spark/pull/35484#discussion_r890843318

## connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/KafkaDataConsumer.scala:

@@ -298,9 +296,10 @@ private[kafka010] class KafkaDataConsumer(
       s"requested $offset")

     // The following loop is basically for `failOnDataLoss = false`. When `failOnDataLoss` is
-    // `false`, first, we will try to fetch the record at `offset`. If no such record exists, then
-    // we will move to the next available offset within `[offset, untilOffset)` and retry.
-    // If `failOnDataLoss` is `true`, the loop body will be executed only once.
+    // `false`, we will try to fetch the record at `offset`; if the record does not exist, we will
+    // try to fetch the next available record within [offset, untilOffset).
+    // If `failOnDataLoss` is `true`, the loop body will be executed only once: either return the
+    // record at `offset` or throw an exception when the record does not exist.

Review Comment: thanks, updated.
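A paraphrase of the control flow that the updated comment describes, as an illustrative Scala sketch (this is not the actual KafkaDataConsumer implementation; `Record`, `fetchRecordAt`, and `nextAvailableOffset` are stand-in stubs so the sketch compiles):

```scala
object FailOnDataLossSketch {
  final case class Record(offset: Long, value: String)

  // Stubs standing in for the real Kafka fetch logic.
  def fetchRecordAt(offset: Long): Option[Record] = None
  def nextAvailableOffset(offset: Long): Long = offset + 1

  def getRecord(offset: Long, untilOffset: Long, failOnDataLoss: Boolean): Option[Record] = {
    var toFetch = offset
    while (toFetch < untilOffset) {
      fetchRecordAt(toFetch) match {
        case Some(record) =>
          return Some(record)              // record at `toFetch` exists: return it
        case None if failOnDataLoss =>
          // failOnDataLoss = true: no retry, surface the data loss immediately.
          throw new IllegalStateException(
            s"Cannot fetch record at offset $toFetch; some data may have been lost")
        case None =>
          // failOnDataLoss = false: move on to the next available offset in [offset, untilOffset).
          toFetch = nextAvailableOffset(toFetch)
      }
    }
    None
  }
}
```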