[GitHub] [spark] TJX2014 opened a new pull request #28291: [SPARK-31400][ML,MLlib]The catalogString doesn't distinguish Vectors in ml and mllib
TJX2014 opened a new pull request #28291: URL: https://github.com/apache/spark/pull/28291 ### What changes were proposed in this pull request? add package check in org.apache.spark.ml.util.SchemaUtils#checkColumnType and add unit test in org.scalatest.FunSuiteLike#test ### Why are the changes needed? the catalogString doesn't distinguish Vectors in ml and mllib when mllib vector misused in ml [https://issues.apache.org/jira/browse/SPARK-31400](url) ### Does this PR introduce any user-facing change? No ### How was this patch tested? Unit test is added including both negative and positive cases. `test("Valid class of dataType") { val schema = StructType(Array[StructField] { StructField("features", new org.apache.spark.mllib.linalg.VectorUDT) }) val e = intercept[IllegalArgumentException] { SchemaUtils.checkColumnType(schema, "features", new VectorUDT) } assert(e.getMessage.contains("must belong to ml package"), "package is not valid to ml") val normalSchema = StructType(Array[StructField] { StructField("features", new VectorUDT) }) SchemaUtils.checkColumnType(normalSchema, "features", new VectorUDT) }` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on issue #28272: [SPARK-31489][SPARK-31488][SQL] Translate date values of pushed down filters to `java.sql.Date`
MaxGekk commented on issue #28272: URL: https://github.com/apache/spark/pull/28272#issuecomment-617564224 @cloud-fan @HyukjinKwon @dongjoon-hyun Please, take a look at the PR if you have time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
SparkQA commented on issue #28290: URL: https://github.com/apache/spark/pull/28290#issuecomment-617560789 **[Test build #121607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121607/testReport)** for PR 28290 at commit [`e5e69e2`](https://github.com/apache/spark/commit/e5e69e238c5fd8c74ae6541f024be7e09466ae5f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
SparkQA removed a comment on issue #28290: URL: https://github.com/apache/spark/pull/28290#issuecomment-617557202 **[Test build #121607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121607/testReport)** for PR 28290 at commit [`e5e69e2`](https://github.com/apache/spark/commit/e5e69e238c5fd8c74ae6541f024be7e09466ae5f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
AmplabJenkins commented on issue #28290: URL: https://github.com/apache/spark/pull/28290#issuecomment-617560892 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
AmplabJenkins removed a comment on issue #28290: URL: https://github.com/apache/spark/pull/28290#issuecomment-617560892 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
huaxingao commented on a change in pull request #28290: URL: https://github.com/apache/spark/pull/28290#discussion_r412681949 ## File path: docs/sql-getting-started.md ## @@ -347,7 +347,8 @@ For example: ## Scalar Functions -(to be filled soon) + +Scalar functions are functions that return a single value per row, as opposed to aggregation functions, which return a value for a group of rows. Spark SQL supports a variety of built-in scalar functions such as [Array Functions](sql-ref-functions-builtin.html#array-functions), [Map Functions](sql-ref-functions-builtin.html#map-functions), [Date and Timestamp Functions](sql-ref-functions-builtin.html#date-and-timestamp-functions), etc. It also supports [User Defined Scalar Functions](sql-ref-functions-udf-scalar.html). Review comment: I used Scalar function instead of built-in function (because the next paragraph is for aggregations). I don't want to talk about window function because I think window function is still sort of aggregation, even thought it returns result for each row. I can't link it to builtin function because builtin function contains aggregation function, so I link to each group of the builtin scalar functions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
maropu commented on a change in pull request #28290: URL: https://github.com/apache/spark/pull/28290#discussion_r412678792 ## File path: docs/sql-getting-started.md ## @@ -347,7 +347,8 @@ For example: ## Scalar Functions -(to be filled soon) + +Scalar functions are functions that return a single value per row, as opposed to aggregation functions, which return a value for a group of rows. Spark SQL supports a variety of built-in scalar functions such as [Array Functions](sql-ref-functions-builtin.html#array-functions), [Map Functions](sql-ref-functions-builtin.html#map-functions), [Date and Timestamp Functions](sql-ref-functions-builtin.html#date-and-timestamp-functions), etc. It also supports [User Defined Scalar Functions](sql-ref-functions-udf-scalar.html). Review comment: Any reason not to describe aggregate/window functions? Also, I think its better not to describe each function group name here because we might forget to update this page every time we add a new group. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
AmplabJenkins removed a comment on issue #28290: URL: https://github.com/apache/spark/pull/28290#issuecomment-617555485 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
SparkQA commented on issue #28290: URL: https://github.com/apache/spark/pull/28290#issuecomment-617557202 **[Test build #121607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121607/testReport)** for PR 28290 at commit [`e5e69e2`](https://github.com/apache/spark/commit/e5e69e238c5fd8c74ae6541f024be7e09466ae5f). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
huaxingao commented on issue #28290: URL: https://github.com/apache/spark/pull/28290#issuecomment-617556913 @HyukjinKwon @maropu I quickly wrote a paragraph for scalar function. It has to be scalar function because the following paragraph is for aggregation. The definition for scalar function is a bit confusing. Snowflake (https://docs.snowflake.com/en/sql-reference/functions.html) and Oracle (https://docs.oracle.com/cd/E57185_01/IRWUG/ch12s04s01.html) define scalar function as functions opposite to aggregation function, but seems IBM (https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/apsg/src/tpc/db2z_sqlscalarfn.html) uses scalar function for scalar UDF. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
AmplabJenkins commented on issue #28290: URL: https://github.com/apache/spark/pull/28290#issuecomment-617555485 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao opened a new pull request #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started
huaxingao opened a new pull request #28290: URL: https://github.com/apache/spark/pull/28290 ### What changes were proposed in this pull request? Add a paragraph for scalar function in sql getting started ### Why are the changes needed? To make 3.0 doc complete. ### Does this PR introduce any user-facing change? before: https://user-images.githubusercontent.com/13592258/79943182-16d1fd00-841d-11ea-9744-9cdd58d83f81.png;> after: https://user-images.githubusercontent.com/13592258/79943178-133e7600-841d-11ea-91dc-ff66485b679a.png;> ### How was this patch tested? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename
AmplabJenkins removed a comment on issue #28289: URL: https://github.com/apache/spark/pull/28289#issuecomment-617544598 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename
AmplabJenkins commented on issue #28289: URL: https://github.com/apache/spark/pull/28289#issuecomment-617544598 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412663151 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: thanks @dongjoon-hyun. PR description updated This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename
SparkQA commented on issue #28289: URL: https://github.com/apache/spark/pull/28289#issuecomment-617544275 **[Test build #121606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121606/testReport)** for PR 28289 at commit [`59e99f6`](https://github.com/apache/spark/commit/59e99f6d7aad7c464c68fbb6e568f9347f6774ac). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on issue #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename
viirya commented on issue #28289: URL: https://github.com/apache/spark/pull/28289#issuecomment-617543743 cc @cloud-fan @xuanyuanking @dongjoon-hyun This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya opened a new pull request #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename
viirya opened a new pull request #28289: URL: https://github.com/apache/spark/pull/28289 ### What changes were proposed in this pull request? As suggested by https://github.com/apache/spark/pull/28255#discussion_r412619438, this patch proposes to use taskAttemptId in checkpoint filename, instead of stageAttemptNumber + attemptNumber. ### Why are the changes needed? To simplify checkpoint simplified and unique. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
HyukjinKwon commented on a change in pull request #28085: URL: https://github.com/apache/spark/pull/28085#discussion_r412644686 ## File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ## @@ -1135,6 +1136,27 @@ private[spark] class DAGScheduler( } } + /** + * `PythonRunner` needs to know what the pyspark memory and cores settings are for the profile + * being run. Pass them in the local properties of the task if it's set for the stage profile. + */ + private def addPysparkConfigsToProperties(stage: Stage, properties: Properties): Unit = { Review comment: sorry, nit:`addPysparkConfigsToProperties` -> `addPysparkConfigsToProperties` ## File path: python/pyspark/resource/executorrequests.py ## @@ -0,0 +1,169 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +from pyspark.resource.taskrequests import TaskResourceRequest +from pyspark.util import _parse_memory + + +class ExecutorResourceRequest(object): +""" +.. note:: Evolving + +An Executor resource request. This is used in conjunction with the ResourceProfile to +programmatically specify the resources needed for an RDD that will be applied at the +stage level. + +This is used to specify what the resource requirements are for an Executor and how +Spark can find out specific details about those resources. Not all the parameters are +required for every resource type. Resources like GPUs are supported and have same limitations +as using the global spark configs spark.executor.resource.gpu.*. The amount, discoveryScript, +and vendor parameters for resources are all the same parameters a user would specify through the +configs: spark.executor.resource.{resourceName}.{amount, discoveryScript, vendor}. + +For instance, a user wants to allocate an Executor with GPU resources on YARN. The user has +to specify the resource name (gpu), the amount or number of GPUs per Executor, +the discovery script would be specified so that when the Executor starts up it can +discovery what GPU addresses are available for it to use because YARN doesn't tell +Spark that, then vendor would not be used because its specific for Kubernetes. + +See the configuration and cluster specific docs for more details. + +Use `pyspark.ExecutorResourceRequests` class as a convenience API. + +:param resourceName: Name of the resource +:param amount: Amount requesting +:param discoveryScript: Optional script used to discover the resources. This is required on some +cluster managers that don't tell Spark the addresses of the resources +allocated. The script runs on Executors startup to discover the addresses +of the resources available. +:param vendor: Vendor, required for some cluster managers + +.. versionadded:: 3.1.0 +""" +def __init__(self, resourceName, amount, discoveryScript="", vendor=""): +self._name = resourceName +self._amount = amount +self._discoveryScript = discoveryScript Review comment: nit `_discoveryScript` -> `_discovery_script` ## File path: python/pyspark/rdd.py ## @@ -2587,6 +2616,7 @@ def pipeline_func(split, iterator): self._prev_jrdd = prev._prev_jrdd # maintain the pipeline self._prev_jrdd_deserializer = prev._prev_jrdd_deserializer self.is_cached = False +self.has_resourceProfile = False Review comment: nit `has_resourceProfile` -> `has_resource_profile`. Basically the logic is that we should keep everything `a_b_c` when it's not an API inspired from Java (or Scala API). IIRC, this logic complies pep8. ## File path: python/pyspark/resource/executorrequests.py ## @@ -0,0 +1,169 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at
[GitHub] [spark] viirya commented on a change in pull request #28255: [SPARK-31484][Core] Add stage attempt number to temp checkpoint filename to avoid file already existing exception
viirya commented on a change in pull request #28255: URL: https://github.com/apache/spark/pull/28255#discussion_r412659291 ## File path: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala ## @@ -199,8 +199,8 @@ private[spark] object ReliableCheckpointRDD extends Logging { val finalOutputName = ReliableCheckpointRDD.checkpointFileName(ctx.partitionId()) val finalOutputPath = new Path(outputDir, finalOutputName) -val tempOutputPath = - new Path(outputDir, s".$finalOutputName-attempt-${ctx.attemptNumber()}") +val tempOutputPath = new Path(outputDir, + s".$finalOutputName-attempt-${ctx.stageAttemptNumber()}-${ctx.attemptNumber()}") Review comment: OK. Let me create a follow-up for it. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
SparkQA commented on issue #28288: URL: https://github.com/apache/spark/pull/28288#issuecomment-617540272 **[Test build #121605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121605/testReport)** for PR 28288 at commit [`4961eb3`](https://github.com/apache/spark/commit/4961eb39e282d6b98ca00cc7e4aa49380948ed73). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
AmplabJenkins removed a comment on issue #28288: URL: https://github.com/apache/spark/pull/28288#issuecomment-617538573 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl
AmplabJenkins removed a comment on issue #28260: URL: https://github.com/apache/spark/pull/28260#issuecomment-617538904 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121601/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl
AmplabJenkins removed a comment on issue #28260: URL: https://github.com/apache/spark/pull/28260#issuecomment-617538900 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl
AmplabJenkins commented on issue #28260: URL: https://github.com/apache/spark/pull/28260#issuecomment-617538904 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
AmplabJenkins commented on issue #28288: URL: https://github.com/apache/spark/pull/28288#issuecomment-617538573 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl
SparkQA removed a comment on issue #28260: URL: https://github.com/apache/spark/pull/28260#issuecomment-617500691 **[Test build #121601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121601/testReport)** for PR 28260 at commit [`3b98b86`](https://github.com/apache/spark/commit/3b98b86a500c611ea0014b0dafa7a99ce70064c9). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl
SparkQA commented on issue #28260: URL: https://github.com/apache/spark/pull/28260#issuecomment-617538600 **[Test build #121601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121601/testReport)** for PR 28260 at commit [`3b98b86`](https://github.com/apache/spark/commit/3b98b86a500c611ea0014b0dafa7a99ce70064c9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] xuanyuanking opened a new pull request #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone
xuanyuanking opened a new pull request #28288: URL: https://github.com/apache/spark/pull/28288 ### What changes were proposed in this pull request? Override the canonicalized fields with respect to the result of `needsTimeZone`. ### Why are the changes needed? The current approach breaks sematic equal of two cast expressions that don't relate with datetime type. If we don't need to use `timeZone` information casting `from` type to `to` type, then the timeZoneId should not influence the canonicalize result. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? New UT added. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
dongjoon-hyun commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412650911 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Got it. Thanks. I understand that Snowflake is considered important. Could you add some of the above comment(https://github.com/apache/spark/pull/28284#discussion_r412621357) to the PR description then? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412648326 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Specifically, snowflake use dayofweek_iso except db2. I couldn't find more because most platforms do not have two day-of-week system implemented. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412648326 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Specifically, snowflake use dayofweek_iso except db2. I couldn't find more because most platforms do not have two week-numbering system implemented. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] mridulm commented on a change in pull request #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched
mridulm commented on a change in pull request #28257: URL: https://github.com/apache/spark/pull/28257#discussion_r412639758 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -401,9 +401,7 @@ private[spark] class TaskSchedulerImpl( // addresses are the same as that we allocated in taskResourceAssignments since it's // synchronized. We don't remove the exact addresses allocated because the current // approach produces the identical result with less time complexity. -availableResources(i).getOrElse(rName, - throw new SparkException(s"Try to acquire resource $rName that doesn't exist.")) Review comment: Agree, this can be removed and simplified as below. ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -468,8 +466,9 @@ private[spark] class TaskSchedulerImpl( resourceProfileIds: Array[Int], availableCpus: Array[Int], availableResources: Array[Map[String, Buffer[String]]], - rpId: Int): Int = { -val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(rpId) + taskSet: TaskSetManager): Int = { +val resourceProfile = sc.resourceProfileManager.resourceProfileFromId( + taskSet.taskSet.resourceProfileId) val offersForResourceProfile = resourceProfileIds.zipWithIndex.filter { case (id, _) => Review comment: Any particular reason to change this ? ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -675,11 +676,15 @@ private[spark] class TaskSchedulerImpl( // Check whether the barrier tasks are partially launched. // TODO SPARK-24818 handle the assert failure case (that can happen when some locality // requirements are not fulfilled, and we should revert the launched tasks). - require(addressesWithDescs.size == taskSet.numTasks, -s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " + + if (addressesWithDescs.size != taskSet.numTasks) { +val errorMsg = + s"Fail current round of resource offers for barrier stage ${taskSet.stageId} " + s"because only ${addressesWithDescs.size} out of a total number of " + s"${taskSet.numTasks} tasks got resource offers. The resource offers may have " + - "been blacklisted or cannot fulfill task locality requirements.") + "been blacklisted or cannot fulfill task locality requirements." +logError(errorMsg) Review comment: nit: logWarning ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -675,11 +676,15 @@ private[spark] class TaskSchedulerImpl( // Check whether the barrier tasks are partially launched. // TODO SPARK-24818 handle the assert failure case (that can happen when some locality // requirements are not fulfilled, and we should revert the launched tasks). - require(addressesWithDescs.size == taskSet.numTasks, -s"Skip current round of resource offers for barrier stage ${taskSet.stageId} " + Review comment: This is the primary issue causing the bug - and is overly strict. Unfortunately, the way barrier scheduling has been written, the only way to get it working currently is to disable delay scheduling entirely. We will need to relook how to get barrier scheduling working more gracefully with task locality and executor failures (and in future, dynamic allocation). ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -484,9 +483,11 @@ private[spark] class TaskSchedulerImpl( numTasksPerExecCores } else { val taskLimit = resourceProfile.taskResources.get(limitingResource).map(_.amount) - .getOrElse(throw new SparkException("limitingResource returns from ResourceProfile" + -s" $resourceProfile doesn't actually contain that task resource!") - ) + .getOrElse { +taskSet.abort("limitingResource returns from ResourceProfile " + + s"$resourceProfile doesn't actually contain that task resource!") +return -1 Review comment: We still need to throw the exception after the abort - else subsequent code will continue. This is a repeated pattern in this PR - when replacing an exception/require/etc (which would have thrown an exception earlier) with a task set abort, please ensure that the exception continues to be thrown. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries
[GitHub] [spark] AmplabJenkins removed a comment on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
AmplabJenkins removed a comment on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617529054 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
AmplabJenkins removed a comment on issue #28085: URL: https://github.com/apache/spark/pull/28085#issuecomment-617529211 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
AmplabJenkins commented on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617529054 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
AmplabJenkins commented on issue #28085: URL: https://github.com/apache/spark/pull/28085#issuecomment-617529211 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
SparkQA commented on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617528964 **[Test build #121603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121603/testReport)** for PR 28285 at commit [`a1a81f9`](https://github.com/apache/spark/commit/a1a81f9f288953bb18a2bc19ce458663166453a2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
SparkQA commented on issue #28085: URL: https://github.com/apache/spark/pull/28085#issuecomment-617528793 **[Test build #121604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121604/testReport)** for PR 28085 at commit [`528094c`](https://github.com/apache/spark/commit/528094cdf0266a63080a92c42354ee2bc7037bf7). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
SparkQA removed a comment on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617520487 **[Test build #121603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121603/testReport)** for PR 28285 at commit [`a1a81f9`](https://github.com/apache/spark/commit/a1a81f9f288953bb18a2bc19ce458663166453a2). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests
HyukjinKwon commented on issue #28085: URL: https://github.com/apache/spark/pull/28085#issuecomment-617528646 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API
AmplabJenkins removed a comment on issue #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-617525552 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412621357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] implemented for representing a non-ISO day-of-week and a newly added `isodow` from PostgreSQL for ISO day-of-week. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated. 1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`] 2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`] 3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g. snowflake, but `dowiso`
[GitHub] [spark] AmplabJenkins commented on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API
AmplabJenkins commented on issue #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-617525552 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API
SparkQA removed a comment on issue #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-617454217 **[Test build #121598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121598/testReport)** for PR 28208 at commit [`da88978`](https://github.com/apache/spark/commit/da889787c2f430daa82802fea275d21b43b5d06a). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API
SparkQA commented on issue #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-617525009 **[Test build #121598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121598/testReport)** for PR 28208 at commit [`da88978`](https://github.com/apache/spark/commit/da889787c2f430daa82802fea275d21b43b5d06a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
dongjoon-hyun commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412638011 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: @yaooqinn . Ya. It's clear for that part. Could you enumerate the name of systems which supports `DAYOFWEEK_ISO`? (except IBM DB i). I'm wondering that specifically. It's because you wrote like `Many other systems` in the PR description. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
dongjoon-hyun commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412638011 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: @yaooqinn . Ya. It's clear for that part. Could you enumerate the name of systems which supports `DAYOFWEEK_ISO`? (except IBM DB i). I'm wondering that specifically. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
dongjoon-hyun commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412638011 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: @yaooqinn . Ya. It's clear for that part. But, I asked the explanation about your PR description. > isodow is PostgreSQL specific but iso as a suffix is more commonly used across platforms Let me ask again specifically. Could you enumerate the name of systems which supports `DAYOFWEEK_ISO`? (except IBM DB i). I'm wondering that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail:
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
dongjoon-hyun commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412638011 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: @yaooqinn . I asked the explanation about your PR description. > isodow is PostgreSQL specific but iso as a suffix is more commonly used across platforms Let me ask again specifically? Could you enumerate the name of systems which supports `DAYOFWEEK_ISO`? (except IBM DB i). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:
[GitHub] [spark] HeartSaVioR commented on issue #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector
HeartSaVioR commented on issue #28215: URL: https://github.com/apache/spark/pull/28215#issuecomment-617524078 Oh, nice catch. That compilation error seems to only come with JDK11. That said, Spark PR builder may need to run at least two different environments; that's lucky we found the issue from Github Action, but it has been the source of false alarm hence not considered seriously in many case. E.g. we may not able to find this if we see failure in Linter build side on Github Action, as building Spark will be cancelled if Linter build fails. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
dongjoon-hyun commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412638011 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: @yaooqinn . Ya. It's clear for that part. But, I asked the explanation about your PR description. > isodow is PostgreSQL specific but iso as a suffix is more commonly used across platforms Let me ask again specifically? Could you enumerate the name of systems which supports `DAYOFWEEK_ISO`? (except IBM DB i). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For
[GitHub] [spark] xuanyuanking commented on a change in pull request #28255: [SPARK-31484][Core] Add stage attempt number to temp checkpoint filename to avoid file already existing exception
xuanyuanking commented on a change in pull request #28255: URL: https://github.com/apache/spark/pull/28255#discussion_r412636713 ## File path: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala ## @@ -199,8 +199,8 @@ private[spark] object ReliableCheckpointRDD extends Logging { val finalOutputName = ReliableCheckpointRDD.checkpointFileName(ctx.partitionId()) val finalOutputPath = new Path(outputDir, finalOutputName) -val tempOutputPath = - new Path(outputDir, s".$finalOutputName-attempt-${ctx.attemptNumber()}") +val tempOutputPath = new Path(outputDir, + s".$finalOutputName-attempt-${ctx.stageAttemptNumber()}-${ctx.attemptNumber()}") Review comment: Yes, `taskAttemptId`, we also use it in the shuffle map file for making the file name unique. https://github.com/apache/spark/pull/24892#issuecomment-526215837 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector
HeartSaVioR commented on a change in pull request #28215: URL: https://github.com/apache/spark/pull/28215#discussion_r412635508 ## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala ## @@ -21,15 +21,11 @@ import java.math.BigDecimal import java.sql.{Connection, Date, Timestamp} import java.util.Properties -import org.scalatest.Ignore - import org.apache.spark.sql.Row import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest - @DockerTest -@Ignore // AMPLab Jenkins needs to be updated before shared memory works on docker Review comment: Sorry I should be more clearer - the change is removed, in other words, rolled back. No change. ## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala ## @@ -21,15 +21,11 @@ import java.math.BigDecimal import java.sql.{Connection, Date, Timestamp} import java.util.Properties -import org.scalatest.Ignore - import org.apache.spark.sql.Row import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest - @DockerTest -@Ignore // AMPLab Jenkins needs to be updated before shared memory works on docker Review comment: Sorry for the confusion, I should be more clearer - the change is removed, in other words, rolled back. No change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vanzin commented on issue #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector
vanzin commented on issue #28215: URL: https://github.com/apache/spark/pull/28215#issuecomment-617521569 The check failure is valid, it's failing to compile in that environment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vanzin commented on a change in pull request #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector
vanzin commented on a change in pull request #28215: URL: https://github.com/apache/spark/pull/28215#discussion_r412634890 ## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala ## @@ -21,15 +21,11 @@ import java.math.BigDecimal import java.sql.{Connection, Date, Timestamp} import java.util.Properties -import org.scalatest.Ignore - import org.apache.spark.sql.Row import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest - @DockerTest -@Ignore // AMPLab Jenkins needs to be updated before shared memory works on docker Review comment: I don't see it removed. It's still there. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
AmplabJenkins commented on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617520936 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
AmplabJenkins removed a comment on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617520936 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
SparkQA commented on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617520487 **[Test build #121603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121603/testReport)** for PR 28285 at commit [`a1a81f9`](https://github.com/apache/spark/commit/a1a81f9f288953bb18a2bc19ce458663166453a2). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] johnlinp edited a comment on issue #25785: [SPARK-27812][CORE] Explicit System.exit after job's main
johnlinp edited a comment on issue #25785: URL: https://github.com/apache/spark/pull/25785#issuecomment-617518819 @igorcalabria After upgrading Spark to 2.4.5 (with your PR #26152 included), there is no more non-daemon threads produced by Spark. In addition to that, `SparkSession` implements `Closeable`, so shutdown hooks should call `stop()` implicitly. Does that mean we don't need to explicitly call `stop()` in application anymore? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
AmplabJenkins removed a comment on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617472629 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26279/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build
HyukjinKwon commented on issue #28285: URL: https://github.com/apache/spark/pull/28285#issuecomment-617519305 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] johnlinp commented on issue #25785: [SPARK-27812][CORE] Explicit System.exit after job's main
johnlinp commented on issue #25785: URL: https://github.com/apache/spark/pull/25785#issuecomment-617518819 @igorcalabria After upgrading Spark to 2.4.5 (with your PR #26152 included), there is no more non-daemon threads produced by Spark. Does that mean we don't need to call `stop()` in application anymore? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions
HyukjinKwon commented on issue #28224: URL: https://github.com/apache/spark/pull/28224#issuecomment-617518085 Thanks @huaxingao! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #28255: [SPARK-31484][Core] Add stage attempt number to temp checkpoint filename to avoid file already existing exception
viirya commented on a change in pull request #28255: URL: https://github.com/apache/spark/pull/28255#discussion_r412628546 ## File path: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala ## @@ -199,8 +199,8 @@ private[spark] object ReliableCheckpointRDD extends Logging { val finalOutputName = ReliableCheckpointRDD.checkpointFileName(ctx.partitionId()) val finalOutputPath = new Path(outputDir, finalOutputName) -val tempOutputPath = - new Path(outputDir, s".$finalOutputName-attempt-${ctx.attemptNumber()}") +val tempOutputPath = new Path(outputDir, + s".$finalOutputName-attempt-${ctx.stageAttemptNumber()}-${ctx.attemptNumber()}") Review comment: Do you mean `taskAttemptId`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class
fqaiser94 commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r412626974 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces fields in struct by name. + * + * @param children Seq(struct, name1, val1, name2, val2, ...) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces fields in struct by name.", + examples = """ +Examples: + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3); + {"a":1,"b":2,"c":3} + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3); + {"a":1,"b":3} + > SELECT _FUNC_(CAST(NULL AS struct), "c", 3); + {"a":null,"b":null,"c":3} + > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 'b', 2, 'b', 3)) AS T(a)); + {"a":1,"b":100,"b":100} + > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a)); + {"a":{"a":1,"b":2,"c":3}} + """) +// scalastyle:on line.size.limit +case class AddFields(children: Seq[Expression]) extends Unevaluable { + + private lazy val ogStructExpr = children.head + private lazy val ogStructType = ogStructExpr.dataType.asInstanceOf[StructType] + private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map { +case Seq(name, value) => (name, value) + }.toList.unzip + private lazy val names = nameExprs.map(e => Option(e.eval()).map(_.toString).orNull) + private lazy val addOrReplaceExprs = names.zip(valExprs) + + override def checkInputDataTypes(): TypeCheckResult = { +val expectedStructType = StructType(Nil).typeName +if (children.size % 2 == 0) { + TypeCheckResult.TypeCheckFailure(s"$prettyName expects an odd number of arguments.") +} else if (ogStructExpr.dataType.typeName != expectedStructType) { + TypeCheckResult.TypeCheckFailure( +s"Only $expectedStructType is allowed to appear at first position, got: " + + s"${ogStructExpr.dataType.typeName}.") +} else if (names.contains(null)) { + TypeCheckResult.TypeCheckFailure("Field name should not be null.") +} else if (nameExprs.exists(e => !(e.foldable && e.dataType == StringType))) { Review comment: We could but where do you recommend putting the function? I don't think it's worth the characters saved to create a shared Utilities object just for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412621357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] implemented for representing a non-ISO week of week-based-year and a newly added `isodow` from PostgreSQL for ISO week of week-based-year. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated. 1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`] 2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`] 3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g.
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412621357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] implemented for representing a non-ISO week of week-based-year and a newly added `isodow` from PostgreSQL for ISO week of week-based-year. Many other systems only have one week-numbering system support and use either full names or abbreviations. Things in spark become a little bit complicated. 1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`] 2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`] 3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g.
[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class
fqaiser94 commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r412623176 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces fields in struct by name. + * + * @param children Seq(struct, name1, val1, name2, val2, ...) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces fields in struct by name.", + examples = """ +Examples: + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3); + {"a":1,"b":2,"c":3} + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3); + {"a":1,"b":3} + > SELECT _FUNC_(CAST(NULL AS struct), "c", 3); + {"a":null,"b":null,"c":3} + > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 'b', 2, 'b', 3)) AS T(a)); + {"a":1,"b":100,"b":100} + > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a)); + {"a":{"a":1,"b":2,"c":3}} + """) +// scalastyle:on line.size.limit +case class AddFields(children: Seq[Expression]) extends Unevaluable { + + private lazy val ogStructExpr = children.head + private lazy val ogStructType = ogStructExpr.dataType.asInstanceOf[StructType] + private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map { +case Seq(name, value) => (name, value) + }.toList.unzip + private lazy val names = nameExprs.map(e => Option(e.eval()).map(_.toString).orNull) + private lazy val addOrReplaceExprs = names.zip(valExprs) + + override def checkInputDataTypes(): TypeCheckResult = { +val expectedStructType = StructType(Nil).typeName +if (children.size % 2 == 0) { + TypeCheckResult.TypeCheckFailure(s"$prettyName expects an odd number of arguments.") +} else if (ogStructExpr.dataType.typeName != expectedStructType) { + TypeCheckResult.TypeCheckFailure( +s"Only $expectedStructType is allowed to appear at first position, got: " + + s"${ogStructExpr.dataType.typeName}.") +} else if (names.contains(null)) { + TypeCheckResult.TypeCheckFailure("Field name should not be null.") +} else if (nameExprs.exists(e => !(e.foldable && e.dataType == StringType))) { + TypeCheckResult.TypeCheckFailure( +s"Only foldable ${StringType.catalogString} expressions are allowed to appear at even " + + "position.") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: StructType = { +val ogStructFields = ogStructType.fields.map { f => + (f.name, f.copy(nullable = ogStructExpr.nullable || f.nullable)) +} +val addOrReplaceStructFields = addOrReplaceExprs.map { case (name, expr) => + (name, StructField(name, expr.dataType, expr.nullable)) +} +val newStructFields = addOrReplace(ogStructFields, addOrReplaceStructFields).map(_._2) +StructType(newStructFields) + } + + override def foldable: Boolean = children.forall(_.foldable) Review comment: I understand your point about `nameExprs`. What about the original `structExpr` though? i.e. shouldn't it be `override def foldable: Boolean = structExpr.foldable && valExprs.forall(_.foldable)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class
fqaiser94 commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r412622174 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces fields in struct by name. + * + * @param children Seq(struct, name1, val1, name2, val2, ...) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces fields in struct by name.", + examples = """ +Examples: + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3); + {"a":1,"b":2,"c":3} + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3); + {"a":1,"b":3} + > SELECT _FUNC_(CAST(NULL AS struct), "c", 3); + {"a":null,"b":null,"c":3} + > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 'b', 2, 'b', 3)) AS T(a)); + {"a":1,"b":100,"b":100} + > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a)); + {"a":{"a":1,"b":2,"c":3}} + """) +// scalastyle:on line.size.limit +case class AddFields(children: Seq[Expression]) extends Unevaluable { + + private lazy val ogStructExpr = children.head + private lazy val ogStructType = ogStructExpr.dataType.asInstanceOf[StructType] + private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map { +case Seq(name, value) => (name, value) + }.toList.unzip + private lazy val names = nameExprs.map(e => Option(e.eval()).map(_.toString).orNull) + private lazy val addOrReplaceExprs = names.zip(valExprs) + + override def checkInputDataTypes(): TypeCheckResult = { +val expectedStructType = StructType(Nil).typeName +if (children.size % 2 == 0) { + TypeCheckResult.TypeCheckFailure(s"$prettyName expects an odd number of arguments.") +} else if (ogStructExpr.dataType.typeName != expectedStructType) { + TypeCheckResult.TypeCheckFailure( +s"Only $expectedStructType is allowed to appear at first position, got: " + + s"${ogStructExpr.dataType.typeName}.") +} else if (names.contains(null)) { + TypeCheckResult.TypeCheckFailure("Field name should not be null.") +} else if (nameExprs.exists(e => !(e.foldable && e.dataType == StringType))) { + TypeCheckResult.TypeCheckFailure( +s"Only foldable ${StringType.catalogString} expressions are allowed to appear at even " + + "position.") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: StructType = { +val ogStructFields = ogStructType.fields.map { f => + (f.name, f.copy(nullable = ogStructExpr.nullable || f.nullable)) +} +val addOrReplaceStructFields = addOrReplaceExprs.map { case (name, expr) => + (name, StructField(name, expr.dataType, expr.nullable)) +} +val newStructFields = addOrReplace(ogStructFields, addOrReplaceStructFields).map(_._2) +StructType(newStructFields) + } + + override def foldable: Boolean = children.forall(_.foldable) + + override def nullable: Boolean = false + + override def prettyName: String = "add_fields" + + def toCreateNamedStruct: CreateNamedStruct = { +val ogStructExprs = ogStructType.fieldNames.zipWithIndex.map { + case (name, i) => (name, GetStructField(ogStructExpr, i, Some(name))) +} +val newStructExprs = addOrReplace(ogStructExprs, addOrReplaceExprs).flatMap { + case (name, valExpr) => Seq(Literal(name), valExpr) +} +CreateNamedStruct(newStructExprs) + } + + def addOrReplace[T]( +existingFields: Seq[(String, T)], +addOrReplaceFields: Seq[(String, T)]): Seq[(String, T)] = { + +addOrReplaceFields.foldLeft(existingFields) { case (newFields, field @ (fieldName, _)) => + if (newFields.exists { case (name, _) => name == fieldName }) { Review comment: Unfortunately, the existing struct might contain multiple fields with the same name (there is a test case for this scenario). As a result, I don't think this can be rewritten in a way that is meaningfully simpler using a map. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412621357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] implemented for representing a non-ISO week of week-based-year and a newly added `isodow` from PostgreSQL for ISO week of week-based-year. Many other systems only have one week-numbering system support. Things in spark become a little bit complicated. 1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`] 2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`] 3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g. snowflake, but `dowiso` looks weird and no use
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412621357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow` implemented for representing a non-ISO week of week-based-year and a newly added `isodow` for ISO week of week-based-year. Many other systems do not only have one week-numbering system support. Things in spark become a little bit complicated. 1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`] 2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`] 3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g. snowflake, but `dowiso` looks weird and no use cases
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412621357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] implemented for representing a non-ISO week of week-based-year and a newly added `isodow` from PostgreSQL for ISO week of week-based-year. Many other systems do not only have one week-numbering system support. Things in spark become a little bit complicated. 1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`] 2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`] 3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g. snowflake, but `dowiso` looks weird and
[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function
yaooqinn commented on a change in pull request #28284: URL: https://github.com/apache/spark/pull/28284#discussion_r412621357 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -2033,108 +1991,26 @@ case class MakeTimestamp( override def prettyName: String = "make_timestamp" } -case class Millennium(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getMillennium(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)") - } -} - -case class Century(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getCentury(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)") - } -} - -case class Decade(child: Expression) extends UnaryExpression with ImplicitCastInputTypes { - - override def inputTypes: Seq[AbstractDataType] = Seq(DateType) - - override def dataType: DataType = IntegerType - - override protected def nullSafeEval(date: Any): Any = { -DateTimeUtils.getDecade(date.asInstanceOf[Int]) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") - } -} - -case class Epoch(child: Expression, timeZoneId: Option[String] = None) -extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { - - override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) - // DecimalType is used to not lose precision while converting microseconds to - // the fractional part of seconds. Scale 6 is taken to have all microseconds as - // the fraction. The precision 20 should cover whole valid range of years [1, ] - // plus negative years that can be used in some cases though are not officially supported. - override def dataType: DataType = DecimalType(20, 6) - override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = -copy(timeZoneId = Option(timeZoneId)) - - override protected def nullSafeEval(timestamp: Any): Any = { -DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId) - } - - override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val dtu = DateTimeUtils.getClass.getName.stripSuffix("$") -val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName) -defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)") - } -} - object DatePart { def parseExtractField( extractField: String, source: Expression, errorHandleFunc: => Nothing): Expression = extractField.toUpperCase(Locale.ROOT) match { -case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source) -case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source) -case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source) case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source) -case "ISOYEAR" => IsoYear(source) +case "YEAROFWEEK" => YearOfWeek(source) case "QUARTER" | "QTR" => Quarter(source) case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source) case "WEEK" | "W" | "WEEKS" => WeekOfYear(source) case "DAY" | "D" | "DAYS" => DayOfMonth(source) case "DAYOFWEEK" | "DOW" => DayOfWeek(source) -case "ISODOW" => Add(WeekDay(source), Literal(1)) +case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1)) Review comment: Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] implemented for representing a non-ISO week of week-based-year and a newly added `isodow` for ISO week of week-based-year. Many other systems do not only have one week-numbering system support. Things in spark become a little bit complicated. 1. because of the existence of `isodow`, so we need to add iso-prefix to `dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and `isodow`] 2. because there are rare `iso`-prefixed systems and more systems choose `iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, `dowiso`] 3. `dayofweekiso` looks nice and has use cases in the platforms listed above, e.g. snowflake, but `dowiso` looks weird and no use cases
[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class
fqaiser94 commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r412620131 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces fields in struct by name. + * + * @param children Seq(struct, name1, val1, name2, val2, ...) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces fields in struct by name.", + examples = """ +Examples: + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3); + {"a":1,"b":2,"c":3} + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3); + {"a":1,"b":3} + > SELECT _FUNC_(CAST(NULL AS struct), "c", 3); + {"a":null,"b":null,"c":3} + > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 'b', 2, 'b', 3)) AS T(a)); + {"a":1,"b":100,"b":100} + > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a)); + {"a":{"a":1,"b":2,"c":3}} + """) +// scalastyle:on line.size.limit +case class AddFields(children: Seq[Expression]) extends Unevaluable { + + private lazy val ogStructExpr = children.head + private lazy val ogStructType = ogStructExpr.dataType.asInstanceOf[StructType] + private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map { +case Seq(name, value) => (name, value) + }.toList.unzip + private lazy val names = nameExprs.map(e => Option(e.eval()).map(_.toString).orNull) + private lazy val addOrReplaceExprs = names.zip(valExprs) + + override def checkInputDataTypes(): TypeCheckResult = { +val expectedStructType = StructType(Nil).typeName +if (children.size % 2 == 0) { + TypeCheckResult.TypeCheckFailure(s"$prettyName expects an odd number of arguments.") +} else if (ogStructExpr.dataType.typeName != expectedStructType) { + TypeCheckResult.TypeCheckFailure( +s"Only $expectedStructType is allowed to appear at first position, got: " + + s"${ogStructExpr.dataType.typeName}.") +} else if (names.contains(null)) { + TypeCheckResult.TypeCheckFailure("Field name should not be null.") +} else if (nameExprs.exists(e => !(e.foldable && e.dataType == StringType))) { + TypeCheckResult.TypeCheckFailure( +s"Only foldable ${StringType.catalogString} expressions are allowed to appear at even " + + "position.") +} else { + TypeCheckResult.TypeCheckSuccess +} + } + + override def dataType: StructType = { +val ogStructFields = ogStructType.fields.map { f => + (f.name, f.copy(nullable = ogStructExpr.nullable || f.nullable)) +} +val addOrReplaceStructFields = addOrReplaceExprs.map { case (name, expr) => + (name, StructField(name, expr.dataType, expr.nullable)) +} +val newStructFields = addOrReplace(ogStructFields, addOrReplaceStructFields).map(_._2) +StructType(newStructFields) Review comment: done. ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces fields in struct by name. + * + * @param children Seq(struct, name1, val1, name2, val2, ...) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces fields in struct by name.", + examples = """ +Examples: + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3); + {"a":1,"b":2,"c":3} + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3); + {"a":1,"b":3} + > SELECT _FUNC_(CAST(NULL AS struct), "c", 3); + {"a":null,"b":null,"c":3} + > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 'b', 2, 'b', 3)) AS T(a)); + {"a":1,"b":100,"b":100} + > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a)); + {"a":{"a":1,"b":2,"c":3}} + """) +// scalastyle:on line.size.limit +case class AddFields(children: Seq[Expression]) extends Unevaluable { + + private lazy val ogStructExpr = children.head + private lazy val ogStructType = ogStructExpr.dataType.asInstanceOf[StructType] + private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map { +case Seq(name, value) => (name, value) + }.toList.unzip + private lazy val names = nameExprs.map(e => Option(e.eval()).map(_.toString).orNull) + private lazy val addOrReplaceExprs = names.zip(valExprs) + + override def checkInputDataTypes():
[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class
fqaiser94 commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r412619800 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces fields in struct by name. + * + * @param children Seq(struct, name1, val1, name2, val2, ...) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces fields in struct by name.", + examples = """ +Examples: + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3); + {"a":1,"b":2,"c":3} + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3); + {"a":1,"b":3} + > SELECT _FUNC_(CAST(NULL AS struct), "c", 3); + {"a":null,"b":null,"c":3} + > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 'b', 2, 'b', 3)) AS T(a)); + {"a":1,"b":100,"b":100} + > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a)); + {"a":{"a":1,"b":2,"c":3}} + """) +// scalastyle:on line.size.limit +case class AddFields(children: Seq[Expression]) extends Unevaluable { + + private lazy val ogStructExpr = children.head Review comment: og = original. Since this isn't as obvious as I thought, I've changed the variable name to be clearer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28255: [SPARK-31484][Core] Add stage attempt number to temp checkpoint filename to avoid file already existing exception
cloud-fan commented on a change in pull request #28255: URL: https://github.com/apache/spark/pull/28255#discussion_r412619438 ## File path: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala ## @@ -199,8 +199,8 @@ private[spark] object ReliableCheckpointRDD extends Logging { val finalOutputName = ReliableCheckpointRDD.checkpointFileName(ctx.partitionId()) val finalOutputPath = new Path(outputDir, finalOutputName) -val tempOutputPath = - new Path(outputDir, s".$finalOutputName-attempt-${ctx.attemptNumber()}") +val tempOutputPath = new Path(outputDir, + s".$finalOutputName-attempt-${ctx.stageAttemptNumber()}-${ctx.attemptNumber()}") Review comment: If we just want a unique file name, can we use the task id? It's unique within the Spark application. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class
fqaiser94 commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r412619541 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces fields in struct by name. + * + * @param children Seq(struct, name1, val1, name2, val2, ...) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces fields in struct by name.", + examples = """ +Examples: + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3); + {"a":1,"b":2,"c":3} + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3); + {"a":1,"b":3} + > SELECT _FUNC_(CAST(NULL AS struct), "c", 3); + {"a":null,"b":null,"c":3} + > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 'b', 2, 'b', 3)) AS T(a)); + {"a":1,"b":100,"b":100} + > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a)); + {"a":{"a":1,"b":2,"c":3}} + """) +// scalastyle:on line.size.limit +case class AddFields(children: Seq[Expression]) extends Unevaluable { + + private lazy val ogStructExpr = children.head + private lazy val ogStructType = ogStructExpr.dataType.asInstanceOf[StructType] + private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map { +case Seq(name, value) => (name, value) + }.toList.unzip + private lazy val names = nameExprs.map(e => Option(e.eval()).map(_.toString).orNull) Review comment: Your suggestion will return a `Seq[Any]`. We need `Seq[String]` for the code to compile. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class
fqaiser94 commented on a change in pull request #27066: URL: https://github.com/apache/spark/pull/27066#discussion_r412619329 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ## @@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E override def prettyName: String = "str_to_map" } + +/** + * Adds/replaces fields in struct by name. + * + * @param children Seq(struct, name1, val1, name2, val2, ...) + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces fields in struct by name.", + examples = """ +Examples: + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3); + {"a":1,"b":2,"c":3} + > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3); + {"a":1,"b":3} + > SELECT _FUNC_(CAST(NULL AS struct), "c", 3); + {"a":null,"b":null,"c":3} + > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 'b', 2, 'b', 3)) AS T(a)); + {"a":1,"b":100,"b":100} + > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a)); + {"a":{"a":1,"b":2,"c":3}} + """) +// scalastyle:on line.size.limit +case class AddFields(children: Seq[Expression]) extends Unevaluable { Review comment: You're right, makes more sense to call it `WithFields`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #27066: [SPARK-31317][SQL] Add withField method to Column class
SparkQA commented on issue #27066: URL: https://github.com/apache/spark/pull/27066#issuecomment-617507065 **[Test build #121602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121602/testReport)** for PR 27066 at commit [`fba3bd8`](https://github.com/apache/spark/commit/fba3bd89ae41f8603bd04da3649296fd8c97e312). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions
huaxingao commented on issue #28224: URL: https://github.com/apache/spark/pull/28224#issuecomment-617506869 Sounds good. I will fix this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu edited a comment on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions
maropu edited a comment on issue #28224: URL: https://github.com/apache/spark/pull/28224#issuecomment-617504762 How about `Scala Functions` -> `Built-in Functions`, then adding a short description and a link to `sql-ref-functions.html#built-in-functions`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions
maropu commented on issue #28224: URL: https://github.com/apache/spark/pull/28224#issuecomment-617504762 How about `Scala Functions` -> `Built-in Functions`, then adding a short description and a link to `sql-ref-functions.html#built-in-functions`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector
HeartSaVioR commented on a change in pull request #28215: URL: https://github.com/apache/spark/pull/28215#discussion_r412616088 ## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala ## @@ -21,15 +21,11 @@ import java.math.BigDecimal import java.sql.{Connection, Date, Timestamp} import java.util.Properties -import org.scalatest.Ignore - import org.apache.spark.sql.Row import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest - @DockerTest -@Ignore // AMPLab Jenkins needs to be updated before shared memory works on docker Review comment: And it's removed in code diff as of now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions
huaxingao commented on issue #28224: URL: https://github.com/apache/spark/pull/28224#issuecomment-617502561 I just saw this https://user-images.githubusercontent.com/13592258/79932190-c26d5400-8401-11ea-92f8-aa688aa4e616.png;> Do we need this scalar function sections in getting started page? We either need to delete this section or write a short description and link it to the built-in functions page. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28264: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values
HyukjinKwon commented on a change in pull request #28264: URL: https://github.com/apache/spark/pull/28264#discussion_r412613571 ## File path: docs/sql-ref-datatypes.md ## @@ -706,3 +708,53 @@ The following table shows the type names as well as aliases used in Spark SQL pa + +### Floating Point Special Values + +Spark SQL supports several special floating point values in a case-insensitive manner: + + * Inf/+Inf/Infinity/+Infinity: positive infinity + * ```FloatType```: 1.0f / 0.0f, which is equal to the value returned by java.lang.Float.intBitsToFloat(0x7f80). + * ```DoubleType```: 1.0 / 0.0, which is equal to the value returned by java.lang.Double.longBitsToDouble(0x7ff0L). + * -Inf/-Infinity: negative infinity + * ```FloatType```: -1.0f / 0.0f, which is equal to the value returned by java.lang.Float.intBitsToFloat(0xff80). + * ```DoubleType```: -1.0 / 0.0, which is equal to the value returned by java.lang.Double.longBitsToDouble(0xfff0L). + * NaN: not a number + * ```FloatType```: 0.0f / 0.0f, which is equivalent to the value returned by java.lang.Float.intBitsToFloat(0x7fc0). + * ```DoubleType```: 0.0d / 0.0, which is equivalent to the value returned by java.lang.Double.longBitsToDouble(0x7ff8L). + + Examples + +{% highlight sql %} +SELECT double('infinity'); + ++ Review comment: Yeah, let's remove two leading spaces just to be consistent with other DBMS's doc references. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl
AmplabJenkins commented on issue #28260: URL: https://github.com/apache/spark/pull/28260#issuecomment-617501069 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl
AmplabJenkins removed a comment on issue #28260: URL: https://github.com/apache/spark/pull/28260#issuecomment-617501069 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl
SparkQA commented on issue #28260: URL: https://github.com/apache/spark/pull/28260#issuecomment-617500691 **[Test build #121601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121601/testReport)** for PR 28260 at commit [`3b98b86`](https://github.com/apache/spark/commit/3b98b86a500c611ea0014b0dafa7a99ce70064c9). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28271: [SPARK-31495][SQL] Support formatted explain for AQE
SparkQA commented on issue #28271: URL: https://github.com/apache/spark/pull/28271#issuecomment-617500658 **[Test build #121600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121600/testReport)** for PR 28271 at commit [`1228d2e`](https://github.com/apache/spark/commit/1228d2e439197eee986473f7c9af82e86d11a449). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API
AmplabJenkins removed a comment on issue #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-617499480 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API
AmplabJenkins commented on issue #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-617499480 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28271: [SPARK-31495][SQL] Support formatted explain for AQE
AmplabJenkins commented on issue #28271: URL: https://github.com/apache/spark/pull/28271#issuecomment-617498892 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API
SparkQA removed a comment on issue #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-617421177 **[Test build #121597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121597/testReport)** for PR 28208 at commit [`7792741`](https://github.com/apache/spark/commit/77927418e0ffcd6872f698d9a0f2a3246ec5dd45). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #28271: [SPARK-31495][SQL] Support formatted explain for AQE
AmplabJenkins removed a comment on issue #28271: URL: https://github.com/apache/spark/pull/28271#issuecomment-617498892 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API
SparkQA commented on issue #28208: URL: https://github.com/apache/spark/pull/28208#issuecomment-617498868 **[Test build #121597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121597/testReport)** for PR 28208 at commit [`7792741`](https://github.com/apache/spark/commit/77927418e0ffcd6872f698d9a0f2a3246ec5dd45). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] venkata91 commented on issue #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's blacklist
venkata91 commented on issue #28287: URL: https://github.com/apache/spark/pull/28287#issuecomment-617497333 Can you please review @squito @tgravescs @mridulm ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector
HeartSaVioR commented on a change in pull request #28215: URL: https://github.com/apache/spark/pull/28215#discussion_r412606230 ## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala ## @@ -21,15 +21,11 @@ import java.math.BigDecimal import java.sql.{Connection, Date, Timestamp} import java.util.Properties -import org.scalatest.Ignore - import org.apache.spark.sql.Row import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, StructType} import org.apache.spark.tags.DockerTest - @DockerTest -@Ignore // AMPLab Jenkins needs to be updated before shared memory works on docker Review comment: Yes, as other tests don't have this. It's not a kind of "should be", but "can be". This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 commented on issue #28283: [BRANCH-3.0][SPARK-31344][CORE] Polish implementation of barrier() and allGather()
Ngone51 commented on issue #28283: URL: https://github.com/apache/spark/pull/28283#issuecomment-617493792 Sure, I'll follow it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org