date:20200421

[GitHub] [spark] TJX2014 opened a new pull request #28291: [SPARK-31400][ML,MLlib]The catalogString doesn't distinguish Vectors in ml and mllib

2020-04-21 Thread GitBox



TJX2014 opened a new pull request #28291:
URL: https://github.com/apache/spark/pull/28291


   
   
   ### What changes were proposed in this pull request?
   add package check in org.apache.spark.ml.util.SchemaUtils#checkColumnType
   and add unit test in org.scalatest.FunSuiteLike#test
   
   
   ### Why are the changes needed?
   the catalogString doesn't distinguish Vectors in ml and mllib when mllib 
vector misused in ml 
   [https://issues.apache.org/jira/browse/SPARK-31400](url)
   
   ### Does this PR introduce any user-facing change?
   No
   
   
   ### How was this patch tested?
   Unit test is added including both negative and positive cases.
   `test("Valid class of dataType") {
   val schema = StructType(Array[StructField] {
 StructField("features", new org.apache.spark.mllib.linalg.VectorUDT)
   })
   val e = intercept[IllegalArgumentException] {
 SchemaUtils.checkColumnType(schema, "features", new VectorUDT)
   }
   assert(e.getMessage.contains("must belong to ml package"), "package is 
not valid to ml")
   val normalSchema = StructType(Array[StructField] {
 StructField("features", new VectorUDT)
   })
   SchemaUtils.checkColumnType(normalSchema, "features", new VectorUDT)
 }`
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on issue #28272: [SPARK-31489][SPARK-31488][SQL] Translate date values of pushed down filters to `java.sql.Date`

2020-04-21 Thread GitBox



MaxGekk commented on issue #28272:
URL: https://github.com/apache/spark/pull/28272#issuecomment-617564224


   @cloud-fan @HyukjinKwon @dongjoon-hyun Please, take a look at the PR if you 
have time.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



SparkQA commented on issue #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-617560789


   **[Test build #121607 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121607/testReport)**
 for PR 28290 at commit 
[`e5e69e2`](https://github.com/apache/spark/commit/e5e69e238c5fd8c74ae6541f024be7e09466ae5f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



SparkQA removed a comment on issue #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-617557202


   **[Test build #121607 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121607/testReport)**
 for PR 28290 at commit 
[`e5e69e2`](https://github.com/apache/spark/commit/e5e69e238c5fd8c74ae6541f024be7e09466ae5f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-617560892







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-617560892







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao commented on a change in pull request #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



huaxingao commented on a change in pull request #28290:
URL: https://github.com/apache/spark/pull/28290#discussion_r412681949



##
File path: docs/sql-getting-started.md
##
@@ -347,7 +347,8 @@ For example:
 
 
 ## Scalar Functions
-(to be filled soon)
+
+Scalar functions are functions that return a single value per row, as opposed 
to aggregation functions, which return a value for a group of rows. Spark SQL 
supports a variety of built-in scalar functions such as [Array 
Functions](sql-ref-functions-builtin.html#array-functions), [Map 
Functions](sql-ref-functions-builtin.html#map-functions), [Date and Timestamp 
Functions](sql-ref-functions-builtin.html#date-and-timestamp-functions), etc. 
It also supports [User Defined Scalar 
Functions](sql-ref-functions-udf-scalar.html).

Review comment:
   I used Scalar function instead of built-in function (because the next 
paragraph is for aggregations). I don't want to talk about window function 
because I think window function is still sort of aggregation, even thought it 
returns result for each row.
   I can't link it to builtin function because builtin function contains 
aggregation function, so I link to each group of the builtin scalar functions. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu commented on a change in pull request #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



maropu commented on a change in pull request #28290:
URL: https://github.com/apache/spark/pull/28290#discussion_r412678792



##
File path: docs/sql-getting-started.md
##
@@ -347,7 +347,8 @@ For example:
 
 
 ## Scalar Functions
-(to be filled soon)
+
+Scalar functions are functions that return a single value per row, as opposed 
to aggregation functions, which return a value for a group of rows. Spark SQL 
supports a variety of built-in scalar functions such as [Array 
Functions](sql-ref-functions-builtin.html#array-functions), [Map 
Functions](sql-ref-functions-builtin.html#map-functions), [Date and Timestamp 
Functions](sql-ref-functions-builtin.html#date-and-timestamp-functions), etc. 
It also supports [User Defined Scalar 
Functions](sql-ref-functions-udf-scalar.html).

Review comment:
   Any reason not to describe aggregate/window functions? Also, I think its 
better not to describe each function group name here because we might forget to 
update this page every time we add a new group.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-617555485







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



SparkQA commented on issue #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-617557202


   **[Test build #121607 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121607/testReport)**
 for PR 28290 at commit 
[`e5e69e2`](https://github.com/apache/spark/commit/e5e69e238c5fd8c74ae6541f024be7e09466ae5f).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



huaxingao commented on issue #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-617556913


   @HyukjinKwon @maropu 
   I quickly wrote a paragraph for scalar function. It has to be scalar 
function because the following paragraph is for aggregation. The definition for 
scalar function is a bit confusing. Snowflake 
(https://docs.snowflake.com/en/sql-reference/functions.html) and Oracle 
(https://docs.oracle.com/cd/E57185_01/IRWUG/ch12s04s01.html) define scalar 
function as functions opposite to aggregation function, but seems IBM 
(https://www.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/apsg/src/tpc/db2z_sqlscalarfn.html)
 uses scalar function for scalar UDF. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28290:
URL: https://github.com/apache/spark/pull/28290#issuecomment-617555485







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao opened a new pull request #28290: [MINOR][SQL][DOCS] Add a paragraph for scalar function in sql getting started

2020-04-21 Thread GitBox



huaxingao opened a new pull request #28290:
URL: https://github.com/apache/spark/pull/28290


   
   
   ### What changes were proposed in this pull request?
   Add a paragraph for scalar function in sql getting started
   
   
   ### Why are the changes needed?
   To make 3.0 doc complete.
   
   
   ### Does this PR introduce any user-facing change?
   before:
   https://user-images.githubusercontent.com/13592258/79943182-16d1fd00-841d-11ea-9744-9cdd58d83f81.png;>
   
   after:
   https://user-images.githubusercontent.com/13592258/79943178-133e7600-841d-11ea-91dc-ff66485b679a.png;>
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28289:
URL: https://github.com/apache/spark/pull/28289#issuecomment-617544598







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28289:
URL: https://github.com/apache/spark/pull/28289#issuecomment-617544598







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412663151



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   thanks @dongjoon-hyun. PR description updated





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename

2020-04-21 Thread GitBox



SparkQA commented on issue #28289:
URL: https://github.com/apache/spark/pull/28289#issuecomment-617544275


   **[Test build #121606 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121606/testReport)**
 for PR 28289 at commit 
[`59e99f6`](https://github.com/apache/spark/commit/59e99f6d7aad7c464c68fbb6e568f9347f6774ac).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on issue #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename

2020-04-21 Thread GitBox



viirya commented on issue #28289:
URL: https://github.com/apache/spark/pull/28289#issuecomment-617543743


   cc @cloud-fan @xuanyuanking @dongjoon-hyun 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya opened a new pull request #28289: [SPARK-31484][Core][Flollowup] Use taskAttemptId in checkpoint filename

2020-04-21 Thread GitBox



viirya opened a new pull request #28289:
URL: https://github.com/apache/spark/pull/28289


   
   
   ### What changes were proposed in this pull request?
   
   
   As suggested by 
https://github.com/apache/spark/pull/28255#discussion_r412619438, this patch 
proposes to use taskAttemptId in checkpoint filename, instead of 
stageAttemptNumber + attemptNumber.
   
   ### Why are the changes needed?
   
   
   To simplify checkpoint simplified and unique.
   
   ### Does this PR introduce any user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Existing tests.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests

2020-04-21 Thread GitBox



HyukjinKwon commented on a change in pull request #28085:
URL: https://github.com/apache/spark/pull/28085#discussion_r412644686



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -1135,6 +1136,27 @@ private[spark] class DAGScheduler(
 }
   }
 
+  /**
+   * `PythonRunner` needs to know what the pyspark memory and cores settings 
are for the profile
+   * being run. Pass them in the local properties of the task if it's set for 
the stage profile.
+   */
+  private def addPysparkConfigsToProperties(stage: Stage, properties: 
Properties): Unit = {

Review comment:
   sorry, nit:`addPysparkConfigsToProperties` -> 
`addPysparkConfigsToProperties`

##
File path: python/pyspark/resource/executorrequests.py
##
@@ -0,0 +1,169 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from pyspark.resource.taskrequests import TaskResourceRequest
+from pyspark.util import _parse_memory
+
+
+class ExecutorResourceRequest(object):
+"""
+.. note:: Evolving
+
+An Executor resource request. This is used in conjunction with the 
ResourceProfile to
+programmatically specify the resources needed for an RDD that will be 
applied at the
+stage level.
+
+This is used to specify what the resource requirements are for an Executor 
and how
+Spark can find out specific details about those resources. Not all the 
parameters are
+required for every resource type. Resources like GPUs are supported and 
have same limitations
+as using the global spark configs spark.executor.resource.gpu.*. The 
amount, discoveryScript,
+and vendor parameters for resources are all the same parameters a user 
would specify through the
+configs: spark.executor.resource.{resourceName}.{amount, discoveryScript, 
vendor}.
+
+For instance, a user wants to allocate an Executor with GPU resources on 
YARN. The user has
+to specify the resource name (gpu), the amount or number of GPUs per 
Executor,
+the discovery script would be specified so that when the Executor starts 
up it can
+discovery what GPU addresses are available for it to use because YARN 
doesn't tell
+Spark that, then vendor would not be used because its specific for 
Kubernetes.
+
+See the configuration and cluster specific docs for more details.
+
+Use `pyspark.ExecutorResourceRequests` class as a convenience API.
+
+:param resourceName: Name of the resource
+:param amount: Amount requesting
+:param discoveryScript: Optional script used to discover the resources. 
This is required on some
+cluster managers that don't tell Spark the addresses of the resources
+allocated. The script runs on Executors startup to discover the 
addresses
+of the resources available.
+:param vendor: Vendor, required for some cluster managers
+
+.. versionadded:: 3.1.0
+"""
+def __init__(self, resourceName, amount, discoveryScript="", vendor=""):
+self._name = resourceName
+self._amount = amount
+self._discoveryScript = discoveryScript

Review comment:
   nit `_discoveryScript` -> `_discovery_script`

##
File path: python/pyspark/rdd.py
##
@@ -2587,6 +2616,7 @@ def pipeline_func(split, iterator):
 self._prev_jrdd = prev._prev_jrdd  # maintain the pipeline
 self._prev_jrdd_deserializer = prev._prev_jrdd_deserializer
 self.is_cached = False
+self.has_resourceProfile = False

Review comment:
   nit `has_resourceProfile` -> `has_resource_profile`. Basically the logic 
is that we should keep everything `a_b_c` when it's not an API inspired from 
Java (or Scala API). IIRC, this logic complies pep8.

##
File path: python/pyspark/resource/executorrequests.py
##
@@ -0,0 +1,169 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at

[GitHub] [spark] viirya commented on a change in pull request #28255: [SPARK-31484][Core] Add stage attempt number to temp checkpoint filename to avoid file already existing exception

2020-04-21 Thread GitBox



viirya commented on a change in pull request #28255:
URL: https://github.com/apache/spark/pull/28255#discussion_r412659291



##
File path: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala
##
@@ -199,8 +199,8 @@ private[spark] object ReliableCheckpointRDD extends Logging 
{
 
 val finalOutputName = 
ReliableCheckpointRDD.checkpointFileName(ctx.partitionId())
 val finalOutputPath = new Path(outputDir, finalOutputName)
-val tempOutputPath =
-  new Path(outputDir, s".$finalOutputName-attempt-${ctx.attemptNumber()}")
+val tempOutputPath = new Path(outputDir,
+  
s".$finalOutputName-attempt-${ctx.stageAttemptNumber()}-${ctx.attemptNumber()}")

Review comment:
   OK. Let me create a follow-up for it. Thanks.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone

2020-04-21 Thread GitBox



SparkQA commented on issue #28288:
URL: https://github.com/apache/spark/pull/28288#issuecomment-617540272


   **[Test build #121605 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121605/testReport)**
 for PR 28288 at commit 
[`4961eb3`](https://github.com/apache/spark/commit/4961eb39e282d6b98ca00cc7e4aa49380948ed73).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28288:
URL: https://github.com/apache/spark/pull/28288#issuecomment-617538573







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28260:
URL: https://github.com/apache/spark/pull/28260#issuecomment-617538904


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121601/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28260:
URL: https://github.com/apache/spark/pull/28260#issuecomment-617538900


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28260:
URL: https://github.com/apache/spark/pull/28260#issuecomment-617538904







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28288:
URL: https://github.com/apache/spark/pull/28288#issuecomment-617538573







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl

2020-04-21 Thread GitBox



SparkQA removed a comment on issue #28260:
URL: https://github.com/apache/spark/pull/28260#issuecomment-617500691


   **[Test build #121601 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121601/testReport)**
 for PR 28260 at commit 
[`3b98b86`](https://github.com/apache/spark/commit/3b98b86a500c611ea0014b0dafa7a99ce70064c9).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl

2020-04-21 Thread GitBox



SparkQA commented on issue #28260:
URL: https://github.com/apache/spark/pull/28260#issuecomment-617538600


   **[Test build #121601 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121601/testReport)**
 for PR 28260 at commit 
[`3b98b86`](https://github.com/apache/spark/commit/3b98b86a500c611ea0014b0dafa7a99ce70064c9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] xuanyuanking opened a new pull request #28288: [SPARK-31515][SQL] Canonicalize Cast should consider the value of needTimeZone

2020-04-21 Thread GitBox



xuanyuanking opened a new pull request #28288:
URL: https://github.com/apache/spark/pull/28288


   ### What changes were proposed in this pull request?
   Override the canonicalized fields with respect to the result of 
`needsTimeZone`.
   
   
   ### Why are the changes needed?
   The current approach breaks sematic equal of two cast expressions that don't 
relate with datetime type. If we don't need to use `timeZone` information 
casting `from` type to `to` type, then the timeZoneId should not influence the 
canonicalize result. 
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   New UT added.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



dongjoon-hyun commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412650911



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Got it. Thanks. I understand that Snowflake is considered important. 
Could you add some of the above 
comment(https://github.com/apache/spark/pull/28284#discussion_r412621357) to 
the PR description then?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412648326



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Specifically, snowflake use dayofweek_iso except db2. I couldn't find 
more because most platforms do not have two day-of-week system implemented.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412648326



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Specifically, snowflake use dayofweek_iso except db2. I couldn't find 
more because most platforms do not have two week-numbering system implemented.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mridulm commented on a change in pull request #28257: [SPARK-31485][CORE] Avoid application hang if only partial barrier tasks launched

2020-04-21 Thread GitBox



mridulm commented on a change in pull request #28257:
URL: https://github.com/apache/spark/pull/28257#discussion_r412639758



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##
@@ -401,9 +401,7 @@ private[spark] class TaskSchedulerImpl(
 // addresses are the same as that we allocated in 
taskResourceAssignments since it's
 // synchronized. We don't remove the exact addresses allocated 
because the current
 // approach produces the identical result with less time 
complexity.
-availableResources(i).getOrElse(rName,
-  throw new SparkException(s"Try to acquire resource $rName 
that doesn't exist."))

Review comment:
   Agree, this can be removed and simplified as below.
   

##
File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##
@@ -468,8 +466,9 @@ private[spark] class TaskSchedulerImpl(
   resourceProfileIds: Array[Int],
   availableCpus: Array[Int],
   availableResources: Array[Map[String, Buffer[String]]],
-  rpId: Int): Int = {
-val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(rpId)
+  taskSet: TaskSetManager): Int = {
+val resourceProfile = sc.resourceProfileManager.resourceProfileFromId(
+  taskSet.taskSet.resourceProfileId)
 val offersForResourceProfile = resourceProfileIds.zipWithIndex.filter { 
case (id, _) =>

Review comment:
   Any particular reason to change this ?

##
File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##
@@ -675,11 +676,15 @@ private[spark] class TaskSchedulerImpl(
   // Check whether the barrier tasks are partially launched.
   // TODO SPARK-24818 handle the assert failure case (that can happen 
when some locality
   // requirements are not fulfilled, and we should revert the launched 
tasks).
-  require(addressesWithDescs.size == taskSet.numTasks,
-s"Skip current round of resource offers for barrier stage 
${taskSet.stageId} " +
+  if (addressesWithDescs.size != taskSet.numTasks) {
+val errorMsg =
+  s"Fail current round of resource offers for barrier stage 
${taskSet.stageId} " +
   s"because only ${addressesWithDescs.size} out of a total number 
of " +
   s"${taskSet.numTasks} tasks got resource offers. The resource 
offers may have " +
-  "been blacklisted or cannot fulfill task locality requirements.")
+  "been blacklisted or cannot fulfill task locality requirements."
+logError(errorMsg)

Review comment:
   nit: logWarning

##
File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##
@@ -675,11 +676,15 @@ private[spark] class TaskSchedulerImpl(
   // Check whether the barrier tasks are partially launched.
   // TODO SPARK-24818 handle the assert failure case (that can happen 
when some locality
   // requirements are not fulfilled, and we should revert the launched 
tasks).
-  require(addressesWithDescs.size == taskSet.numTasks,
-s"Skip current round of resource offers for barrier stage 
${taskSet.stageId} " +

Review comment:
   This is the primary issue causing the bug - and is overly strict.
   Unfortunately, the way barrier scheduling has been written, the only way to 
get it working currently is to disable delay scheduling entirely.
   We will need to relook how to get barrier scheduling working more gracefully 
with task locality and executor failures (and in future, dynamic allocation).

##
File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##
@@ -484,9 +483,11 @@ private[spark] class TaskSchedulerImpl(
 numTasksPerExecCores
   } else {
 val taskLimit = 
resourceProfile.taskResources.get(limitingResource).map(_.amount)
-  .getOrElse(throw new SparkException("limitingResource returns from 
ResourceProfile" +
-s" $resourceProfile doesn't actually contain that task resource!")
-  )
+  .getOrElse {
+taskSet.abort("limitingResource returns from ResourceProfile " +
+  s"$resourceProfile doesn't actually contain that task resource!")
+return -1

Review comment:
   We still need to throw the exception after the abort - else subsequent 
code will continue.
   This is a repeated pattern in this PR - when replacing an 
exception/require/etc (which would have thrown an exception earlier) with a 
task set abort, please ensure that the exception continues to be thrown.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries

[GitHub] [spark] AmplabJenkins removed a comment on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617529054







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28085:
URL: https://github.com/apache/spark/pull/28085#issuecomment-617529211







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617529054







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28085:
URL: https://github.com/apache/spark/pull/28085#issuecomment-617529211







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



SparkQA commented on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617528964


   **[Test build #121603 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121603/testReport)**
 for PR 28285 at commit 
[`a1a81f9`](https://github.com/apache/spark/commit/a1a81f9f288953bb18a2bc19ce458663166453a2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests

2020-04-21 Thread GitBox



SparkQA commented on issue #28085:
URL: https://github.com/apache/spark/pull/28085#issuecomment-617528793


   **[Test build #121604 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121604/testReport)**
 for PR 28085 at commit 
[`528094c`](https://github.com/apache/spark/commit/528094cdf0266a63080a92c42354ee2bc7037bf7).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



SparkQA removed a comment on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617520487


   **[Test build #121603 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121603/testReport)**
 for PR 28285 at commit 
[`a1a81f9`](https://github.com/apache/spark/commit/a1a81f9f288953bb18a2bc19ce458663166453a2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on issue #28085: [SPARK-29641][PYTHON][CORE] Stage Level Sched: Add python api's and tests

2020-04-21 Thread GitBox



HyukjinKwon commented on issue #28085:
URL: https://github.com/apache/spark/pull/28085#issuecomment-617528646


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-617525552







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412621357



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] 
implemented for representing a non-ISO day-of-week and a newly added `isodow` 
from PostgreSQL for ISO day-of-week. Many other systems only have one 
week-numbering system support and use either full names or abbreviations. 
Things in spark become a little bit complicated. 
   1. because of the existence of `isodow`, so we need to add iso-prefix to 
`dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and 
`isodow`] 
   2. because there are rare `iso`-prefixed systems and more systems choose 
`iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, 
`dowiso`]
   3. `dayofweekiso` looks nice and has use cases in the platforms listed 
above, e.g. snowflake, but `dowiso`

[GitHub] [spark] AmplabJenkins commented on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-617525552







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-04-21 Thread GitBox



SparkQA removed a comment on issue #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-617454217


   **[Test build #121598 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121598/testReport)**
 for PR 28208 at commit 
[`da88978`](https://github.com/apache/spark/commit/da889787c2f430daa82802fea275d21b43b5d06a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-04-21 Thread GitBox



SparkQA commented on issue #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-617525009


   **[Test build #121598 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121598/testReport)**
 for PR 28208 at commit 
[`da88978`](https://github.com/apache/spark/commit/da889787c2f430daa82802fea275d21b43b5d06a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



dongjoon-hyun commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412638011



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   @yaooqinn . Ya. It's clear for that part. Could you enumerate the name 
of systems which supports `DAYOFWEEK_ISO`? (except IBM DB i). I'm wondering 
that specifically.
   
   It's because you wrote like `Many other systems` in the PR description.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



dongjoon-hyun commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412638011



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   @yaooqinn . Ya. It's clear for that part. Could you enumerate the name 
of systems which supports `DAYOFWEEK_ISO`? (except IBM DB i). I'm wondering 
that specifically.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



dongjoon-hyun commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412638011



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   @yaooqinn . Ya. It's clear for that part. But, I asked the explanation 
about your PR description.
   > isodow is PostgreSQL specific but iso as a suffix is more commonly used 
across platforms
   
   Let me ask again specifically. Could you enumerate the name of systems which 
supports `DAYOFWEEK_ISO`? (except IBM DB i). I'm wondering that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail:

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



dongjoon-hyun commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412638011



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   @yaooqinn . I asked the explanation about your PR description.
   > isodow is PostgreSQL specific but iso as a suffix is more commonly used 
across platforms
   
   Let me ask again specifically? Could you enumerate the name of systems which 
supports `DAYOFWEEK_ISO`? (except IBM DB i).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:

[GitHub] [spark] HeartSaVioR commented on issue #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector

2020-04-21 Thread GitBox



HeartSaVioR commented on issue #28215:
URL: https://github.com/apache/spark/pull/28215#issuecomment-617524078


   Oh, nice catch. That compilation error seems to only come with JDK11.
   
   That said, Spark PR builder may need to run at least two different 
environments; that's lucky we found the issue from Github Action, but it has 
been the source of false alarm hence not considered seriously in many case. 
E.g. we may not able to find this if we see failure in Linter build side on 
Github Action, as building Spark will be cancelled if Linter build fails.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



dongjoon-hyun commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412638011



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   @yaooqinn . Ya. It's clear for that part. But, I asked the explanation 
about your PR description.
   > isodow is PostgreSQL specific but iso as a suffix is more commonly used 
across platforms
   
   Let me ask again specifically? Could you enumerate the name of systems which 
supports `DAYOFWEEK_ISO`? (except IBM DB i).





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For

[GitHub] [spark] xuanyuanking commented on a change in pull request #28255: [SPARK-31484][Core] Add stage attempt number to temp checkpoint filename to avoid file already existing exception

2020-04-21 Thread GitBox



xuanyuanking commented on a change in pull request #28255:
URL: https://github.com/apache/spark/pull/28255#discussion_r412636713



##
File path: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala
##
@@ -199,8 +199,8 @@ private[spark] object ReliableCheckpointRDD extends Logging 
{
 
 val finalOutputName = 
ReliableCheckpointRDD.checkpointFileName(ctx.partitionId())
 val finalOutputPath = new Path(outputDir, finalOutputName)
-val tempOutputPath =
-  new Path(outputDir, s".$finalOutputName-attempt-${ctx.attemptNumber()}")
+val tempOutputPath = new Path(outputDir,
+  
s".$finalOutputName-attempt-${ctx.stageAttemptNumber()}-${ctx.attemptNumber()}")

Review comment:
   Yes, `taskAttemptId`, we also use it in the shuffle map file for making 
the file name unique. 
https://github.com/apache/spark/pull/24892#issuecomment-526215837





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a change in pull request #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector

2020-04-21 Thread GitBox



HeartSaVioR commented on a change in pull request #28215:
URL: https://github.com/apache/spark/pull/28215#discussion_r412635508



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
##
@@ -21,15 +21,11 @@ import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
 import java.util.Properties
 
-import org.scalatest.Ignore
-
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, 
StructType}
 import org.apache.spark.tags.DockerTest
 
-
 @DockerTest
-@Ignore // AMPLab Jenkins needs to be updated before shared memory works on 
docker

Review comment:
   Sorry I should be more clearer - the change is removed, in other words, 
rolled back. No change.

##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
##
@@ -21,15 +21,11 @@ import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
 import java.util.Properties
 
-import org.scalatest.Ignore
-
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, 
StructType}
 import org.apache.spark.tags.DockerTest
 
-
 @DockerTest
-@Ignore // AMPLab Jenkins needs to be updated before shared memory works on 
docker

Review comment:
   Sorry for the confusion, I should be more clearer - the change is 
removed, in other words, rolled back. No change.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on issue #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector

2020-04-21 Thread GitBox



vanzin commented on issue #28215:
URL: https://github.com/apache/spark/pull/28215#issuecomment-617521569


   The check failure is valid, it's failing to compile in that environment.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] vanzin commented on a change in pull request #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector

2020-04-21 Thread GitBox



vanzin commented on a change in pull request #28215:
URL: https://github.com/apache/spark/pull/28215#discussion_r412634890



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
##
@@ -21,15 +21,11 @@ import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
 import java.util.Properties
 
-import org.scalatest.Ignore
-
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, 
StructType}
 import org.apache.spark.tags.DockerTest
 
-
 @DockerTest
-@Ignore // AMPLab Jenkins needs to be updated before shared memory works on 
docker

Review comment:
   I don't see it removed. It's still there.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617520936







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617520936







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



SparkQA commented on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617520487


   **[Test build #121603 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121603/testReport)**
 for PR 28285 at commit 
[`a1a81f9`](https://github.com/apache/spark/commit/a1a81f9f288953bb18a2bc19ce458663166453a2).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] johnlinp edited a comment on issue #25785: [SPARK-27812][CORE] Explicit System.exit after job's main

2020-04-21 Thread GitBox



johnlinp edited a comment on issue #25785:
URL: https://github.com/apache/spark/pull/25785#issuecomment-617518819


   @igorcalabria After upgrading Spark to 2.4.5 (with your PR #26152 included), 
there is no more non-daemon threads produced by Spark. In addition to that, 
`SparkSession` implements `Closeable`, so shutdown hooks should call `stop()` 
implicitly. Does that mean we don't need to explicitly call `stop()` in 
application anymore?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617472629


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26279/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on issue #28285: [SPARK-31510][R][BUILD] Set setwd in R documentation build

2020-04-21 Thread GitBox



HyukjinKwon commented on issue #28285:
URL: https://github.com/apache/spark/pull/28285#issuecomment-617519305


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] johnlinp commented on issue #25785: [SPARK-27812][CORE] Explicit System.exit after job's main

2020-04-21 Thread GitBox



johnlinp commented on issue #25785:
URL: https://github.com/apache/spark/pull/25785#issuecomment-617518819


   @igorcalabria After upgrading Spark to 2.4.5 (with your PR #26152 included), 
there is no more non-daemon threads produced by Spark. Does that mean we don't 
need to call `stop()` in application anymore?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions

2020-04-21 Thread GitBox



HyukjinKwon commented on issue #28224:
URL: https://github.com/apache/spark/pull/28224#issuecomment-617518085


   Thanks @huaxingao!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #28255: [SPARK-31484][Core] Add stage attempt number to temp checkpoint filename to avoid file already existing exception

2020-04-21 Thread GitBox



viirya commented on a change in pull request #28255:
URL: https://github.com/apache/spark/pull/28255#discussion_r412628546



##
File path: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala
##
@@ -199,8 +199,8 @@ private[spark] object ReliableCheckpointRDD extends Logging 
{
 
 val finalOutputName = 
ReliableCheckpointRDD.checkpointFileName(ctx.partitionId())
 val finalOutputPath = new Path(outputDir, finalOutputName)
-val tempOutputPath =
-  new Path(outputDir, s".$finalOutputName-attempt-${ctx.attemptNumber()}")
+val tempOutputPath = new Path(outputDir,
+  
s".$finalOutputName-attempt-${ctx.stageAttemptNumber()}-${ctx.attemptNumber()}")

Review comment:
   Do you mean `taskAttemptId`?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class

2020-04-21 Thread GitBox



fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r412626974



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces fields in struct by name.
+ *
+ * @param children Seq(struct, name1, val1, name2, val2, ...)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces 
fields in struct by name.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+   {"a":1,"b":2,"c":3}
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+   {"a":1,"b":3}
+  > SELECT _FUNC_(CAST(NULL AS struct), "c", 3);
+   {"a":null,"b":null,"c":3}
+  > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 
'b', 2, 'b', 3)) AS T(a));
+   {"a":1,"b":100,"b":100}
+  > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES 
(NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+   {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class AddFields(children: Seq[Expression]) extends Unevaluable {
+
+  private lazy val ogStructExpr = children.head
+  private lazy val ogStructType = 
ogStructExpr.dataType.asInstanceOf[StructType]
+  private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map {
+case Seq(name, value) => (name, value)
+  }.toList.unzip
+  private lazy val names = nameExprs.map(e => 
Option(e.eval()).map(_.toString).orNull)
+  private lazy val addOrReplaceExprs = names.zip(valExprs)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expectedStructType = StructType(Nil).typeName
+if (children.size % 2 == 0) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName expects an odd number of 
arguments.")
+} else if (ogStructExpr.dataType.typeName != expectedStructType) {
+  TypeCheckResult.TypeCheckFailure(
+s"Only $expectedStructType is allowed to appear at first position, 
got: " +
+  s"${ogStructExpr.dataType.typeName}.")
+} else if (names.contains(null)) {
+  TypeCheckResult.TypeCheckFailure("Field name should not be null.")
+} else if (nameExprs.exists(e => !(e.foldable && e.dataType == 
StringType))) {

Review comment:
   We could but where do you recommend putting the function? I don't think 
it's worth the characters saved to create a shared Utilities object just for 
this. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412621357



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] 
implemented for representing a non-ISO week of week-based-year and a newly 
added `isodow` from PostgreSQL for ISO week of week-based-year. Many other 
systems only have one week-numbering system support and use either full names 
or abbreviations. Things in spark become a little bit complicated. 
   1. because of the existence of `isodow`, so we need to add iso-prefix to 
`dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and 
`isodow`] 
   2. because there are rare `iso`-prefixed systems and more systems choose 
`iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, 
`dowiso`]
   3. `dayofweekiso` looks nice and has use cases in the platforms listed 
above, e.g.

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412621357



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] 
implemented for representing a non-ISO week of week-based-year and a newly 
added `isodow` from PostgreSQL for ISO week of week-based-year. Many other 
systems only have one week-numbering system support and use either full names 
or abbreviations. Things in spark become a little bit complicated. 
   1. because of the existence of `isodow`, so we need to add iso-prefix to 
`dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and 
`isodow`] 
   2. because there are rare `iso`-prefixed systems and more systems choose 
`iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, 
`dowiso`]
   3. `dayofweekiso` looks nice and has use cases in the platforms listed 
above, e.g.

[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class

2020-04-21 Thread GitBox



fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r412623176



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces fields in struct by name.
+ *
+ * @param children Seq(struct, name1, val1, name2, val2, ...)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces 
fields in struct by name.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+   {"a":1,"b":2,"c":3}
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+   {"a":1,"b":3}
+  > SELECT _FUNC_(CAST(NULL AS struct), "c", 3);
+   {"a":null,"b":null,"c":3}
+  > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 
'b', 2, 'b', 3)) AS T(a));
+   {"a":1,"b":100,"b":100}
+  > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES 
(NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+   {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class AddFields(children: Seq[Expression]) extends Unevaluable {
+
+  private lazy val ogStructExpr = children.head
+  private lazy val ogStructType = 
ogStructExpr.dataType.asInstanceOf[StructType]
+  private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map {
+case Seq(name, value) => (name, value)
+  }.toList.unzip
+  private lazy val names = nameExprs.map(e => 
Option(e.eval()).map(_.toString).orNull)
+  private lazy val addOrReplaceExprs = names.zip(valExprs)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expectedStructType = StructType(Nil).typeName
+if (children.size % 2 == 0) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName expects an odd number of 
arguments.")
+} else if (ogStructExpr.dataType.typeName != expectedStructType) {
+  TypeCheckResult.TypeCheckFailure(
+s"Only $expectedStructType is allowed to appear at first position, 
got: " +
+  s"${ogStructExpr.dataType.typeName}.")
+} else if (names.contains(null)) {
+  TypeCheckResult.TypeCheckFailure("Field name should not be null.")
+} else if (nameExprs.exists(e => !(e.foldable && e.dataType == 
StringType))) {
+  TypeCheckResult.TypeCheckFailure(
+s"Only foldable ${StringType.catalogString} expressions are allowed to 
appear at even " +
+  "position.")
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def dataType: StructType = {
+val ogStructFields = ogStructType.fields.map { f =>
+  (f.name, f.copy(nullable = ogStructExpr.nullable || f.nullable))
+}
+val addOrReplaceStructFields = addOrReplaceExprs.map { case (name, expr) =>
+  (name, StructField(name, expr.dataType, expr.nullable))
+}
+val newStructFields = addOrReplace(ogStructFields, 
addOrReplaceStructFields).map(_._2)
+StructType(newStructFields)
+  }
+
+  override def foldable: Boolean = children.forall(_.foldable)

Review comment:
   I understand your point about `nameExprs`. What about the original 
`structExpr` though? 
   i.e. shouldn't it be `override def foldable: Boolean = structExpr.foldable 
&& valExprs.forall(_.foldable)`





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class

2020-04-21 Thread GitBox



fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r412622174



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces fields in struct by name.
+ *
+ * @param children Seq(struct, name1, val1, name2, val2, ...)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces 
fields in struct by name.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+   {"a":1,"b":2,"c":3}
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+   {"a":1,"b":3}
+  > SELECT _FUNC_(CAST(NULL AS struct), "c", 3);
+   {"a":null,"b":null,"c":3}
+  > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 
'b', 2, 'b', 3)) AS T(a));
+   {"a":1,"b":100,"b":100}
+  > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES 
(NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+   {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class AddFields(children: Seq[Expression]) extends Unevaluable {
+
+  private lazy val ogStructExpr = children.head
+  private lazy val ogStructType = 
ogStructExpr.dataType.asInstanceOf[StructType]
+  private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map {
+case Seq(name, value) => (name, value)
+  }.toList.unzip
+  private lazy val names = nameExprs.map(e => 
Option(e.eval()).map(_.toString).orNull)
+  private lazy val addOrReplaceExprs = names.zip(valExprs)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expectedStructType = StructType(Nil).typeName
+if (children.size % 2 == 0) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName expects an odd number of 
arguments.")
+} else if (ogStructExpr.dataType.typeName != expectedStructType) {
+  TypeCheckResult.TypeCheckFailure(
+s"Only $expectedStructType is allowed to appear at first position, 
got: " +
+  s"${ogStructExpr.dataType.typeName}.")
+} else if (names.contains(null)) {
+  TypeCheckResult.TypeCheckFailure("Field name should not be null.")
+} else if (nameExprs.exists(e => !(e.foldable && e.dataType == 
StringType))) {
+  TypeCheckResult.TypeCheckFailure(
+s"Only foldable ${StringType.catalogString} expressions are allowed to 
appear at even " +
+  "position.")
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def dataType: StructType = {
+val ogStructFields = ogStructType.fields.map { f =>
+  (f.name, f.copy(nullable = ogStructExpr.nullable || f.nullable))
+}
+val addOrReplaceStructFields = addOrReplaceExprs.map { case (name, expr) =>
+  (name, StructField(name, expr.dataType, expr.nullable))
+}
+val newStructFields = addOrReplace(ogStructFields, 
addOrReplaceStructFields).map(_._2)
+StructType(newStructFields)
+  }
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = false
+
+  override def prettyName: String = "add_fields"
+
+  def toCreateNamedStruct: CreateNamedStruct = {
+val ogStructExprs = ogStructType.fieldNames.zipWithIndex.map {
+  case (name, i) => (name, GetStructField(ogStructExpr, i, Some(name)))
+}
+val newStructExprs = addOrReplace(ogStructExprs, 
addOrReplaceExprs).flatMap {
+  case (name, valExpr) => Seq(Literal(name), valExpr)
+}
+CreateNamedStruct(newStructExprs)
+  }
+
+  def addOrReplace[T](
+existingFields: Seq[(String, T)],
+addOrReplaceFields: Seq[(String, T)]): Seq[(String, T)] = {
+
+addOrReplaceFields.foldLeft(existingFields) { case (newFields, field @ 
(fieldName, _)) =>
+  if (newFields.exists { case (name, _) => name == fieldName }) {

Review comment:
   Unfortunately, the existing struct might contain multiple fields with 
the same name (there is a test case for this scenario). As a result, I don't 
think this can be rewritten in a way that is meaningfully simpler using a map. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412621357



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] 
implemented for representing a non-ISO week of week-based-year and a newly 
added `isodow` from PostgreSQL for ISO week of week-based-year. Many other 
systems only have one week-numbering system support. Things in spark become a 
little bit complicated. 
   1. because of the existence of `isodow`, so we need to add iso-prefix to 
`dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and 
`isodow`] 
   2. because there are rare `iso`-prefixed systems and more systems choose 
`iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, 
`dowiso`]
   3. `dayofweekiso` looks nice and has use cases in the platforms listed 
above, e.g. snowflake, but `dowiso` looks weird and no use

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412621357



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow` 
implemented for representing a non-ISO week of week-based-year and a newly 
added `isodow` for ISO week of week-based-year. Many other systems do not only 
have one week-numbering system support. Things in spark become a little bit 
complicated. 
   1. because of the existence of `isodow`, so we need to add iso-prefix to 
`dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and 
`isodow`] 
   2. because there are rare `iso`-prefixed systems and more systems choose 
`iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, 
`dowiso`]
   3. `dayofweekiso` looks nice and has use cases in the platforms listed 
above, e.g. snowflake, but `dowiso` looks weird and no use cases

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412621357



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] 
implemented for representing a non-ISO week of week-based-year and a newly 
added `isodow` from PostgreSQL for ISO week of week-based-year. Many other 
systems do not only have one week-numbering system support. Things in spark 
become a little bit complicated. 
   1. because of the existence of `isodow`, so we need to add iso-prefix to 
`dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and 
`isodow`] 
   2. because there are rare `iso`-prefixed systems and more systems choose 
`iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, 
`dowiso`]
   3. `dayofweekiso` looks nice and has use cases in the platforms listed 
above, e.g. snowflake, but `dowiso` looks weird and

[GitHub] [spark] yaooqinn commented on a change in pull request #28284: [SPARK-31507][SQL] Remove uncommon fields support and update some fields with meaningful names for extract function

2020-04-21 Thread GitBox



yaooqinn commented on a change in pull request #28284:
URL: https://github.com/apache/spark/pull/28284#discussion_r412621357



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -2033,108 +1991,26 @@ case class MakeTimestamp(
   override def prettyName: String = "make_timestamp"
 }
 
-case class Millennium(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getMillennium(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getMillennium($c)")
-  }
-}
-
-case class Century(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getCentury(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getCentury($c)")
-  }
-}
-
-case class Decade(child: Expression) extends UnaryExpression with 
ImplicitCastInputTypes {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType)
-
-  override def dataType: DataType = IntegerType
-
-  override protected def nullSafeEval(date: Any): Any = {
-DateTimeUtils.getDecade(date.asInstanceOf[Int])
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
-  }
-}
-
-case class Epoch(child: Expression, timeZoneId: Option[String] = None)
-extends UnaryExpression with ImplicitCastInputTypes with 
TimeZoneAwareExpression {
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
-  // DecimalType is used to not lose precision while converting microseconds to
-  // the fractional part of seconds. Scale 6 is taken to have all microseconds 
as
-  // the fraction. The precision 20 should cover whole valid range of years 
[1, ]
-  // plus negative years that can be used in some cases though are not 
officially supported.
-  override def dataType: DataType = DecimalType(20, 6)
-  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression =
-copy(timeZoneId = Option(timeZoneId))
-
-  override protected def nullSafeEval(timestamp: Any): Any = {
-DateTimeUtils.getEpoch(timestamp.asInstanceOf[Long], zoneId)
-  }
-
-  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
-val dtu = DateTimeUtils.getClass.getName.stripSuffix("$")
-val zid = ctx.addReferenceObj("zoneId", zoneId, classOf[ZoneId].getName)
-defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
-  }
-}
-
 object DatePart {
 
   def parseExtractField(
   extractField: String,
   source: Expression,
   errorHandleFunc: => Nothing): Expression = 
extractField.toUpperCase(Locale.ROOT) match {
-case "MILLENNIUM" | "MILLENNIA" | "MIL" | "MILS" => Millennium(source)
-case "CENTURY" | "CENTURIES" | "C" | "CENT" => Century(source)
-case "DECADE" | "DECADES" | "DEC" | "DECS" => Decade(source)
 case "YEAR" | "Y" | "YEARS" | "YR" | "YRS" => Year(source)
-case "ISOYEAR" => IsoYear(source)
+case "YEAROFWEEK" => YearOfWeek(source)
 case "QUARTER" | "QTR" => Quarter(source)
 case "MONTH" | "MON" | "MONS" | "MONTHS" => Month(source)
 case "WEEK" | "W" | "WEEKS" => WeekOfYear(source)
 case "DAY" | "D" | "DAYS" => DayOfMonth(source)
 case "DAYOFWEEK" | "DOW" => DayOfWeek(source)
-case "ISODOW" => Add(WeekDay(source), Literal(1))
+case "DAYOFWEEK_ISO" | "DOW_ISO" => Add(WeekDay(source), Literal(1))

Review comment:
   Hi @dongjoon-hyun, for historical reasons, we have [`dayofweek`, `dow`] 
implemented for representing a non-ISO week of week-based-year and a newly 
added `isodow` for ISO week of week-based-year. Many other systems do not only 
have one week-numbering system support. Things in spark become a little bit 
complicated. 
   1. because of the existence of `isodow`, so we need to add iso-prefix to 
`dayofweek` to make a pair for it too. [`dayofweek`, `isodayofweek`, `dow` and 
`isodow`] 
   2. because there are rare `iso`-prefixed systems and more systems choose 
`iso`-suffixed way, so we may result in [`dayofweek`, `dayofweekiso`, `dow`, 
`dowiso`]
   3. `dayofweekiso` looks nice and has use cases in the platforms listed 
above, e.g. snowflake, but `dowiso` looks weird and no use cases

[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class

2020-04-21 Thread GitBox



fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r412620131



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces fields in struct by name.
+ *
+ * @param children Seq(struct, name1, val1, name2, val2, ...)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces 
fields in struct by name.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+   {"a":1,"b":2,"c":3}
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+   {"a":1,"b":3}
+  > SELECT _FUNC_(CAST(NULL AS struct), "c", 3);
+   {"a":null,"b":null,"c":3}
+  > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 
'b', 2, 'b', 3)) AS T(a));
+   {"a":1,"b":100,"b":100}
+  > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES 
(NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+   {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class AddFields(children: Seq[Expression]) extends Unevaluable {
+
+  private lazy val ogStructExpr = children.head
+  private lazy val ogStructType = 
ogStructExpr.dataType.asInstanceOf[StructType]
+  private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map {
+case Seq(name, value) => (name, value)
+  }.toList.unzip
+  private lazy val names = nameExprs.map(e => 
Option(e.eval()).map(_.toString).orNull)
+  private lazy val addOrReplaceExprs = names.zip(valExprs)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val expectedStructType = StructType(Nil).typeName
+if (children.size % 2 == 0) {
+  TypeCheckResult.TypeCheckFailure(s"$prettyName expects an odd number of 
arguments.")
+} else if (ogStructExpr.dataType.typeName != expectedStructType) {
+  TypeCheckResult.TypeCheckFailure(
+s"Only $expectedStructType is allowed to appear at first position, 
got: " +
+  s"${ogStructExpr.dataType.typeName}.")
+} else if (names.contains(null)) {
+  TypeCheckResult.TypeCheckFailure("Field name should not be null.")
+} else if (nameExprs.exists(e => !(e.foldable && e.dataType == 
StringType))) {
+  TypeCheckResult.TypeCheckFailure(
+s"Only foldable ${StringType.catalogString} expressions are allowed to 
appear at even " +
+  "position.")
+} else {
+  TypeCheckResult.TypeCheckSuccess
+}
+  }
+
+  override def dataType: StructType = {
+val ogStructFields = ogStructType.fields.map { f =>
+  (f.name, f.copy(nullable = ogStructExpr.nullable || f.nullable))
+}
+val addOrReplaceStructFields = addOrReplaceExprs.map { case (name, expr) =>
+  (name, StructField(name, expr.dataType, expr.nullable))
+}
+val newStructFields = addOrReplace(ogStructFields, 
addOrReplaceStructFields).map(_._2)
+StructType(newStructFields)

Review comment:
   done. 

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces fields in struct by name.
+ *
+ * @param children Seq(struct, name1, val1, name2, val2, ...)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces 
fields in struct by name.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+   {"a":1,"b":2,"c":3}
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+   {"a":1,"b":3}
+  > SELECT _FUNC_(CAST(NULL AS struct), "c", 3);
+   {"a":null,"b":null,"c":3}
+  > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 
'b', 2, 'b', 3)) AS T(a));
+   {"a":1,"b":100,"b":100}
+  > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES 
(NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+   {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class AddFields(children: Seq[Expression]) extends Unevaluable {
+
+  private lazy val ogStructExpr = children.head
+  private lazy val ogStructType = 
ogStructExpr.dataType.asInstanceOf[StructType]
+  private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map {
+case Seq(name, value) => (name, value)
+  }.toList.unzip
+  private lazy val names = nameExprs.map(e => 
Option(e.eval()).map(_.toString).orNull)
+  private lazy val addOrReplaceExprs = names.zip(valExprs)
+
+  override def checkInputDataTypes():

[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class

2020-04-21 Thread GitBox



fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r412619800



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces fields in struct by name.
+ *
+ * @param children Seq(struct, name1, val1, name2, val2, ...)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces 
fields in struct by name.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+   {"a":1,"b":2,"c":3}
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+   {"a":1,"b":3}
+  > SELECT _FUNC_(CAST(NULL AS struct), "c", 3);
+   {"a":null,"b":null,"c":3}
+  > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 
'b', 2, 'b', 3)) AS T(a));
+   {"a":1,"b":100,"b":100}
+  > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES 
(NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+   {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class AddFields(children: Seq[Expression]) extends Unevaluable {
+
+  private lazy val ogStructExpr = children.head

Review comment:
   og = original. Since this isn't as obvious as I thought, I've changed 
the variable name to be clearer. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #28255: [SPARK-31484][Core] Add stage attempt number to temp checkpoint filename to avoid file already existing exception

2020-04-21 Thread GitBox



cloud-fan commented on a change in pull request #28255:
URL: https://github.com/apache/spark/pull/28255#discussion_r412619438



##
File path: core/src/main/scala/org/apache/spark/rdd/ReliableCheckpointRDD.scala
##
@@ -199,8 +199,8 @@ private[spark] object ReliableCheckpointRDD extends Logging 
{
 
 val finalOutputName = 
ReliableCheckpointRDD.checkpointFileName(ctx.partitionId())
 val finalOutputPath = new Path(outputDir, finalOutputName)
-val tempOutputPath =
-  new Path(outputDir, s".$finalOutputName-attempt-${ctx.attemptNumber()}")
+val tempOutputPath = new Path(outputDir,
+  
s".$finalOutputName-attempt-${ctx.stageAttemptNumber()}-${ctx.attemptNumber()}")

Review comment:
   If we just want a unique file name, can we use the task id? It's unique 
within the Spark application.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class

2020-04-21 Thread GitBox



fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r412619541



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces fields in struct by name.
+ *
+ * @param children Seq(struct, name1, val1, name2, val2, ...)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces 
fields in struct by name.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+   {"a":1,"b":2,"c":3}
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+   {"a":1,"b":3}
+  > SELECT _FUNC_(CAST(NULL AS struct), "c", 3);
+   {"a":null,"b":null,"c":3}
+  > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 
'b', 2, 'b', 3)) AS T(a));
+   {"a":1,"b":100,"b":100}
+  > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES 
(NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+   {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class AddFields(children: Seq[Expression]) extends Unevaluable {
+
+  private lazy val ogStructExpr = children.head
+  private lazy val ogStructType = 
ogStructExpr.dataType.asInstanceOf[StructType]
+  private lazy val (nameExprs, valExprs) = children.drop(1).grouped(2).map {
+case Seq(name, value) => (name, value)
+  }.toList.unzip
+  private lazy val names = nameExprs.map(e => 
Option(e.eval()).map(_.toString).orNull)

Review comment:
   Your suggestion will return a `Seq[Any]`. We need `Seq[String]` for the 
code to compile. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] fqaiser94 commented on a change in pull request #27066: [SPARK-31317][SQL] Add withField method to Column class

2020-04-21 Thread GitBox



fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r412619329



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##
@@ -514,3 +514,98 @@ case class StringToMap(text: Expression, pairDelim: 
Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces fields in struct by name.
+ *
+ * @param children Seq(struct, name1, val1, name2, val2, ...)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name1, val1, name2, val2, ...) - Adds/replaces 
fields in struct by name.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+   {"a":1,"b":2,"c":3}
+  > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+   {"a":1,"b":3}
+  > SELECT _FUNC_(CAST(NULL AS struct), "c", 3);
+   {"a":null,"b":null,"c":3}
+  > SELECT _FUNC_(a, 'b', 100) AS a FROM (VALUES (NAMED_STRUCT('a', 1, 
'b', 2, 'b', 3)) AS T(a));
+   {"a":1,"b":100,"b":100}
+  > SELECT _FUNC_(a, 'a', _FUNC_(a.a, 'c', 3)) AS a FROM (VALUES 
(NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+   {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class AddFields(children: Seq[Expression]) extends Unevaluable {

Review comment:
   You're right, makes more sense to call it `WithFields`. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27066: [SPARK-31317][SQL] Add withField method to Column class

2020-04-21 Thread GitBox



SparkQA commented on issue #27066:
URL: https://github.com/apache/spark/pull/27066#issuecomment-617507065


   **[Test build #121602 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121602/testReport)**
 for PR 27066 at commit 
[`fba3bd8`](https://github.com/apache/spark/commit/fba3bd89ae41f8603bd04da3649296fd8c97e312).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao commented on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions

2020-04-21 Thread GitBox



huaxingao commented on issue #28224:
URL: https://github.com/apache/spark/pull/28224#issuecomment-617506869


   Sounds good. I will fix this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu edited a comment on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions

2020-04-21 Thread GitBox



maropu edited a comment on issue #28224:
URL: https://github.com/apache/spark/pull/28224#issuecomment-617504762


   How about `Scala Functions` -> `Built-in Functions`, then adding a short 
description and a link to `sql-ref-functions.html#built-in-functions`?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu commented on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions

2020-04-21 Thread GitBox



maropu commented on issue #28224:
URL: https://github.com/apache/spark/pull/28224#issuecomment-617504762


   How about `Scala Functions` -> `Built-in Functions`, then adding a short 
description and a link to 
   `sql-ref-functions.html#built-in-functions`?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a change in pull request #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector

2020-04-21 Thread GitBox



HeartSaVioR commented on a change in pull request #28215:
URL: https://github.com/apache/spark/pull/28215#discussion_r412616088



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
##
@@ -21,15 +21,11 @@ import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
 import java.util.Properties
 
-import org.scalatest.Ignore
-
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, 
StructType}
 import org.apache.spark.tags.DockerTest
 
-
 @DockerTest
-@Ignore // AMPLab Jenkins needs to be updated before shared memory works on 
docker

Review comment:
   And it's removed in code diff as of now.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao commented on issue #28224: [SPARK-31429][SQL][DOC] Automatically generates a SQL document for built-in functions

2020-04-21 Thread GitBox



huaxingao commented on issue #28224:
URL: https://github.com/apache/spark/pull/28224#issuecomment-617502561


   I just saw this 
   
   https://user-images.githubusercontent.com/13592258/79932190-c26d5400-8401-11ea-92f8-aa688aa4e616.png;>
   
   Do we need this scalar function sections in getting started page? We either 
need to delete this section or write a short description and link it to the 
built-in functions page. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28264: [SPARK-31491][SQL][DOCS] Re-arrange Data Types page to document Floating Point Special Values

2020-04-21 Thread GitBox



HyukjinKwon commented on a change in pull request #28264:
URL: https://github.com/apache/spark/pull/28264#discussion_r412613571



##
File path: docs/sql-ref-datatypes.md
##
@@ -706,3 +708,53 @@ The following table shows the type names as well as 
aliases used in Spark SQL pa
 
 
 
+
+### Floating Point Special Values
+
+Spark SQL supports several special floating point values in a case-insensitive 
manner:
+
+ * Inf/+Inf/Infinity/+Infinity: positive infinity
+   * ```FloatType```: 1.0f / 0.0f, which is equal to the value returned by 
java.lang.Float.intBitsToFloat(0x7f80).
+   * ```DoubleType```: 1.0 / 0.0, which is equal to the value returned by 
java.lang.Double.longBitsToDouble(0x7ff0L).
+ * -Inf/-Infinity: negative infinity
+   * ```FloatType```: -1.0f / 0.0f, which is equal to the value returned by 
java.lang.Float.intBitsToFloat(0xff80).
+   * ```DoubleType```: -1.0 / 0.0, which is equal to the value returned by 
java.lang.Double.longBitsToDouble(0xfff0L).
+ * NaN: not a number
+   * ```FloatType```: 0.0f / 0.0f, which is equivalent to the value returned 
by java.lang.Float.intBitsToFloat(0x7fc0).
+   * ```DoubleType```:  0.0d / 0.0, which is equivalent to the value returned 
by java.lang.Double.longBitsToDouble(0x7ff8L).
+
+ Examples
+
+{% highlight sql %}
+SELECT double('infinity');
+  ++

Review comment:
   Yeah, let's remove two leading spaces just to be consistent with other 
DBMS's doc references.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28260:
URL: https://github.com/apache/spark/pull/28260#issuecomment-617501069







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28260:
URL: https://github.com/apache/spark/pull/28260#issuecomment-617501069







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28260: [SPARK-31487][CORE] Move slots check of barrier job from DAGScheduler to TaskSchedulerImpl

2020-04-21 Thread GitBox



SparkQA commented on issue #28260:
URL: https://github.com/apache/spark/pull/28260#issuecomment-617500691


   **[Test build #121601 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121601/testReport)**
 for PR 28260 at commit 
[`3b98b86`](https://github.com/apache/spark/commit/3b98b86a500c611ea0014b0dafa7a99ce70064c9).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28271: [SPARK-31495][SQL] Support formatted explain for AQE

2020-04-21 Thread GitBox



SparkQA commented on issue #28271:
URL: https://github.com/apache/spark/pull/28271#issuecomment-617500658


   **[Test build #121600 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121600/testReport)**
 for PR 28271 at commit 
[`1228d2e`](https://github.com/apache/spark/commit/1228d2e439197eee986473f7c9af82e86d11a449).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-617499480







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-617499480







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #28271: [SPARK-31495][SQL] Support formatted explain for AQE

2020-04-21 Thread GitBox



AmplabJenkins commented on issue #28271:
URL: https://github.com/apache/spark/pull/28271#issuecomment-617498892







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-04-21 Thread GitBox



SparkQA removed a comment on issue #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-617421177


   **[Test build #121597 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121597/testReport)**
 for PR 28208 at commit 
[`7792741`](https://github.com/apache/spark/commit/77927418e0ffcd6872f698d9a0f2a3246ec5dd45).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #28271: [SPARK-31495][SQL] Support formatted explain for AQE

2020-04-21 Thread GitBox



AmplabJenkins removed a comment on issue #28271:
URL: https://github.com/apache/spark/pull/28271#issuecomment-617498892







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #28208: [SPARK-31440][SQL] Improve SQL Rest API

2020-04-21 Thread GitBox



SparkQA commented on issue #28208:
URL: https://github.com/apache/spark/pull/28208#issuecomment-617498868


   **[Test build #121597 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121597/testReport)**
 for PR 28208 at commit 
[`7792741`](https://github.com/apache/spark/commit/77927418e0ffcd6872f698d9a0f2a3246ec5dd45).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] venkata91 commented on issue #28287: [SPARK-31418][SCHEDULER] Request more executors in case of dynamic allocation is enabled and a task becomes unschedulable due to spark's blacklist

2020-04-21 Thread GitBox



venkata91 commented on issue #28287:
URL: https://github.com/apache/spark/pull/28287#issuecomment-617497333


   Can you please review @squito @tgravescs @mridulm ?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on a change in pull request #28215: [SPARK-31272][SQL] Support DB2 Kerberos login in JDBC connector

2020-04-21 Thread GitBox



HeartSaVioR commented on a change in pull request #28215:
URL: https://github.com/apache/spark/pull/28215#discussion_r412606230



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
##
@@ -21,15 +21,11 @@ import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
 import java.util.Properties
 
-import org.scalatest.Ignore
-
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, 
StructType}
 import org.apache.spark.tags.DockerTest
 
-
 @DockerTest
-@Ignore // AMPLab Jenkins needs to be updated before shared memory works on 
docker

Review comment:
   Yes, as other tests don't have this. It's not a kind of "should be", but 
"can be".





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on issue #28283: [BRANCH-3.0][SPARK-31344][CORE] Polish implementation of barrier() and allGather()

2020-04-21 Thread GitBox



Ngone51 commented on issue #28283:
URL: https://github.com/apache/spark/pull/28283#issuecomment-617493792


   Sure, I'll follow it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 >

1 - 100 of 515 matches

Mail list logo