[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23214 Maybe add some detailed test results in the description, and explain the reason for this in a code comment? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23152 retest this please
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23152 @juliuszsompolski I have updated it accordingly, thanks!
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237808846

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2276,4 +2276,16 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+
+  test("SPARK-26181 hasMinMaxStats method of ColumnStatsMap is not correct") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("all_null") {
+        sql("create table all_null (attrInt int)")
+        sql("insert into all_null values (null)")
+        sql("analyze table all_null compute statistics for columns attrInt")
+        checkAnswer(sql("select * from all_null where attrInt < 1"), Nil)

--- End diff --

I checked and it failed without this patch...
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237808676

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -2276,4 +2276,16 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
     }
   }
+
+  test("SPARK-26181 hasMinMaxStats method of ColumnStatsMap is not correct") {
+    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      withTable("all_null") {
+        sql("create table all_null (attrInt int)")
+        sql("insert into all_null values (null)")
+        sql("analyze table all_null compute statistics for columns attrInt")
+        checkAnswer(sql("select * from all_null where attrInt < 1"), Nil)

--- End diff --

Actually, when I remove the code change from the source and rerun this test, an exception is thrown.
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237735460

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
@@ -879,13 +879,13 @@ case class ColumnStatsMap(originalMap: AttributeMap[ColumnStat]) {
   }

   def hasCountStats(a: Attribute): Boolean =
-    get(a).map(_.hasCountStats).getOrElse(false)
+    get(a).exists(_.hasCountStats)

   def hasDistinctCount(a: Attribute): Boolean =
-    get(a).map(_.distinctCount.isDefined).getOrElse(false)
+    get(a).exists(_.distinctCount.isDefined)

   def hasMinMaxStats(a: Attribute): Boolean =
-    get(a).map(_.hasCountStats).getOrElse(false)
+    get(a).exists(_.hasMinMaxStats)

--- End diff --

sure, thank you all
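The one-line copy-paste bug this diff fixes — `hasMinMaxStats` delegating to `hasCountStats` — can be sketched outside Spark. The Python below is a hypothetical, simplified model of `ColumnStat`/`ColumnStatsMap` (names and fields are illustrative, not Spark's actual API); an all-null column has count stats but no min/max, which is exactly the case the buggy version got wrong.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ColumnStat:
    # Illustrative fields only; Spark's ColumnStat carries more.
    distinct_count: Optional[int] = None
    null_count: Optional[int] = None
    min_value: Optional[int] = None
    max_value: Optional[int] = None

    @property
    def has_count_stats(self) -> bool:
        return self.distinct_count is not None and self.null_count is not None

    @property
    def has_min_max_stats(self) -> bool:
        return self.min_value is not None and self.max_value is not None

class ColumnStatsMap:
    def __init__(self, stats: dict):
        self._stats = stats  # attribute name -> ColumnStat

    def has_min_max_stats(self, attr: str) -> bool:
        stat = self._stats.get(attr)
        # Mirrors Scala's get(a).exists(_.hasMinMaxStats): False when absent.
        # The bug: this used to check has_count_stats instead.
        return stat is not None and stat.has_min_max_stats

# An all-null column: count stats exist, but min/max do not.
all_null = ColumnStat(distinct_count=0, null_count=1)
m = ColumnStatsMap({"attrInt": all_null})
print(m.has_min_max_stats("attrInt"))  # False; the buggy version returned True
```

The `all_null` case matches the regression test in this PR: after `analyze table` on a table containing only NULLs, the column has count stats but no min/max, so filter estimation must not assume min/max exist.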
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237380128

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
@@ -821,6 +822,32 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
     expectedRowCount = 3)
   }

+  test("ColumnStatsMap tests") {

--- End diff --

There's no public call to the `evaluateBinary` method, actually.
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/23152#discussion_r237359357

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala ---
@@ -821,6 +822,32 @@ class FilterEstimationSuite extends StatsEstimationTestBase {
     expectedRowCount = 3)
   }

+  test("ColumnStatsMap tests") {

--- End diff --

Hi @viirya, thanks for your review. Here `FilterEstimation` works because we only call `FilterEstimation.estimate` to check it, which first checks whether `rowCount.isEmpty` and short-circuits the estimation. If `rowCount` is not empty, we should always have a `min` and `max` in the stats, unless they are malformed.
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23152 @srowen
[GitHub] spark issue #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of `Colum...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/23152 retest this please.
[GitHub] spark pull request #23152: [SPARK-26181][SQL] the `hasMinMaxStats` method of...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/23152 [SPARK-26181][SQL] the `hasMinMaxStats` method of `ColumnStatsMap` is not correct

## What changes were proposed in this pull request?

Currently, `hasMinMaxStats` returns the same result as `hasCountStats`, which is obviously not as expected.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark minmaxstats

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23152.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23152

commit 1c1edccc43ac9cdb6da53ad93ac998b06bf8b6dd
Author: Daoyuan Wang
Date: 2018-11-27T06:18:34Z

    fix ColumnStatsMap.hasMinMaxStats
[GitHub] spark pull request #21282: [SPARK-23934][SQL] Adding map_from_entries functi...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/21282#discussion_r187291692

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -118,6 +120,229 @@ case class MapValues(child: Expression)
   override def prettyName: String = "map_values"
 }

+/**
+ * Returns a map created from the given array of entries.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(arrayOfEntries) - Returns a map created from the given array of entries.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));
+       {1:"a",2:"b"}
+  """,
+  since = "2.4.0")
+case class MapFromEntries(child: Expression) extends UnaryExpression
+{

--- End diff --

Nit: style
[GitHub] spark pull request #21227: Backport [SPARK-24133][SQL] Check for integer ove...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/21227#discussion_r185801594

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -81,7 +81,9 @@ public void close() {
   }

   public void reserve(int requiredCapacity) {
-    if (requiredCapacity > capacity) {
+    if (requiredCapacity < 0) {
+      throwUnsupportedException(requiredCapacity, null);

--- End diff --

thanks for the explanation!
[GitHub] spark pull request #21227: [SPARK-24133][SQL] Check for integer overflows wh...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/21227#discussion_r185768453

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -81,7 +81,9 @@ public void close() {
   }

   public void reserve(int requiredCapacity) {
-    if (requiredCapacity > capacity) {
+    if (requiredCapacity < 0) {
+      throwUnsupportedException(requiredCapacity, null);

--- End diff --

how about using `ArithmeticException` instead of null?
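For context on why `reserve` needs a `requiredCapacity < 0` guard: callers grow capacity roughly by doubling, and in Java's 32-bit arithmetic the doubled value can wrap negative. A minimal Python sketch, simulating Java `int` wraparound (the `reserve` helper is illustrative, not the actual `WritableColumnVector` logic):

```python
INT_MAX = 2**31 - 1

def to_int32(x: int) -> int:
    """Wrap a Python int to Java 32-bit int semantics."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

def reserve(capacity: int, required: int) -> int:
    if required < 0:
        # Corresponds to the new throwUnsupportedException(requiredCapacity, ...) path.
        raise OverflowError(f"cannot reserve {required} elements")
    if required > capacity:
        capacity = max(required, to_int32(capacity * 2))
    return capacity

# Doubling a capacity just above INT_MAX / 2 wraps negative in Java ints:
print(to_int32((INT_MAX // 2 + 2) * 2))  # negative: the overflow the guard catches
```

Without the guard, a wrapped-negative `requiredCapacity` would silently pass the old `requiredCapacity > capacity` check as false and leave the vector under-allocated.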
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94798265

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+  """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+    } else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+    } else
+      TypeCheckResult.TypeCheckSuccess
+  }
+
+  override def dataType: DataType = IntegerType
+
+  override def eval(input: InternalRow): Any = {
+    val target = children.head.eval(input)
+    val targetDataType = children.head.dataType
+    def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+      params.toList match {
+        case Nil => 0
+        case head::tail if targetDataType == head.dataType
+          && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+        case _ => findEqual(target, params.tail, index + 1)
+      }
+    }
+    if(target == null)
+      0
+    else
+      findEqual(target, children.tail, 1)
+  }
+
+  protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val evalChildren = children.map(_.genCode(ctx))
+    val target = evalChildren(0)
+    val targetDataType = children(0).dataType
+    val rest = evalChildren.drop(1)
+    val restDataType = children.drop(1).map(_.dataType)
+
+    def updateEval(evalWithIndex: Tuple2[Tuple2[ExprCode, DataType], Int]): String = {
+      val ((eval, dataType), index) = evalWithIndex
+      s"""
+        ${eval.code}
+        if (${dataType.equals(targetDataType)}
+          && ${ctx.genEqual(targetDataType, eval.value, target.value)}) {
+          ${ev.value} = ${index};
+        }
+      """
+    }
+
+    def genIfElseStructure(code1: String, code2: String): String = {
+      s"""
+        ${code1}
+        else {
+          ${code2}
+        }
+      """
+    }
+
+    def dataTypeEqualsTarget(evalWithIndex: Tuple2[Tuple2[ExprCode, DataType], Int]): Boolean = {

--- End diff --

`def dataTypeEqualsTarget(evalWithIndex: ((ExprCode, DataType), Int)): Boolean`

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94797482

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+  """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+    } else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+    } else
+      TypeCheckResult.TypeCheckSuccess
+  }
+
+  override def dataType: DataType = IntegerType
+
+  override def eval(input: InternalRow): Any = {
+    val target = children.head.eval(input)
+    val targetDataType = children.head.dataType
+    def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+      params.toList match {
+        case Nil => 0
+        case head::tail if targetDataType == head.dataType
+          && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+        case _ => findEqual(target, params.tail, index + 1)
+      }
+    }
+    if(target == null)
+      0
+    else
+      findEqual(target, children.tail, 1)

--- End diff --

Nit: one more space before `findEqual`
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r94796481

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.

--- End diff --

Nit: delete a space before *
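For reference, the FIELD semantics under review in this thread — the 1-based index of the first argument among the rest, or 0 when not found (and 0 for a NULL target) — fit in a few lines. This Python version is a hypothetical model of the eval path only, ignoring the AtomicType checks and the codegen path:

```python
def field(target, *candidates):
    """1-based index of target among candidates; 0 if absent or target is None."""
    if target is None:
        return 0
    for i, c in enumerate(candidates, start=1):
        # NULL candidates never match, mirroring the head.eval(input) != null guard.
        if c is not None and c == target:
            return i
    return 0

print(field(10, 9, 3, 10, 4))  # 3, matching the SELECT _FUNC_(10, 9, 3, 10, 4) example
print(field(None, 1, 2))       # 0
```

Note the Scala original also requires each candidate's data type to equal the target's before comparing; this sketch assumes same-typed arguments.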
[GitHub] spark pull request #12391: [SPARK-14631][SQL]drop database cascade should dr...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/12391#discussion_r92030896

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala ---
@@ -778,6 +778,24 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
       "db2", "tbl1", Seq(part1.spec), ignoreIfNotExists = false, purge = false, retainData = false)
     assert(fs.exists(partPath))
   }
+
+  test("drop database cascade with function defined") {

--- End diff --

@gatorsmile Looks like it is not a problem now... I'll close this, thank you!
[GitHub] spark pull request #12391: [SPARK-14631][SQL]drop database cascade should dr...
Github user adrian-wang closed the pull request at: https://github.com/apache/spark/pull/12391
[GitHub] spark issue #12391: [SPARK-14631][SQL]drop database cascade should drop func...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/12391 I'll update this ASAP
[GitHub] spark issue #15011: [SPARK-17122][SQL]support drop current database
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/15011 Thanks for your kind review @cloud-fan @gatorsmile, this helps us enable BigBench on Spark 2.x
[GitHub] spark issue #15011: [SPARK-17122][SQL]support drop current database
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/15011 @hvanhovell I have checked with Hive and MySQL; they both support dropping the current database. Asking the user to switch to another database before dropping the current one is not enough: if multiple users are connected to the same metastore, then even if you are not using a certain database, someone else may be. What's more, if you want to drop a database but lack the privilege to access databases created by other users, you will always leave one empty database behind. In Spark's implementation, we ensure the database exists before we do anything, so dropping the current database is OK. This is also the approach other systems adopt.
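A minimal sketch of the behavior argued for above, assuming a toy catalog (not Spark's actual `SessionCatalog` API): dropping the current database succeeds, and the session then falls back to `default`, as this PR proposes.

```python
class ToyCatalog:
    """Hypothetical, much-simplified stand-in for a session catalog."""

    def __init__(self):
        self.databases = {"default", "mydb"}
        self.current = "mydb"

    def drop_database(self, name: str):
        if name not in self.databases:
            raise ValueError(f"database {name} does not exist")
        if name == "default":
            raise ValueError("cannot drop the default database")
        self.databases.remove(name)
        # The key behavior change: dropping the current database is allowed,
        # and the session falls back to "default".
        if self.current == name:
            self.current = "default"

catalog = ToyCatalog()
catalog.drop_database("mydb")
print(catalog.current)  # default
```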
[GitHub] spark issue #15011: [SPARK-17122][SQL]support drop current database
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/15011 @jameszhouyi @chenghao-intel
[GitHub] spark pull request #15011: [SPARK-17122][SQL]support drop current database
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/15011 [SPARK-17122][SQL] support drop current database

## What changes were proposed in this pull request?

In Spark 1.6 and earlier, we could drop the database we are using. In Spark 2.0, the native implementation prevents us from dropping the current database, which may break some old queries. This PR re-enables the feature and sets the current database to `default` instead.

## How was this patch tested?

One new unit test in `SessionCatalogSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark dropcurrent

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15011.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15011

commit 551b27252c0f5fb0de6d0958407cebac0b19ba64
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-09-08T06:56:32Z

    support drop current database
[GitHub] spark pull request #14991: [SPARK-17427][SQL] function SIZE should return -1...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/14991 [SPARK-17427][SQL] function SIZE should return -1 when parameter is null

## What changes were proposed in this pull request?

`select size(null)` returns -1 in Hive. In order to be compatible, we should also return `-1`.

## How was this patch tested?

Unit tests in `CollectionFunctionsSuite` and `DataFrameFunctionsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark size

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14991.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14991

commit 1ccbe6bd41b1e60ea62a157771d4b3ca37f8678f
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-09-07T04:52:58Z

    size(null)=-1
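The proposed null handling is tiny; here is a hedged Python sketch of the semantics (illustrative only, not Spark's actual `Size` expression): `size` of NULL yields -1 for Hive compatibility instead of propagating NULL.

```python
def size(collection):
    """size() semantics from SPARK-17427: -1 for NULL input, length otherwise."""
    return -1 if collection is None else len(collection)

print(size(None))       # -1, matching Hive's size(null)
print(size([1, 2, 3]))  # 3
print(size({"a": 1}))   # 1 (maps count entries)
```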
[GitHub] spark issue #14366: [SPARK-16732][SQL] Remove unused codes in subexpressionE...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14366 good catch!
[GitHub] spark pull request #14280: [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` o...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/14280#discussion_r71477889

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -64,14 +67,19 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
   import spark.implicits._

   test("script") {
-    val df = Seq(("x1", "y1", "z1"), ("x2", "y2", "z2")).toDF("c1", "c2", "c3")
-    df.createOrReplaceTempView("script_table")
-    val query1 = sql(
-      """
-        |SELECT col1 FROM (from(SELECT c1, c2, c3 FROM script_table) tempt_table
-        |REDUCE c1, c2, c3 USING 'bash src/test/resources/test_script.sh' AS
-        |(col1 STRING, col2 STRING)) script_test_table""".stripMargin)
-    checkAnswer(query1, Row("x1_y1") :: Row("x2_y2") :: Nil)
+    if (testCommandAvailable("bash") && testCommandAvailable("echo | awk '{print $0}'")) {
+      val df = Seq(("x1", "y1", "z1"), ("x2", "y2", "z2")).toDF("c1", "c2", "c3")
+      df.createOrReplaceTempView("script_table")
+      val query1 = sql(
+        """
+          |SELECT col1 FROM (from(SELECT c1, c2, c3 FROM script_table) tempt_table
+          |REDUCE c1, c2, c3 USING 'bash src/test/resources/test_script.sh' AS
+          |(col1 STRING, col2 STRING)) script_test_table""".stripMargin)
+      checkAnswer(query1, Row("x1_y1") :: Row("x2_y2") :: Nil)
+    }
+    else {

--- End diff --

Nit: unnecessary line break.
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 @rxin Only those script transformation cases which use LazySimpleSerde would be affected.
[GitHub] spark issue #14169: [SPARK-16515][SQL]set default record reader and writer f...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 @rxin In Spark 2.0, those conf values starting with "hive.", which have default values in HiveConf, can no longer resolve their defaults.
[GitHub] spark issue #14169: [WIP][SPARK-16515][SQL]set default record reader and wri...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 I have updated my code and switched to using bash as the test case. Hope it will work on Jenkins.
[GitHub] spark issue #14169: [WIP][SPARK-16515][SQL]set default record reader and wri...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/14169 This is strange, because I can pass this specific test locally.
[GitHub] spark pull request #14169: [SPARK-16515][SQL]set default record reader and w...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/14169 [SPARK-16515][SQL] set default record reader and writer for script transformation

## What changes were proposed in this pull request?

In `ScriptInputOutputSchema`, we read the default `RecordReader` and `RecordWriter` from conf. Since Spark 2.0 has deleted those config keys from the Hive conf, we have to set the default reader/writer class names ourselves.

## How was this patch tested?

Added a test case in SQLQuerySuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark script

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14169.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14169

commit 671a2ba1e1e19fe2d6115c42b59644e619138c84
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-07-12T13:29:08Z

    set default record reader and writer for script transformation
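A hypothetical sketch of the configuration fallback described in this PR: when the Hive config keys are no longer present in the conf, fall back to defaults hard-coded on the Spark side. The key names and Hive class names below follow Hive's conventions but should be treated as illustrative, not a verbatim copy of the patch.

```python
# Assumed defaults, modeled on Hive's TextRecordReader/TextRecordWriter.
DEFAULTS = {
    "hive.script.recordreader": "org.apache.hadoop.hive.ql.exec.TextRecordReader",
    "hive.script.recordwriter": "org.apache.hadoop.hive.ql.exec.TextRecordWriter",
}

def record_reader_class(conf: dict) -> str:
    # Before the fix: conf.get(...) returned nothing once the hive.* keys
    # were dropped. After: fall back to a code-side default.
    return conf.get("hive.script.recordreader", DEFAULTS["hive.script.recordreader"])

print(record_reader_class({}))  # falls back to the TextRecordReader default
print(record_reader_class({"hive.script.recordreader": "com.example.MyReader"}))
```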
[GitHub] spark pull request #14089: [SPARK-16415][SQL] fix catalog string error
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/14089 [SPARK-16415][SQL] fix catalog string error

## What changes were proposed in this pull request?

In #13537 we truncate `simpleString` if it is a long `StructType`. But sometimes we need `catalogString` to reconstruct `TypeInfo`, for example in the description of [SPARK-16415](https://issues.apache.org/jira/browse/SPARK-16415). So we need to keep the implementation of `catalogString` unaffected by the truncation.

## How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark catalogstring

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14089.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14089

commit dc81b385a68c842715b3377f15f7b3009e45f0ce
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-07-06T03:18:07Z

    fix catalog string

commit eb1018108a879a06701c3dff539ef8d10ab2b118
Author: Daoyuan Wang <daoyuan.w...@intel.com>
Date: 2016-07-07T11:41:39Z

    add a unit test
[GitHub] spark pull request #13783: [HOTFIX][SPARK-15613][SQL]Set test runtime timezo...
Github user adrian-wang closed the pull request at: https://github.com/apache/spark/pull/13783
[GitHub] spark issue #13783: [HOTFIX][SPARK-15613][SQL]Set test runtime timezone for ...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/13783 Closing, thx!
[GitHub] spark pull request #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_tim...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/13784#discussion_r67795207 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -886,23 +886,45 @@ object DateTimeUtils { } /** + * Convert the timestamp `ts` from one timezone to another. + * + * TODO: Because of DST, the conversion between UTC and human time is not exactly one-to-one + * mapping, the conversion here may return wrong result, we should make the timestamp + * timezone-aware. + */ + def convertTz(ts: SQLTimestamp, fromZone: TimeZone, toZone: TimeZone): SQLTimestamp = { +// We always use local timezone to parse or format a timestamp +val localZone = threadLocalLocalTimeZone.get() +val utcTs = if (fromZone.getID == localZone.getID) { + ts +} else { + // get the human time using local time zone, that actually is in fromZone. + val localTs = ts + localZone.getOffset(ts / 1000L) * 1000L // in fromZone + localTs - getOffsetFromLocalMillis(localTs / 1000L, fromZone) * 1000L +} +if (toZone.getID == localZone.getID) { + utcTs +} else { + val localTs2 = utcTs + toZone.getOffset(utcTs / 1000L) * 1000L // in toZone + // treat it as local timezone, convert to UTC (we could get the expected human time back) + localTs2 - getOffsetFromLocalMillis(localTs2 / 1000L, localZone) * 1000L +} + } + + /** * Returns a timestamp of given timezone from utc timestamp, with the same string * representation in their timezone. */ def fromUTCTime(time: SQLTimestamp, timeZone: String): SQLTimestamp = { -val tz = TimeZone.getTimeZone(timeZone) -val offset = tz.getOffset(time / 1000L) -time + offset * 1000L +convertTz(time, TimeZoneGMT, TimeZone.getTimeZone(timeZone)) --- End diff -- For `fromUTCTime`, this would result in a little bit overhead. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
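The `fromUTCTime` rewrite discussed in the diff above boils down to shifting an epoch-millis value by the target zone's offset so its wall-clock rendering matches that zone. A minimal sketch of just that case (hypothetical standalone code, not Spark's actual implementation, and subject to the same DST caveat the TODO in the diff mentions):

```java
import java.util.TimeZone;

public class TzShift {
    // Render a UTC epoch-millis value as wall-clock time in `tz` by adding
    // the zone offset at that instant. Around DST transitions the offset-based
    // mapping is not one-to-one, which is the caveat noted in the diff's TODO.
    public static long fromUtcTime(long utcMillis, String tz) {
        TimeZone zone = TimeZone.getTimeZone(tz);
        return utcMillis + zone.getOffset(utcMillis);
    }

    public static void main(String[] args) {
        // 1970-01-01 00:00:00 UTC renders as 08:00:00 wall-clock in GMT+8.
        System.out.println(fromUtcTime(0L, "GMT+08:00")); // prints 28800000
    }
}
```

The reviewer's point about overhead is that routing this simple one-offset shift through the general two-zone `convertTz` path adds extra offset lookups that the old direct implementation avoided.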
[GitHub] spark pull request #13783: [HOTFIX][SPARK-15613][SQL]Set test runtime timezo...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/13783 [HOTFIX][SPARK-15613][SQL]Set test runtime timezone for DateTimeUtilSuite ## What changes were proposed in this pull request? Without a default timezone set in DateTimeUtilsSuite, using the `Timestamp.valueOf` method could cause problems in a time zone different from PST. This PR fixes the failure reported in #13652 by @robbinspg and @lw-lin ## How was this patch tested? Existing tests cover this. cc @davies You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark timezone Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13783.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13783 commit f5cea66b3bada0afa07b73691753bdc9c983e86d Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-06-20T18:54:39Z set test runtime timezone for DateTimeUtilSuite
[GitHub] spark issue #13652: [SPARK-15613] [SQL] Fix incorrect days to millis convers...
Github user adrian-wang commented on the issue: https://github.com/apache/spark/pull/13652 @lw-lin @robbinspg I submitted a PR to fix the failure: #13783 . Thanks for the information!
[GitHub] spark pull request #13652: [SPARK-15613] [SQL] Fix incorrect days to millis ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/13652#discussion_r67289086 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -851,6 +851,29 @@ object DateTimeUtils { } /** + * Lookup the offset for given millis seconds since 1970-01-01 00:00:00 in a timezone. --- End diff -- since 1970-01-01 00:00:00 UTC or local timezone?
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220278544 retest this please.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-220250927 retest this please.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/13186 [SPARK-15397] [SQL] fix string udf locate as hive ## What changes were proposed in this pull request? In hive, `locate("aa", "aaa", 0)` yields 0, `locate("aa", "aaa", 1)` yields 1 and `locate("aa", "aaa", 2)` yields 2, while in Spark, `locate("aa", "aaa", 0)` yields 1, `locate("aa", "aaa", 1)` yields 2 and `locate("aa", "aaa", 2)` yields 0. This stems from a different interpretation of the third parameter of the `locate` UDF: it is a 1-based starting index, so passing 0 always returns 0. ## How was this patch tested? Tested with modified `StringExpressionsSuite` and `StringFunctionsSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark locate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13186.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13186 commit 23b43d4c837d762461dd56a62b85cb998919e0ef Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-05-18T11:30:07Z fix string udf locate as hive
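The Hive semantics described above (a 1-based starting index, with 0 always mapping to 0) can be sketched as a hypothetical standalone function — not Spark's or Hive's actual implementation:

```java
public class LocateDemo {
    // Hive-style locate: `start` is a 1-based index into `str`;
    // start <= 0 always yields 0, and "not found" also yields 0.
    public static int locate(String substr, String str, int start) {
        if (start <= 0) {
            return 0;
        }
        int idx = str.indexOf(substr, start - 1); // convert to 0-based
        return idx + 1; // back to 1-based; -1 (not found) becomes 0
    }

    public static void main(String[] args) {
        // Matches the Hive results quoted in the PR description.
        System.out.println(locate("aa", "aaa", 0)); // prints 0
        System.out.println(locate("aa", "aaa", 1)); // prints 1
        System.out.println(locate("aa", "aaa", 2)); // prints 2
    }
}
```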
[GitHub] spark pull request: [SPARK-14021][SQL] custom context support for ...
Github user adrian-wang closed the pull request at: https://github.com/apache/spark/pull/11843
[GitHub] spark pull request: [SPARK-14786] Remove hive-cli dependency from ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/12551#issuecomment-212696187 LGTM
[GitHub] spark pull request: [SPARK-14631][SQL]drop database cascade should...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/12391#discussion_r59838034 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogSuite.scala --- @@ -17,20 +17,22 @@ package org.apache.spark.sql.hive +import java.io.File + import org.apache.hadoop.conf.Configuration import org.apache.hadoop.util.VersionInfo import org.apache.spark.SparkConf +import org.apache.spark.sql.catalyst.FunctionIdentifier import org.apache.spark.sql.catalyst.catalog._ import org.apache.spark.sql.hive.client.{HiveClient, IsolatedClientLoader} -import org.apache.spark.util.Utils /** * Test suite for the [[HiveExternalCatalog]]. */ class HiveExternalCatalogSuite extends CatalogTestCases { - private val client: HiveClient = { + private lazy val client: HiveClient = { --- End diff -- maybe `lazy` is better?
[GitHub] spark pull request: [SPARK-14631][SQL][WIP]drop database cascade s...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/12391 [SPARK-14631][SQL][WIP]drop database cascade should drop functions for HiveExternalCatalog ## What changes were proposed in this pull request? As in HIVE-12304, Hive's drop database cascade did not drop functions as well (even after that fix, calling `client.dropDatabase` would still not drop functions). We need to fix this when calling `dropDatabase` in `HiveExternalCatalog`. Jira: https://issues.apache.org/jira/browse/SPARK-14631 See also: https://issues.apache.org/jira/browse/HIVE-12304 ## How was this patch tested? Unit test in HiveExternalCatalogSuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark cascadedropdb Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12391.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12391 commit 66eeea0cce58ab65d06bee1220bbd61226505716 Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-04-14T11:44:09Z drop database cascade should drop functions for HiveExternalCatalog
[GitHub] spark pull request: [SPARK-14631][SQL][WIP]drop database cascade s...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/12391#issuecomment-209895482 @chenghao-intel
[GitHub] spark pull request: [SPARK-14021][SQL][WIP] custom context support...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/11843 [SPARK-14021][SQL][WIP] custom context support for SparkSQLEnv ## What changes were proposed in this pull request? This creates a custom context for the commands `bin/spark-sql` and `sbin/start-thriftserver`. Any context derived from `HiveContext` is acceptable. Users need to configure the class name of the custom context via `spark.sql.context.class` and make sure the class is on the classpath. This provides a more elegant way for infrastructure teams to apply custom configurations and changes. ## How was this patch tested? Added a unit test in `CliSuite` to track this. cc @chenghao-intel You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark customcontext Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11843.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11843 commit 654397e0f8ea3774ccce2c482784a986b212527b Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-03-19T08:26:20Z SPARK-14021 custom context support
[GitHub] spark pull request: [SPARK-12789] [SQL] Support Order By Ordinal i...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11815#discussion_r56745951 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -435,6 +435,11 @@ object SQLConf { defaultValue = Some(true), doc = "When false, we will treat bucketed table as normal table") + val ORDER_BY_ORDINAL = booleanConf("spark.sql.orderByOrdinal", --- End diff -- Sounds good to me.
[GitHub] spark pull request: [SPARK-12789] [SQL] Support Order By Ordinal i...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11815#discussion_r56745738 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -435,6 +435,11 @@ object SQLConf { defaultValue = Some(true), doc = "When false, we will treat bucketed table as normal table") + val ORDER_BY_ORDINAL = booleanConf("spark.sql.orderByOrdinal", --- End diff -- Hive has had a similar configuration value since 0.11, `hive.groupby.orderby.position.alias` (default `false`). See: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy I think we also need to read this config from ConfVar.
[GitHub] spark pull request: [SPARK-12855][MINOR] remove spark.sql.dialect ...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/11758 [SPARK-12855][MINOR] remove spark.sql.dialect from doc and test ## What changes were proposed in this pull request? Since the developer API for a pluggable parser was removed in #10801, the docs should be updated accordingly. ## How was this patch tested? This patch does not affect the real code path. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark spark12855 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11758.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11758 commit b0ffbf6fd81e30a6df9d08b020fc1215e6dcc7d4 Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-03-16T09:01:56Z [MINOR] remove spark.sql.dialect from doc and test
[GitHub] spark pull request: [MINOR][TEST][SQL] Remove wrong "expected" par...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11718#issuecomment-196712224 LGTM
[GitHub] spark pull request: [SPARK-13427][SQL] Support USING clause in JOI...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11297#discussion_r56101451 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1169,6 +1170,7 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { Join(newLeft, newRight, LeftOuter, newJoinCond) case FullOuter => f case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node") +case UsingJoin(_, _) => sys.error("Untransformed Using join node") --- End diff -- JoinType is sealed, so we need to put something in this pattern match
[GitHub] spark pull request: [SPARK-13648] Add Hive Cli to classes for isol...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11495#issuecomment-193670192 Why do we need `hive-cli` here? This is not version-specific.
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/9589#discussion_r53545337 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -104,29 +104,39 @@ private[hive] class HiveClientImpl( } val ret = try { - val initialConf = new HiveConf(classOf[SessionState]) - // HiveConf is a Hadoop Configuration, which has a field of classLoader and - // the initial value will be the current thread's context class loader - // (i.e. initClassLoader at here). - // We call initialConf.setClassLoader(initClassLoader) at here to make - // this action explicit. - initialConf.setClassLoader(initClassLoader) - config.foreach { case (k, v) => -if (k.toLowerCase.contains("password")) { - logDebug(s"Hive Config: $k=xxx") -} else { - logDebug(s"Hive Config: $k=$v") + // originState will be created if not exists, will never be null + val originalState = SessionState.get() + if (originalState.isInstanceOf[CliSessionState]) { --- End diff -- the `SessionState.get()` method would create an instance of `SessionState` if one does not exist.
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/9589#issuecomment-185523881 @marmbrus
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r53115148 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -170,6 +170,7 @@ public void write(int ordinal, double value) { } public void write(int ordinal, Decimal input, int precision, int scale) { +input = input.clone(); --- End diff -- Here we'll call `changePrecision` on `input`, which would affect the original data. I agree that this is a bad idea; maybe we need to propose a separate PR to work around this.
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968764 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -523,11 +523,45 @@ case class Atan2(left: Expression, right: Expression) case class Pow(left: Expression, right: Expression) extends BinaryMathExpression(math.pow, "POWER") { - override def genCode(ctx: CodegenContext, ev: ExprCode): String = { -defineCodeGen(ctx, ev, (c1, c2) => s"java.lang.Math.pow($c1, $c2)") - } -} + override def inputTypes: Seq[AbstractDataType] = Seq(NumericType, NumericType) + + override def dataType: DataType = (left.dataType, right.dataType) match { +case (dt: DecimalType, ByteType | ShortType | IntegerType) => dt +case _ => DoubleType + } + + protected override def nullSafeEval(input1: Any, input2: Any): Any = +(left.dataType, right.dataType) match { + case (dt: DecimalType, ByteType) => +input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Byte]) + case (dt: DecimalType, ShortType) => +input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Short]) + case (dt: DecimalType, IntegerType) => +input1.asInstanceOf[Decimal].pow(input2.asInstanceOf[Int]) + case (dt: DecimalType, FloatType) => +math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Float]) + case (dt: DecimalType, DoubleType) => +math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Double]) + case (dt1: DecimalType, dt2: DecimalType) => +math.pow(input1.asInstanceOf[Decimal].toDouble, input2.asInstanceOf[Decimal].toDouble) --- End diff -- Shall we cast the result of `math.pow` back to `DecimalType` for these three cases? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968376 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java --- @@ -170,6 +170,7 @@ public void write(int ordinal, double value) { } public void write(int ordinal, Decimal input, int precision, int scale) { +input = input.clone(); --- End diff -- Better add a comment that explains why we need to clone before writing.
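The reason for the `input.clone()` in the diff above is that Spark's `Decimal` is mutable: changing its precision in the writer would also change the caller's object. A minimal sketch of the hazard and the fix, using a hypothetical mutable wrapper (not Spark's actual `Decimal` class):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class CloneBeforeWrite {
    // Hypothetical mutable stand-in for Spark's Decimal, which changePrecision
    // mutates in place.
    static class MutableDecimal {
        BigDecimal value;
        MutableDecimal(BigDecimal v) { value = v; }
        // Mutates this object, like Decimal.changePrecision does.
        void changeScale(int scale) { value = value.setScale(scale, RoundingMode.HALF_UP); }
        MutableDecimal copy() { return new MutableDecimal(value); }
    }

    // Writing through a defensive copy leaves the caller's object untouched --
    // the behavior the `input = input.clone()` line in the diff restores.
    static BigDecimal writeScaled(MutableDecimal input, int scale) {
        MutableDecimal copy = input.copy();
        copy.changeScale(scale);
        return copy.value;
    }

    public static void main(String[] args) {
        MutableDecimal d = new MutableDecimal(new BigDecimal("1.234"));
        System.out.println(writeScaled(d, 1)); // prints 1.2
        System.out.println(d.value);           // prints 1.234 -- original unaffected
    }
}
```

Without the copy, the scale change would leak back into `d`, which is exactly the "affect the original data" problem raised in the review.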
[GitHub] spark pull request: [WIP][SPARK-13332][SQL] Decimal datatype suppo...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968287 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala --- @@ -351,6 +350,20 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper { } test("pow") { +testBinary(Pow, (d: Decimal, n: Byte) => d.pow(n), + (-5 to 5).map(v => (Decimal(v * 1.0), v.toByte))) --- End diff -- maybe `v.toDouble` is better
[GitHub] spark pull request: [WIP][SQL] Decimal datatype support for pow
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11212#discussion_r52968214 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathFunctionsSuite.scala --- @@ -103,8 +103,7 @@ class MathFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } else { domain.foreach { case (v1, v2) => -checkEvaluation(c(Literal(v1), Literal(v2)), f(v1 + 0.0, v2 + 0.0), EmptyRow) -checkEvaluation(c(Literal(v2), Literal(v1)), f(v2 + 0.0, v1 + 0.0), EmptyRow) +checkEvaluation(c(Literal(v1), Literal(v2)), f(v1, v2), EmptyRow) --- End diff -- keep the test of `f(v2, v1)`
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/9589#issuecomment-183811059 @marmbrus We have instantiated and started an instance of `CliSessionState`, and when we initialize `SparkSQLEnv`, we will create a `SessionState`.
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11071#issuecomment-183791219 @srowen you are right.
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11071#discussion_r51851087 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -55,10 +56,19 @@ object DateTimeUtils { // this is year -17999, calculation: 50 * daysIn400Year final val YearZero = -17999 final val toYearZero = to2001 + 7304850 - final val TimeZoneGMT = TimeZone.getTimeZone("GMT") @transient lazy val defaultTimeZone = TimeZone.getDefault + // Reuse the TimeZone object as it is expensive to create in each method call. + final val timeZones = new ConcurrentHashMap[String, TimeZone] --- End diff -- This map could be quite big, because the string varies. Actually `ZoneInfoFile` does provide a cache for different `ID`s. Let's find out whether the boost you mentioned comes from reusing `TimeZone` or `Calendar` instances.
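The caching idea under discussion can be sketched outside Spark as follows; this is a minimal illustration (not Spark's actual code), memoizing `TimeZone` instances so that repeated lookups skip the synchronized `TimeZone.getTimeZone` call:

```scala
import java.util.TimeZone
import java.util.concurrent.ConcurrentHashMap

// Minimal sketch of the proposed cache: store one TimeZone per ID so that
// cache hits avoid the synchronized TimeZone.getTimeZone lookup entirely.
object TimeZoneCache {
  private val cache = new ConcurrentHashMap[String, TimeZone]

  // computeIfAbsent only calls the (synchronized) factory on a miss.
  def get(id: String): TimeZone =
    cache.computeIfAbsent(id, (k: String) => TimeZone.getTimeZone(k))
}
```

Note that `TimeZone.getTimeZone` returns a fresh clone on every call, while the cache returns the same instance, which is part of where the speedup would come from.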
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11071#discussion_r51852765 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -55,10 +56,19 @@ object DateTimeUtils { // this is year -17999, calculation: 50 * daysIn400Year final val YearZero = -17999 final val toYearZero = to2001 + 7304850 - final val TimeZoneGMT = TimeZone.getTimeZone("GMT") @transient lazy val defaultTimeZone = TimeZone.getDefault + // Reuse the TimeZone object as it is expensive to create in each method call. + final val timeZones = new ConcurrentHashMap[String, TimeZone] --- End diff -- By using this map we can skip a lot of calls to `getTimeZone`, which is a synchronized method, so a `ConcurrentHashMap` can indeed help performance. Do we need to add a `transient` here?
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/11071#discussion_r51853631 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -55,10 +56,19 @@ object DateTimeUtils { // this is year -17999, calculation: 50 * daysIn400Year final val YearZero = -17999 final val toYearZero = to2001 + 7304850 - final val TimeZoneGMT = TimeZone.getTimeZone("GMT") @transient lazy val defaultTimeZone = TimeZone.getDefault + // Reuse the TimeZone object as it is expensive to create in each method call. + final val timeZones = new ConcurrentHashMap[String, TimeZone] --- End diff -- Actually, we only need to change all calls of `getTimeZone(String ID)` to `getTimeZone(String ID, boolean fallback)` (with `true` as the fallback) to work around the synchronization here.
[GitHub] spark pull request: [SPARK-13185][SQL] Improve the performance of ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11071#issuecomment-180201265 The map key could be like "UTC+01:00", "America/Los_Angeles", "PST", etc.; they are already cached in `getTimeZone`, but the method itself is a synchronized one.
[GitHub] spark pull request: [SPARK-12828][SQL] Natural join follow-up
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/11070#issuecomment-179664171 LGTM, thx!
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-179642503 @cloud-fan any more comments?
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10964#discussion_r51392316 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2046,6 +2046,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { ) } + test("SPARK-13056: Null in map value causes NPE") { +val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value") +withTempTable("maptest") { + df.registerTempTable("maptest") + checkAnswer(sql("SELECT value['abc'] FROM maptest where key = 1"), Row("somestring")) --- End diff -- Yes, you are right
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10964#issuecomment-177877914 retest this please.
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10964#discussion_r51381272 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2056,4 +2056,11 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { ) } } + + test("SPARK-13056: Null in map value causes NPE") { +val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value") +df.registerTempTable("maptest") +checkAnswer(sql("SELECT value['abc'] FROM maptest"), Row("somestring")) +checkAnswer(sql("SELECT value['cba'] FROM maptest"), Row(null)) --- End diff -- @tejasapatil OK, actually I used a UDF to generate the map and got your exception; later I changed to this following @cloud-fan's suggestion but didn't verify it myself. I'll modify the test case here, thanks!
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10964#discussion_r51240308 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -1463,4 +1463,11 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { |FROM (SELECT '{"f1": "value1", "f2": 12}' json, 'hello' as str) test """.stripMargin), Row("value1", "12", BigDecimal("3.14"), "hello")) } + + test("SPARK-13056: Null in map value causes NPE") { +val df = Seq(1 -> Map("abc" -> "somestring", "cba" -> null)).toDF("key", "value") +df.registerTempTable("maptest") --- End diff -- That suite mainly tests DataFrame API functionality, I think.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51295569 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -171,13 +171,21 @@ case class Join( def selfJoinResolved: Boolean = left.outputSet.intersect(right.outputSet).isEmpty - // Joins are only resolved if they don't introduce ambiguous expression ids. + // if not a natural join, it is resolved. if it is a natural join, we still need to + // eliminate natural before we mark it resolved, but the node should be ready for + // resolution only if everything else is resolved here. lazy val resolvedExceptNatural: Boolean = { childrenResolved && expressions.forall(_.resolved) && selfJoinResolved && condition.forall(_.dataType == BooleanType) } + + // Joins are only resolved if they don't introduce ambiguous expression ids. --- End diff -- This is the original comment from the old `Join`.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51090849 --- Diff: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlLexer.g --- @@ -328,6 +328,8 @@ KW_WEEK: 'WEEK'|'WEEKS'; KW_MILLISECOND: 'MILLISECOND'|'MILLISECONDS'; KW_MICROSECOND: 'MICROSECOND'|'MICROSECONDS'; +KW_NATURAL: 'NATURAL'; --- End diff -- I think it should be reserved. See: MySQL: https://dev.mysql.com/doc/refman/5.5/en/keywords.html PostgreSQL: http://www.postgresql.org/docs/7.3/static/sql-keywords-appendix.html and it's also reserved in SQL92 and SQL99 (as declared in the PostgreSQL docs).
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51094651 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -104,6 +105,9 @@ trait CheckAnalysis { s"filter expression '${f.condition.prettyString}' " + s"of type ${f.condition.dataType.simpleString} is not a boolean.") + case j @ Join(_, _, NaturalJoin(_), _) => +failAnalysis(s"natural join not resolved.") --- End diff -- This will never happen unless there are still unresolved natural joins left after 100 analyzer iterations.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51106564 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -919,6 +919,7 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper { (rightFilterConditions ++ commonFilterCondition). reduceLeftOption(And).map(Filter(_, newJoin)).getOrElse(newJoin) case FullOuter => f // DO Nothing for Full Outer Join +case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node") --- End diff -- Yes, but `JoinType` is a sealed abstract class, so we have to put something here.
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10964#discussion_r51226292 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -218,7 +218,7 @@ case class GetArrayItem(child: Expression, ordinal: Expression) protected override def nullSafeEval(value: Any, ordinal: Any): Any = { val baseValue = value.asInstanceOf[ArrayData] val index = ordinal.asInstanceOf[Number].intValue() -if (index >= baseValue.numElements() || index < 0) { +if (index >= baseValue.numElements() || index < 0 || baseValue.isNullAt(index)) { null } else { baseValue.get(index, dataType) --- End diff -- Just realized it is ok to let it return null here, without the condition above.
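The null-handling logic in the hunk above can be modeled outside Spark with plain arrays; this is a simplified, hypothetical sketch (not `GetArrayItem` itself), showing that both out-of-bounds indices and present-but-null elements yield null instead of an NPE:

```scala
// Simplified model of the nullSafeEval logic under discussion: negative or
// out-of-range indices and null elements all evaluate to null, never an NPE.
def getArrayItem(arr: Array[String], index: Int): String =
  if (index < 0 || index >= arr.length || arr(index) == null) null
  else arr(index)
```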
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-176532100 Actually, MySQL and Oracle do not support normal full outer join either. PostgreSQL does support natural full outer join: http://www.postgresql.org/docs/9.1/static/queries-table-expressions.html I think we should support natural full outer join since we support full outer join.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r51226827 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -248,4 +249,29 @@ class AnalysisSuite extends AnalysisTest { val plan = relation.select(CaseWhen(Seq((Literal(true), 'a.attr)), 'b).as("val")) assertAnalysisSuccess(plan) } + + test("check resolve of natural join") { --- End diff -- OK, I'll update it soon.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-176006912 Sure, I'll update this PR today.
[GitHub] spark pull request: [SPARK-13056][SQL] map column would throw NPE ...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/10964 [SPARK-13056][SQL] map column would throw NPE if value is null Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like `{"a": "somestring", "b": null}`, then a query like `SELECT col["b"] FROM t1;` throws an NPE. You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark npewriter Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10964.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10964 commit 5b2d942c3f9825eca0b286c42f31a51e6bda0cdd Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-01-28T05:49:45Z map column would throw NPE if value is null
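The expected semantics can be sketched with a plain Scala map, outside Spark entirely (a tiny model, not the actual fix): looking up a key whose value is null must return null rather than throwing.

```scala
// Model of the desired behavior: a present-but-null value yields null.
// Map#get returns Some(null) in that case, and orNull unwraps Some(null),
// Some(v), and None to null, v, and null respectively, with no NPE.
def mapValue(m: Map[String, String], key: String): String =
  m.get(key).orNull

val t1 = Map("a" -> "somestring", "b" -> null)
```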
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-173804613 When the parser calls the constructor, how can we get the schema of the tables? We need the schema to build the project list and join conditions.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-173465114 retest this please.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r50063958 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -445,6 +445,26 @@ class Analyzer( val newOrdering = resolveSortOrders(ordering, child, throws = false) Sort(newOrdering, global, child) + // Resolve the order index to be a specific column + case s @ Sort(ordering, global, child) if child.resolved && s.resolved => +def indexToColumn(index: Int, direction: SortDirection) = { + val orderNodes = child.output + if (index > 0 && index <= orderNodes.size) { +SortOrder(orderNodes(index - 1), direction) + } else { +throw new UnresolvedException(s, + s"""Order by position: $index does not exist \n + |The Select List is indexed from 1 to ${orderNodes.size}""".stripMargin) + } +} +val newOrdering = ordering map { --- End diff -- If there are nested attributes, an integer literal could be used to reference nested fields, so the integer literal should only be treated as a position when it is at the top level of the order list.
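The `indexToColumn` helper in the hunk above can be sketched without Catalyst types; here column names stand in for attribute references (a simplification, not the real analyzer rule):

```scala
// Stripped-down model of indexToColumn: resolve a 1-based ORDER BY position
// against the child's output columns, rejecting out-of-range positions.
def indexToColumn(output: Seq[String], index: Int): String = {
  require(index > 0 && index <= output.size,
    s"Order by position $index does not exist; the select list is indexed from 1 to ${output.size}")
  output(index - 1)
}
```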
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-172424340 `order by 2` should refer to the second column, I think.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/10762 [SPARK-12828][SQL]add natural join support Jira: https://issues.apache.org/jira/browse/SPARK-12828 You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark naturaljoin Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10762.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10762 commit 91a4a8739673740792c0e2c95ad7ae37497b7a8b Author: Daoyuan Wang <daoyuan.w...@intel.com> Date: 2016-01-14T08:31:00Z add natural join support
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49813610 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -179,10 +180,15 @@ object SqlParser extends AbstractSparkSQLParser with DataTypeParser { ) protected lazy val joinedRelation: Parser[LogicalPlan] = -relationFactor ~ rep1(joinType.? ~ (JOIN ~> relationFactor) ~ joinConditions.?) ^^ { +relationFactor ~ --- End diff -- Yes, I will update the Hive parser today, too.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49825914 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2067,4 +2067,24 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { ) } } + + test("natural join") { --- End diff -- Done
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49825919 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -179,10 +180,15 @@ object SqlParser extends AbstractSparkSQLParser with DataTypeParser { ) protected lazy val joinedRelation: Parser[LogicalPlan] = -relationFactor ~ rep1(joinType.? ~ (JOIN ~> relationFactor) ~ joinConditions.?) ^^ { +relationFactor ~ --- End diff -- Done.
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49828096 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -144,6 +147,35 @@ case class Join( } } + def outerProjectList: Seq[NamedExpression] = { --- End diff -- yes I mean "column"...
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10762#discussion_r49827571 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -144,6 +147,35 @@ case class Join( } } + def outerProjectList: Seq[NamedExpression] = { --- End diff -- For `select *` from a natural join, we need to use a `Project` to get rid of redundant columns. In MySQL, `select *` from a natural join would only contain one copy of a column when both sides have a column of the same name, but users can still use something like `t1.a` or `t2.a` to explicitly reference the column. Do you think we should implement it exactly the same? I'm afraid it would be a too complicated change, but I can do that if necessary.
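The `Project` the comment describes can be sketched with column names standing in for Spark's attribute references (a hypothetical simplification of `outerProjectList`, not the actual code): keep a single copy of each common column, then the remaining columns from each side.

```scala
// Sketch of a natural join's deduplicated output: common columns once,
// followed by the left-only and right-only columns, in that order.
def naturalJoinOutput(left: Seq[String], right: Seq[String]): Seq[String] = {
  val common = left.filter(right.contains)
  common ++ left.filterNot(common.contains) ++ right.filterNot(common.contains)
}
```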
[GitHub] spark pull request: [SPARK-12828][SQL]add natural join support
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/10762#issuecomment-171895087 @rxin, thanks for your time, I'll draft another version accordingly.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r49539497 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -22,8 +22,9 @@ import java.sql.Timestamp import org.apache.spark.AccumulatorSuite import org.apache.spark.sql.catalyst.DefaultParserDialect -import org.apache.spark.sql.catalyst.analysis.FunctionRegistry +import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry} --- End diff -- import order
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r49539533 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -441,7 +442,7 @@ class Analyzer( // When resolve `SortOrder`s in Sort based on child, don't report errors as // we still have chance to resolve it based on grandchild - case s @ Sort(ordering, global, child) if child.resolved && !s.resolved => --- End diff -- why remove it?
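The "order by index" feature this PR targets can be illustrated with a small sketch (not Spark's actual analyzer rule; names here are hypothetical): an integer literal in the sort keys is replaced by the corresponding SELECT-list expression, using 1-based positions.

```scala
// Hypothetical sketch of resolving "ORDER BY 1"-style ordinals.
// sortKeys may mix ordinals (Int) and ordinary expressions (modeled as String).
def resolveOrdinals(sortKeys: Seq[Any], projectList: Seq[String]): Seq[String] =
  sortKeys.map {
    case i: Int if i >= 1 && i <= projectList.size =>
      projectList(i - 1) // 1-based ordinal -> SELECT-list expression
    case i: Int =>
      throw new IllegalArgumentException(s"ORDER BY position $i is out of range")
    case e =>
      e.toString // non-ordinal keys pass through unchanged
  }
```

So for `SELECT a, b ... ORDER BY 2, c`, the sort keys resolve to `b, c`.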
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r49539463 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -22,8 +22,9 @@ import java.sql.Timestamp import org.apache.spark.AccumulatorSuite import org.apache.spark.sql.catalyst.DefaultParserDialect -import org.apache.spark.sql.catalyst.analysis.FunctionRegistry +import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry} import org.apache.spark.sql.catalyst.errors.DialectException +import org.apache.spark.sql.catalyst.expressions.SortOrder --- End diff -- unused?
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10731#discussion_r49540540 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -22,8 +22,9 @@ import java.sql.Timestamp import org.apache.spark.AccumulatorSuite import org.apache.spark.sql.catalyst.DefaultParserDialect -import org.apache.spark.sql.catalyst.analysis.FunctionRegistry +import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, FunctionRegistry} import org.apache.spark.sql.catalyst.errors.DialectException +import org.apache.spark.sql.catalyst.expressions.SortOrder --- End diff -- ok
[GitHub] spark pull request: [SPARK-11624][SPARK-11972][SQL]fix commands th...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/9589#discussion_r48829056 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientWrapper.scala --- @@ -151,29 +152,34 @@ private[hive] class ClientWrapper( // Switch to the initClassLoader. Thread.currentThread().setContextClassLoader(initClassLoader) val ret = try { - val initialConf = new HiveConf(classOf[SessionState]) - // HiveConf is a Hadoop Configuration, which has a field of classLoader and - // the initial value will be the current thread's context class loader - // (i.e. initClassLoader at here). - // We call initialConf.setClassLoader(initClassLoader) at here to make - // this action explicit. - initialConf.setClassLoader(initClassLoader) - config.foreach { case (k, v) => -if (k.toLowerCase.contains("password")) { - logDebug(s"Hive Config: $k=xxx") -} else { - logDebug(s"Hive Config: $k=$v") + val registeredState = SessionState.get + if (registeredState != null && registeredState.isInstanceOf[CliSessionState]) { --- End diff -- When we have a `CliSessionState`, we are using the Spark SQL CLI, and in that case we never need a second `SessionState`. Creating another `SessionState` would make some cases fail: since `CliSessionState` inherits from `SessionState`, a second instance could lead to a `ClassCastException`.
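The reuse pattern described in that comment can be sketched with simplified stand-in types (these are toy classes mirroring the Hive class names, not the real `org.apache.hadoop.hive.ql.session.SessionState`): if the already-registered state is a `CliSessionState`, reuse it instead of constructing a new one.

```scala
// Toy stand-ins for the Hive session classes, for illustration only.
class SessionState
class CliSessionState extends SessionState

// If a CliSessionState is already registered (Spark SQL CLI is running),
// reuse it; otherwise create a fresh SessionState.
def currentState(registered: Option[SessionState]): SessionState =
  registered match {
    case Some(cli: CliSessionState) => cli              // reuse the CLI state
    case _                          => new SessionState // start fresh
  }
```

The point of the pattern is that the CLI's state object (a subclass) keeps being used as-is, so later code that expects the CLI subtype never sees an unrelated second instance.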
[GitHub] spark pull request: [Spark-12260][wip][Streaming]Graceful Shutdown...
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/10252#discussion_r48232238 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala --- @@ -330,6 +330,20 @@ abstract class DStream[T: ClassTag] ( } /** + * compute the last valid time till input time --- End diff -- Computes
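The doc comment under review ("compute the last valid time till input time") describes an alignment computation that can be sketched as follows. This is an assumed reading, not the actual DStream implementation: a time is taken to be valid when it is `zeroTime` plus a whole number of slide durations, with all times in milliseconds.

```scala
// Assumed semantics: snap `time` down to the most recent instant aligned
// with zeroTime on the slideDuration grid (all values in milliseconds).
def lastValidTime(zeroTime: Long, slideDuration: Long, time: Long): Long = {
  require(slideDuration > 0 && time >= zeroTime)
  zeroTime + ((time - zeroTime) / slideDuration) * slideDuration
}
```

For instance, with `zeroTime = 0` and a 10 ms slide, the last valid time at `t = 25` is 20.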