[GitHub] [spark] SparkQA commented on issue #24019: [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long
SparkQA commented on issue #24019: [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long URL: https://github.com/apache/spark/pull/24019#issuecomment-473168593 **[Test build #103526 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103526/testReport)** for PR 24019 at commit [`2af3224`](https://github.com/apache/spark/commit/2af3224f7d470b138c8bacd0bd336bcb1548297e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] ueshin edited a comment on issue #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array
ueshin edited a comment on issue #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array URL: https://github.com/apache/spark/pull/24073#issuecomment-473164953 ~LGTM.~ After reconsidering in light of https://github.com/apache/spark/pull/24073#discussion_r265854866, I agree with @kiszk that we should skip traversing the array buffer once a null has been found. @srowen Could you take another look, please? Thanks!
[GitHub] [spark] weixiuli commented on issue #23393: [SPARK-26288][CORE]add initRegisteredExecutorsDB
weixiuli commented on issue #23393: [SPARK-26288][CORE]add initRegisteredExecutorsDB URL: https://github.com/apache/spark/pull/23393#issuecomment-473165307 Retest this please.
[GitHub] [spark] kiszk commented on a change in pull request #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array
kiszk commented on a change in pull request #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array
URL: https://github.com/apache/spark/pull/24073#discussion_r265854866

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ##
@@ -3112,29 +3112,30 @@ case class ArrayDistinct(child: Expression)
       (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
     } else {
       (data: Array[AnyRef]) => {
-        var foundNullElement = false
-        var pos = 0
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        var alreadyStoredNull = false
+        var found = false
         for (i <- 0 until data.length) {
-          if (data(i) == null) {
-            if (!foundNullElement) {
-              foundNullElement = true
-              pos = pos + 1
+          if (data(i) != null) {
+            found = false
+            var j = 0;
+            while (!found && j < arrayBuffer.size) {
+              val va = arrayBuffer(j)
+              found = (va != null) && ordering.equiv(va, data(i))

Review comment:
@dilipbiswal explained my intention in the `null` part of the code. @srowen's code is simple and easy to read, but it may increase the number of iterations if we have already seen `null`.
[GitHub] [spark] ueshin commented on issue #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array
ueshin commented on issue #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array URL: https://github.com/apache/spark/pull/24073#issuecomment-473164953 LGTM. @srowen Could you take another look please? Thanks!
[GitHub] [spark] ueshin commented on a change in pull request #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array
ueshin commented on a change in pull request #24073: [SPARK-27134][SQL] array_distinct function does not work correctly with columns containing array of array
URL: https://github.com/apache/spark/pull/24073#discussion_r265854753

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ##
@@ -3112,29 +3112,30 @@ case class ArrayDistinct(child: Expression)
       (data: Array[AnyRef]) => new GenericArrayData(data.distinct.asInstanceOf[Array[Any]])
     } else {
       (data: Array[AnyRef]) => {
-        var foundNullElement = false
-        var pos = 0
+        val arrayBuffer = new scala.collection.mutable.ArrayBuffer[Any]
+        var alreadyStoredNull = false
+        var found = false
         for (i <- 0 until data.length) {
-          if (data(i) == null) {
-            if (!foundNullElement) {
-              foundNullElement = true
-              pos = pos + 1
+          if (data(i) != null) {
+            found = false
+            var j = 0;
+            while (!found && j < arrayBuffer.size) {
+              val va = arrayBuffer(j)
+              found = (va != null) && ordering.equiv(va, data(i))
+              j += 1
             }
-          } else {
-            var j = 0
-            var done = false
-            while (j <= i && !done) {
-              if (data(j) != null && ordering.equiv(data(j), data(i))) {
-                done = true
-              }
-              j = j + 1
+            if (!found) {
+              arrayBuffer += data(i)
             }
-            if (i == j - 1) {
-              pos = pos + 1
+          } else {
+            // De-duplicate the null values.
+            if (!alreadyStoredNull) {
+              arrayBuffer += data(i)
+              alreadyStoredNull = true
             }
           }
         }
-        new GenericArrayData(data.slice(0, pos))

Review comment:
Good catch. Actually I'm not sure how I could miss this. Thanks!
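The de-duplication discussed in this thread can be illustrated outside Spark. The following is only a minimal sketch under stated assumptions, not the code from the pull request: `ArrayDistinctSketch`, `distinctWithNulls`, and the `equiv` parameter are hypothetical names, and `equiv` merely stands in for the `ordering.equiv` call in the quoted diff. It keeps the first occurrence of each element, stores at most one `null`, and does not scan the buffer at all when the current element is `null`.

```scala
import scala.collection.mutable.ArrayBuffer

object ArrayDistinctSketch {
  // Keeps the first occurrence of each element, as judged by `equiv`.
  // At most one null is kept, and null is never passed to `equiv`.
  def distinctWithNulls[T <: AnyRef](data: Array[T])(equiv: (T, T) => Boolean): Seq[T] = {
    val buffer = ArrayBuffer.empty[T]
    var alreadyStoredNull = false
    for (elem <- data) {
      if (elem == null) {
        // De-duplicate nulls with a flag; no traversal of the buffer is needed.
        if (!alreadyStoredNull) {
          buffer += elem
          alreadyStoredNull = true
        }
      } else if (!buffer.exists(seen => seen != null && equiv(seen, elem))) {
        buffer += elem
      }
    }
    buffer.toSeq
  }
}
```

For an array of arrays, a caller might pass `(a, b) => java.util.Arrays.equals(a, b)` as `equiv`, since plain `==` on JVM arrays compares references rather than contents.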
[GitHub] [spark] dilipbiswal commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
dilipbiswal commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r URL: https://github.com/apache/spark/pull/24087#issuecomment-473163647 @felixcheung @HyukjinKwon I have addressed the comments. I have kept the added tests for now. Please let me know if that is okay.
[GitHub] [spark] maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition URL: https://github.com/apache/spark/pull/23964#discussion_r265849322 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.types._ + +/** + * This aims to handle a nested column aliasing pattern inside the `ColumnPruning` optimizer rule. + * If a project or its child references to nested fields, and not all the fields + * in a nested attribute are used, we can substitute them by alias attributes; then a project + * of the nested fields as aliases on the children of the child will be created. 
+ */ +object NestedColumnAliasing { + + def unapply(plan: LogicalPlan) +: Option[(Map[GetStructField, Alias], Map[ExprId, Seq[Alias]])] = plan match { +case Project(_, child) if canProjectPushThrough(child) => + getAliasSubMap(plan, child) +case _ => None + } + + /** + * Replace nested columns to prune unused nested columns later. + */ + def replaceToAliases( + plan: LogicalPlan, + nestedFieldToAlias: Map[GetStructField, Alias], + attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = plan match { +case Project(projectList, child) => + Project( +getNewProjectList(projectList, nestedFieldToAlias), +replaceChildrenWithAliases(child, attrToAliases)) + } + + /** + * Return a replaced project list. + */ + private def getNewProjectList( + projectList: Seq[NamedExpression], + nestedFieldToAlias: Map[GetStructField, Alias]): Seq[NamedExpression] = { +projectList.map(_.transform { + case f: GetStructField if nestedFieldToAlias.contains(f) => +nestedFieldToAlias(f).toAttribute +}.asInstanceOf[NamedExpression]) + } + + /** + * Return a plan with new childen replaced with aliases. + */ + private def replaceChildrenWithAliases( + plan: LogicalPlan, + attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = { +plan.withNewChildren(plan.children.map { plan => + Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan) +}) + } + + /** + * Returns true for those operators that project can be pushed through. + */ + private def canProjectPushThrough(plan: LogicalPlan) = plan match { +case _: GlobalLimit => true +case _: LocalLimit => true +case _: Repartition => true +case _: Sample => true +case _ => false + } + + /** + * Return root references that are individually accessed as a whole, and `GetStructField`s. + */ + private def collectRootReferenceAndGetStructField(plan: LogicalPlan): Seq[Expression] = { +def helper(e: Expression): Seq[Expression] = e match { Review comment: super nit: How about `doCollectFunc`? 
[GitHub] [spark] maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition URL: https://github.com/apache/spark/pull/23964#discussion_r265850964 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.types._ + +/** + * This aims to handle a nested column aliasing pattern inside the `ColumnPruning` optimizer rule. + * If a project or its child references to nested fields, and not all the fields + * in a nested attribute are used, we can substitute them by alias attributes; then a project + * of the nested fields as aliases on the children of the child will be created. 
+ */ +object NestedColumnAliasing { + + def unapply(plan: LogicalPlan) +: Option[(Map[GetStructField, Alias], Map[ExprId, Seq[Alias]])] = plan match { +case Project(_, child) if canProjectPushThrough(child) => + getAliasSubMap(plan, child) +case _ => None + } + + /** + * Replace nested columns to prune unused nested columns later. + */ + def replaceToAliases( + plan: LogicalPlan, + nestedFieldToAlias: Map[GetStructField, Alias], + attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = plan match { +case Project(projectList, child) => + Project( +getNewProjectList(projectList, nestedFieldToAlias), +replaceChildrenWithAliases(child, attrToAliases)) + } + + /** + * Return a replaced project list. + */ + private def getNewProjectList( + projectList: Seq[NamedExpression], + nestedFieldToAlias: Map[GetStructField, Alias]): Seq[NamedExpression] = { +projectList.map(_.transform { + case f: GetStructField if nestedFieldToAlias.contains(f) => +nestedFieldToAlias(f).toAttribute +}.asInstanceOf[NamedExpression]) + } + + /** + * Return a plan with new childen replaced with aliases. + */ + private def replaceChildrenWithAliases( + plan: LogicalPlan, + attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = { +plan.withNewChildren(plan.children.map { plan => + Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan) +}) + } + + /** + * Returns true for those operators that project can be pushed through. + */ + private def canProjectPushThrough(plan: LogicalPlan) = plan match { +case _: GlobalLimit => true +case _: LocalLimit => true +case _: Repartition => true +case _: Sample => true +case _ => false + } + + /** + * Return root references that are individually accessed as a whole, and `GetStructField`s. 
+   */
+  private def collectRootReferenceAndGetStructField(plan: LogicalPlan): Seq[Expression] = {
+    def helper(e: Expression): Seq[Expression] = e match {
+      case _: AttributeReference | _: GetStructField => Seq(e)
+      case es if es.children.nonEmpty => es.children.flatMap(helper)
+      case _ => Seq.empty
+    }
+    plan.expressions.flatMap(helper)
+  }
+
+  /**
+   * Return two maps in order to replace nested fields to aliases.
+   *
+   * 1. GetStructField -> Alias: A new alias is created for each nested field.
+   * 2. ExprId -> Seq[Alias]: A reference attribute has multiple aliases pointing it.
+   */
+  private def getAliasSubMap(plans: LogicalPlan*)
+    : Option[(Map[GetStructField, Alias], Map[ExprId, Seq[Alias]])] = {
+    val (nestedFieldReferences, otherRootReferences) = plans
+      .map(collectRootReferenceAndGetStructField).reduce(_ ++ _).partition {
+        case _: GetStructField => true
+        case _ => false
+      }
+
+    val aliasSub = nestedFieldReferences.asInstanceOf[Seq[GetStructField]]

Review comment:
nit: Drop `.asInstanceOf[Seq[GetStructField]]`.
[GitHub] [spark] maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition URL: https://github.com/apache/spark/pull/23964#discussion_r265850994 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.types._ + +/** + * This aims to handle a nested column aliasing pattern inside the `ColumnPruning` optimizer rule. + * If a project or its child references to nested fields, and not all the fields + * in a nested attribute are used, we can substitute them by alias attributes; then a project + * of the nested fields as aliases on the children of the child will be created. 
+ */ +object NestedColumnAliasing { + + def unapply(plan: LogicalPlan) +: Option[(Map[GetStructField, Alias], Map[ExprId, Seq[Alias]])] = plan match { +case Project(_, child) if canProjectPushThrough(child) => + getAliasSubMap(plan, child) +case _ => None + } + + /** + * Replace nested columns to prune unused nested columns later. + */ + def replaceToAliases( + plan: LogicalPlan, + nestedFieldToAlias: Map[GetStructField, Alias], + attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = plan match { +case Project(projectList, child) => + Project( +getNewProjectList(projectList, nestedFieldToAlias), +replaceChildrenWithAliases(child, attrToAliases)) + } + + /** + * Return a replaced project list. + */ + private def getNewProjectList( + projectList: Seq[NamedExpression], + nestedFieldToAlias: Map[GetStructField, Alias]): Seq[NamedExpression] = { +projectList.map(_.transform { + case f: GetStructField if nestedFieldToAlias.contains(f) => +nestedFieldToAlias(f).toAttribute +}.asInstanceOf[NamedExpression]) + } + + /** + * Return a plan with new childen replaced with aliases. + */ + private def replaceChildrenWithAliases( + plan: LogicalPlan, + attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = { +plan.withNewChildren(plan.children.map { plan => + Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan) +}) + } + + /** + * Returns true for those operators that project can be pushed through. + */ + private def canProjectPushThrough(plan: LogicalPlan) = plan match { +case _: GlobalLimit => true +case _: LocalLimit => true +case _: Repartition => true +case _: Sample => true +case _ => false + } + + /** + * Return root references that are individually accessed as a whole, and `GetStructField`s. 
+   */
+  private def collectRootReferenceAndGetStructField(plan: LogicalPlan): Seq[Expression] = {
+    def helper(e: Expression): Seq[Expression] = e match {
+      case _: AttributeReference | _: GetStructField => Seq(e)
+      case es if es.children.nonEmpty => es.children.flatMap(helper)
+      case _ => Seq.empty
+    }
+    plan.expressions.flatMap(helper)
+  }
+
+  /**
+   * Return two maps in order to replace nested fields to aliases.
+   *
+   * 1. GetStructField -> Alias: A new alias is created for each nested field.
+   * 2. ExprId -> Seq[Alias]: A reference attribute has multiple aliases pointing it.
+   */
+  private def getAliasSubMap(plans: LogicalPlan*)
+    : Option[(Map[GetStructField, Alias], Map[ExprId, Seq[Alias]])] = {
+    val (nestedFieldReferences, otherRootReferences) = plans
+      .map(collectRootReferenceAndGetStructField).reduce(_ ++ _).partition {
+        case _: GetStructField => true
+        case _ => false
+      }
+
+    val aliasSub = nestedFieldReferences.asInstanceOf[Seq[GetStructField]]
+      .filter(!_.references.subsetOf(AttributeSet(otherRootReferences)))
+      .groupBy(_.references.head)
+      .flatMap { case (attr: Attribute, nestedFields: Seq[GetStructField]) =>

Review comment:
nit: `.flatMap { case (attr, nestedFields: Seq[GetStructField]) =>`
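The traversal in `collectRootReferenceAndGetStructField` can be tried on a toy expression tree. This is a sketch under stated assumptions: `Expr`, `Attr`, `GetField`, and `Add` are hypothetical stand-ins, not Catalyst classes. It also uses `collect` with a typed pattern instead of the `partition` in the quoted diff, which is one possible way to get a precisely typed sequence without a cast; it is not necessarily what the reviewer had in mind.

```scala
object NestedFieldCollectSketch {
  // Toy stand-ins for Catalyst expressions; these are NOT the real Spark classes.
  sealed trait Expr { def children: Seq[Expr] }
  final case class Attr(name: String) extends Expr {
    val children: Seq[Expr] = Nil
  }
  final case class GetField(child: Expr, field: String) extends Expr {
    val children: Seq[Expr] = Seq(child)
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // Stop at whole-attribute references and field accesses; otherwise recurse.
  def collectRefs(e: Expr): Seq[Expr] = e match {
    case _: Attr | _: GetField => Seq(e)
    case other => other.children.flatMap(collectRefs)
  }

  // `collect` with a typed pattern yields Seq[GetField] directly, so no cast is needed.
  def nestedFields(refs: Seq[Expr]): Seq[GetField] =
    refs.collect { case g: GetField => g }
}
```

Note that the recursion stops at the first field access, so the attribute beneath a `GetField` is not reported separately as a root reference.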
[GitHub] [spark] maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
URL: https://github.com/apache/spark/pull/23964#discussion_r265848830

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ##
@@ -647,6 +647,10 @@ object ColumnPruning extends Rule[LogicalPlan] {
     // Can't prune the columns on LeafNode
     case p @ Project(_, _: LeafNode) => p
+    case p @ NestedColumnAliasing(nestedFieldToAlias, attrToAliases)

Review comment:
We don't need to compute `getAliasSubMap` in `NestedColumnAliasing` if `nestedSchemaPruningEnabled` is false, right?
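One way to read this suggestion is that the extractor itself could check the flag before doing any work. The sketch below shows that idea in isolation and is an assumption-laden illustration, not Spark code: `GuardedExtractor`, `enabled`, and `analyze` are hypothetical names, with `analyze` standing in for an expensive computation such as `getAliasSubMap`.

```scala
object GuardedExtractorSketch {
  // An extractor that consults a flag before running a potentially costly analysis.
  // `enabled` is by-name, so it is re-evaluated on every match attempt.
  final class GuardedExtractor[A, B](enabled: => Boolean)(analyze: A => Option[B]) {
    def unapply(input: A): Option[B] =
      if (enabled) analyze(input) else None
  }
}
```

A caller would bind an instance to a capitalized `val`, for example `val Aliasing = new GuardedExtractor[String, Int](flag)(f)`, and use it as `case Aliasing(n) => ...`; when the flag is off the case simply does not match and `analyze` is never invoked.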
[GitHub] [spark] maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
maropu commented on a change in pull request #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition URL: https://github.com/apache/spark/pull/23964#discussion_r265849055 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.types._ + +/** + * This aims to handle a nested column aliasing pattern inside the `ColumnPruning` optimizer rule. + * If a project or its child references to nested fields, and not all the fields + * in a nested attribute are used, we can substitute them by alias attributes; then a project + * of the nested fields as aliases on the children of the child will be created. 
+ */ +object NestedColumnAliasing { + + def unapply(plan: LogicalPlan) +: Option[(Map[GetStructField, Alias], Map[ExprId, Seq[Alias]])] = plan match { +case Project(_, child) if canProjectPushThrough(child) => + getAliasSubMap(plan, child) +case _ => None + } + + /** + * Replace nested columns to prune unused nested columns later. + */ + def replaceToAliases( + plan: LogicalPlan, + nestedFieldToAlias: Map[GetStructField, Alias], + attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = plan match { +case Project(projectList, child) => + Project( +getNewProjectList(projectList, nestedFieldToAlias), +replaceChildrenWithAliases(child, attrToAliases)) + } + + /** + * Return a replaced project list. + */ + private def getNewProjectList( + projectList: Seq[NamedExpression], + nestedFieldToAlias: Map[GetStructField, Alias]): Seq[NamedExpression] = { +projectList.map(_.transform { + case f: GetStructField if nestedFieldToAlias.contains(f) => +nestedFieldToAlias(f).toAttribute +}.asInstanceOf[NamedExpression]) + } + + /** + * Return a plan with new childen replaced with aliases. + */ + private def replaceChildrenWithAliases( + plan: LogicalPlan, + attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = { +plan.withNewChildren(plan.children.map { plan => + Project(plan.output.flatMap(a => attrToAliases.getOrElse(a.exprId, Seq(a))), plan) +}) + } + + /** + * Returns true for those operators that project can be pushed through. + */ + private def canProjectPushThrough(plan: LogicalPlan) = plan match { +case _: GlobalLimit => true +case _: LocalLimit => true +case _: Repartition => true +case _: Sample => true +case _ => false + } + + /** + * Return root references that are individually accessed as a whole, and `GetStructField`s. 
+ */ + private def collectRootReferenceAndGetStructField(plan: LogicalPlan): Seq[Expression] = { +def helper(e: Expression): Seq[Expression] = e match { + case _: AttributeReference | _: GetStructField => Seq(e) + case es if es.children.nonEmpty => es.children.flatMap(helper) + case _ => Seq.empty +} +plan.expressions.flatMap(helper) + } + + /** + * Return two maps in order to replace nested fields to aliases. + * + * 1. GetStructField -> Alias: A new alias is created for each nested field. + * 2. ExprId -> Seq[Alias]: A reference attribute has multiple aliases pointing it. + */ + private def getAliasSubMap(plans: LogicalPlan*) +: Option[(Map[GetStructField, Alias], Map[ExprId, Seq[Alias]])] = { +val (nestedFieldReferences, otherRootReferences) = plans + .map(collectRootReferenceAndGetStructField).reduce(_ ++ _).partition { +case _: GetStructField => true +case _ => false + } + +val aliasSub = nestedFieldReferences.asInstanceOf[Seq[GetStructField]] + .filter(!_.references.subsetOf(AttributeSet(otherRootReferences))) + .groupBy(_.references.head) + .flatMap { case (attr: Attribute, nestedFields: Seq[GetStructField]) => +// Each expression can contain multiple nested fields. +// Note that we keep the original
[GitHub] [spark] dilipbiswal commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
dilipbiswal commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r URL: https://github.com/apache/spark/pull/24087#issuecomment-473160057 @felixcheung I see.. Ok..
[GitHub] [spark] felixcheung commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
felixcheung commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r URL: https://github.com/apache/spark/pull/24087#issuecomment-473159139 Personally, my preference is not to have the hardcoded list of join types and checks in R; as you can imagine, it's problematic to keep it up to date. The problem is that an error raised in SQL is often not readable in R.
[GitHub] [spark] felixcheung commented on a change in pull request #24086: [SPARK-27155][Build]update oracle docker image name
felixcheung commented on a change in pull request #24086: [SPARK-27155][Build]update oracle docker image name
URL: https://github.com/apache/spark/pull/24086#discussion_r265849492

## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ##
@@ -55,7 +56,7 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   import testImplicits._
   override val db = new DatabaseOnDocker {
-    override val imageName = "wnameless/oracle-xe-11g:16.04"
+    override val imageName = "deepdiver/docker-oracle-xe-11g:2.0"

Review comment:
Agreed there. One possible approach is to leave the image name as a parameter and document that someone needs to build one.
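A rough sketch of what leaving the image name as a parameter could look like follows. Everything here is an assumption made for illustration: the property key `test.oracle.docker.image`, the environment variable `ORACLE_DOCKER_IMAGE`, and the helper `resolveImageName` are hypothetical and do not come from Spark.

```scala
object DockerImageNameSketch {
  // Hypothetical property and environment variable names; not taken from Spark.
  val PropertyKey = "test.oracle.docker.image"
  val EnvKey = "ORACLE_DOCKER_IMAGE"

  // Looks up the image name from a JVM system property first, then from the
  // environment. Returns None when nothing usable is configured, so the caller
  // can skip the suite or fail with a message that points at the documentation.
  def resolveImageName(
      props: collection.Map[String, String] = sys.props,
      env: collection.Map[String, String] = sys.env): Option[String] =
    props.get(PropertyKey).orElse(env.get(EnvKey)).filter(_.trim.nonEmpty)
}
```

Returning an `Option` rather than a hardcoded default keeps the test suite from depending on any particular third-party image, which seems to be the concern in this thread.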
[GitHub] [spark] felixcheung commented on a change in pull request #24086: [SPARK-27155][Build]update oracle docker image name
felixcheung commented on a change in pull request #24086: [SPARK-27155][Build]update oracle docker image name
URL: https://github.com/apache/spark/pull/24086#discussion_r265849492

## File path: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ##
@@ -55,7 +56,7 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   import testImplicits._
   override val db = new DatabaseOnDocker {
-    override val imageName = "wnameless/oracle-xe-11g:16.04"
+    override val imageName = "deepdiver/docker-oracle-xe-11g:2.0"

Review comment:
agreed there..
[GitHub] [spark] dilipbiswal commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
dilipbiswal commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
URL: https://github.com/apache/spark/pull/24087#discussion_r265849362

## File path: R/pkg/R/DataFrame.R ##
@@ -2520,8 +2520,9 @@ setMethod("dropDuplicates",
 #' Column expression. If joinExpr is omitted, the default, inner join is attempted and an error is
 #' thrown if it would be a Cartesian Product. For Cartesian join, use crossJoin instead.
 #' @param joinType The type of join to perform, default 'inner'.
-#' Must be one of: 'inner', 'cross', 'outer', 'full', 'full_outer',
-#' 'left', 'left_outer', 'right', 'right_outer', 'left_semi', or 'left_anti'.
+#' Must be one of: 'inner', 'cross', 'outer', 'full', 'fullouter', 'full_outer',
+#' 'left', 'leftouter', 'left_outer', 'right', 'rightouter', 'right_outer', 'semi',
+# 'leftsemi', 'left_semi', 'anti', 'leftanti', 'left_anti'.

Review comment: @felixcheung Thanks a lot. Will fix.
[GitHub] [spark] dilipbiswal commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
dilipbiswal commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
URL: https://github.com/apache/spark/pull/24087#discussion_r265849398

## File path: R/pkg/R/DataFrame.R ##
@@ -2553,14 +2554,14 @@ setMethod("join",
 "outer", "full", "fullouter", "full_outer",
 "left", "leftouter", "left_outer",
 "right", "rightouter", "right_outer",
-"left_semi", "leftsemi", "left_anti", "leftanti")) {
+"semi", "left_semi", "leftsemi", "anti", "left_anti", "leftanti")) {
 joinType <- gsub("_", "", joinType)
 sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
 } else {
- stop("joinType must be one of the following types: ",
- "'inner', 'cross', 'outer', 'full', 'full_outer',",
- "'left', 'left_outer', 'right', 'right_outer',",
- "'left_semi', or 'left_anti'.")
+ stop(paste("joinType must be one of the following types: ",

Review comment: @felixcheung Sure.
[GitHub] [spark] dilipbiswal commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
dilipbiswal commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
URL: https://github.com/apache/spark/pull/24087#issuecomment-473158056

@felixcheung
> I would prefer expect_error as well

Yeah, I had already made the change after @HyukjinKwon's comment :-). I was running the test to make sure.
[GitHub] [spark] dilipbiswal commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
dilipbiswal commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r URL: https://github.com/apache/spark/pull/24087#discussion_r265848916 ## File path: R/pkg/tests/fulltests/test_sparkSQL.R ## @@ -2356,40 +2356,96 @@ test_that("join(), crossJoin() and merge() on a DataFrame", { expect_equal(names(joined2), c("age", "name", "name", "test")) expect_equal(count(joined2), 3) - joined3 <- join(df, df2, df$name == df2$name, "rightouter") + joined3 <- join(df, df2, df$name == df2$name, "right") expect_equal(names(joined3), c("age", "name", "name", "test")) expect_equal(count(joined3), 4) expect_true(is.na(collect(orderBy(joined3, joined3$age))$age[2])) - - joined4 <- select(join(df, df2, df$name == df2$name, "outer"), -alias(df$age + 5, "newAge"), df$name, df2$test) - expect_equal(names(joined4), c("newAge", "name", "test")) + + joined4 <- join(df, df2, df$name == df2$name, "right_outer") + expect_equal(names(joined4), c("age", "name", "name", "test")) expect_equal(count(joined4), 4) - expect_equal(collect(orderBy(joined4, joined4$name))$newAge[3], 24) + expect_true(is.na(collect(orderBy(joined4, joined4$age))$age[2])) - joined5 <- join(df, df2, df$name == df2$name, "leftouter") + joined5 <- join(df, df2, df$name == df2$name, "rightouter") expect_equal(names(joined5), c("age", "name", "name", "test")) - expect_equal(count(joined5), 3) - expect_true(is.na(collect(orderBy(joined5, joined5$age))$age[1])) - - joined6 <- join(df, df2, df$name == df2$name, "inner") - expect_equal(names(joined6), c("age", "name", "name", "test")) - expect_equal(count(joined6), 3) + expect_equal(count(joined5), 4) + expect_true(is.na(collect(orderBy(joined5, joined5$age))$age[2])) - joined7 <- join(df, df2, df$name == df2$name, "leftsemi") - expect_equal(names(joined7), c("age", "name")) - expect_equal(count(joined7), 3) - joined8 <- join(df, df2, df$name == df2$name, "left_outer") - 
expect_equal(names(joined8), c("age", "name", "name", "test")) - expect_equal(count(joined8), 3) - expect_true(is.na(collect(orderBy(joined8, joined8$age))$age[1])) - - joined9 <- join(df, df2, df$name == df2$name, "right_outer") - expect_equal(names(joined9), c("age", "name", "name", "test")) + joined6 <- select(join(df, df2, df$name == df2$name, "outer"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined6), c("newAge", "name", "test")) + expect_equal(count(joined6), 4) + expect_equal(collect(orderBy(joined6, joined6$name))$newAge[3], 24) + + joined7 <- select(join(df, df2, df$name == df2$name, "full"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined7), c("newAge", "name", "test")) + expect_equal(count(joined7), 4) + expect_equal(collect(orderBy(joined7, joined7$name))$newAge[3], 24) + + joined8 <- select(join(df, df2, df$name == df2$name, "fullouter"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined8), c("newAge", "name", "test")) + expect_equal(count(joined8), 4) + expect_equal(collect(orderBy(joined8, joined8$name))$newAge[3], 24) + + joined9 <- select(join(df, df2, df$name == df2$name, "full_outer"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined9), c("newAge", "name", "test")) expect_equal(count(joined9), 4) - expect_true(is.na(collect(orderBy(joined9, joined9$age))$age[2])) - + expect_equal(collect(orderBy(joined9, joined9$name))$newAge[3], 24) + + joined10 <- join(df, df2, df$name == df2$name, "left") + expect_equal(names(joined10), c("age", "name", "name", "test")) + expect_equal(count(joined10), 3) + expect_true(is.na(collect(orderBy(joined10, joined10$age))$age[1])) + + joined11 <- join(df, df2, df$name == df2$name, "leftouter") + expect_equal(names(joined11), c("age", "name", "name", "test")) + expect_equal(count(joined11), 3) + expect_true(is.na(collect(orderBy(joined11, joined11$age))$age[1])) + + joined12 <- join(df, df2, df$name == 
df2$name, "left_outer") + expect_equal(names(joined12), c("age", "name", "name", "test")) + expect_equal(count(joined12), 3) + expect_true(is.na(collect(orderBy(joined12, joined12$age))$age[1])) + + joined13 <- join(df, df2, df$name == df2$name, "inner") + expect_equal(names(joined13), c("age", "name", "name", "test")) + expect_equal(count(joined13), 3) + + joined14 <- join(df, df2, df$name == df2$name, "semi") + expect_equal(names(joined14), c("age", "name")) + expect_equal(count(joined14), 3) + + joined14 <- join(df, df2, df$name == df2$name, "leftsemi") + expect_equal(names(joined14), c("age", "name")) + expect_equal(count(joined14), 3) + + joined15 <- join(df, df2, df$name == df2$name, "left_semi") + expect_equal(names(joined15), c("age", "name")) +
[GitHub] [spark] sandeep-katta commented on issue #24067: [SPARK-27135][WebUI]Add ToolTip support for overflow text
sandeep-katta commented on issue #24067: [SPARK-27135][WebUI]Add ToolTip support for overflow text
URL: https://github.com/apache/spark/pull/24067#issuecomment-473156967

@gengliangwang is this tooltip approach okay?
[GitHub] [spark] felixcheung commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
felixcheung commented on issue #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
URL: https://github.com/apache/spark/pull/24087#issuecomment-473157006

this is https://github.com/apache/spark/pull/24087#discussion_r265847681, which we need to fix
[GitHub] [spark] felixcheung commented on a change in pull request #24019: [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long
felixcheung commented on a change in pull request #24019: [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long
URL: https://github.com/apache/spark/pull/24019#discussion_r265848278

## File path: R/pkg/NAMESPACE ##
@@ -245,6 +245,7 @@ exportMethods("%<=>%",
 "current_date",
 "current_timestamp",
 "hash",
+ "xxhash64",

Review comment: this list is sorted (or should be, except for the obvious problem with "hash")
[GitHub] [spark] felixcheung commented on a change in pull request #24019: [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long
felixcheung commented on a change in pull request #24019: [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long
URL: https://github.com/apache/spark/pull/24019#discussion_r265848388

## File path: R/pkg/R/generics.R ##
@@ -889,6 +889,10 @@ setGeneric("create_map", function(x, ...) { standardGeneric("create_map") })
 #' @name NULL
 setGeneric("hash", function(x, ...) { standardGeneric("hash") })
+#' @rdname column_misc_functions
+#' @name NULL
+setGeneric("xxhash64", function(x, ...) { standardGeneric("xxhash64") })
+

Review comment: same here, this should be sorted
[GitHub] [spark] chakravarthiT commented on issue #24051: [SPARK-26879][SQL] Standardize one-based column indexing for stack and json_tuple function
chakravarthiT commented on issue #24051: [SPARK-26879][SQL] Standardize one-based column indexing for stack and json_tuple function
URL: https://github.com/apache/spark/pull/24051#issuecomment-473156733

@maropu @viirya please review
[GitHub] [spark] felixcheung commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
felixcheung commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
URL: https://github.com/apache/spark/pull/24087#discussion_r265847869

## File path: R/pkg/R/DataFrame.R ##
@@ -2553,14 +2554,14 @@ setMethod("join",
 "outer", "full", "fullouter", "full_outer",
 "left", "leftouter", "left_outer",
 "right", "rightouter", "right_outer",
-"left_semi", "leftsemi", "left_anti", "leftanti")) {
+"semi", "left_semi", "leftsemi", "anti", "left_anti", "leftanti")) {
 joinType <- gsub("_", "", joinType)
 sdf <- callJMethod(x@sdf, "join", y@sdf, joinExpr@jc, joinType)
 } else {
- stop("joinType must be one of the following types: ",
- "'inner', 'cross', 'outer', 'full', 'full_outer',",
- "'left', 'left_outer', 'right', 'right_outer',",
- "'left_semi', or 'left_anti'.")
+ stop(paste("joinType must be one of the following types: ",

Review comment: remove the space at the end of `types: ` - paste adds space
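The point about the trailing space can be illustrated with a small Python analog of the R-side check. This is a sketch under stated assumptions, not the SparkR code: `normalize_join_type` and `VALID_JOIN_TYPES` are hypothetical names, the list of accepted values is copied from the `joinType` documentation quoted in this thread, and `" ".join` stands in for R's `paste`, which also inserts a single space between its arguments by default.

```python
# Hypothetical Python analog of the R-side validation; names are illustrative.
VALID_JOIN_TYPES = [
    "inner", "cross",
    "outer", "full", "fullouter", "full_outer",
    "left", "leftouter", "left_outer",
    "right", "rightouter", "right_outer",
    "semi", "leftsemi", "left_semi",
    "anti", "leftanti", "left_anti",
]


def normalize_join_type(join_type):
    """Validate join_type, then drop underscores like gsub("_", "", joinType)."""
    if join_type not in VALID_JOIN_TYPES:
        quoted = ", ".join(f"'{t}'" for t in VALID_JOIN_TYPES)
        # The separator already supplies the space, so the first fragment
        # must not end with one; otherwise the message gets a double space.
        message = " ".join(
            ["joinType must be one of the following types:", quoted]
        )
        raise ValueError(message)
    return join_type.replace("_", "")


# A fragment that keeps its trailing space ends up with two spaces.
doubled = " ".join(["types: ", "'inner'"])
```

With this sketch, `normalize_join_type("left_outer")` returns `"leftouter"`, while an unknown value raises `ValueError` with a single space after `types:`.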
[GitHub] [spark] felixcheung commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
felixcheung commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r URL: https://github.com/apache/spark/pull/24087#discussion_r265848085 ## File path: R/pkg/tests/fulltests/test_sparkSQL.R ## @@ -2356,40 +2356,96 @@ test_that("join(), crossJoin() and merge() on a DataFrame", { expect_equal(names(joined2), c("age", "name", "name", "test")) expect_equal(count(joined2), 3) - joined3 <- join(df, df2, df$name == df2$name, "rightouter") + joined3 <- join(df, df2, df$name == df2$name, "right") expect_equal(names(joined3), c("age", "name", "name", "test")) expect_equal(count(joined3), 4) expect_true(is.na(collect(orderBy(joined3, joined3$age))$age[2])) - - joined4 <- select(join(df, df2, df$name == df2$name, "outer"), -alias(df$age + 5, "newAge"), df$name, df2$test) - expect_equal(names(joined4), c("newAge", "name", "test")) + + joined4 <- join(df, df2, df$name == df2$name, "right_outer") + expect_equal(names(joined4), c("age", "name", "name", "test")) expect_equal(count(joined4), 4) - expect_equal(collect(orderBy(joined4, joined4$name))$newAge[3], 24) + expect_true(is.na(collect(orderBy(joined4, joined4$age))$age[2])) - joined5 <- join(df, df2, df$name == df2$name, "leftouter") + joined5 <- join(df, df2, df$name == df2$name, "rightouter") expect_equal(names(joined5), c("age", "name", "name", "test")) - expect_equal(count(joined5), 3) - expect_true(is.na(collect(orderBy(joined5, joined5$age))$age[1])) - - joined6 <- join(df, df2, df$name == df2$name, "inner") - expect_equal(names(joined6), c("age", "name", "name", "test")) - expect_equal(count(joined6), 3) + expect_equal(count(joined5), 4) + expect_true(is.na(collect(orderBy(joined5, joined5$age))$age[2])) - joined7 <- join(df, df2, df$name == df2$name, "leftsemi") - expect_equal(names(joined7), c("age", "name")) - expect_equal(count(joined7), 3) - joined8 <- join(df, df2, df$name == df2$name, "left_outer") - 
expect_equal(names(joined8), c("age", "name", "name", "test")) - expect_equal(count(joined8), 3) - expect_true(is.na(collect(orderBy(joined8, joined8$age))$age[1])) - - joined9 <- join(df, df2, df$name == df2$name, "right_outer") - expect_equal(names(joined9), c("age", "name", "name", "test")) + joined6 <- select(join(df, df2, df$name == df2$name, "outer"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined6), c("newAge", "name", "test")) + expect_equal(count(joined6), 4) + expect_equal(collect(orderBy(joined6, joined6$name))$newAge[3], 24) + + joined7 <- select(join(df, df2, df$name == df2$name, "full"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined7), c("newAge", "name", "test")) + expect_equal(count(joined7), 4) + expect_equal(collect(orderBy(joined7, joined7$name))$newAge[3], 24) + + joined8 <- select(join(df, df2, df$name == df2$name, "fullouter"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined8), c("newAge", "name", "test")) + expect_equal(count(joined8), 4) + expect_equal(collect(orderBy(joined8, joined8$name))$newAge[3], 24) + + joined9 <- select(join(df, df2, df$name == df2$name, "full_outer"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined9), c("newAge", "name", "test")) expect_equal(count(joined9), 4) - expect_true(is.na(collect(orderBy(joined9, joined9$age))$age[2])) - + expect_equal(collect(orderBy(joined9, joined9$name))$newAge[3], 24) + + joined10 <- join(df, df2, df$name == df2$name, "left") + expect_equal(names(joined10), c("age", "name", "name", "test")) + expect_equal(count(joined10), 3) + expect_true(is.na(collect(orderBy(joined10, joined10$age))$age[1])) + + joined11 <- join(df, df2, df$name == df2$name, "leftouter") + expect_equal(names(joined11), c("age", "name", "name", "test")) + expect_equal(count(joined11), 3) + expect_true(is.na(collect(orderBy(joined11, joined11$age))$age[1])) + + joined12 <- join(df, df2, df$name == 
df2$name, "left_outer") + expect_equal(names(joined12), c("age", "name", "name", "test")) + expect_equal(count(joined12), 3) + expect_true(is.na(collect(orderBy(joined12, joined12$age))$age[1])) + + joined13 <- join(df, df2, df$name == df2$name, "inner") + expect_equal(names(joined13), c("age", "name", "name", "test")) + expect_equal(count(joined13), 3) + + joined14 <- join(df, df2, df$name == df2$name, "semi") + expect_equal(names(joined14), c("age", "name")) + expect_equal(count(joined14), 3) + + joined14 <- join(df, df2, df$name == df2$name, "leftsemi") + expect_equal(names(joined14), c("age", "name")) + expect_equal(count(joined14), 3) + + joined15 <- join(df, df2, df$name == df2$name, "left_semi") + expect_equal(names(joined15), c("age", "name")) +
[GitHub] [spark] felixcheung commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
felixcheung commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
URL: https://github.com/apache/spark/pull/24087#discussion_r265847681

## File path: R/pkg/R/DataFrame.R ##
@@ -2520,8 +2520,9 @@ setMethod("dropDuplicates",
 #' Column expression. If joinExpr is omitted, the default, inner join is attempted and an error is
 #' thrown if it would be a Cartesian Product. For Cartesian join, use crossJoin instead.
 #' @param joinType The type of join to perform, default 'inner'.
-#' Must be one of: 'inner', 'cross', 'outer', 'full', 'full_outer',
-#' 'left', 'left_outer', 'right', 'right_outer', 'left_semi', or 'left_anti'.
+#' Must be one of: 'inner', 'cross', 'outer', 'full', 'fullouter', 'full_outer',
+#' 'left', 'leftouter', 'left_outer', 'right', 'rightouter', 'right_outer', 'semi',
+# 'leftsemi', 'left_semi', 'anti', 'leftanti', 'left_anti'.

Review comment: missing `'` in `#'`
[GitHub] [spark] felixcheung commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r
felixcheung commented on a change in pull request #24087: [SPARK-27096][SQL][FOLLOWUP] Do the correct validation of join types in R side and fix join docs for scala, python and r URL: https://github.com/apache/spark/pull/24087#discussion_r265848033 ## File path: R/pkg/tests/fulltests/test_sparkSQL.R ## @@ -2356,40 +2356,96 @@ test_that("join(), crossJoin() and merge() on a DataFrame", { expect_equal(names(joined2), c("age", "name", "name", "test")) expect_equal(count(joined2), 3) - joined3 <- join(df, df2, df$name == df2$name, "rightouter") + joined3 <- join(df, df2, df$name == df2$name, "right") expect_equal(names(joined3), c("age", "name", "name", "test")) expect_equal(count(joined3), 4) expect_true(is.na(collect(orderBy(joined3, joined3$age))$age[2])) - - joined4 <- select(join(df, df2, df$name == df2$name, "outer"), -alias(df$age + 5, "newAge"), df$name, df2$test) - expect_equal(names(joined4), c("newAge", "name", "test")) + + joined4 <- join(df, df2, df$name == df2$name, "right_outer") + expect_equal(names(joined4), c("age", "name", "name", "test")) expect_equal(count(joined4), 4) - expect_equal(collect(orderBy(joined4, joined4$name))$newAge[3], 24) + expect_true(is.na(collect(orderBy(joined4, joined4$age))$age[2])) - joined5 <- join(df, df2, df$name == df2$name, "leftouter") + joined5 <- join(df, df2, df$name == df2$name, "rightouter") expect_equal(names(joined5), c("age", "name", "name", "test")) - expect_equal(count(joined5), 3) - expect_true(is.na(collect(orderBy(joined5, joined5$age))$age[1])) - - joined6 <- join(df, df2, df$name == df2$name, "inner") - expect_equal(names(joined6), c("age", "name", "name", "test")) - expect_equal(count(joined6), 3) + expect_equal(count(joined5), 4) + expect_true(is.na(collect(orderBy(joined5, joined5$age))$age[2])) - joined7 <- join(df, df2, df$name == df2$name, "leftsemi") - expect_equal(names(joined7), c("age", "name")) - expect_equal(count(joined7), 3) - joined8 <- join(df, df2, df$name == df2$name, "left_outer") - 
expect_equal(names(joined8), c("age", "name", "name", "test")) - expect_equal(count(joined8), 3) - expect_true(is.na(collect(orderBy(joined8, joined8$age))$age[1])) - - joined9 <- join(df, df2, df$name == df2$name, "right_outer") - expect_equal(names(joined9), c("age", "name", "name", "test")) + joined6 <- select(join(df, df2, df$name == df2$name, "outer"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined6), c("newAge", "name", "test")) + expect_equal(count(joined6), 4) + expect_equal(collect(orderBy(joined6, joined6$name))$newAge[3], 24) + + joined7 <- select(join(df, df2, df$name == df2$name, "full"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined7), c("newAge", "name", "test")) + expect_equal(count(joined7), 4) + expect_equal(collect(orderBy(joined7, joined7$name))$newAge[3], 24) + + joined8 <- select(join(df, df2, df$name == df2$name, "fullouter"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined8), c("newAge", "name", "test")) + expect_equal(count(joined8), 4) + expect_equal(collect(orderBy(joined8, joined8$name))$newAge[3], 24) + + joined9 <- select(join(df, df2, df$name == df2$name, "full_outer"), +alias(df$age + 5, "newAge"), df$name, df2$test) + expect_equal(names(joined9), c("newAge", "name", "test")) expect_equal(count(joined9), 4) - expect_true(is.na(collect(orderBy(joined9, joined9$age))$age[2])) - + expect_equal(collect(orderBy(joined9, joined9$name))$newAge[3], 24) + + joined10 <- join(df, df2, df$name == df2$name, "left") + expect_equal(names(joined10), c("age", "name", "name", "test")) + expect_equal(count(joined10), 3) + expect_true(is.na(collect(orderBy(joined10, joined10$age))$age[1])) + + joined11 <- join(df, df2, df$name == df2$name, "leftouter") + expect_equal(names(joined11), c("age", "name", "name", "test")) + expect_equal(count(joined11), 3) + expect_true(is.na(collect(orderBy(joined11, joined11$age))$age[1])) + + joined12 <- join(df, df2, df$name == 
df2$name, "left_outer") + expect_equal(names(joined12), c("age", "name", "name", "test")) + expect_equal(count(joined12), 3) + expect_true(is.na(collect(orderBy(joined12, joined12$age))$age[1])) + + joined13 <- join(df, df2, df$name == df2$name, "inner") + expect_equal(names(joined13), c("age", "name", "name", "test")) + expect_equal(count(joined13), 3) + + joined14 <- join(df, df2, df$name == df2$name, "semi") + expect_equal(names(joined14), c("age", "name")) + expect_equal(count(joined14), 3) + + joined14 <- join(df, df2, df$name == df2$name, "leftsemi") + expect_equal(names(joined14), c("age", "name")) + expect_equal(count(joined14), 3) + + joined15 <- join(df, df2, df$name == df2$name, "left_semi") + expect_equal(names(joined15), c("age", "name")) +
[GitHub] [spark] sujith71955 commented on a change in pull request #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
sujith71955 commented on a change in pull request #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
URL: https://github.com/apache/spark/pull/24075#discussion_r265847639

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ##
@@ -155,7 +155,7 @@ object HiveAnalysis extends Rule[LogicalPlan] {
 CreateTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)
 case CreateTable(tableDesc, mode, Some(query)) if DDLUtils.isHiveTable(tableDesc) =>
- DDLUtils.checkDataColNames(tableDesc)
+ DDLUtils.checkDataColNames(tableDesc.copy(schema = query.schema))

Review comment: Both are called from different rules; I will check how to unify them.
[GitHub] [spark] SparkQA removed a comment on issue #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
SparkQA removed a comment on issue #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
URL: https://github.com/apache/spark/pull/24075#issuecomment-473111856

**[Test build #103524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103524/testReport)** for PR 24075 at commit [`6d162fc`](https://github.com/apache/spark/commit/6d162fc190843d56eef3d81698425ef5ce98ddb7).
[GitHub] [spark] SparkQA commented on issue #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
SparkQA commented on issue #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
URL: https://github.com/apache/spark/pull/24075#issuecomment-473155253

**[Test build #103524 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103524/testReport)** for PR 24075 at commit [`6d162fc`](https://github.com/apache/spark/commit/6d162fc190843d56eef3d81698425ef5ce98ddb7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] sujith71955 commented on a change in pull request #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
sujith71955 commented on a change in pull request #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
URL: https://github.com/apache/spark/pull/24075#discussion_r265847163

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ##
@@ -155,7 +155,7 @@ object HiveAnalysis extends Rule[LogicalPlan] {
 CreateTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)
 case CreateTable(tableDesc, mode, Some(query)) if DDLUtils.isHiveTable(tableDesc) =>
- DDLUtils.checkDataColNames(tableDesc)
+ DDLUtils.checkDataColNames(tableDesc.copy(schema = query.schema))

Review comment: sure, let me check. thanks for your input.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
AmplabJenkins removed a comment on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
URL: https://github.com/apache/spark/pull/24098#issuecomment-473155058

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103522/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
AmplabJenkins commented on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level URL: https://github.com/apache/spark/pull/24098#issuecomment-473155058 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103522/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
AmplabJenkins removed a comment on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level URL: https://github.com/apache/spark/pull/24098#issuecomment-473154849 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
AmplabJenkins commented on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level URL: https://github.com/apache/spark/pull/24098#issuecomment-473154849 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
cloud-fan commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords URL: https://github.com/apache/spark/pull/24093#discussion_r265845089
## File path: docs/sql-keywords.md ##
@@ -1,16 +1,20 @@
 ---
 layout: global
-title: SQL Reserved/Non-Reserved Keywords
-displayTitle: SQL Reserved/Non-Reserved Keywords
+title: Spark SQL Keywords
+displayTitle: Spark SQL Keywords
Review comment: Yea, this document is about keywords, not everything about the ansi mode.
[GitHub] [spark] cloud-fan commented on a change in pull request #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
cloud-fan commented on a change in pull request #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS` URL: https://github.com/apache/spark/pull/24075#discussion_r265844898
## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ##
@@ -155,7 +155,7 @@ object HiveAnalysis extends Rule[LogicalPlan] {
       CreateTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)
     case CreateTable(tableDesc, mode, Some(query)) if DDLUtils.isHiveTable(tableDesc) =>
-      DDLUtils.checkDataColNames(tableDesc)
+      DDLUtils.checkDataColNames(tableDesc.copy(schema = query.schema))
Review comment: Can we unify this check for both data source tables and Hive serde tables?
[GitHub] [spark] dongjoon-hyun edited a comment on issue #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
dongjoon-hyun edited a comment on issue #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition URL: https://github.com/apache/spark/pull/23964#issuecomment-473152043 Do you have any other concerns, @maropu and @viirya ? All comments are welcome.
[GitHub] [spark] dongjoon-hyun commented on issue #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition
dongjoon-hyun commented on issue #23964: [SPARK-26975][SQL] Support nested-column pruning over limit/sample/repartition URL: https://github.com/apache/spark/pull/23964#issuecomment-473152043 Do you have any other concerns, @maropu and @viirya ?
[GitHub] [spark] cloud-fan closed pull request #24066: [SPARK-27132][SQL] Improve file source V2 framework
cloud-fan closed pull request #24066: [SPARK-27132][SQL] Improve file source V2 framework URL: https://github.com/apache/spark/pull/24066
[GitHub] [spark] cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap URL: https://github.com/apache/spark/pull/24094#discussion_r265843483
## File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ##
@@ -96,6 +98,11 @@ private[sql] class SessionState(
     hadoopConf
   }
+  def newHadoopConfWithCaseInsensitiveOptions(options: CaseInsensitiveStringMap): Configuration = {
Review comment: Then we should document it in `CaseInsensitiveMap`. Data source developers can't access `SessionState`.
[GitHub] [spark] cloud-fan commented on issue #24066: [SPARK-27132][SQL] Improve file source V2 framework
cloud-fan commented on issue #24066: [SPARK-27132][SQL] Improve file source V2 framework URL: https://github.com/apache/spark/pull/24066#issuecomment-473150319 thanks, merging to master!
[GitHub] [spark] cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap URL: https://github.com/apache/spark/pull/24094#discussion_r265842993
## File path: sql/catalyst/src/main/java/org/apache/spark/sql/util/CaseInsensitiveStringMap.java ##
@@ -78,11 +81,13 @@ public String get(Object key) {
   @Override
   public String put(String key, String value) {
+    original.put(key, value);
Review comment: The thing that worries me most is the inconsistency between the case-insensitive map and the original map. I think we should either fail or keep the latter entry if `a -> 1, A -> 2` appear together. One thing we can simplify: `CaseInsensitiveStringMap` is read by data sources and can be read-only. Then it is easier to resolve conflicting entries at the beginning.
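The two options named in the review comment above (fail, or keep the latter entry) can be sketched in isolation. This is a minimal illustration and not Spark's `CaseInsensitiveStringMap`: the class name `ReadOnlyCaseInsensitiveMap` and its methods `keepLatter`, `failOnConflict`, and `originalMap` are hypothetical, and it assumes the input map's iteration order defines which entry is "the latter". Because the map is built once and never mutated, the lower-cased view and the original-key view cannot drift apart.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is not Spark's CaseInsensitiveStringMap.
final class ReadOnlyCaseInsensitiveMap {
  private final Map<String, String> original;   // keys exactly as the caller wrote them
  private final Map<String, String> lowerCased; // keys folded to lower case for lookups

  private ReadOnlyCaseInsensitiveMap(Map<String, String> original,
                                     Map<String, String> lowerCased) {
    this.original = Collections.unmodifiableMap(original);
    this.lowerCased = Collections.unmodifiableMap(lowerCased);
  }

  /** On a case-insensitive clash, the entry seen later in iteration order wins. */
  static ReadOnlyCaseInsensitiveMap keepLatter(Map<String, String> input) {
    return build(input, false);
  }

  /** On a case-insensitive clash, construction fails with IllegalArgumentException. */
  static ReadOnlyCaseInsensitiveMap failOnConflict(Map<String, String> input) {
    return build(input, true);
  }

  private static ReadOnlyCaseInsensitiveMap build(Map<String, String> input,
                                                  boolean failOnConflict) {
    Map<String, String> lowerToOriginalKey = new LinkedHashMap<>();
    Map<String, String> original = new LinkedHashMap<>();
    Map<String, String> lowerCased = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : input.entrySet()) {
      String lower = e.getKey().toLowerCase(Locale.ROOT);
      String previousKey = lowerToOriginalKey.put(lower, e.getKey());
      if (previousKey != null) {
        if (failOnConflict) {
          throw new IllegalArgumentException(
              "Conflicting keys: " + previousKey + " and " + e.getKey());
        }
        // Drop the earlier spelling so both views describe the same entries.
        original.remove(previousKey);
      }
      original.put(e.getKey(), e.getValue());
      lowerCased.put(lower, e.getValue());
    }
    return new ReadOnlyCaseInsensitiveMap(original, lowerCased);
  }

  /** Case-insensitive lookup; returns null when the key is absent. */
  String get(String key) {
    return lowerCased.get(key.toLowerCase(Locale.ROOT));
  }

  /** Read-only view that preserves the caller's key casing. */
  Map<String, String> originalMap() {
    return original;
  }
}
```

With the input `a -> 1, A -> 2`, `keepLatter` yields a map whose `get("a")` is `2` and whose `originalMap()` holds only `A -> 2`, while `failOnConflict` rejects the input at construction time.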
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24047: [SPARK-25196][SQL] Extends Analyze commands for cached tables
dongjoon-hyun commented on a change in pull request #24047: [SPARK-25196][SQL] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/24047#discussion_r265842727
## File path: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ##
@@ -470,4 +471,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
       }
     }
   }
+
+  test("analyzes column statistics in cached query") {
+    withTempView("cachedTempView", "tempView") {
+      spark.sql(
+        """CACHE TABLE cachedTempView AS
+          |  SELECT c0, avg(c1) AS v1, avg(c2) AS v2
+          |  FROM (SELECT id % 3 AS c0, id % 5 AS c1, 2 AS c2 FROM range(1, 30))
+          |  GROUP BY c0
+        """.stripMargin)
+
+      // Analyzes one column in the cached logical plan
+      spark.sql("ANALYZE TABLE cachedTempView COMPUTE STATISTICS FOR COLUMNS v1")
+      val queryStats1 = spark.table("cachedTempView").queryExecution
+        .optimizedPlan.stats.attributeStats
+      assert(queryStats1.map(_._1.name).toSet === Set("v1"))
+
+      // Analyzes two more columns
+      spark.sql("ANALYZE TABLE cachedTempView COMPUTE STATISTICS FOR COLUMNS c0, v2")
+      val queryStats2 = spark.table("cachedTempView").queryExecution
+        .optimizedPlan.stats.attributeStats
+      assert(queryStats2.map(_._1.name).toSet === Set("c0", "v1", "v2"))
+
+      // Analyzes in a temporary table
+      spark.sql("CREATE TEMPORARY VIEW tempView AS SELECT * FROM range(1, 30)")
+      val errMsg = intercept[NoSuchTableException] {
+        spark.sql("ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS id")
+      }.getMessage
+      assert(errMsg.contains("Table or view 'tempView' not found in database 'default'"))
+    }
Review comment: Also, please add test coverage for the global temp view.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24047: [SPARK-25196][SQL] Extends Analyze commands for cached tables
dongjoon-hyun commented on a change in pull request #24047: [SPARK-25196][SQL] Extends Analyze commands for cached tables URL: https://github.com/apache/spark/pull/24047#discussion_r265842545
## File path: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ##
@@ -470,4 +471,34 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
       }
     }
   }
+
+  test("analyzes column statistics in cached query") {
+    withTempView("cachedTempView", "tempView") {
+      spark.sql(
+        """CACHE TABLE cachedTempView AS
Review comment: Maybe, `cachedQuery` is better than `cachedTempView`? For me, `cachedTempView` sounds like the following.
```sql
CREATE TEMPORARY VIEW tempView AS ...
CACHE TABLE tempView
```
We can rename this from `cachedTempView` to `cachedQuery` first. Then, we can add a new test case for the real cached temp views of the above SQL case before line 496.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5
AmplabJenkins removed a comment on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24096#issuecomment-473149230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103521/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5
AmplabJenkins commented on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24096#issuecomment-473149230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103521/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5
AmplabJenkins removed a comment on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24096#issuecomment-473149020 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5
AmplabJenkins commented on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24096#issuecomment-473149020 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #23680: [SPARK-26756][SQL] Support session conf for thriftserver
HyukjinKwon commented on issue #23680: [SPARK-26756][SQL] Support session conf for thriftserver URL: https://github.com/apache/spark/pull/23680#issuecomment-473146980 Can you provide reproducible steps?
[GitHub] [spark] HyukjinKwon commented on issue #23680: [SPARK-26756][SQL] Support session conf for thriftserver
HyukjinKwon commented on issue #23680: [SPARK-26756][SQL] Support session conf for thriftserver URL: https://github.com/apache/spark/pull/23680#issuecomment-473146948 ?? Do you mean we cannot set the configuration by `set ...` via the Spark thriftserver if we use `beeline`?
[GitHub] [spark] gengliangwang commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
gengliangwang commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap URL: https://github.com/apache/spark/pull/24094#discussion_r265840525
## File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ##
@@ -96,6 +98,11 @@ private[sql] class SessionState(
     hadoopConf
   }
+  def newHadoopConfWithCaseInsensitiveOptions(options: CaseInsensitiveStringMap): Configuration = {
Review comment: Otherwise, developers might not be aware that they should use `.getOriginalMap` when they want to create a Hadoop configuration from `CaseInsensitiveStringMap`.
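The reason the original keys matter here can be shown without Spark or Hadoop on the classpath. The sketch below is an illustration under stated assumptions: `java.util.Properties` stands in for a case-sensitive configuration object (it is not Hadoop's `Configuration`), the case-insensitive view is assumed to store lower-cased keys, and the class `ConfCopySketch` with its helpers is hypothetical. A mixed-case key such as `fs.defaultFS` is no longer found under its real name once the keys have been lower-cased, whereas copying from the original map keeps it.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper; Properties stands in for a case-sensitive configuration.
final class ConfCopySketch {
  /** Lower-cases every key, as a case-insensitive view is assumed to do internally. */
  static Map<String, String> lowerCaseKeys(Map<String, String> options) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : options.entrySet()) {
      out.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
    }
    return out;
  }

  /** Copies every entry into a fresh, case-sensitive Properties object. */
  static Properties toConf(Map<String, String> entries) {
    Properties conf = new Properties();
    for (Map.Entry<String, String> e : entries.entrySet()) {
      conf.setProperty(e.getKey(), e.getValue());
    }
    return conf;
  }
}
```

Calling `toConf(lowerCaseKeys(options))` stores the value under `fs.defaultfs`, so a lookup of `fs.defaultFS` returns `null`; calling `toConf(options)` on the original map preserves the key.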
[GitHub] [spark] dongjoon-hyun closed pull request #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
dongjoon-hyun closed pull request #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level URL: https://github.com/apache/spark/pull/24098
[GitHub] [spark] dongjoon-hyun commented on issue #24092: [SPARK-27160][SQL] Fix DecimalType literal casting
dongjoon-hyun commented on issue #24092: [SPARK-27160][SQL] Fix DecimalType literal casting URL: https://github.com/apache/spark/pull/24092#issuecomment-473145953 Yes, right, @sadhen .
[GitHub] [spark] dongjoon-hyun commented on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
dongjoon-hyun commented on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level URL: https://github.com/apache/spark/pull/24098#issuecomment-473145603 Merged to master.
[GitHub] [spark] SparkQA removed a comment on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
SparkQA removed a comment on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level URL: https://github.com/apache/spark/pull/24098#issuecomment-473100183 **[Test build #103522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103522/testReport)** for PR 24098 at commit [`9263218`](https://github.com/apache/spark/commit/9263218ae5436b3fb780b6e733876ff92c7d81a5).
[GitHub] [spark] LantaoJin commented on issue #24090: [SPARK-27157][DOCS] Add Executor level metrics to monitoring docs
LantaoJin commented on issue #24090: [SPARK-27157][DOCS] Add Executor level metrics to monitoring docs URL: https://github.com/apache/spark/pull/24090#issuecomment-473143954
> This is probably OK, but are these metrics things that Spark generates or that are generated automatically by Ganglia et al? that is, do we need to document them or point at existing external docs?
@srowen They are generated by Spark, see `ExecutorMetricType`
[GitHub] [spark] SparkQA commented on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level
SparkQA commented on issue #24098: [SPARK-27166][SQL] Improve `printSchema` to print up to the given level URL: https://github.com/apache/spark/pull/24098#issuecomment-473145100 **[Test build #103522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103522/testReport)** for PR 24098 at commit [`9263218`](https://github.com/apache/spark/commit/9263218ae5436b3fb780b6e733876ff92c7d81a5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5
AmplabJenkins removed a comment on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24097#issuecomment-473144556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103520/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5
AmplabJenkins commented on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24097#issuecomment-473144556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103520/ Test PASSed.
[GitHub] [spark] sadhen commented on issue #24092: [SPARK-27160][SQL] Fix DecimalType literal casting
sadhen commented on issue #24092: [SPARK-27160][SQL] Fix DecimalType literal casting URL: https://github.com/apache/spark/pull/24092#issuecomment-473144398 @dongjoon-hyun @cloud-fan Do you mean generating an ORC file with DecimalType and reading it using the native reader with predicate pushdown?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5
AmplabJenkins removed a comment on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24097#issuecomment-473144293 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun closed pull request #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5
dongjoon-hyun closed pull request #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24097
[GitHub] [spark] AmplabJenkins commented on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5
AmplabJenkins commented on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24097#issuecomment-473144293 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5
SparkQA removed a comment on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24096#issuecomment-473090853 **[Test build #103521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103521/testReport)** for PR 24096 at commit [`91536da`](https://github.com/apache/spark/commit/91536da18f3d01ea9820b64b38ad54320337151b).
[GitHub] [spark] dongjoon-hyun commented on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5
dongjoon-hyun commented on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24096#issuecomment-473143440 Merged to master.
[GitHub] [spark] SparkQA removed a comment on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5
SparkQA removed a comment on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24097#issuecomment-473090832 **[Test build #103520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103520/testReport)** for PR 24097 at commit [`2da98a7`](https://github.com/apache/spark/commit/2da98a7c68768f1450775ee931e6c562984200ec).
[GitHub] [spark] SparkQA commented on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5
SparkQA commented on issue #24096: [SPARK-27165][SPARK-27107][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24096#issuecomment-473143393 **[Test build #103521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103521/testReport)** for PR 24096 at commit [`91536da`](https://github.com/apache/spark/commit/91536da18f3d01ea9820b64b38ad54320337151b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5
dongjoon-hyun commented on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24097#issuecomment-473143223 Merged to branch-2.4.
[GitHub] [spark] SparkQA commented on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5
SparkQA commented on issue #24097: [SPARK-27165][SPARK-27107][BRANCH-2.4][BUILD][SQL] Upgrade Apache ORC to 1.5.5 URL: https://github.com/apache/spark/pull/24097#issuecomment-473143176 **[Test build #103520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103520/testReport)** for PR 24097 at commit [`2da98a7`](https://github.com/apache/spark/commit/2da98a7c68768f1450775ee931e6c562984200ec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
URL: https://github.com/apache/spark/pull/24094#discussion_r265837482
## File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ##
@@ -96,6 +98,11 @@ private[sql] class SessionState(
 hadoopConf
 }
+ def newHadoopConfWithCaseInsensitiveOptions(options: CaseInsensitiveStringMap): Configuration = {
Review comment: I don't think we should pollute `SessionState` with the case insensitive map stuff. Can we inline this method?
[GitHub] [spark] cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
URL: https://github.com/apache/spark/pull/24094#discussion_r265837478
## File path: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ##
@@ -96,6 +98,11 @@ private[sql] class SessionState(
 hadoopConf
 }
+ def newHadoopConfWithCaseInsensitiveOptions(options: CaseInsensitiveStringMap): Configuration = {
Review comment: I don't think we should pollute `SessionState` with the case insensitive map stuff. Can we inline this method?
[GitHub] [spark] cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
cloud-fan commented on a change in pull request #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
URL: https://github.com/apache/spark/pull/24094#discussion_r265837236
## File path: sql/catalyst/src/main/java/org/apache/spark/sql/util/CaseInsensitiveStringMap.java ##
@@ -40,9 +40,12 @@ public static CaseInsensitiveStringMap empty() {
 return new CaseInsensitiveStringMap(new HashMap<>(0));
 }
+ private final Map original;
+
 private final Map delegate;
 public CaseInsensitiveStringMap(Map originalMap) {
+this.original = new HashMap<>(originalMap);
Review comment: This should be `new HashMap<>(originalMap.size());`, otherwise we add the data to it twice.
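The point of the review comment above can be illustrated with a small, self-contained sketch. It is not the Spark source: `CaseInsensitiveMapSketch` is a hypothetical stand-in, and the constructor loop that fills both maps is an assumption inferred from the remark that the data would otherwise be added twice. The idea is that `new HashMap<>(n)` only reserves capacity, so the loop remains the single place where entries are inserted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class OriginalMapSketch {

    // Hypothetical stand-in for the class under review; names are illustrative only.
    static final class CaseInsensitiveMapSketch {
        private final Map<String, String> original;
        private final Map<String, String> delegate;

        CaseInsensitiveMapSketch(Map<String, String> originalMap) {
            // Presize only: HashMap(int) sets the initial capacity and copies nothing.
            this.original = new HashMap<>(originalMap.size());
            this.delegate = new HashMap<>(originalMap.size());
            // Assumed population loop: each entry is inserted exactly once per map.
            for (Map.Entry<String, String> e : originalMap.entrySet()) {
                original.put(e.getKey(), e.getValue());
                delegate.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
            }
        }

        // Case-insensitive lookup through the lower-cased delegate.
        String get(String key) {
            return delegate.get(key.toLowerCase(Locale.ROOT));
        }

        // Read-only view of the keys exactly as the caller supplied them.
        Map<String, String> getOriginalMap() {
            return Collections.unmodifiableMap(original);
        }
    }

    public static void main(String[] args) {
        Map<String, String> input = new HashMap<>();
        input.put("Path", "/tmp/data");
        CaseInsensitiveMapSketch m = new CaseInsensitiveMapSketch(input);
        System.out.println(m.get("PATH"));
        System.out.println(m.getOriginalMap().keySet());
    }
}
```

Using the copy constructor `new HashMap<>(originalMap)` and then running the same loop would still produce a correct map, since `put` overwrites, but every entry would be written twice.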
[GitHub] [spark] cloud-fan commented on issue #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
cloud-fan commented on issue #24094: [SPARK-27162][SQL] Add new method getOriginalMap in CaseInsensitiveStringMap
URL: https://github.com/apache/spark/pull/24094#issuecomment-473142525
AFAIK hadoop conf can be set in 3 ways:
1. global level, via `SparkContext.hadoopConfiguration`
2. session level, via `SparkSession.conf`
3. operation level, via `DataFrameReader/Writer.option`
1 and 2 are fine, as they are case sensitive. The problem is 3, as data source v2 treats options as case-insensitive.
There are 2 solutions I can think of:
1. Do not support operation level hadoop conf for data source v2.
2. Keep the original case sensitive map.
I think 2 is more reasonable, which is what this PR is trying to do.
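To make the problem with operation-level options concrete, here is a minimal sketch that uses plain `HashMap`s in place of a Hadoop `Configuration` and of the data source v2 options map. The helper names `lowerCaseKeys` and `overlay` are hypothetical, and the key `fs.AbstractFileSystem.hdfs.impl` is used only as an example of a mixed-case Hadoop key; the values are made up.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HadoopConfOptionsSketch {

    // Mimics what a case-insensitive options map does to its keys (assumption: lower-casing).
    static Map<String, String> lowerCaseKeys(Map<String, String> options) {
        Map<String, String> result = new HashMap<>(options.size());
        for (Map.Entry<String, String> e : options.entrySet()) {
            result.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return result;
    }

    // Overlays operation-level options on a copy of a case-sensitive base configuration.
    static Map<String, String> overlay(Map<String, String> base, Map<String, String> options) {
        Map<String, String> merged = new HashMap<>(base);
        merged.putAll(options);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> base = new HashMap<>();
        base.put("fs.AbstractFileSystem.hdfs.impl", "default-impl");

        Map<String, String> options = new HashMap<>();
        options.put("fs.AbstractFileSystem.hdfs.impl", "custom-impl");

        // Lower-cased keys no longer match the case-sensitive key, so the override is lost.
        Map<String, String> lossy = overlay(base, lowerCaseKeys(options));
        System.out.println(lossy.get("fs.AbstractFileSystem.hdfs.impl"));

        // Keeping the original keys (solution 2 above) lets the override take effect.
        Map<String, String> kept = overlay(base, options);
        System.out.println(kept.get("fs.AbstractFileSystem.hdfs.impl"));
    }
}
```

In the lossy case the merged map ends up with two entries, the untouched mixed-case key and a stray lower-cased one, which is why keeping the original map alongside the case-insensitive view is attractive.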
[GitHub] [spark] AmplabJenkins removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
AmplabJenkins removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473140844 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
AmplabJenkins commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473140850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103519/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
AmplabJenkins removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473140850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103519/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
AmplabJenkins commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473140844 Merged build finished. Test PASSed.
[GitHub] [spark] beliefer edited a comment on issue #23950: [SPARK-27140][SQL]The feature is 'insert overwrite local directory' has an inconsistent behavior in different environment.
beliefer edited a comment on issue #23950: [SPARK-27140][SQL]The feature is 'insert overwrite local directory' has an inconsistent behavior in different environment. URL: https://github.com/apache/spark/pull/23950#issuecomment-472740651 cc @maropu @gatorsmile @dongjoon-hyun @janewangfb @cloud-fan Please help me find the reason. Thanks a lot!
[GitHub] [spark] SparkQA commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
SparkQA commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473139818 **[Test build #103519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103519/testReport)** for PR 24072 at commit [`09f9b47`](https://github.com/apache/spark/commit/09f9b4767b3f8b94b8ef1ae956d46e7158d50b9d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
SparkQA removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473085987 **[Test build #103519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/103519/testReport)** for PR 24072 at commit [`09f9b47`](https://github.com/apache/spark/commit/09f9b4767b3f8b94b8ef1ae956d46e7158d50b9d).
[GitHub] [spark] sujith71955 commented on issue #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
sujith71955 commented on issue #24075: [SPARK-26176][SQL] Verify column names for CTAS with `STORED AS`
URL: https://github.com/apache/spark/pull/24075#issuecomment-473139102
> Thank you for pinging me, @sujith71955 .
>
> * I updated the PR description slightly and triggered a new testing since there was no successful run until now.
> * In addition, I update this JIRA as an `Improvement` since the previous and new behavior are just the same except raising the better exceptions for UX.
Sure. Thanks :)
[GitHub] [spark] dongjoon-hyun commented on issue #24092: [SPARK-27160][SQL] Fix DecimalType literal casting
dongjoon-hyun commented on issue #24092: [SPARK-27160][SQL] Fix DecimalType literal casting URL: https://github.com/apache/spark/pull/24092#issuecomment-473138559 +1 for @cloud-fan 's opinion. @sadhen , could you add another test case for that?
[GitHub] [spark] vanzin closed pull request #23380: [SPARK-26343][KUBERNETES] Try to speed up running local k8s integration tests
vanzin closed pull request #23380: [SPARK-26343][KUBERNETES] Try to speed up running local k8s integration tests URL: https://github.com/apache/spark/pull/23380
[GitHub] [spark] vanzin commented on issue #23380: [SPARK-26343][KUBERNETES] Try to speed up running local k8s integration tests
vanzin commented on issue #23380: [SPARK-26343][KUBERNETES] Try to speed up running local k8s integration tests URL: https://github.com/apache/spark/pull/23380#issuecomment-473137259 Merging to master.
[GitHub] [spark] maropu commented on a change in pull request #24019: [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long
maropu commented on a change in pull request #24019: [SPARK-27099][SQL] Add 'xxhash64' for hashing arbitrary columns to Long
URL: https://github.com/apache/spark/pull/24019#discussion_r265832533
## File path: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ##
@@ -2167,6 +2167,19 @@ object functions {
 new Murmur3Hash(cols.map(_.expr))
 }
+ /**
+ * Calculates the hash code of given columns using the 64-bit
+ * variant of the xxHash algorithm, and returns the result as a long
+ * column.
+ *
+ * @group misc_funcs
+ * @since 2.4.1
+ */
+ @scala.annotation.varargs
+ def xxhash64(cols: Column*): Column = withExpr {
Review comment: Ah, I see. It's OK as it is.
[GitHub] [spark] vanzin commented on a change in pull request #24088: [SPARK-27122][core] Jetty classes must not be return via getters in org.apache.spark.ui.WebUI
vanzin commented on a change in pull request #24088: [SPARK-27122][core] Jetty classes must not be return via getters in org.apache.spark.ui.WebUI
URL: https://github.com/apache/spark/pull/24088#discussion_r265832256
## File path: core/src/main/scala/org/apache/spark/ui/WebUI.scala ##
@@ -95,6 +101,16 @@ private[spark] abstract class WebUI(
 serverInfo.foreach(_.addHandler(handler, securityManager))
 }
+ /** Attaches a handler to this UI. */
+ def attachHandler(contextPath: String,
Review comment: This is not the right style. See the class's constructor for an example.
[GitHub] [spark] maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
URL: https://github.com/apache/spark/pull/24093#discussion_r265824998
## File path: docs/sql-keywords.md ##
@@ -1,16 +1,20 @@
 ---
 layout: global
-title: SQL Reserved/Non-Reserved Keywords
-displayTitle: SQL Reserved/Non-Reserved Keywords
+title: Spark SQL Keywords
+displayTitle: Spark SQL Keywords
 ---
-In Spark SQL, there are 2 kinds of keywords: non-reserved and reserved. Non-reserved keywords have a
-special meaning only in particular contexts and can be used as identifiers (e.g., table names, view names,
-column names, column aliases, table aliases) in other contexts. Reserved keywords can't be used as
-table alias, but can be used as other identifiers.
+When `spark.sql.parser.ansi.enabled` is true, Spark SQL has two kinds of keywords:
+* Reserved keywords: Keywords that reserved and can't be used as identifiers for table, view, column, alias, etc.
Review comment: nit: `* Reserved keywords: Keywords that are reserved and can't be used as identifiers for tables, views, columns, aliases, etc.`?
[GitHub] [spark] maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
URL: https://github.com/apache/spark/pull/24093#discussion_r265825069
## File path: docs/sql-keywords.md ##
@@ -1,16 +1,20 @@
 ---
 layout: global
-title: SQL Reserved/Non-Reserved Keywords
-displayTitle: SQL Reserved/Non-Reserved Keywords
+title: Spark SQL Keywords
+displayTitle: Spark SQL Keywords
 ---
-In Spark SQL, there are 2 kinds of keywords: non-reserved and reserved. Non-reserved keywords have a
-special meaning only in particular contexts and can be used as identifiers (e.g., table names, view names,
-column names, column aliases, table aliases) in other contexts. Reserved keywords can't be used as
-table alias, but can be used as other identifiers.
+When `spark.sql.parser.ansi.enabled` is true, Spark SQL has two kinds of keywords:
+* Reserved keywords: Keywords that reserved and can't be used as identifiers for table, view, column, alias, etc.
+* Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts.
Review comment: nit: `in other contexts.` -> `in the other contexts, e.g., SELECT 1 WEEK means interval type data, but WEEK can be used as identifiers`?
[GitHub] [spark] maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
URL: https://github.com/apache/spark/pull/24093#discussion_r265828883
## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ##
@@ -1215,6 +1232,9 @@ nonReserved
 | YEARS
 ;
+//
+// Start of the keywords list
+//
 SELECT: 'SELECT';
Review comment: +1
[GitHub] [spark] maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
URL: https://github.com/apache/spark/pull/24093#discussion_r265828436
## File path: docs/sql-keywords.md ##
@@ -1,16 +1,20 @@
 ---
 layout: global
-title: SQL Reserved/Non-Reserved Keywords
-displayTitle: SQL Reserved/Non-Reserved Keywords
+title: Spark SQL Keywords
+displayTitle: Spark SQL Keywords
 ---
-In Spark SQL, there are 2 kinds of keywords: non-reserved and reserved. Non-reserved keywords have a
-special meaning only in particular contexts and can be used as identifiers (e.g., table names, view names,
-column names, column aliases, table aliases) in other contexts. Reserved keywords can't be used as
-table alias, but can be used as other identifiers.
+When `spark.sql.parser.ansi.enabled` is true, Spark SQL has two kinds of keywords:
+* Reserved keywords: Keywords that reserved and can't be used as identifiers for table, view, column, alias, etc.
+* Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts.
-The list of reserved and non-reserved keywords can change according to the config
-`spark.sql.parser.ansi.enabled`, which is false by default.
+When `spark.sql.parser.ansi.enabled` is false, Spark SQL has two kinds of keywords:
+* Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts.
+* Strict-non-reserved keywords: A strict version of non-reserved keywords, which can not be used as table alias.
+
+By default `spark.sql.parser.ansi.enabled` is false.
+
+Below is a list of all the keywords in Spark SQL.
Review comment: ok, I'll check and fix as followup.
[GitHub] [spark] maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
URL: https://github.com/apache/spark/pull/24093#discussion_r265830237
## File path: docs/sql-keywords.md ##
@@ -1,16 +1,20 @@
 ---
 layout: global
-title: SQL Reserved/Non-Reserved Keywords
-displayTitle: SQL Reserved/Non-Reserved Keywords
+title: Spark SQL Keywords
+displayTitle: Spark SQL Keywords
 ---
-In Spark SQL, there are 2 kinds of keywords: non-reserved and reserved. Non-reserved keywords have a
-special meaning only in particular contexts and can be used as identifiers (e.g., table names, view names,
-column names, column aliases, table aliases) in other contexts. Reserved keywords can't be used as
-table alias, but can be used as other identifiers.
+When `spark.sql.parser.ansi.enabled` is true, Spark SQL has two kinds of keywords:
+* Reserved keywords: Keywords that reserved and can't be used as identifiers for table, view, column, alias, etc.
+* Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts.
-The list of reserved and non-reserved keywords can change according to the config
-`spark.sql.parser.ansi.enabled`, which is false by default.
+When `spark.sql.parser.ansi.enabled` is false, Spark SQL has two kinds of keywords:
+* Non-reserved keywords: Keywords that have a special meaning only in particular contexts and can be used as identifiers in other contexts.
+* Strict-non-reserved keywords: A strict version of non-reserved keywords, which can not be used as table alias.
Review comment: Great and this new group is easy-to-understand.
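The keyword grouping proposed in the documentation diff above can be restated as a tiny rule table. The sketch below is only an illustration of the wording in that diff, not Spark's parser: the enum and method names are hypothetical, and it deliberately names no concrete keywords, since which keyword falls into which group depends on `spark.sql.parser.ansi.enabled` and on the grammar itself.

```java
public class KeywordKindSketch {

    // The three groups named in the proposed docs/sql-keywords.md text.
    enum KeywordKind { RESERVED, NON_RESERVED, STRICT_NON_RESERVED }

    // Places where an identifier can appear, taken from the old wording of the document.
    enum IdentifierRole { TABLE_NAME, VIEW_NAME, COLUMN_NAME, COLUMN_ALIAS, TABLE_ALIAS }

    // Returns whether a keyword of the given kind may be used in the given identifier role.
    static boolean usableAs(KeywordKind kind, IdentifierRole role) {
        switch (kind) {
            case RESERVED:
                // Reserved keywords can't be used as identifiers at all.
                return false;
            case STRICT_NON_RESERVED:
                // Strict-non-reserved keywords are only barred from being table aliases.
                return role != IdentifierRole.TABLE_ALIAS;
            default:
                // Non-reserved keywords can be used as identifiers in other contexts.
                return true;
        }
    }

    public static void main(String[] args) {
        for (KeywordKind kind : KeywordKind.values()) {
            for (IdentifierRole role : IdentifierRole.values()) {
                System.out.println(kind + " as " + role + ": " + usableAs(kind, role));
            }
        }
    }
}
```

Per the diff, `RESERVED` applies only when the ANSI option is true and `STRICT_NON_RESERVED` only when it is false; `NON_RESERVED` exists in both modes.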
[GitHub] [spark] maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
maropu commented on a change in pull request #24093: [SPARK-27161][SQL] improve the document of SQL keywords
URL: https://github.com/apache/spark/pull/24093#discussion_r265830090
## File path: docs/sql-keywords.md ##
@@ -1,16 +1,20 @@
 ---
 layout: global
-title: SQL Reserved/Non-Reserved Keywords
-displayTitle: SQL Reserved/Non-Reserved Keywords
+title: Spark SQL Keywords
+displayTitle: Spark SQL Keywords
Review comment: `spark.sql.parser.ansi.enabled` affects parsing behaviour, too; e.g., when true, it makes `interval` optional. In the future, we could change the behaviour of overflow handling in execution for stricter ANSI compliance. Should these behaviour changes affected by the ANSI option be documented in another document rather than this one?
[GitHub] [spark] cloud-fan closed pull request #24069: [SPARK-27136][SQL] Remove data source option check_files_exist
cloud-fan closed pull request #24069: [SPARK-27136][SQL] Remove data source option check_files_exist URL: https://github.com/apache/spark/pull/24069
[GitHub] [spark] AmplabJenkins removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
AmplabJenkins removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473134300 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
AmplabJenkins removed a comment on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473134307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103514/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
AmplabJenkins commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473134300 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered …
AmplabJenkins commented on issue #24072: [SPARK-27112] : Create a resource ordering between threads to resolve the deadlocks encountered … URL: https://github.com/apache/spark/pull/24072#issuecomment-473134307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/103514/ Test PASSed.