[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13964#discussion_r68889173

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathExpressionSuite.scala ---
    @@ -0,0 +1,68 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.expressions.xml
    +
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.sql.catalyst.expressions.{ExpressionEvalHelper, Literal, NonFoldableLiteral}
    +import org.apache.spark.sql.types.StringType
    +import org.apache.spark.unsafe.types.UTF8String
    +
    +/**
    + * Test suite for various xpath functions.
    + */
    +class XPathExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
    +
    +  private def testBoolean[T](xml: String, path: String, expected: T): Unit = {
    +    checkEvaluation(
    +      XPathBoolean(Literal.create(xml, StringType), Literal.create(path, StringType)),
    +      expected)
    +  }
    +
    +  test("xpath_boolean") {
    +    testBoolean("<a><b>b</b></a>", "a/b", true)
    +    testBoolean("<a><b>b</b></a>", "a/c", false)
    +    testBoolean("<a><b>b</b></a>", "a/b = \"b\"", true)
    +    testBoolean("<a><b>b</b></a>", "a/b = \"c\"", false)
    +    testBoolean("<a><b>10</b></a>", "a/b < 10", false)
    +    testBoolean("<a><b>10</b></a>", "a/b = 10", true)
    +
    +    // null input
    +    testBoolean(null, null, null)
    +    testBoolean(null, "a", null)
    +    testBoolean("<a><b>10</b></a>", null, null)
    +
    +    // exception handling for invalid input
    +    intercept[Exception] {
    +      testBoolean("<a>/a>", "a", null)
    +    }
    +  }
    +
    +  test("xpath_boolean path cache invalidation") {
    +    // This is a test to ensure the expression is not reusing the path for different strings
    +    val xml = NonFoldableLiteral("<a><b>b</b></a>")
    +    val path = NonFoldableLiteral("a/b")
    +    val expr = XPathBoolean(xml, path)
    +
    +    // Run evaluation once
    +    assert(expr.eval(null) == true)
    +
    +    // Change the input path and make sure we don't screw up caching
    +    path.value = UTF8String.fromString("a/c")
    +    assert(expr.eval(null) == false)
    --- End diff --

    updated!

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
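For readers without a Spark build handy, the behavior exercised by the `testBoolean` cases above can be reproduced with the JDK's own `javax.xml.xpath` API, which is the same machinery the Hive-style xpath UDFs wrap. This is an illustrative sketch, not the code under review:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathBooleanSketch {
    // Evaluates an XPath expression against an XML string and coerces the
    // result to a boolean, mirroring the xpath_boolean test cases above.
    public static boolean xpathBoolean(String xml, String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(
            path, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(xpathBoolean("<a><b>b</b></a>", "a/b"));       // node exists
        System.out.println(xpathBoolean("<a><b>b</b></a>", "a/c"));       // no such node
        System.out.println(xpathBoolean("<a><b>10</b></a>", "a/b < 10")); // numeric comparison
        System.out.println(xpathBoolean("<a><b>10</b></a>", "a/b = 10"));
    }
}
```

Note that malformed XML input (like the `intercept[Exception]` case in the suite) surfaces as an exception from `evaluate`, which matches the test's expectation.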
[GitHub] spark issue #13930: [SPARK-16228][SQL] HiveSessionCatalog should return `dou...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13930

    Merged build finished. Test PASSed.
[GitHub] spark issue #13930: [SPARK-16228][SQL] HiveSessionCatalog should return `dou...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13930

    Test PASSed. Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61446/
[GitHub] spark issue #13930: [SPARK-16228][SQL] HiveSessionCatalog should return `dou...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13930

    **[Test build #61446 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61446/consoleFull)** for PR 13930 at commit [`b8df028`](https://github.com/apache/spark/commit/b8df0284aa7bd4328ff7f8e1ebdce55272e549d2).

    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13964#discussion_r6729

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathExpressionSuite.scala ---
    (context elided; same hunk as quoted earlier in this digest)
    +    // Change the input path and make sure we don't screw up caching
    +    path.value = UTF8String.fromString("a/c")
    +    assert(expr.eval(null) == false)
    --- End diff --

    To test the changing of input, I think it's more clear to use `BoundReference`:
    ```
    val expr = XPathBoolean(Literal("<a><b>b</b></a>"), 'path.string.at(1))
    checkEvaluation(expr, true, create_row("a/b"))
    checkEvaluation(expr, false, create_row("a/c"))
    ```
[GitHub] spark issue #13860: [SPARK-16157] [SQL] Add New Methods for comments in Stru...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13860

    **[Test build #61451 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61451/consoleFull)** for PR 13860 at commit [`b6ded4d`](https://github.com/apache/spark/commit/b6ded4d5381378781a41645950c348095cbf1292).
[GitHub] spark issue #13886: [SPARK-16185] [SQL] Better Error Messages When Creating ...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13886

    **[Test build #61450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61450/consoleFull)** for PR 13886 at commit [`e4cc35d`](https://github.com/apache/spark/commit/e4cc35d88e87941d8b9d2e3a2f754a43d29d4c70).
[GitHub] spark pull request #13860: [SPARK-16157] [SQL] Add New Methods for comments ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13860#discussion_r6152

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    @@ -52,6 +52,23 @@ class DataTypeSuite extends SparkFunSuite {
         assert(StructField("b", LongType, false) === struct("b"))
       }

    +  test("construct with add from StructField with comments") {
    +    // Test creation from StructField using four different ways
    +    val struct = (new StructType)
    +      .add("a", "int", true, "test1")
    +      .add("c", StringType, true, "test3")
    --- End diff --

    Sure. Will do it. Thanks!
[GitHub] spark pull request #13966: [SPARK-16276][SQL] Implement elt SQL function
Github user petermaxlee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13966#discussion_r6095

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    @@ -162,6 +162,42 @@ case class ConcatWs(children: Seq[Expression])
       }
     }

    +@ExpressionDescription(
    +  usage = "_FUNC_(n, str1, str2, ...) - returns the n-th string",
    +  extended = "> SELECT _FUNC_(1, 'scala', 'java') FROM src LIMIT 1;\n" + "'scala'")
    +case class Elt(children: Seq[Expression])
    +  extends Expression with ExpectsInputTypes with CodegenFallback {
    +
    +  require(children.nonEmpty, "elt requires at least one argument.")
    --- End diff --

    but then we would not be able to reuse ExpectsInputTypes?
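For context on what `Elt` computes, independent of the Catalyst type-check question being discussed: `elt(n, str1, str2, ...)` follows the MySQL/Hive convention of returning the n-th string argument, 1-indexed, with NULL for an out-of-range index. A minimal standalone sketch of those semantics:

```java
public class EltSketch {
    // elt(n, str1, str2, ...) returns the n-th string (1-indexed).
    // An out-of-range n yields null rather than an error, matching the
    // MySQL/Hive behavior the SQL function is modeled on.
    public static String elt(int n, String... strings) {
        if (n < 1 || n > strings.length) {
            return null;
        }
        return strings[n - 1];
    }
}
```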
[GitHub] spark pull request #13860: [SPARK-16157] [SQL] Add New Methods for comments ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13860#discussion_r6077

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    (context elided; same hunk as quoted earlier in this digest)
    +      .add("a", "int", true, "test1")
    +      .add("c", StringType, true, "test3")
    --- End diff --

    a very minor comment: can we name these fields `a, b, c, d` instead of `a, c, d, e`? The missing `b` is kind of annoying to me...
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13964#discussion_r6030

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    @@ -301,6 +302,7 @@ object FunctionRegistry {
         expression[UnBase64]("unbase64"),
         expression[Unhex]("unhex"),
         expression[Upper]("upper"),
    +    expression[XPathBoolean]("xpath_boolean"),
    --- End diff --

    hm let's not register the xml ones there yet.
[GitHub] spark pull request #13860: [SPARK-16157] [SQL] Add New Methods for comments ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13860#discussion_r6029

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
    @@ -363,6 +363,31 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
         spark.range(10).write.orc(dir)
       }

    +  test("column nullability and comment - write and then read") {
    --- End diff --

    This is from the original PR: https://github.com/apache/spark/pull/13764
    In that PR, we only did the test case coverage for the SQL interface. We removed the test cases for non-SQL interfaces.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13964#discussion_r6009

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathExpressionSuite.scala ---
    (context elided; same hunk as quoted earlier in this digest)
    +  test("xpath_boolean path cache invalidation") {
    --- End diff --

    The underlying implementation can still exploit that (e.g. Hive's implementation does it), so I'm thinking it might be useful.
[GitHub] spark pull request #13860: [SPARK-16157] [SQL] Add New Methods for comments ...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13860#discussion_r68887957

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    (context elided; same hunk as quoted earlier in this digest)
    +  test("construct with add from StructField with comments") {
    --- End diff --

    ok let's keep it
[GitHub] spark issue #13886: [SPARK-16185] [SQL] Better Error Messages When Creating ...
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/13886

    retest this please
[GitHub] spark pull request #13860: [SPARK-16157] [SQL] Add New Methods for comments ...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13860#discussion_r68887844

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala ---
    (context elided; same hunk as quoted earlier in this digest)
    +  test("construct with add from StructField with comments") {
    --- End diff --

    Since this PR also adds two `add` overloads to `StructType`, this test case also covers the interface changes. I am fine if you think that test is useless. Let me know if you are OK to keep them. Thanks!
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/13906

    For 3, I respect your opinion. I just made another commit for 2.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13964#discussion_r68887731

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ---
    (context elided; same hunk as quoted earlier in this digest)
    +    expression[XPathBoolean]("xpath_boolean"),
    --- End diff --

    should we also register this function in `org.apache.spark.sql.functions`?
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13964#discussion_r68887674

    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathExpressionSuite.scala ---
    (context elided; same hunk as quoted earlier in this digest)
    +  test("xpath_boolean path cache invalidation") {
    --- End diff --

    do we still need to test it? there is no cache anymore
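Background on the cache being debated: compiling an XPath string is relatively expensive, so an implementation may cache the compiled expression and recompile only when the path argument changes; the invalidation test guards against a stale cache being reused for a new path. A hypothetical sketch of that pattern with the JDK API (class and method names here are illustrative, not Spark's):

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class CachedXPathEvaluator {
    private String lastPath;           // path string the cached expression was built from
    private XPathExpression compiled;  // cached compiled XPath expression

    // Recompile only when the path changes; forgetting the lastPath check
    // (or its update) is exactly the bug the invalidation test would catch.
    public boolean evalBoolean(String xml, String path) throws Exception {
        if (compiled == null || !Objects.equals(path, lastPath)) {
            compiled = XPathFactory.newInstance().newXPath().compile(path);
            lastPath = path;
        }
        return (Boolean) compiled.evaluate(
            new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }
}
```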
[GitHub] spark issue #13919: [SPARK-16222] [SQL] JDBC Sources - Handling illegal inpu...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13919

    **[Test build #61448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61448/consoleFull)** for PR 13919 at commit [`b999b8a`](https://github.com/apache/spark/commit/b999b8a1474fc9d4f8d3a4c94694fbc40572111a).
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13906

    **[Test build #61449 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61449/consoleFull)** for PR 13906 at commit [`4d937dc`](https://github.com/apache/spark/commit/4d937dc83da661b24d2af1dd513687f4a63b29b0).
[GitHub] spark pull request #13966: [SPARK-16276][SQL] Implement elt SQL function
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13966#discussion_r68887526

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
    (context elided; same hunk as quoted earlier in this digest)
    +  require(children.nonEmpty, "elt requires at least one argument.")
    --- End diff --

    we should use the expression type check framework for this, see `Coalesce` as an example
[GitHub] spark issue #13893: [SPARK-14172][SQL] Hive table partition predicate not pa...
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/13893

    It's a good point, looks like we can also improve the `PushDownPredicate` rule according to this.
[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source supports custom date ...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13912

    Merged build finished. Test PASSed.
[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source supports custom date ...
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13912

    Test PASSed. Refer to this link for build results (access rights to CI server needed):
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61443/
[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source supports custom date ...
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13912

    **[Test build #61443 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61443/consoleFull)** for PR 13912 at commit [`0e1901e`](https://github.com/apache/spark/commit/0e1901e6aeb47edc657ad11a0bea38d5a0f9c7f5).

    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
      * `public final class JavaStructuredNetworkWordCount `
      * `class OptionUtils(object):`
      * `class DataFrameReader(OptionUtils):`
      * `class DataFrameWriter(OptionUtils):`
      * `class DataStreamReader(OptionUtils):`
      * `case class ShowFunctionsCommand(`
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68887194 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -301,6 +302,7 @@ object FunctionRegistry { expression[UnBase64]("unbase64"), expression[Unhex]("unhex"), expression[Upper]("upper"), +expression[XPathBoolean]("xpath_boolean"), --- End diff -- done.
[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source supports custom date ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13912 Merged build finished. Test FAILed.
[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source supports custom date ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13912 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61444/ Test FAILed.
[GitHub] spark pull request #13860: [SPARK-16157] [SQL] Add New Methods for comments ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13860#discussion_r68887161 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala --- @@ -363,6 +363,31 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be spark.range(10).write.orc(dir) } + test("column nullability and comment - write and then read") { --- End diff -- hmmm, why do we add this test?
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68887178 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathBoolean.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.xml + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.types.{AbstractDataType, BooleanType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + + +@ExpressionDescription( + usage = "_FUNC_(xml, xpath) - Evaluates a boolean xpath expression.", + extended = "> SELECT _FUNC_('1','a/b');\ntrue") +case class XPathBoolean(xml: Expression, path: Expression) + extends BinaryExpression with ExpectsInputTypes with CodegenFallback { + + @transient private lazy val xpathUtil = new UDFXPathUtil + + // We use these to avoid converting the path from UTF8String to String if it is a constant. + @transient private var lastPathUtf8: UTF8String = null --- End diff -- Great idea. Done! 
[GitHub] spark issue #13912: [SPARK-16216][SQL] CSV data source supports custom date ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13912 **[Test build #61444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61444/consoleFull)** for PR 13912 at commit [`d03e7a0`](https://github.com/apache/spark/commit/d03e7a0806691b2ad3290cbf7e16a771faf55af1).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13966: [SPARK-16276][SQL] Implement elt SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13966 **[Test build #3139 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3139/consoleFull)** for PR 13966 at commit [`7cea3b1`](https://github.com/apache/spark/commit/7cea3b1b2a1d34c14515242477903db5b4e6fb84).
[GitHub] spark pull request #13860: [SPARK-16157] [SQL] Add New Methods for comments ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13860#discussion_r68886986 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala --- @@ -52,6 +52,23 @@ class DataTypeSuite extends SparkFunSuite { assert(StructField("b", LongType, false) === struct("b")) } + test("construct with add from StructField with comments") { --- End diff -- Although more tests are better, I think for this case we only need to test `withComment` and `getComment`; it's obvious that the other `StructField` creation paths call `withComment`.
[GitHub] spark issue #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13964 **[Test build #3140 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3140/consoleFull)** for PR 13964 at commit [`bdd49aa`](https://github.com/apache/spark/commit/bdd49aad79c6109046195f1f2713283a947d61f3).
[GitHub] spark issue #13966: [SPARK-16276][SQL] Implement elt SQL function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13966 Can one of the admins verify this patch?
[GitHub] spark issue #13966: [SPARK-16276][SQL] Implement elt SQL function
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/13966 cc @dongjoon-hyun @cloud-fan @rxin
[GitHub] spark pull request #13966: [SPARK-16276][SQL] Implement elt SQL function
GitHub user petermaxlee opened a pull request: https://github.com/apache/spark/pull/13966 [SPARK-16276][SQL] Implement elt SQL function
## What changes were proposed in this pull request?
This patch implements the elt function, as it is implemented in Hive.
## How was this patch tested?
Added an expression unit test in StringExpressionsSuite and an end-to-end test in StringFunctionsSuite.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/petermaxlee/spark SPARK-16276 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13966.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13966 commit 7cea3b1b2a1d34c14515242477903db5b4e6fb84 Author: petermaxlee Date: 2016-06-29T05:19:53Z [SPARK-16276][SQL] Implement elt SQL function
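For readers unfamiliar with Hive's `elt`, here is a rough sketch of its semantics in plain Python (my own summary for illustration, not the patch's Scala implementation): it returns the n-th string argument, 1-indexed, and NULL when the index is out of range.

```python
def elt(n, *strings):
    """Hive-style elt: return the n-th string (1-based), None when out of range."""
    if n is None or n < 1 or n > len(strings):
        return None
    return strings[n - 1]

assert elt(1, "scala", "java") == "scala"
assert elt(2, "scala", "java") == "java"
assert elt(3, "scala", "java") is None  # out of range -> NULL
```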
[GitHub] spark issue #13893: [SPARK-14172][SQL] Hive table partition predicate not pa...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/13893 Predicates should not be reordered if a condition contains non-deterministic parts; for example, 'rand() < 0.1 AND a=1' should not be reordered to 'a=1 AND rand() < 0.1', as the number of calls to rand() would change and thus different rows would be output.
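The short-circuit behavior described above can be illustrated outside Spark. The sketch below is plain Python (`filter_rows` and the predicate names are illustrative, not Spark APIs): with conjunctive short-circuit evaluation, placing the non-deterministic predicate first invokes it once per row, while placing the deterministic predicate first invokes it only for rows that survive the first check, so reordering changes how often the random source is consumed.

```python
import random

def filter_rows(rows, predicates):
    """Apply conjunctive predicates with short-circuit (AND) evaluation."""
    return [row for row in rows if all(p(row) for p in predicates)]

calls = {"n": 0}

def nondeterministic(row):
    calls["n"] += 1  # count how often the random source is consumed
    return random.random() < 0.1

def deterministic(row):
    return row["a"] == 1

rows = [{"a": 1}, {"a": 2}, {"a": 1}]

# Non-deterministic predicate first: evaluated for every row (3 calls).
filter_rows(rows, [nondeterministic, deterministic])
assert calls["n"] == 3

# Deterministic predicate first: evaluated only for the two rows with a == 1,
# so the random stream is consumed a different number of times.
calls["n"] = 0
filter_rows(rows, [deterministic, nondeterministic])
assert calls["n"] == 2
```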
[GitHub] spark pull request #13950: [SPARK-15487] [Web UI] Spark Master UI to reverse...
Github user gurvindersingh commented on a diff in the pull request: https://github.com/apache/spark/pull/13950#discussion_r68886506 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -127,7 +128,14 @@ private[deploy] class Master( logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}") webUi = new MasterWebUI(this, webUiPort) webUi.bind() -masterWebUiUrl = "http://" + masterPublicAddress + ":" + webUi.boundPort +if (reverseProxy) { + masterWebUiUrl = conf.get("spark.ui.reverseProxyUrl", null) + if (masterWebUiUrl == null) { + throw new SparkException("spark.ui.reverseProxyUrl must be provided") --- End diff -- Updated the code now to remove the exception and use the public address as the default, overriding it when spark.ui.reverseProxyUrl is given. This should solve the issue you are seeing.
[GitHub] spark pull request #13950: [SPARK-15487] [Web UI] Spark Master UI to reverse...
Github user gurvindersingh commented on a diff in the pull request: https://github.com/apache/spark/pull/13950#discussion_r68886448 --- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala --- @@ -127,7 +128,14 @@ private[deploy] class Master( logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}") webUi = new MasterWebUI(this, webUiPort) webUi.bind() -masterWebUiUrl = "http://" + masterPublicAddress + ":" + webUi.boundPort +if (reverseProxy) { + masterWebUiUrl = conf.get("spark.ui.reverseProxyUrl", null) --- End diff -- It is used in the case where you are running the Spark master itself behind a proxy, e.g. OAuth2, to provide authentication/authorization. It's to make sure the "Back to Master" link works when you are on a worker's UI.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68886429 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NonFoldableLiteral.scala --- @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ * A literal value that is not foldable. Used in expression codegen testing to test code path * that behave differently based on foldable values. */ -case class NonFoldableLiteral(value: Any, dataType: DataType) extends LeafExpression { +case class NonFoldableLiteral(var value: Any, dataType: DataType) extends LeafExpression { --- End diff -- Sorry I read the code wrong. You are testing cache invalidation and this must be mutable.
[GitHub] spark issue #13948: [SPARK-16259] [PYSPARK] cleanup options in DataFrame rea...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13948 Feel free to do it - but please take another careful look before you cherry pick.
[GitHub] spark issue #13933: [SPARK-16236] [SQL] Add Path Option back to Load API in ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/13933 @gatorsmile parquet, json or other file formats support both `path` and `paths` options. So that's not a problem.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68886317 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathBoolean.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.xml + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.types.{AbstractDataType, BooleanType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + + +@ExpressionDescription( + usage = "_FUNC_(xml, xpath) - Evaluates a boolean xpath expression.", + extended = "> SELECT _FUNC_('1','a/b');\ntrue") +case class XPathBoolean(xml: Expression, path: Expression) + extends BinaryExpression with ExpectsInputTypes with CodegenFallback { + + @transient private lazy val xpathUtil = new UDFXPathUtil + + // We use these to avoid converting the path from UTF8String to String if it is a constant. + @transient private var lastPathUtf8: UTF8String = null --- End diff -- then it's more obvious that we are trying to optimize when the path is literal.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68886279 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathBoolean.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.xml + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.types.{AbstractDataType, BooleanType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + + +@ExpressionDescription( + usage = "_FUNC_(xml, xpath) - Evaluates a boolean xpath expression.", + extended = "> SELECT _FUNC_('1','a/b');\ntrue") +case class XPathBoolean(xml: Expression, path: Expression) + extends BinaryExpression with ExpectsInputTypes with CodegenFallback { + + @transient private lazy val xpathUtil = new UDFXPathUtil + + // We use these to avoid converting the path from UTF8String to String if it is a constant. 
+ @transient private var lastPathUtf8: UTF8String = null --- End diff -- how about
```
@transient lazy val pathLiteral: String = path match {
  case Literal(str: String) => str
  case _ => null
}
```
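The caching pattern under discussion (recompile the path only when the incoming value differs from the last one seen) can be sketched in plain Python; `CachedPathEvaluator` and its methods are illustrative stand-ins, not Spark or Hive classes:

```python
class CachedPathEvaluator:
    """Illustrative stand-in for the lastPathUtf8 caching idea:
    recompile the path only when the raw input actually changes."""

    def __init__(self):
        self._last_raw = None
        self._compiled = None

    def _compile(self, raw):
        # Stand-in for the expensive UTF8String -> String -> XPath step.
        return raw.split("/")

    def evaluate(self, raw):
        if raw != self._last_raw:  # cache invalidation when the path changes
            self._last_raw = raw
            self._compiled = self._compile(raw)
        return self._compiled

evaluator = CachedPathEvaluator()
first = evaluator.evaluate("a/b")
assert evaluator.evaluate("a/b") is first       # cache hit: same object reused
assert evaluator.evaluate("a/c") == ["a", "c"]  # new path invalidates the cache
```

This mirrors why the test in the PR mutates a `NonFoldableLiteral`: it needs to prove that a changed path is not served from the stale cache.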
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13906 By the way, for complexity, it's a 23-line optimizer without blanks/comments. In fact, it's shorter than `NullPropagation`.
[GitHub] spark issue #13948: [SPARK-16259] [PYSPARK] cleanup options in DataFrame rea...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/13948 @rxin why don't we merge this one to 2.0?
[GitHub] spark pull request #13955: [SPARK-16266][SQL][STREAING] Moved DataStreamRead...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13955
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13906 It sounds promising. Maybe, Spark 2.1?
[GitHub] spark issue #13955: [SPARK-16266][SQL][STREAING] Moved DataStreamReader/Writ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/13955 LGTM. Merging to master and 2.0. Thanks!
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13906 Sounds interesting. You mean `LocalNode` that computes all local node operators on `LocalRelation`, right?
[GitHub] spark issue #13893: [SPARK-14172][SQL] Hive table partition predicate not pa...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13893 no, the predicate order doesn't matter. Our optimizer can reorder the predicates to run them more efficiently.
[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13704#discussion_r68885567 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -837,8 +837,36 @@ case class Cast(child: Expression, dataType: DataType) extends UnaryExpression w val j = ctx.freshName("j") val values = ctx.freshName("values") +val isPrimitiveFrom = ctx.isPrimitiveType(fromType) --- End diff -- we need to make sure the input array's element nullability is false, but a primitive element type doesn't guarantee it; e.g. we can have `ArrayType(ByteType, true)`.
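The point above is that a primitive element type alone does not imply non-null elements; the `containsNull` flag must gate the fast path. A plain-Python sketch of the distinction (`cast_array` is an illustrative stand-in, not the generated code):

```python
def cast_array(values, cast_elem, contains_null):
    # Even when the element type is primitive, contains_null=True means each
    # slot still needs a null check before the element cast runs.
    if contains_null:
        return [None if v is None else cast_elem(v) for v in values]
    # Fast path: the schema guarantees no nulls, so skip per-element checks.
    return [cast_elem(v) for v in values]

assert cast_array([1, None, 3], float, contains_null=True) == [1.0, None, 3.0]
assert cast_array([1, 2, 3], float, contains_null=False) == [1.0, 2.0, 3.0]
```

Taking the fast path when `contains_null` is true would crash (or silently miscast) on the first null slot, which is why the codegen has to check nullability and not just primitiveness.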
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13906 I'm not sure if this optimization is useful:
1. An empty `LocalRelation` is a corner case and doesn't seem worth optimizing.
2. The optimization rule in this PR is kind of complex.
3. If we have better handling for `LocalRelation` in the future (like the `LocalNode`), this rule will become useless.
cc @marmbrus @yhuai
[GitHub] spark issue #13893: [SPARK-14172][SQL] Hive table partition predicate not pa...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/13893 @cloud-fan I pushed a commit to apply predicate pushdown on the deterministic parts placed before any non-deterministic predicates; would it be safe to do this optimization?
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -301,6 +302,7 @@ object FunctionRegistry { expression[UnBase64]("unbase64"), expression[Unhex]("unhex"), expression[Upper]("upper"), +expression[XPathBoolean]("xpath_boolean"), --- End diff -- What about excluding it at HiveSessionCatalog, too? https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala#L230
[GitHub] spark issue #13965: [SPARK-16236] [SQL] [FOLLOWUP] Add Path Option back to L...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13965 **[Test build #61447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61447/consoleFull)** for PR 13965 at commit [`9cfd673`](https://github.com/apache/spark/commit/9cfd67350670ef668781cf498597612713cba628).
[GitHub] spark pull request #13965: [SPARK-16236] [SQL] [FOLLOWUP] Add Path Option ba...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/13965 [SPARK-16236] [SQL] [FOLLOWUP] Add Path Option back to Load API in DataFrameReader What changes were proposed in this pull request? When users specify one and only one path, we use `options` to record the path value in `DataFrameReader`. For example, users can see the `path` option after the following API call, ```SQL spark.read.parquet("/test") ``` The Python API has the same issue. Thanks for identifying this issue, @zsxwing ! Below is an example: ```Python spark.read.format('json').load('python/test_support/sql/people.json') ``` How was this patch tested? Existing test cases cover the changes made by this PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark optionPaths Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13965.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13965 commit 9cfd67350670ef668781cf498597612713cba628 Author: gatorsmile Date: 2016-06-29T04:38:57Z fix
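The behavior this follow-up restores — a single path passed to `load` being recorded under the `path` key in the reader's options — can be mimicked with a toy reader. This `ToyReader` class is a hypothetical sketch of the option-recording behavior, not the real `DataFrameReader` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: when load() receives exactly one path, record it in the
// reader's options map under the key "path", so downstream code that inspects
// options can see it. Multiple paths are not collapsed into a single option.
public class ToyReader {
    private final Map<String, String> options = new LinkedHashMap<>();

    public ToyReader option(String key, String value) {
        options.put(key, value);
        return this;
    }

    public ToyReader load(String... paths) {
        if (paths.length == 1) { // one and only one path: expose it as an option
            options.put("path", paths[0]);
        }
        return this;
    }

    public Map<String, String> getOptions() {
        return options;
    }
}
```

With this sketch, `new ToyReader().load("/test")` leaves `"path" -> "/test"` visible in the options map, while a two-path call records no `path` option.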
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884564 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.test.SharedSQLContext + +/** + * End-to-end tests for XML expressions. + */ +class XmlFunctionsSuite extends QueryTest with SharedSQLContext { + + test("xpath_boolean") { +val input = "b" +val path = "a/b" --- End diff -- I've updated this. PTAL.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884371 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.test.SharedSQLContext + +/** + * End-to-end tests for XML expressions. + */ +class XmlFunctionsSuite extends QueryTest with SharedSQLContext { + + test("xpath_boolean") { +val input = "b" +val path = "a/b" --- End diff -- will do.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884363 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NonFoldableLiteral.scala --- @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ * A literal value that is not foldable. Used in expression codegen testing to test code path * that behave differently based on foldable values. */ -case class NonFoldableLiteral(value: Any, dataType: DataType) extends LeafExpression { +case class NonFoldableLiteral(var value: Any, dataType: DataType) extends LeafExpression { --- End diff -- What do you mean?
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884303 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.test.SharedSQLContext + +/** + * End-to-end tests for XML expressions. + */ +class XmlFunctionsSuite extends QueryTest with SharedSQLContext { + + test("xpath_boolean") { +val input = "b" +val path = "a/b" --- End diff -- for an end-to-end test, I think it's better to use an attribute as input, not a literal.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884274 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/XmlFunctionsSuite.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import org.apache.spark.sql.test.SharedSQLContext + +/** + * End-to-end tests for XML expressions. + */ +class XmlFunctionsSuite extends QueryTest with SharedSQLContext { + + test("xpath_boolean") { +val input = "b" +val path = "a/b" --- End diff -- how about ``` val df = Seq("b" -> "a/b").toDF("xml", "path") checkAnswer(df.select(expr("xpath_boolean(xml, path)")), Row(true)) ```
[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11863 Merged build finished. Test PASSed.
[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11863 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61436/ Test PASSed.
[GitHub] spark issue #13961: [SPARK-16271][SQL] Implement Hive's UDFXPathUtil
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13961 **[Test build #3137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3137/consoleFull)** for PR 13961 at commit [`90bf2f1`](https://github.com/apache/spark/commit/90bf2f1ac93c6f83a028edfbc79cf956777f205a). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ReusableStringReaderSuite extends SparkFunSuite `
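For context on what SPARK-16271's `UDFXPathUtil` and the `xpath_boolean` function compute: Hive's utility is a thin wrapper over the JDK's `javax.xml.xpath` API, so its boolean semantics can be reproduced in a few lines of plain Java. This is a standalone sketch, not the actual Spark or Hive code; the XML and path literals are illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathBooleanDemo {
    // Evaluates an XPath expression against an XML string and coerces the result
    // to a boolean, roughly what a boolean xpath UDF does under the hood.
    static boolean evalBoolean(String xml, String path) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate(
            path, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        // A node-set result is true iff it is non-empty (XPath 1.0 boolean coercion).
        System.out.println(evalBoolean("<a><b/></a>", "a/b"));           // <b> exists under <a>
        System.out.println(evalBoolean("<a><b/></a>", "a/c"));           // no <c> element
        System.out.println(evalBoolean("<a><b>10</b></a>", "a/b = 10")); // numeric comparison
    }
}
```

Compiling the path on every call, as above, is what the literal-path caching discussed elsewhere in this thread is meant to avoid.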
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884185 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathExpressionSuite.scala --- @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.xml + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.expressions.{ExpressionEvalHelper, Literal, NonFoldableLiteral} +import org.apache.spark.sql.types.StringType +import org.apache.spark.unsafe.types.UTF8String + +/** + * Test suite for various xpath functions. + */ +class XPathExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { --- End diff -- I wrote this one based on what's already in the code base for other expressions. Let me know if I should do anything else.
[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11863 **[Test build #61436 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61436/consoleFull)** for PR 11863 at commit [`b1eec57`](https://github.com/apache/spark/commit/b1eec577d64f82784afaf626ad5a325bc7a1d555). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884156 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathBoolean.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.xml + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.types.{AbstractDataType, BooleanType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + + +@ExpressionDescription( + usage = "_FUNC_(xml, xpath) - Evaluates a boolean xpath expression.", + extended = "> SELECT _FUNC_('1','a/b');\ntrue") +case class XPathBoolean(xml: Expression, path: Expression) + extends BinaryExpression with ExpectsInputTypes with CodegenFallback { + + @transient private lazy val xpathUtil = new UDFXPathUtil + + // We use these to avoid converting the path from UTF8String to String if it is a constant. --- End diff -- I think a literal path is a very common case, but a literal xml is fairly unlikely. 
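The literal-path optimization under discussion, and the cache invalidation behavior the test suite checks, amounts to memoizing the compiled `XPathExpression` keyed on the last-seen path string and recompiling only when the path changes. A minimal sketch in plain Java follows; `CachedXPathEvaluator` is a hypothetical class name, and the sketch assumes single-threaded use, as with one expression instance per task.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class CachedXPathEvaluator {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastPath;          // last path string we compiled
    private XPathExpression compiled; // its compiled form, reused across rows

    public boolean evalBoolean(String xml, String path) throws Exception {
        if (!path.equals(lastPath)) { // cache miss: the path changed, so recompile
            compiled = xpath.compile(path);
            lastPath = path;
        }
        return (Boolean) compiled.evaluate(
            new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
    }
}
```

When the path is a foldable literal — the common case, per the discussion — the `equals` check hits on every row after the first, so the XPath is compiled exactly once. The "path cache invalidation" test quoted at the top of this thread exercises the miss branch by changing the path between evaluations.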
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13778 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61441/ Test FAILed.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884143 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NonFoldableLiteral.scala --- @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ * A literal value that is not foldable. Used in expression codegen testing to test code path * that behave differently based on foldable values. */ -case class NonFoldableLiteral(value: Any, dataType: DataType) extends LeafExpression { +case class NonFoldableLiteral(var value: Any, dataType: DataType) extends LeafExpression { --- End diff -- It's OK if it saves us a lot of effort, but it seems it doesn't?
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13778 Merged build finished. Test FAILed.
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13778 **[Test build #61441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61441/consoleFull)** for PR 13778 at commit [`65a33b0`](https://github.com/apache/spark/commit/65a33b05eaeef8454f8746313075163e21f73c8f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884058 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NonFoldableLiteral.scala --- @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ * A literal value that is not foldable. Used in expression codegen testing to test code path * that behave differently based on foldable values. */ -case class NonFoldableLiteral(value: Any, dataType: DataType) extends LeafExpression { +case class NonFoldableLiteral(var value: Any, dataType: DataType) extends LeafExpression { --- End diff -- Oh, I agree.
[GitHub] spark issue #13963: [TRIVIAL][PYSPARK] Clean up orc compression option as we...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13963 **[Test build #61445 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61445/consoleFull)** for PR 13963 at commit [`a314e56`](https://github.com/apache/spark/commit/a314e56457d8f6949b7d7463882e98127c24b680). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13963: [TRIVIAL][PYSPARK] Clean up orc compression option as we...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13963 Merged build finished. Test PASSed.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68884017 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathBoolean.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.xml + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.types.{AbstractDataType, BooleanType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + + +@ExpressionDescription( + usage = "_FUNC_(xml, xpath) - Evaluates a boolean xpath expression.", + extended = "> SELECT _FUNC_('1','a/b');\ntrue") +case class XPathBoolean(xml: Expression, path: Expression) + extends BinaryExpression with ExpectsInputTypes with CodegenFallback { + + @transient private lazy val xpathUtil = new UDFXPathUtil + + // We use these to avoid converting the path from UTF8String to String if it is a constant. --- End diff -- shall we also optimize when the xml string is literal but the path string is not? 
[GitHub] spark issue #13963: [TRIVIAL][PYSPARK] Clean up orc compression option as we...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13963 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61445/ Test PASSed.
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68883944 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xml/XPathBoolean.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.xml + +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback +import org.apache.spark.sql.types.{AbstractDataType, BooleanType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + + +@ExpressionDescription( + usage = "_FUNC_(xml, xpath) - Evaluates a boolean xpath expression.", + extended = "> SELECT _FUNC_('1','a/b');\ntrue") +case class XPathBoolean(xml: Expression, path: Expression) + extends BinaryExpression with ExpectsInputTypes with CodegenFallback { + + @transient private lazy val xpathUtil = new UDFXPathUtil + + // We use these to avoid converting the path from UTF8String to String if it is a constant. --- End diff -- shall we also optimize for literal xml string? 
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68883895 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NonFoldableLiteral.scala --- @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ * A literal value that is not foldable. Used in expression codegen testing to test code path * that behave differently based on foldable values. */ -case class NonFoldableLiteral(value: Any, dataType: DataType) extends LeafExpression { +case class NonFoldableLiteral(var value: Any, dataType: DataType) extends LeafExpression { --- End diff -- I thought this should be OK since the literal is non-foldable and this class is only in the testing package.
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13778 @cloud-fan @vlad17 Is this change good for you now? Thanks!
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13964#discussion_r68883797 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/NonFoldableLiteral.scala --- @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ * A literal value that is not foldable. Used in expression codegen testing to test code path * that behave differently based on foldable values. */ -case class NonFoldableLiteral(value: Any, dataType: DataType) extends LeafExpression { +case class NonFoldableLiteral(var value: Any, dataType: DataType) extends LeafExpression { --- End diff -- Ur, it seems to be for a testing purpose. Is it okay?
[GitHub] spark issue #13930: [SPARK-16228][SQL] HiveSessionCatalog should return `dou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13930 **[Test build #61446 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61446/consoleFull)** for PR 13930 at commit [`b8df028`](https://github.com/apache/spark/commit/b8df0284aa7bd4328ff7f8e1ebdce55272e549d2).
[GitHub] spark issue #13930: [SPARK-16228][SQL] HiveSessionCatalog should return `dou...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13930 Rebased to the master for https://github.com/apache/spark/pull/13939 .
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13778 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61435/
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13778 Merged build finished. Test PASSed.
[GitHub] spark issue #13778: [SPARK-16062][SPARK-15989][SQL] Fix two bugs of Python-o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13778 **[Test build #61435 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61435/consoleFull)** for PR 13778 at commit [`1583fe3`](https://github.com/apache/spark/commit/1583fe3380ad3eef8f75d7709b9769e7d4e11477). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public final class JavaStructuredNetworkWordCount ` * `final class Binarizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `final class Bucketizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `final class ChiSqSelector @Since(\"1.6.0\") (@Since(\"1.6.0\") override val uid: String)` * `class CountVectorizer @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)` * `class CountVectorizerModel(` * `class DCT @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)` * `class ElementwiseProduct @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `class HashingTF @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `final class IDF @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `class Interaction @Since(\"1.6.0\") (@Since(\"1.6.0\") override val uid: String) extends Transformer` * `class MaxAbsScaler @Since(\"2.0.0\") (@Since(\"2.0.0\") override val uid: String)` * `class MinMaxScaler @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)` * `class NGram @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)` * `class Normalizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `class OneHotEncoder @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String) extends Transformer` * `class PCA @Since(\"1.5.0\") (` * `class PolynomialExpansion @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `final class QuantileDiscretizer @Since(\"1.6.0\") (@Since(\"1.6.0\") override val uid: 
String)` * `class RFormula @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)` * `class SQLTransformer @Since(\"1.6.0\") (@Since(\"1.6.0\") override val uid: String) extends Transformer` * `class StandardScaler @Since(\"1.4.0\") (` * `class StopWordsRemover @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)` * `class StringIndexer @Since(\"1.4.0\") (` * `class Tokenizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `class RegexTokenizer @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `class VectorAssembler @Since(\"1.4.0\") (@Since(\"1.4.0\") override val uid: String)` * `class VectorIndexer @Since(\"1.4.0\") (` * `final class VectorSlicer @Since(\"1.5.0\") (@Since(\"1.5.0\") override val uid: String)` * `final class Word2Vec @Since(\"1.4.0\") (` * `public class JavaPackage ` * `class OptionUtils(object):` * `class DataFrameReader(OptionUtils):` * `class DataFrameWriter(OptionUtils):` * `class DataStreamReader(OptionUtils):` * `case class ShowFunctionsCommand(` * `case class StreamingRelationExec(sourceName: String, output: Seq[Attribute]) extends LeafExecNode ` * `class TextSocketSource(host: String, port: Int, sqlContext: SQLContext)` * `class TextSocketSourceProvider extends StreamSourceProvider with DataSourceRegister with Logging `
[GitHub] spark issue #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13964 **[Test build #3138 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3138/consoleFull)** for PR 13964 at commit [`34cda07`](https://github.com/apache/spark/commit/34cda070f62677b6920174cc976107456172aeab).
[GitHub] spark issue #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13964 Can one of the admins verify this patch?
[GitHub] spark issue #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/13964 cc @srowen @cloud-fan @vanzin @squito If this one works, I can implement the other ones too.
[GitHub] spark issue #13964: [SPARK-16274][SQL] Implement xpath_boolean
Github user petermaxlee commented on the issue: https://github.com/apache/spark/pull/13964 cc @srowen @cloud-fan @vanzin @squito
[GitHub] spark pull request #13964: [SPARK-16274][SQL] Implement xpath_boolean
GitHub user petermaxlee opened a pull request: https://github.com/apache/spark/pull/13964 [SPARK-16274][SQL] Implement xpath_boolean ## What changes were proposed in this pull request? This patch implements the xpath_boolean expression for Spark SQL, an XPath function that returns true or false. The implementation is modelled after Hive's xpath_boolean, except in how the expression handles null inputs. Hive throws a NullPointerException at runtime if either of the inputs is null. This implementation returns null if either of the inputs is null. ## How was this patch tested? Added unit tests for expressions (based on Hive's tests and some I added myself) and an end-to-end test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/petermaxlee/spark SPARK-16274 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13964.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13964 commit 34cda070f62677b6920174cc976107456172aeab Author: petermaxlee Date: 2016-06-29T04:09:57Z [SPARK-16274][SQL] Implement xpath_boolean
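The null-handling difference described in the PR can be sketched with plain JDK XPath. This is a hypothetical illustration, not Spark's actual implementation: null in either argument yields null, where Hive's xpath_boolean would throw a NullPointerException at runtime.

```scala
import java.io.StringReader
import javax.xml.xpath.{XPathConstants, XPathFactory}
import org.xml.sax.InputSource

// Hypothetical sketch of the Spark-style null semantics: null in, null out.
def xpathBoolean(xml: String, path: String): java.lang.Boolean = {
  if (xml == null || path == null) return null  // null-safe, unlike Hive
  XPathFactory.newInstance().newXPath().compile(path)
    .evaluate(new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN)
    .asInstanceOf[java.lang.Boolean]
}

val onNull = xpathBoolean(null, "a/b")               // null, not an NPE
val onXml  = xpathBoolean("<a><b>1</b></a>", "a/b")  // true
```

Returning `java.lang.Boolean` rather than `Boolean` is what makes the nullable result expressible on the JVM; Spark expresses the same thing through its nullable expression evaluation.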
[GitHub] spark pull request #13961: [SPARK-16271][SQL] Implement Hive's UDFXPathUtil
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13961
[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13906 Thank you, @cloud-fan . It seems to be a good idea to handle operators on LocalRelations. But, if possible, may I dig into that in another PR? :)
[GitHub] spark issue #13961: [SPARK-16271][SQL] Implement Hive's UDFXPathUtil
Github user rxin commented on the issue: https://github.com/apache/spark/pull/13961 Merging in master.
[GitHub] spark pull request #13494: [SPARK-15752] [SQL] support optimization for meta...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13494#discussion_r68882374 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/MetadataOnlyOptimizer.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution + +import org.apache.spark.sql.{AnalysisException, SparkSession} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, SessionCatalog} +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.aggregate._ +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} + +/** + * When scanning only partition columns, get results based on metadata without scanning files. + * It is used for distinct, distinct aggregations or distinct-like aggregations (example: Max/Min). + * First of all, scanning only partition columns is required; then the rule handles the following + * cases: + * 1. aggregate expression is partition columns, + * e.g.
SELECT col FROM tbl GROUP BY col or SELECT col FROM tbl GROUP BY cube(col). + 2. aggregate function on partition columns with DISTINCT, + * e.g. SELECT count(DISTINCT col) FROM tbl GROUP BY col. + 3. aggregate function on partition columns which has the same result as with the DISTINCT keyword. + e.g. SELECT Max(col2) FROM tbl GROUP BY col1. + */ +case class MetadataOnlyOptimizer( +sparkSession: SparkSession, +catalog: SessionCatalog) extends Rule[LogicalPlan] { + + private def canSupportMetadataOnly(a: Aggregate): Boolean = { +val aggregateExpressions = a.aggregateExpressions.flatMap { expr => + expr.collect { +case agg: AggregateExpression => agg + } +}.distinct +if (aggregateExpressions.isEmpty) { + // Support for aggregates that have no aggregateFunction when expressions are partition columns + // example: select partitionCol from table group by partitionCol. + // Moreover, multiple-distinct has been rewritten into it by RewriteDistinctAggregates. + true +} else { + aggregateExpressions.forall { agg => +if (agg.isDistinct) { + true +} else { + // If a function can be evaluated on just the distinct values of a column, it can be used + // by the metadata-only optimizer.
+ agg.aggregateFunction match { +case max: Max => true +case min: Min => true +case hyperLog: HyperLogLogPlusPlus => true +case _ => false + } +} + } +} + } + + private def convertLogicalToMetadataOnly( + project: LogicalPlan, + filter: Option[Expression], + logical: LogicalRelation, + files: HadoopFsRelation): LogicalPlan = { +val attributeMap = logical.output.map(attr => (attr.name, attr)).toMap +val partitionColumns = files.partitionSchema.map { field => + attributeMap.getOrElse(field.name, throw new AnalysisException( +s"Unable to resolve ${field.name} given [${logical.output.map(_.name).mkString(", ")}]")) +} +val projectSet = filter.map(project.references ++ _.references).getOrElse(project.references) +if (projectSet.subsetOf(AttributeSet(partitionColumns))) { + val selectedPartitions = files.location.listFiles(filter.map(Seq(_)).getOrElse(Seq.empty)) + val valuesRdd = sparkSession.sparkContext.parallelize(selectedPartitions.map(_.values), 1) + val valuesPlan = LogicalRDD(partitionColumns, valuesRdd)(sparkSession) + valuesPlan +} else { + logical +} + } + + private def convertCatalogToMetadataOnly( + project:
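The eligibility check in `canSupportMetadataOnly` above boils down to a simple rule: either there are no aggregate functions at all (a plain GROUP BY on partition columns), or every aggregate is DISTINCT or insensitive to duplicate values. A hypothetical, Spark-free sketch of that classification (the types are illustrative stand-ins for Catalyst's):

```scala
// Stand-in types for Catalyst's aggregate functions (illustrative only).
sealed trait AggFunc
case object Max extends AggFunc
case object Min extends AggFunc
case object HyperLogLogPlusPlus extends AggFunc
case object Sum extends AggFunc

case class AggExpr(func: AggFunc, isDistinct: Boolean)

// forall on an empty Seq is true, which covers the "no aggregate
// functions, just GROUP BY partition columns" case above.
def canUseMetadataOnly(aggs: Seq[AggExpr]): Boolean =
  aggs.forall { agg =>
    agg.isDistinct || (agg.func match {
      case Max | Min | HyperLogLogPlusPlus => true  // distinct-insensitive
      case _                               => false // e.g. Sum over raw rows
    })
  }

val okMax      = canUseMetadataOnly(Seq(AggExpr(Max, isDistinct = false)))  // true
val badSum     = canUseMetadataOnly(Seq(AggExpr(Sum, isDistinct = false)))  // false
val okDistinct = canUseMetadataOnly(Seq(AggExpr(Sum, isDistinct = true)))   // true
```

The intuition: partition metadata only exposes each partition value once, so only functions whose result is unchanged by dropping duplicates (Max, Min, HyperLogLog++, or anything under DISTINCT) can be answered from it.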
[GitHub] spark issue #13963: [TRIVIAL][PYSPARK] Clean up orc compression option as we...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13963 **[Test build #61445 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61445/consoleFull)** for PR 13963 at commit [`a314e56`](https://github.com/apache/spark/commit/a314e56457d8f6949b7d7463882e98127c24b680).
[GitHub] spark pull request #13963: [TRIVIAL][PYSPARK] Clean up orc compression optio...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13963 [TRIVIAL][PYSPARK] Clean up orc compression option as well ## What changes were proposed in this pull request? This PR corrects the ORC compression option for PySpark as well. I think this was missed in https://github.com/apache/spark/pull/13948. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark minor-orc-compress Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13963.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13963 commit a314e56457d8f6949b7d7463882e98127c24b680 Author: hyukjinkwon Date: 2016-06-29T03:54:30Z Clean up orc compression option as well
[GitHub] spark issue #13963: [TRIVIAL][PYSPARK] Clean up orc compression option as we...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/13963 cc @davies