[GitHub] spark pull request #18932: Add reduceVertices method to GraphOps
GitHub user gilcu2 opened a pull request: https://github.com/apache/spark/pull/18932

Add reduceVertices method to GraphOps

## What changes were proposed in this pull request?

Add a reduceVertices method to the GraphOps class in GraphX.

## How was this patch tested?

Unit tests run locally.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gilcu2/spark reduceVertices_graphx_operation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18932.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18932

commit 04247ea94bad429ffd2b0e0ecdce5e631ae1ae63
Author: gilcu2
Date: 2017-08-13T13:57:08Z

    Add reduceVertices method to GraphOps

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18930 Merged build finished. Test PASSed.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/18920 Jenkins, retest this please.
[GitHub] spark issue #18932: Add reduceVertices method to GraphOps
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18932 Can one of the admins verify this patch?
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80594/ Test FAILed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931
**[Test build #80594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80594/testReport)** for PR 18931 at commit [`6d600d5`](https://github.com/apache/spark/commit/6d600d5eb4a275eb6bc72ccf353d2d1ded03635f).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Merged build finished. Test FAILed.
[GitHub] spark issue #18648: [SPARK-21428] Turn IsolatedClientLoader off while using ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/18648 LGTM
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920
**[Test build #80591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80591/testReport)** for PR 18920 at commit [`bb29b8f`](https://github.com/apache/spark/commit/bb29b8f7e8e8be436ea028acfbe16ae6f4977169).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914 **[Test build #80588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80588/testReport)** for PR 18914 at commit [`0de047d`](https://github.com/apache/spark/commit/0de047d87047d7eed89a1f0994b2e4c118a50d12).
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Merged build finished. Test FAILed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931
**[Test build #80590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80590/testReport)** for PR 18931 at commit [`0bb8c0e`](https://github.com/apache/spark/commit/0bb8c0ec70243e75f5593ca83788e830e9e4bc25).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80590/ Test FAILed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18931

Ran with the same benchmark in SPARK-21603.

Before this patch:

After this patch:

    Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
    Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
    max function length of wholestagecodegen:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
    codegen = F                                548 / 740          1.2        836.6        1.0X
    codegen = T                                372 / 433          1.8        567.5        1.5X
[GitHub] spark pull request #18931: [SPARK-21717][SQL][WIP] Decouple consume function...
GitHub user viirya reopened a pull request: https://github.com/apache/spark/pull/18931

[SPARK-21717][SQL][WIP] Decouple consume functions of physical operators in whole-stage codegen

## What changes were proposed in this pull request?

It has been observed in SPARK-21603 that whole-stage codegen suffers performance degradation if the generated functions are too long to be optimized by the JIT. We basically produce a single function that incorporates the generated code from all physical operators in a whole-stage. The generated function can therefore grow beyond the threshold above which the JIT no longer optimizes it.

This patch tries to decouple the row-consuming logic of the physical operators so that rows are not processed by one giant function.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-21717

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18931.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18931

commit 05274e7ad4c74e6241b5a05a9365c475f0c3c0a3
Author: Liang-Chi Hsieh
Date: 2017-08-13T06:06:10Z

    Decouple consume functions of physical operators in whole-stage codegen.

commit e0e7a6ecc957b4659db9b0367ef32d09537b32fd
Author: Liang-Chi Hsieh
Date: 2017-08-13T07:43:17Z

    shouldStop is called outside consume().

commit 413707dd0c31a15514f00aea9addca77fe1dd2ce
Author: Liang-Chi Hsieh
Date: 2017-08-13T10:52:28Z

    Fix the condition and the case of using continue in consume.

commit 0bb8c0ec70243e75f5593ca83788e830e9e4bc25
Author: Liang-Chi Hsieh
Date: 2017-08-13T10:57:45Z

    More comment.

commit 6d600d5eb4a275eb6bc72ccf353d2d1ded03635f
Author: Liang-Chi Hsieh
Date: 2017-08-13T14:17:01Z

    Fix aggregation.
[GitHub] spark pull request #18931: [SPARK-21717][SQL][WIP] Decouple consume function...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/18931
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914
**[Test build #80592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80592/testReport)** for PR 18914 at commit [`4212553`](https://github.com/apache/spark/commit/4212553236608f35fbab3fa1186b31f77cff92af).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18914 @heary-cao I already understand what you want to do. It makes sense to me to remove `sql("UNCACHE TABLE testData")`. My question is about `JoinSuite.afterEach()`. Do you want to call `spark.sharedState.cacheManager.clearCache()` twice at `JoinSuite.afterEach()` and `SharedSQLContext.afterEach()`?
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914
**[Test build #80588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80588/testReport)** for PR 18914 at commit [`0de047d`](https://github.com/apache/spark/commit/0de047d87047d7eed89a1f0994b2e4c118a50d12).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914 **[Test build #80593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80593/testReport)** for PR 18914 at commit [`7dd8cdb`](https://github.com/apache/spark/commit/7dd8cdbf7caac507cbd703193350503309b5f159).
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80594/testReport)** for PR 18931 at commit [`6d600d5`](https://github.com/apache/spark/commit/6d600d5eb4a275eb6bc72ccf353d2d1ded03635f).
[GitHub] spark issue #18916: [SPARK-21705][CORE][DOC]Add spark.internal.config parame...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18916 I am -0 for the same reason as above ^.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875
**[Test build #80596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80596/testReport)** for PR 18875 at commit [`c64d9c4`](https://github.com/apache/spark/commit/c64d9c49c5f42b7a72b1570f00daa34a2843be1d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r132847782

--- Diff: python/pyspark/sql/column.py ---

    @@ -406,7 +406,13 @@ def substr(self, startPos, length):
             [Row(col=u'Ali'), Row(col=u'Bob')]
             """
             if type(startPos) != type(length):
    -            raise TypeError("Can not mix the type")
    +            raise TypeError(
    +                "startPos and length must be the same type. "
    +                "Got {startPos_t} and {length_t}, respectively."

--- End diff --

If PySpark always needs to check the types, are we doing the same things in all the other function calls? In addition, why not directly checking

```Python
if isinstance(length, (int, long)):
```
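A minimal sketch of the direct `isinstance` check suggested in the comment above, written for Python 3, where `long` no longer exists and `int` covers both. The helper `check_substr_args`, its return values, its error message, and the stand-in `Column` class are all hypothetical and are not PySpark code; they are here only so the idea can be run without Spark.

```python
class Column:
    """Hypothetical stand-in for pyspark.sql.Column so the sketch runs without Spark."""

    def __init__(self, name):
        self.name = name


def check_substr_args(start_pos, length):
    """Validate substr-style arguments (hypothetical helper, not part of PySpark).

    Returns "int" or "column" to indicate which code path a caller would take.
    """

    def is_int(value):
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    if is_int(start_pos) and is_int(length):
        return "int"
    if isinstance(start_pos, Column) and isinstance(length, Column):
        return "column"
    raise TypeError(
        "startPos and length must both be int or both be Column. "
        "Got {start_t} and {length_t}, respectively.".format(
            start_t=type(start_pos).__name__,
            length_t=type(length).__name__,
        )
    )
```

Checking each argument against the accepted types, rather than comparing `type(startPos)` with `type(length)`, also rejects pairs that share a type but are unsupported, such as two strings.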
[GitHub] spark pull request #18930: [SPARK-21677][SQL] json_tuple throws NullPointExc...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18930#discussion_r132846456

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---

    @@ -361,10 +361,18 @@ case class JsonTuple(children: Seq[Expression])
       // the fields to query are the remaining children
       @transient private lazy val fieldExpressions: Seq[Expression] = children.tail

    +  // a field name given with constant null will be replaced with this pseudo field name
    +  private val nullFieldName = "__NullFieldName"

--- End diff --

@jmchung, could we maybe compute this foldable related optimization ahead - https://github.com/jmchung/spark/blob/ffa575a6731fef3e0731b73e0f7311cb024e831b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L425-L439 and remove this fake field name? I think we can make a function for the above codes first and then use it for computation for each row. Did I understand correctly?

I tried a rough version I thought - https://github.com/jmchung/spark/compare/SPARK-21677...HyukjinKwon:tmp-18930?expand=1, @viirya what do you think about this?
[GitHub] spark pull request #18920: [SPARK-19471][SQL]AggregationIterator does not in...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18920#discussion_r132847406

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---

    @@ -449,6 +449,28 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
         ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
       }

    +  private def assertNoExceptions(c: Column): Unit = {
    +    for ((wholeStage, useObjectHashAgg) <- Seq((true, false), (false, false), (false, true))) {
    +      withSQLConf(
    +        (SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key, wholeStage.toString),
    +        (SQLConf.USE_OBJECT_HASH_AGG.key, useObjectHashAgg.toString)) {
    +        val df = Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4)).toDF("x", "y")
    +        // HashAggregate

--- End diff --

We need to check/compare the plans to ensure they are HashAggregate, ObjectHashAggregate and SortAggregate.
[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18914#discussion_r132837521

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---

    @@ -30,6 +30,15 @@ import org.apache.spark.TestUtils.{assertNotSpilled, assertSpilled}
     class JoinSuite extends QueryTest with SharedSQLContext {
       import testImplicits._

    +  override def afterEach(): Unit = {
    +    try {
    +      // Clear the cache table for test cases
    +      spark.sharedState.cacheManager.clearCache()

--- End diff --

It was a simple reason.

```
test("broadcasted hash outer join operator selection") {
  spark.sharedState.cacheManager.clearCache()
  sql("CACHE TABLE testData")
  sql("CACHE TABLE testData2")
  Seq(
    ("SELECT * FROM testData LEFT JOIN testData2 ON key = a",
      classOf[BroadcastHashJoinExec]),
    ("SELECT * FROM testData RIGHT JOIN testData2 ON key = a where key = 2",
      classOf[BroadcastHashJoinExec]),
    ("SELECT * FROM testData right join testData2 ON key = a and key = 2",
      classOf[BroadcastHashJoinExec])
  ).foreach(assertJoin)
  sql("UNCACHE TABLE testData")
}
```

The CACHE TABLE and UNCACHE TABLE calls do not match: `testData2` is cached but never uncached.
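The mismatch described above can be illustrated with a toy model. This is only a sketch: `ToyCacheManager` and `run_test_body` are hypothetical names that mimic the sequence of calls in the quoted test, and they are not Spark's `CacheManager` or any Spark API.

```python
class ToyCacheManager:
    """Toy stand-in for a shared table cache; not Spark's CacheManager."""

    def __init__(self):
        self._cached = set()

    def cache_table(self, name):
        self._cached.add(name)

    def uncache_table(self, name):
        self._cached.discard(name)

    def clear_cache(self):
        self._cached.clear()

    def cached_tables(self):
        return sorted(self._cached)


def run_test_body(cache):
    """Mirror the quoted test: cache two tables, then uncache only one."""
    cache.cache_table("testData")
    cache.cache_table("testData2")
    # ... the join assertions would run here ...
    cache.uncache_table("testData")


cache = ToyCacheManager()
run_test_body(cache)
leftover = cache.cached_tables()       # testData2 is still cached here

cache.clear_cache()                    # what an afterEach-style hook would do
after_cleanup = cache.cached_tables()  # nothing is left cached
```

In this model a single `clear_cache()` in a per-test cleanup hook removes whatever the test left behind, so the test body does not need to pair every cache call with an uncache call.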
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18914 @heary-cao Why do we need to call `spark.sharedState.cacheManager.clearCache()` in `JoinSuite` when we already call `clearCache()` in `SharedSQLContext`?
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875
**[Test build #80585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80585/testReport)** for PR 18875 at commit [`3045771`](https://github.com/apache/spark/commit/30457716075a4b061df9909b26bc427fb35cf29e).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80595/testReport)** for PR 18931 at commit [`502139a`](https://github.com/apache/spark/commit/502139aca30db03d2ef52dc9e140b83668467122).
[GitHub] spark pull request #18932: Add reduceVertices method to GraphOps
Github user gilcu2 closed the pull request at: https://github.com/apache/spark/pull/18932
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18931

Ran with the same benchmark in #18810.

Before this patch:

    Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
    Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
    max function length of wholestagecodegen:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
    codegen = F                                572 / 733          1.1        873.5        1.0X
    codegen = T                                2022 / 2039        0.3        3086.0       0.3X

After this patch:

    Java HotSpot(TM) 64-Bit Server VM 1.8.0_102-b14 on Linux 4.9.36-moby
    Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
    max function length of wholestagecodegen:  Best/Avg Time(ms)  Rate(M/s)  Per Row(ns)  Relative
    codegen = F                                548 / 740          1.2        836.6        1.0X
    codegen = T                                372 / 433          1.8        567.5        1.5X
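As a reading aid for the figures above, the short sketch below recomputes the Relative column from the reported per-row times. It assumes Relative is the baseline (`codegen = F`) time divided by the case time, which is consistent with the reported values but is not stated in the comment; the variable and function names are illustrative only.

```python
# Per-row times in nanoseconds, copied from the benchmark output above.
before = {"codegen = F": 873.5, "codegen = T": 3086.0}
after = {"codegen = F": 836.6, "codegen = T": 567.5}


def relative(per_row_ns, baseline="codegen = F"):
    """Return each case's speed relative to the baseline (baseline time / case time)."""
    base = per_row_ns[baseline]
    return {case: round(base / ns, 1) for case, ns in per_row_ns.items()}


# Speedup of the codegen = T case across the patch (time before / time after).
codegen_speedup = before["codegen = T"] / after["codegen = T"]

print(relative(before))           # codegen = T comes out at 0.3, matching 0.3X
print(relative(after))            # codegen = T comes out at 1.5, matching 1.5X
print(round(codegen_speedup, 1))  # roughly 5.4
```

Under that assumption, whole-stage codegen goes from being slower than the non-codegen path before the patch to faster after it, and the `codegen = T` case itself takes roughly a fifth of the per-row time it did before.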
[GitHub] spark issue #18932: Add reduceVertices method to GraphOps
Github user gilcu2 commented on the issue: https://github.com/apache/spark/pull/18932 Need a Jira First
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18920 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80591/ Test PASSed.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18920 Merged build finished. Test PASSed.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Merged build finished. Test PASSed.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80596/ Test PASSed.
[GitHub] spark pull request #18931: [SPARK-21717][SQL][WIP] Decouple consume function...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18931#discussion_r132837981

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -149,14 +149,65 @@ trait CodegenSupport extends SparkPlan {
     ctx.freshNamePrefix = parent.variablePrefix
     val evaluated = evaluateRequiredVariables(output, inputVars, parent.usedInputs)
+
+    // Under certain conditions, we can put the logic to consume the rows of this operator into
--- End diff --

Added more comments to elaborate on the idea.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18914 @gatorsmile, @viirya, @kiszk, I have addressed all the comments and modified the PR again. Please review it again if you have time. Thanks.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920 **[Test build #80591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80591/testReport)** for PR 18920 at commit [`bb29b8f`](https://github.com/apache/spark/commit/bb29b8f7e8e8be436ea028acfbe16ae6f4977169).
[GitHub] spark issue #18648: [SPARK-21428] Turn IsolatedClientLoader off while using ...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/18648 ping @jiangxb1987 @cloud-fan again
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Merged build finished. Test PASSed.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914 **[Test build #80593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80593/testReport)** for PR 18914 at commit [`7dd8cdb`](https://github.com/apache/spark/commit/7dd8cdbf7caac507cbd703193350503309b5f159). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80593/ Test PASSed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80589/testReport)** for PR 18931 at commit [`413707d`](https://github.com/apache/spark/commit/413707dd0c31a15514f00aea9addca77fe1dd2ce).
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80589/testReport)** for PR 18931 at commit [`413707d`](https://github.com/apache/spark/commit/413707dd0c31a15514f00aea9addca77fe1dd2ce). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80589/ Test FAILed.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80585/ Test FAILed.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18875 Build finished. Test FAILed.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18930 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80587/ Test PASSed.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18930 **[Test build #80587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80587/testReport)** for PR 18930 at commit [`ffa575a`](https://github.com/apache/spark/commit/ffa575a6731fef3e0731b73e0f7311cb024e831b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18914

@kiszk The original reason for this PR was simple. In this test:

    test("broadcasted hash outer join operator selection") {
      spark.sharedState.cacheManager.clearCache()
      sql("CACHE TABLE testData")
      sql("CACHE TABLE testData2")
      Seq(
        ("SELECT * FROM testData LEFT JOIN testData2 ON key = a",
          classOf[BroadcastHashJoinExec]),
        ("SELECT * FROM testData RIGHT JOIN testData2 ON key = a where key = 2",
          classOf[BroadcastHashJoinExec]),
        ("SELECT * FROM testData right join testData2 ON key = a and key = 2",
          classOf[BroadcastHashJoinExec])
      ).foreach(assertJoin)
      sql("UNCACHE TABLE testData")
    }

the _cache table_ and _uncache table_ calls do not match: two tables are cached, but only testData is uncached. Later, we found that we could replace _uncache table_ with _spark.sharedState.cacheManager.clearCache()_. The PR has now been modified accordingly.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18914 Okay, it looks like we just need something simple: remove _uncache table_. Thanks.
[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18875 **[Test build #80596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80596/testReport)** for PR 18875 at commit [`c64d9c4`](https://github.com/apache/spark/commit/c64d9c49c5f42b7a72b1570f00daa34a2843be1d).
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Merged build finished. Test FAILed.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 Once we pull them out into a downstream project, should we still worry about the call order? They are evaluated before any sort or shuffle that is added later.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80595/ Test FAILed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80595 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80595/testReport)** for PR 18931 at commit [`502139a`](https://github.com/apache/spark/commit/502139aca30db03d2ef52dc9e140b83668467122). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18866 **[Test build #80597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80597/testReport)** for PR 18866 at commit [`19f880b`](https://github.com/apache/spark/commit/19f880bcd1e519ac28e23df4a0bb6c796348ae30).
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80592/ Test PASSed.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Merged build finished. Test PASSed.
[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18317#discussion_r132846768

--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java ---
@@ -0,0 +1,288 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import org.apache.spark.storage.StorageUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead from the underlying input
+ * stream when specified amount of data has been read from the current buffer. It does it by maintaining
+ * two buffer - active buffer and read ahead buffer. Active buffer contains data which should be returned
+ * when a read() call is issued. The read ahead buffer is used to asynchronously read from the underlying
+ * input stream and once the current active buffer is exhausted, we flip the two buffers so that we can
+ * start reading from the read ahead buffer without being blocked in disk I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private ReentrantLock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean isReadInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from underlying input stream.
+  private boolean isReadAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = Executors.newSingleThreadExecutor();
+
+  private final Condition asyncReadComplete = stateChangeLock.newCondition();
+
+  private final byte[] oneByte = new byte[1];
+
+  /**
+   * Creates a ReadAheadInputStream with the specified buffer size and read-ahead
+   * threshold
+   *
+   * @param inputStream The underlying input stream.
+   * @param bufferSizeInBytes The buffer size.
+   * @param readAheadThresholdInBytes If the active buffer has less data than the read-ahead
+   *                                  threshold, an async read is triggered.
+   */
+  public ReadAheadInputStream(InputStream inputStream, int bufferSizeInBytes, int readAheadThresholdInBytes) {
+    Preconditions.checkArgument(bufferSizeInBytes > 0, "bufferSizeInBytes should be greater than 0");
+    Preconditions.checkArgument(readAheadThresholdInBytes > 0 && readAheadThresholdInBytes < bufferSizeInBytes,
+        "readAheadThresholdInBytes should be greater than 0 and less than bufferSizeInBytes" );
+    activeBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+    readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+    this.readAheadThresholdInBytes = readAheadThresholdInBytes;
+    this.underlyingInputStream = inputStream;
+    activeBuffer.flip();
+    readAheadBuffer.flip();
+  }
+
+  private boolean isEndOfStream() {
+    if(activeBuffer.remaining() == 0 && readAheadBuffer.remaining() == 0 && endOfStream) {
+      return true;
+    }
+    return false;
+  }
+
+
+  private void readAsync(final ByteBuffer byteBuffer) throws IOException {
+    stateChangeLock.lock();
+    if (endOfStream || isReadInProgress) {
+      stateChangeLock.unlock();
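The Javadoc in the quoted diff describes a two-buffer scheme: the reader drains an active buffer while a background thread fills a second one, and the two are swapped once the active buffer is exhausted. The following is a minimal, self-contained Java sketch of that idea only. It is not the class under review: `TwoBufferReader` and its members are hypothetical names, it uses plain byte arrays and a `Future` rather than `ByteBuffer`, a lock and a condition, and it starts the next fill immediately after each swap instead of waiting for a read-ahead threshold.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of two-buffer read-ahead; not the ReadAheadInputStream from the PR. */
final class TwoBufferReader extends InputStream {
  private final InputStream in;
  // Single daemon worker that performs the background fills.
  private final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "read-ahead-sketch");
    t.setDaemon(true);
    return t;
  });
  private byte[] active;            // buffer currently served to callers
  private byte[] spare;             // buffer being filled in the background
  private int pos = 0;              // next index to return from active
  private int limit = 0;            // number of valid bytes in active
  private Future<Integer> pending;  // in-flight fill of spare; null after end of stream

  TwoBufferReader(InputStream in, int bufferSize) {
    this.in = in;
    this.active = new byte[bufferSize];
    this.spare = new byte[bufferSize];
    this.pending = startFill(spare);
  }

  // Submit a background read into the given buffer.
  private Future<Integer> startFill(final byte[] target) {
    Callable<Integer> task = () -> in.read(target, 0, target.length);
    return pool.submit(task);
  }

  @Override
  public int read() throws IOException {
    if (pos == limit && !swap()) {
      return -1; // both buffers drained and the underlying stream has ended
    }
    return active[pos++] & 0xFF;
  }

  // Wait for the background fill, then exchange the two buffers.
  private boolean swap() throws IOException {
    if (pending == null) {
      return false;
    }
    int n;
    try {
      n = pending.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException(e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
    if (n <= 0) {
      pending = null; // end of stream: stop scheduling fills
      return false;
    }
    byte[] tmp = active;
    active = spare;
    spare = tmp;
    pos = 0;
    limit = n;
    pending = startFill(spare); // refill the drained buffer while callers read the new one
    return true;
  }

  @Override
  public void close() throws IOException {
    pool.shutdownNow();
    in.close();
  }
}
```

Because at most one fill is in flight and the caller waits on its `Future` before touching the filled array, the two threads never access the same buffer at the same time. The sketch gives up the threshold-based triggering and the explicit lock state that the quoted diff uses.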
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18866 **[Test build #80597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80597/testReport)** for PR 18866 at commit [`19f880b`](https://github.com/apache/spark/commit/19f880bcd1e519ac28e23df4a0bb6c796348ae30). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ClusteredDistribution(clustering: Seq[Expression], clustersOpt: Option[Int] = None,` * `case class HashPartitioning(expressions: Seq[Expression], numPartitions: Int,`
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18866 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80597/ Test FAILed.
[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18866 Merged build finished. Test FAILed.
[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18914#discussion_r132833028

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -30,6 +30,12 @@ import org.apache.spark.TestUtils.{assertNotSpilled, assertSpilled}
 class JoinSuite extends QueryTest with SharedSQLContext {
   import testImplicits._

+  override def afterEach(): Unit = {
+    // Clear the cache table for test cases
+    spark.sharedState.cacheManager.clearCache()
+    super.afterEach()
--- End diff --

thanks
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80581/testReport)** for PR 18931 at commit [`05274e7`](https://github.com/apache/spark/commit/05274e7ad4c74e6241b5a05a9365c475f0c3c0a3).
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18920 ok to test
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80581/ Test FAILed.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18930 **[Test build #80579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80579/testReport)** for PR 18930 at commit [`ffa575a`](https://github.com/apache/spark/commit/ffa575a6731fef3e0731b73e0f7311cb024e831b). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914 **[Test build #80580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80580/testReport)** for PR 18914 at commit [`4a88e3f`](https://github.com/apache/spark/commit/4a88e3f71a45578dc9fd291e12c8cf616e629799). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18931 Merged build finished. Test FAILed.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920 **[Test build #80582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80582/testReport)** for PR 18920 at commit [`b932d2f`](https://github.com/apache/spark/commit/b932d2f3a6741a8ef052cbd8087f4b0836c617d6). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18920 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80582/ Test FAILed.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Merged build finished. Test FAILed.
[GitHub] spark issue #18931: [SPARK-21717][SQL][WIP] Decouple consume functions of ph...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18931 **[Test build #80581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80581/testReport)** for PR 18931 at commit [`05274e7`](https://github.com/apache/spark/commit/05274e7ad4c74e6241b5a05a9365c475f0c3c0a3). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18920 Merged build finished. Test FAILed.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18930 Merged build finished. Test FAILed.
[GitHub] spark issue #18930: [SPARK-21677][SQL] json_tuple throws NullPointException ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18930 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80579/ Test FAILed.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18914 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80580/ Test FAILed.
[GitHub] spark issue #18916: [SPARK-21705][CORE][DOC]Add spark.internal.config parame...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18916 cc @HyukjinKwon @gatorsmile
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914 **[Test build #80580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80580/testReport)** for PR 18914 at commit [`4a88e3f`](https://github.com/apache/spark/commit/4a88e3f71a45578dc9fd291e12c8cf616e629799).
[GitHub] spark pull request #18931: [SPARK-21717][SQL][WIP] Decouple consume function...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/18931 [SPARK-21717][SQL][WIP] Decouple consume functions of physical operators in whole-stage codegen ## What changes were proposed in this pull request? It has been observed in SPARK-21603 that whole-stage codegen suffers performance degradation if the generated functions are too long to be optimized by JIT. We basically produce a single function that incorporates the generated code from all physical operators in a whole stage. Thus, the generated function can grow past the size threshold beyond which JIT no longer optimizes it. This patch tries to decouple the logic of consuming rows in physical operators to avoid a giant function processing rows. ## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-21717 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18931.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18931 commit 05274e7ad4c74e6241b5a05a9365c475f0c3c0a3 Author: Liang-Chi Hsieh Date: 2017-08-13T06:06:10Z Decouple consume functions of physical operators in whole-stage codegen.
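The idea in the description above can be illustrated with a toy code generator. This is only a sketch in Python, not Spark's generator (which emits Java source); the operator list, the `fused_source` and `decoupled_source` helpers, and the `consume_<name>` naming are all hypothetical and exist only to show the difference between one inlined function and one small function per operator.

```python
def fused_source(operators):
    """Emit a single function whose body inlines every operator's row logic."""
    lines = ["def process_row(row):"]
    for _name, body in operators:
        lines += ["    " + stmt for stmt in body]
    lines.append("    return row")
    return "\n".join(lines)


def decoupled_source(operators):
    """Emit one small consume function per operator plus a thin driver."""
    chunks = []
    for name, body in operators:
        fn = ["def consume_%s(row):" % name]
        fn += ["    " + stmt for stmt in body]
        fn.append("    return row")
        chunks.append("\n".join(fn))
    driver = ["def process_row(row):"]
    for name, _body in operators:
        driver.append("    row = consume_%s(row)" % name)
        driver.append("    if row is None:")  # a dropped row stops the pipeline
        driver.append("        return None")
    driver.append("    return row")
    chunks.append("\n".join(driver))
    return "\n\n".join(chunks)


def compile_process_row(source):
    """Compile generated source and return its process_row entry point."""
    namespace = {}
    exec(source, namespace)
    return namespace["process_row"]


# Hypothetical two-operator pipeline: a filter followed by a projection.
operators = [
    ("filter", ["if row['x'] < 0:", "    return None"]),
    ("project", ["row = {'y': row['x'] * 2}"]),
]

fused = compile_process_row(fused_source(operators))
decoupled = compile_process_row(decoupled_source(operators))
```

Both variants compute the same result for every row. In the fused form the size of `process_row` grows with the number of operators, while in the decoupled form each generated function stays small and only the short driver grows.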
[GitHub] spark issue #18700: [SPARK-21499] [SQL] Support creating persistent function...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18700 cc @cloud-fan @ueshin
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920 **[Test build #80582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80582/testReport)** for PR 18920 at commit [`b932d2f`](https://github.com/apache/spark/commit/b932d2f3a6741a8ef052cbd8087f4b0836c617d6).
[GitHub] spark pull request #18914: [MINOR][SQL][TEST]no uncache table in joinsuite t...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18914#discussion_r132833474 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -30,6 +30,15 @@ import org.apache.spark.TestUtils.{assertNotSpilled, assertSpilled} class JoinSuite extends QueryTest with SharedSQLContext { import testImplicits._ + override def afterEach(): Unit = { +try { + // Clear the cache table for test cases + spark.sharedState.cacheManager.clearCache() --- End diff -- Does [this](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala#L91) work for `JoinSuite`, too?
[GitHub] spark issue #18929: [MINOR][LAUNCHER]remove never used String in SparkLaunch...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18929 @srowen In fact, we can replace spark.executor.memory with _SparkLauncher.EXECUTOR_MEMORY_ and replace spark.executor.cores with _SparkLauncher.EXECUTOR_CORES_. But making that change would also break consistency with the rest of the code, so I chose to remove it.
[GitHub] spark issue #18914: [MINOR][SQL][TEST]no uncache table in joinsuite test
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18914 **[Test build #80583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80583/testReport)** for PR 18914 at commit [`5b3f650`](https://github.com/apache/spark/commit/5b3f6505ed9e16c8b62ceb33535c078987bc9dd9).
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18899 OK maybe include some of this text in the scaladoc for it, to make it clear it is always intended to be called with the value of `numNonZeroes`.
[GitHub] spark issue #18916: [SPARK-21705][CORE][DOC]Add spark.internal.config parame...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18916 In Spark SQL, the descriptions of all SQLConf entries can be displayed through SQL commands. However, I am not sure how end users can get the descriptions of these Spark Conf entries.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692 @aokolnychyi Thanks for finding the non-convergent case! Let me see how to fix it.
[GitHub] spark pull request #18926: [SPARK-21712] [PySpark] Clarify type error for Co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18926#discussion_r132856266 --- Diff: python/pyspark/sql/column.py --- @@ -406,7 +406,13 @@ def substr(self, startPos, length): [Row(col=u'Ali'), Row(col=u'Bob')] """ if type(startPos) != type(length): -raise TypeError("Can not mix the type") +raise TypeError( +"startPos and length must be the same type. " +"Got {startPos_t} and {length_t}, respectively." --- End diff -- For the latter, it looks like we should call `substr` with either column, column or int, int. I would like to avoid changing these if either way does not reduce the code diff and is virtually the same, if I understood correctly.
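The check under discussion can be tried outside PySpark. The following is a minimal standalone sketch, not the code from the PR: `check_substr_args` is a hypothetical helper, and the `.format(...)` arguments are an assumption, since the quoted diff is cut off before that call. Only the message template is taken from the diff.

```python
def check_substr_args(startPos, length):
    """Raise TypeError when startPos and length are not of the same type."""
    if type(startPos) != type(length):
        raise TypeError(
            "startPos and length must be the same type. "
            "Got {startPos_t} and {length_t}, respectively."
            # Assumed formatting arguments; the diff does not show them.
            .format(startPos_t=type(startPos), length_t=type(length)))


# Matching types pass silently.
check_substr_args(1, 3)

# Mixed types produce a message that names both offending types.
try:
    check_substr_args(1, "3")
except TypeError as e:
    message = str(e)
```

Compared with the old "Can not mix the type" text, the message tells the caller which two types were passed, which is usually enough to locate the mistake without reading the source.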
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 When we join two tables and the equi-join keys are non-deterministic, for example `t1.a = rand(t2.b)` and `t1.c = rand(t2.d)`, we pull them out into a downstream project:

    Join [t1.a = rand(t2.b), t1.c = rand(t2.d)]
      Project [t1.a, t1.c]
        TableScan t1
      Project [rand(t2.b) as rand(t2.b), rand(t2.d) as rand(t2.d)]
        TableScan t2

`rand(t2.b)` and `rand(t2.d)` are evaluated in the projection. Why would the Join change its order?
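The argument above can be mimicked with plain Python lists. This is only a sketch under simplifying assumptions: `project_with_rand` and `equi_join` are hypothetical helpers, the random values ignore their column argument, and a nested-loop join stands in for whatever join Spark would pick. It shows that once the non-deterministic expressions are materialized by the projection, the join only compares stored values.

```python
import random


def project_with_rand(rows, columns, rng):
    """Evaluate one random expression per listed column, once per row."""
    projected = []
    for row in rows:
        out = dict(row)
        for col in columns:
            # Evaluated exactly once here; the join never re-evaluates it.
            out["rand_" + col] = rng.random()
        projected.append(out)
    return projected


def equi_join(left, right, key_pairs):
    """Nested-loop equi-join over already-materialized columns."""
    return [
        (l, r)
        for l in left
        for r in right
        if all(l[lk] == r[rk] for lk, rk in key_pairs)
    ]


rng = random.Random(0)
t2 = [{"b": 1, "d": 2}, {"b": 3, "d": 4}]
t2_projected = project_with_rand(t2, ["b", "d"], rng)
snapshot = [dict(r) for r in t2_projected]

# Build t1 so that its single row matches the first projected row of t2.
t1 = [{"a": t2_projected[0]["rand_b"], "c": t2_projected[0]["rand_d"]}]

joined = equi_join(t1, t2_projected, [("a", "rand_b"), ("c", "rand_d")])
joined_swapped = equi_join(t1, t2_projected, [("c", "rand_d"), ("a", "rand_b")])
```

Swapping the order in which the join checks its key pairs leaves both the projected values and the join result unchanged, because no random expression is evaluated inside the join.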
[GitHub] spark issue #18920: [SPARK-19471][SQL]AggregationIterator does not initializ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18920 **[Test build #80599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80599/testReport)** for PR 18920 at commit [`5239ebb`](https://github.com/apache/spark/commit/5239ebb5843315430d5c942dc53e09fb09d6c1c8).
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 OK, I will solve the remaining problems first and hold this PR, @gatorsmile.
[GitHub] spark issue #9518: [SPARK-11574][Core] Add metrics StatsD sink
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/9518 Sorry I don't have the permission to merge this. Ping @cloud-fan @JoshRosen to review again.
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18899 Thanks @sethah @srowen . The comment is added.
[GitHub] spark issue #18899: [SPARK-21680][ML][MLLIB]optimzie Vector coompress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18899 **[Test build #80600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80600/testReport)** for PR 18899 at commit [`d50de99`](https://github.com/apache/spark/commit/d50de9961f78c8d259b9167081c2d9529ce91a63).
[GitHub] spark issue #17980: [SPARK-20728][SQL] Make ORCFileFormat configurable betwe...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/17980 Retest this please.
[GitHub] spark issue #18909: [MINOR][SQL] Additional test case for CheckCartesianProd...
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18909 @gatorsmile sure, this PR is only about tests, I was just wondering what is planned regarding cross joins with inequality conditions. I borrowed several tests from PR #16762 and added additional ones. As I mentioned, there is a small overlap between the existing tests and proposed ones but they are defined at different levels.