[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212756422 **[Test build #56485 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56485/consoleFull)** for PR 12556 at commit [`c8708f7`](https://github.com/apache/spark/commit/c8708f7e9395811c9796bcbba68f63243bdda6cc).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` * [STORED AS file_format | STORED BY storage_handler_class [WITH SERDEPROPERTIES (...)]]`
  * `case class CreateTableAsSelectLogicalPlan(`

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212756425 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56485/ Test FAILed.
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212756424 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212756240 **[Test build #56485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56485/consoleFull)** for PR 12556 at commit [`c8708f7`](https://github.com/apache/spark/commit/c8708f7e9395811c9796bcbba68f63243bdda6cc).
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212754891 Merged build finished. Test PASSed.
[GitHub] spark pull request: Update DAGScheduler.scala
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/12524#issuecomment-212754822 Is this related to #12436?
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212754896 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56475/ Test PASSed.
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212754203 **[Test build #56475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56475/consoleFull)** for PR 10024 at commit [`e7a98d5`](https://github.com/apache/spark/commit/e7a98d57a31923406c204e15f72c7a43579653bb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14786] Remove hive-cli dependency from ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12551
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12490#discussion_r60529161

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortPrefixUtils.scala ---
@@ -66,6 +66,32 @@ object SortPrefixUtils {
   }

   /**
+   * Returns whether the specified SortOrder can be satisfied with a radix sort on the prefix.
+   */
+  def canSortFullyWithPrefix(sortOrder: SortOrder): Boolean = {
+    sortOrder.dataType match {
+      // TODO(ekl) long-type is problematic because its null prefix representation collides with
+      // the lowest possible long value. Handle this special case outside radix sort.
+      case LongType if sortOrder.nullable =>
+        false
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | DateType |
+          TimestampType | FloatType | DoubleType =>
--- End diff --

Never mind, the prefix of `null` has been changed.
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12490#discussion_r60529190

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Sort.scala ---
@@ -139,11 +148,15 @@ case class Sort(
     val dataSize = metricTerm(ctx, "dataSize")
     val spillSize = metricTerm(ctx, "spillSize")
     val spillSizeBefore = ctx.freshName("spillSizeBefore")
+    val startTime = ctx.freshName("startTime")
+    val sortTime = metricTerm(ctx, "sortTime")
     s"""
        | if ($needToSort) {
        |   $addToSorter();
        |   Long $spillSizeBefore = $metrics.memoryBytesSpilled();
+       |   Long $startTime = System.nanoTime();
--- End diff --

`long`
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212753448 LGTM
[GitHub] spark pull request: [SPARK-14786] Remove hive-cli dependency from ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12551#issuecomment-212753365 Merging in master.
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12490#discussion_r60528873

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SortPrefixUtils.scala ---
@@ -66,6 +66,32 @@ object SortPrefixUtils {
   }

   /**
+   * Returns whether the specified SortOrder can be satisfied with a radix sort on the prefix.
+   */
+  def canSortFullyWithPrefix(sortOrder: SortOrder): Boolean = {
+    sortOrder.dataType match {
+      // TODO(ekl) long-type is problematic because its null prefix representation collides with
+      // the lowest possible long value. Handle this special case outside radix sort.
+      case LongType if sortOrder.nullable =>
+        false
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | DateType |
+          TimestampType | FloatType | DoubleType =>
--- End diff --

The prefix of null for DoubleType (Double.NegativeInfinity) also collides with the lowest possible long value.
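The collision being discussed can be shown in isolation. The following is a hypothetical sketch, not Spark's actual encoding: if a sorter reserves the smallest possible 8-byte prefix as its null sentinel, that sentinel is bit-identical to the prefix of a genuine minimum value, so a prefix-only radix pass cannot tell the two apart.

```java
// Hypothetical sketch of the null-prefix collision. Assume a sorter encodes
// "null" as the smallest possible 8-byte prefix and uses the identity mapping
// as the prefix for longs. A real Long.MIN_VALUE then produces the same bits
// as the null sentinel, so comparing prefixes alone cannot distinguish a null
// row from a genuine minimum value.
public class NullPrefixCollision {
    // Assumed null sentinel: the smallest 8-byte prefix (hypothetical).
    static final long NULL_PREFIX = Long.MIN_VALUE;

    // Hypothetical identity prefix for long values.
    static long longPrefix(long value) {
        return value;
    }

    public static void main(String[] args) {
        long minValuePrefix = longPrefix(Long.MIN_VALUE);
        // The sentinel and a real minimum value collide bit-for-bit.
        System.out.println(NULL_PREFIX == minValuePrefix); // prints "true"
    }
}
```

This is why the patch routes nullable LongType (and, per the comment above, nullable DoubleType) around the radix-sort fast path instead of sorting fully by prefix.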
[GitHub] spark pull request: [SPARK-14786] Remove hive-cli dependency from ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12551#issuecomment-212752845 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56470/ Test PASSed.
[GitHub] spark pull request: [SPARK-14786] Remove hive-cli dependency from ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12551#issuecomment-212752841 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14346] [SQL] Show create table
Github user xwu0226 closed the pull request at: https://github.com/apache/spark/pull/12406
[GitHub] spark pull request: [SPARK-14346] [SQL] Show create table
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/12406#issuecomment-212752759 Thanks @wangmiao1981. This happened again: all the commits after my last commit got pulled into this PR. I need to close it and will submit a new PR.
[GitHub] spark pull request: [SPARK-14786] Remove hive-cli dependency from ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12551#issuecomment-212752681 **[Test build #56470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56470/consoleFull)** for PR 12551 at commit [`e8c6f35`](https://github.com/apache/spark/commit/e8c6f35e4475586e87df6582ea51a18727ad8062).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14346] [SQL] Show create table
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/12406#issuecomment-212752658 @xwu0226 use `git rebase upstream/master`, not `git merge upstream/master`. I had the same issue before: git merge will add others' commits to your PR, while git rebase replays only your own commits on top of upstream and leaves others' commits out of the PR.
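The difference can be demonstrated end-to-end in a throwaway repository. This is a minimal sketch (the repository layout, branch names, and commit messages are invented); the key observation is that after rebasing onto the upstream branch, the feature branch is exactly one commit ahead of upstream, with no foreign commits mixed in.

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"

# Stand-in for the upstream repository: one shared base commit.
git init -q -b master upstream
cd upstream
echo base > shared.txt
git add shared.txt
git -c user.name=u -c user.email=u@example.com commit -q -m "base"
cd ..

# Contributor's clone with a feature branch and one local commit.
git clone -q upstream work
cd work
git checkout -q -b feature
echo mine > feature.txt
git add feature.txt
git -c user.name=c -c user.email=c@example.com commit -q -m "feature work"

# Meanwhile upstream moves forward with someone else's commit.
cd ../upstream
echo more >> shared.txt
git -c user.name=u -c user.email=u@example.com commit -q -am "unrelated upstream commit"
cd ../work

# Rebase replays only our commit on top of the new upstream head, so the
# branch diff against upstream stays just "feature work". A merge would
# instead entangle the unrelated commit (plus a merge commit) in the history.
git fetch -q origin
git -c user.name=c -c user.email=c@example.com rebase -q origin/master
ahead=$(git rev-list --count origin/master..feature)
echo "$ahead"   # 1: only our own commit is ahead of upstream
```

After a rebase the branch history has been rewritten, so updating an existing PR branch requires a force push (`git push --force-with-lease`).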
[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12560#issuecomment-212752513 **[Test build #56484 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56484/consoleFull)** for PR 12560 at commit [`239cabf`](https://github.com/apache/spark/commit/239cabf459fee929b19d541bf019de445ea2026d).
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12490#discussion_r60528592

--- Diff: core/src/test/scala/org/apache/spark/util/collection/unsafe/sort/PrefixComparatorsSuite.scala ---
@@ -110,4 +112,12 @@ class PrefixComparatorsSuite extends SparkFunSuite with PropertyChecks {
     assert(PrefixComparators.DOUBLE.compare(nan1Prefix, doubleMaxPrefix) === 1)
   }

+  test("double prefix comparator handles negative NaNs properly") {
+    val negativeNan: Double = java.lang.Double.longBitsToDouble(0xfff1L)
+    assert(negativeNan.isNaN)
+    assert(java.lang.Double.doubleToRawLongBits(negativeNan) < 0)
+    val prefix = PrefixComparators.DoublePrefixComparator.computePrefix(negativeNan)
+    val doubleMaxPrefix = PrefixComparators.DoublePrefixComparator.computePrefix(Double.MaxValue)
+    assert(PrefixComparators.DOUBLE.compare(prefix, doubleMaxPrefix) === 1)
--- End diff --

Could you also test MinValue, 0, NegativeInfinity, PositiveInfinity?
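The cases requested above can be checked against the standard order-preserving double-to-long encoding (flip all bits of a negative double, flip only the sign bit otherwise, and compare the results as unsigned longs). This is an illustrative sketch of that well-known transform, not necessarily the exact code in `PrefixComparators`:

```java
// Illustrative sketch (not necessarily Spark's exact implementation) of an
// order-preserving double -> long prefix: flip all bits of negative doubles,
// flip only the sign bit of non-negative ones, then compare as unsigned longs.
public class DoublePrefixDemo {
    static long prefix(double value) {
        // doubleToLongBits (unlike the Raw variant) canonicalizes every NaN,
        // including "negative" NaNs with the sign bit set, to the same
        // positive NaN pattern, so all NaNs sort together above +Infinity.
        long bits = Double.doubleToLongBits(value);
        long mask = (bits >> 63) | Long.MIN_VALUE; // all ones if negative, else just the sign bit
        return bits ^ mask;
    }

    public static void main(String[] args) {
        double negativeNan = Double.longBitsToDouble(0xfff0000000000001L);
        double[] ascending = {
            Double.NEGATIVE_INFINITY, -Double.MAX_VALUE, -1.0, 0.0,
            Double.MIN_VALUE, 1.0, Double.MAX_VALUE,
            Double.POSITIVE_INFINITY, negativeNan // NaN sorts above infinity
        };
        // The prefix order must match the double order across all these cases.
        for (int i = 1; i < ascending.length; i++) {
            long lo = prefix(ascending[i - 1]);
            long hi = prefix(ascending[i]);
            if (Long.compareUnsigned(lo, hi) >= 0) {
                throw new AssertionError("order violated at index " + i);
            }
        }
        System.out.println("prefix order matches double order");
    }
}
```

Note that this encoding also distinguishes -0.0 from +0.0 (the former sorts first), which a prefix comparator may or may not want.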
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212752316 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56478/ Test FAILed.
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212752210 **[Test build #56478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56478/consoleFull)** for PR 12556 at commit [`a5408e5`](https://github.com/apache/spark/commit/a5408e526a06a5d2629f21df1005696122916214).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212752313 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14793][SQL] Code generation for large c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12559#issuecomment-212751784 **[Test build #56483 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56483/consoleFull)** for PR 12559 at commit [`e7afed9`](https://github.com/apache/spark/commit/e7afed92a21835bbb6d92df2dbd51fa872e2dbfa).
[GitHub] spark pull request: [SPARK-14571][ML]Log instrumentation in ALS
GitHub user wangmiao1981 opened a pull request: https://github.com/apache/spark/pull/12560

[SPARK-14571][ML] Log instrumentation in ALS

## What changes were proposed in this pull request?

Add log instrumentation for the parameters rank, numUserBlocks, numItemBlocks, implicitPrefs, alpha, userCol, itemCol, ratingCol, predictionCol, maxIter, regParam, nonnegative, checkpointInterval, and seed. Also add log instrumentation for numUserFeatures and numItemFeatures.

## How was this patch tested?

Manual test: set a breakpoint in IntelliJ and run `def testALS()`. Single-step through the code and check that the log method is called.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangmiao1981/spark log Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12560.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12560

commit 239cabf459fee929b19d541bf019de445ea2026d
Author: wm...@hotmail.com
Date: 2016-04-21T05:31:33Z

    add instrumentation to ALS
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212751532 LGTM
[GitHub] spark pull request: [SPARK-14793][SQL] Code generation for large c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12559#issuecomment-212750650 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14793][SQL] Code generation for large c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12559#issuecomment-212750651 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56481/ Test FAILed.
[GitHub] spark pull request: [SPARK-14793][SQL] Code generation for large c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12559#issuecomment-212750649 **[Test build #56481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56481/consoleFull)** for PR 12559 at commit [`f17f42a`](https://github.com/apache/spark/commit/f17f42ac60e46fd26586dc9cb960689d7869f700).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10001][Core] Interrupt tasks in repl wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12557#issuecomment-212750487 **[Test build #56482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56482/consoleFull)** for PR 12557 at commit [`4f9bf69`](https://github.com/apache/spark/commit/4f9bf695344a5c4c54372eaa6bf54af0d2da1f74).
[GitHub] spark pull request: [SPARK-14793][SQL] Code generation for large c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12559#issuecomment-212750481 **[Test build #56481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56481/consoleFull)** for PR 12559 at commit [`f17f42a`](https://github.com/apache/spark/commit/f17f42ac60e46fd26586dc9cb960689d7869f700).
[GitHub] spark pull request: [SPARK-13643][SQL] Implement SparkSession
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12553#issuecomment-212750214 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56468/ Test FAILed.
[GitHub] spark pull request: [SPARK-13643][SQL] Implement SparkSession
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12553#issuecomment-212750212 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13643][SQL] Implement SparkSession
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12553#issuecomment-212750048 **[Test build #56468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56468/consoleFull)** for PR 12553 at commit [`7ccfb38`](https://github.com/apache/spark/commit/7ccfb38d1cf378d017fd6a570e41d16bb02dbf86).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14793][SQL] Code generation for large c...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/12559 [SPARK-14793][SQL] Code generation for large complex type exceeds JVM size limit.

## What changes were proposed in this pull request?

Code generation for complex types (`CreateArray`, `CreateMap`, `CreateStruct`, `CreateNamedStruct`) exceeds the JVM size limit for large numbers of elements.

## How was this patch tested?

I added some tests to check whether the generated code for these expressions exceeds the limit or not.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-14793

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12559.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12559

commit f17f42ac60e46fd26586dc9cb960689d7869f700
Author: Takuya UESHIN
Date: 2016-04-21T05:14:42Z

    Split wide complex type creation into blocks due to JVM code size limit.
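The commit message above ("Split wide complex type creation into blocks") describes chunking the generated code. As a rough standalone illustration of the idea — not Spark's actual codegen; the class name, chunk size, and per-element operation here are all hypothetical — evaluating a wide value in fixed-size chunks, one helper per chunk, keeps each method's body bounded and therefore under the JVM's 64KB-per-method bytecode limit:

```java
// Sketch of the splitting idea: rather than one huge method that evaluates
// every element of a wide CreateArray/CreateStruct, work is broken into
// fixed-size chunks, each handled by a separate helper method.
class SplitCodegenSketch {
    static final int CHUNK_SIZE = 100; // assumed chunk size, not Spark's actual constant

    // Evaluate all elements by delegating to per-chunk helpers.
    static long[] evalWideArray(long[] inputs) {
        long[] out = new long[inputs.length];
        for (int start = 0; start < inputs.length; start += CHUNK_SIZE) {
            int end = Math.min(start + CHUNK_SIZE, inputs.length);
            evalChunk(inputs, out, start, end); // stands in for one generated helper method
        }
        return out;
    }

    // One "generated" helper: evaluates a bounded slice of the elements,
    // so its bytecode size does not grow with the total element count.
    static void evalChunk(long[] in, long[] out, int start, int end) {
        for (int i = start; i < end; i++) {
            out[i] = in[i] * 2; // placeholder for per-element expression code
        }
    }
}
```

In real codegen the chunks would be emitted as separate generated functions; the loop here merely stands in for that dispatch.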
[GitHub] spark pull request: [SPARK-13988][Core] Make replaying event logs ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11800#issuecomment-212749912 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56474/ Test FAILed.
[GitHub] spark pull request: [SPARK-13988][Core] Make replaying event logs ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11800#issuecomment-212749911 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13988][Core] Make replaying event logs ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11800#issuecomment-212749831 **[Test build #56474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56474/consoleFull)** for PR 11800 at commit [`858e8ff`](https://github.com/apache/spark/commit/858e8ffefaa26c45249f81ed047ff00c77416bb6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14597] [Streaming] Streaming Listener t...
Github user agsachin commented on the pull request: https://github.com/apache/spark/pull/12357#issuecomment-212749355 Hey, I closed this thread as we are moving towards approach 2, explained in the JIRA: https://issues.apache.org/jira/browse/SPARK-14597
[GitHub] spark pull request: [SPARK-14597] [Streaming] Streaming Listener t...
Github user agsachin closed the pull request at: https://github.com/apache/spark/pull/12357
[GitHub] spark pull request: [SPARK-13266][SQL] None read/writer options we...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12494#issuecomment-212748601 @viirya Thanks for bearing with me.
[GitHub] spark pull request: [SPARK-13266][SQL] None read/writer options we...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12494#issuecomment-212748355 @HyukjinKwon Yea, thanks for the comment. I will remove the null check then, and I will make a change to CSVOption to avoid the null exception.
[GitHub] spark pull request: [SPARK-10001][Core] Interrupt tasks in repl wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12557#issuecomment-212748073 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212748052 @davies All tests have now passed, so could you take a look again? Thanks.
[GitHub] spark pull request: [SPARK-10001][Core] Interrupt tasks in repl wi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12557#issuecomment-212748076 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56480/ Test FAILed.
[GitHub] spark pull request: [SPARK-10001][Core] Interrupt tasks in repl wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12557#issuecomment-212748057 **[Test build #56480 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56480/consoleFull)** for PR 12557 at commit [`94323b9`](https://github.com/apache/spark/commit/94323b9a7e498c03f0de93bc536e5fe6710d062b).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212747490 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212747494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56466/ Test PASSed.
[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60527018

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -414,8 +414,42 @@ class Analyzer(
   }

   def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case i @ InsertIntoTable(u: UnresolvedRelation, _, _, _, _) =>
-      i.copy(table = EliminateSubqueryAliases(getTable(u)))
+    case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+      val table = getTable(u)
+      // adding the table's partitions or validate the query's partition info
+      table match {
+        case relation: PartitionedRelation if relation.partitionColumns.nonEmpty =>
+          val tablePartitionNames = relation.partitionColumns.map(_.name)
+          if (parts.keys.nonEmpty) {
+            // the query's partitioning must match the table's partitioning
+            // this is set for queries like: insert into ... partition (one = "a", two = )
+            if (tablePartitionNames.size != parts.keySet.size) {
+              throw new AnalysisException(
+                s"""Requested partitioning does not match the ${u.tableIdentifier} table:
+                   |Requested partitions: ${parts.keys.mkString(",")}
+                   |Table partitions: ${tablePartitionNames.mkString(",")}""".stripMargin)
+            }
+            // assumes partition columns are correctly placed at the end of the child's output
+            i.copy(table = EliminateSubqueryAliases(table))
+          } else {
+            // Set up the table's partition scheme with all dynamic partitions by moving partition
+            // columns to the end of the column list, in partition order.
+            val (inputPartCols, columns) = child.output.partition { attr =>
+              tablePartitionNames.contains(attr.name)
+            }
+            // All partition columns are dynamic because this InsertIntoTable had no partitioning
+            val partColumns = tablePartitionNames.map { name =>
--- End diff --

When will `partColumns` be different from `inputPartCols`? Seems never?
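The else-branch under review moves partition columns to the end of the output, in the table's declared partition order. A standalone sketch of that reordering (hypothetical names and plain Java lists, not the Analyzer's Catalyst types):

```java
import java.util.ArrayList;
import java.util.List;

class PartitionReorderSketch {
    // Return the columns with all partition columns moved to the end,
    // ordered as the table declares them (not as the query emitted them).
    static List<String> reorder(List<String> output, List<String> tablePartitionNames) {
        List<String> result = new ArrayList<>();
        for (String col : output) {
            if (!tablePartitionNames.contains(col)) {
                result.add(col); // non-partition columns keep their relative order
            }
        }
        // append partition columns in the table's partition order
        result.addAll(tablePartitionNames);
        return result;
    }
}
```

This also suggests an answer to the review question: `partColumns` and `inputPartCols` presumably hold the same columns and can differ only in ordering, since `partition` preserves the query's column order while the name-driven map follows the table's order.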
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212747343 **[Test build #56466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56466/consoleFull)** for PR 10024 at commit [`e7a98d5`](https://github.com/apache/spark/commit/e7a98d57a31923406c204e15f72c7a43579653bb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60526988

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -414,8 +414,42 @@ class Analyzer(
   }

   def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case i @ InsertIntoTable(u: UnresolvedRelation, _, _, _, _) =>
-      i.copy(table = EliminateSubqueryAliases(getTable(u)))
+    case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+      val table = getTable(u)
+      // adding the table's partitions or validate the query's partition info
+      table match {
+        case relation: PartitionedRelation if relation.partitionColumns.nonEmpty =>
+          val tablePartitionNames = relation.partitionColumns.map(_.name)
+          if (parts.keys.nonEmpty) {
+            // the query's partitioning must match the table's partitioning
+            // this is set for queries like: insert into ... partition (one = "a", two = )
+            if (tablePartitionNames.size != parts.keySet.size) {
--- End diff --

why do we only check size here?
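A size-only check accepts a partition spec whose names do not actually match the table's partition columns. A sketch of the stricter set comparison the review comment hints at (hypothetical helper, not the PR's code):

```java
import java.util.Set;

class PartitionCheckSketch {
    // Comparing only sizes would accept e.g. {year, nonsense} against table
    // partitions {year, month}; comparing the sets catches the mismatch.
    static boolean partitioningMatches(Set<String> requested, Set<String> tablePartitions) {
        return requested.equals(tablePartitions);
    }
}
```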
[GitHub] spark pull request: [SPARK-10001][Core] Allow interrupting tasks i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12557#issuecomment-212747247 **[Test build #56480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56480/consoleFull)** for PR 12557 at commit [`94323b9`](https://github.com/apache/spark/commit/94323b9a7e498c03f0de93bc536e5fe6710d062b).
[GitHub] spark pull request: [SPARK-13266][SQL] None read/writer options we...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/12494#issuecomment-212747103 @viirya Ah, thanks! However, I am a bit worried whether disallowing `null` as an option value is correct. I don't want to be picky, but I don't think it is guaranteed that no option ever takes `null` as a value. In some external data sources or future options, there might be cases where setting an option to `null` and not setting it at all are not the same.
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12490#discussion_r60526773

--- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/PrefixComparators.java ---
@@ -28,88 +28,84 @@ public class PrefixComparators {
   private PrefixComparators() {}

-  public static final StringPrefixComparator STRING = new StringPrefixComparator();
-  public static final StringPrefixComparatorDesc STRING_DESC = new StringPrefixComparatorDesc();
-  public static final BinaryPrefixComparator BINARY = new BinaryPrefixComparator();
-  public static final BinaryPrefixComparatorDesc BINARY_DESC = new BinaryPrefixComparatorDesc();
-  public static final LongPrefixComparator LONG = new LongPrefixComparator();
-  public static final LongPrefixComparatorDesc LONG_DESC = new LongPrefixComparatorDesc();
-  public static final DoublePrefixComparator DOUBLE = new DoublePrefixComparator();
-  public static final DoublePrefixComparatorDesc DOUBLE_DESC = new DoublePrefixComparatorDesc();
-
-  public static final class StringPrefixComparator extends PrefixComparator {
-    @Override
-    public int compare(long aPrefix, long bPrefix) {
-      return UnsignedLongs.compare(aPrefix, bPrefix);
-    }
-
+  public static final PrefixComparator STRING = new UnsignedPrefixComparator();
+  public static final PrefixComparator STRING_DESC = new UnsignedPrefixComparatorDesc();
+  public static final PrefixComparator BINARY = new UnsignedPrefixComparator();
+  public static final PrefixComparator BINARY_DESC = new UnsignedPrefixComparatorDesc();
+  public static final PrefixComparator LONG = new SignedPrefixComparator();
+  public static final PrefixComparator LONG_DESC = new SignedPrefixComparatorDesc();
+  public static final PrefixComparator DOUBLE = new SignedPrefixComparator();
+  public static final PrefixComparator DOUBLE_DESC = new SignedPrefixComparatorDesc();
+
+  public static final class StringPrefixComparator {
     public static long computePrefix(UTF8String value) {
       return value == null ? 0L : value.getPrefix();
     }
   }

-  public static final class StringPrefixComparatorDesc extends PrefixComparator {
-    @Override
-    public int compare(long bPrefix, long aPrefix) {
-      return UnsignedLongs.compare(aPrefix, bPrefix);
+  public static final class BinaryPrefixComparator {
+    public static long computePrefix(byte[] bytes) {
+      return ByteArray.getPrefix(bytes);
     }
   }

-  public static final class BinaryPrefixComparator extends PrefixComparator {
-    @Override
-    public int compare(long aPrefix, long bPrefix) {
-      return UnsignedLongs.compare(aPrefix, bPrefix);
+  public static final class DoublePrefixComparator {
+    public static long computePrefix(double value) {
+      // Java's doubleToLongBits already canonicalizes all NaN values to the lowest possible NaN,
+      // so there's nothing special we need to do here.
+      return Double.doubleToLongBits(value);
     }
+  }

-    public static long computePrefix(byte[] bytes) {
-      return ByteArray.getPrefix(bytes);
-    }
+  /**
+   * Provides radix sort parameters. Comparators implementing this also are indicating that the
+   * ordering they define is compatible with radix sort.
+   */
+  public static abstract class RadixSortSupport extends PrefixComparator {
+    /** @return Whether the sort should be descending in binary sort order. */
+    public abstract boolean sortDescending();
+
+    /** @return Whether the sort should take into account the sign bit. */
+    public abstract boolean sortSigned();
   }

-  public static final class BinaryPrefixComparatorDesc extends PrefixComparator {
+  //
+  // Standard prefix comparator implementations
+  //
+
+  public static final class UnsignedPrefixComparator extends RadixSortSupport {
+    @Override public final boolean sortDescending() { return false; }
+    @Override public final boolean sortSigned() { return false; }
     @Override
-    public int compare(long bPrefix, long aPrefix) {
+    public final int compare(long aPrefix, long bPrefix) {
       return UnsignedLongs.compare(aPrefix, bPrefix);
     }
   }

-  public static final class LongPrefixComparator extends PrefixComparator {
+  public static final class UnsignedPrefixComparatorDesc extends RadixSortSupport {
+    @Override public final boolean sortDescending() { return true; }
+    @Override public final boolean sortSigned() { return false; }
     @Override
-    public int compare(long a, long b) {
-      return (a < b) ? -1 : (a > b) ? 1 : 0;
+    public
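The `UnsignedPrefixComparator`/`SignedPrefixComparator` split in the diff above boils down to two ways of ordering a 64-bit prefix. A minimal sketch using the JDK's own helpers (`Long.compareUnsigned` stands in here for the Guava `UnsignedLongs.compare` call in the patch):

```java
class PrefixCompareSketch {
    // Unsigned order: treats the prefix as a raw 64-bit value,
    // so a prefix with the top bit set (e.g. -1L == 0xFFFF...) sorts last.
    static int compareUnsigned(long a, long b) {
        return Long.compareUnsigned(a, b);
    }

    // Signed order: honors the sign bit, so negative prefixes sort first.
    static int compareSigned(long a, long b) {
        return Long.compare(a, b);
    }
}
```

The `sortSigned()` flag in the diff tells the radix sort which of these two binary orders the comparator defines, so the sort can handle the sign bit's bucket specially.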
[GitHub] spark pull request: [SPARK-10101] [SQL] Add maxlength to JDBC fiel...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/8374#issuecomment-212746977 Seems not, because it has gone stale. If nobody takes this, I'll do it.
[GitHub] spark pull request: [SPARK-14459] [SQL] Detect relation partitioni...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12239#discussion_r60526769

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -414,8 +414,42 @@ class Analyzer(
   }

   def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
-    case i @ InsertIntoTable(u: UnresolvedRelation, _, _, _, _) =>
-      i.copy(table = EliminateSubqueryAliases(getTable(u)))
+    case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) if child.resolved =>
+      val table = getTable(u)
+      // adding the table's partitions or validate the query's partition info
+      table match {
+        case relation: PartitionedRelation if relation.partitionColumns.nonEmpty =>
+          val tablePartitionNames = relation.partitionColumns.map(_.name)
+          if (parts.keys.nonEmpty) {
+            // the query's partitioning must match the table's partitioning
+            // this is set for queries like: insert into ... partition (one = "a", two = )
+            if (tablePartitionNames.size != parts.keySet.size) {
--- End diff --

why do we only check size here?
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212746211 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56462/ Test PASSed.
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212746210 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14753][CORE] remove internal flag in Ac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12525#issuecomment-212746170 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56467/ Test FAILed.
[GitHub] spark pull request: [SPARK-14753][CORE] remove internal flag in Ac...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12525#issuecomment-212746169 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212746062 **[Test build #56462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56462/consoleFull)** for PR 10024 at commit [`7ea7274`](https://github.com/apache/spark/commit/7ea727470735cb2a420bd5411af0202d264d9ec7).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14753][CORE] remove internal flag in Ac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12525#issuecomment-212746059 **[Test build #56467 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56467/consoleFull)** for PR 12525 at commit [`3fcc4c3`](https://github.com/apache/spark/commit/3fcc4c34f515cfcb2b6dd56480e8824d1fa66e46).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12490#discussion_r60526550

--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java ---
@@ -87,18 +102,18 @@ public void expandPointerArray(LongArray newArray) {
       array.getBaseOffset(),
       newArray.getBaseObject(),
       newArray.getBaseOffset(),
-      array.size() * 8L
+      array.size() * (Long.BYTES / memoryAllocationFactor)
--- End diff --

* ?
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212745547 **[Test build #56472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56472/consoleFull)** for PR 12556 at commit [`25da47d`](https://github.com/apache/spark/commit/25da47dfb040e0936eda8bfd5b282e1d3e094b5a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212745631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56472/ Test FAILed.
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212745630 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13266][SQL] None read/writer options we...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12494#issuecomment-212745533 @HyukjinKwon ok, to make things clearer: @mathieulongtin wants to use `null` as an option value in spark-csv from Databricks. So he opened a PR to allow passing `None` from Python to Scala instead of the string "None"; the passed `None` becomes `null` on the Scala side. However, passing `null` as an option value in the current code causes a null exception, because the Spark CSV data source does not handle `null`. In this PR, I filter out nulls and disallow using null as an option value. That is not what @mathieulongtin wanted in his original PR.
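The behavior described above — rejecting `None` option values on the Python side so that `null` never reaches the Scala CSV reader — can be sketched as follows. The helper name and the exact error type are assumptions for illustration, not the PR's actual code:

```python
# Hypothetical sketch of the null-check described above: reject None
# option values before they are forwarded to the JVM, where they would
# arrive as Scala null and crash the CSV data source.


def sanitize_options(options):
    for key, value in options.items():
        if value is None:
            # disallow null as an option value, per the PR's approach
            raise ValueError("option %r must not be None" % key)
    # option values are typically stringified before crossing to the JVM
    return {k: str(v) for k, v in options.items()}
```

With this check, `option('quote', None)` fails fast with a clear message instead of surfacing later as a null pointer exception inside the data source.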
[GitHub] spark pull request: [SPARK-14790] Always run scalastyle on sbt com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12555#issuecomment-212745308 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56463/ Test PASSed.
[GitHub] spark pull request: [SPARK-14790] Always run scalastyle on sbt com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12555#issuecomment-212745302 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14790] Always run scalastyle on sbt com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12555#issuecomment-212744664 **[Test build #56463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56463/consoleFull)** for PR 12555 at commit [`403fab6`](https://github.com/apache/spark/commit/403fab62fe6f40f2b21e09f06ff1481cb2e19cec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14680][SQL]Support all datatypes to use...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12440#discussion_r60526334 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala --- @@ -238,4 +281,42 @@ class VectorizedHashMapGenerator( |} """.stripMargin } + + private def computeHash( + input: String, + dataType: DataType, + result: String, + ctx: CodegenContext): String = { --- End diff -- We usually put `ctx` as the first argument
[GitHub] spark pull request: [SPARK-14787][SQL] Upgrade Joda-Time library f...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12552#issuecomment-212744081 @HyukjinKwon thanks for picking up this PR & taking the time to investigate the places where the changes could be useful for Spark. This looks good to me as well.
[GitHub] spark pull request: [SPARK-14680][SQL]Support all datatypes to use...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12440#discussion_r60526194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala --- @@ -238,4 +281,42 @@ class VectorizedHashMapGenerator( |} """.stripMargin } + + private def computeHash( --- End diff -- genComputeHash ?
[GitHub] spark pull request: [SPARK-13266][SQL] None read/writer options we...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/12494#discussion_r60526159 --- Diff: python/pyspark/sql/readwriter.py --- @@ -367,16 +370,19 @@ def format(self, source): @since(1.5) def option(self, key, value): """Adds an output option for the underlying data source. + +>>> csvpath = os.path.join(tempfile.mkdtemp(), 'data') +>>> df.write.option('quote', None).format('csv').save(csvpath) --- End diff -- I added a null check that disallows null input.
[GitHub] spark pull request: [SPARK-14680][SQL]Support all datatypes to use...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12440#discussion_r60526133 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala --- @@ -224,6 +224,127 @@ class BenchmarkWholeStageCodegen extends SparkFunSuite { */ } + ignore("aggregate with string key") { +val N = 20 << 20 + +val benchmark = new Benchmark("Aggregate w string key", N) +def f(): Unit = sqlContext.range(N).selectExpr("id", "cast(id & 1023 as string) as k") + .groupBy("k").count().collect() + +benchmark.addCase(s"codegen = F") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "false") + f() +} + +benchmark.addCase(s"codegen = T hashmap = F") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "true") + sqlContext.setConf("spark.sql.codegen.aggregate.map.enabled", "false") + f() +} + +benchmark.addCase(s"codegen = T hashmap = T") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "true") + sqlContext.setConf("spark.sql.codegen.aggregate.map.enabled", "true") + f() +} + +benchmark.run() + +/* +Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4 +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Aggregate w string key: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + --- +codegen = F 3307 / 3376 6.3 157.7 1.0X +codegen = T hashmap = F 2364 / 2471 8.9 112.7 1.4X +codegen = T hashmap = T 1740 / 1841 12.0 83.0 1.9X +*/ + } + + ignore("aggregate with decimal key") { +val N = 20 << 20 + +val benchmark = new Benchmark("Aggregate w decimal key", N) +def f(): Unit = sqlContext.range(N).selectExpr("id", "cast(id & 65535 as decimal) as k") + .groupBy("k").count().collect() + +benchmark.addCase(s"codegen = F") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "false") + f() +} + +benchmark.addCase(s"codegen = T hashmap = F") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "true") + sqlContext.setConf("spark.sql.codegen.aggregate.map.enabled", "false") 
+ f() +} + +benchmark.addCase(s"codegen = T hashmap = T") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "true") + sqlContext.setConf("spark.sql.codegen.aggregate.map.enabled", "true") + f() +} + +benchmark.run() + +/* +Java HotSpot(TM) 64-Bit Server VM 1.8.0_73-b02 on Mac OS X 10.11.4 +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz +Aggregate w decimal key: Best/Avg Time(ms)Rate(M/s) Per Row(ns) Relative + --- +codegen = F 2756 / 2817 7.6 131.4 1.0X +codegen = T hashmap = F 1580 / 1647 13.3 75.4 1.7X +codegen = T hashmap = T 641 / 662 32.7 30.6 4.3X +*/ + } + + ignore("aggregate with multiple key types") { +val N = 20 << 20 + +val benchmark = new Benchmark("Aggregate w multiple keys", N) +def f(): Unit = sqlContext.range(N) + .selectExpr( +"id", +"(id & 1023) as k1", +"cast(id & 1023 as string) as k2", +"cast(id & 1023 as int) as k3", +"cast(id & 1023 as double) as k4", +"cast(id & 1023 as float) as k5", +"id > 1023 as k6") + .groupBy("k1", "k2", "k3", "k4", "k5", "k6") + .sum() + .collect() + +benchmark.addCase(s"codegen = F") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "false") + f() +} + +benchmark.addCase(s"codegen = T hashmap = F") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "true") + sqlContext.setConf("spark.sql.codegen.aggregate.map.enabled", "false") + f() +} + +benchmark.addCase(s"codegen = T hashmap = T") { iter => + sqlContext.setConf("spark.sql.codegen.wholeStage", "true") + sqlContext.setConf("spark.sql.codegen.aggregate.map.enabled", "true") + f() +} + +benchmark.run() + +/* +Java HotSpot(TM) 64-Bit Server VM
[GitHub] spark pull request: TEST - Throw exception on unsupported analyze ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12558#issuecomment-212742755 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56473/ Test FAILed.
[GitHub] spark pull request: TEST - Throw exception on unsupported analyze ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12558#issuecomment-212742754 Merged build finished. Test FAILed.
[GitHub] spark pull request: TEST - Throw exception on unsupported analyze ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12558#issuecomment-212742683 **[Test build #56473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56473/consoleFull)** for PR 12558 at commit [`a37eb1d`](https://github.com/apache/spark/commit/a37eb1d60b4c630dbd85753cb17b9d8c7f25ef20). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14680][SQL]Support all datatypes to use...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12440#discussion_r60526044 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala --- @@ -69,17 +90,29 @@ class VectorizedHashMapGenerator( val generatedSchema: String = s""" |new org.apache.spark.sql.types.StructType() - |${(groupingKeySchema ++ bufferSchema).map(key => - s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""") - .mkString("\n")}; + |${(groupingKeySchema ++ bufferSchema).map { key => +key.dataType match { + case d: DecimalType => +s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.createDecimalType( + |${d.precision}, ${d.scale}))""".stripMargin + case _ => +s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""" +} + }.mkString("\n")}; """.stripMargin val generatedAggBufferSchema: String = s""" |new org.apache.spark.sql.types.StructType() - |${bufferSchema.map(key => -s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""") -.mkString("\n")}; + |${bufferSchema.map { key => --- End diff -- same as this one
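The diff above special-cases `DecimalType` when emitting the Java source that rebuilds the schema, because a decimal's precision and scale cannot be recovered from a bare `DataTypes` constant. A rough Python analogue of that per-field string-generation step — the function name and type spellings are illustrative, not the actual codegen:

```python
# Sketch of generating Java schema-builder source per field, mirroring
# the Scala codegen in the diff above: decimals need
# createDecimalType(precision, scale); other types map to a constant.


def add_field_source(name, data_type, precision=None, scale=None):
    if data_type == "decimal":
        return ('.add("%s", org.apache.spark.sql.types.DataTypes'
                '.createDecimalType(%d, %d))' % (name, precision, scale))
    return ('.add("%s", org.apache.spark.sql.types.DataTypes.%s)'
            % (name, data_type))
```

The design point is the same as in the review comment: the generated source must carry the decimal's parameters explicitly, since they are part of the type, not of its name.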
[GitHub] spark pull request: [SPARK-14780][R] Add `setLogLevel` to SparkR
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12547#issuecomment-212742481 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14680][SQL]Support all datatypes to use...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12440#discussion_r60525999 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala --- @@ -69,17 +90,29 @@ class VectorizedHashMapGenerator( val generatedSchema: String = s""" |new org.apache.spark.sql.types.StructType() - |${(groupingKeySchema ++ bufferSchema).map(key => - s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""") - .mkString("\n")}; + |${(groupingKeySchema ++ bufferSchema).map { key => +key.dataType match { --- End diff -- It's to pull this big expression out of the string
[GitHub] spark pull request: [SPARK-14780][R] Add `setLogLevel` to SparkR
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12547#issuecomment-212742482 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56479/ Test PASSed.
[GitHub] spark pull request: [SPARK-14680][SQL]Support all datatypes to use...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12440#discussion_r60525942 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala --- @@ -69,17 +90,29 @@ class VectorizedHashMapGenerator( val generatedSchema: String = s""" |new org.apache.spark.sql.types.StructType() - |${(groupingKeySchema ++ bufferSchema).map(key => - s""".add("${key.name}", org.apache.spark.sql.types.DataTypes.${key.dataType})""") - .mkString("\n")}; + |${(groupingKeySchema ++ bufferSchema).map { key => +key.dataType match { --- End diff -- more `|` ?
[GitHub] spark pull request: [SPARK-14780][R] Add `setLogLevel` to SparkR
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12547#issuecomment-212742410 **[Test build #56479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56479/consoleFull)** for PR 12547 at commit [`0abf874`](https://github.com/apache/spark/commit/0abf874b4b5402197c28b74ba16f50b46c81a1d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14680][SQL]Support all datatypes to use...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12440#discussion_r60525850 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala --- @@ -40,12 +41,32 @@ import org.apache.spark.sql.types.StructType */ class VectorizedHashMapGenerator( ctx: CodegenContext, +aggregateExpressions: Seq[AggregateExpression], generatedClassName: String, groupingKeySchema: StructType, bufferSchema: StructType) { - val groupingKeys = groupingKeySchema.map(k => (k.dataType.typeName, ctx.freshName("key"))) - val bufferValues = bufferSchema.map(k => (k.dataType.typeName, ctx.freshName("value"))) - val groupingKeySignature = groupingKeys.map(_.productIterator.toList.mkString(" ")).mkString(", ") + case class Buffer(dataType: DataType, name: String) + val groupingKeys = groupingKeySchema.map(k => Buffer(k.dataType, ctx.freshName("key"))) + val bufferValues = bufferSchema.map(k => Buffer(k.dataType, ctx.freshName("value"))) + val groupingKeySignature = groupingKeys.map(key => (ctx.javaType(key.dataType), key.name)) --- End diff -- `s"${ctx.javaType(key.dataType)} ${key.name}"` will be easier to understand
[GitHub] spark pull request: [SPARK-14680][SQL]Support all datatypes to use...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12440#discussion_r60525541 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -271,6 +271,74 @@ class CodegenContext { } /** + * Returns the specialized code to set a given value in a column vector for a given `DataType`. + */ + def setValue(batch: String, row: String, dataType: DataType, ordinal: Int, + value: String): String = { +val jt = javaType(dataType) +dataType match { + case _ if isPrimitiveType(jt) => +s"$batch.column($ordinal).put${primitiveTypeName(jt)}($row, $value);" + case t: DecimalType => s"$batch.column($ordinal).putDecimal($row, $value, ${t.precision});" + case t: StringType => s"$batch.column($ordinal).putByteArray($row, $value.getBytes());" + case _ => +throw new IllegalArgumentException(s"cannot generate code for unsupported type: $dataType") +} + } + + /** + * Returns the specialized code to set a given value in a column vector for a given `DataType` + * that could potentially be nullable. + */ + def updateColumn( + batch: String, + row: String, + dataType: DataType, + ordinal: Int, + ev: ExprCode, + nullable: Boolean): String = { +if (nullable) { + // Can't call setNullAt on DecimalType, because we need to keep the offset --- End diff -- For batch, this is not true.
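The `updateColumn` helper in the diff above emits a null-aware store into a column vector when the expression is nullable, and a direct store otherwise. A hedged Python sketch of that branching — the emitted Java strings are illustrative only, with `putLong` standing in for the type-dispatched setter that `setValue` chooses:

```python
# Hypothetical sketch of the code generation discussed above: produce a
# null-checked store when the expression is nullable, otherwise a
# direct store. The emitted strings approximate, not reproduce, the
# actual generated Java.


def update_column_code(batch, row, ordinal, is_null, value, nullable):
    store = "%s.column(%d).putLong(%s, %s);" % (batch, ordinal, row, value)
    if nullable:
        return ("if (%s) { %s.column(%d).putNull(%s); } else { %s }"
                % (is_null, batch, ordinal, row, store))
    return store
```

The non-nullable path skips the branch entirely, which is the usual reason codegen threads a `nullable` flag through helpers like this.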
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212739055 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56459/ Test PASSed.
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212739052 Build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14753][CORE] remove internal flag in Ac...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12525#discussion_r60525166 --- Diff: core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala --- @@ -36,7 +36,7 @@ class StageInfo( val rddInfos: Seq[RDDInfo], val parentIds: Seq[Int], val details: String, -val taskMetrics: TaskMetrics = new TaskMetrics, +val taskMetrics: TaskMetrics = null, --- End diff -- creating `TaskMetrics` is not that cheap (it registers and un-registers accumulators), so we use null here, since the metrics are not needed when the default value is used.
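The rationale above — avoid eagerly constructing an expensive default argument that most callers replace or never use — is a general pattern. A minimal, generic Python illustration (not Spark's code) using a `None` sentinel in place of the eager default:

```python
# Generic illustration of the pattern described above: a None default
# instead of eagerly building an expensive object on every call.


class ExpensiveMetrics:
    instances = 0

    def __init__(self):
        # stands in for TaskMetrics, whose construction registers
        # accumulators and is therefore not cheap
        ExpensiveMetrics.instances += 1


class StageInfo:
    def __init__(self, task_metrics=None):
        # None sentinel: nothing is allocated on the default path
        self.task_metrics = task_metrics
```

Callers that care pass a real object; the default path allocates nothing, which is exactly the saving the comment describes.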
[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10024#issuecomment-212738764 **[Test build #56459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56459/consoleFull)** for PR 10024 at commit [`e009d95`](https://github.com/apache/spark/commit/e009d95c715879269253da2b47e669ffc2e13683). * This patch passes all tests. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212737750 **[Test build #56478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56478/consoleFull)** for PR 12556 at commit [`a5408e5`](https://github.com/apache/spark/commit/a5408e526a06a5d2629f21df1005696122916214).
[GitHub] spark pull request: [SPARK-14792][SQL] Move as many parsing rules ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12556#issuecomment-212737840 cc @hvanhovell
[GitHub] spark pull request: [SPARK-14780][R] Add `setLogLevel` to SparkR
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12547#issuecomment-212737812 **[Test build #56479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56479/consoleFull)** for PR 12547 at commit [`0abf874`](https://github.com/apache/spark/commit/0abf874b4b5402197c28b74ba16f50b46c81a1d4).
[GitHub] spark pull request: [SPARK-14753][CORE] remove internal flag in Ac...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12525#discussion_r60524791 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -244,7 +243,14 @@ private[spark] class ListenerTaskMetrics(accumUpdates: Seq[AccumulableInfo]) ext private[spark] object TaskMetrics extends Logging { - def empty: TaskMetrics = new TaskMetrics + /** + * Create an empty task metrics that doesn't register its accumulators. + */ + def empty: TaskMetrics = { --- End diff -- This is not only used in test.
[GitHub] spark pull request: [SPARK-14753][CORE] remove internal flag in Ac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12525#issuecomment-212735921

**[Test build #56477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56477/consoleFull)** for PR 12525 at commit [`ce0262b`](https://github.com/apache/spark/commit/ce0262b156990ed4a8e5ff854794a33f4bef582a).
[GitHub] spark pull request: [SPARK-14753][CORE] remove internal flag in Ac...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12525#issuecomment-212733494

LGTM pending Jenkins.
[GitHub] spark pull request: [SPARK-14782][SPARK-14778][SQL] Remove HiveCon...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12550
[GitHub] spark pull request: [SPARK-14780][R] Add `setLogLevel` to SparkR
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/12547#discussion_r60524553

--- Diff: R/pkg/R/context.R ---
@@ -225,3 +225,17 @@ broadcast <- function(sc, object) {
 setCheckpointDir <- function(sc, dirName) {
   invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(dirName))))
 }
+
+#' Set new log level
+#'
+#' Set new log level: "ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"
+#' @param sc Spark Context to use
+#' @param level New log level
+#' @examples
--- End diff --

Thank you, @felixcheung . I'll fix soon.
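The level names listed in the roxygen doc above ("ALL" through "WARN") are the standard log4j levels that Spark's `setLogLevel` accepts. As an illustration only (this helper is hypothetical and not part of the PR), a minimal Python sketch of validating a level string the way such a setter would:

```python
# Hypothetical sketch: validate a log-level string against the levels
# documented in the SparkR setLogLevel roxygen comment above.
VALID_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}

def check_log_level(level):
    """Return the normalized (upper-case) level, or raise ValueError."""
    normalized = str(level).upper()
    if normalized not in VALID_LEVELS:
        raise ValueError(
            "invalid log level %r; expected one of %s"
            % (level, sorted(VALID_LEVELS)))
    return normalized
```

In PySpark the analogous call is `sc.setLogLevel("WARN")` on a live `SparkContext`; the JVM side ultimately rejects strings outside this set, which is why the SparkR doc enumerates them explicitly.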