[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185084845 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185084851 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51409/ Test FAILed.
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185084564 **[Test build #51409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51409/consoleFull)** for PR 11229 at commit [`bfcd14c`](https://github.com/apache/spark/commit/bfcd14cb68ebb21ca3a5a1a6667758c76c77178d).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10621#issuecomment-185083006

> @zsxwing
> Could you confirm that the latest version of py4j (0.9.1) got packaged with spark 1.5.2.
> The spark that got installed using AWS and the 1.5.2 tag (https://github.com/apache/spark/tree/v1.5.2/python/lib) contains 0.8.2.1.
>
> Let me know, If I have missed anything..

@sarathjiguru this bug exists in 1.5.2. You need to apply the patches by yourself for now.
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11083#issuecomment-185082259 **[Test build #51418 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51418/consoleFull)** for PR 11083 at commit [`c7429bb`](https://github.com/apache/spark/commit/c7429bb2aaa008e9427dd1e93d476a1f8506a78a).
[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11235#issuecomment-185082131 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11235#issuecomment-185082137 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51414/ Test FAILed.
[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11235#issuecomment-185081565 **[Test build #51414 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51414/consoleFull)** for PR 11235 at commit [`132890b`](https://github.com/apache/spark/commit/132890bbcd013dd284dfb9d9fd48e8440343ddd5).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185081528 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185081535 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51411/ Test FAILed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185081069 **[Test build #51411 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51411/consoleFull)** for PR 10757 at commit [`bce3339`](https://github.com/apache/spark/commit/bce3339f42bebf98092a4e67d3d8028d27624704).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: Branch 1.6
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/11233#issuecomment-185080917 @fartzy, would you mind closing this pull request? We don't have the permissions to do it ourselves and will have to resort to pushing a dummy "closes #11233" commit if you don't do it.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185075144 **[Test build #51417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51417/consoleFull)** for PR 10757 at commit [`e9f9fd9`](https://github.com/apache/spark/commit/e9f9fd9208e4729dcac58ec66c5d542e322af9eb).
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11231#issuecomment-185074506 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51406/ Test PASSed.
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11231#issuecomment-185074502 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/11083#issuecomment-185074026 Retest this please
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11231#issuecomment-185073572 **[Test build #51406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51406/consoleFull)** for PR 11231 at commit [`cacd652`](https://github.com/apache/spark/commit/cacd65266d9f8708fe195c1708450e37c2905c49).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11083#issuecomment-185066164 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51416/ Test FAILed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185066367 retest this please
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11083#issuecomment-185066158 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11132#issuecomment-185064091 @yinxusen Thanks for reviewing. I have addressed the comments; please take a look.
[GitHub] spark pull request: [SPARK-13354] [SQL] push filter throughout out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11234#issuecomment-185063813 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13354] [SQL] push filter throughout out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11234#issuecomment-185063815 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51410/ Test FAILed.
[GitHub] spark pull request: [SPARK-13354] [SQL] push filter throughout out...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11234#issuecomment-185063400 **[Test build #51410 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51410/consoleFull)** for PR 11234 at commit [`7d87244`](https://github.com/apache/spark/commit/7d87244a2bce20d891135ba64ee408bb5d23c6cd).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11083#issuecomment-185062646 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11083#issuecomment-185062649 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51415/ Test FAILed.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11132#issuecomment-185061326 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11132#issuecomment-185061329 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51413/ Test PASSed.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11132#issuecomment-185061002 **[Test build #51413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51413/consoleFull)** for PR 11132 at commit [`2ada7ef`](https://github.com/apache/spark/commit/2ada7ef1859b0338a2613164d261671add1ff227).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13136][SQL] Create a dedicated Broadcas...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/11083#issuecomment-185057987 I renamed the `Exchange` and `Broadcast` operators to `ShuffleExchange` and `BroadcastExchange`. The `BroadcastExchange` operator is now part of `exchange.scala`.
[GitHub] spark pull request: [SPARK-13327][SPARKR] Added parameter validati...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11220#discussion_r53126613

--- Diff: R/pkg/R/DataFrame.R ---
@@ -303,8 +303,28 @@ setMethod("colnames",
 #' @rdname columns
 #' @name colnames<-
 setMethod("colnames<-",
-          signature(x = "DataFrame", value = "character"),
+          signature(x = "DataFrame"),
           function(x, value) {
+
+            # Check parameter integrity
+            if (class(value) != "character") {
+              stop("Invalid column names.")
+            }
+
+            if (length(value) != ncol(x)) {
+              stop(
+                "Column names must have the same length as the number of columns in the dataset.")
+            }
+
+            if (any(is.na(value))) {
+              stop("Column names cannot be NA.")
+            }
+
+            # Check if the column names have . in it
+            if (any(regexec(".", value, fixed=TRUE)[[1]][1] != -1)) {
--- End diff --

This might seem rather restrictive? As this is possibly fixed by https://issues.apache.org/jira/browse/SPARK-11976 Instead of hardcoding the check here, why not just let this through? As of now a column cannot have '.' in it anyway
[GitHub] spark pull request: [SPARK-13333] [SQL] Added Rand and Randn Funct...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11232#discussion_r53126328

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala ---
@@ -140,6 +140,17 @@ object IntegerLiteral {
 }

 /**
+ * Extractor for retrieving Boolean literals.
+ */
+object BooleanLiteral {
--- End diff --

This looks odd: it is called BooleanLiteral, but it actually matches literals of IntegerType?
[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11235#issuecomment-185053702 **[Test build #51414 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51414/consoleFull)** for PR 11235 at commit [`132890b`](https://github.com/apache/spark/commit/132890bbcd013dd284dfb9d9fd48e8440343ddd5).
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185053841 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185053842 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51412/ Test FAILed.
[GitHub] spark pull request: [SPARK-13016] [Documentation] Replace example ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11132#issuecomment-185052767 **[Test build #51413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51413/consoleFull)** for PR 11132 at commit [`2ada7ef`](https://github.com/apache/spark/commit/2ada7ef1859b0338a2613164d261671add1ff227).
[GitHub] spark pull request: [SPARK-13327][SPARKR] Added parameter validati...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/11220#discussion_r53126056

--- Diff: R/pkg/R/DataFrame.R ---
@@ -303,8 +303,28 @@ setMethod("colnames",
 #' @rdname columns
 #' @name colnames<-
 setMethod("colnames<-",
-          signature(x = "DataFrame", value = "character"),
--- End diff --

I agree - letting R do type matching for the method signature seems like a better approach?
[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/11235

[SPARK-13249][SQL] Add Filter checking nullability of keys for inner join

JIRA: https://issues.apache.org/jira/browse/SPARK-13249

For an inner join, join keys containing null never match each other, so we can insert a Filter before the inner join (which can be pushed down); then we don't need to check the nullability of keys while joining.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 add-filter-for-innerjoin

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11235.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11235

commit 216305fef918f46d50af7107c7ea3182ad91afcf
Author: Liang-Chi Hsieh
Date: 2016-02-17T06:31:03Z

    Add Filter checking nullability of keys for inner join.
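The optimization this PR describes rests on SQL inner-join semantics: a NULL key compares equal to nothing, not even another NULL, so dropping null-keyed rows before the join cannot change the result. A minimal plain-Python sketch of that equivalence (illustrative only, not Spark code; `inner_join` and `filter_null_keys` are hypothetical helpers):

```python
def inner_join(left, right, key):
    """Naive nested-loop inner join; None keys never match (SQL NULL semantics)."""
    out = []
    for l in left:
        for r in right:
            if l[key] is not None and r[key] is not None and l[key] == r[key]:
                out.append((l, r))
    return out

def filter_null_keys(rows, key):
    """The pre-join Filter: drop rows whose join key is null."""
    return [row for row in rows if row[key] is not None]

left = [{"k": 1, "v": "a"}, {"k": None, "v": "b"}]
right = [{"k": 1, "v": "x"}, {"k": None, "v": "y"}]

# Joining the filtered inputs gives the same result as joining the originals,
# which is why the inserted Filter is safe (and can be pushed down).
assert inner_join(left, right, "k") == inner_join(
    filter_null_keys(left, "k"), filter_null_keys(right, "k"), "k")
```

Because the filtered and unfiltered joins are equivalent, the join implementation itself no longer has to handle nullable keys once the Filter is in place.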
[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11175
[GitHub] spark pull request: [SPARK-12820][SQL]Resolve db.table.column
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/10753#issuecomment-185045754 I would say we'd better keep the same checking logic as MySQL/Hive for the `ambiguous` case. @zhichao-li can you please check that against MySQL/Hive?
[GitHub] spark pull request: [SPARK-13333] [SQL] Added Rand and Randn Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11232#issuecomment-185045523 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51405/ Test PASSed.
[GitHub] spark pull request: [SPARK-13333] [SQL] Added Rand and Randn Funct...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11232#issuecomment-185045518 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13333] [SQL] Added Rand and Randn Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11232#issuecomment-185045286 **[Test build #51405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51405/consoleFull)** for PR 11232 at commit [`3f90749`](https://github.com/apache/spark/commit/3f90749148113069c312d5a03b09a67b054e5620). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Rand(seed: Long, isDeterministic: Boolean = false) extends RDG ` * `case class Randn(seed: Long, isDeterministic: Boolean = false) extends RDG `
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r53125003 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.util.concurrent.Callable + +import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD} +import org.apache.spark.util.ThreadUtils +import org.apache.spark.{Partition, SparkContext} + +import scala.reflect.ClassTag + +class ParallelUnionRDD[T: ClassTag]( + sc: SparkContext, + rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){ + // TODO: We might need to guess a more reasonable thread pool size here + @transient val executorService = ThreadUtils.newDaemonFixedThreadPool( +Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD") --- End diff -- I don't think we have to use the fixed number `Runtime.getRuntime.availableProcessors()`; we could probably just use a fixed number, say `16` or even bigger, since the bottleneck is network / IO, not CPU scheduling.
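The pattern under review, computing each child RDD's partition metadata concurrently through a bounded pool, can be sketched outside Spark. This is a hypothetical Python helper (the names and the `compute_partitions` callback are illustrative, not the Spark API); the pool-size argument mirrors the point above that IO-bound work can tolerate more threads than cores.

```python
# Sketch of computing per-RDD partition lists with a bounded thread pool
# instead of sequentially, as in the proposed ParallelUnionRDD.
import os
from concurrent.futures import ThreadPoolExecutor

def partitions_in_parallel(rdds, compute_partitions, max_workers=None):
    """compute_partitions(rdd) may block on network/IO (e.g. listing
    partition files), so a pool larger than the CPU count can still help."""
    if max_workers is None:
        max_workers = min(len(rdds), os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all jobs first, then collect, so they overlap in time.
        futures = [(rdd, pool.submit(compute_partitions, rdd)) for rdd in rdds]
        return [(rdd, f.result()) for rdd, f in futures]
```

Collecting results in submission order keeps the output deterministic even though the per-RDD work runs concurrently.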
[GitHub] spark pull request: [SPARK-12617][PySpark]Move Py4jCallbackConnect...
Github user sarathjiguru commented on the pull request: https://github.com/apache/spark/pull/10621#issuecomment-185044972 @zsxwing Could you confirm that the latest version of py4j (0.9.1) got packaged with Spark 1.5.2? The Spark installed via AWS from the 1.5.2 tag (https://github.com/apache/spark/tree/v1.5.2/python/lib) contains 0.8.2.1. Let me know if I have missed anything.
[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...
Github user sitalkedia commented on the pull request: https://github.com/apache/spark/pull/11175#issuecomment-185044331 @kayousterhout did you get some time to look into this?
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r53124605 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/ParallelUnionRDD.scala --- @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.hive + +import java.util.concurrent.Callable + +import org.apache.spark.rdd.{RDD, UnionPartition, UnionRDD} +import org.apache.spark.util.ThreadUtils +import org.apache.spark.{Partition, SparkContext} + +import scala.reflect.ClassTag + +class ParallelUnionRDD[T: ClassTag]( + sc: SparkContext, + rdds: Seq[RDD[T]]) extends UnionRDD[T](sc, rdds){ + // TODO: We might need to guess a more reasonable thread pool size here + @transient val executorService = ThreadUtils.newDaemonFixedThreadPool( +Math.min(rdds.size, Runtime.getRuntime.availableProcessors()), "ParallelUnionRDD") + + override def getPartitions: Array[Partition] = { +// Calc partitions field for each RDD in parallel. 
+val rddPartitions = rdds.map {rdd => + (rdd, executorService.submit(new Callable[Array[Partition]] { +override def call(): Array[Partition] = rdd.partitions + })) +}.map {case(r, f) => (r, f.get())} + +val array = new Array[Partition](rddPartitions.map(_._2.length).sum) --- End diff -- It seems this still runs on the main thread, so we probably don't even need the `synchronized` in `getPartitions`.
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r53124605 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag]( // Our dependencies and partitions will be gotten by calling subclass's methods below, and will // be overwritten when we're checkpointed private var dependencies_ : Seq[Dependency[_]] = null - @transient private var partitions_ : Array[Partition] = null + @transient @volatile private var partitions_ : Array[Partition] = null --- End diff -- to be more precise, https://github.com/apache/spark/pull/9483/files#diff-f4d927f57038fd77e8df7e976a0f29b3R35
[GitHub] spark pull request: [SPARK-11517][SQL]Calc partitions in parallel ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/9483#discussion_r53124507 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -211,7 +211,7 @@ abstract class RDD[T: ClassTag]( // Our dependencies and partitions will be gotten by calling subclass's methods below, and will // be overwritten when we're checkpointed private var dependencies_ : Seq[Dependency[_]] = null - @transient private var partitions_ : Array[Partition] = null + @transient @volatile private var partitions_ : Array[Partition] = null --- End diff -- Per my understanding, we don't need the `@volatile` here; probably the only change needed is to add the `synchronized` modifier to the `getPartitions` method in the concrete subclass of RDD, which forces the CPU cache to memory as a memory barrier under the JVM memory model.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185043637 **[Test build #51411 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51411/consoleFull)** for PR 10757 at commit [`bce3339`](https://github.com/apache/spark/commit/bce3339f42bebf98092a4e67d3d8028d27624704).
[GitHub] spark pull request: [SPARK-13232][YARN] Fix executor node label
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/11129#discussion_r53124448 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -307,8 +307,14 @@ private[yarn] class YarnAllocator( nodes: Array[String], racks: Array[String]): ContainerRequest = { nodeLabelConstructor.map { constructor => + val labelExp = if ((racks != null && (!racks.isEmpty)) +|| (nodes != null && (!nodes.isEmpty))) { +null + } else { +labelExpression.orNull + } constructor.newInstance(resource, nodes, racks, RM_REQUEST_PRIORITY, true: java.lang.Boolean, -labelExp) --- End diff -- From my understanding, in your current implementation the label expression will not take effect if `nodes` or `racks` is not empty, even when it is explicitly set through configuration. IMO I would instead set `nodes` and `racks` to null when a label expression is configured; otherwise users will be confused about why an explicitly set label expression does not take effect.
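The two precedence choices being debated can be made concrete with a small sketch. This is a hypothetical Python helper, not the Spark/YARN API; `prefer_label` selects between the reviewer's suggestion (an explicitly configured label wins) and the PR's behavior (node/rack placement wins).

```python
# Node/rack placement and a node-label expression are mutually exclusive
# in a YARN container request, so one side must yield to the other.

def resolve_placement(nodes, racks, label_expression, prefer_label=True):
    has_placement = bool(nodes) or bool(racks)
    if has_placement and label_expression:
        if prefer_label:
            # Reviewer's suggestion: drop placement, keep the label.
            return None, None, label_expression
        # PR's behavior: keep placement, silently drop the label.
        return nodes, racks, None
    return nodes or None, racks or None, label_expression
```

Making the conflict resolution explicit (and perhaps logging a warning when one side is dropped) would avoid the user confusion described above.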
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185042471 @cloud-fan I'm open on the new column naming issue. Personally I tend to make it more consistent. Please note that replacing `prettyString` with `sql` already breaks compatibility (e.g. `a && b` is renamed to `a AND b`).
[GitHub] spark pull request: SPARK-10759 [MLlib] Add python example to mode...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/11202#issuecomment-185040855 Refer to https://github.com/apache/spark/pull/11126 and JIRA https://issues.apache.org/jira/browse/SPARK-11337
[GitHub] spark pull request: [SPARK-13354] [SQL] push filter throughout out...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11234#issuecomment-185040622 **[Test build #51410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51410/consoleFull)** for PR 11234 at commit [`7d87244`](https://github.com/apache/spark/commit/7d87244a2bce20d891135ba64ee408bb5d23c6cd).
[GitHub] spark pull request: SPARK-10759 [MLlib] Add python example to mode...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/11202#issuecomment-185040255 It's better to reuse the current cross validator with `{% include_example %}` rather than writing it directly in the markdown file.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/10757#discussion_r53123489 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -198,18 +208,26 @@ case class GetArrayStructFields( } } +case class PrettyGetArrayStructFields( +child: Expression, ordinal: Int, name: String, dataType: DataType) + extends UnaryExpression with Unevaluable { + + override def sql: String = s"${child.sql}[$ordinal].$name" --- End diff -- Yes, thanks!
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user ygcao commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-185036851 Done! Sorry for missing the comments.
[GitHub] spark pull request: Branch 1.6
Github user fartzy commented on the pull request: https://github.com/apache/spark/pull/11233#issuecomment-185036762 Oh, this was by accident!
[GitHub] spark pull request: [SPARK-13354] [SQL] push filter throughout out...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11234#issuecomment-185036566 It sounds like this PR is related to the following two PRs: https://github.com/apache/spark/pull/10567 and https://github.com/apache/spark/pull/10566 If we can convert the outer joins to inner joins, the push-down will be handled by the existing rules.
[GitHub] spark pull request: [SPARK-13354] [SQL] push filter throughout out...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/11234 [SPARK-13354] [SQL] push filter throughout outer join For a query ``` select * from a left outer join b on a.a = b.a where b.b > 10 ``` The condition `b.b > 10` filters out every row whose b side is null. In this case, we can use an inner join instead and push the filter down into b. You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark filter_outer Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11234.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11234 commit 7d87244a2bce20d891135ba64ee408bb5d23c6cd Author: Davies Liu Date: 2016-02-17T05:36:29Z push filter throughout outer join
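The rewrite in the PR description can be sanity-checked with a toy model. This is plain Python (not Spark), with hypothetical helper names: it evaluates the example query both ways, as a left outer join followed by the null-rejecting filter, and as an inner join with the filter pushed into b, and checks that the results agree.

```python
# A left outer join followed by a null-rejecting filter on the right side
# is equivalent to an inner join with that filter pushed down (SPARK-13354).

def left_outer_join(a, b, key):
    out = []
    for ra in a:
        matches = [rb for rb in b
                   if ra[key] is not None and ra[key] == rb[key]]
        if matches:
            out.extend((ra, rb) for rb in matches)
        else:
            out.append((ra, None))  # unmatched left row padded with NULLs
    return out

def query_outer_then_filter(a, b):
    # select * from a left outer join b on a.a = b.a where b.b > 10
    joined = left_outer_join(a, b, "a")
    return [(ra, rb) for ra, rb in joined
            if rb is not None and rb["b"] is not None and rb["b"] > 10]

def query_inner_pushed(a, b):
    # Rewrite: filter b first (pushed down), then inner join.
    b_f = [rb for rb in b if rb["b"] is not None and rb["b"] > 10]
    return [(ra, rb) for ra in a for rb in b_f
            if ra["a"] is not None and ra["a"] == rb["a"]]
```

The padded-NULL rows can never satisfy `b.b > 10`, which is exactly why the outer join degenerates to an inner one here.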
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user ygcao commented on a diff in the pull request: https://github.com/apache/spark/pull/10152#discussion_r53122840 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -289,24 +301,20 @@ class Word2Vec extends Serializable with Logging { val expTable = sc.broadcast(createExpTable()) val bcVocab = sc.broadcast(vocab) val bcVocabHash = sc.broadcast(vocabHash) - -val sentences: RDD[Array[Int]] = words.mapPartitions { iter => - new Iterator[Array[Int]] { -def hasNext: Boolean = iter.hasNext - -def next(): Array[Int] = { - val sentence = ArrayBuilder.make[Int] - var sentenceLength = 0 - while (iter.hasNext && sentenceLength < MAX_SENTENCE_LENGTH) { -val word = bcVocabHash.value.get(iter.next()) -word match { - case Some(w) => -sentence += w -sentenceLength += 1 - case None => -} +// each partition is a collection of sentences, +// will be translated into arrays of Index integer +val sentences: RDD[Array[Int]] = dataset.mapPartitions { sentenceIter => + // Each sentence will map to 0 or more Array[Int] + sentenceIter.flatMap { sentence => { + // Sentence of words, some of which map to a word index + val wordIndexes = sentence.flatMap(bcVocabHash.value.get) + if (wordIndexes.nonEmpty) { --- End diff -- You guys are right; I didn't quite get @mengxr's previous explanation. The split function does something unexpected to me: splitting an empty String results in a non-empty array. FYI, I verified the logic with the following more explicit test, which confirms your assertions.
scala> sentences res4: List[List[String]] = List(List(a, b, c), List(b), List(c, d)) scala> dict res5: scala.collection.immutable.Map[String,Int] = Map(a -> 1, c -> 2) scala> sentences.flatMap(sen=>{val indexes=sen.flatMap(dict.get);indexes.grouped(2).map(_.toArray)}) res6: List[Array[Int]] = List(Array(1, 2), Array(2)) scala> "".split(" ") res7: Array[String] = Array("") scala> "".split(" ").size res8: Int = 1
[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10705#issuecomment-185034523 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51408/ Test FAILed.
[GitHub] spark pull request: [SPARK-12757][WIP] Use reference counting to p...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10705#issuecomment-185034522 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13357][SQL] Use generated projection an...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11230#issuecomment-185033275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51403/ Test PASSed.
[GitHub] spark pull request: [SPARK-13357][SQL] Use generated projection an...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11230#issuecomment-185033272 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13357][SQL] Use generated projection an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11230#issuecomment-185032581 **[Test build #51403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51403/consoleFull)** for PR 11230 at commit [`63833df`](https://github.com/apache/spark/commit/63833dfdf4418d69a94a2c34aa91c06796606697). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185032333 **[Test build #51409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51409/consoleFull)** for PR 11229 at commit [`bfcd14c`](https://github.com/apache/spark/commit/bfcd14cb68ebb21ca3a5a1a6667758c76c77178d).
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185030818 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51407/ Test FAILed.
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185030814 Build finished. Test FAILed.
[GitHub] spark pull request: Branch 1.6
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11233#issuecomment-185030541 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11231#issuecomment-185030163 **[Test build #51406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51406/consoleFull)** for PR 11231 at commit [`cacd652`](https://github.com/apache/spark/commit/cacd65266d9f8708fe195c1708450e37c2905c49).
[GitHub] spark pull request: Branch 1.6
GitHub user fartzy opened a pull request: https://github.com/apache/spark/pull/11233 Branch 1.6 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-1.6 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11233.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11233 commit 04dfaa6d58bd9ce18a141a976a4a96218e5ee9e0 Author: Yanbo Liang Date: 2015-12-06T00:39:01Z [SPARK-12115][SPARKR] Change numPartitions() to getNumPartitions() to be consistent with Scala/Python Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent with Scala/Python. Note: If we cannot catch up with the 1.6 release, it will be a breaking change for 1.7 that we also need to explain in the release note. cc sun-rui felixcheung shivaram Author: Yanbo Liang Closes #10123 from yanboliang/spark-12115. (cherry picked from commit 6979edf4e1a93caafa8d286692097dd377d7616d) Signed-off-by: Shivaram Venkataraman commit 2feac49fbca2e2f309c857f10511be2b2c1948cc Author: Yanbo Liang Date: 2015-12-06T06:51:05Z [SPARK-12044][SPARKR] Fix usage of isnan, isNaN 1, Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related variable functions: ```isNaN, isNull, isNotNull```. 2, Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` at the SparkR side. Because ```DataFrame.isNaN``` has been deprecated and will be removed at Spark 2.0. 3, Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```. cc shivaram sun-rui felixcheung Author: Yanbo Liang Closes #10037 from yanboliang/spark-12044. 
(cherry picked from commit b6e8e63a0dbe471187a146c96fdaddc6b8a8e55e) Signed-off-by: Shivaram Venkataraman commit c8747a9db718deefa5f61cc4dc692c439d4d5ab6 Author: gcc Date: 2015-12-06T16:27:40Z [SPARK-12048][SQL] Prevent to close JDBC resources twice Author: gcc Closes #10101 from rh99/master. (cherry picked from commit 04b6799932707f0a4aa4da0f2fc838bdb29794ce) Signed-off-by: Sean Owen commit 82a71aba043a0b1ed50168d2b5b312c79b8c8fa3 Author: gatorsmile Date: 2015-12-06T19:15:02Z [SPARK-12138][SQL] Escape \u in the generated comments of codegen When \u appears in a comment block (i.e. in /**/), code gen will break. So, in Expression and CodegenFallback, we escape \u to \\u. yhuai Please review it. I did reproduce it and it works after the fix. Thanks! Author: gatorsmile Closes #10155 from gatorsmile/escapeU. (cherry picked from commit 49efd03bacad6060d99ed5e2fe53ba3df1d1317e) Signed-off-by: Yin Huai commit c54b698ecc284bce9b80c40ba46008bd6321c812 Author: Burak Yavuz Date: 2015-12-07T08:21:55Z [SPARK-12106][STREAMING][FLAKY-TEST] BatchedWAL test transiently flaky when Jenkins load is high We need to make sure that the last entry is indeed the last entry in the queue. Author: Burak Yavuz Closes #10110 from brkyvz/batch-wal-test-fix. (cherry picked from commit 6fd9e70e3ed43836a0685507fff9949f921234f4) Signed-off-by: Tathagata Das commit 3f230f7b331cf6d67426cece570af3f1340f526e Author: Sun Rui Date: 2015-12-07T18:38:17Z [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. This PR: 1. Suppress all known warnings. 2. Cleanup test cases and fix some errors in test cases. 3. Fix errors in HiveContext related test cases. These test cases are actually not run previously due to a bug of creating TestHiveContext. 4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat' 5. Make sure the default Hadoop file system is local when running test cases. 6. Turn on warnings into errors. 
Author: Sun Rui Closes #10030 from sun-rui/SPARK-12034. (cherry picked from commit 39d677c8f1ee7ebd7e142bec0415cf8f90ac84b6) Signed-off-by: Shivaram Venkataraman commit fed453821d81470b9035d33e36fa6ef1df99c0de Author: Davies Liu Date: 2015-12-07T19:00:25Z [SPARK-12132] [PYSPARK] raise KeyboardInterrupt inside SIGINT handler Currently, the current line is not cleared by Ctrl-C. After this patch ``` >>> asdfasdf^C
[GitHub] spark pull request: [SPARK-13357][SQL] Use generated projection an...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/11230#issuecomment-185025967 LGTM pending Jenkins.
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11231#issuecomment-185024830 retest this please.
[GitHub] spark pull request: [SPARK-13333] [SQL] Added Rand and Randn Funct...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11232#issuecomment-185024358 **[Test build #51405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51405/consoleFull)** for PR 11232 at commit [`3f90749`](https://github.com/apache/spark/commit/3f90749148113069c312d5a03b09a67b054e5620).
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11231#issuecomment-185022716 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51404/ Test FAILed.
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11231#issuecomment-185022715 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10757#discussion_r53120467 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -130,12 +134,17 @@ case class GetStructField(child: Expression, ordinal: Int, name: Option[String] } }) } +} + +case class PrettyGetStructField(child: Expression, name: String, dataType: DataType) + extends UnaryExpression with Unevaluable { - override def sql: String = child.sql + s".`${childSchema(ordinal).name}`" + override def sql: String = s"${child.sql}.$name" --- End diff -- what if we use `transformUp` in `usePrettyExpression`, so that we can make sure when we generate this string, `child.sql` won't have back-ticks/double-quotes?
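The point about `transformUp` can be illustrated with a toy expression tree. This is a hypothetical Python sketch, not Spark's Catalyst classes: a bottom-up rewrite replaces each `GetField` with a pretty variant after its children have already been rewritten, so nested field accesses are prettified consistently.

```python
# Hypothetical sketch of bottom-up (transformUp-style) prettifying of a
# nested struct-field access. Names are illustrative, not Spark's API.
from dataclasses import dataclass

@dataclass
class Attr:            # a column reference; sql() quotes with back-ticks
    name: str
    def sql(self): return f"`{self.name}`"

@dataclass
class GetField:        # struct field access: child.`field`
    child: object
    field: str
    def sql(self): return f"{self.child.sql()}.`{self.field}`"

@dataclass
class PrettyGetField:  # pretty variant: no back-ticks around the field
    child: object
    field: str
    def sql(self): return f"{self.child.sql()}.{self.field}"

def prettify_bottom_up(e):
    # Rewrite children first, then the node itself, so by the time a
    # PrettyGetField renders, its child has already been prettified.
    if isinstance(e, GetField):
        return PrettyGetField(prettify_bottom_up(e.child), e.field)
    return e

expr = GetField(GetField(Attr("s"), "a"), "b")
print(expr.sql())                      # `s`.`a`.`b`
print(prettify_bottom_up(expr).sql())  # `s`.a.b
```

A top-down rewrite would have to reason about whether each child's `sql` still emits quoting; bottom-up, the invariant holds by construction.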
[GitHub] spark pull request: [SPARK-13333] [SQL] Added Rand and Randn Funct...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/11232 [SPARK-1] [SQL] Added Rand and Randn Functions Generating Deterministic Results So far, `rand` and `randn` functions with a `seed` argument are commonly used. Based on common sense, the results of `rand` and `randn` should be deterministic if the `seed` parameter value is provided. However, the current solution is unable to generate deterministic results. It depends on data partitioning and task scheduling. An example has been given by @jkbradley in the following JIRA: https://issues.apache.org/jira/browse/SPARK-1 This PR is to introduce a new parameter `deterministic` for the `Rand` and `Randn` functions. When users set it to true, the results will be deterministic. **Question:** should we introduce a new parameter `deterministic` for the `Rand` and `Randn` functions? Or just make the results deterministic when users input the parameter value of `seed`? @rxin @marmbrus @cloud-fan @jkbradley You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark randSeed Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11232.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11232 commit 3f90749148113069c312d5a03b09a67b054e5620 Author: gatorsmile Date: 2016-02-17T04:28:05Z added a random function that can generate deterministic results
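The partitioning dependence described above can be demonstrated in plain Python (this is an illustration of the problem, not Spark's implementation): if each partition derives its random stream from `seed + partition_id`, the same row gets a different value whenever the data is repartitioned, even with the same seed.

```python
# Illustration: why per-partition seeding is not deterministic across
# partitionings, and a hypothetical row-keyed alternative that is.
import random

def rand_column(partitions, seed):
    out = []
    for pid, part in enumerate(partitions):
        rng = random.Random(seed + pid)      # stream depends on partition id
        out.extend(rng.random() for _ in part)
    return out

rows = list(range(6))
two_parts   = [rows[:3], rows[3:]]
three_parts = [rows[:2], rows[2:4], rows[4:]]

# Same seed, same rows -- different partitioning, different values.
assert rand_column(two_parts, 42) != rand_column(three_parts, 42)

# A deterministic variant keys each value off the row itself, so the
# result no longer depends on how rows are grouped into partitions.
def rand_deterministic(partitions, seed):
    return [random.Random(seed * 1_000_003 + row).random()
            for part in partitions for row in part]

assert rand_deterministic(two_parts, 42) == rand_deterministic(three_parts, 42)
```

The trade-off is that a row-keyed scheme needs a stable per-row key, which a distributed engine does not always have cheaply.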
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10757#discussion_r53120276 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -198,18 +208,26 @@ case class GetArrayStructFields( } } +case class PrettyGetArrayStructFields( +child: Expression, ordinal: Int, name: String, dataType: DataType) + extends UnaryExpression with Unevaluable { + + override def sql: String = s"${child.sql}[$ordinal].$name" --- End diff -- should be `s"${child.sql}.$name"`?
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11231#discussion_r53120113 --- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala --- @@ -93,7 +93,8 @@ private[spark] object Benchmark { if (SystemUtils.IS_OS_MAC_OSX) { Utils.executeAndGetOutput(Seq("/usr/sbin/sysctl", "-n", "machdep.cpu.brand_string")) } else if (SystemUtils.IS_OS_LINUX) { - Utils.executeAndGetOutput(Seq("/usr/bin/grep", "-m", "1", "\"model name\"", "/proc/cpuinfo")) + val grepPath = Utils.executeAndGetOutput(Seq("which", "grep")) + Utils.executeAndGetOutput(Seq(grepPath, "-m", "1", "model name", "/proc/cpuinfo")) --- End diff -- "\"model name\"" doesn't work on my Linux; "model name" works. I don't know whether this is a special case or general.
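The behavior observed here is general, not machine-specific: when a command is executed from an argv list (exec-style, with no shell in between), quote characters are passed to the program literally instead of being stripped, so `\"model name\"` makes grep search for a pattern that literally contains double quotes. A small Python sketch (using a temporary file in place of `/proc/cpuinfo`) shows the same effect:

```python
# Sketch of the quoting pitfall: exec-style argv lists pass embedded
# quotes to the program verbatim -- no shell ever strips them.
import os
import subprocess
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cpuinfo")
with open(path, "w") as f:
    f.write("model name : Example CPU @ 2.0GHz\n")

# Shell-style quoting leaks into the pattern: grep searches for
# "model name" INCLUDING the double quotes, and matches nothing.
quoted = subprocess.run(["grep", "-m", "1", '"model name"', path],
                        capture_output=True, text=True)
assert quoted.stdout == ""

# A multi-word pattern is already a single argv element, so no
# quoting is needed at all.
plain = subprocess.run(["grep", "-m", "1", "model name", path],
                       capture_output=True, text=True)
assert "Example CPU" in plain.stdout
```

The quotes in the original code were only ever necessary for shell invocation; `Utils.executeAndGetOutput` passes the `Seq` directly as argv, so dropping them is the right fix everywhere, not just on one distribution.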
[GitHub] spark pull request: SPARK-10759 [MLlib] Add python example to mode...
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/11202#issuecomment-185016426 One question is whether we should be using `{% include_example %}` when adding new examples to the documentation. We could separate it out into different PRs, but then we are duplicating code (this example already exists [here](https://github.com/apache/spark/blob/master/examples/src/main/python/ml/cross_validator.py)). @yinxusen could you advise?
[GitHub] spark pull request: [SPARK-13117][Web UI] WebUI should use the loc...
Github user devaraj-kavali commented on the pull request: https://github.com/apache/spark/pull/11133#issuecomment-185016457 @srowen I am investigating it, will update. Thanks
[GitHub] spark pull request: [SPARK-13358][SQL] Retrieve grep path when do ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/11231 [SPARK-13358][SQL] Retrieve grep path when do benchmark JIRA: https://issues.apache.org/jira/browse/SPARK-13358 When trying to run a benchmark, I found that on my Ubuntu Linux grep is not in /usr/bin/ but /bin/. So I am wondering whether it is better to use `which` to retrieve the grep path. cc @davies You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 benchmark-grep-path Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11231.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11231 commit cacd65266d9f8708fe195c1708450e37c2905c49 Author: Liang-Chi Hsieh Date: 2016-02-17T04:30:21Z Use which to retrieve grep path.
[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Don't build full...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/11178#issuecomment-185015502 Looks good! I have taken a quick look but did not actually run it; hoping the tests will ensure that.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/10757#discussion_r53119765 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala --- @@ -130,12 +134,17 @@ case class GetStructField(child: Expression, ordinal: Int, name: Option[String] } }) } +} + +case class PrettyGetStructField(child: Expression, name: String, dataType: DataType) + extends UnaryExpression with Unevaluable { - override def sql: String = child.sql + s".`${childSchema(ordinal).name}`" + override def sql: String = s"${child.sql}.$name" --- End diff -- No, `child.sql` may contain back-ticks/double-quotes.
[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Don't build full...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/11178#discussion_r53119771 --- Diff: dev/run-tests.py --- @@ -336,7 +336,6 @@ def build_spark_sbt(hadoop_version): # Enable all of the profiles for the build: build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags sbt_goals = ["package", - "assembly/assembly", --- End diff -- Understood, for tests assembly is no longer needed.
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/10757#discussion_r53119738 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala --- @@ -310,10 +311,20 @@ case class AttributeReference( * A place holder used when printing expressions without debugging information such as the * expression id or the unresolved indicator. */ -case class PrettyAttribute(name: String, dataType: DataType = NullType) +case class PrettyAttribute( +name: String, +dataType: DataType = NullType, +override val foldable: Boolean = false) --- End diff -- Oh, this is an unexpected change. I should revert it.
[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Don't build full...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/11178#discussion_r53119446 --- Diff: dev/run-tests.py --- @@ -336,7 +336,6 @@ def build_spark_sbt(hadoop_version): # Enable all of the profiles for the build: build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags sbt_goals = ["package", - "assembly/assembly", --- End diff -- See https://issues.apache.org/jira/browse/SPARK-9284
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185011892 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185011896 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51402/ Test FAILed.
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185011817 **[Test build #51402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51402/consoleFull)** for PR 11229 at commit [`acfba66`](https://github.com/apache/spark/commit/acfba66071f95c11847535995c4ce574b345549c). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Don't build full...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/11178#discussion_r53119369 --- Diff: dev/run-tests.py --- @@ -336,7 +336,6 @@ def build_spark_sbt(hadoop_version): # Enable all of the profiles for the build: build_profiles = get_hadoop_profiles(hadoop_version) + modules.root.build_profile_flags sbt_goals = ["package", - "assembly/assembly", --- End diff -- A full assembly is no longer needed? How do you configure the classpath?
[GitHub] spark pull request: [SPARK-13001] [CORE] [MESOS] Prevent getting o...
Github user sebastienrainville commented on the pull request: https://github.com/apache/spark/pull/10924#issuecomment-185010055 I'm not sure we want to use only one rejection delay setting in these 2 cases. Arguably we could reject offers for a much longer period of time for `unmet constraints` since AFAIK constraints don't change dynamically and therefore are true for the lifetime of a framework. It's a bit different with `reached max cores` because if we lose an executor we want the scheduler to launch a new one and ideally not have to wait for too long for it. I put the same default delay of 120s for both since it seems to be a reasonable value. And for the fine-grained mode, there's no reason to not add the same logic. I'll do the change and test it. Unfortunately, the example function `declineOffer` cannot be reused there because it relies on local variables declared inside the loop. It really feels like this code needs some refactoring.
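The refactoring hinted at above could take the shape of a standalone decline helper keyed by rejection reason, shared by the coarse- and fine-grained paths. This is a hypothetical Python sketch with illustrative names, not the PR's Scala code or the Mesos API:

```python
# Hypothetical sketch: one decline helper, with a per-reason refuse delay,
# instead of loop-local logic duplicated in each scheduling path.
UNMET_CONSTRAINTS = "unmet_constraints"
REACHED_MAX_CORES = "reached_max_cores"

# Constraints rarely change for the lifetime of a framework, so they could
# carry a longer delay than max-cores, where a lost executor should be
# replaced promptly; 120s for both matches the default discussed above.
REFUSE_SECONDS = {UNMET_CONSTRAINTS: 120.0, REACHED_MAX_CORES: 120.0}

def decline_offer(driver, offer_id, reason=None):
    """Decline one offer, refusing re-offers for a reason-specific delay."""
    delay = REFUSE_SECONDS.get(reason, 0.0)
    driver.decline(offer_id, refuse_seconds=delay)
    return delay

class FakeDriver:
    """Stand-in for a Mesos scheduler driver, recording decline calls."""
    def __init__(self):
        self.calls = []
    def decline(self, offer_id, refuse_seconds):
        self.calls.append((offer_id, refuse_seconds))

d = FakeDriver()
decline_offer(d, "offer-1", UNMET_CONSTRAINTS)
decline_offer(d, "offer-2")  # no special reason: decline with no delay
assert d.calls == [("offer-1", 120.0), ("offer-2", 0.0)]
```

Because the helper takes the driver and reason as parameters rather than closing over loop variables, it can be called from any of the offer-handling loops.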
[GitHub] spark pull request: [SPARK-12799] Simplify various string output f...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10757#issuecomment-185007809 My last concern is changing the default column alias `_c0, _c1, ..` to `Expression.sql`. Before this PR, query plans generated from SQL strings used `_c0, _c1, ..`, while query plans generated from DataFrames used `prettyString`. Now we always use `Expression.sql`; this is more consistent, but will break some old code, e.g. `select _c1 from (select a + 1, b + 2 from tbl) t `. Is it worth it?
[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11101#discussion_r53118993 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala --- @@ -123,6 +126,12 @@ class JobGenerator(jobScheduler: JobScheduler) extends Logging { timedOut } + // generate one more bacth to make sure RDD in lastJob is checkpointed. + if (!jobScheduler.receiverTracker.hasUnallocatedBlocks && +ssc.graph.isCheckpointMissedLastTime) { +Thread.sleep(ssc.graph.batchDuration.milliseconds) --- End diff -- What if the last batch execution takes longer than the batchDuration? Probably we need a notification with a timeout?
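The "notification with a timeout" alternative can be sketched with a condition variable: the shutdown path waits until the last batch signals that its checkpoint is done, falling back to the batch duration as a timeout. This is an illustrative Python sketch, not Spark's `JobGenerator` code; all names are hypothetical.

```python
# Sketch: wait on a condition signaled by the last batch, with a timeout
# fallback, instead of sleeping for a fixed batchDuration.
import threading
import time

class ShutdownGate:
    def __init__(self):
        self._cond = threading.Condition()
        self._checkpointed = False

    def batch_checkpointed(self):
        """Called when the last batch finishes checkpointing."""
        with self._cond:
            self._checkpointed = True
            self._cond.notify_all()

    def await_checkpoint(self, timeout_s):
        # Returns as soon as the batch signals, or after timeout_s at worst.
        # Unlike a bare sleep, a fast batch does not block shutdown, and a
        # slow batch is bounded by the timeout rather than missed entirely.
        with self._cond:
            return self._cond.wait_for(lambda: self._checkpointed,
                                       timeout=timeout_s)

gate = ShutdownGate()
# Simulate the last batch finishing 50 ms into shutdown.
threading.Timer(0.05, gate.batch_checkpointed).start()

start = time.monotonic()
ok = gate.await_checkpoint(timeout_s=2.0)   # batch duration as the fallback
elapsed = time.monotonic() - start
assert ok and elapsed < 1.0  # returned on notification, not the full timeout
```

The return value also distinguishes "batch signaled" from "timed out", which the fixed `Thread.sleep` cannot.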
[GitHub] spark pull request: [SPARK-13357][SQL] Use generated projection an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11230#issuecomment-185006799 **[Test build #51403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51403/consoleFull)** for PR 11230 at commit [`63833df`](https://github.com/apache/spark/commit/63833dfdf4418d69a94a2c34aa91c06796606697).
[GitHub] spark pull request: [SPARK-13222][Streaming][WIP]make sure latest ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/11101#discussion_r53118838
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala ---
@@ -490,6 +494,19 @@ abstract class DStream[T: ClassTag] (
     logDebug("Cleared checkpoint data")
   }
+  private[streaming] def readyToShutdown(): Unit = {
+    _readyToShutdown = true
+    dependencies.foreach(_.readyToShutdown())
+    logDebug("Ready to shutdown")
--- End diff --
Yes, I know. I mean you already have one (https://github.com/apache/spark/pull/11101/files#diff-221bc6301915f7a476786c794b855b21R105) that prints exactly the same log; perhaps we can add more information to this log, for example the creationSite?
[GitHub] spark pull request: [SPARK-13220][Core]deprecate yarn-client and y...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11229#issuecomment-185006361 **[Test build #51402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51402/consoleFull)** for PR 11229 at commit [`acfba66`](https://github.com/apache/spark/commit/acfba66071f95c11847535995c4ce574b345549c).
[GitHub] spark pull request: [SPARK-13302][PYSPARK][TESTS] Move the temp fi...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/11197#issuecomment-185006188 or @davies or @jkbradley maybe?