[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-57896058 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21286/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-57896055 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21286/consoleFull) for PR 2647 at commit [`5fc1259`](https://github.com/apache/spark/commit/5fc12597afe5964c7b9f688fd2919426b928b3ec).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example
Github user rezazadeh commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-57896484 Parameters are now configurable. Added approximation error reporting. Added JIRA.
[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-57896544 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21287/consoleFull) for PR 2622 at commit [`eca3dfd`](https://github.com/apache/spark/commit/eca3dfd62c1ce3643ef03b44f79c3e840b27a390).
* This patch merges cleanly.
[GitHub] spark pull request: [Spark-3525] Adding gradient boosting
Github user epahomov closed the pull request at: https://github.com/apache/spark/pull/2394
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57897337 @davis Thanks for all the suggestions, really makes things a lot cleaner!
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-57897374 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21288/consoleFull) for PR 2576 at commit [`f928657`](https://github.com/apache/spark/commit/f92865707782387bb59c2c66a73d68eb8b030fa8).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57897375 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21289/consoleFull) for PR 2563 at commit [`54c46ce`](https://github.com/apache/spark/commit/54c46ce607c521df4bea390d3cac7d42a6f006f8).
* This patch **does not** merge cleanly!
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-57897385 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21288/
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57897538 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21291/consoleFull) for PR 2563 at commit [`785b683`](https://github.com/apache/spark/commit/785b6834e4f0ea24a3b5be4c55d675b8687b12c9).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-57897540 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21290/consoleFull) for PR 2576 at commit [`1505af4`](https://github.com/apache/spark/commit/1505af48becdca9f17d84d9c7a0ef7f03dbc4e8a).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-57897553 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21290/
[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-57897798 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21287/consoleFull) for PR 2622 at commit [`eca3dfd`](https://github.com/apache/spark/commit/eca3dfd62c1ce3643ef03b44f79c3e840b27a390).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3790][MLlib] CosineSimilarity Example
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2622#issuecomment-57897801 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21287/
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57898946 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21291/consoleFull) for PR 2563 at commit [`785b683`](https://github.com/apache/spark/commit/785b6834e4f0ea24a3b5be4c55d675b8687b12c9).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57898948 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21291/
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57899622 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21289/consoleFull)** after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57899624 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21289/
[GitHub] spark pull request: [SPARK-3713][SQL] Uses JSON to serialize DataT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2563#issuecomment-57900055 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/265/consoleFull) for PR 2563 at commit [`785b683`](https://github.com/apache/spark/commit/785b6834e4f0ea24a3b5be4c55d675b8687b12c9).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2513#issuecomment-57902212 Rebased to the master, with the new `CACHE LAZY TABLE t` syntax.
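The eager-vs-lazy distinction behind the new `CACHE LAZY TABLE t` syntax can be sketched generically. This is a hypothetical illustration of the semantics (not Spark's implementation): eager caching materializes at statement time, lazy caching defers until the first scan.

```python
class CachedTable:
    """Hypothetical sketch: eager caching materializes the table at
    creation time; lazy caching defers work until the first scan."""

    def __init__(self, loader, lazy=False):
        self._loader = loader      # function that produces the table data
        self._data = None
        if not lazy:
            self._data = loader()  # eager (the default this PR introduces)

    @property
    def materialized(self):
        return self._data is not None

    def scan(self):
        # lazy: materialize on first access, then reuse the cached data
        if self._data is None:
            self._data = self._loader()
        return self._data
```

Under this sketch, `CACHE TABLE t` corresponds to `lazy=False` and `CACHE LAZY TABLE t` to `lazy=True`.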
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2513#issuecomment-57902286 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21292/consoleFull) for PR 2513 at commit [`fe92287`](https://github.com/apache/spark/commit/fe922870ec9b16d053621d37ddb847a89502087c).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-57902624 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21293/consoleFull) for PR 2576 at commit [`1db30b1`](https://github.com/apache/spark/commit/1db30b1d9e9b24fb5bd0933855b65f99f8ae715b).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57902670

Hi, @liancheng, master branch tests failed on my machine for all dynamic partition suites:

[info] - dynamic_partition *** FAILED ***
[info] - Dynamic partition folder layout *** FAILED ***
[info] - dynamic_partition_skip_default *** FAILED ***
[info] - load_dyn_part1 *** FAILED ***
[info] - load_dyn_part10 *** FAILED ***
[info] - load_dyn_part11 *** FAILED ***
[info] - load_dyn_part12 *** FAILED ***
[info] - load_dyn_part13 *** FAILED ***
[info] - load_dyn_part14 *** FAILED ***
[info] - load_dyn_part14_win *** FAILED ***
[info] - load_dyn_part2 *** FAILED ***
[info] - load_dyn_part3 *** FAILED ***
[info] - load_dyn_part4 *** FAILED ***
[info] - load_dyn_part5 *** FAILED ***
[info] - load_dyn_part6 *** FAILED ***
[info] - load_dyn_part8 *** FAILED ***
[info] - load_dyn_part9 *** FAILED ***
[info] *** 17 TESTS FAILED ***

Detail log:

[info] - dynamic_partition *** FAILED ***
[info] Failed to execute query using catalyst:
[info] Error: get partition: Value for key partcol1 is null or empty
[info] org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for key partcol1 is null or empty
[info] at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1585)
[info] at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1556)
[info] at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1189)
[GitHub] spark pull request: [SPARK-3792][SQL]enable JavaHiveQLSuite
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/2652

[SPARK-3792][SQL] Enable JavaHiveQLSuite

Do not use TestSQLContext in JavaHiveQLSuite, as that may lead to two SparkContexts, and enable JavaHiveQLSuite.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/scwf/spark fix-JavaHiveQLSuite

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2652.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2652

commit be35c919f2e4701775a69c1ce09831cb205037de
Author: scwf wangf...@huawei.com
Date: 2014-10-04T07:18:39Z

enable JavaHiveQLSuite
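The "two SparkContext" hazard this PR description mentions can be illustrated with a generic singleton-style guard. This is a hypothetical sketch mirroring SparkContext's one-active-context rule, not the actual Spark test code; a test suite that calls `get_or_create` reuses the shared instance instead of constructing a conflicting second one:

```python
class SharedContext:
    """Hypothetical stand-in for a shared test context: constructing a
    second live instance is an error, mirroring the one-active-
    SparkContext restriction the PR works around."""

    _active = None

    def __init__(self, name):
        if SharedContext._active is not None:
            raise RuntimeError("only one active context is allowed")
        self.name = name
        SharedContext._active = self

    @classmethod
    def get_or_create(cls, name):
        # Test suites should reuse the existing context rather than
        # instantiating a new one -- the fix the PR applies in spirit.
        return cls._active if cls._active is not None else cls(name)
```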
[GitHub] spark pull request: [SPARK-3792][SQL]enable JavaHiveQLSuite
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2652#issuecomment-57903264 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2513#issuecomment-57903715 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21292/
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2513#issuecomment-57903713 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21292/consoleFull) for PR 2513 at commit [`fe92287`](https://github.com/apache/spark/commit/fe922870ec9b16d053621d37ddb847a89502087c).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CacheTableCommand(tableName: String, plan: Option[LogicalPlan], isLazy: Boolean)`
  * `case class UncacheTableCommand(tableName: String) extends Command`
  * `case class CacheTableCommand(`
  * `case class UncacheTableCommand(tableName: String) extends LeafNode with Command`
  * `case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(`
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2576#issuecomment-57905623 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21293/
[GitHub] spark pull request: [SPARK-2750] support https in spark web ui
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1980#issuecomment-57905663 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21294/consoleFull) for PR 1980 at commit [`baaa1ce`](https://github.com/apache/spark/commit/baaa1ce05fcb426de7d3002a5cc30f18ae119d34).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/1297#issuecomment-57905819 This looks really interesting. Is there a blocker for supporting generic keys (or at least say `String`), or is that a performance issue?
[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-57905979 Can we use the existing `pairRDDToPython` function? https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala#L120
[GitHub] spark pull request: [SPARK-3773][PySpark][Doc] Sphinx build warnin...
GitHub user cocoatomo opened a pull request: https://github.com/apache/spark/pull/2653

[SPARK-3773][PySpark][Doc] Sphinx build warning

When building Sphinx documents for PySpark, we have 12 warnings. Their causes are almost all docstrings in broken ReST format. To reproduce this issue, run the following commands on commit 6e27cb630de69fa5acb510b4e2f6b980742b1957:

```bash
$ cd ./python/docs
$ make clean html
...
/Users/user/MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of pyspark.SparkContext.sequenceFile:4: ERROR: Unexpected indentation.
/Users/user/MyRepos/Scala/spark/python/pyspark/__init__.py:docstring of pyspark.RDD.saveAsSequenceFile:4: ERROR: Unexpected indentation.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:14: ERROR: Unexpected indentation.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:16: WARNING: Definition list ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.LogisticRegressionWithSGD.train:17: WARNING: Block quote ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:14: ERROR: Unexpected indentation.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:16: WARNING: Definition list ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/classification.py:docstring of pyspark.mllib.classification.SVMWithSGD.train:17: WARNING: Block quote ends without a blank line; unexpected unindent.
/Users/user/MyRepos/Scala/spark/python/docs/pyspark.mllib.rst:50: WARNING: missing attribute mentioned in :members: or __all__: module pyspark.mllib.regression, attribute RidgeRegressionModelLinearRegressionWithSGD
/Users/user/MyRepos/Scala/spark/python/pyspark/mllib/tree.py:docstring of pyspark.mllib.tree.DecisionTreeModel.predict:3: ERROR: Unexpected indentation.
...
checking consistency... /Users/user/MyRepos/Scala/spark/python/docs/modules.rst:: WARNING: document isn't included in any toctree
...
copying static files... WARNING: html_static_path entry u'/Users/user/MyRepos/Scala/spark/python/docs/_static' does not exist
...
build succeeded, 12 warnings.
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cocoatomo/spark issues/3773-sphinx-build-warnings

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2653.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2653

commit 6f656618d7a6fe3f9977f6a1fb15350577388f06
Author: cocoatomo cocoatom...@gmail.com
Date: 2014-10-04T14:07:20Z

[SPARK-3773][PySpark][Doc] Sphinx build warning

Remove all warnings on document building
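The "Unexpected indentation" and "Definition list ends without a blank line" messages are typical of ReST field lists that follow a paragraph without a separating blank line. A hypothetical before/after docstring (not taken from the PR) with a crude separator check:

```python
# Hypothetical docstrings (not from the PR) showing the class of ReST
# breakage Sphinx complains about: a field list must be separated from
# the preceding paragraph by a blank line.

BROKEN = """Train a model.
:param data: the training set,
    given as an RDD of LabeledPoint
:param iterations: number of passes"""

FIXED = """Train a model.

:param data: the training set,
    given as an RDD of LabeledPoint
:param iterations: number of passes"""


def field_list_separated(docstring):
    """True if the first ':param' field is preceded by a blank line,
    as ReST requires between a paragraph and a field list."""
    lines = docstring.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(":param"):
            return i > 0 and not lines[i - 1].strip()
    return True  # no field list at all
```

Running `make clean html` again after inserting the missing blank lines is how one would confirm each of the 12 warnings is gone.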
[GitHub] spark pull request: [SPARK-3773][PySpark][Doc] Sphinx build warnin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2653#issuecomment-57906805 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18429655

--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.hive

import java.util.Properties
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hive.common.StatsSetupConst
import org.apache.hadoop.hive.common.`type`.HiveDecimal
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.ql.Context
import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
import org.apache.hadoop.mapred.InputFormat
import org.apache.spark.Logging
import org.apache.hadoop.{io => hadoopIo}
import scala.collection.JavaConversions._
import scala.language.implicitConversions

/**
 * A compatibility layer for interacting with Hive version 0.13.1.
 */
private[hive] object HiveShim {
  val version = "0.13.1"
  /*
   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
   * Full support of the new decimal feature needs to be done in a separate PR.
   */
  val metastoreDecimal = "decimal(10,0)"

  def getTableDesc(
      serdeClass: Class[_ <: Deserializer],
      inputFormatClass: Class[_ <: InputFormat[_, _]],
      outputFormatClass: Class[_],
      properties: Properties) = {
    new TableDesc(inputFormatClass, outputFormatClass, properties)
  }

  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE

  def createDefaultDBIfNeeded(context: HiveContext) = {
    context.runSqlHive("CREATE DATABASE default")
    context.runSqlHive("USE default")
  }

  /** The string used to denote an empty comments field in the schema. */
  def getEmptyCommentsFieldValue = ""

  def getCommandProcessor(cmd: Array[String], conf: HiveConf) = {
    CommandProcessorFactory.get(cmd, conf)
  }

  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
    HiveDecimal.create(bd)
  }

  /*
   * This function became private in hive-0.13, but we have to do this to work around a Hive bug.
   */
  private def appendReadColumnNames(conf: Configuration, cols: Seq[String]) {
    val old: String = conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "")
    val result: StringBuilder = new StringBuilder(old)
    var first: Boolean = old.isEmpty

    for (col <- cols) {
      if (first) {
        first = false
      }
      else {
        result.append(',')
      }
      result.append(col)
    }
    conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, result.toString)
  }

  /*
   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids is null or empty
   */
  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    if (ids != null && ids.size > 0) {
      ColumnProjectionUtils.appendReadColumns(conf, ids)
    }
    appendReadColumnNames(conf, names)
  }

  def getExternalTmpPath(context: Context, path: Path) = {
    context.getExternalTmpPath(path.toUri)
  }

  def getDataLocationPath(p: Partition) = p.getDataLocation

  def getAllPartitionsOf(client: Hive, tbl: Table) = client.getAllPartitionsOf(tbl)

  /*
   * Bug introduced in hive-0.13: FileSinkDesc is serializable, but its member path is not.
   * Fix it through a wrapper.
```

--- End diff --

I am pretty confused about it. I think Hive needs to serialize FileSinkDesc when the query plan
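The bug being discussed here (a `Serializable` class holding a non-serializable field) has a standard workaround, which the shim's wrapper follows: keep a serializable representation of the field and rebuild the real object after deserialization. A minimal standalone sketch of the idea — the `RawPath`/`PathWrapper` names are illustrative, not Spark's actual `ShimFileSinkDesc`:

```scala
// Hypothetical stand-in for a non-serializable class such as Hadoop's Path.
class RawPath(val uri: String) {
  override def toString: String = uri
}

// Fully serializable wrapper: only the String representation travels over
// the wire; the RawPath itself is rebuilt lazily after deserialization.
class PathWrapper(pathString: String) extends Serializable {
  // @transient keeps the non-serializable field out of the serialized form.
  @transient private var cached: RawPath = _

  def path: RawPath = {
    if (cached == null) {
      cached = new RawPath(pathString) // rebuilt on first access
    }
    cached
  }
}
```

A serialization round trip through `ObjectOutputStream` then works, because only the `String` field is written; the wrapper reconstructs the path object on the receiving side.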
[GitHub] spark pull request: [SPARK-2750] support https in spark web ui
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1980#issuecomment-57907416 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21294/
[GitHub] spark pull request: [SPARK-2750] support https in spark web ui
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1980#issuecomment-57907414 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21294/consoleFull) for PR 1980 at commit [`baaa1ce`](https://github.com/apache/spark/commit/baaa1ce05fcb426de7d3002a5cc30f18ae119d34). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57907458 @scwf Can you elaborate on what configuration you're using? Details such as compilation flags, environment variables, and the build process would be helpful. I've been tracking this failure over the last few days but couldn't reproduce it either locally or on the Jenkins PR builder.
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2513#discussion_r18429700

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -91,42 +92,42 @@ class SqlParser extends StandardTokenParsers with PackratParsers {

```
   protected val IN = Keyword("IN")
   protected val INNER = Keyword("INNER")
   protected val INSERT = Keyword("INSERT")
+  protected val INTERSECT = Keyword("INTERSECT")
   protected val INTO = Keyword("INTO")
   protected val IS = Keyword("IS")
   protected val JOIN = Keyword("JOIN")
+  protected val LAST = Keyword("LAST")
+  protected val LAZY = Keyword("LAZY")
   protected val LEFT = Keyword("LEFT")
+  protected val LIKE = Keyword("LIKE")
   protected val LIMIT = Keyword("LIMIT")
+  protected val LOWER = Keyword("LOWER")
   protected val MAX = Keyword("MAX")
   protected val MIN = Keyword("MIN")
   protected val NOT = Keyword("NOT")
   protected val NULL = Keyword("NULL")
   protected val ON = Keyword("ON")
   protected val OR = Keyword("OR")
```

--- End diff --

Added keyword `LAZY` and sorted all the keywords in alphabetical order here. This list was once sorted, but the ordering was broken by later additions.
[GitHub] spark pull request: [SPARK-3645][SQL] Makes table caching eager by...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2513#discussion_r18429702

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -91,42 +92,42 @@ class SqlParser extends StandardTokenParsers with PackratParsers {

```
   protected val IN = Keyword("IN")
   protected val INNER = Keyword("INNER")
   protected val INSERT = Keyword("INSERT")
+  protected val INTERSECT = Keyword("INTERSECT")
   protected val INTO = Keyword("INTO")
   protected val IS = Keyword("IS")
   protected val JOIN = Keyword("JOIN")
+  protected val LAST = Keyword("LAST")
+  protected val LAZY = Keyword("LAZY")
   protected val LEFT = Keyword("LEFT")
+  protected val LIKE = Keyword("LIKE")
   protected val LIMIT = Keyword("LIMIT")
+  protected val LOWER = Keyword("LOWER")
   protected val MAX = Keyword("MAX")
   protected val MIN = Keyword("MIN")
   protected val NOT = Keyword("NOT")
   protected val NULL = Keyword("NULL")
   protected val ON = Keyword("ON")
   protected val OR = Keyword("OR")
-  protected val OVERWRITE = Keyword("OVERWRITE")
-  protected val LIKE = Keyword("LIKE")
-  protected val RLIKE = Keyword("RLIKE")
-  protected val UPPER = Keyword("UPPER")
-  protected val LOWER = Keyword("LOWER")
-  protected val REGEXP = Keyword("REGEXP")
   protected val ORDER = Keyword("ORDER")
   protected val OUTER = Keyword("OUTER")
+  protected val OVERWRITE = Keyword("OVERWRITE")
+  protected val REGEXP = Keyword("REGEXP")
   protected val RIGHT = Keyword("RIGHT")
+  protected val RLIKE = Keyword("RLIKE")
   protected val SELECT = Keyword("SELECT")
   protected val SEMI = Keyword("SEMI")
+  protected val SQRT = Keyword("SQRT")
   protected val STRING = Keyword("STRING")
+  protected val SUBSTR = Keyword("SUBSTR")
+  protected val SUBSTRING = Keyword("SUBSTRING")
   protected val SUM = Keyword("SUM")
   protected val TABLE = Keyword("TABLE")
   protected val TIMESTAMP = Keyword("TIMESTAMP")
   protected val TRUE = Keyword("TRUE")
   protected val UNCACHE = Keyword("UNCACHE")
   protected val UNION = Keyword("UNION")
+  protected val UPPER = Keyword("UPPER")
   protected val WHERE = Keyword("WHERE")
```

--- End diff --

Added keyword `LAZY` and sorted all the keywords in alphabetical order here. This list was once sorted, but the ordering was broken by later additions.
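A keep-it-sorted convention like the one restored above is cheap to enforce mechanically. A small guard of this kind (illustrative only — not part of Spark's actual test suite) would have caught the ordering drift the moment a keyword was appended out of place:

```scala
// A slice of the parser's keyword table, as plain strings (illustrative subset).
val keywords = Seq(
  "IN", "INNER", "INSERT", "INTERSECT", "INTO", "IS", "JOIN",
  "LAST", "LAZY", "LEFT", "LIKE", "LIMIT", "LOWER",
  "MAX", "MIN", "NOT", "NULL", "ON", "OR", "ORDER", "OUTER",
  "OVERWRITE", "REGEXP", "RIGHT", "RLIKE", "SELECT", "SEMI",
  "SQRT", "STRING", "SUBSTR", "SUBSTRING", "SUM", "TABLE",
  "TIMESTAMP", "TRUE", "UNCACHE", "UNION", "UPPER", "WHERE")

// Fails fast if anyone inserts a keyword out of alphabetical order.
require(keywords == keywords.sorted, "keyword table is no longer sorted")
```

Running the check as part of the build turns a style convention into an invariant, so the list cannot silently drift again.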
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18429712

--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@

[... same Shim.scala diff as quoted above; the comment targets these lines ...]

```scala
  /*
   * Bug introduced in hive-0.13: FileSinkDesc is serializable, but its member path is not.
   * Fix it through a wrapper.
   */
  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
    var f = new
```
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57907663 @scwf Or could you please describe the steps to reproduce this failure from a newly checked out master branch? I guess once you can reproduce it, it happens deterministically.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18429736

--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@

[... same Shim.scala diff as quoted above; the comment targets these lines ...]

```scala
  def getDataLocationPath(p: Partition) = p.getDataLocation

  def getAllPartitionsOf(client: Hive, tbl: Table) = client.getAllPartitionsOf(tbl)

  /*
   * Bug introduced in hive-0.13: FileSinkDesc is serializable, but its member path is not.
   * Fix it through a wrapper.
   */
  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
```

--- End diff --

If we
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57907852 Ah, just found out that I can reproduce it with `-Phive`; I had been using `-Phive,hadoop-2.4` all the time and just couldn't reproduce this, thanks!
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57908014 Yes, I will use `-Phive,hadoop-2.4` to see whether it has the problem.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18429781

--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@

[... same Shim.scala diff as quoted above; the comment targets these lines ...]

```scala
    for (col <- cols) {
      if (first) {
        first = false
      }
      else {
```

--- End diff --

```
if () {
  ...
} else {
  ...
}
```
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18429783

--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@

[... same Shim.scala diff as quoted above; the comment targets these lines ...]

```scala
  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
    if (ids != null && ids.size > 0) {
      ColumnProjectionUtils.appendReadColumns(conf, ids)
    }
    appendReadColumnNames(conf, names)
```

--- End diff --

Why is there no null/empty check here?
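The review question above is about symmetry: `ids` is guarded against null/empty, but `names` is passed through unchecked. A guarded version of the idea can be sketched standalone — here Hadoop's `Configuration` is replaced by a plain mutable map so the sketch runs anywhere, and the key names are illustrative:

```scala
import scala.collection.mutable

// Stand-in for Hadoop's Configuration: a plain string-to-string map.
val conf = mutable.Map.empty[String, String]

// Append items to a comma-separated value stored under `key`.
def appendCsv(key: String, items: Seq[Any]): Unit = {
  val old = conf.getOrElse(key, "")
  val added = items.mkString(",")
  conf(key) = if (old.isEmpty) added else old + "," + added
}

// Symmetric guards: both `ids` and `names` are checked for null/empty
// before the configuration is touched.
def appendReadColumns(ids: Seq[Int], names: Seq[String]): Unit = {
  if (ids != null && ids.nonEmpty) appendCsv("read.column.ids", ids)
  if (names != null && names.nonEmpty) appendCsv("read.column.names", names)
}
```

With both branches guarded, calls such as `appendReadColumns(null, Seq.empty)` become harmless no-ops instead of writing stray separators into the configuration.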
[GitHub] spark pull request: [Minor] Trivial fix to make codes more readabl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2654#issuecomment-57908521 Can one of the admins verify this patch?
[GitHub] spark pull request: [Minor] Trivial fix to make codes more readabl...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/2654 [Minor] Trivial fix to make codes more readable It should just use `maxResults` there. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 trivial_fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2654.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2654 commit 13622893af0a67e21d149792aaee47fcdaf427ca Author: Liang-Chi Hsieh vii...@gmail.com Date: 2014-10-04T15:07:09Z Trivial fix to make codes more readable.
[GitHub] spark pull request: [SPARK-3793][SQL]use hiveconf when parse hive ...
GitHub user scwf opened a pull request: https://github.com/apache/spark/pull/2655

[SPARK-3793][SQL] use hiveconf when parsing hive ql

This PR makes the hive ql parser more general and compatible with both hive-0.12 and hive-0.13. In hive-0.13 we may need a hiveconf (or hivecontext) when parsing a (quoted) sql statement. For example, running the following sql without a hiveconf results in an NPE:

    createQueryTest("quoted alias.attr", "SELECT `a`.`key` FROM src a ORDER BY key LIMIT 1")

    [info] - quoted alias.attr *** FAILED ***
    [info]   org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: SELECT `a`.`key` FROM src a ORDER BY key LIMIT 1
    [info]   at org.apache.spark.sql.hive.HiveQl$.parseSql(HiveQl.scala:221)
    [info]   at org.apache.spark.sql.hive.test.TestHiveContext$HiveQLQueryExecution.logical$lzycompute(TestHive.scala:143)
    [info]   at org.apache.spark.sql.hive.test.TestHiveContext$HiveQLQueryExecution.logical(TestHive.scala:143)
    [info]   at org.apache.spark.sql.hive.test.TestHiveContext$QueryExecution.analyzed$lzycompute(TestHive.scala:153)
    [info]   at org.apache.spark.sql.hive.test.TestHiveContext$QueryExecution.analyzed(TestHive.scala:152)
    [info]   at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan$lzycompute(SQLContext.scala:403)
    [info]   at org.apache.spark.sql.SQLContext$QueryExecution.optimizedPlan(SQLContext.scala:403)
    [info]   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:407)
    [info]   at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:405)
    [info]   at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:411)
    [info]   ...
    [info] Cause: java.lang.NullPointerException:
    [info]   at org.apache.hadoop.hive.conf.HiveConf.getVar(HiveConf.java:1295)
    [info]   at org.apache.hadoop.hive.ql.parse.HiveLexer.allowQuotedId(HiveLexer.java:342)
    [info]   at org.apache.hadoop.hive.ql.parse.HiveLexer$DFA21.specialStateTransition(HiveLexer.java:10945)
    [info]   at org.antlr.runtime.DFA.predict(DFA.java:80)
    [info]   at org.apache.hadoop.hive.ql.parse.HiveLexer.mIdentifier(HiveLexer.java:7925)
    [info]   at org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:10818)
    [info]   at org.antlr.runtime.Lexer.nextToken(Lexer.java:89)
    [info]   at org.antlr.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:133)
    [info]   at org.antlr.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:127)
    [info]   at org.antlr.runtime.CommonTokenStream.consume(CommonTokenStream.java:70)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/scwf/spark addconf-to-hiveql

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2655.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2655

commit 51e61621469392c3b357781230ef2909cf98b7a8
Author: scwf wangf...@huawei.com
Date: 2014-10-04T15:09:18Z

    add hiveconf when parse hive ql
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57908927 Using `-Phive,hadoop-2.4` is OK on my local machine.
[GitHub] spark pull request: [SPARK-3793][SQL]use hiveconf when parse hive ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2655#issuecomment-57908961 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2750] support https in spark web ui
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/1980#issuecomment-57909017 Updated and fixed the conflict. @JoshRosen, you can test this; refer to my last comment.
[GitHub] spark pull request: [SPARK-3758] [Windows] Wrong EOL character in ...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2612#issuecomment-57909026

> I'm worried since most of us develop on mac/linux that this will end up in a weird state where there are mixed EOL characters.

I'd worry about this, too. If one of us edits one of these files on a Mac, we might unknowingly recreate the issue this PR is trying to fix. Does it make sense to add some kind of style check to `dev/run-tests` that validates that all `.cmd` files use Windows-style newlines?
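The style check suggested above is straightforward to sketch. The following Python snippet (a hypothetical helper, not part of `dev/run-tests`; the function name and integration point are assumptions) flags any `.cmd` file that contains a bare LF line ending:

```python
from pathlib import Path

def check_cmd_newlines(root="."):
    """Return the .cmd files under `root` that contain bare LF line endings.

    A file is considered well-formed only if every newline is a CRLF pair:
    after removing all CRLF pairs, no lone '\n' bytes may remain.
    """
    bad = []
    for path in Path(root).rglob("*.cmd"):
        data = path.read_bytes()
        # Strip CRLF pairs first; any remaining '\n' is a Unix-style ending.
        if b"\n" in data.replace(b"\r\n", b""):
            bad.append(str(path))
    return sorted(bad)
```

A wrapper script could exit non-zero when the returned list is non-empty, failing the build the same way the other style checks do.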
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57909183

So this bug can be triggered by lower versions of Hadoop, e.g. 1.0.3; I haven't validated the exact range yet. In `Hive.loadDynamicPartitions`, Hive calls `o.a.h.h.q.e.Utilities.getFileStatusRecurse` to glob the temporary directory for data files. It seems that lower versions of Hadoop don't filter out files like `_SUCCESS`, which causes the problem.

Within Hive, `loadDynamicPartitions` is only used in operations like `LOAD`. At the end of a normal insertion into a dynamically partitioned table, `FileSinkOperator` calls `Utilities.mvFileToFinalPath` to move the entire temporary directory to the target location, and thus doesn't hit this problem. `Utilities.mvFileToFinalPath` is more efficient than `Hive.loadDynamicPartitions` since it doesn't parse and validate partition specs, but it requires some internal Hive data structures like `DynamicPartitionCtx`. I'll try to see whether I can mock these data structures and use `mvFileToFinalPath` instead.
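The filtering that newer Hadoop versions apply, and that the older globbing misses, follows the usual Hadoop/Hive convention that a file whose base name starts with `_` or `.` is a hidden marker file. A minimal Python sketch of that convention (the function name is hypothetical; the real fix lives in Hadoop/Hive's path filters, not in code like this):

```python
def visible_data_files(paths):
    """Filter out Hadoop marker/hidden files such as _SUCCESS and .crc sidecars.

    Follows the common convention that a file is hidden when its base name
    starts with '_' or '.'.
    """
    def is_hidden(path):
        name = path.rstrip("/").rsplit("/", 1)[-1]
        return name.startswith(("_", "."))

    return [p for p in paths if not is_hidden(p)]
```

Applying such a filter to the glob results before handing them to the partition-loading logic would mimic what the newer Hadoop versions already do.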
[GitHub] spark pull request: [SPARK-3777] Display Executor ID for Tasks i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2642#issuecomment-57909187 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21295/consoleFull) for PR 2642 at commit [`37945af`](https://github.com/apache/spark/commit/37945af9defd4dbf450f1391ca621b9c4c63030f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57909238 @scwf Thanks for all the information you provided offline :)
[GitHub] spark pull request: [SPARK-3777] Display Executor ID for Tasks i...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/2642#issuecomment-57909316

> As horizontal space is precious for including more metrics, might it make sense to combine Address / Executor and Executor ID into a single Executor column, with values like 1 / 10.37.129.2.

Agreed. I updated the PR to put them into one column. I use `host` because I feel `Address` should include the `port` number, but here we only have `host`. The new screenshot is as follows: ![executor_id_host](https://cloud.githubusercontent.com/assets/1000778/4515998/f84fdc82-4bdb-11e4-81a3-659cd28a4b43.png)

> Also, is including the port still worthwhile now that we have the ID?

`TaskInfo` does not have a `port` field, and I cannot find an easy way to add it. However, I think `Executor ID` is enough. If the executor ID is provided, I can use `ps -ef | grep spark | grep executor_id` to find the process ID. It doesn't look like `port` would help find the process ID any more easily.
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57909423 According to previous failed Jenkins builds ([1](https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/752/), [2](https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/753/), etc.), Hadoop 1.0.3 and 2.0 are affected, while 2.2 and above are OK. That explains why this PR together with #2226 always passes Jenkins -- the PR builder uses Hadoop 2.3.
[GitHub] spark pull request: [SPARK-3007][SQL] Adds dynamic partitioning su...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2616#issuecomment-57909534 Got it.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430042

--- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.net.URI
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.`type`.HiveDecimal
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.stats.StatsSetupConst
+import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
+import org.apache.hadoop.{io => hadoopIo}
+import org.apache.hadoop.mapred.InputFormat
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.12.0.
+ */
+private[hive] object HiveShim {
+  val version = "0.12.0"
+  val metastoreDecimal = "decimal"
+
+  def getTableDesc(
+      serdeClass: Class[_ <: Deserializer],
+      inputFormatClass: Class[_ <: InputFormat[_, _]],
+      outputFormatClass: Class[_],
+      properties: Properties) = {
+    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
--- End diff --

Is it necessary?
[GitHub] spark pull request: [SPARK-3777] Display Executor ID for Tasks i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2642#issuecomment-57911257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21295/
[GitHub] spark pull request: [SPARK-3777] Display Executor ID for Tasks i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2642#issuecomment-57911254 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21295/consoleFull) for PR 2642 at commit [`37945af`](https://github.com/apache/spark/commit/37945af9defd4dbf450f1391ca621b9c4c63030f). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430235

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala ---
@@ -369,6 +371,7 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
    * tests.
    */
   protected val originalUdfs: JavaSet[String] = FunctionRegistry.getFunctionNames
+  HiveShim.createDefaultDBIfNeeded(this)
--- End diff --

Can you add a comment here to explain why it is necessary?
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430262

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala ---
@@ -78,6 +79,7 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
   // For some hive test case which contain ${system:test.tmp.dir}
   System.setProperty("test.tmp.dir", testTempDir.getCanonicalPath)
+  CommandProcessorFactory.clean(hiveconf)
--- End diff --

Since this is cleanup work, it seems better placed after `System.clearProperty("spark.hostPort")`. Also, please add a comment about what this call is doing and why it is needed.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430322

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -80,8 +81,10 @@ class StatisticsSuite extends QueryTest with BeforeAndAfterAll {
     sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect()
     sql("INSERT INTO TABLE analyzeTable SELECT * FROM src").collect()
-    assert(queryTotalSize("analyzeTable") === defaultSizeInBytes)
-
+    // TODO: How it works? needs to add it back for other hive version.
+    if (HiveShim.version == "0.12.0") {
--- End diff --

For Hive 0.13, will the table always be updated after `INSERT INTO`? When we added this test, the table size was not updated by the `INSERT INTO` command.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430463

--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io => hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = "0.13.1"
+  /*
+   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0)
--- End diff --

Can you double check it? I am not sure DECIMAL in hive-0.12 is actually DECIMAL(10,0). From the code, it seems precision is unbounded.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430480

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -212,7 +214,18 @@ private[hive] object HiveQl {
   /**
    * Returns the AST for the given SQL string.
    */
-  def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new ParseDriver).parse(sql))
+  def getAst(sql: String): ASTNode = {
+    /*
+     * Context has to be passed in in hive0.13.1.
--- End diff --

in in
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430557

--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io => hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = "0.13.1"
+  /*
+   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0).
+   * Full support of the new decimal feature needs to be fixed in a separate PR.
+   */
+  val metastoreDecimal = "decimal(10,0)"
--- End diff --

Let's say we connect to an existing Hive 0.13 metastore. If there is a decimal column with a user-defined precision and scale, will we see a parsing error?
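The concern above is about matching metastore type strings: Hive 0.13 can emit `decimal(p,s)` with any user-defined precision and scale, while a shim pinned to the literal string `decimal(10,0)` only matches one of them. A Python sketch of a more general parser illustrates the distinction (the function name is hypothetical; `(10, 0)` is Hive's documented default precision and scale for a bare `DECIMAL`):

```python
import re

# Matches 'decimal' or 'decimal(p,s)' with optional whitespace around the comma.
_DECIMAL_RE = re.compile(r"^decimal(?:\((\d+)\s*,\s*(\d+)\))?$")

def parse_decimal_type(type_string):
    """Parse a Hive metastore decimal type string into (precision, scale).

    A bare 'decimal' gets Hive 0.13's default of (10, 0); anything that is
    not a decimal type raises ValueError.
    """
    m = _DECIMAL_RE.match(type_string.strip().lower())
    if not m:
        raise ValueError("not a decimal type: %r" % type_string)
    if m.group(1) is None:
        return (10, 0)
    return (int(m.group(1)), int(m.group(2)))
```

With a parser like this, `decimal(20,5)` from an existing 0.13 metastore would round-trip instead of failing an exact string comparison.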
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430590

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -557,11 +557,14 @@ class HiveQuerySuite extends HiveComparisonTest {
         |WITH serdeproperties('s1'='9')""".stripMargin)
   }
-  sql(s"ADD JAR $testJar")
-  sql(
-    """ALTER TABLE alter1 SET SERDE 'org.apache.hadoop.hive.serde2.TestSerDe'
-      |WITH serdeproperties('s1'='9')""".stripMargin)
+  // now only verify 0.12.0, and ignore other versions due to binary compatibility
--- End diff --

Can you explain it a little bit more?
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430600

--- Diff: sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.net.URI
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.`type`.HiveDecimal
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.stats.StatsSetupConst
+import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
+import org.apache.hadoop.{io => hadoopIo}
+import org.apache.hadoop.mapred.InputFormat
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.12.0.
+ */
+private[hive] object HiveShim {
+  val version = "0.12.0"
+  val metastoreDecimal = "decimal"
+
+  def getTableDesc(
+      serdeClass: Class[_ <: Deserializer],
+      inputFormatClass: Class[_ <: InputFormat[_, _]],
+      outputFormatClass: Class[_],
+      properties: Properties) = {
+    new TableDesc(serdeClass, inputFormatClass, outputFormatClass, properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) = { }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = None
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) = {
+    CommandProcessorFactory.get(cmd(0), conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+    new HiveDecimal(bd)
+  }
+
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: Seq[String]) {
+    ColumnProjectionUtils.appendReadColumnIDs(conf, ids)
+    ColumnProjectionUtils.appendReadColumnNames(conf, names)
+  }
+
+  def getExternalTmpPath(context: Context, uri: URI): String = {
--- End diff --

It will be good to make the return type consistent.
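The `HiveShim` objects above implement a compile-time shim: one object per Hive version, selected by build profile, all exposing the same members. The same pattern can be sketched at runtime in Python (class and function names here are hypothetical illustrations, not Spark APIs):

```python
class Shim12:
    """Compatibility surface for Hive 0.12.0."""
    version = "0.12.0"
    metastore_decimal = "decimal"

class Shim13:
    """Compatibility surface for Hive 0.13.1."""
    version = "0.13.1"
    metastore_decimal = "decimal(10,0)"

_SHIMS = {cls.version: cls for cls in (Shim12, Shim13)}

def load_shim(hive_version):
    """Pick the compatibility shim for a given Hive version string."""
    try:
        return _SHIMS[hive_version]()
    except KeyError:
        raise ValueError("unsupported Hive version: %s" % hive_version)
```

The rest of the code base then talks only to the shim's shared surface, so version differences stay confined to one module — which is exactly what the per-version `Shim.scala` files accomplish at build time.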
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18430657

--- Diff: sql/hive/pom.xml ---
@@ -119,6 +83,74 @@
   <profiles>
     <profile>
+      <id>hive-default</id>
+      <activation>
+        <property>
+          <name>!hive.version</name>
--- End diff --

If we use modified hive dependencies, can we avoid this error and simplify pom changes?
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/2241#discussion_r18431391

--- Diff: sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io => hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = "0.13.1"
+  /*
+   * TODO: hive-0.13 supports DECIMAL(precision, scale); DECIMAL in hive-0.12 is actually DECIMAL(10,0)
--- End diff --

Yeah, I think you are right: it is unbounded in Hive 12. Spark SQL also will use unbounded precision decimals internally, so when it's not specified that's what we should assume.
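The precision point in the TODO can be made concrete with a small sketch. This is an assumed model of the semantics being discussed, not Hive's or Spark SQL's code: an "unbounded" decimal keeps whatever precision the value carries, while a `DECIMAL(precision, scale)` column must clamp the scale and reject values whose digits overflow the declared precision. The `DecimalFit` object and `fitDecimal` helper are invented names.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object DecimalFit {
  // Fit a value into a declared DECIMAL(precision, scale) type.
  // Returning None models "overflow becomes null" semantics; an
  // unbounded decimal would simply keep the value as-is.
  def fitDecimal(v: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
    val scaled = v.setScale(scale, RoundingMode.HALF_UP)
    if (scaled.precision > precision) None // too many digits for the declared type
    else Some(scaled)
  }
}
```

For example, `123.456` fits a `DECIMAL(10, 2)` after rounding to `123.46`, but overflows a `DECIMAL(4, 2)`, which only has room for two integer digits.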
[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/1297#issuecomment-57917807

@MLnick It's a slight performance issue, since we currently use PrimitiveKeyOpenHashMap which optimizes for primitive keys by avoiding null tracking, but I think the performance loss is worth it and I'm working on adding this ([SPARK-3668](https://issues.apache.org/jira/browse/SPARK-3668)).
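The "null tracking" cost mentioned above can be illustrated with a toy open-addressing map. This is not Spark's actual `PrimitiveKeyOpenHashMap`; the `ToyLongMap` class and its internals are invented for the example. The idea shown: with primitive `Long` keys, the default value `0L` doubles as the empty-slot marker, so a map that must also accept key `0` needs an out-of-band slot for it and an extra branch on every access.

```scala
// Toy open-addressing map with primitive Long keys (linear probing,
// no resizing; assumes fewer entries than `capacity`).
class ToyLongMap[V](capacity: Int) {
  private val keys   = new Array[Long](capacity)   // 0L doubles as "empty"
  private val values = new Array[Any](capacity)
  private var zeroValue: Option[V] = None          // key 0 tracked out of band

  def update(k: Long, v: V): Unit =
    if (k == 0L) zeroValue = Some(v)
    else {
      var i = ((k % capacity).abs).toInt
      while (keys(i) != 0L && keys(i) != k) i = (i + 1) % capacity
      keys(i) = k
      values(i) = v
    }

  def get(k: Long): Option[V] =
    if (k == 0L) zeroValue                         // the extra branch on the hot path
    else {
      var i = ((k % capacity).abs).toInt
      while (keys(i) != 0L && keys(i) != k) i = (i + 1) % capacity
      if (keys(i) == k) Some(values(i).asInstanceOf[V]) else None
    }
}
```

Dropping the zero-key special case (as a primitive-key-optimized map can, when key 0 is known never to occur) removes that branch from every lookup, which is the small win being traded away.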
[GitHub] spark pull request: [SPARK-3798][SQL] Store the output of a genera...
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/2656

[SPARK-3798][SQL] Store the output of a generator in a val

This prevents it from changing during serialization, leading to corrupted results.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark generateBug

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2656.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2656

commit efa32eb7b2122a97c8ea309da9acdcccd462ec12
Author: Michael Armbrust mich...@databricks.com
Date: 2014-10-04T21:26:25Z

    Store the output of a generator in a val. This prevents it from changing during serialization.
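The class of bug this PR guards against can be sketched in isolation. This is a toy model, not Spark's `Generator` code: a member computed with `def` is re-evaluated on every access, so the value observed after a serialization round trip need not match what was observed before, whereas a `val` is computed once and shipped with the object. The `Ids` and `RoundTrip` names are invented for the example.

```scala
import java.io._
import scala.util.Random

class Ids extends Serializable {
  def freshId: Long = Random.nextLong()  // recomputed on every call
  val pinnedId: Long = Random.nextLong() // fixed at construction, serialized
}

object RoundTrip {
  // Java-serialize an object and read it back.
  def copy[T <: Serializable](obj: T): T = {
    val buf = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(buf)
    oos.writeObject(obj)
    oos.close()
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
      .readObject().asInstanceOf[T]
  }
}
```

After `val b = RoundTrip.copy(a)`, `b.pinnedId == a.pinnedId` always holds, while `freshId` yields an unrelated value on each access; if downstream code assumed such a member was stable across serialization, results would silently diverge.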
[GitHub] spark pull request: [SQL] Add type checking debugging functions
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/2657

[SQL] Add type checking debugging functions

Adds some functions that were very useful when trying to track down the bug from #2656.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark debugging

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2657.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2657

commit 1d0c2da90fe605ae81267477d07704725d4ac132
Author: Michael Armbrust mich...@databricks.com
Date: 2014-10-04T21:30:05Z

    Add typeChecking debugging functions
[GitHub] spark pull request: [Minor] Trivial fix to make codes more readabl...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2654#issuecomment-57919211

ok to test
[GitHub] spark pull request: [SPARK-3798][SQL] Store the output of a genera...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2656#issuecomment-57919209

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21296/consoleFull) for PR 2656 at commit [`efa32eb`](https://github.com/apache/spark/commit/efa32eb7b2122a97c8ea309da9acdcccd462ec12).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3792][SQL]enable JavaHiveQLSuite
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2652#issuecomment-57919225

ok to test
[GitHub] spark pull request: [SPARK-1655][MLLIB] Add option for distributed...
Github user staple commented on the pull request: https://github.com/apache/spark/pull/2491#issuecomment-57919277

@mengxr Ok, updated to address your suggestions.
[GitHub] spark pull request: [Minor] Trivial fix to make codes more readabl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2654#issuecomment-57919347

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21299/consoleFull) for PR 2654 at commit [`1362289`](https://github.com/apache/spark/commit/13622893af0a67e21d149792aaee47fcdaf427ca).
* This patch merges cleanly.
[GitHub] spark pull request: [SQL] Add type checking debugging functions
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2657#issuecomment-57919342

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21297/consoleFull) for PR 2657 at commit [`1d0c2da`](https://github.com/apache/spark/commit/1d0c2da90fe605ae81267477d07704725d4ac132).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1655][MLLIB] Add option for distributed...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2491#issuecomment-57919346

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21300/consoleFull) for PR 2491 at commit [`e535d8b`](https://github.com/apache/spark/commit/e535d8b7f08fb848ee5687881cbfc6e4c9e798cd).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3792][SQL]enable JavaHiveQLSuite
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2652#issuecomment-57919345

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21298/consoleFull) for PR 2652 at commit [`be35c91`](https://github.com/apache/spark/commit/be35c919f2e4701775a69c1ce09831cb205037de).
* This patch merges cleanly.
[GitHub] spark pull request: [SQL] Add type checking debugging functions
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2657#issuecomment-57919377

Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21297/
Test FAILed.
[GitHub] spark pull request: [SQL] Add type checking debugging functions
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2657#issuecomment-57919376

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21297/consoleFull) for PR 2657 at commit [`1d0c2da`](https://github.com/apache/spark/commit/1d0c2da90fe605ae81267477d07704725d4ac132).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class TypeCheck(child: SparkPlan) extends SparkPlan`
[GitHub] spark pull request: [Minor] Trivial fix to make codes more readabl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2654#issuecomment-57920602

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21299/consoleFull) for PR 2654 at commit [`1362289`](https://github.com/apache/spark/commit/13622893af0a67e21d149792aaee47fcdaf427ca).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3792][SQL]enable JavaHiveQLSuite
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2652#issuecomment-57920595

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21298/consoleFull) for PR 2652 at commit [`be35c91`](https://github.com/apache/spark/commit/be35c919f2e4701775a69c1ce09831cb205037de).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3773][PySpark][Doc] Sphinx build warnin...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2653#issuecomment-57920629

Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-3798][SQL] Store the output of a genera...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/2656#issuecomment-57920689

Is there any way to add a unit test for this?
[GitHub] spark pull request: [SPARK-1655][MLLIB] Add option for distributed...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2491#issuecomment-57920705

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21300/consoleFull) for PR 2491 at commit [`e535d8b`](https://github.com/apache/spark/commit/e535d8b7f08fb848ee5687881cbfc6e4c9e798cd).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `abstract class NaiveBayesModel extends ClassificationModel with Serializable`
[GitHub] spark pull request: [SPARK-1655][MLLIB] Add option for distributed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2491#issuecomment-57920707

Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21300/
Test FAILed.
[GitHub] spark pull request: [SPARK-3773][PySpark][Doc] Sphinx build warnin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2653#issuecomment-57920747

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21301/consoleFull) for PR 2653 at commit [`6f65661`](https://github.com/apache/spark/commit/6f656618d7a6fe3f9977f6a1fb15350577388f06).
* This patch merges cleanly.
[GitHub] spark pull request: [Minor] Trivial fix to make codes more readabl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2654#issuecomment-57920603

Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21299/
Test PASSed.
[GitHub] spark pull request: [SPARK-3792][SQL]enable JavaHiveQLSuite
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2652#issuecomment-57920598

Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21298/
Test PASSed.
[GitHub] spark pull request: [SPARK-3798][SQL] Store the output of a genera...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2656#issuecomment-57920778

Unfortunately, I haven't found a way to reproduce it deterministically.
[GitHub] spark pull request: [SPARK-1655][MLLIB] Add option for distributed...
Github user staple commented on the pull request: https://github.com/apache/spark/pull/2491#issuecomment-57920855

Again, python tests failed because the python interface is disabled in order to focus on the scala implementation first.
[GitHub] spark pull request: [SPARK-3597][Mesos] Implement `killTask`.
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2453#issuecomment-57921063

This looks good to me. I tested this patch against Mesos 0.20.1 running in Docker, with a modified version of a test from Spark's JobCancellationSuite. @tnachen commented on this over at https://github.com/apache/spark/pull/1940#issuecomment-56246740: "On Mesos side if you call killTask on a non-existing Task all you get is a LOG(WARNING)."
[GitHub] spark pull request: [SPARK-3765][Doc] add testing with sbt to docs
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2629#issuecomment-57921350

@JoshRosen, can you test this?
[GitHub] spark pull request: [SPARK-3765][Doc] add testing with sbt to docs
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2629#issuecomment-57921539

Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-3765][Doc] add testing with sbt to docs
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2629#issuecomment-57921605

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21302/consoleFull) for PR 2629 at commit [`fd9cf29`](https://github.com/apache/spark/commit/fd9cf297865e4e5bd5ba375b56094c68beb7287a).
* This patch merges cleanly.