[GitHub] spark pull request: [SPARK-3781] code format and little improvemen...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2734#issuecomment-58849017 Hi @shijinkui, After some discussion, we've decided that we'd like to avoid merging pull requests that make large, sweeping style changes/improvements, since these changes tend to create maintenance headaches for us by making `git blame` less useful and creating merge-conflicts when backporting to maintenance branches. However, we'd be open to automatic style checks if they can be conditionally applied only to new / modified code (see https://issues.apache.org/jira/browse/SPARK-3849 for more details). In the meantime, do you mind closing this pull request? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3873] [build] Add style checker to enfo...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2757#issuecomment-58849258 Hi @vanzin, We'd like to avoid making large refactorings for style, since these changes tend to create merge-conflicts when backporting to maintenance branches and make `git blame` significantly less useful. However, we'd be open to automatic style checks if they can be enforced only for new code (see https://issues.apache.org/jira/browse/SPARK-3849 for more details). In the meantime, do you mind closing this pull request? Thanks!
[GitHub] spark pull request: Add echo Run streaming tests ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2778#issuecomment-58849321 LGTM; thanks!
[GitHub] spark pull request: [SPARK-3854] Scala style: require spaces befor...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2761#issuecomment-58849280 Hi @sarutak, We'd like to avoid making large refactorings for style, since these changes tend to create merge-conflicts when backporting to maintenance branches and make `git blame` significantly less useful. However, we'd be open to automatic style checks if they can be enforced only for new code (see https://issues.apache.org/jira/browse/SPARK-3849 for more details). In the meantime, do you mind closing this pull request? Thanks!
[GitHub] spark pull request: Add echo Run streaming tests ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2778
[GitHub] spark pull request: [SPARK-3869] ./bin/spark-class miss Java versi...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2725#issuecomment-58849438 Jenkins, add to whitelist.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-58849802 @zhzhan @scwf - I think this should be okay now for protobuf. We made some other changes this week that update the Akka protobuf dependency from 2.4 to 2.5, so Spark now uses protobuf 2.5 throughout. Mind rebasing this? I think the protobuf issue will go away.
[GitHub] spark pull request: [SPARK-3899][Doc]fix wrong links in streaming ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2749#issuecomment-58849985 Hmm, but #implementing-and-using-a-custom-actor-based-receiver is not a valid link. Sorry, I didn't get you; can you explain more?
[GitHub] spark pull request: [SPARK-3869] ./bin/spark-class miss Java versi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2725#issuecomment-58850138 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21677/ Test FAILed.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-58850172 OK, thanks for that; I will also test it in https://github.com/apache/spark/pull/2685
[GitHub] spark pull request: [SPARK-3899][Doc]fix wrong links in streaming ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2749#issuecomment-58850273 Oh, I meant that you could have linked the page like this, so that the link jumps to the Akka-specific section: https://spark.apache.org/docs/latest/streaming-custom-receivers.html#implementing-and-using-a-custom-actor-based-receiver.
[GitHub] spark pull request: Bug Fix: without unpersist method in RandomFor...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2775#issuecomment-58850468 test this please
[GitHub] spark pull request: [SPARK-3869] ./bin/spark-class miss Java versi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2725#issuecomment-58850691 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/354/consoleFull) for PR 2725 at commit [`f894ebd`](https://github.com/apache/spark/commit/f894ebd0b6799af4037134fadf6c515af09181fc). * This patch merges cleanly.
[GitHub] spark pull request: Bug Fix: without unpersist method in RandomFor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2775#issuecomment-58851060 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21678/consoleFull) for PR 2775 at commit [`815d543`](https://github.com/apache/spark/commit/815d543606efb0f90da8c5a1c87b3e12924d25a7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3899][Doc]fix wrong links in streaming ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2749#issuecomment-58851126 Got it; using ```streaming-custom-receivers.html#implementing-and-using-a-custom-actor-based-receiver``` here jumps to the Akka-specific section. :)
[GitHub] spark pull request: [SPARK-3812] [BUILD] Adapt maven build to publ...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2673#issuecomment-58851300 @pwendell I don't see an easy way with the Maven shade plugin either; do you? One way is to include a fake dependency and then ask it to shade that across all artifacts, but I somehow felt this is more invasive.
[GitHub] spark pull request: [SPARK-3899][Doc]fix wrong links in streaming ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2749#issuecomment-58851418 Updated.
[GitHub] spark pull request: [SPARK-3899][Doc]fix wrong links in streaming ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2749#issuecomment-58851490 LGTM. Thanks!
[GitHub] spark pull request: [SPARK-3899][Doc]fix wrong links in streaming ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2749
[GitHub] spark pull request: [SPARK-3854] Scala style: require spaces befor...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2761#issuecomment-58852435 O.K. I'll close this PR for now.
[GitHub] spark pull request: [SPARK-3921] Fix CoarseGrainedExecutorBackend'...
GitHub user aarondav opened a pull request: https://github.com/apache/spark/pull/2779 [SPARK-3921] Fix CoarseGrainedExecutorBackend's arguments for Standalone mode The goal of this patch is to fix the swapped arguments in standalone mode, which was caused by https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153. More details can be found in the JIRA: [SPARK-3921](https://issues.apache.org/jira/browse/SPARK-3921) Tested in Standalone mode, but not in Mesos. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aarondav/spark fix-standalone Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2779.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2779 commit 9d703feb03b3201d73012cb1081b8d20d7ba4ac1 Author: Aaron Davidson aa...@databricks.com Date: 2014-10-13T06:26:46Z [SPARK-3921] Fix CoarseGrainedExecutorBackend's arguments for Standalone mode The goal of this patch is to fix the swapped arguments in standalone mode, which was caused by https://github.com/apache/spark/commit/79e45c9323455a51f25ed9acd0edd8682b4bbb88#diff-79391110e9f26657e415aa169a004998R153. More details can be found in the JIRA: [SPARK-3921](https://issues.apache.org/jira/browse/SPARK-3921) Tested in Standalone mode, but not in Mesos.
[GitHub] spark pull request: [SPARK-3854] Scala style: require spaces befor...
Github user sarutak closed the pull request at: https://github.com/apache/spark/pull/2761
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/2780 [SPARK-3207][MLLIB]Choose splits for continuous features in DecisionTree more adaptively DecisionTree splits on continuous features by choosing an array of values from a subsample of the data. Currently, it does not check for identical values in the subsample, so it could end up having multiple copies of the same split. In this PR, we choose splits for a continuous feature in 3 steps: 1. Sort the sample values for this feature. 2. Count the occurrences of each distinct value. 3. Iterate over the value-count array computed in step 2 to choose splits. After finding the splits, `numSplits` and `numBins` in metadata will be updated. CC: @mengxr @manishamde @jkbradley, please help me review this, thanks. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chouqin/spark dt-findsplits Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2780.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2780 commit af7cb7962ff9f5041981ea5e4fe2465eceb6f0e5 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-09T11:47:09Z Choose splits for continuous features in DecisionTree more adaptively commit 365282375ce3d1a26664695893ebad13d1b3bc47 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-09T12:40:55Z fix bug commit 0cd744a4e710463591324b36f01d9dab028e79ef Author: liqi liqiping1...@gmail.com Date: 2014-10-10T04:33:24Z fix bug commit 1b25a3530f5429b245a50d4c706ebad2d2875726 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-11T01:36:38Z Merge branch 'master' of https://github.com/apache/spark into dt-findsplits commit 9e7138e09dfe27c41d8d20ba6fcf9cb59d64a46b Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T01:11:31Z Merge branch 'dt-findsplits' of https://github.com/chouqin/spark into dt-findsplits Conflicts:
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala commit 8f46af6b57149fefd1e32120947ebe3291730af0 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T03:48:42Z add comments and unit test commit 369f812a9ffce7dd10fc37e4a937158f2fa93e1c Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T03:53:07Z fix style commit c339a614362f3045ee95975f99b6fde884657d48 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T04:31:23Z fix bug commit 2a8267ab9bd8853fa1f638b69373dbbbf0d1a329 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T04:43:44Z fix bug commit af6dc974258a9b07020e233e16cbbb584f501122 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T05:03:43Z fix bug commit ab303a4ab1931b0c1a90ae2c3923f25d8f266178 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T06:10:33Z fix bug commit f69f47f25f292995aa8710da6384bf631787711a Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T06:12:10Z fix bug commit 092efcb89c4113eba8374e47587c6f1272aa7125 Author: Qiping Li liqiping1...@gmail.com Date: 2014-10-13T06:31:58Z fix bug
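The three-step split-selection procedure described in the PR above can be sketched as a standalone function. This is a simplified illustration, not the actual DecisionTree change: the `chooseSplits` name and the even-stride heuristic in step 3 are hypothetical, but the sort/count/iterate structure follows the steps listed in the PR description, and it avoids duplicate splits by construction since each distinct value appears at most once.

```scala
// Hedged sketch of adaptive split selection for one continuous feature.
object SplitChooser {
  /** Choose up to maxSplits distinct thresholds from sampled feature values. */
  def chooseSplits(sample: Seq[Double], maxSplits: Int): Seq[Double] = {
    // Step 1: sort the sampled values for this feature.
    val sorted = sample.sorted
    // Step 2: count occurrences of each distinct value, preserving sorted order.
    val valueCounts: Seq[(Double, Int)] =
      sorted.foldLeft(List.empty[(Double, Int)]) {
        case ((v, c) :: tail, x) if v == x => (v, c + 1) :: tail
        case (acc, x)                      => (x, 1) :: acc
      }.reverse
    if (valueCounts.length <= maxSplits) {
      // Few distinct values: every distinct value can serve as a split,
      // so numSplits effectively shrinks to the distinct-value count.
      valueCounts.map(_._1)
    } else {
      // Step 3: walk the value-count array, emitting a split roughly every
      // (sample size / (maxSplits + 1)) accumulated samples.
      val stride = sample.length.toDouble / (maxSplits + 1)
      val splits = scala.collection.mutable.ArrayBuffer.empty[Double]
      var seen = 0
      var target = stride
      for ((value, count) <- valueCounts) {
        seen += count
        if (seen >= target && splits.length < maxSplits) {
          splits += value
          target += stride
        }
      }
      splits.toSeq
    }
  }
}
```

Because counts are aggregated per distinct value before thresholds are chosen, repeated values in the subsample can never yield duplicate splits, which is the bug the PR addresses.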
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58853083 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58853271 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21680/consoleFull) for PR 2780 at commit [`092efcb`](https://github.com/apache/spark/commit/092efcb89c4113eba8374e47587c6f1272aa7125). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3921] Fix CoarseGrainedExecutorBackend'...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2779#issuecomment-58853441 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21679/ Test FAILed.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-58853725 Hi @zhzhan and @scwf - I made some changes to the build to simplify it a bit. I made a PR into your branch. I tested it locally compiling for 0.12 and 0.13, but it would be good if you tested it as well to make sure it works. https://github.com/zhzhan/spark/pull/1/files
[GitHub] spark pull request: [SPARK-3921] Fix CoarseGrainedExecutorBackend'...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/2779#issuecomment-58853884 Jenkins, retest this please.
[GitHub] spark pull request: [Spark 3922] Refactor spark-core to use Utils....
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/2781 [Spark 3922] Refactor spark-core to use Utils.UTF_8 A global UTF-8 constant is very helpful for handling encoding problems when converting between String and bytes. There are several solutions here: 1. Add `val UTF_8 = Charset.forName("UTF-8")` to Utils.scala 2. java.nio.charset.StandardCharsets.UTF_8 (requires JDK 7) 3. io.netty.util.CharsetUtil.UTF_8 4. com.google.common.base.Charsets.UTF_8 5. org.apache.commons.lang.CharEncoding.UTF_8 6. org.apache.commons.lang3.CharEncoding.UTF_8 IMO, I prefer option 1) because people can find it easily. This is a PR for option 1) and only fixes Spark Core. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-3922 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2781.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2781 commit 65b6b8ef68aa71ac45d292eefd7b3e4de0de3bf8 Author: zsxwing zsxw...@gmail.com Date: 2014-10-13T06:53:11Z Add UTF_8 to Utils commit 80f4af8812d3f36a3807e574478a10511916dfbc Author: zsxwing zsxw...@gmail.com Date: 2014-10-13T06:53:26Z Refactor spark-core to use Utils.UTF_8
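Option 1 from the list above amounts to something like the following sketch (the `CharsetUtils` object name is illustrative, not the actual Spark `Utils` change). The point of a shared constant is that every String-to-bytes conversion names UTF-8 explicitly rather than falling back to the platform default encoding:

```scala
import java.nio.charset.Charset

object CharsetUtils {
  // Charset.forName("UTF-8") succeeds on every JVM, since UTF-8 is one of
  // the charsets every Java platform is required to support.
  val UTF_8: Charset = Charset.forName("UTF-8")
}

object Demo extends App {
  // Round-trip a non-ASCII string through bytes with an explicit charset.
  val bytes = "héllo".getBytes(CharsetUtils.UTF_8)
  val roundTripped = new String(bytes, CharsetUtils.UTF_8)
  println(roundTripped)
}
```

Passing a `Charset` instance (rather than the string `"UTF-8"`) also avoids a checked `UnsupportedEncodingException` at each call site, which is part of why a shared constant reads more cleanly than repeating the lookup.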
[GitHub] spark pull request: [Spark 3922] Refactor spark-core to use Utils....
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/2781#issuecomment-58854034 /cc @rxin, @JoshRosen
[GitHub] spark pull request: [Spark 3922] Refactor spark-core to use Utils....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2781#issuecomment-58854193 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3921] Fix CoarseGrainedExecutorBackend'...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2779#issuecomment-58854400 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21682/consoleFull) for PR 2779 at commit [`9d703fe`](https://github.com/apache/spark/commit/9d703feb03b3201d73012cb1081b8d20d7ba4ac1). * This patch merges cleanly.
[GitHub] spark pull request: [Spark 3922] Refactor spark-core to use Utils....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2781#issuecomment-58854409 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21681/consoleFull) for PR 2781 at commit [`80f4af8`](https://github.com/apache/spark/commit/80f4af8812d3f36a3807e574478a10511916dfbc). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2241#issuecomment-58854626 Note @scwf there are some TODOs in there that need to be addressed in your patch for JDBC.
[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2753#discussion_r18755797

--- Diff: core/src/main/scala/org/apache/spark/network/netty/NettyBlockFetcher.scala ---

```
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.netty
+
+import java.nio.ByteBuffer
+import java.util
+
+import org.apache.spark.Logging
+import org.apache.spark.network.BlockFetchingListener
+import org.apache.spark.serializer.Serializer
+import org.apache.spark.network.buffer.ManagedBuffer
+import org.apache.spark.network.client.{RpcResponseCallback, ChunkReceivedCallback, SluiceClient}
+import org.apache.spark.storage.BlockId
+import org.apache.spark.util.Utils
+
+/**
+ * Responsible for holding the state for a request for a single set of blocks. This assumes that
+ * the chunks will be returned in the same order as requested, and that there will be exactly
+ * one chunk per block.
+ *
+ * Upon receipt of any block, the listener will be called back. Upon failure part way through,
+ * the listener will receive a failure callback for each outstanding block.
+ */
+class NettyBlockFetcher(
+    serializer: Serializer,
+    client: SluiceClient,
+    blockIds: Seq[String],
+    listener: BlockFetchingListener)
+  extends Logging {
+
+  require(blockIds.nonEmpty)
+
+  val ser = serializer.newInstance()
+
+  var streamHandle: ShuffleStreamHandle = _
+
+  val chunkCallback = new ChunkReceivedCallback {
+    // On receipt of a chunk, pass it upwards as a block.
+    def onSuccess(chunkIndex: Int, buffer: ManagedBuffer): Unit = Utils.logUncaughtExceptions {
+      buffer.retain()
+      listener.onBlockFetchSuccess(blockIds(chunkIndex), buffer)
+    }
+
+    // On receipt of a failure, fail every block from chunkIndex onwards.
+    def onFailure(chunkIndex: Int, e: Throwable): Unit = {
+      blockIds.drop(chunkIndex).foreach { blockId =>
+        listener.onBlockFetchFailure(blockId, e);
+      }
+    }
+  }
+
+  // Send the RPC to open the given set of blocks. This will return a ShuffleStreamHandle.
+  client.sendRpc(ser.serialize(OpenBlocks(blockIds.map(BlockId.apply))).array(),
```

--- End diff --

does this even need to be a class on its own? if yes, maybe have a separate init method so we don't get a weird object ctor failure
[GitHub] spark pull request: [Spark 3922] Refactor spark-core to use Utils....
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2781#issuecomment-58855147 I vote for `com.google.common.base.Charsets.UTF_8` now, and `java.nio.charset.StandardCharsets.UTF_8` when Spark moves to Java 7+. No need to define this constant yet again.
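As context for the suggestion above: on Java 7+ the constant already lives in the standard library, so no project-local definition is needed. A minimal sketch in plain Java (`Utf8Constant` is an illustrative class name, not Spark code; Guava's `Charsets.UTF_8` resolves to an equal `Charset`, shown here via the standard lookup to stay dependency-free):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8Constant {
    public static void main(String[] args) {
        // Java 7+: the standard constant, no third-party dependency needed.
        Charset utf8 = StandardCharsets.UTF_8;

        // Charset.forName resolves the same canonical charset;
        // Charset.equals compares canonical names, so this prints true.
        Charset viaName = Charset.forName("UTF-8");
        System.out.println(utf8.equals(viaName)); // true

        // Typical usage: no more getBytes("UTF-8") with a checked exception.
        byte[] bytes = "spark".getBytes(utf8);
        System.out.println(new String(bytes, utf8)); // spark
    }
}
```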
[GitHub] spark pull request: [SPARK-3869] ./bin/spark-class miss Java versi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2725#issuecomment-58855812 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/354/consoleFull) for PR 2725 at commit [`f894ebd`](https://github.com/apache/spark/commit/f894ebd0b6799af4037134fadf6c515af09181fc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Bug Fix: without unpersist method in RandomFor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2775#issuecomment-58856471 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21678/ Test PASSed.
[GitHub] spark pull request: [SPARK-3921] Fix CoarseGrainedExecutorBackend'...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2779#issuecomment-58856481 LGTM
[GitHub] spark pull request: Bug Fix: without unpersist method in RandomFor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2775#issuecomment-58856468 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21678/consoleFull) for PR 2775 at commit [`815d543`](https://github.com/apache/spark/commit/815d543606efb0f90da8c5a1c87b3e12924d25a7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-58856535 @pwendell, I am resolving the conflicts; are there other TODOs here?
[GitHub] spark pull request: [SPARK-1405][MLLIB] topic modeling on Graphx
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-58856594 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21683/consoleFull) for PR 2388 at commit [`1e2485c`](https://github.com/apache/spark/commit/1e2485c05c77dbca4332b9af616c27c45f2f5e32). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58857589 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21680/consoleFull) for PR 2780 at commit [`092efcb`](https://github.com/apache/spark/commit/092efcb89c4113eba8374e47587c6f1272aa7125). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark 3922] Refactor spark-core to use Utils....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2781#issuecomment-58859149 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21681/consoleFull) for PR 2781 at commit [`80f4af8`](https://github.com/apache/spark/commit/80f4af8812d3f36a3807e574478a10511916dfbc). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark 3922] Refactor spark-core to use Utils....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2781#issuecomment-58859154 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21681/ Test FAILed.
[GitHub] spark pull request: [SPARK-3921] Fix CoarseGrainedExecutorBackend'...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2779#issuecomment-58859175 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21682/ Test FAILed.
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-2426: Quadratic Minimizati...
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/2705#issuecomment-58862691 @chanda what's your problem formulation? min x'Hx + c'x s.t. Ax = B. You can write it as min x'Hx + c'x + g(z) s.t. Ax = B + z, where g(z) is the indicator function enforcing z = 0. Now we can solve this using QuadraticMinimizer.scala. Let me know if this formulation makes sense and I will point you to the rest of the steps. By the way, I am working on adding H as a sparse matrix, but it will take some time since we need LDL factorization and that's in the ECOS code base. Once I make the ECOS jar available we should be able to use LDL from there. Is your matrix sparse, since you keep a sparse kernel for SVM and not all entries from the RBF? For now I would say: use the dense formulation, partition your kernel matrix, solve a QP on each worker, and then combine the results using treeAggregate on the master.
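The splitting described in the comment above can be typeset as follows (a sketch of what the comment states, with g the indicator of the set {z = 0}):

```latex
\begin{aligned}
&\min_{x}\; x^\top H x + c^\top x \quad \text{s.t.}\; Ax = B \\
\Longleftrightarrow\;
&\min_{x,\,z}\; x^\top H x + c^\top x + g(z) \quad \text{s.t.}\; Ax = B + z, \\
&\text{where } g(z) = \begin{cases} 0 & \text{if } z = 0, \\ +\infty & \text{otherwise.} \end{cases}
\end{aligned}
```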
[GitHub] spark pull request: [SPARK-1405][MLLIB] topic modeling on Graphx
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-58862844 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21683/consoleFull) for PR 2388 at commit [`1e2485c`](https://github.com/apache/spark/commit/1e2485c05c77dbca4332b9af616c27c45f2f5e32). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class TopicModelingKryoRegistrator extends KryoRegistrator ` * `class StreamingContext(object):` * `class DStream(object):` * `class TransformedDStream(DStream):` * `class TransformFunction(object):` * `class TransformFunctionSerializer(object):`
[GitHub] spark pull request: [SPARK-1405][MLLIB] topic modeling on Graphx
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2388#issuecomment-58862848 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21683/ Test PASSed.
[GitHub] spark pull request: [MLLIB] [WIP] SPARK-2426: Quadratic Minimizati...
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/2705#issuecomment-58863216 @Chanda breeze's sparse matrix does not solve your problem, since breeze does not have a sparse LDL; the ECOS jar, however, has the ldl and amd native libraries, which we will use for sparse LDL.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58865080 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58865274 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21684/consoleFull) for PR 2780 at commit [`9e64699`](https://github.com/apache/spark/commit/9e64699f67e64424f877aea8fc1e6282e32c8595). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58865777 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21685/consoleFull) for PR 2780 at commit [`9e64699`](https://github.com/apache/spark/commit/9e64699f67e64424f877aea8fc1e6282e32c8595). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58865951 @jkbradley, RandomForestSuite fails because the original splits are a better fit for the training data (for example, 899.5 is a split threshold, which is close to 900). I think this PR's method of choosing splits is more reasonable than the original method, in that the first threshold found by the original method will be the average value of the first two `featureSamples`. For example, if `featureSamples` is `Array(0, 1, 2, 3, 4, 5)`, finding a split point using the original method will return 0.5, while this PR's method will return 2.
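The difference chouqin describes can be sketched outside MLlib. This is a hypothetical illustration only: `midpointSplit` and `quantileSplit` are made-up names, not the actual decision-tree code, and the 0.4 fraction is chosen just to reproduce the numbers from the comment (0.5 vs. 2).

```java
import java.util.Arrays;

public class SplitSketch {
    // Midpoint-style approach (as described for the original method):
    // average of the first two sorted samples, which can land far in the
    // tail of the distribution.
    static double midpointSplit(double[] sorted) {
        return (sorted[0] + sorted[1]) / 2.0;
    }

    // Quantile-style approach (as described for the PR): pick an actual
    // sample at a fixed fraction of the sorted data, so the threshold
    // lies inside the sample distribution.
    static double quantileSplit(double[] sorted, double fraction) {
        int idx = (int) (fraction * (sorted.length - 1));
        return sorted[idx];
    }

    public static void main(String[] args) {
        double[] featureSamples = {0, 1, 2, 3, 4, 5};
        Arrays.sort(featureSamples);
        System.out.println(midpointSplit(featureSamples));      // 0.5
        System.out.println(quantileSplit(featureSamples, 0.4)); // 2.0
    }
}
```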
[GitHub] spark pull request: [SPARK-3814][SQL] Bitwise does not work in H...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2736#issuecomment-58872032 Thank you @scwf, I have created a new PR since this one has merge conflicts. It would not be neat if I rebased and pushed to the old PR, because it would show all the changed files that were merged while rebasing.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58872054 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21684/ Test FAILed.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58872046 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21684/consoleFull) for PR 2780 at commit [`9e64699`](https://github.com/apache/spark/commit/9e64699f67e64424f877aea8fc1e6282e32c8595). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58872500 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21685/ Test FAILed.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58872495 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21685/consoleFull) for PR 2780 at commit [`9e64699`](https://github.com/apache/spark/commit/9e64699f67e64424f877aea8fc1e6282e32c8595). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/2471#discussion_r18763243 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -214,6 +224,43 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis } --- End diff -- @vanzin sorry, I do not understand what you mean. Do you mean that we should not throw Throwable?
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58876511 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21686/consoleFull) for PR 2780 at commit [`d353596`](https://github.com/apache/spark/commit/d3535963cf69bf36e7b059f2c7fd6ee148892135). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3911] [SQL] HiveSimpleUdf can not be op...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2771#discussion_r18764783

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala ---

```
@@ -99,6 +99,16 @@ private[hive] case class HiveSimpleUdf(functionClassName: String, children: Seq[
   @transient
   protected lazy val arguments = children.map(c => toInspector(c.dataType)).toArray
 
+  @transient
+  protected lazy val isUDFDeterministic = {
+    val udfType = function.getClass().getAnnotation(classOf[HiveUDFType])
+    (udfType != null && udfType.deterministic())
```

--- End diff --

Nit: redundant parentheses.
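The pattern in the diff above, reading a runtime annotation and treating a UDF as non-deterministic when the annotation is absent, can be sketched in plain Java. This is a hypothetical illustration: the `UdfType` annotation and the sample classes below are made-up stand-ins for Hive's `HiveUDFType`, not Hive code.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationCheck {
    // Hypothetical stand-in for Hive's HiveUDFType annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @interface UdfType {
        boolean deterministic() default true;
    }

    @UdfType(deterministic = true)
    static class UpperUdf {}

    // No annotation at all: getAnnotation returns null.
    static class UnannotatedUdf {}

    // Mirrors the check in the diff: deterministic only if the annotation
    // is present AND its flag is set.
    static boolean isDeterministic(Class<?> udfClass) {
        UdfType t = udfClass.getAnnotation(UdfType.class);
        return t != null && t.deterministic();
    }

    public static void main(String[] args) {
        System.out.println(isDeterministic(UpperUdf.class));       // true
        System.out.println(isDeterministic(UnannotatedUdf.class)); // false
    }
}
```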
[GitHub] spark pull request: [SPARK-3911] [SQL] HiveSimpleUdf can not be op...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2771#issuecomment-58881732 This LGTM. Would you mind adding some tests? Probably in `ExpressionOptimizationSuite`. Thanks.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58883501 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21686/consoleFull) for PR 2780 at commit [`d353596`](https://github.com/apache/spark/commit/d3535963cf69bf36e7b059f2c7fd6ee148892135). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-58883509 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21686/ Test PASSed.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18765864

--- Diff: examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala ---

```
@@ -62,6 +62,16 @@ object HiveFromSpark {
     println("Result of SELECT *:")
     sql("SELECT * FROM records r JOIN src s ON r.key = s.key").collect().foreach(println)
 
+    // Write out an RDD as an ORC file.
+    rdd.saveAsOrcFile("pair.orc")
+
+    // Read in the ORC file. ORC files are self-describing so the schema is preserved.
+    val orcFile = hiveContext.orcFile("pair.orc")
+
+    // These files can also be registered as tables.
+    orcFile.registerTempTable("orcFile")
+    sql("SELECT * FROM records r JOIN orcFile s ON r.key = s.key").collect().foreach(println)
+
```

--- End diff --

I think test cases and documentation are better places to illustrate API usage. This example is meant to illustrate how Spark SQL cooperates with Hive, and with this PR we don't need Hive (Metastore) to access ORC files.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18765873 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -114,6 +114,22 @@ case class InsertIntoTable( } } +case class InsertIntoOrcTable( +table: LogicalPlan, --- End diff -- Why is the type of `table` `LogicalPlan` rather than `OrcRelation`?
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766058 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala --- @@ -128,6 +144,13 @@ case class WriteToFile( override def output = child.output } +case class WriteToOrcFile( --- End diff -- I think we should rename the original `WriteToFile` class to `WriteToParquetFile`. That name was too general and rather confusing in the first place, and it becomes even more confusing after ORC is supported.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766091 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala --- @@ -77,6 +77,18 @@ private[sql] trait SchemaRDDLike { } /** + * Saves the contents of this `SchemaRDD` as a orc file, preserving the schema. Files that --- End diff -- Please use ORC instead of orc.
[GitHub] spark pull request: [SPARK-3580] add 'partitions' property to PySp...
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2478#issuecomment-58884300 @JoshRosen @pwendell any further comment on this?
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766361 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -504,19 +505,41 @@ private[parquet] object FileSystemHelper { fs.listStatus(path).map(_.getPath) } -/** - * Finds the maximum taskid in the output file names at the given path. - */ - def findMaxTaskId(pathStr: String, conf: Configuration): Int = { + /** + * List files with special extension + */ + def listFiles(origPath: Path, conf: Configuration, extension: String): Seq[Path] = { +val fs = origPath.getFileSystem(conf) +if (fs == null) { + throw new IllegalArgumentException( +s"OrcTableOperations: Path $origPath is incorrectly formatted") --- End diff -- This helper class is not specific to ORC support; please reword the exception message.
[GitHub] spark pull request: [SPARK-3818] Graph coarsening
Github user uncleGen commented on the pull request: https://github.com/apache/spark/pull/2679#issuecomment-58885039 @ankurdave I have a question, though not about this patch. In the [GraphX OSDI paper](http://ankurdave.com/dl/graphx-osdi14.pdf), I see that you implemented a memory-based shuffle manager, but I cannot find it in any release. Is there a concern that kept it out? I am actually working on a memory-based shuffle manager myself. Please give me some advice, thank you!
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766443 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -504,19 +505,41 @@ private[parquet] object FileSystemHelper { fs.listStatus(path).map(_.getPath) } -/** - * Finds the maximum taskid in the output file names at the given path. - */ - def findMaxTaskId(pathStr: String, conf: Configuration): Int = { + /** + * List files with special extension + */ + def listFiles(origPath: Path, conf: Configuration, extension: String): Seq[Path] = { +val fs = origPath.getFileSystem(conf) +if (fs == null) { + throw new IllegalArgumentException( +s"OrcTableOperations: Path $origPath is incorrectly formatted") +} +val path = origPath.makeQualified(fs) +if (fs.exists(path) && fs.getFileStatus(path).isDir) { + fs.listStatus(path).map(_.getPath).filter(p => p.getName.endsWith(extension)) --- End diff -- I think `FileSystem.globStatus` can be more convenient and more efficient here.
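The filtering logic under review can be illustrated with a small local-filesystem analogue (a sketch using plain `java.io` rather than Hadoop's `FileSystem` API; the helper name is made up for this sketch). The reviewer's point is that Hadoop's `FileSystem.globStatus` would push the `*.extension` match down into the listing call instead of listing everything and filtering afterwards:

```scala
import java.io.File

// Local-filesystem analogue of the `listFiles` helper being reviewed:
// return the files directly under `dir` whose names end in ".extension".
def listFilesWithExtension(dir: File, extension: String): Seq[File] = {
  if (dir.exists && dir.isDirectory) {
    // With Hadoop one could instead write something like
    // fs.globStatus(new Path(dir, s"*.$extension")) and skip the filter step.
    dir.listFiles.toSeq.filter(_.getName.endsWith("." + extension))
  } else {
    // Mirrors the reviewed code: a missing or non-directory path yields no files.
    Seq.empty
  }
}
```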
[GitHub] spark pull request: [SPARK-3562]Periodic cleanup event logs
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2471#issuecomment-58885115 @mattf I understand what you're trying to say, but think about it in context. As I said above, the "when to poll the file system" code is the most trivial part of this change. The only advantage of using cron for that is that you'd have more scheduling options - e.g., absolute times instead of a period. To achieve that, you'd be considerably complicating everything else. You'd be creating a new command line tool in Spark that needs to deal with command line arguments, be documented, and handle security settings (e.g. kerberos) - so it's more burden for everybody, maintainers of the code and admins alike. And all that for a trivial and, I'd say, not really needed gain in functionality. @aw-altiscale pointed me to camus, which has a nearly separable component: https://github.com/linkedin/camus/tree/master/camus-sweeper My objection to this is about the architecture and responsibilities of the Spark components. I don't object to having the functionality: I think you should implement the ability to sweep/rotate/clean log files in HDFS, but not as part of a Spark process.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766608 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -504,19 +505,41 @@ private[parquet] object FileSystemHelper { fs.listStatus(path).map(_.getPath) } -/** - * Finds the maximum taskid in the output file names at the given path. - */ - def findMaxTaskId(pathStr: String, conf: Configuration): Int = { + /** + * List files with special extension + */ + def listFiles(origPath: Path, conf: Configuration, extension: String): Seq[Path] = { +val fs = origPath.getFileSystem(conf) +if (fs == null) { + throw new IllegalArgumentException( +s"OrcTableOperations: Path $origPath is incorrectly formatted") +} +val path = origPath.makeQualified(fs) +if (fs.exists(path) && fs.getFileStatus(path).isDir) { + fs.listStatus(path).map(_.getPath).filter(p => p.getName.endsWith(extension)) +} else { + Seq.empty +} + } + + /** + * Finds the maximum taskid in the output file names at the given path. + */ + def findMaxTaskId(pathStr: String, conf: Configuration, extension: String): Int = { val files = FileSystemHelper.listFiles(pathStr, conf) -// filename pattern is part-r-<int>.parquet -val nameP = new scala.util.matching.Regex("part-r-(\\d{1,}).parquet", "taskid") +// filename pattern is part-r-<int>.$extension +val nameP = extension match { + case "parquet" => new scala.util.matching.Regex("part-r-(\\d{1,}).parquet", "taskid") + case "orc" => new scala.util.matching.Regex("part-r-(\\d{1,}).orc", "taskid") + case _ => +sys.error(s"ERROR: unsupported extension: $extension") +} --- End diff -- Move this `match` expression to the beginning of this function, since `.listFiles` can be expensive.
Also this expression can be simplified to:

```scala
require(Seq("orc", "parquet").contains(extension), s"Unsupported extension: $extension")
val nameP = new Regex(s"part-r-(\\d{1,}).$extension", "taskid")
```
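The simplification suggested above can be sketched as a small runnable snippet (string quotes restored from the garbled archive text; `maxTaskId` here is an illustrative helper for the sketch, not Spark's actual method, and it scans plain file names rather than a Hadoop path):

```scala
import scala.util.matching.Regex

// Build the task-id regex once per extension, failing fast on unsupported ones,
// so the (potentially expensive) file listing is never done for a bad extension.
def taskIdRegex(extension: String): Regex = {
  require(Seq("orc", "parquet").contains(extension), s"Unsupported extension: $extension")
  // Output files are named like part-r-00042.orc / part-r-00042.parquet.
  new Regex(s"part-r-(\\d{1,})\\.$extension", "taskid")
}

// Find the maximum task id among the given output file names; 0 if none match.
def maxTaskId(fileNames: Seq[String], extension: String): Int = {
  val nameP = taskIdRegex(extension)
  fileNames.collect { case nameP(taskid) => taskid.toInt }
    .reduceOption(_ max _)
    .getOrElse(0)
}
```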
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766670 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -121,6 +122,48 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) { } /** + * Loads a Orc file, returning the result as a [[SchemaRDD]]. + * + * @group userf + */ + def orcFile(path: String): SchemaRDD = +new SchemaRDD(this, orc.OrcRelation(path, Some(sparkContext.hadoopConfiguration), this)) + + /** + * :: Experimental :: + * Creates an empty orc file with the schema of class `A`, which can be registered as a table. --- End diff -- Capitalize orc.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766697 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -121,6 +122,48 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) { } /** + * Loads a Orc file, returning the result as a [[SchemaRDD]]. + * + * @group userf + */ + def orcFile(path: String): SchemaRDD = +new SchemaRDD(this, orc.OrcRelation(path, Some(sparkContext.hadoopConfiguration), this)) + + /** + * :: Experimental :: + * Creates an empty orc file with the schema of class `A`, which can be registered as a table. + * This registered table can be used as the target of future `insertInto` operations. + * + * {{{ + * val sqlContext = new SQLContext(...) --- End diff -- Should be `HiveContext`.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766749 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -28,6 +28,7 @@ import org.apache.spark.sql.catalyst.types.StringType import org.apache.spark.sql.execution.{DescribeCommand, OutputFaker, SparkPlan} import org.apache.spark.sql.hive import org.apache.spark.sql.hive.execution._ +import org.apache.spark.sql.hive.orc.{InsertIntoOrcTable, OrcTableScan, OrcRelation} --- End diff -- Sort imported classes in this line.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18766771 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala --- @@ -221,4 +222,24 @@ private[hive] trait HiveStrategies { case _ => Nil } } + + object OrcOperations extends Strategy { +def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { + case logical.WriteToOrcFile(path, child) => +val relation = + OrcRelation.create(path, child, sparkContext.hadoopConfiguration, sqlContext) +InsertIntoOrcTable(relation, planLater(child), overwrite=true) :: Nil --- End diff -- Spaces around `=`.
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/2782 SPARK-3874, Provide stable TaskContext API You can merge this pull request into a Git repository by running: $ git pull https://github.com/ScrapCodes/spark-1 SPARK-3874/stable-tc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2782.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2782 commit ef633f5e4857400c8711ee800b01016b6bd406b2 Author: Prashant Sharma prashan...@imaginea.com Date: 2014-10-13T12:41:11Z SPARK-3874, Provide stable TaskContext API
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-58886892 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18767300 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.spark.sql.hive.orc + +import java.util.Properties +import java.io.IOException +import org.apache.hadoop.hive.ql.stats.StatsSetupConst + +import scala.collection.mutable + +import org.apache.hadoop.fs.{FileSystem, Path} +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.permission.FsAction +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector +import org.apache.hadoop.hive.ql.io.orc._ +import org.apache.hadoop.hive.ql.io.orc.OrcProto.Type.Kind + +import org.apache.spark.sql.parquet.FileSystemHelper +import org.apache.spark.sql.SQLContext +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, LeafNode} +import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, MultiInstanceRelation} +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.expressions.AttributeReference +import org.apache.spark.sql.catalyst.types._ + + --- End diff -- Remove redundant new line.
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-58887166 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21687/consoleFull) for PR 2782 at commit [`ef633f5`](https://github.com/apache/spark/commit/ef633f5e4857400c8711ee800b01016b6bd406b2). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-58887515 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21687/consoleFull) for PR 2782 at commit [`ef633f5`](https://github.com/apache/spark/commit/ef633f5e4857400c8711ee800b01016b6bd406b2). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public abstract class TaskContext implements Serializable `
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-58887519 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21687/ Test FAILed.
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-5896 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18768084 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.spark.sql.hive.orc + +import java.util.Properties +import java.io.IOException +import org.apache.hadoop.hive.ql.stats.StatsSetupConst + +import scala.collection.mutable + +import org.apache.hadoop.fs.{FileSystem, Path} +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.permission.FsAction +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector +import org.apache.hadoop.hive.ql.io.orc._ +import org.apache.hadoop.hive.ql.io.orc.OrcProto.Type.Kind + +import org.apache.spark.sql.parquet.FileSystemHelper +import org.apache.spark.sql.SQLContext +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, LeafNode} +import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, MultiInstanceRelation} +import org.apache.spark.sql.catalyst.expressions.Attribute +import org.apache.spark.sql.catalyst.expressions.AttributeReference +import org.apache.spark.sql.catalyst.types._ + + +private[sql] case class OrcRelation( +path: String, +@transient conf: Option[Configuration], +@transient sqlContext: SQLContext, +partitioningAttributes: Seq[Attribute] = Nil) + extends LeafNode with MultiInstanceRelation { + self: Product = + + val prop: Properties = new Properties + + var rowClass: Class[_] = null + + val fieldIdCache: mutable.Map[String, Int] = new mutable.HashMap[String, Int] + + val fieldNameTypeCache: mutable.Map[String, String] = new mutable.HashMap[String, String] + + override val output = orcSchema + + override lazy val statistics = Statistics(sizeInBytes = sqlContext.defaultSizeInBytes) + + def orcSchema: Seq[Attribute] = { +val origPath = new Path(path) +val reader = OrcFileOperator.readMetaData(origPath, conf) + +if (null != reader) { + val inspector = reader.getObjectInspector.asInstanceOf[StructObjectInspector] + val fields = inspector.getAllStructFieldRefs + + if (fields.size() == 0) { +return Seq.empty + } + + val totalType = reader.getTypes.get(0) + val keys = totalType.getFieldNamesList 
+ val types = totalType.getSubtypesList + log.info("field name is {}", keys) + log.info("types is {}", types) + + val colBuff = new StringBuilder + val typeBuff = new StringBuilder + for (i <- 0 until fields.size()) { +val fieldName = fields.get(i).getFieldName +val typeName = fields.get(i).getFieldObjectInspector.getTypeName +colBuff.append(fieldName) +fieldNameTypeCache.put(fieldName, typeName) +fieldIdCache.put(fieldName, i) +colBuff.append(",") +typeBuff.append(typeName) +typeBuff.append(":") + } + colBuff.setLength(colBuff.length - 1) + typeBuff.setLength(typeBuff.length - 1) + prop.setProperty("columns", colBuff.toString()) + prop.setProperty("columns.types", typeBuff.toString()) + val attributes = convertToAttributes(reader, keys, types) + attributes +} else { + Seq.empty +} + } + + def convertToAttributes( + reader: Reader, + keys: java.util.List[String], + types: java.util.List[Integer]): Seq[Attribute] = { +val range = 0.until(keys.size()) +range.map { + i => reader.getTypes.get(types.get(i)).getKind match { +case Kind.BOOLEAN => + new AttributeReference(keys.get(i), BooleanType, false)() +case Kind.STRING => + new AttributeReference(keys.get(i), StringType, true)() +case
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-58889380 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21688/consoleFull) for PR 2782 at commit [`ef633f5`](https://github.com/apache/spark/commit/ef633f5e4857400c8711ee800b01016b6bd406b2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18768122 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.orc
+
+import java.util.Properties
+import java.io.IOException
+import org.apache.hadoop.hive.ql.stats.StatsSetupConst
+
+import scala.collection.mutable
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.permission.FsAction
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
+import org.apache.hadoop.hive.ql.io.orc._
+import org.apache.hadoop.hive.ql.io.orc.OrcProto.Type.Kind
+
+import org.apache.spark.sql.parquet.FileSystemHelper
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, LeafNode}
+import org.apache.spark.sql.catalyst.analysis.{UnresolvedException, MultiInstanceRelation}
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.types._
+
+
+private[sql] case class OrcRelation(
+    path: String,
+    @transient conf: Option[Configuration],
+    @transient sqlContext: SQLContext,
+    partitioningAttributes: Seq[Attribute] = Nil)
+  extends LeafNode with MultiInstanceRelation {
+  self: Product =>
+
+  val prop: Properties = new Properties
+
+  var rowClass: Class[_] = null
+
+  val fieldIdCache: mutable.Map[String, Int] = new mutable.HashMap[String, Int]
+
+  val fieldNameTypeCache: mutable.Map[String, String] = new mutable.HashMap[String, String]
+
+  override val output = orcSchema
+
+  override lazy val statistics = Statistics(sizeInBytes = sqlContext.defaultSizeInBytes)
+
+  def orcSchema: Seq[Attribute] = {
+    val origPath = new Path(path)
+    val reader = OrcFileOperator.readMetaData(origPath, conf)
+
+    if (null != reader) {
+      val inspector = reader.getObjectInspector.asInstanceOf[StructObjectInspector]
+      val fields = inspector.getAllStructFieldRefs
+
+      if (fields.size() == 0) {
+        return Seq.empty
+      }
+
+      val totalType = reader.getTypes.get(0)
+      val keys = totalType.getFieldNamesList
+      val types = totalType.getSubtypesList
+      log.info("field name is {}", keys)
+      log.info("types is {}", types)
+
+      val colBuff = new StringBuilder
+      val typeBuff = new StringBuilder
+      for (i <- 0 until fields.size()) {
+        val fieldName = fields.get(i).getFieldName
+        val typeName = fields.get(i).getFieldObjectInspector.getTypeName
+        colBuff.append(fieldName)
+        fieldNameTypeCache.put(fieldName, typeName)
+        fieldIdCache.put(fieldName, i)
+        colBuff.append(",")
+        typeBuff.append(typeName)
+        typeBuff.append(":")
+      }
+      colBuff.setLength(colBuff.length - 1)
+      typeBuff.setLength(typeBuff.length - 1)
+      prop.setProperty("columns", colBuff.toString())
+      prop.setProperty("columns.types", typeBuff.toString())
+      val attributes = convertToAttributes(reader, keys, types)
+      attributes
+    } else {
+      Seq.empty
+    }
+  }
+
+  def convertToAttributes(
+      reader: Reader,
+      keys: java.util.List[String],
+      types: java.util.List[Integer]): Seq[Attribute] = {
+    val range = 0.until(keys.size())
+    range.map {
+      i => reader.getTypes.get(types.get(i)).getKind match {
+        case Kind.BOOLEAN =>
+          new AttributeReference(keys.get(i), BooleanType, false)()
+        case Kind.STRING =>
+          new AttributeReference(keys.get(i), StringType, true)()
+        case
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-58889799 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21688/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-58889796 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21688/consoleFull) for PR 2782 at commit [`ef633f5`](https://github.com/apache/spark/commit/ef633f5e4857400c8711ee800b01016b6bd406b2). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public abstract class TaskContext implements Serializable `
[GitHub] spark pull request: [SPARK-1405][MLLIB] topic modeling on Graphx
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/2388#discussion_r18768316 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/TopicModeling.scala --- @@ -0,0 +1,682 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+package org.apache.spark.mllib.feature
+
+import java.util.Random
+
+import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, sum => brzSum}
+
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.graphx._
+import org.apache.spark.Logging
+import org.apache.spark.mllib.linalg.distributed.{MatrixEntry, RowMatrix}
+import org.apache.spark.mllib.linalg.{DenseVector => SDV, SparseVector => SSV, Vector => SV}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.serializer.KryoRegistrator
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.SparkContext._
+
+import TopicModeling._
+
+class TopicModeling private[mllib](
+    @transient var corpus: Graph[VD, ED],
+    val numTopics: Int,
+    val numTerms: Int,
+    val alpha: Double,
+    val beta: Double,
+    @transient val storageLevel: StorageLevel)
+  extends Serializable with Logging {
+
+  def this(docs: RDD[(TopicModeling.DocId, SSV)],
+    numTopics: Int,
+    alpha: Double,
+    beta: Double,
+    storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK,
+    computedModel: Broadcast[TopicModel] = null) {
+    this(initializeCorpus(docs, numTopics, storageLevel, computedModel),
+      numTopics, docs.first()._2.size, alpha, beta, storageLevel)
+  }
+
+
+  /**
+   * The number of documents in the corpus
+   */
+  val numDocs = docVertices.count()
+
+  /**
+   * The number of terms in the corpus
+   */
+  private val sumTerms = corpus.edges.map(e => e.attr.size.toDouble).sum().toLong
+
+  /**
+   * The total counts for each topic
+   */
+  @transient private var globalTopicCounter: BDV[Count] = collectGlobalCounter(corpus, numTopics)
+  assert(brzSum(globalTopicCounter) == sumTerms)
+
+  @transient private val sc = corpus.vertices.context
+  @transient private val seed = new Random().nextInt()
+  @transient private var innerIter = 1
+  @transient private var cachedEdges: EdgeRDD[ED, VD] = corpus.edges
+  @transient private var cachedVertices: VertexRDD[VD] = corpus.vertices
+
+  private def termVertices = corpus.vertices.filter(t => t._1 >= 0)
+
+  private def docVertices = corpus.vertices.filter(t => t._1 < 0)
+
+  private def checkpoint(): Unit = {
+    if (innerIter % 10 == 0 && sc.getCheckpointDir.isDefined) {
+      val edges = corpus.edges.map(t => t)
+      edges.checkpoint()
+      val newCorpus: Graph[VD, ED] = Graph.fromEdges(edges, null,
+        storageLevel, storageLevel)
+      corpus = updateCounter(newCorpus, numTopics).cache()
+    }
+  }
+
+  private def gibbsSampling(): Unit = {
+    val corpusTopicDist = collectTermTopicDist(corpus, globalTopicCounter,
+      sumTerms, numTerms, numTopics, alpha, beta)
+
+    val corpusSampleTopics = sampleTopics(corpusTopicDist, globalTopicCounter,
+      sumTerms, innerIter + seed, numTerms, numTopics, alpha, beta)
+    corpusSampleTopics.edges.setName(s"edges-$innerIter").cache().count()
+    Option(cachedEdges).foreach(_.unpersist())
+    cachedEdges = corpusSampleTopics.edges
+
+    corpus = updateCounter(corpusSampleTopics, numTopics)
+    corpus.vertices.setName(s"vertices-$innerIter").cache()
+    globalTopicCounter = collectGlobalCounter(corpus, numTopics)
+    assert(brzSum(globalTopicCounter) == sumTerms)
+    Option(cachedVertices).foreach(_.unpersist())
+    cachedVertices = corpus.vertices
+
+    checkpoint()
+    innerIter += 1
+  }
+
+  def saveTopicModel(burnInIter: Int): TopicModel = {
+    val topicModel =
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18768380 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ […] + def orcSchema: Seq[Attribute] = { --- End diff -- Please add comments to explain how you get column info from the metadata of an ORC file.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18768433 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ […] + def convertToAttributes( + reader: Reader, + keys: java.util.List[String], + types: java.util.List[Integer]): Seq[Attribute] = { + val range = 0.until(keys.size()) + range.map { + i => reader.getTypes.get(types.get(i)).getKind match { + case Kind.BOOLEAN => + new AttributeReference(keys.get(i), BooleanType, false)() + case Kind.STRING => + new AttributeReference(keys.get(i), StringType, true)() + case Kind.BYTE =>
[GitHub] spark pull request: SPARK-3874, Provide stable TaskContext API
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2782#issuecomment-58890576 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21689/consoleFull) for PR 2782 at commit [`bbd9e05`](https://github.com/apache/spark/commit/bbd9e057a24cd25336a806dce41b2cbd1ebc3233). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18768525 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ […] + def convertToAttributes( + reader: Reader, + keys: java.util.List[String], + types: java.util.List[Integer]): Seq[Attribute] = { + val range = 0.until(keys.size()) + range.map { + i => reader.getTypes.get(types.get(i)).getKind match { + case Kind.BOOLEAN => + new AttributeReference(keys.get(i), BooleanType, false)() + case Kind.STRING => + new AttributeReference(keys.get(i), StringType, true)() + case Kind.BYTE =>
[GitHub] spark pull request: [spark-3586][streaming]Support nested director...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-58890751 Hi @wangxiaojing, a small suggestion: why not make this improvement more flexible by adding a parameter that controls the search depth of directories? That would be more general than the current 1-depth search implementation. Like:

```scala
class FileInputDStream[K: ClassTag, V: ClassTag, F <: NewInputFormat[K, V]: ClassTag](
    @transient ssc_ : StreamingContext,
    directory: String,
    filter: Path => Boolean = FileInputDStream.defaultFilter,
    depth: Int = 1,
    newFilesOnly: Boolean = true)
```

People can use this parameter to control the search depth; the default of 1 keeps the same semantics as the current code. Besides, some whitespace-related code style should be changed to align with Scala style.
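To make the suggested `depth` parameter concrete, here is a minimal, hypothetical sketch of depth-limited directory search. It is not Spark's implementation: it uses plain `java.io.File` instead of Hadoop's `FileSystem`, and the object and method names are invented for illustration. `depth = 1` reproduces the current one-level scan.

```scala
import java.io.File

// Hypothetical sketch (not Spark's actual code): list files under `dir`,
// descending at most `depth` levels. depth = 1 scans only the top level,
// matching the current FileInputDStream behavior.
object DepthSearch {
  def listFiles(dir: File, accept: File => Boolean, depth: Int): Seq[File] = {
    val entries = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    val (subDirs, files) = entries.partition(_.isDirectory)
    val matched = files.filter(accept)
    if (depth <= 1) matched // stop descending once the depth budget is spent
    else matched ++ subDirs.flatMap(d => listFiles(d, accept, depth - 1))
  }
}
```

In the real `FileInputDStream` the recursion would run against `org.apache.hadoop.fs.FileSystem` and the existing `defaultFilter`, but the depth-budget logic would look the same.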
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18768639 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ […] + val keys = totalType.getFieldNamesList + val types = totalType.getSubtypesList + log.info("field name is {}", keys) --- End diff -- Field names are... Also use `logInfo` instead.
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18768648 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ […] + log.info("field name is {}", keys) + log.info("types is {}", types) --- End diff -- Types are ...
[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r18768785 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -0,0 +1,248 @@ […] + val fieldIdCache: mutable.Map[String, Int] = new mutable.HashMap[String, Int] + + val fieldNameTypeCache: mutable.Map[String, String] = new mutable.HashMap[String, String] --- End diff -- Seems that this field is not used anywhere...