[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-74827293 If every single object is large though, then in that case after we've spilled the 32nd object, there would still be an OOM before we check for spilling again, right? I don't know if there's a completely OOM-proof solution aside from checking for spilling on every single element though. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...
Github user mingyukim commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-74828351 @pwendell, I mentioned above, but at a high level, wouldn't it be better to control the frequency of spills by how much memory you acquire from the shuffle memory manager at a time than by how often you check if spill is needed? We can test out your proposal in the specific case we have if that's a less risky option.
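To make the concern in the two comments above concrete, here is a minimal sketch of how a periodic spill check lets memory overshoot the threshold between checks. It is an illustration only, not Spark's spilling code: `SpillCheckSketch`, `peakBytes`, and all of its parameters are hypothetical names, and "spilling" is modelled as simply releasing everything held in memory.

```scala
object SpillCheckSketch {
  /** Peak in-memory bytes seen when the spill check runs only once every
    * `checkEvery` inserted elements. Hypothetical model, not Spark's API. */
  def peakBytes(elementSizes: Seq[Long], thresholdBytes: Long, checkEvery: Int): Long = {
    var inMemory = 0L    // bytes currently buffered
    var peak = 0L        // largest value `inMemory` ever reached
    var sinceCheck = 0   // elements inserted since the last check
    for (size <- elementSizes) {
      inMemory += size
      peak = math.max(peak, inMemory)
      sinceCheck += 1
      if (sinceCheck == checkEvery) {
        sinceCheck = 0
        // Assumption: a spill writes everything out and frees the buffer.
        if (inMemory >= thresholdBytes) inMemory = 0L
      }
    }
    peak
  }
}
```

With 64 elements of 10 bytes each and a 50-byte threshold, checking every 32 elements lets the buffer grow to 320 bytes before the first check, whereas checking on every element caps it at 50 bytes. That gap is the overshoot mccheah describes for large objects; it says nothing about which trade-off is right for the real implementation.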
[GitHub] spark pull request: [SPARK-5878] fix DataFrame.repartition() in Py...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4667#issuecomment-74832254 Thanks. Merging.
[GitHub] spark pull request: Avoid deprecation warnings in JDBCSuite.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4668
[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3781#discussion_r24890677
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -148,19 +152,16 @@ private[spark] class SparkDeploySchedulerBackend(
     super.applicationId
   }
 
+  def setShutdownCallback(f: SparkDeploySchedulerBackend => Unit) {
--- End diff --
OK, but why do you need this setter now?
[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3781#issuecomment-74846885 [Test build #27678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27678/consoleFull) for PR 3781 at commit [`c146c93`](https://github.com/apache/spark/commit/c146c93b3df500881f716b5007304315a70fb641). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4628#issuecomment-74833670 @marmbrus I updated it with test cases.
[GitHub] spark pull request: [Minor] Minor doc fix in GBT classification ex...
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4672 [Minor] Minor doc fix in GBT classification example numClassesForClassification has been renamed to numClasses. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MechCoder/spark minor-doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4672.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4672 commit d2ddb7fb513dde84d34df08aa70c053042fa0ec8 Author: MechCoder manojkumarsivaraj...@gmail.com Date: 2015-02-18T09:17:10Z Minor doc fix in GBT classification example
[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4672#issuecomment-74834330 ping @jkbradley ? I was not sure if I had to open a JIRA for this, as it is minor.
[GitHub] spark pull request: [SPARK-5825] [Spark Submit] Remove the double ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4611#issuecomment-74842751 As I say, on OS X you get the whole binary path, not just `java`:
```
ps -p ... -o comm=
...
/Library/Java/JavaVirtualMachines/jdk1.8.0_31.jdk/Contents/Home/jre/bin/java
```
That's why I was thinking `if ps -p $TARGET_ID -o comm= | grep -q java ; then`. + @nchammas for bash syntax thoughts.
[GitHub] spark pull request: [SPARK-5878] fix DataFrame.repartition() in Py...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4667
[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4628#issuecomment-74834156 [Test build #27676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27676/consoleFull) for PR 4628 at commit [`ecb3bcd`](https://github.com/apache/spark/commit/ecb3bcd74914128cc65fa0c4b3454e1914d18a9f). * This patch merges cleanly.
[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4672#issuecomment-74843700 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27677/
[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4628#issuecomment-74843574 [Test build #27676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27676/consoleFull) for PR 4628 at commit [`ecb3bcd`](https://github.com/apache/spark/commit/ecb3bcd74914128cc65fa0c4b3454e1914d18a9f). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4628#issuecomment-74843582 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27676/
[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4672#issuecomment-74843689 [Test build #27677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27677/consoleFull) for PR 4672 at commit [`d2ddb7f`](https://github.com/apache/spark/commit/d2ddb7fb513dde84d34df08aa70c053042fa0ec8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Github user lukovnikov commented on the pull request: https://github.com/apache/spark/pull/4650#issuecomment-74851018 style errors fixed
[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4672#issuecomment-74834704 [Test build #27677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27677/consoleFull) for PR 4672 at commit [`d2ddb7f`](https://github.com/apache/spark/commit/d2ddb7fb513dde84d34df08aa70c053042fa0ec8). * This patch merges cleanly.
[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4672#issuecomment-74834754 I will merge this back to 1.2. It really should just be an addendum to https://issues.apache.org/jira/browse/SPARK-4610
[GitHub] spark pull request: [SPARK-4949]shutdownCallback in SparkDeploySch...
Github user sarutak commented on a diff in the pull request: https://github.com/apache/spark/pull/3781#discussion_r24893296
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala ---
@@ -148,19 +152,16 @@ private[spark] class SparkDeploySchedulerBackend(
     super.applicationId
   }
 
+  def setShutdownCallback(f: SparkDeploySchedulerBackend => Unit) {
--- End diff --
It's no longer needed, so I will remove it soon.
[GitHub] spark pull request: SPARK-5669 [BUILD] [HOTFIX] Spark assembly inc...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/4673 SPARK-5669 [BUILD] [HOTFIX] Spark assembly includes incompatibly licensed libgfortran, libgcc code via JBLAS Correct exclusion path for JBLAS native libs. (More explanation coming soon on the mailing list re: 1.3.0 RC1) You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-5669.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4673.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4673 commit e29693cc1eceb5c7917a36d93e77a158915f2a0c Author: Sean Owen so...@cloudera.com Date: 2015-02-18T11:29:55Z Correct exclusion path for JBLAS native libs
[GitHub] spark pull request: Avoid deprecation warnings in JDBCSuite.
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4668#issuecomment-74832153 This is great. Thanks!
[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4672
[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4650#issuecomment-74851295 [Test build #27680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27680/consoleFull) for PR 4650 at commit [`4014c7f`](https://github.com/apache/spark/commit/4014c7f9b8ee8a975f9263adc22f940d99820cb6). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74908505 Sorry, but this patch is not correct. As @growse mentions, when `SPARK_LOCAL_DIRS` is not set, this code will try to change the permissions of `/tmp` on Unix machines. It will also use `/tmp/` as the local dir for the driver in client mode, which was the exact thing the original change was trying to avoid. The correct fix here, if you really care about cleaning up the extra directory, is to export a different env variable from the `Worker` ([here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala#L134)) and handle that variable specially in `getOrCreateLocalRootDirs`. When that new env variable is set, the code would behave just like the `isRunningInYarnContainer()` case above the change you're making. @srowen the current code shouldn't create a cascade of directories, but it does create a two-level-deep spark- hierarchy for executors in standalone mode.
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74910370 Ah, wait, there's a second problem (which would result in the cascading directories, I think). `getLocalDir` should cache the local directory it returns, to avoid having to recreate it. (And should probably be made synchronized in the process.)
[GitHub] spark pull request: [SPARK-5507] Added documentation for BlockMatr...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4664#issuecomment-74915682 LGTM. Merged into master and branch-1.3. Thanks!
[GitHub] spark pull request: [SPARK-5830][Core]Don't create unnecessary dir...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4620#issuecomment-74916479 Hi, me again, sorry for the spam. Regarding my last comment, it's probably better if `getOrCreateLocalRootDirs()` caches its return value instead of `getLocalDir()`, since the former is called in several places, and doing that would also cover the `getLocalDir()` case.
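The caching suggestion in the comments above could look roughly like the sketch below. It is a stand-alone illustration under stated assumptions, not the code in Spark's `Utils`: the method names are borrowed from the discussion, while `LocalDirCacheSketch`, `createLocalRootDirs`, the `createCalls` counter, and the use of a temporary directory are hypothetical stand-ins.

```scala
import java.nio.file.Files

object LocalDirCacheSketch {
  // Cached result of the first successful creation (hypothetical field).
  private var cachedRootDirs: Option[Array[String]] = None

  // Counts how often directories are actually created; for illustration only.
  var createCalls = 0

  // Stand-in for the real directory creation logic (assumption: one temp dir).
  private def createLocalRootDirs(): Array[String] = {
    createCalls += 1
    Array(Files.createTempDirectory("spark-sketch-").toString)
  }

  /** Creates the root dirs on first use and returns the cached value afterwards.
    * `synchronized` keeps two threads from both creating directories. */
  def getOrCreateLocalRootDirs(): Array[String] = synchronized {
    cachedRootDirs.getOrElse {
      val dirs = createLocalRootDirs()
      cachedRootDirs = Some(dirs)
      dirs
    }
  }

  /** Goes through the cached call, so it is covered by the same cache. */
  def getLocalDir(): String = getOrCreateLocalRootDirs().head
}
```

Caching at the `getOrCreateLocalRootDirs()` level means every caller, including `getLocalDir()`, sees the same directories, which is the point made in the last comment.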
[GitHub] spark pull request: [SPARK-5507] Added documentation for BlockMatr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4664
[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4671#issuecomment-74913992 [Test build #27682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27682/consoleFull) for PR 4671 at commit [`3168b4b`](https://github.com/apache/spark/commit/3168b4b19971f1e82c91f561d3abc3f3141dfa9b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5519][MLLIB] add user guide with exampl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4661
[GitHub] spark pull request: [SPARK-5821] [SQL] JSON CTAS command should th...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4610#discussion_r24922476
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala ---
@@ -66,9 +66,17 @@ private[sql] class DefaultSource
     mode match {
       case SaveMode.Append =>
         sys.error(s"Append mode is not supported by ${this.getClass.getCanonicalName}")
-      case SaveMode.Overwrite =>
-        fs.delete(filesystemPath, true)
+      case SaveMode.Overwrite => {
+        try {
+          fs.delete(filesystemPath, true)
+        } catch {
+          case e: IOException =>
+            throw new IOException(
+              s"Unable to clear output directory ${filesystemPath.toString} prior" +
+              s" to CREATE a JSON table AS SELECT:\n${e.toString}")
+        }
--- End diff --
@yanbohappy It seems we just throw another error message here. Based on your JIRA description, I think you need to check whether delete returns true or false when data already exists.
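A minimal sketch of the check suggested in the review comment above: treat a `false` return from `delete` as a failure, in addition to a thrown exception. To stay self-contained it uses a small hypothetical `SimpleFs` trait rather than the Hadoop `FileSystem` class from the diff; `OverwriteSketch`, `clearForOverwrite`, and the string paths are assumptions made for illustration.

```scala
import java.io.IOException

object OverwriteSketch {
  /** Hypothetical stand-in for the file system used in the diff. */
  trait SimpleFs {
    def exists(path: String): Boolean
    def delete(path: String, recursive: Boolean): Boolean
  }

  /** Clears `path` before an overwrite; fails if data exists and cannot be removed. */
  def clearForOverwrite(fs: SimpleFs, path: String): Unit = {
    if (fs.exists(path)) {
      val deleted =
        try fs.delete(path, true)
        catch {
          case e: IOException =>
            throw new IOException(s"Unable to clear output directory $path", e)
        }
      // A false return means nothing was thrown, but the data is still there.
      if (!deleted) {
        throw new IOException(s"Unable to clear output directory $path")
      }
    }
  }
}
```

Whether the real fix should also test for existence first, or only inspect the return value, is left open in the thread; the sketch does both simply to show where each check would sit.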
[GitHub] spark pull request: [SPARK-5519][MLLIB] add user guide with exampl...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4661#issuecomment-74915467 Merged into master and branch-1.3.
[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-74919610 test this please
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/4675 [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release For SPARK-5867: * The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API. * It should also include Python examples now. For SPARK-5892: * Fix Python docs * Various other cleanups CC: @mengxr (ML), @davies (Python docs) You can merge this pull request into a Git repository by running: $ git pull https://github.com/jkbradley/spark doc-review-1.3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4675.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4675 commit e57282727570ce7a47cf4144ae2db33c874f6357 Author: Joseph K. Bradley jos...@databricks.com Date: 2015-02-18T02:34:12Z updated programming guide for ml and mllib commit b05a80de67b645bf68d7ab87123720517d0222e1 Author: Joseph K. Bradley jos...@databricks.com Date: 2015-02-18T02:35:22Z organize imports. doc cleanups commit a72c018ddf029496cb5a158d8a22aafd9f819483 Author: Joseph K. Bradley jos...@databricks.com Date: 2015-02-18T02:36:50Z made ChiSqTestResult appear in python docs commit 695f3f62b202f319a2dbf9c6fd8436be280ca48d Author: Joseph K. Bradley jos...@databricks.com Date: 2015-02-18T02:37:19Z partly done trying to fix inherit_doc for class hierarchies in python docs commit 8cce91c47e9633a11c911e879e489df7c54324e1 Author: Joseph K. Bradley jos...@databricks.com Date: 2015-02-18T19:24:27Z GMM: removed old imports, added some doc commit da16aef6800b16a708739c96bc1ef713043eb461 Author: Joseph K. Bradley jos...@databricks.com Date: 2015-02-18T19:24:56Z Fixed python mllib docs
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74931339 [Test build #27684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27684/consoleFull) for PR 4675 at commit [`da16aef`](https://github.com/apache/spark/commit/da16aef6800b16a708739c96bc1ef713043eb461). * This patch merges cleanly.
[GitHub] spark pull request: [Minor] [MLlib] Minor doc fix in GBT classific...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4672#issuecomment-74932384 (belatedly) Thanks!
[GitHub] spark pull request: [Spark-5889] Remove pid file after stopping se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4676#issuecomment-74932367 [Test build #27685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27685/consoleFull) for PR 4676 at commit [`bfabd91`](https://github.com/apache/spark/commit/bfabd91d350fbb48c103896a585b362c7c823c2d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5641] [EC2] Allow spark_ec2.py to copy ...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4583#issuecomment-74931875 @florianverhein - Sorry for the delay. I just tested this out and it seemed to work okay. One thing that confused me is that it's not very clear where the files end up on the master. For example, I had a directory `/home/shivaram/dotfiles` that I passed in as the argument. I think it would be good to rsync this to `/root/dotfiles` on the master? Right now the behavior is that the files inside the directory (say, `.vimrc`) are put in `/root/` (i.e. I got `/root/.vimrc`).
[GitHub] spark pull request: [Spark 5889] Remove pid file after stopping se...
GitHub user zhzhan opened a pull request: https://github.com/apache/spark/pull/4676 [Spark 5889] Remove pid file after stopping service. Currently the pid file is not deleted, which may cause problems after the service is stopped. The fix removes the pid file after the service has stopped. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhzhan/spark spark-5889 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4676.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4676 commit e63bfa5dcd7f35e0101a8aa6631a1ae29b81a399 Author: Zhan Zhang zhaz...@gmail.com Date: 2014-08-08T17:47:18Z test commit c0c7d2ae0dcd4d8921513910985e10f1f58e8ab4 Author: Zhan Zhang zhaz...@gmail.com Date: 2015-01-07T21:01:45Z squash all commits commit bfabd91d350fbb48c103896a585b362c7c823c2d Author: Zhan Zhang zhaz...@gmail.com Date: 2015-02-18T19:22:23Z spark-5889: remove pid file after stopping service
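The behaviour described in this pull request can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the change made in the PR: `stop_service`, the injected `terminate` callback, and the file name `service.pid` are hypothetical names introduced only for the example.

```python
import os
import tempfile


def stop_service(pid_file, terminate):
    """Stop the process recorded in pid_file, then delete the file.

    `terminate` is a hypothetical callback that receives the pid and
    stops the process; it is injected so the sketch can run without
    touching real processes. Returns the pid that was stopped, or
    None if there is no pid file.
    """
    if not os.path.exists(pid_file):
        return None  # nothing to stop
    with open(pid_file) as f:
        pid = int(f.read().strip())
    terminate(pid)
    # Only reached when terminate() did not raise: the service is
    # stopped, so the pid file is stale and can be removed.
    os.remove(pid_file)
    return pid


def demo():
    """Write a fake pid file, stop the 'service', report what is left."""
    stopped = []
    with tempfile.TemporaryDirectory() as tmp:
        pid_file = os.path.join(tmp, "service.pid")
        with open(pid_file, "w") as f:
            f.write("12345\n")
        pid = stop_service(pid_file, stopped.append)
        return pid, stopped, os.path.exists(pid_file)
```

Removing the file only after `terminate` returns keeps the pid on disk if stopping fails, so a later attempt can still find the process.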
[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4671#issuecomment-74932737 [Test build #27682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27682/consoleFull) for PR 4671 at commit [`3168b4b`](https://github.com/apache/spark/commit/3168b4b19971f1e82c91f561d3abc3f3141dfa9b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4671#issuecomment-74932750 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27682/ Test PASSed.
[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-74934591 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27683/ Test FAILed.
[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-74934575 [Test build #27683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27683/consoleFull) for PR 3850 at commit [`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12). * This patch **fails some tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74931104 Note: The altered examples in the spark.ml guide were copied from executable examples in the examples/ directory.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74932285 [Test build #27686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27686/consoleFull) for PR 4675 at commit [`34b067f`](https://github.com/apache/spark/commit/34b067fba0bb7602b69d0f5fdcde5ce470786de4). * This patch merges cleanly.
[GitHub] spark pull request: [branch-1.0][SPARK-4355] ColumnStatisticsAggre...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3850#issuecomment-74920638 [Test build #27683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27683/consoleFull) for PR 3850 at commit [`ae9b94a`](https://github.com/apache/spark/commit/ae9b94a3f817759ee6249af991beec7e19e52f12). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-5570: No docs stating that `new SparkCon...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4665#issuecomment-74920473 Hey @ilganeli, thanks for doing this. Can you also do this for the other `spark.driver.*` options, such as extra Java options, class paths, etc.?
[GitHub] spark pull request: SPARK-5548: Fix for AkkaUtilsSuite failure - a...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4653#discussion_r24927231
--- Diff: core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala ---
@@ -370,9 +371,13 @@ class AkkaUtilsSuite extends FunSuite with LocalSparkContext with ResetSystemPro
     val selection = slaveSystem.actorSelection(
       AkkaUtils.address(AkkaUtils.protocol(slaveSystem), "spark", "localhost", boundPort, "MapOutputTracker"))
     val timeout = AkkaUtils.lookupTimeout(conf)
-    intercept[TimeoutException] {
-      slaveTracker.trackerActor = Await.result(selection.resolveOne(timeout * 2), timeout)
+    val result = Try(Await.result(selection.resolveOne(timeout * 2), timeout))
+
+    assert(result.isFailure === true)
+    val exception = result match {
+      case Failure(ex) => ex
     }
--- End diff --
This will create a lot of warnings complaining that the match is not exhaustive. I think you'll need to add a `case _ => fail(...)` to fix this.
[GitHub] spark pull request: [SPARK-5559] [Streaming] [Test] Remove oppotun...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4337#discussion_r24929712
--- Diff: external/mqtt/src/test/scala/org/apache/spark/streaming/mqtt/MQTTStreamSuite.scala ---
@@ -113,7 +115,8 @@ class MQTTStreamSuite extends FunSuite with Eventually with BeforeAndAfter {
   }
   private def findFreePort(): Int = {
--- End diff --
We don't have a test utilities subproject, so this ends up getting duplicated, but note that we also have duplication of classes like LocalSparkContext; fixing this broader issue is outside the scope of this PR (there are a few JIRAs to track the creation of a test utilities project, though).
[GitHub] spark pull request: Merge pull request #1 from apache/master
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4553#issuecomment-74928572 It looks like this was opened by mistake; do you mind closing this issue?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user gurvindersingh commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74931493 It would be nice to have this patch merged for the 1.3 release, as we plan to use this feature with Mesos and Spark.
[GitHub] spark pull request: SPARK-5548: Fix for AkkaUtilsSuite failure - a...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4653#discussion_r24927173
--- Diff: core/src/test/scala/org/apache/spark/util/AkkaUtilsSuite.scala ---
@@ -370,9 +371,13 @@ class AkkaUtilsSuite extends FunSuite with LocalSparkContext with ResetSystemPro
     val selection = slaveSystem.actorSelection(
       AkkaUtils.address(AkkaUtils.protocol(slaveSystem), "spark", "localhost", boundPort, "MapOutputTracker"))
     val timeout = AkkaUtils.lookupTimeout(conf)
-    intercept[TimeoutException] {
-      slaveTracker.trackerActor = Await.result(selection.resolveOne(timeout * 2), timeout)
+    val result = Try(Await.result(selection.resolveOne(timeout * 2), timeout))
+
+    assert(result.isFailure === true)
--- End diff --
You can just do `assert(result.isFailure)`.
[GitHub] spark pull request: SPARK-5570: No docs stating that `new SparkCon...
Github user ilganeli commented on the pull request: https://github.com/apache/spark/pull/4665#issuecomment-74931272 Sure, @andrewor14. I presume their behavior is identical?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74936971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27687/ Test FAILed.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74936955 [Test build #27687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27687/consoleFull) for PR 3074 at commit [`0d6d2b3`](https://github.com/apache/spark/commit/0d6d2b304d56b65d7e2fa61d762ae787d35a2e75). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4675#discussion_r24935321
--- Diff: docs/ml-guide.md ---
@@ -171,12 +171,12 @@ import org.apache.spark.sql.{Row, SQLContext}
 val conf = new SparkConf().setAppName("SimpleParamsExample")
 val sc = new SparkContext(conf)
 val sqlContext = new SQLContext(sc)
-import sqlContext._
+import sqlContext.implicits._
 // Prepare training data.
-// We use LabeledPoint, which is a case class. Spark SQL can convert RDDs of case classes
-// into SchemaRDDs, where it uses the case class metadata to infer the schema.
-val training = sparkContext.parallelize(Seq(
+// We use LabeledPoint, which is a case class. Spark SQL can convert RDDs of Java Beans
--- End diff --
This is in a Scala context. `case classes` or `case class instances` may be better than `JavaBeans`.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74936967 [Test build #27687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27687/consoleFull) for PR 3074 at commit [`0d6d2b3`](https://github.com/apache/spark/commit/0d6d2b304d56b65d7e2fa61d762ae787d35a2e75). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4675#discussion_r24936197
--- Diff: python/pyspark/ml/pipeline.py ---
@@ -18,7 +18,8 @@
 from abc import ABCMeta, abstractmethod
 from pyspark.ml.param import Param, Params
-from pyspark.ml.util import inherit_doc, keyword_only
+from pyspark.ml.util import keyword_only
+from pyspark.mllib.__init__ import inherit_doc
--- End diff --
from pyspark.mllib import inherit_doc
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74940888 @mateiz where do you suggest putting this Dockerfile? I have a Dockerfile that builds Spark from source that depends on the Mesos image here: https://github.com/tnachen/spark/blob/dockerfile/Dockerfile @hellertime you can use this if you like or make modifications with it.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user mbofb commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74948488 Regarding the description of RowMatrix.computeSVD and mllib-dimensionality-reduction.html: "We assume n is smaller than m." Is this just a recommendation or a hard requirement? This condition does not seem to be checked and does not cause an IllegalArgumentException; the processing finishes even though the vectors have a higher dimension than the number of vectors.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user mbofb commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74949165 Regarding the description of RowMatrix.computePrincipalComponents, or RowMatrix in general: I got an exception.
java.lang.IllegalArgumentException: Argument with more than 65535 cols: 7949273
    at org.apache.spark.mllib.linalg.distributed.RowMatrix.checkNumColumns(RowMatrix.scala:131)
    at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeCovariance(RowMatrix.scala:318)
    at org.apache.spark.mllib.linalg.distributed.RowMatrix.computePrincipalComponents(RowMatrix.scala:373)
It would be nice to document this 65535-column restriction (if it still applies in 1.3).
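The thread does not say where the 65535 figure comes from. One plausible reading, stated here as an assumption rather than as MLlib's documented reason: if the n x n Gramian is held as a packed upper triangle of n(n+1)/2 entries in a single JVM array, that length has to fit in a signed 32-bit integer. The sketch below only checks the arithmetic; `packed_triangle_len` and `max_cols_for_int_index` are illustrative names, not MLlib functions.

```python
INT_MAX = 2**31 - 1  # JVM arrays are indexed by signed 32-bit ints


def packed_triangle_len(n):
    """Entries in the upper triangle (diagonal included) of an n x n matrix."""
    return n * (n + 1) // 2


def max_cols_for_int_index(limit=INT_MAX):
    """Largest n whose packed triangle has at most `limit` entries."""
    lo, hi = 0, 1
    # Grow hi until its packed triangle no longer fits.
    while packed_triangle_len(hi) <= limit:
        hi *= 2
    # Binary search with invariant: packed(lo) <= limit < packed(hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if packed_triangle_len(mid) <= limit:
            lo = mid
        else:
            hi = mid
    return lo
```

Under that assumption the bound works out to exactly 65535 columns, and the 7949273 columns in the report above exceed it by several orders of magnitude.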
[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74951424 @MechCoder Could you share some performance comparison results?
[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-74953365 [Test build #27689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27689/consoleFull) for PR 4677 at commit [`07c8f12`](https://github.com/apache/spark/commit/07c8f12bc72b11ae780095a73662b5e049dc6e22). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3861#issuecomment-74950927 We spoke a bit offline about this, but my feeling was that the best thing here might be to add a way to launch the shuffle service as a standalone application (initially, not one managed by Mesos) so that it can be shared across Spark applications. That would involve writing some simple launch scripts for it, similar to those for the existing daemons we launch, and you'd ask users to launch the shuffle service much as they launch storage systems like HDFS. That's very simple and would avoid diverging a lot between Mesos and the other modes. Longer term, we could actually have a single shared shuffle service that is scheduled by Mesos.
[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...
Github user tnachen closed the pull request at: https://github.com/apache/spark/pull/3861
[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3861#issuecomment-74951847 Agree and it's currently being worked on. We can close this PR too.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74951728 The docker folder is for test images, but it could be a good place for this one. I'll let @pwendell comment on it. Does Apache Mesos publish a base Docker image? It would be easier to base it on that if that would get updated with each release.
[GitHub] spark pull request: SPARK-5425: Use synchronised methods in system...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4221#issuecomment-74952993 Yeah, our auto-close doesn't work on PRs into release branches like this.
[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4677 [SPARK-5436] [MLlib] Validate GradientBoostedTrees during train One can stop early if the decrease in error rate is less than a certain tolerance (tol), or if the error increases because the training data is being overfit. This introduces a new method which takes a pair of RDDs, one for the training data and the other for validation. You can merge this pull request into a Git repository by running: $ git pull https://github.com/MechCoder/spark spark-5436 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4677.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4677 commit 07c8f12bc72b11ae780095a73662b5e049dc6e22 Author: MechCoder manojkumarsivaraj...@gmail.com Date: 2015-02-18T21:23:33Z [SPARK-5436] Validate GradientBoostedTrees during train
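The stopping rule described in this pull request can be sketched in a few lines. This is a minimal Python illustration, not the Scala code in the PR: `trees_to_keep` is a hypothetical helper, and treating `tol` as an absolute (rather than relative) threshold is an assumption, since the description does not say which is used.

```python
def trees_to_keep(validation_errors, tol):
    """Return how many boosting iterations to keep.

    validation_errors[i] is the error on the validation set after
    iteration i + 1. Training stops at the first iteration whose
    improvement over the previous one is smaller than `tol`; an
    increase in error is a negative improvement, so it also stops.
    """
    if not validation_errors:
        return 0
    kept = 1
    for prev, curr in zip(validation_errors, validation_errors[1:]):
        if prev - curr < tol:
            break  # gain too small, or error went up (likely overfitting)
        kept += 1
    return kept
```

For example, with validation errors `[0.50, 0.40, 0.35, 0.349, 0.36]` and `tol=0.01`, the fourth iteration improves by only about 0.001, so three trees are kept.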
[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-74953724 @jkbradley I just wanted to know if this is in the right direction.
[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-74954211 [Test build #27690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27690/consoleFull) for PR 4677 at commit [`7534d14`](https://github.com/apache/spark/commit/7534d145d8cf686221647bffeeb2c404dddc575d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5641] [EC2] Allow spark_ec2.py to copy ...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4583#issuecomment-74954531 Hmm, okay. My other concern was that the directory itself isn't preserved, i.e. it might be better to copy the deploy-root-dir into `/` as a directory (`/dotfiles/.vimrc` instead of `/.vimrc`)?
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74956760 Mesosphere does publish a Mesos image on each release (mesosphere/mesos), with each version tagged. We don't tag the latest release with the :latest tag; I could go change that for sure.
[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4671#issuecomment-74957281 Thanks, merged.
[GitHub] spark pull request: example of python converter for avrò output f...
GitHub user daria-sukhareva opened a pull request: https://github.com/apache/spark/pull/4678 example of python converter for avrò output format I actually wanted to know if I am doing it right, rather than to suggest pulling it into the Spark repo. You can merge this pull request into a Git repository by running: $ git pull https://github.com/daria-sukhareva/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4678.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4678 commit 2ba7b213572d6ce2056cfc2536b701ae689c7f98 Author: daria daria.sukhar...@rubikloud.com Date: 2015-02-18T21:49:45Z avrò output format
[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-74957382 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27690/ Test FAILed.
[GitHub] spark pull request: example of python converter for avrò output f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4678#issuecomment-74957691 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5710][SQL] Combines two adjacent Cast e...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4497#issuecomment-74957611 Agreed. If there are concrete proposals for eliminating redundant casts, then we should discuss them on JIRA. However, as is, this could change the answer and is thus an invalid optimization. So we should close this issue.
[GitHub] spark pull request: [SPARK-4903][SQL]Backport the bug fix for SPAR...
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/4671#issuecomment-74957462 Thanks!!
[GitHub] spark pull request: [SPARK-5436] [MLlib] Validate GradientBoostedT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4677#issuecomment-74957377 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27689/ Test FAILed.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4675#discussion_r24935349 --- Diff: docs/mllib-guide.md --- @@ -90,6 +90,21 @@ version 1.4 or newer. # Migration Guide +## From 1.2 to 1.3 + +In the `spark.mllib` package: + +* *(Breaking change)* In [`ALS`](api/scala/index.html#org.apache.spark.mllib.recommendation.ALS), the extraneous method `solveLeastSquares` has been removed. The `DeveloperApi` method `analyzeBlocks` was also removed. --- End diff -- Shall we try to turn the sections into code tabs? The guide is getting longer and longer. For the breaking changes, we should mention that they affect experimental or developer APIs. `ALS.solveLeastSquares` is perhaps the only outlier.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4675#discussion_r24935325 --- Diff: docs/ml-guide.md --- @@ -300,19 +302,21 @@ List<LabeledPoint> localTest = Lists.newArrayList( new LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)), new LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)), new LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5))); -JavaSchemaRDD test = jsql.createDataFrame(jsc.parallelize(localTest), LabeledPoint.class); +DataFrame test = jsql.createDataFrame(jsc.parallelize(localTest), LabeledPoint.class); // Make predictions on test documents using the Transformer.transform() method. // LogisticRegression.transform will only use the 'features' column. -// Note that model2.transform() outputs a 'probability' column instead of the usual 'score' -// column since we renamed the lr.scoreCol parameter previously. -model2.transform(test).registerAsTable("results"); -JavaSchemaRDD results = -jsql.sql("SELECT features, label, probability, prediction FROM results"); +// Note that model2.transform() outputs a 'myProbability' column instead of the usual +// 'probability' column since we renamed the lr.probabilityCol parameter previously. +model2.transform(test).registerTempTable("results"); +DataFrame results = +jsql.sql("SELECT features, label, myProbability, prediction FROM results"); --- End diff -- With the DataFrame API, we don't need to call SQL now.
[GitHub] spark pull request: [SPARK-5673] [MLlib] Implement Streaming wrapp...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4456#issuecomment-74936361 @catap Can you please add a description for this PR? Thanks!
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user hellertime commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74944182 @tnachen That Dockerfile you have is actually all that is needed for an example image; that it's based on the mesosphere image is even better! I had hoped that there could be an actual image on Docker Hub that could be referenced from the properties example. Is that image on Docker Hub?
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74946826 [Test build #27686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27686/consoleFull) for PR 4675 at commit [`34b067f`](https://github.com/apache/spark/commit/34b067fba0bb7602b69d0f5fdcde5ce470786de4). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74946839 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27686/ Test PASSed.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74939602 [Test build #27688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27688/consoleFull) for PR 3074 at commit [`127aaa8`](https://github.com/apache/spark/commit/127aaa8050b34925e511b8d8131dfb1e75841be8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74939621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27688/ Test FAILed.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74945101 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27684/ Test PASSed.
[GitHub] spark pull request: [SPARK-5641] [EC2] Allow spark_ec2.py to copy ...
Github user florianverhein commented on the pull request: https://github.com/apache/spark/pull/4583#issuecomment-74946543 Thanks @shivaram. I'm not sure I follow 100%. With that argument they should have ended up at e.g. /.vimrc (unless root is a subdirectory of dotfiles). The contents of '--deploy-root-dir' end up in /, not /root/ (i.e. as documented in the help). This is necessary because you may want to copy files elsewhere on the file system, e.g. /opt. It's just unfortunate that the existence of /root/ makes "root" ambiguous. Therefore I made sure to use / in the help.
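The two layouts being debated in this thread (contents copied straight into `/`, versus the directory itself recreated under `/`) can be compared with a small path-mapping sketch. The function `destination_paths` and its `keep_dir_name` flag are hypothetical names used only for illustration; this does not reproduce how `spark_ec2.py` actually copies files.

```python
import posixpath


def destination_paths(files, deploy_root_dir, keep_dir_name=False):
    """Map local files under deploy_root_dir to absolute paths on the target host.

    keep_dir_name=False: contents land directly in "/" (e.g. /.vimrc).
    keep_dir_name=True:  the directory itself is kept (e.g. /dotfiles/.vimrc).
    """
    root = posixpath.normpath(deploy_root_dir)
    prefix = "/" + posixpath.basename(root) if keep_dir_name else "/"
    return [posixpath.join(prefix, posixpath.relpath(f, root)) for f in files]


if __name__ == "__main__":
    local_files = ["dotfiles/.vimrc", "dotfiles/opt/tool/conf"]
    print(destination_paths(local_files, "dotfiles"))
    print(destination_paths(local_files, "dotfiles", keep_dir_name=True))
```

With the first layout a file can be placed anywhere on the target file system, such as under /opt, which is the reason given above for copying into `/` rather than into a named subdirectory.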
[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-74946838 @mccheah @mingyukim yeah, there isn't an OOM-proof solution at all because these are all heuristics. Even checking every element is not OOM-proof, since memory estimation is itself a heuristic that involves sampling. My only concern with exposing knobs here is that users will expect us to support these going forward, even though we may want to refactor this in the future in a way where those knobs don't make sense anymore. It's reasonable that users would consider it a regression if their tuning of those knobs stopped working. So if possible, it would be good to adjust our heuristics to meet a wider range of use cases, and then if we keep hearing about more issues we can expose knobs. We can't have them meet every possible use case, since they are heuristics, but in this case I was wondering if we could make a strict improvement to the heuristics. @andrewor14 can you comment on whether this is indeed a strict improvement? One of the main benefits of the new data frames API is that we will be able to have precise control over memory usage in a way that can avoid OOMs entirely. But for the current Spark API we are using this more ad-hoc memory estimation along with some heuristics. I'm not 100% against exposing knobs either, but I'd be interested to know whether some simple improvements fix your use case.
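To make the trade-off concrete, here is a toy simulation of a "check for spilling every N elements" heuristic. It is a simplified sketch, not Spark's implementation: the function `simulate_spills`, its parameters, and the assumption that exact element sizes are known are all hypothetical. As the comment above notes, the real code relies on sampled size estimates, which this sketch leaves out.

```python
def simulate_spills(sizes, threshold, check_every=32):
    """Return (peak_in_memory, spill_count) for a stream of element sizes.

    The buffer is only compared against `threshold` on every
    `check_every`-th element; a spill empties the buffer.
    """
    in_memory = 0
    peak = 0
    spills = 0
    for count, size in enumerate(sizes, start=1):
        in_memory += size
        peak = max(peak, in_memory)
        if count % check_every == 0 and in_memory >= threshold:
            spills += 1
            in_memory = 0
    return peak, spills


if __name__ == "__main__":
    sizes = [10] * 64  # 64 elements of 10 units each
    print(simulate_spills(sizes, threshold=100, check_every=32))
    print(simulate_spills(sizes, threshold=100, check_every=1))
```

With large elements and a check interval of 32, the buffer grows to more than three times the threshold before the first check, which is the failure mode raised earlier in the thread. Checking on every element caps the peak at the threshold in this toy model, at the cost of more frequent spills.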
[GitHub] spark pull request: [Spark-5889] Remove pid file after stopping se...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4676#issuecomment-74946963 [Test build #27685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27685/consoleFull) for PR 4676 at commit [`bfabd91`](https://github.com/apache/spark/commit/bfabd91d350fbb48c103896a585b362c7c823c2d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-5889] Remove pid file after stopping se...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4676#issuecomment-74946981 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27685/ Test PASSed.
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user hellertime commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74937819 So perhaps putting an example Dockerfile in the `docker` subdirectory is not an appropriate thing to do... any suggestions on a better location for examples such as this? The `examples` directory would also be inappropriate, I think.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4675#discussion_r24936278 --- Diff: python/pyspark/mllib/__init__.py --- @@ -33,3 +34,20 @@ random.__name__ = 'random' random.RandomRDDs.__module__ = __name__ + '.random' sys.modules[__name__ + '.random'] = random + + +def inherit_doc(cls): --- End diff -- Move this into mllib/common.py?
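The diff above shows only the signature `def inherit_doc(cls):`, so the body below is an assumption about what a class decorator with that name typically does: copy missing method docstrings from parent classes. It is a minimal sketch, not the code under review.

```python
import inspect


def inherit_doc(cls):
    """Class decorator: give undocumented public methods their parent's docstring.

    Assumed behaviour, for illustration only.
    """
    for name, member in vars(cls).items():
        # Only plain functions defined on this class, public, and lacking a docstring.
        if name.startswith("_") or not inspect.isfunction(member) or member.__doc__:
            continue
        for parent in cls.__mro__[1:]:
            parent_member = getattr(parent, name, None)
            parent_doc = getattr(parent_member, "__doc__", None)
            if parent_doc:
                member.__doc__ = parent_doc
                break
    return cls


class Base:
    def predict(self, x):
        """Predict a label for x."""
        return x


@inherit_doc
class Child(Base):
    def predict(self, x):
        return x + 1


if __name__ == "__main__":
    print(Child.predict.__doc__)
```

Because the decorator depends only on the standard library, it could live in any shared module, which is consistent with the suggestion to move it out of `__init__.py`.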
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-74939617 [Test build #27688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27688/consoleFull) for PR 3074 at commit [`127aaa8`](https://github.com/apache/spark/commit/127aaa8050b34925e511b8d8131dfb1e75841be8). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74945089 [Test build #27684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27684/consoleFull) for PR 4675 at commit [`da16aef`](https://github.com/apache/spark/commit/da16aef6800b16a708739c96bc1ef713043eb461). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] D...
Github user mbofb commented on the pull request: https://github.com/apache/spark/pull/4675#issuecomment-74948403 The description of RowMatrix.computeSVD and mllib-dimensionality-reduction.html should be more precise/explicit regarding the m x n matrix. From the current description I would conclude that n refers to the rows. According to http://math.stackexchange.com/questions/191711/how-many-rows-and-columns-are-in-an-m-x-n-matrix this way of describing a matrix is only used in particular domains. As a reader interested in applying SVD, I would prefer the more common m x n convention of rows x columns (e.g. http://en.wikipedia.org/wiki/Matrix_%28mathematics%29 ), which is also used in http://en.wikipedia.org/wiki/Latent_semantic_analysis (and also within the ARPACK manual: "N Integer. (INPUT) - Dimension of the eigenproblem. NEV Integer. (INPUT) - Number of eigenvalues of OP to be computed. 0 < NEV < N. NCV Integer. (INPUT) - Number of columns of the matrix V (less than or equal to N).").
[GitHub] spark pull request: [SPARK-5840][SQL] HiveContext cannot be serial...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4628