[GitHub] spark pull request: Spark-5708: Add Slf4jSink to Spark Metrics

2015-02-17 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4644#issuecomment-74727941 Jenkinmensch, this is OK to test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: [SPARK-5778] throw if nonexistent metrics conf...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4571

[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3027#issuecomment-74729680 Looks like this failed due to Scala style issues due to a few long lines.

[GitHub] spark pull request: [SPARK-5661]function hasShutdownDeleteTachyonD...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4418

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74733422 Merged into master--thanks!

[GitHub] spark pull request: [Minor][SQL] Use same function to check path p...

2015-02-17 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4649#issuecomment-74711513 LGTM

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74710776 Is there any reason to control map side combine? that seems to be the original motivation. @mccheah ? Any third opinions? Although I didn't perceive this method as

[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24835658 --- Diff: core/src/main/scala/org/apache/spark/ui/storage/BroadcastPage.scala --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-5862][SQL] Only transformUp the given p...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4651#issuecomment-74715550 [Test build #27633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27633/consoleFull) for PR 4651 at commit

[GitHub] spark pull request: SPARK-5856: In Maven build script, launch Zinc...

2015-02-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4643#issuecomment-74718771 Thanks guys, pulling this in.

[GitHub] spark pull request: [SPARK-5864] [PySpark] support .jar as python ...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4652#issuecomment-74720173 [Test build #27635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27635/consoleFull) for PR 4652 at commit

[GitHub] spark pull request: [SPARK-5858][MLLIB] Remove unnecessary first()...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4647

[GitHub] spark pull request: [SPARK-5859] [PySpark] [SQL] fix DataFrame Pyt...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4645

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread manishamde
Github user manishamde commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74722368 @MechCoder Sorry I didn't see the message earlier. I am sure @jkbradley must have done a thorough review but please let me know if you need me to take a look.

[GitHub] spark pull request: Spark-5708: Add Slf4jSink to Spark Metrics

2015-02-17 Thread kayousterhout
Github user kayousterhout commented on the pull request: https://github.com/apache/spark/pull/4644#issuecomment-74727261 I'm not familiar with this particular part of the metrics...some git blame suggests @jerryshao wrote most of this code originally, and @rxin was involved in

[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread squito
Github user squito commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74713997 can you also post a screenshot of the detailed page for a broadcast var? Ideally involving a broadcast var that gets turned into multiple blocks by `TorrentBroadcast`, I

[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24840220 --- Diff: core/src/main/scala/org/apache/spark/storage/RDDInfo.scala --- @@ -21,13 +21,14 @@ import org.apache.spark.annotation.DeveloperApi import

[GitHub] spark pull request: Spark-5708: Add Slf4jSink to Spark Metrics

2015-02-17 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4644#issuecomment-74726007 cc @JoshRosen @kayousterhout - I believe y'all are the point people for metrics stuff...?

[GitHub] spark pull request: [SPARK-5864] [PySpark] support .jar as python ...

2015-02-17 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/4652#issuecomment-74727279 @pwendell this is not enough to support Spark Packages with pyspark but solves the harder half of the problem. I have a follow up patch that adds jars in `--packages` to

[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74731158 [Test build #27638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27638/consoleFull) for PR 2851 at commit

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
GitHub user MechCoder opened a pull request: https://github.com/apache/spark/pull/4654 [SPARK-5016] Distribute Gaussian Initialization in GaussianMixture Following discussion in the JIRA You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4231

[GitHub] spark pull request: SPARK-5841 [CORE] [HOTFIX] Memory leak in Disk...

2015-02-17 Thread MattWhelan
Github user MattWhelan commented on the pull request: https://github.com/apache/spark/pull/4648#issuecomment-74711906 LGTM. Thanks for taking care of this.

[GitHub] spark pull request: [SPARK-5858][MLLIB] Remove unnecessary first()...

2015-02-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4647#issuecomment-74720366 Merged into master and branch-1.3.

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4231#discussion_r24838925 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala --- @@ -1064,9 +1045,12 @@ object DecisionTree extends Serializable with

[GitHub] spark pull request: [SPARK-5778] throw if nonexistent metrics conf...

2015-02-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4571#issuecomment-74727631 Pulling it in now, thanks ryan!

[GitHub] spark pull request: SPARK-5548: Fixed a race condition in AkkaUtil...

2015-02-17 Thread jacek-lewandowski
Github user jacek-lewandowski commented on the pull request: https://github.com/apache/spark/pull/4343#issuecomment-74728480 Here is the new PR https://github.com/apache/spark/pull/4653

[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3027#issuecomment-74730606 [Test build #612 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/612/consoleFull) for PR 3027 at commit

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74732535 There is a case where map-side-combine is actually not the right thing to do in some of my workflows. map-side-combine makes sense if the overall amount of data is
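
mccheah's argument is that map-side combine only pays off when it actually reduces the data. As a rough illustration, here is a pure-Python model (not Spark's implementation; every function name is hypothetical) that counts the records each map partition would write to the shuffle with and without a map-side combine step, once for data with repeated keys and once for data whose keys are all distinct.

```python
def first_value(v):
    """createCombiner: start a running sum from the first value of a key."""
    return v

def add(c, v):
    """mergeValue: fold another value into the running sum."""
    return c + v

def combine_partition(partition, create_combiner, merge_value):
    """Pre-aggregate one map-side partition into (key, combiner) records."""
    combiners = {}
    for key, value in partition:
        if key in combiners:
            combiners[key] = merge_value(combiners[key], value)
        else:
            combiners[key] = create_combiner(value)
    return list(combiners.items())

def shuffled_records(partitions, map_side_combine, create_combiner, merge_value):
    """Return, per map partition, the records that would be written to the shuffle."""
    if map_side_combine:
        return [combine_partition(p, create_combiner, merge_value) for p in partitions]
    # Without map-side combine every raw (key, value) pair is shuffled unchanged.
    return [list(p) for p in partitions]

def count(shuffle_output):
    """Total number of records crossing the shuffle."""
    return sum(len(p) for p in shuffle_output)

# Few distinct keys per partition: combining shrinks the shuffle.
dense = [[("a", 1), ("a", 2), ("b", 3)], [("a", 4), ("b", 5), ("b", 6)]]
# Every key distinct: combining costs an extra hash-map pass but saves nothing.
sparse = [[("a", 1), ("b", 2), ("c", 3)], [("d", 4), ("e", 5), ("f", 6)]]

for name, data in [("dense", dense), ("sparse", sparse)]:
    combined = count(shuffled_records(data, True, first_value, add))
    raw = count(shuffled_records(data, False, first_value, add))
    print(name, "with map-side combine:", combined, "without:", raw)
```

In the all-distinct case the combine step leaves the record count unchanged, which is the kind of workload where being able to switch it off would matter.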

[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3027#discussion_r24843768 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala --- @@ -95,25 +96,30 @@ private[spark] class TaskResultGetter(sparkEnv:

[GitHub] spark pull request: [SPARK-5864] [PySpark] support .jar as python ...

2015-02-17 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/4652 [SPARK-5864] [PySpark] support .jar as python package A jar file containing Python sources in it could be used as a Python package, just like zip file. spark-submit already put the jar file
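
davies's point rests on Python's standard `zipimport` machinery, which accepts any ZIP archive placed on `sys.path` regardless of its file extension, and a `.jar` is a ZIP archive. The sketch below builds a throwaway jar with the standard `zipfile` module and imports a package from it; the package name `demo_pkg` is hypothetical and nothing here involves Spark or `spark-submit` itself.

```python
import os
import sys
import tempfile
import zipfile

def build_demo_jar(directory):
    """Write a .jar (a plain ZIP archive) that contains one Python package."""
    jar_path = os.path.join(directory, "demo_pkg.jar")
    with zipfile.ZipFile(jar_path, "w") as jar:
        # The package lives at demo_pkg/__init__.py inside the archive.
        jar.writestr("demo_pkg/__init__.py", "GREETING = 'hello from a jar'\n")
    return jar_path

jar = build_demo_jar(tempfile.mkdtemp())
# Putting the archive on sys.path lets zipimport serve modules from it.
sys.path.insert(0, jar)

import demo_pkg

print(demo_pkg.GREETING)
print(demo_pkg.__file__)  # path points inside the .jar
```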

[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...

2015-02-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-74721574 LGTM, thanks!

[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread CodingCat
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/2851#discussion_r24840983 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -522,7 +523,9 @@ private[spark] class BlockManagerInfo(

[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3027#issuecomment-74730456 @JoshRosen I fixed it locally, but forgot to push out.

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-02-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-74729850 @selvinsource I like the output options which write to disk, but what about supporting writing to distributed file systems? It's more common in Spark to use URIs

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74733737 [Test build #27639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27639/consoleFull) for PR 4654 at commit

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74643730 @sryza The logical difference is small. `aggregateByKey` is for when you have a single immutable 'zero' value to start from for each key. `combineByKey` lets this be a
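
To make srowen's distinction concrete, here is a small local model in pure Python (not Spark's code; the function names and the round-robin partitioning are assumptions for illustration). `aggregate_by_key` starts every key from a copy of one shared zero value, while `combine_by_key` builds the initial combiner from the first value it sees for a key, so that starting state can depend on the data.

```python
import copy

def combine_by_key(pairs, create_combiner, merge_value, merge_combiners, partitions=2):
    """Local model of combineByKey: combine within each partition, then merge partitions."""
    # Round-robin split stands in for real hash partitioning.
    parts = [pairs[i::partitions] for i in range(partitions)]
    per_partition = []
    for part in parts:
        acc = {}
        for key, value in part:
            acc[key] = merge_value(acc[key], value) if key in acc else create_combiner(value)
        per_partition.append(acc)
    result = {}
    for acc in per_partition:
        for key, comb in acc.items():
            result[key] = merge_combiners(result[key], comb) if key in result else comb
    return result

def aggregate_by_key(pairs, zero, seq_op, comb_op, partitions=2):
    """aggregateByKey modelled on top of combine_by_key.

    Each key's combiner starts from a fresh copy of the single zero value,
    so a seq_op that mutates its accumulator cannot corrupt the shared zero.
    """
    return combine_by_key(
        pairs,
        lambda v: seq_op(copy.deepcopy(zero), v),
        seq_op,
        comb_op,
        partitions,
    )

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

# aggregateByKey style: every key starts from the same (sum, count) zero.
sum_count = aggregate_by_key(
    pairs, (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)

# combineByKey style: the initial (min, max) is derived from the first value seen.
min_max = combine_by_key(
    pairs,
    lambda v: (v, v),
    lambda acc, v: (min(acc[0], v), max(acc[1], v)),
    lambda x, y: (min(x[0], y[0]), max(x[1], y[1])),
)

print(sum_count)
print(min_max)
```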

[GitHub] spark pull request: [SPARK-5259][CORE]Make sure mapStage.pendingta...

2015-02-17 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4055#discussion_r24806228 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -483,8 +483,9 @@ private[spark] class TaskSetManager( // a

[GitHub] spark pull request: SPARK-5841 [CORE] [HOTFIX] Memory leak in Disk...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4648#issuecomment-74650240 [Test build #27627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27627/consoleFull) for PR 4648 at commit

[GitHub] spark pull request: [SPARK-5825] [Spark Submit] Remove the double ...

2015-02-17 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4611#issuecomment-74650236 Gotcha. OK. Until someone thinks of a more robust check, I think we can resort to just checking if `ps -p $pid -o comm=` is `java`?
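
The check srowen suggests is a shell one-liner; to keep one language across these sketches it is rendered here in Python via `subprocess`. This is an illustration under stated assumptions, not the script the PR changes: the helper name `is_java_process` is hypothetical, and comparing the basename is an added guess for platforms whose `ps` prints a full path.

```python
import os
import subprocess

def is_java_process(pid):
    """Return True if `ps -p <pid> -o comm=` reports the command name as java.

    Returns False when the process does not exist or `ps` is unavailable.
    """
    try:
        result = subprocess.run(
            ["ps", "-p", str(pid), "-o", "comm="],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return False
    if result.returncode != 0:
        # ps exits non-zero when no process has this pid.
        return False
    # Some platforms print the full executable path, so compare the basename.
    return os.path.basename(result.stdout.strip()) == "java"

# This interpreter is a Python process, so the check should not match it.
print(is_java_process(os.getpid()))
```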

[GitHub] spark pull request: [SPARK-5826][Streaming] Fix Configuration not ...

2015-02-17 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4612#issuecomment-74648878 OK LGTM. I suppose a field is not generated here as it's never used outside the constructor and it need not be `private`. Looks like a clean fix, we've reviewed it, tests

[GitHub] spark pull request: [SPARK-5858][MLLIB] Remove unnecessary first()...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4647#issuecomment-74648972 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5858][MLLIB] Remove unnecessary first()...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4647#issuecomment-74648960 [Test build #27626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27626/consoleFull) for PR 4647 at commit

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4622#issuecomment-74653282 [Test build #27628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27628/consoleFull) for PR 4622 at commit

[GitHub] spark pull request: [SPARK-5832][Mllib] Add Affinity Propagation c...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4622#issuecomment-74653294 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5852][SQL]Fail to convert a newly creat...

2015-02-17 Thread yhuai
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/4655 [SPARK-5852][SQL]Fail to convert a newly created empty metastore parquet table to a data source parquet table. The problem is that after we create an empty hive metastore parquet table (e.g. `CREATE

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74734747 Do you want me to time any specific data?

[GitHub] spark pull request: [SPARK-5852][SQL]Fail to convert a newly creat...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4655#issuecomment-74735710 [Test build #27640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27640/consoleFull) for PR 4655 at commit

[GitHub] spark pull request: [SPARK-5864] [PySpark] support .jar as python ...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4652#issuecomment-74735449 [Test build #27635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27635/consoleFull) for PR 4652 at commit

[GitHub] spark pull request: [SPARK-4436][SPARK-3624][BUILD] Debian packagi...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3297

[GitHub] spark pull request: [SPARK-5311][core] Corrected EventLoggingListe...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4120

[GitHub] spark pull request: [Minor] fix typo in SQL document

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4656#issuecomment-74736599 [Test build #27641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27641/consoleFull) for PR 4656 at commit

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread tgaloppo
Github user tgaloppo commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74737757 Please be sure to test in cluster setting, not just on a multicore machine... I believe the computation/communication ratio is going to be too low to make this

[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

2015-02-17 Thread ankurdave
Github user ankurdave commented on the pull request: https://github.com/apache/spark/pull/4650#issuecomment-74737920 ok to test

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74740668 [Test build #27644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27644/consoleFull) for PR 4654 at commit

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread tgaloppo
Github user tgaloppo commented on a diff in the pull request: https://github.com/apache/spark/pull/4654#discussion_r24850641 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -168,16 +182,26 @@ class GaussianMixture private (

[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-17 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4521#issuecomment-74745858 ok to test

[GitHub] spark pull request: [SPARK-5871] output explain in Python

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4658#issuecomment-74746113 [Test build #27647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27647/consoleFull) for PR 4658 at commit

[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-17 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4641#issuecomment-74746384 ok to test

[GitHub] spark pull request: [SPARK-3957]: show broadcast variable resource...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2851#issuecomment-74746530 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-2669] [yarn] Distribute client configur...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4142#issuecomment-74747297 [Test build #27650 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27650/consoleFull) for PR 4142 at commit

[GitHub] spark pull request: [SPARK-5872] [SQL] create a sqlCtx in pyspark ...

2015-02-17 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/4659 [SPARK-5872] [SQL] create a sqlCtx in pyspark shell The sqlCtx will be HiveContext if hive is built in assembly jar, or SQLContext if not. It also skip the Hive tests in pyspark.sql.tests

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74733959 Could you please add a description (based on the JIRA info)? That makes it easier for others to review.

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4654#discussion_r24846111 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -135,25 +135,39 @@ class GaussianMixture private (

[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4650#issuecomment-74738600 [Test build #27643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27643/consoleFull) for PR 4650 at commit

[GitHub] spark pull request: [SPARK 5280] RDF Loader added + documentation

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4650#issuecomment-74738782 [Test build #27643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27643/consoleFull) for PR 4650 at commit

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4654#issuecomment-74740673 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5868][SQL] Fix python UDFs in HiveConte...

2015-02-17 Thread marmbrus
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/4657 [SPARK-5868][SQL] Fix python UDFs in HiveContext and checks in SQLContext

[GitHub] spark pull request: [SPARK-5016] Distribute Gaussian Initializatio...

2015-02-17 Thread MechCoder
Github user MechCoder commented on a diff in the pull request: https://github.com/apache/spark/pull/4654#discussion_r24848796 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala --- @@ -168,16 +182,26 @@ class GaussianMixture private (

[GitHub] spark pull request: [Minor] fix typo in SQL document

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4656

[GitHub] spark pull request: [SPARK-5871] output explain in Python

2015-02-17 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/4658 [SPARK-5871] output explain in Python

[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3027#issuecomment-74745780 [Test build #27637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27637/consoleFull) for PR 3027 at commit

[GitHub] spark pull request: [SPARK-4544] Properly synchronize accesses to ...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4660#issuecomment-74748317 [Test build #27651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27651/consoleFull) for PR 4660 at commit

[GitHub] spark pull request: SPARK-4454 Fix race condition in DAGScheduler

2015-02-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3345#issuecomment-74749292 I've opened #4660 as an alternative fix for this issue; please take a look.

[GitHub] spark pull request: [SPARK-5852][SQL]Fail to convert a newly creat...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4655#issuecomment-74749230 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...

2015-02-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-74750269 I wonder if maybe we could improve these heuristics instead of adding user flags (for many users it might be hard to figure out how to set these). The heuristic of

[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...

2015-02-17 Thread squito
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/4629#discussion_r24852684 --- Diff: python/pyspark/tests.py --- @@ -740,6 +739,27 @@ def test_multiple_python_java_RDD_conversions(self): converted_rdd =

[GitHub] spark pull request: [SPARK-5519][MLLIB] add user guide with exampl...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4661#issuecomment-74750803 [Test build #27653 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27653/consoleFull) for PR 4661 at commit

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74750902 I don't entirely understand the advantage of `combineByKey` without a combiner vs. `groupByKey`. Can't whatever aggregation function that would be used inside the
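
sryza's question can be checked on a toy example: with map-side combine disabled, a combiner-based aggregation and "group first, then aggregate" produce the same result. The pure-Python model below is only a sketch (the function names are hypothetical, not Spark's API); it shows the equivalence of results, while the practical difference lies in what is held per key, since the grouping version materializes every value of a key before folding, whereas the combiner keeps only its running state.

```python
from functools import reduce

def group_by_key(pairs):
    """Collect every value per key, like groupByKey (no map-side combining)."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups

def combine_by_key_no_map_side(pairs, create_combiner, merge_value):
    """combineByKey with map-side combine disabled: every raw pair reaches the
    reducer, which folds them into one combiner per key."""
    combiners = {}
    for key, value in pairs:
        if key in combiners:
            combiners[key] = merge_value(combiners[key], value)
        else:
            combiners[key] = create_combiner(value)
    return combiners

pairs = [("x", 3), ("y", 1), ("x", 4), ("y", 5), ("x", 9)]

# Aggregation applied after grouping: each key's full value list exists first.
via_group = {k: reduce(max, vs) for k, vs in group_by_key(pairs).items()}
# Same aggregation expressed as a combiner: only the running max is kept.
via_combine = combine_by_key_no_map_side(pairs, lambda v: v, max)

print(via_group == via_combine, via_group)
```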

[GitHub] spark pull request: [SPARK-5785] [PySpark] narrow dependency for c...

2015-02-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4629#discussion_r24853424 --- Diff: python/pyspark/tests.py --- @@ -740,6 +739,27 @@ def test_multiple_python_java_RDD_conversions(self): converted_rdd =

[GitHub] spark pull request: [SPARK-5872] [SQL] create a sqlCtx in pyspark ...

2015-02-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4659#issuecomment-74752885 LGTM - thanks davies

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74752839 We want to take advantage of the distributed reduce functionality of combineByKey when computing the other aggregation metrics as well. Is this not lost if we do a map

[GitHub] spark pull request: [SPARK-5811] Added documentation for maven coo...

2015-02-17 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/4662 [SPARK-5811] Added documentation for maven coordinates and added Spark Packages support Documentation for maven coordinates + Spark Package support. Added pyspark tests for `--packages`

[GitHub] spark pull request: [SPARK-5519][MLLIB] add user guide with exampl...

2015-02-17 Thread mengxr
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/4661 [SPARK-5519][MLLIB] add user guide with example code for fp-growth

[GitHub] spark pull request: [Minor] fix typo in SQL document

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4656#issuecomment-74752039 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [Minor] fix typo in SQL document

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4656#issuecomment-74751559 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [Minor] fix typo in SQL document

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4656#issuecomment-74751548 [Test build #27641 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27641/consoleFull) for PR 4656 at commit

[GitHub] spark pull request: [Minor] fix typo in SQL document

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4656#issuecomment-74752031 [Test build #27642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27642/consoleFull) for PR 4656 at commit

[GitHub] spark pull request: [SPARK-5771] Number of Cores in Completed Appl...

2015-02-17 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4567#issuecomment-74755232 See https://issues.apache.org/jira/browse/SPARK-5076 too; it suggests that in addition to removing the current cores column for completed apps, to remove the Memory per

[GitHub] spark pull request: [SPARK-5068][SQL]fix bug query data when path ...

2015-02-17 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3907#issuecomment-74755965 Is this superseded by https://github.com/apache/spark/pull/4356? If so, can this be closed?

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74756176 What exactly do you mean by the distributed reduce functionality?

[GitHub] spark pull request: [SPARK-5811] Added documentation for maven coo...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4662#issuecomment-74756031 [Test build #27654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27654/consoleFull) for PR 4662 at commit

[GitHub] spark pull request: [SPARK-5868][SQL] Fix python UDFs in HiveConte...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4657

[GitHub] spark pull request: [SPARK-5843] Allowing map-side combine to be s...

2015-02-17 Thread mccheah
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/4634#issuecomment-74756966 You lose the parallelism that's inherent in computing the reduce as a parallel operation, as opposed to computing it on a list in a single task. For more

[GitHub] spark pull request: [SPARK-5811] Added documentation for maven coo...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4662#issuecomment-74758524 [Test build #27655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27655/consoleFull) for PR 4662 at commit

[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3027#issuecomment-74759178 LGTM, so I'm going to merge this into `master` (1.4.0) and `branch-1.3` (1.3.0). Thanks!

[GitHub] spark pull request: [SPARK-4172] [PySpark] Progress API in Python

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3027

[GitHub] spark pull request: [SPARK-5811] Added documentation for maven coo...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4662#issuecomment-74760331 [Test build #27656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27656/consoleFull) for PR 4662 at commit

[GitHub] spark pull request: [SPARK-4454] Properly synchronize accesses to ...

2015-02-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4660#issuecomment-74760535 /cc @pwendell @markhamstra

[GitHub] spark pull request: [SPARK-5871] output explain in Python

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4658#issuecomment-74760523 [Test build #27647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27647/consoleFull) for PR 4658 at commit