[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107386 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107415 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107413 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107419 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107422 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-7152][SQL] Add a Column expression for ...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5705#issuecomment-96350256 [Test build #30957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30957/consoleFull) for PR 5705 at commit

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29107445 --- Diff: dev/run-tests --- @@ -17,239 +17,394 @@ # limitations under the License. # -# Go to the Spark project root directory

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-26 Thread shenh062326
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/5608#discussion_r29107048 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -204,25 +204,36 @@ private[spark] object SizeEstimator extends Logging {

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-9633 [Test build #710 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/710/consoleFull) for PR 5687 at commit

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107372 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -331,4 +331,186 @@ class ColumnExpressionSuite extends QueryTest {

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107373 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -331,4 +331,186 @@ class ColumnExpressionSuite extends QueryTest {

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107408 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107416 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: SPARK-6954. ExecutorAllocationManager can end ...

2015-04-26 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/5704 SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107401 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107399 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -89,6 +89,12 @@ abstract class BinaryExpression extends

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107396 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107402 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-96334096 #5704 demonstrates what I outlined in my above comment, and should supersede this PR. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-96336202 [Test build #30956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30956/consoleFull) for PR 5704 at commit

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-96344428 [Test build #710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/710/consoleFull) for PR 5687 at commit

[GitHub] spark pull request: [SPARK-7152][SQL] Add a Column expression for ...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5705#discussion_r29107784 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -317,18 +331,13 @@ object functions { def not(e: Column): Column = !e

[GitHub] spark pull request: [SPARK-7152][SQL] Add a Column expression for ...

2015-04-26 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/5705 [SPARK-7152][SQL] Add a Column expression for partition ID. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark df-pid Alternatively you

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107367 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -331,4 +331,186 @@ class ColumnExpressionSuite extends QueryTest {

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107363 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TestData.scala --- @@ -57,6 +58,15 @@ object TestData { TestData2(3, 2) :: Nil, 2).toDF()

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107426 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6263] Python MLlib API missing items: U...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-96388201 [Test build #30958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30958/consoleFull) for PR 5707 at commit

[GitHub] spark pull request: [ML][SPARK-6529] Add Word2Vec transformer

2015-04-26 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/5596#issuecomment-96392739 @mengxr I have merged it with #5626. You can retest it when possible.

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-26 Thread debasish83
Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-96403986 was very last few weeks...update it in next few days...

[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-04-26 Thread viirya
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-96406187 @chenghao-intel thanks for the suggestion. This is indeed just a quick fix. Since users don't specify a SerDe in these test cases, I will investigate if there is

[GitHub] spark pull request: [SPARK-7142][SQL]: Minor enhancement to Boolea...

2015-04-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/5700#discussion_r29110184 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -413,6 +418,10 @@ object BooleanSimplification

[GitHub] spark pull request: [SPARK-7153][SQL] support long type ordinal in...

2015-04-26 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/5706#issuecomment-96392010 Jenkins test it please.

[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-04-26 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-96392452 @viirya thanks for the quick fix, but my concern with `ScriptTransformation` is that we don't use the `SerDe` or `InputFormat/OutputFormat` at all, and it seems a hack

[GitHub] spark pull request: [SPARK-7140][MLLIB] only scan the first 16 non...

2015-04-26 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5697#discussion_r29111497 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -556,6 +579,28 @@ class SparseVector( i += 1 } }

[GitHub] spark pull request: [SPARK-7140][MLLIB] only scan the first 16 non...

2015-04-26 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5697#discussion_r29111498 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -63,20 +63,27 @@ sealed trait Vector extends Serializable {

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107393 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107395 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6263] Python MLlib API missing items: U...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-96392337 [Test build #30958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30958/consoleFull) for PR 5707 at commit

[GitHub] spark pull request: [SPARK-7142][SQL]: Minor enhancement to Boolea...

2015-04-26 Thread saucam
Github user saucam commented on a diff in the pull request: https://github.com/apache/spark/pull/5700#discussion_r29110517 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -413,6 +418,10 @@ object BooleanSimplification extends

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-26 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96406891 Sorry, I only verified SVM and thought logistic regression was implemented the same way. For SVM, could we try `normalizationMethod = none` and set the threshold as the

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-26 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-96365828 @shenh062326 This still doesn't compile though, see the test output. ``` [error]

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-26 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-96365871 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-7153][SQL] support long type ordinal in...

2015-04-26 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/5706 [SPARK-7153][SQL] support long type ordinal in GetItem You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark 7153 Alternatively

[GitHub] spark pull request: [SPARK-6263] Python MLlib API missing items: U...

2015-04-26 Thread Lewuathe
GitHub user Lewuathe opened a pull request: https://github.com/apache/spark/pull/5707 [SPARK-6263] Python MLlib API missing items: Utils Implement missing API in pyspark. MLUtils * appendBias * loadVectors `kFold` is also missing however I am not sure
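
For readers unfamiliar with `appendBias`, here is a minimal sketch of the idea in plain Python. It assumes the behavior documented for the Scala `MLUtils.appendBias` (a constant 1.0 appended to the feature vector); the function name `append_bias` and the list-based representation are illustrative only and are not the code in this pull request.

```python
from typing import List, Sequence


def append_bias(features: Sequence[float]) -> List[float]:
    """Return a new list with a constant 1.0 bias term appended.

    Hypothetical helper for illustration; the input is left unmodified.
    """
    return list(features) + [1.0]


if __name__ == "__main__":
    print(append_bias([0.5, 2.0]))
```

Returning a new list rather than mutating the input keeps the helper safe to call on shared feature vectors.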

[GitHub] spark pull request: [SPARK-5155] [PySpark] [Streaming] Mqtt stream...

2015-04-26 Thread prabeesh
Github user prabeesh commented on the pull request: https://github.com/apache/spark/pull/4229#issuecomment-96364287 @tdas please review this.

[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...

2015-04-26 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5679#issuecomment-96365916 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-26 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-96369904 @andrewor14 For the second question, I added two things. One is that I add the zipped pyspark archives to pyspark/lib when we build the Spark jar. The other is in submit if

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-96359704 **[Test build #30956 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30956/consoleFull)** for PR 5704 at commit

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5687

[GitHub] spark pull request: [SPARK-6443][Spark Submit]Could not submit app...

2015-04-26 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5116#issuecomment-96374948 ping @andrewor14

[GitHub] spark pull request: [SPARK-7031][ThriftServer]let thrift server ta...

2015-04-26 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5609#issuecomment-96374926 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-7086][Deploy]Do not retry when public s...

2015-04-26 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5657#issuecomment-96377454 Consider one scenario: a user submits apps to the master with a port configured, say `spark://somehost:7077`, and lets workers connect to the master the same way. Once

[GitHub] spark pull request: [SPARK-7152][SQL] Add a Column expression for ...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5705#issuecomment-96366127 [Test build #30957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30957/consoleFull) for PR 5705 at commit

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-26 Thread shenh062326
Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-96369283 Thanks, I will fix it.

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-26 Thread lianhuiwang
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-96369994 @tgravescs I think this PR is useful for you. You can try it.

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116119 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -68,6 +52,8 @@ class LDA private ( def this() = this(k = 10,

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116122 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -220,6 +206,38 @@ class LDA private ( this } +

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116118 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -42,17 +37,6 @@ import org.apache.spark.util.Utils * - token:

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116124 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -220,6 +206,38 @@ class LDA private ( this } +

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116126 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -220,6 +206,38 @@ class LDA private ( this } +

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116125 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -220,6 +206,38 @@ class LDA private ( this } +

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116128 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116130 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116132 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7140][MLLIB] only scan the first 16 ent...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5697#issuecomment-96451312 [Test build #712 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/712/consoleFull) for PR 5697 at commit

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-96450346 [Test build #711 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/711/consoleFull) for PR 5685 at commit

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-26 Thread piaozhexiu
Github user piaozhexiu commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-96448175 @sryza, thank you for the patch. I tried it with my queries, and it works very well. I look forward to getting this issue fixed in the 1.3 branch.

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5661#issuecomment-96448979 Sorry for the delay! I'll review the PR now

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5661#issuecomment-96450719 @hhbyyh Thanks for the PR! It looks good, except for 1 item on which I think we weren't clear before: I meant for us to separate the Optimizer and

[GitHub] spark pull request: [SPARK-7140][MLLIB] only scan the first 16 ent...

2015-04-26 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5697#discussion_r29116184 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -556,6 +579,28 @@ class SparseVector( i += 1 }

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96451839 partition id doesn't change between retries, does it?

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5709#discussion_r29116363 --- Diff: python/pyspark/sql/functions.py --- @@ -103,8 +103,28 @@ def countDistinct(col, *cols): return Column(jc) +def

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5709#discussion_r29116369 --- Diff: python/pyspark/sql/functions.py --- @@ -103,8 +103,28 @@ def countDistinct(col, *cols): return Column(jc) +def

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113464 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113581 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113605 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113602 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96436586 Could it be confusing to users that the ID associated with each record might be different on stage or task retries? The fact that ordering within a partition is not

[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5403#issuecomment-96436773 By the way - if we did end up deciding to include this, I do feel that: 1. We should not mark this as solving SPARK-3376 (the goal there was to build a

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111863 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112442 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112530 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-7155] [CORE] Allow newAPIHadoopFile to ...

2015-04-26 Thread yongtang
GitHub user yongtang opened a pull request: https://github.com/apache/spark/pull/5708 [SPARK-7155] [CORE] Allow newAPIHadoopFile to support comma-separated list of files as input See JIRA: https://issues.apache.org/jira/browse/SPARK-7155 SparkContext's newAPIHadoopFile()
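
As a rough illustration of what "comma-separated list of files as input" means, the sketch below splits one path string into individual paths in plain Python. The helper name `split_input_paths` is hypothetical and this is not the change made in the pull request; it also ignores cases such as commas inside glob braces, which a real implementation would need to handle.

```python
from typing import List


def split_input_paths(paths: str) -> List[str]:
    """Split a comma-separated path string into a list of paths.

    Hypothetical helper: surrounding whitespace is trimmed and empty
    entries (for example from a doubled comma) are dropped.
    """
    return [part.strip() for part in paths.split(",") if part.strip()]


if __name__ == "__main__":
    print(split_input_paths("hdfs:///data/a.txt, hdfs:///data/b.txt"))
```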

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96430170 [Test build #30959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30959/consoleFull) for PR 5709 at commit

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113435 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96431732 [Test build #30959 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30959/consoleFull) for PR 5709 at commit

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113570 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113616 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111872 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111967 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112203 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to

[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

2015-04-26 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5577#issuecomment-96422091 Seems good to me - @rxin any comments?

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/5709 [SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark inc-id

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111609 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111716 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111886 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112308 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112469 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -53,9 +53,14 @@ private[spark] abstract class BlockObjectWriter(val

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112559 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C](
