date:20150426

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107386 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107415 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107413 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107419 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107422 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-7152][SQL] Add a Column expression for ...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5705#issuecomment-96350256 [Test build #30957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30957/consoleFull) for PR 5705 at commit

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas

Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29107445 --- Diff: dev/run-tests --- @@ -17,239 +17,394 @@ # limitations under the License. # -# Go to the Spark project root directory

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-26 Thread shenh062326

Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/5608#discussion_r29107048 --- Diff: core/src/main/scala/org/apache/spark/util/SizeEstimator.scala --- @@ -204,25 +204,36 @@ private[spark] object SizeEstimator extends Logging {

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-9633 [Test build #710 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/710/consoleFull) for PR 5687 at commit

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107372 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -331,4 +331,186 @@ class ColumnExpressionSuite extends QueryTest {

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107373 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -331,4 +331,186 @@ class ColumnExpressionSuite extends QueryTest {

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107408 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107416 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: SPARK-6954. ExecutorAllocationManager can end ...

2015-04-26 Thread sryza

GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/5704 SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107401 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107399 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -89,6 +89,12 @@ abstract class BinaryExpression extends

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107396 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107402 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-26 Thread sryza

Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-96334096 #5704 demonstrates what I outlined in my above comment, and should supersede this PR. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-96336202 [Test build #30956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30956/consoleFull) for PR 5704 at commit

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5687#issuecomment-96344428 [Test build #710 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/710/consoleFull) for PR 5687 at commit

[GitHub] spark pull request: [SPARK-7152][SQL] Add a Column expression for ...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5705#discussion_r29107784 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -317,18 +331,13 @@ object functions { def not(e: Column): Column = !e

[GitHub] spark pull request: [SPARK-7152][SQL] Add a Column expression for ...

2015-04-26 Thread rxin

GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/5705 [SPARK-7152][SQL] Add a Column expression for partition ID. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark df-pid Alternatively you

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107367 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -331,4 +331,186 @@ class ColumnExpressionSuite extends QueryTest {

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107363 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/TestData.scala --- @@ -57,6 +58,15 @@ object TestData { TestData2(3, 2) :: Nil, 2).toDF()

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107426 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6263] Python MLlib API missing items: U...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-96388201 [Test build #30958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30958/consoleFull) for PR 5707 at commit

[GitHub] spark pull request: [ML][SPARK-6529] Add Word2Vec transformer

2015-04-26 Thread yinxusen

Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/5596#issuecomment-96392739 @mengxr I have merged it with #5626. You can retest it when possible.

[GitHub] spark pull request: [MLLIB] SPARK-4231, SPARK-3066: Add RankingMet...

2015-04-26 Thread debasish83

Github user debasish83 commented on the pull request: https://github.com/apache/spark/pull/3098#issuecomment-96403986 was very last few weeks...update it in next few days...

[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-04-26 Thread viirya

Github user viirya commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-96406187 @chenghao-intel thanks for the suggestion. This is indeed just a quick fix. Since in these test cases users don't indicate a SerDe to use, I will investigate if there is

[GitHub] spark pull request: [SPARK-7142][SQL]: Minor enhancement to Boolea...

2015-04-26 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/5700#discussion_r29110184 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -413,6 +418,10 @@ object BooleanSimplification

[GitHub] spark pull request: [SPARK-7153][SQL] support long type ordinal in...

2015-04-26 Thread cloud-fan

Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/5706#issuecomment-96392010 Jenkins test it please.

[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-04-26 Thread chenghao-intel

Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-96392452 @viirya thanks for the quick fix, but my concern with `ScriptTransformation` is that we don't use the `SerDe` or `InputFormat/OutputFormat` at all, and it seems a hack

[GitHub] spark pull request: [SPARK-7140][MLLIB] only scan the first 16 non...

2015-04-26 Thread mengxr

Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5697#discussion_r29111497 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -556,6 +579,28 @@ class SparseVector( i += 1 } }

[GitHub] spark pull request: [SPARK-7140][MLLIB] only scan the first 16 non...

2015-04-26 Thread mengxr

Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5697#discussion_r29111498 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -63,20 +63,27 @@ sealed trait Vector extends Serializable {

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107393 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29107395 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala --- @@ -0,0 +1,215 @@ +/* + * Licensed to the

[GitHub] spark pull request: [SPARK-6263] Python MLlib API missing items: U...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5707#issuecomment-96392337 [Test build #30958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30958/consoleFull) for PR 5707 at commit

[GitHub] spark pull request: [SPARK-7142][SQL]: Minor enhancement to Boolea...

2015-04-26 Thread saucam

Github user saucam commented on a diff in the pull request: https://github.com/apache/spark/pull/5700#discussion_r29110517 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -413,6 +418,10 @@ object BooleanSimplification extends

[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-26 Thread mengxr

Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96406891 Sorry, I only verified SVM and thought logistic regression was implemented the same way. For SVM, could we try `normalizationMethod = none` and set the threshold as the

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-26 Thread srowen

Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-96365828 @shenh062326 This still doesn't compile though, see the test output. ``` [error]

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-26 Thread srowen

Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-96365871 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-7153][SQL] support long type ordinal in...

2015-04-26 Thread cloud-fan

GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/5706 [SPARK-7153][SQL] support long type ordinal in GetItem You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark 7153 Alternatively

[GitHub] spark pull request: [SPARK-6263] Python MLlib API missing items: U...

2015-04-26 Thread Lewuathe

GitHub user Lewuathe opened a pull request: https://github.com/apache/spark/pull/5707 [SPARK-6263] Python MLlib API missing items: Utils Implement missing API in pyspark. MLUtils * appendBias * loadVectors `kFold` is also missing; however, I am not sure

[GitHub] spark pull request: [SPARK-5155] [PySpark] [Streaming] Mqtt stream...

2015-04-26 Thread prabeesh

Github user prabeesh commented on the pull request: https://github.com/apache/spark/pull/4229#issuecomment-96364287 @tdas please review this.

[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...

2015-04-26 Thread srowen

Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5679#issuecomment-96365916 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-26 Thread lianhuiwang

Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-96369904 @andrewor14 For the second question, I added two things. One is that I add the zipped pyspark archives to pyspark/lib when we build the Spark jar. The other is in submit, if

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-96359704 **[Test build #30956 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30956/consoleFull)** for PR 5704 at commit

[GitHub] spark pull request: [Minor][MLLIB] Refactor toString method in MLL...

2015-04-26 Thread asfgit

Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5687

[GitHub] spark pull request: [SPARK-6443][Spark Submit]Could not submit app...

2015-04-26 Thread WangTaoTheTonic

Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5116#issuecomment-96374948 ping @andrewor14

[GitHub] spark pull request: [SPARK-7031][ThriftServer]let thrift server ta...

2015-04-26 Thread WangTaoTheTonic

Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5609#issuecomment-96374926 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-7086][Deploy]Do not retry when public s...

2015-04-26 Thread WangTaoTheTonic

Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5657#issuecomment-96377454 Consider one case: a user submits apps to the master with a port config, let's say `spark://somehost:7077`, and lets workers connect to the master the same way. Once

[GitHub] spark pull request: [SPARK-7152][SQL] Add a Column expression for ...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5705#issuecomment-96366127 [Test build #30957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30957/consoleFull) for PR 5705 at commit

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-26 Thread shenh062326

Github user shenh062326 commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-96369283 Thanks, I will fix it.

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-26 Thread lianhuiwang

Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/5580#issuecomment-96369994 @tgravescs I think this PR is useful for you. You can try it.

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116119 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -68,6 +52,8 @@ class LDA private ( def this() = this(k = 10,

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116122 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -220,6 +206,38 @@ class LDA private ( this } +

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116118 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -42,17 +37,6 @@ import org.apache.spark.util.Utils * - token:

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116124 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -220,6 +206,38 @@ class LDA private ( this } +

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116126 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -220,6 +206,38 @@ class LDA private ( this } +

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116125 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDA.scala --- @@ -220,6 +206,38 @@ class LDA private ( this } +

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116128 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116130 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5661#discussion_r29116132 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7140][MLLIB] only scan the first 16 ent...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5697#issuecomment-96451312 [Test build #712 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/712/consoleFull) for PR 5697 at commit

[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-96450346 [Test build #711 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/711/consoleFull) for PR 5685 at commit

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-26 Thread piaozhexiu

Github user piaozhexiu commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-96448175 @sryza, thank you for the patch. I tried it with my queries, and it works very well. I look forward to getting this issue fixed in the 1.3 branch.

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5661#issuecomment-96448979 Sorry for the delay! I'll review the PR now

[GitHub] spark pull request: [Spark-7090][MLlib] Introduce LDAOptimizer to ...

2015-04-26 Thread jkbradley

Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5661#issuecomment-96450719 @hhbyyh Thanks for the PR! It looks good, except for 1 item on which I think we weren't clear before: I meant for us to separate the Optimizer and

[GitHub] spark pull request: [SPARK-7140][MLLIB] only scan the first 16 ent...

2015-04-26 Thread jkbradley

Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/5697#discussion_r29116184 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala --- @@ -556,6 +579,28 @@ class SparseVector( i += 1 }

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread rxin

Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96451839 partition id doesn't change between retries, does it?

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5709#discussion_r29116363 --- Diff: python/pyspark/sql/functions.py --- @@ -103,8 +103,28 @@ def countDistinct(col, *cols): return Column(jc) +def

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread rxin

Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5709#discussion_r29116369 --- Diff: python/pyspark/sql/functions.py --- @@ -103,8 +103,28 @@ def countDistinct(col, *cols): return Column(jc) +def

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113464 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113581 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113605 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113602 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread pwendell

Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96436586 Could it be confusing to users that the ID associated with each record might be different on stage or task retries? The fact that ordering within a partition is not
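
The concern in the comment above can be illustrated with a small sketch. It assumes, purely for illustration, an ID built from the partition id (high bits) plus the record's position within the partition (low bits); the function name `assign_ids` and the 33-bit width are hypothetical and not taken from the PR. Under that assumption the partition id stays fixed across a retry, but a record's ID still changes if the retry reads the partition's records in a different order.

```python
# Hypothetical ID layout, assumed for illustration only:
# partition id in the high bits, position within the partition in the low bits.
ROW_BITS = 33  # assumed width of the per-partition position counter


def assign_ids(partition_id, rows):
    """Map each row to an ID derived from its partition and its position."""
    base = partition_id << ROW_BITS
    return {row: base + position for position, row in enumerate(rows)}


# First attempt at computing partition 1.
first_attempt = assign_ids(1, ["a", "b", "c"])

# A retry of the same partition that happens to see the rows in another order.
retry = assign_ids(1, ["c", "a", "b"])

# Rows whose ID differs between the two attempts.
changed = sorted(row for row in first_attempt if first_attempt[row] != retry[row])
print(changed)
```

The set of IDs handed out is the same in both attempts; only the row-to-ID mapping moves, which is the kind of behavior the comment suggests could surprise users.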

[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

2015-04-26 Thread pwendell

Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5403#issuecomment-96436773 By the way - if we did end up deciding to include this, I do feel that: 1. We should not mark this as solving SPARK-3376 (the goal there was to build a

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas

Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111863 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread sryza

Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112442 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112530 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-7155] [CORE] Allow newAPIHadoopFile to ...

2015-04-26 Thread yongtang

GitHub user yongtang opened a pull request: https://github.com/apache/spark/pull/5708 [SPARK-7155] [CORE] Allow newAPIHadoopFile to support comma-separated list of files as input See JIRA: https://issues.apache.org/jira/browse/SPARK-7155 SparkContext's newAPIHadoopFile()

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96430170 [Test build #30959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30959/consoleFull) for PR 5709 at commit

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113435 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread SparkQA

Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5709#issuecomment-96431732 [Test build #30959 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30959/consoleFull) for PR 5709 at commit

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113570 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29113616 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas

Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111872 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas

Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111967 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112203 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to

[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

2015-04-26 Thread pwendell

Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/5577#issuecomment-96422091 Seems good to me - @rxin any comments? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-7135][SQL] DataFrame expression for mon...

2015-04-26 Thread rxin

GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/5709 [SPARK-7135][SQL] DataFrame expression for monotonically increasing IDs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark inc-id

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas

Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111609 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas

Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111716 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-26 Thread nchammas

Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/5694#discussion_r29111886 --- Diff: dev/run-tests.py --- @@ -0,0 +1,417 @@ +#!/usr/bin/env python + +# +# Licensed to the Apache Software Foundation (ASF) under one or

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112308 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112469 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala --- @@ -53,9 +53,14 @@ private[spark] abstract class BlockObjectWriter(val

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread pwendell

Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112559 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C](


