[GitHub] spark pull request: [SPARK-3952] [Streaming] [PySpark] add Python ...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2808#issuecomment-59600543 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/397/consoleFull) for PR 2808 at commit [`26a7e37`](https://github.com/a

[GitHub] spark pull request: [SPARK-3916] [Streaming] discover new appended...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2806#issuecomment-59600547 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/398/consoleFull) for PR 2806 at commit [`09561e8`](https://github.com/a

[GitHub] spark pull request: [SPARK-3720][SQL]initial support ORC in spark ...

2014-10-17 Thread scwf
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/2576#discussion_r19051903 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableOperations.scala --- @@ -504,19 +505,41 @@ private[parquet] object FileSystemHelper {

[GitHub] spark pull request: [WIP][SPARK-3822] Executor scaling mechanism f...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2840#issuecomment-59598147 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21874/consoleFull) for PR 2840 at commit [`d987b3e`](https://github.com/a

[GitHub] spark pull request: [WIP][SPARK-3822] Executor scaling mechanism f...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2840#issuecomment-59598148 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59598071 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59598067 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21873/consoleFull) for PR 2828 at commit [`fe0e02f`](https://github.com/a

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59597958 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21872/consoleFull) for PR 2838 at commit [`8872914`](https://github.com/a

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59597959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [WIP][SPARK-3822] Executor scaling mechanism f...

2014-10-17 Thread PraveenSeluka
Github user PraveenSeluka commented on the pull request: https://github.com/apache/spark/pull/2840#issuecomment-59597063 Hey @andrewor14, one quick comment on the API: instead of `killExecutor(executorId: String)`, it would be better to have `killExecutors(executorIds: List[String])`.

[GitHub] spark pull request: [WIP][SPARK-3822] Executor scaling mechanism f...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2840#issuecomment-59596681 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21874/consoleFull) for PR 2840 at commit [`d987b3e`](https://github.com/ap

[GitHub] spark pull request: [Spark-3822] Ability to add/delete executors f...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2798#issuecomment-59596617 Hey @PraveenSeluka I opened a PR at #2840. Let me know if you have any questions or comments. Thanks for your work! --- If your project is set up for it, you can repl

[GitHub] spark pull request: [SPARK-3736] Workers reconnect when disassocia...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2828#issuecomment-59596562 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21873/consoleFull) for PR 2828 at commit [`fe0e02f`](https://github.com/ap

[GitHub] spark pull request: [WIP][SPARK-3822] Executor scaling mechanism f...

2014-10-17 Thread andrewor14
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/2840 [WIP][SPARK-3822] Executor scaling mechanism for Yarn This is part of a broader effort to enable dynamic scaling of executors ([SPARK-3174](https://issues.apache.org/jira/browse/SPARK-3174)). Thi

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59596462 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21872/consoleFull) for PR 2838 at commit [`8872914`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3877][YARN] Throw an exception when app...

2014-10-17 Thread zsxwing
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/2732#issuecomment-59596417 Already updated the docs and the failure message.

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59596407 @aarondav Yes, before workers were reused, every Python task forked a new Python worker.

[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...

2014-10-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2743#discussion_r19051286 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -63,9 +64,12 @@ private[spark] class PythonRDD( val localdir = env.bl

[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...

2014-10-17 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2743#discussion_r19051269 --- Diff: python/pyspark/conf.py --- @@ -57,6 +57,22 @@ __all__ = ['SparkConf'] +def _parse_memory(s): +""" +Parse a memory

[GitHub] spark pull request: Minor change in the comment of spark-defaults....

2014-10-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2709#issuecomment-59596156 Hey @dbtsai can you update this now that #2379 has gone in? In particular this is now used by the Spark daemons too (i.e. Worker, Master, HistoryServer). I don't fee

[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2743#discussion_r19051237 --- Diff: python/pyspark/conf.py --- @@ -57,6 +57,22 @@ __all__ = ['SparkConf'] +def _parse_memory(s): +""" +Parse a me

[GitHub] spark pull request: [SPARK-3888] [PySpark] limit the memory used b...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2743#discussion_r19051234 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -63,9 +64,12 @@ private[spark] class PythonRDD( val localdir = en

[GitHub] spark pull request: [SPARK-3970] Remove duplicate removal of local...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2826#issuecomment-59595864 LGTM, other comments @srowen?

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59595776 LGTM. I think @mateiz wrote the original code so maybe he can take a look.

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59595763 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21871/consoleFull) for PR 2838 at commit [`660875b`](https://github.com/a

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59595770 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2839#discussion_r19051180 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -238,8 +238,15 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) @

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2839#discussion_r19051176 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -911,32 +911,15 @@ abstract class RDD[T: ClassTag]( } /** - *

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/2839#discussion_r19051173 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -238,8 +238,15 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)]) @

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59594861 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59594858 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21869/consoleFull) for PR 2839 at commit [`e1f06d3`](https://github.com/a

[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2825#issuecomment-59594302 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21870/consoleFull) for PR 2825 at commit [`abbc53c`](https://github.com/a

[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2825#issuecomment-59594304 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59594214 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21871/consoleFull) for PR 2838 at commit [`660875b`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59594220 Just for my understanding, is the solution here that take() will cause workers to die rather than be reused with bad data in the socket?

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59594023 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3935][Core] log the number of records t...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2791#issuecomment-59593868 Ok, I'll put you under wangfei

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59592259 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21869/consoleFull) for PR 2839 at commit [`e1f06d3`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2825#issuecomment-59592260 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21870/consoleFull) for PR 2825 at commit [`abbc53c`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3935][Core] log the number of records t...

2014-10-17 Thread jackylk
Github user jackylk commented on the pull request: https://github.com/apache/spark/pull/2791#issuecomment-59592210 For this PR, I used wangfei's account in JIRA. I will create my account next time.

[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-17 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2746#issuecomment-59592148 Hi all. I have discussed the design offline with @kayousterhout and @pwendell and we have come to the following high level consensus: - We should treat add as

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59591925 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/2825#discussion_r19050137 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -28,7 +28,9 @@ import org.apache.spark.sql.catalyst.plans

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59591794 **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21867/consoleFull)** for PR 2839 at commit [`e1f06d3`](https://github.com/apac

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59591796 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-3969][SQL] Optimizer should have a supe...

2014-10-17 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/2825#discussion_r19050063 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ExpressionOptimizationSuite.scala --- @@ -30,7 +30,7 @@ class ExpressionOptimiza

[GitHub] spark pull request: [SPARK-3207][MLLIB]Choose splits for continuou...

2014-10-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2780#issuecomment-59590801 @chouqin Sorry for the slow response! About the RandomForestSuite failure: The change to fix the failure (maxBins) is OK with me. It is a somewhat brittle tes

[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-59590626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-59590621 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21868/consoleFull) for PR 2753 at commit [`ccd4959`](https://github.com/a

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2607#issuecomment-59589703 @manishamde Sorry for the delay; the code is looking good! I made some small comments inline. My main overall comment is about specifying parameters. How would it b

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049181 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -73,7 +115,8 @@ private[tree] object BaggedPoint { }

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049189 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049198 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/GradientBoostingModel.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache So

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049194 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049186 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LeastSquaresError.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Softwar

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049179 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -47,20 +48,61 @@ private[tree] object BaggedPoint { * Conve

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049183 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LeastAbsoluteError.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Softwa

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049168 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049191 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/Loss.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049196 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/model/GradientBoostingModel.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache So

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049170 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049176 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/impl/BaggedPoint.scala --- @@ -47,20 +48,61 @@ private[tree] object BaggedPoint { * Conve

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049187 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049173 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] spark pull request: [MLLIB] [WIP] SPARK-1547: Adding Gradient Boos...

2014-10-17 Thread jkbradley
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/2607#discussion_r19049169 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/GradientBoosting.scala --- @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19047629 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2684#issuecomment-59585743 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2684#issuecomment-59585737 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21866/consoleFull) for PR 2684 at commit [`f14f259`](https://github.com/a

[GitHub] spark pull request: [SPARK-3250] Implement Gap Sampling optimizati...

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2455#discussion_r19047174 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -53,56 +89,238 @@ trait RandomSampler[T, U] extends Pseudorandom with Cl

[GitHub] spark pull request: [SPARK-3453] Netty-based BlockTransferService,...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2753#issuecomment-59584787 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21868/consoleFull) for PR 2753 at commit [`ccd4959`](https://github.com/ap

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19046954 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19046956 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19046949 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19046952 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19046955 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19046944 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19046948 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: SPARK-3568 [mllib] add ranking metrics

2014-10-17 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/2667#discussion_r19046940 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RankingMetrics.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59584036 @pwendell If you get a chance, PTAL

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread aarondav
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/2839#discussion_r19046865 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -911,32 +911,15 @@ abstract class RDD[T: ClassTag]( } /** - * Re

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2839#issuecomment-59583927 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21867/consoleFull) for PR 2839 at commit [`e1f06d3`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3994] Use standard Aggregator code path...

2014-10-17 Thread aarondav
GitHub user aarondav opened a pull request: https://github.com/apache/spark/pull/2839 [SPARK-3994] Use standard Aggregator code path for countByKey and countByValue See [JIRA](https://issues.apache.org/jira/browse/SPARK-3994) for more information. Also adds a note which warns a

[GitHub] spark pull request: [SPARK-3934] [SPARK-3918] [mllib] Bug fixes fo...

2014-10-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2785

[GitHub] spark pull request: [SPARK-3934] [SPARK-3918] [mllib] Bug fixes fo...

2014-10-17 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2785#issuecomment-59582528 LGTM. Merged into master. Thanks!

[GitHub] spark pull request: [SPARK-3989]Added possibility to directly inst...

2014-10-17 Thread ziky90
Github user ziky90 commented on the pull request: https://github.com/apache/spark/pull/2836#issuecomment-59582401 Ok thank you. Now I can see it. Based on this I also think that it'd need much more effort than I previously thought to do the bootstrap script execution in a robu

[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-17 Thread ash211
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/2684#issuecomment-59581783 More flavor on the perf numbers was we ran 6 jobs in a row before and after (starting up a new driver on each job), discarded the first run, and took the average of the re

[GitHub] spark pull request: [SPARK-3985] [Examples] fix file path using os...

2014-10-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2834

[GitHub] spark pull request: [SPARK-3985] [Examples] fix file path using os...

2014-10-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2834#issuecomment-59581146 LGTM. Thanks for catching this!

[GitHub] spark pull request: [SPARK-3952] [Streaming] [PySpark] add Python ...

2014-10-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2808#issuecomment-59581079 Found one more issue (sorry, hopefully this is the last one): ![image](https://cloud.githubusercontent.com/assets/50748/4686555/40c48ce0-5647-11e4-99b5-45f03321

[GitHub] spark pull request: [SPARK-3985] [Examples] fix file path using os...

2014-10-17 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2834#discussion_r19045505 --- Diff: examples/src/main/python/sql.py --- @@ -48,7 +48,7 @@ # A JSON dataset is pointed to by path. # The path can be either a si

[GitHub] spark pull request: [SPARK-3989]Added possibility to directly inst...

2014-10-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2836#issuecomment-59580682 > Could you please give me an example if you see some option where possibly might be the problem. If you look at the documentation of the `ssh`, function, it sa

[GitHub] spark pull request: [SPARK-3985] [Examples] fix file path using os...

2014-10-17 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2834#discussion_r19045058 --- Diff: examples/src/main/python/sql.py --- @@ -48,7 +48,7 @@ # A JSON dataset is pointed to by path. # The path can be either a singl

[GitHub] spark pull request: [SPARK-3985] [Examples] fix file path using os...

2014-10-17 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request: https://github.com/apache/spark/pull/2834#discussion_r19045017 --- Diff: examples/src/main/python/sql.py --- @@ -48,7 +48,7 @@ # A JSON dataset is pointed to by path. # The path can be either a

[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-17 Thread frydawg524
Github user frydawg524 commented on the pull request: https://github.com/apache/spark/pull/2684#issuecomment-59578918 @JoshRosen, Awesome! Thanks for helping out with this. I'll make sure that this gets broadcasted to my team. Zach

[GitHub] spark pull request: [SPARK-3993] [PySpark] fix bug while reuse wor...

2014-10-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2838#issuecomment-59578891 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21

[GitHub] spark pull request: [SPARK-3916] [Streaming] discover new appended...

2014-10-17 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2806#issuecomment-59578404 @tdas Could you help to review this? The failed tests run stable locally, I'm investigating it.

[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-17 Thread JoshRosen
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2684#issuecomment-59578207 @frydawg524 Thanks for testing this out! I'm glad to hear that it solves the bug. I just pushed a new commit which adds a configuration option (`spark.hadoop.

[GitHub] spark pull request: [SPARK-2546] Clone JobConf for each task (bran...

2014-10-17 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2684#issuecomment-59578047 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21866/consoleFull) for PR 2684 at commit [`f14f259`](https://github.com/ap

[GitHub] spark pull request: [SPARK-3855][SQL] Preserve the result attribut...

2014-10-17 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2717

[GitHub] spark pull request: [SPARK-3855][SQL] Preserve the result attribut...

2014-10-17 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2717#issuecomment-59576965 Yes, please do. On Oct 17, 2014 5:10 PM, "Patrick Wendell" wrote: > I'd like to pull this in - is that alright @marmbrus > ?
