[GitHub] spark pull request: [SPARK-8137][core] Improve treeAggregate to co...

2015-07-17 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/7461#issuecomment-122347375 Small thing, but can the title of this PR be changed to ...combine all data on *each* executor... to clarify that all the data isn't going to a single executor

[GitHub] spark pull request: [SPARK-8674] [MLlib] [WIP] Implementation of a...

2015-07-13 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/7075#issuecomment-121080323 Hey @josepablocam can you rebase this on current master? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-8958] Dynamic allocation: change cached...

2015-07-09 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/7329#issuecomment-120138836 LGTM

[GitHub] spark pull request: [SPARK-8884] [MLlib] [WIP] 1-sample Anderson-D...

2015-07-09 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/7278#issuecomment-120179329 It's also worth mentioning that it's implemented in the common stats packages for scipy: http://docs.scipy.org/doc/scipy-0.15.1/reference/generated

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r34200738 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala --- @@ -153,4 +157,101 @@ class HypothesisTestSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-8884] [MLlib] 1-sample Anderson-Darling...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/7278#discussion_r34201295 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/ADTest.scala --- @@ -0,0 +1,269 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r34200807 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala --- @@ -153,4 +157,101 @@ class HypothesisTestSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r34200842 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala --- @@ -153,4 +157,101 @@ class HypothesisTestSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r34200828 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala --- @@ -153,4 +157,101 @@ class HypothesisTestSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-8884] [MLlib] 1-sample Anderson-Darling...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/7278#discussion_r34196348 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala --- @@ -153,4 +153,166 @@ class HypothesisTestSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-8884] [MLlib] 1-sample Anderson-Darling...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/7278#discussion_r34196274 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala --- @@ -153,4 +153,166 @@ class HypothesisTestSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-8884] [MLlib] 1-sample Anderson-Darling...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/7278#discussion_r34201231 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/ADTest.scala --- @@ -0,0 +1,269 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8884] [MLlib] 1-sample Anderson-Darling...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/7278#discussion_r34201204 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/ADTest.scala --- @@ -0,0 +1,269 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34214315 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34213732 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -509,6 +514,10 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34213924 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34214220 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34214716 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -66,6 +66,12 @@ class

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34215261 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34214513 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -872,6 +872,25 @@ class DAGScheduler( // will be posted, which

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34215043 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34215363 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34214210 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34214494 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -872,6 +872,25 @@ class DAGScheduler( // will be posted, which

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34214893 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-07-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r34215406 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategySuite.scala --- @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119344309 Is my understanding correct that, with this shuffle manager, we wouldn't be able to do reduce-side that sorts records, or any map-side spilling?

[GitHub] spark pull request: [SPARK-7263] Add new shuffle manager which sto...

2015-07-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/7265#issuecomment-119350825 Having the Parquet shuffle reader follow that pattern seems preferable to me over failing when spilling would be required.

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33515526 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-4069] [YARN] When AppMaster finishes, t...

2015-06-29 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5233#issuecomment-116877465 There's no bug on the Spark side. @PraveenSeluka is this still something you're running into? If this is causing pain to a lot of users it could be worth the workaround

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-29 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6994#issuecomment-116876951 This LGTM. Can you file a separate JIRA to add Python support?

[GitHub] spark pull request: [SPARK-8374] [YARN] Job frequently hangs after ...

2015-06-29 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/7083#issuecomment-116882631 Hi @xuchenCN what makes you think this is the right fix? We already decrement `numExecutorsRunning` when a container is killed, so I think doing it again would be double

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-115565919 Hi @watermen, thanks for reporting this. Does the error occur every time or just occasionally? What Hadoop version are you running?

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33394080 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33394012 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33393943 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33394039 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33393914 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,191 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6994#issuecomment-115876109 This looks great, just had a few more minor style comments. Can you also add some documentation here: https://spark.apache.org/docs/latest/mllib-statistics.html

[GitHub] spark pull request: SPARK-8623. Hadoop RDDs fail to properly seria...

2015-06-26 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/7050 SPARK-8623. Hadoop RDDs fail to properly serialize configuration You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-8623

[GitHub] spark pull request: SPARK-8623. Hadoop RDDs fail to properly seria...

2015-06-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/7050#issuecomment-115910651 jenkins, retest this please

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33315886 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33316253 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33315534 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala --- @@ -28,7 +28,10 @@ private[spark] trait ExecutorAllocationClient

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33317386 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategySuite.scala --- @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33317505 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategySuite.scala --- @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33317634 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33317476 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -149,7 +159,13 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33321039 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33321013 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala --- @@ -158,4 +158,25 @@ object Statistics { def chiSqTest(data: RDD

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33321093 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33321019 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33321567 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33316936 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33317230 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33317350 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategySuite.scala --- @@ -0,0 +1,147 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33321131 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala --- @@ -90,3 +90,18 @@ class ChiSqTestResult private[stat] (override val

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33194722 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala --- @@ -25,6 +25,7 @@ import org.apache.hadoop.net.DNSToSwitchMapping

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33195618 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33195574 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33191664 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala --- @@ -158,4 +158,44 @@ object Statistics { def chiSqTest(data: RDD

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33195453 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r33196367 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6994#issuecomment-115113205 jenkins, test this please

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33224064 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33223955 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8402][MLLIB] DP Means Clustering

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6880#discussion_r33174171 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/DpMeansModel.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33188946 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33188987 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33189226 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33189172 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33189336 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33189576 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala --- @@ -153,4 +157,61 @@ class HypothesisTestSuite extends SparkFunSuite

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33189552 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala --- @@ -19,6 +19,10 @@ package org.apache.spark.mllib.stat

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33189747 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala --- @@ -158,4 +158,44 @@ object Statistics { def chiSqTest(data: RDD

[GitHub] spark pull request: [SPARK-8598] [MLlib] Implementation of 1-sampl...

2015-06-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6994#discussion_r33190190 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala --- @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-8402][MLLIB] DP Means Clustering

2015-06-18 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6880#discussion_r32793919 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/DpMeansModel.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-18 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r32776975 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-113322897 @JoshRosen this should be ready for merge

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-17 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6394#issuecomment-112981135 Hi @jerryshao is this ready for review?

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-10 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-110994319 retest this please

[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...

2015-06-09 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-110566851 @shivaram @kayousterhout this approach addresses my concerns. Thanks for updating!

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-09 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-110571366 I definitely don't think we rely on it in Spark. On Cloudera setups, as well as presumably Hortonworks and MapR setups, client configurations are synchronized globally

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r31962024 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -438,6 +519,15 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r31973076 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -225,12 +243,74 @@ private[yarn] class YarnAllocator( logInfo

[GitHub] spark pull request: [SPARK-4352][YARN][WIP] Incorporate locality p...

2015-06-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/6394#discussion_r31972804 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -225,12 +243,74 @@ private[yarn] class YarnAllocator( logInfo

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-109890835 Ah, yeah, that was the change I was referring to. I'm not sure about the Mesos deployment model, but on Standalone mode at least, it would be possible

[GitHub] spark pull request: [SPARK-8099] set executor cores into system in...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6643#issuecomment-109383319 LGTM, merging

[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-109397104 When there is no skew, are there situations where this would lead to worse performance? E.g. will it make tasks bunch up on nodes more than before and / or result

[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-109398051 Also, is there an intuitive justification for why 5 is a good number? It seems a little weird to me that it's independent of the number of tasks, the cluster size

[GitHub] spark pull request: [SPARK-7699][Core] Lazy start the scheduler fo...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6430#issuecomment-109395777 This looks right to me. Merging.

[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-109450997 Ahhh, I misunderstood and thought that all the reduce tasks for a stage got the same locality preferences. I withdraw my concern.

[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-109445990 On the other side, I can also envision situations where it would be really helpful to request a higher number of preferred locations. Consider a large YARN cluster where

[GitHub] spark pull request: [SPARK-2774] Set preferred locations for reduc...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6652#issuecomment-109444936 My worry is the following situation: * Map outputs from stage 1 exhibit little skew and are distributed evenly across nodes. * 5 nodes are chosen as preferred

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-109493444 Where do we end up cloning Configuration objects? With these changes, we avoid loading defaults when we reconstitute Configuration objects from bytes. Are there hot

[GitHub] spark pull request: SPARK-8135. In SerializableWritable, don't loa...

2015-06-05 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/6679 SPARK-8135. In SerializableWritable, don't load defaults when instantiating Configuration You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-8136][YARN] Fix flakiness in YarnCluste...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6680#issuecomment-109502268 Does this comment still apply: `// If we are running in yarn-cluster mode, verify that driver logs are downloadable.`?

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-109502593 Makes sense. In that case, this should be ready for review.

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-109499883 In light of this change, do you think we should remove the broadcasting of Configurations? While we avoid the much larger cost of reading and parsing XML for each task

[GitHub] spark pull request: SPARK-8135. Don't load defaults when reconstit...

2015-06-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/6679#issuecomment-109499640 There's a situation in which there could be a behavior change in situations where the executor somehow has a different Hadoop configuration file than the driver. But I
