Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/7461#issuecomment-122347375
Small thing, but can the title of this PR be changed to ...combine all
data on *each* executor... to clarify that all the data isn't going to a
single executor
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/7075#issuecomment-121080323
Hey @josepablocam can you rebase this on current master?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/7329#issuecomment-120138836
LGTM
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/7278#issuecomment-120179329
It's also worth mentioning that it's implemented in the common stats
packages for scipy:
http://docs.scipy.org/doc/scipy-0.15.1/reference/generated
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r34200738
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala ---
@@ -153,4 +157,101 @@ class HypothesisTestSuite extends SparkFunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/7278#discussion_r34201295
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/ADTest.scala ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r34200807
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala ---
@@ -153,4 +157,101 @@ class HypothesisTestSuite extends SparkFunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r34200842
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala ---
@@ -153,4 +157,101 @@ class HypothesisTestSuite extends SparkFunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r34200828
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala ---
@@ -153,4 +157,101 @@ class HypothesisTestSuite extends SparkFunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/7278#discussion_r34196348
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala ---
@@ -153,4 +153,166 @@ class HypothesisTestSuite extends SparkFunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/7278#discussion_r34196274
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala ---
@@ -153,4 +153,166 @@ class HypothesisTestSuite extends SparkFunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/7278#discussion_r34201231
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/ADTest.scala ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/7278#discussion_r34201204
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/ADTest.scala ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34214315
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34213732
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -509,6 +514,10 @@ private[spark] class ExecutorAllocationManager
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34213924
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34214220
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34214716
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
---
@@ -66,6 +66,12 @@ class
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34215261
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34214513
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -872,6 +872,25 @@ class DAGScheduler(
// will be posted, which
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34215043
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34215363
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34214210
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
@@ -627,6 +641,29 @@ private[spark] class ExecutorAllocationManager
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34214494
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -872,6 +872,25 @@ class DAGScheduler(
// will be posted, which
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34214893
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r34215406
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategySuite.scala
---
@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/7265#issuecomment-119344309
Is my understanding correct that, with this shuffle manager, we wouldn't be
able to do reduce-side that sorts records, or any map-side spilling?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/7265#issuecomment-119350825
Having the Parquet shuffle reader follow that pattern seems preferable to
me over failing when spilling would be required.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33515526
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/5233#issuecomment-116877465
There's no bug on the Spark side. @PraveenSeluka is this still something
you're running into? If this is causing pain to a lot of users it could be
worth the workaround
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6994#issuecomment-116876951
This LGTM. Can you file a separate JIRA to add Python support?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/7083#issuecomment-116882631
Hi @xuchenCN what makes you think this is the right fix? We already
decrement `numExecutorsRunning` when a container is killed, so I think doing it
again would be double
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-115565919
Hi @watermen, thanks for reporting this. Does the error occur every time
or just occasionally? What Hadoop version are you running?
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33394080
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33394012
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33393943
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33394039
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33393914
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,191 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6994#issuecomment-115876109
This looks great, just had a few more minor style comments. Can you also
add some documentation here:
https://spark.apache.org/docs/latest/mllib-statistics.html
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/7050
SPARK-8623. Hadoop RDDs fail to properly serialize configuration
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-8623
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/7050#issuecomment-115910651
jenkins, retest this please
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33315886
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33316253
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33315534
--- Diff:
core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala ---
@@ -28,7 +28,10 @@ private[spark] trait ExecutorAllocationClient
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33317386
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategySuite.scala
---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33317505
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategySuite.scala
---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33317634
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33317476
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -149,7 +159,13 @@ private[yarn] class YarnAllocator
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33321039
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33321013
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala
---
@@ -158,4 +158,25 @@ object Statistics {
def chiSqTest(data: RDD
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33321093
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33321019
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33321567
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33316936
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33317230
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33317350
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategySuite.scala
---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33321131
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -90,3 +90,18 @@ class ChiSqTestResult private[stat] (override val
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33194722
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala ---
@@ -25,6 +25,7 @@ import org.apache.hadoop.net.DNSToSwitchMapping
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33195618
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33195574
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33191664
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala
---
@@ -158,4 +158,44 @@ object Statistics {
def chiSqTest(data: RDD
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33195453
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r33196367
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6994#issuecomment-115113205
jenkins, test this please
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33224064
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33223955
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6880#discussion_r33174171
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/DpMeansModel.scala ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33188946
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33188987
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33189226
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33189172
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33189336
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33189576
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala ---
@@ -153,4 +157,61 @@ class HypothesisTestSuite extends SparkFunSuite
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33189552
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/stat/HypothesisTestSuite.scala ---
@@ -19,6 +19,10 @@ package org.apache.spark.mllib.stat
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33189747
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/Statistics.scala
---
@@ -158,4 +158,44 @@ object Statistics {
def chiSqTest(data: RDD
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6994#discussion_r33190190
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/test/KSTest.scala ---
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6880#discussion_r32793919
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/clustering/DpMeansModel.scala ---
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r32776975
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/ContainerPlacementStrategy.scala
---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-113322897
@JoshRosen this should be ready for merge
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6394#issuecomment-112981135
Hi @jerryshao is this ready for review?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-110994319
retest this please
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6652#issuecomment-110566851
@shivaram @kayousterhout this approach addresses my concerns. Thanks for
updating!
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-110571366
I definitely don't think we rely on it in Spark. On Cloudera setups, as
well as presumably Hortonworks and MapR setups, client configurations are
synchronized globally
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r31962024
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -438,6 +519,15 @@ private[yarn] class YarnAllocator
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r31973076
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -225,12 +243,74 @@ private[yarn] class YarnAllocator(
logInfo
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/6394#discussion_r31972804
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
@@ -225,12 +243,74 @@ private[yarn] class YarnAllocator(
logInfo
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-109890835
Ah, yeah, that was the change I was referring to.
I'm not sure about the Mesos deployment model, but on Standalone mode at
least, it would be possible
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6643#issuecomment-109383319
LGTM, merging
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6652#issuecomment-109397104
When there is no skew, are there situations where this would lead to worse
performance? E.g. will it make tasks bunch up on nodes more than before and /
or result
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6652#issuecomment-109398051
Also, is there an intuitive justification for why 5 is a good number? It
seems a little weird to me that it's independent of the number of tasks, the
cluster size
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6430#issuecomment-109395777
This looks right to me. Merging.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6652#issuecomment-109450997
Ahhh, I misunderstood and thought that all the reduce tasks for a stage got
the same locality preferences. I withdraw my concern.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6652#issuecomment-109445990
On the other side, I can also envision situations where it would be really
helpful to request a higher number of preferred locations. Consider a large
YARN cluster where
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6652#issuecomment-109444936
My worry is the following situation:
* Map outputs from stage 1 exhibit little skew and are distributed evenly
across nodes.
* 5 nodes are chosen as preferred
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-109493444
Where do we end up cloning Configuration objects? With these changes, we
avoid loading defaults when we reconstitute Configuration objects from bytes.
Are there hot
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/6679
SPARK-8135. In SerializableWritable, don't load defaults when instantiating Configuration
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6680#issuecomment-109502268
Does this comment still apply: `// If we are running in yarn-cluster mode,
verify that driver logs are downloadable.`?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-109502593
Makes sense. In that case, this should be ready for review.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-109499883
In light of this change, do you think we should remove the broadcasting of
Configurations? While we avoid the much larger cost of reading and parsing XML
for each task
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/6679#issuecomment-109499640
There could be a behavior change in situations where the executor somehow
has a different Hadoop configuration file than the driver. But I