Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2440#issuecomment-56132627
Removing this sounds good to me too. Will upload a patch. I think a
measure of how long a task spends in shuffle would be useful though, as it
helps users understand
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2232#issuecomment-55858105
Could this change behavior in cases where the spark.yarn.dist.files is
configured with no scheme? Without this change, it would interpret no scheme
to mean that it's
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/2440
SPARK-3574. Shuffle finish time always reported as -1
The included test waits 100 ms after job completion for task completion
events to come in so it can verify they have reasonable finish times
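A minimal sketch of that test strategy (hypothetical helper, not the actual Spark test code): poll briefly after job completion so asynchronous task-completion events have time to arrive before asserting on their finish times.

```python
import time

def wait_for_events(get_events, expected_count, timeout=0.1, poll=0.01):
    """Poll until at least expected_count events arrive or the timeout elapses."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        events = get_events()
        if len(events) >= expected_count:
            return events
        time.sleep(poll)
    return get_events()

# After waiting, the test can assert finish times are set (not a -1 sentinel).
```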
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2232#issuecomment-55985896
Hmm. My feeling is that it's better to be consistent here and consider the
old behavior a bug than to maintain compatibility just to support a cornerish
case
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1388#issuecomment-55831832
Had noticed that. Haven't had time to fix these but will get to them soon.
---
If your project is set up for it, you can reply to this email and have your
reply appear
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2350#issuecomment-55309683
+1 for making Client private. This should go through SparkSubmit, and, as
Patrick mentioned, I'd be surprised if we haven't broken any code that's
relying on that already
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2338#issuecomment-55334340
I don't have any great ideas for how to write a test for it, but this looks
good to me as well.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2087#issuecomment-55355339
Updated patch includes fallback to the split size
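The fallback mentioned above can be sketched as follows (hypothetical function name; the real patch is Scala code working against Hadoop's FileSystem statistics): prefer the incremental byte counts when the statistics API is available, otherwise fall back to the split size as the bytes-read estimate.

```python
def estimate_bytes_read(stats_bytes, split_length):
    """Use FileSystem statistics when available; else assume the whole split."""
    return stats_bytes if stats_bytes is not None else split_length
```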
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1388#issuecomment-55182360
Thanks @davies for catching those. Did another pass to make sure I didn't
miss any others.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/663#issuecomment-55066027
Upmerged
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2338#issuecomment-55066419
Hi @davies , sorry for causing this bug and thanks for picking it up. To
avoid making the deep copy unnecessarily when running in non-local mode, we
could instead make
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/2274#discussion_r17222926
--- Diff: python/pyspark/tests.py ---
@@ -405,22 +404,6 @@ def test_zip_with_different_number_of_items(self):
self.assertEquals(a.count(), b.count
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2087#issuecomment-54780033
It looks like all the core tests are passing, but there are some failures
in streaming and SQL tests. Have those been showing up elsewhere?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/655#issuecomment-54780393
Unfortunately the cleanup refactored a bunch of common code between
yarn-alpha and yarn-stable that no longer would have been common after this
patch (because, after 2.2
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2087#issuecomment-54852759
Just to make sure it's clear, the issue isn't only that we can be a few
bytes off when we're reading outside of split boundaries, but that it'll look
like we read the full
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/2324
SPARK-3422. JavaAPISuite.getHadoopInputSplits isn't used anywhere.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-3422
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2274#issuecomment-54677627
Updated patch adds Python back in and adds the 's' at the end.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r17198796
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -309,4 +322,44 @@ private[spark] object HadoopRDD {
f(inputSplit
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/2274#discussion_r17201280
--- Diff: python/pyspark/rdd.py ---
@@ -515,6 +515,30 @@ def __add__(self, other):
raise TypeError
return self.union(other
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/2274
SPARK-2978. Transformation with MR shuffle semantics
I didn't add this to the transformations list in the docs because it's kind
of obscure, but would be happy to do so if others think it would
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1388#discussion_r17101588
--- Diff: python/pyspark/mllib/regression.py ---
@@ -66,6 +66,9 @@ def weights(self):
def intercept(self):
return self._intercept
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2253#issuecomment-54516530
I think it's preferable to give the user the size they actually request.
This avoids them requesting the same size later under different conditions and
unexpectedly
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/2274#issuecomment-54555319
Updated patch removes Python version, adds Java version, and adds some
additional doc.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1375#issuecomment-54383155
I think this would be extremely useful. Getting executor logs with Spark
on YARN currently requires clicking like 6 links on the ResourceManager page.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54114366
Here's the exception:
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1388#issuecomment-54115928
Updated the patch to match existing conventions
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1388#issuecomment-54174652
I believe the failure is unrelated. I noticed it on SPARK-3052 as well.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-54174772
I believe the failure is unrelated. I noticed it on SPARK-2461 as well.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1934#discussion_r17005626
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
---
@@ -270,11 +270,9 @@ private[spark] class ApplicationMaster(args
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1934#discussion_r17005920
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
---
@@ -270,11 +270,9 @@ private[spark] class ApplicationMaster(args
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1934#discussion_r17006495
--- Diff:
yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
---
@@ -270,11 +270,9 @@ private[spark] class ApplicationMaster(args
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1388#discussion_r17006876
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala
---
@@ -74,6 +74,8 @@ abstract class GeneralizedLinearModel
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r17020310
--- Diff: core/src/main/scala/org/apache/spark/rdd/PartitionLocation.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r17020360
--- Diff: core/src/main/scala/org/apache/spark/rdd/PartitionLocation.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r17020639
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -1296,7 +1298,25 @@ class DAGScheduler(
// If the RDD has some
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r17020666
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -1296,7 +1298,25 @@ class DAGScheduler(
// If the RDD has some
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r17020754
--- Diff: core/src/main/scala/org/apache/spark/rdd/PartitionLocation.scala
---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r17020803
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -309,4 +322,44 @@ private[spark] object HadoopRDD {
f(inputSplit
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1388#issuecomment-53994494
Sorry for the delay. Updated patch adds this for Python as well.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1388#discussion_r16936730
--- Diff: python/pyspark/mllib/regression.py ---
@@ -66,6 +66,9 @@ def weights(self):
def intercept(self):
return self._intercept
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1608#discussion_r16606710
--- Diff: external/hbase/pom.xml ---
@@ -0,0 +1,140 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ ~ Licensed to the Apache Software Foundation (ASF
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/2087
SPARK-2621. Update task InputMetrics incrementally
The patch takes advantage of an API provided in Hadoop 2.5 that allows getting
accurate data on Hadoop FileSystem bytes read. It eliminates the old
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1845#issuecomment-52713440
When we added spark-submit originally, we went with the current approach
(Bash-Scala) because @mateiz had concerns about the overhead of starting two
JVMs.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1984#discussion_r16372445
--- Diff:
yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -103,13 +103,17 @@ class Client(clientArgs: ClientArguments, hadoopConf
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1984#discussion_r16378515
--- Diff:
yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -103,13 +103,17 @@ class Client(clientArgs: ClientArguments, hadoopConf
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1984#discussion_r16380717
--- Diff:
yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -103,13 +103,17 @@ class Client(clientArgs: ClientArguments, hadoopConf
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1984#issuecomment-52552878
Posted a patch that removes the queue resources log message entirely
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1279#discussion_r16387993
--- Diff: docs/running-on-yarn.md ---
@@ -125,6 +125,14 @@ Most of the configs are the same for Spark on YARN as
for other deployment modes
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1984
SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requeste...
...d queue doesn't exist
You can merge this pull request into a Git repository by running:
$ git pull https
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1977#issuecomment-52381086
Does / will the same functionality exist in Scala/Java?
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1934#discussion_r16271945
--- Diff:
yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
---
@@ -213,28 +213,22 @@ class ApplicationMaster(args
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1934#discussion_r16272089
--- Diff:
yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
---
@@ -613,29 +593,6 @@ object YarnAllocationHandler
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1934#issuecomment-52256543
Updated patch fixes an issue that @andrewor14 pointed out.
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1956
SPARK-3052. Misleading and spurious FileSystem closed errors whenever a ...
...job fails while reading from Hadoop
You can merge this pull request into a Git repository by running:
$ git pull
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52262225
This occurs when an executor process shuts down while tasks are executing
(e.g. because the driver disassociated or an OOME occurred).
Hadoop FileSystems register
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1956#issuecomment-52267702
> Ah and the order they should be shut down in is RecordReader then
> FileSystem?

Right
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1961
SPARK-3028. sparkEventToJson should support SparkListenerExecutorMetrics...
...Update
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1961#issuecomment-52274079
Oops, fixed.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r16204968
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -243,10 +244,23 @@ class HadoopRDD[K, V](
new HadoopMapPartitionsWithSplitRDD
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r16205012
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -304,4 +318,48 @@ private[spark] object HadoopRDD {
f(inputSplit
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r16205064
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -304,4 +318,48 @@ private[spark] object HadoopRDD {
f(inputSplit
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r16205236
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -304,4 +318,48 @@ private[spark] object HadoopRDD {
f(inputSplit
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1486#discussion_r16205425
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -304,4 +318,48 @@ private[spark] object HadoopRDD {
f(inputSplit
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1934
SPARK-3014. Log a more informative messages in a couple failure scenario...
...s
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1890#issuecomment-51805750
FWIW I think this is already what happens in YARN, as we use Hadoop's
distributed cache to send out the jars and include them on the executor
classpath at startup
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1844#issuecomment-51653564
On the executor side, framework jars come first unless
spark.files.userClassPathFirst is set to true.
At least for Spark on YARN, executors are not launched with spark
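The classpath ordering described in that comment can be sketched roughly (a simplification with hypothetical names; the real logic lives in Spark's executor launch code):

```python
def executor_classpath(framework_jars, user_jars, user_class_path_first=False):
    """Framework jars precede user jars unless the user-first flag is set."""
    if user_class_path_first:
        return user_jars + framework_jars
    return framework_jars + user_jars
```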
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1825#issuecomment-51435337
This will allow spark-shell to take spark-submit options, but will remove
its ability to take spark-shell-specific options (currently there's only one,
file). I'm unclear
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1825#issuecomment-51436115
org.apache.spark.repl.SparkRunnerSettings
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1826
SPARK-2900. aggregate inputBytes per stage
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-2900
Alternatively you can
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1826#issuecomment-51440037
The failure appears to be unrelated (something with connections and Kafka).
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1826#issuecomment-51440066
Jenkins, retest this please.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1834#issuecomment-51543219
LGTM
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1481#issuecomment-51390403
thanks Patrick
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1507#discussion_r15900998
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala ---
@@ -191,7 +184,7 @@ object BlockFetcherIterator
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1507#discussion_r15906274
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
---
@@ -73,11 +75,16 @@ class TaskMetrics extends Serializable {
var
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1481#issuecomment-51168899
Looking into it. I ran the test that it was hanging on and things
completed fine. I also combed the code and didn't see anywhere where this
patch had changed how things
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1481#issuecomment-51025499
Updated patch addresses @pwendell and @kayousterhout 's comments and adds
tests.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1481#issuecomment-51001765
Updated patch keeps it as ShuffleWriteMetrics for now.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1507#issuecomment-50976412
Just tested this and observed the shuffle bytes read going up for
in-progress tasks.
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1481#issuecomment-50977756
I hadn't noticed this before, but DiskObjectWriter is used for tracking
bytes spilled by ExternalSorter and ExternalAppendOnlyMap in addition to
shuffle bytes written. So
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1056#issuecomment-50853874
Thanks @pwendell and @andrewor14 for your continued reviews.
10 seconds sounds fine to me. Not that it's a shining beacon of
performance, but MapReduce actually
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1056#discussion_r15684813
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -991,6 +994,9 @@ class SparkContext(config: SparkConf) extends Logging
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1657#issuecomment-50922542
This makes sense to me. However, we should also document it and mention
that it only currently works for YARN.
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1507#discussion_r15716773
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
---
@@ -73,11 +75,16 @@ class TaskMetrics extends Serializable {
var
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1665#discussion_r15631418
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -184,7 +184,7 @@ object SparkSubmit {
OptionAssigner(args.archives
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1699#issuecomment-50837155
I'm worried that treating unknown args as app args would make typos
difficult to debug.
spark-submit --executor-croes 10
should print out an error
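The concern can be illustrated with a toy parser (hypothetical flag set, not spark-submit's actual option handling): a strict parser surfaces the typo immediately, while a lenient one would silently hand it to the application.

```python
KNOWN_FLAGS = {"--executor-cores", "--executor-memory", "--num-executors"}

def parse_strict(args):
    """Reject any flag-looking argument that is not a known option."""
    for arg in args:
        if arg.startswith("--") and arg not in KNOWN_FLAGS:
            raise ValueError(f"Unrecognized option: {arg}")
    return args
```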
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1665
SPARK-2664. Deal with `--conf` options in spark-submit that relate to fl...
...ags
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1056#discussion_r15526822
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -348,4 +353,48 @@ private[spark] class Executor
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1056#discussion_r15526958
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -155,6 +156,23 @@ class DAGScheduler(
eventProcessActor
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1056#discussion_r15527486
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala ---
@@ -56,7 +56,7 @@ private[jobs] object UIData {
}
case class
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1056#discussion_r15528611
--- Diff: docs/configuration.md ---
@@ -524,6 +524,13 @@ Apart from these, the following properties are also
available, and may be useful
output
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1056#discussion_r15559871
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -348,4 +353,48 @@ private[spark] class Executor
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1056#issuecomment-50556357
Latest patch incorporates latest feedback and adds BlockManagerSuite back
in. I tested on a small cluster and saw executors shut down fine (but haven't
run at scale
GitHub user sryza opened a pull request:
https://github.com/apache/spark/pull/1642
SPARK-2738. Remove redundant imports in BlockManagerSuite
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sryza/spark sandy-spark-2738
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/789#issuecomment-50431290
@mateiz I posted a couple ideas and was waiting on feedback. Any thoughts?
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1486#issuecomment-50282111
I think reflection is definitely the right way to go here
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1056#issuecomment-50173304
As far as I can tell, you're right - I don't see why updateShuffleMetrics
needs to be synchronized.
Uploading a patch that:
* Adds comments to TaskMetrics
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1056#issuecomment-49970751
I don't entirely understand the advantage of having a separate
PartialTaskMetrics. Ultimately every field of TaskMetrics except for maybe
shuffleFinishTime will be able
Github user sryza commented on a diff in the pull request:
https://github.com/apache/spark/pull/1056#discussion_r15333186
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -348,4 +352,46 @@ private[spark] class Executor
Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1507#issuecomment-49778015
Exactly. The idea is to call mergeShuffleReadMetrics when we're about to
send the metrics update.
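A rough sketch of that flow (hypothetical structure; the real code is Spark's TaskMetrics in Scala): accumulate per-dependency shuffle read metrics, and merge them into a single aggregate just before each metrics update is sent.

```python
class ShuffleReadMetrics:
    def __init__(self):
        self.remote_bytes_read = 0
        self.records_read = 0

def merge_shuffle_read_metrics(per_dep_metrics):
    """Aggregate per-dependency shuffle read metrics into one summary."""
    total = ShuffleReadMetrics()
    for m in per_dep_metrics:
        total.remote_bytes_read += m.remote_bytes_read
        total.records_read += m.records_read
    return total
```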