[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2350#issuecomment-55309683 +1 for making Client private. This should go through SparkSubmit, and, as Patrick mentioned, I'd be surprised if we haven't broken any code that's relying on that already

[GitHub] spark pull request: [SPARK-3465] fix task metrics aggregation in l...

2014-09-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2338#issuecomment-55334340 I don't have any great ideas for how to write a test for it, but this looks good to me as well. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-55355339 Updated patch includes fallback to the split size

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-09-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-55831832 Had noticed that. Haven't had time to fix these but will get to them soon.

[GitHub] spark pull request: [SPARK-3319] [SPARK-3338] Resolve Spark submit...

2014-09-17 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2232#issuecomment-55858105 Could this change behavior in cases where the spark.yarn.dist.files is configured with no scheme? Without this change, it would interpret no scheme to mean that it's

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-17 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2440 SPARK-3574. Shuffle finish time always reported as -1 The included test waits 100 ms after job completion for task completion events to come in so it can verify they have reasonable finish times

[GitHub] spark pull request: [SPARK-3319] [SPARK-3338] Resolve Spark submit...

2014-09-17 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2232#issuecomment-55985896 Hmm. My feeling is that it's better to be consistent here and consider the old behavior a bug than to maintain compatibility to support a cornerish case

[GitHub] spark pull request: [SPARK-3560] Fixed setting spark.jars system p...

2014-09-18 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2449#discussion_r17761498 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -205,6 +205,7 @@ object SparkSubmit { OptionAssigner(args.jars, YARN

[GitHub] spark pull request: [SPARK-3560] Fixed setting spark.jars system p...

2014-09-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2449#issuecomment-56116823 Saw this already went in, but had one stylistic/organization nit that might be worth fixing

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-56132524 @nishkamravi2 mind resolving the merge conflicts?

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-56132497 These changes look good to me. This addresses what continues to be the #1 issue that we see in Cloudera customer YARN deployments. It's worth considering boosting

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56132627 Removing this sounds good to me too. Will upload a patch. I think a measure of how long a task spends in shuffle would be useful though, as it helps users understand

[GitHub] spark pull request: SPARK-3605. Fix typo in SchemaRDD.

2014-09-19 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2460 SPARK-3605. Fix typo in SchemaRDD. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-3605 Alternatively you can review

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-19 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56224377 As far as I can tell the test failures are unrelated (something to do with not being able to find PySpark modules)

[GitHub] spark pull request: SPARK-3612. Executor shouldn't quit if heartbe...

2014-09-22 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2487 SPARK-3612. Executor shouldn't quit if heartbeat message fails to reach the driver You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-56342496 If #2485 is the replacement, can we close this one out?

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2485#discussion_r17837060 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -117,9 +118,10 @@ private[yarn] abstract class YarnAllocator

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-56343225 It would also be nice to log what it is when we fail to get a container large enough, or when the request fails because the cluster max allocation limit was hit. @tgravescs I

[GitHub] spark pull request: SPARK-3642. Document the nuances of shared var...

2014-09-22 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2490 SPARK-3642. Document the nuances of shared variables. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-3642 Alternatively

[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-56461580 MapReduce doesn't use getPos, but it does look like it might be helpful in some situations. One caveat is that pos only means # bytes for file input formats. For example

[GitHub] spark pull request: SPARK-3172 and SPARK-3577

2014-09-23 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2504 SPARK-3172 and SPARK-3577 The posted patch addresses both SPARK-3172 and SPARK-3577. It renames ShuffleWriteMetrics to WriteMetrics and uses it for tracking all three of shuffle write, spilling

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-23 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-56480298 This looks good to me.

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-09-24 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-56736604 Updated patch adds the APIs discussed. It relies on a new property spark.kryo.classesToRegister, which registerKryoClasses appends to. The change also enables users

[GitHub] spark pull request: [SPARK-3476] Remove outdated memory checks in ...

2014-09-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2528#issuecomment-56786377 The changes conceptually look good to me.

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-56865310 @nishkamravi2 arrived at this through experimentation. He had a few details on his experiments on the previous incarnation of this PR, #1391. If anything, I think 0.07

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-09-25 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/789#discussion_r18053706 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -140,6 +142,20 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging

[GitHub] spark pull request: SPARK-3172 and SPARK-3577

2014-09-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2504#issuecomment-56879452 I considered that approach as well, but found that this one sat more elegantly with the metrics collecting code. DiskObjectWriter, which is used both when spilling

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-09-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-56880976 Thanks for taking a look @andrewor14. Will post a patch that addresses your comments and makes sure things work in Java.

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-09-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2485#issuecomment-56892181 We could also expose both and make memoryOverhead override the other. I think this could be reasonable because the scale is most likely to be set in spark-defaults.conf

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-09-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/789#discussion_r18182436 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -140,6 +141,20 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging

[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-29 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57236118 The current approach couples the updating of this metric with the heartbeats in a way that seems strange. The heartbeats (and task completion, which, my bad, I

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-09-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-57370056 Updated the patch to address Patrick's comment

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-09-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-57395416 That's correct (documented this on the conf page). My thought was that we could hit strange interactions, for example if the same class is registered both with a custom

[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-57421448 Updated patch switches from the pull to push model as requested by @pwendell and adds a test. I verified that the test succeeds against both Hadoop 2.2 and Hadoop 2.5

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-10-01 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-57585666 Updated patch allows using both at the same time

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-10-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-57592234 Jenkins, retest this please

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-10-02 Thread sryza
Github user sryza closed the pull request at: https://github.com/apache/spark/pull/1388

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-10-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-57593494 Closing this in favor of #2625

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-10-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-57593579 w00t!

[GitHub] spark pull request: [SPARK-2461] [PySpark] Add a toString method t...

2014-10-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2625#issuecomment-57593551 Thanks for picking this up @davies. This looks good to me if Jenkins OKs it.

[GitHub] spark pull request: [SPARK-2778] [yarn] Add workaround for race in...

2014-10-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2605#issuecomment-57742306 I took a look and found where the behavior that Marcelo is observing is occurring in YARN. (For future reference), when the ResourceManager's ClientRMService starts up

[GitHub] spark pull request: [SPARK-3777] Display Executor ID for Tasks i...

2014-10-03 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2642#issuecomment-57837840 As horizontal space is precious for including more metrics, might it make sense to combine Address / Executor and Executor ID into a single Executor column, with values

[GitHub] spark pull request: [SPARK-3777] Display Executor ID for Tasks i...

2014-10-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2642#issuecomment-58253857 This LGTM too. Thanks for making those changes @zsxwing.

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1253#discussion_r15202044 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -290,6 +291,14 @@ private[spark] class SparkSubmitArguments(args: Seq

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1253#issuecomment-49679081 Updated patch addresses Patrick's feedback

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15204407 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -129,7 +128,7 @@ class BlockManagerMasterActor(val isLocal: Boolean

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-49700737 Upmerged and incorporated review comments. Also added a random sleep at the start so that the executor heartbeats are less likely to get in sync.
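The random initial sleep mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class name, the `initialDelayMs` helper, the 100 ms interval, and the choice of a delay drawn uniformly from [0, interval) are all assumptions made for the example.

```java
import java.util.Random;

final class HeartbeatJitterSketch {
    // Hypothetical helper: picks a start-up delay in [0, intervalMs) so that
    // executors launched at the same moment do not all heartbeat in lockstep.
    static long initialDelayMs(long intervalMs, Random rng) {
        return (long) (rng.nextDouble() * intervalMs);
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 100; // illustrative value, not a Spark default
        Random rng = new Random();

        // Sleep once for a random amount, then heartbeat at a fixed interval.
        Thread.sleep(initialDelayMs(intervalMs, rng));
        for (int i = 0; i < 3; i++) {
            System.out.println("heartbeat " + i);
            Thread.sleep(intervalMs);
        }
    }
}
```

Only the first sleep is randomized; the steady-state interval stays fixed, so the reporting rate is unchanged and only the phase differs between executors.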

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-07-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1507#issuecomment-49778015 Exactly. The idea is to call mergeShuffleReadMetrics when we're about to send the metrics update.

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-07-22 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15246387 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala --- @@ -191,7 +183,7 @@ object BlockFetcherIterator

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1253#issuecomment-49780719 The failure appears to be unrelated (MiMa compatibility issue in MLlib).

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-07-22 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15246688 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -75,9 +76,12 @@ class TaskMetrics extends Serializable

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-07-22 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15246919 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala --- @@ -131,7 +122,9 @@ object BlockFetcherIterator { val

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-49813666 Made stylistic fixes.

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-24 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-49970751 I don't entirely understand the advantage of having a separate PartialTaskMetrics. Ultimately every field of TaskMetrics except for maybe shuffleFinishTime will be able

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15333186 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +352,46 @@ private[spark] class Executor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-50173304 As far as I can tell, you're right - I don't see why updateShuffleMetrics needs to be synchronized. Uploading a patch that: * Adds comments to TaskMetrics

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-07-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-50282111 I think reflection is definitely the right way to go here

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-07-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-50431290 @mateiz I posted a couple ideas and was waiting on feedback. Any thoughts?

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15526822 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15526958 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -155,6 +156,23 @@ class DAGScheduler( eventProcessActor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15527486 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala --- @@ -56,7 +56,7 @@ private[jobs] object UIData { } case class

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15528611 --- Diff: docs/configuration.md --- @@ -524,6 +524,13 @@ Apart from these, the following properties are also available, and may be useful output

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15559871 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-50556357 Latest patch incorporates latest feedback and adds BlockManagerSuite back in. I tested on a small cluster and saw executors shut down fine (but haven't run at scale

[GitHub] spark pull request: SPARK-2738. Remove redundant imports in BlockM...

2014-07-29 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1642 SPARK-2738. Remove redundant imports in BlockManagerSuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-2738

[GitHub] spark pull request: SPARK-2664. Deal with `--conf` options in spar...

2014-07-30 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1665 SPARK-2664. Deal with `--conf` options in spark-submit that relate to flags You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark

[GitHub] spark pull request: SPARK-2664. Deal with `--conf` options in spar...

2014-07-31 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1665#discussion_r15631418 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -184,7 +184,7 @@ object SparkSubmit { OptionAssigner(args.archives

[GitHub] spark pull request: [SPARK-2678][Core] Prevents `spark-submit` fro...

2014-07-31 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1699#issuecomment-50837155 I'm worried that treating unknown args as app args would make typos difficult to debug. spark-submit --executor-croes 10 should print out an error
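The concern above, that a mistyped option should fail loudly instead of being passed through as an application argument, can be sketched as follows. This is an illustration only, not spark-submit's parser: the class, the `firstAppArgIndex` helper, and the three-option subset are assumptions, and every option here is assumed to take exactly one value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class StrictArgsSketch {
    // Illustrative subset of options; the real spark-submit list is longer.
    static final Set<String> KNOWN_OPTIONS = new HashSet<>(
        Arrays.asList("--executor-cores", "--executor-memory", "--conf"));

    // Hypothetical helper: returns the index of the first application argument,
    // and throws if an unrecognized "--option" appears before it.
    static int firstAppArgIndex(String[] args) {
        int i = 0;
        while (i < args.length && args[i].startsWith("--")) {
            if (!KNOWN_OPTIONS.contains(args[i])) {
                throw new IllegalArgumentException("Unrecognized option: " + args[i]);
            }
            i += 2; // skip the option and its single value
        }
        return Math.min(i, args.length);
    }

    public static void main(String[] args) {
        try {
            // The typo from the comment above is rejected, not forwarded to the app.
            firstAppArgIndex(new String[] {"--executor-croes", "10", "app.jar"});
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Everything from the returned index onward is treated as belonging to the application, so only options that precede it are validated.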

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-08-01 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-50853874 Thanks @pwendell and @andrewor14 for your continued reviews. 10 seconds sounds fine to me. Not that it's a shining beacon of performance, but MapReduce actually

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-08-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15684813 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -991,6 +994,9 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: SPARK-2641: Fixing how spark arguments are loa...

2014-08-01 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1657#issuecomment-50922542 This makes sense to me. However, we should also document it and mention that it only currently works for YARN.

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15716773 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -73,11 +75,16 @@ class TaskMetrics extends Serializable { var

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1507#issuecomment-50976412 Just tested this and observed the shuffle bytes read going up for in-progress tasks.

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-50977756 I hadn't noticed this before, but DiskObjectWriter is used for tracking bytes spilled by ExternalSorter and ExternalAppendOnlyMap in addition to shuffle bytes written. So

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-03 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51001765 Updated patch keeps it as ShuffleWriteMetrics for now.

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-04 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51025499 Updated patch addresses @pwendell and @kayousterhout's comments and adds tests.

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51168899 Looking into it. I ran the test that it was hanging on and things completed fine. I also combed the code and didn't see anywhere where this patch had changed how things

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-06 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51390403 thanks Patrick

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-06 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15900998 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala --- @@ -191,7 +184,7 @@ object BlockFetcherIterator

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-06 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15906274 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -73,11 +75,16 @@ class TaskMetrics extends Serializable { var

[GitHub] spark pull request: [SPARK-2894] spark-shell doesn't accept flags

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1825#issuecomment-51435337 This will allow spark-shell to take spark-submit options, but will remove its ability to take spark-shell-specific options (currently there's only one, file). I'm unclear

[GitHub] spark pull request: [SPARK-2894] spark-shell doesn't accept flags

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1825#issuecomment-51436115 org.apache.spark.repl.SparkRunnerSettings

[GitHub] spark pull request: SPARK-2900. aggregate inputBytes per stage

2014-08-07 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1826 SPARK-2900. aggregate inputBytes per stage You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-2900 Alternatively you can

[GitHub] spark pull request: SPARK-2900. aggregate inputBytes per stage

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1826#issuecomment-51440037 The failure appears to be unrelated (something with connections and Kafka).

[GitHub] spark pull request: SPARK-2900. aggregate inputBytes per stage

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1826#issuecomment-51440066 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2904] Remove non-used local variable in...

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1834#issuecomment-51543219 LGTM

[GitHub] spark pull request: [SPARK-2913] Place our log4j.properties at the...

2014-08-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1844#issuecomment-51653564 On the executor side, framework jars come first unless spark.files.userClassPathFirst is set to true. At least for Spark on YARN, executors are not launched with spark

[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

2014-08-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1890#issuecomment-51805750 FWIW I think this is already what happens in YARN, as we use Hadoop's distributed cache to send out the jars and include them on the executor classpath at startup

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58387498 Great catch. A concern is that calling Array#take requires an implicit conversion, which has some performance impact that might be unacceptable for this method

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-09 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58473731 Hmm, yeah, copyBytes is no good if it doesn't appear in Hadoop 1. My suggestion would be to use copyOfRange from java.util.Arrays.
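
For readers unfamiliar with the call suggested above, here is a minimal sketch of how `java.util.Arrays.copyOfRange` extracts a prefix of a byte array without going through a Scala implicit conversion. The class name `CopyValidBytes`, the helper `validBytes`, and the idea of a buffer padded beyond its valid length are assumptions made for illustration; this is not the code from the pull request.

```java
import java.util.Arrays;

class CopyValidBytes {
    // Hypothetical helper: returns a new array holding only the first
    // `length` bytes of `buffer`, leaving the original array untouched.
    static byte[] validBytes(byte[] buffer, int length) {
        return Arrays.copyOfRange(buffer, 0, length);
    }

    public static void main(String[] args) {
        // Assumed scenario: a backing buffer larger than the data it holds.
        byte[] padded = {1, 2, 3, 0, 0, 0, 0, 0};
        byte[] trimmed = validBytes(padded, 3);
        System.out.println(Arrays.toString(trimmed)); // prints [1, 2, 3]
    }
}
```

Because `copyOfRange` is part of the JDK, it should be available regardless of which Hadoop version is on the classpath.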

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-10 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2712#discussion_r18732020 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1409,7 +1411,7 @@ object SparkContext extends Logging

[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-10 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2746#discussion_r18732461 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ExecutorScalingManager.scala --- @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-10 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2746#discussion_r18732711 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ExecutorScalingManager.scala --- @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-10 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2746#discussion_r18732982 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ExecutorScalingManager.scala --- @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-10 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2746#discussion_r18733008 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ExecutorScalingManager.scala --- @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-10 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2746#discussion_r18733056 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ExecutorScalingManager.scala --- @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-10 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58732168 One more nit: the added java import should go with the other java imports.

[GitHub] spark pull request: [SPARK-3121] Wrong implementation of implicit ...

2014-10-10 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2712#issuecomment-58732173 Otherwise, LGTM

[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2746#issuecomment-58759844 Awesome, sounds good, will hold off.

[GitHub] spark pull request: [WIP][SPARK-3795] Heuristics for dynamically s...

2014-10-11 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2746#discussion_r18743600 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ExecutorScalingManager.scala --- @@ -0,0 +1,324 @@ +/* + * Licensed to the Apache Software
