[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-07 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18388 200k+ connections seems to be your problem then. Is this all a single application? You say 6000 nodes with 64 executors on each host, how many cores per executor? Or do you mean basically

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-06 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18388 ok, sorry, I forgot you had the screenshot there. So as you mention in that post, if we are just creating too many outbound buffers before they can actually be sent over the network, then we should try

[GitHub] spark issue #18547: [SPARK-21321][Spark Core] Spark very verbose on shutdown

2017-07-05 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18547 There is no reason to print out messages that aren't useful to the users. Many users see Warnings and read them and think there is a problem with their application or configuration. Mo

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-05 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18388 So that is an issue. If users are running spark 1.6 or spark 2.1 on the same cluster as the new one with this feature, you can't upgrade the shuffle service until no one runs those. W

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-05 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18388 making the external shuffle service incompatible is a huge issue for deployments. For the yarn side you would have to have the nodemanager run 2 versions (which as far as I know hasn't

[GitHub] spark pull request #18476: [SPARK-20858][DOC][MINOR] Document ListenerBus ev...

2017-06-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18476#discussion_r125138598 --- Diff: docs/configuration.md --- @@ -1398,6 +1398,15 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request #18476: [SPARK-20858][DOC][MINOR] Document ListenerBus ev...

2017-06-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18476#discussion_r125138509 --- Diff: docs/configuration.md --- @@ -1398,6 +1398,15 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark issue #18476: [SPARK-20858][DOC][MINOR] Document ListenerBus event que...

2017-06-30 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18476 Jenkins, test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark pull request #18476: [SPARK-20858][DOC][MINOR] Document ListenerBus ev...

2017-06-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18476#discussion_r125137046 --- Diff: docs/configuration.md --- @@ -1398,6 +1398,15 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request #18476: [SPARK-20858][DOC][MINOR] Document ListenerBus ev...

2017-06-30 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18476#discussion_r125038588 --- Diff: docs/configuration.md --- @@ -1398,6 +1398,15 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-06-27 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18388 I think having both sides would probably be good. limit the reducer connections and simultaneous block calls but have a fail safe on the shuffle server side where it can reject connections also

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-06-27 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18388 Haven't looked at the path in detail yet. High level questions/thoughts. So you say the memory usage is by the netty chunks, so my assumption is this is during the actual transfer? fa

[GitHub] spark issue #17113: [SPARK-13669][SPARK-20898][Core] Improve the blacklist m...

2017-06-26 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 +1 finally got a clean build, will merge to master

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-23 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 thanks for the reviews @cloud-fan

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-22 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 Jenkins, test this please

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-22 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 Jenkins, test this please

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-22 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 failure is from previous push of code.

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-22 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 upmerged to master and updated default and removed unneeded changes.

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-22 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 will upmerge shortly, since there are conflicts

[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

2017-06-22 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r123527201 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -295,4 +295,12 @@ package object config { "above

[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

2017-06-22 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18070 sorry for my delay on getting back to this. So if we do that, you would have to have taskKilledReason extend TaskFailedReason, because things rely on the countTowardsTaskFailures field. Then

[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

2017-06-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r123273222 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf

[GitHub] spark issue #17113: [SPARK-13669][SPARK-20898][Core] Improve the blacklist m...

2017-06-21 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 Jenkins, test this please

[GitHub] spark issue #17113: [SPARK-13669][SPARK-20898][Core] Improve the blacklist m...

2017-06-21 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 these failures definitely look unrelated. I'll kick once more to try to get a clean run.

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-21 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 sorry missed that you had commented, yes we can change that

[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

2017-06-21 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r123262316 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf

[GitHub] spark issue #17113: [SPARK-13669][SPARK-20898][Core] Improve the blacklist m...

2017-06-20 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 Jenkins, test this please

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-20 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 Jenkins, test this please

[GitHub] spark issue #18213: [SPARK-20996][YARN] Better handling AM reattempt based o...

2017-06-20 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18213 I think this is a very hard thing for us to know; there are too many different failure types. I agree that setting to 1 is better than us getting it wrong, although I question a bit still if that is right

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-14 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 sorry been out on vacation and probably won't have time this week to respond much but will update early next week. Thanks @rdblue . I am running this in our production as well and can cl

[GitHub] spark issue #18150: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-06-13 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18150 Sorry, been out on vacation. I think invalidating all and having a feature flag makes sense for now. If we get more data on it causing issues we can revisit. Sorry won't have time to review in d

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-02 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 I'm not sure what you mean by it's not doable? What places are you seeing update the block statuses that I haven't covered here? Most of it was done by the BlockManager. Maybe I

[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

2017-06-02 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r119943504 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala --- @@ -528,7 +528,13 @@ class JobProgressListener(conf: SparkConf

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-02 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 turned on by default for backwards compatibility but don't really agree with it. We should make it more stable/usable for people by turning it off. I'm assuming anyone that is using

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-02 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 ok, I'll update the default.

[GitHub] spark pull request #18150: [SPARK-19753][CORE] Un-register all shuffle outpu...

2017-06-01 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18150#discussion_r119736751 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1383,19 +1394,43 @@ class DAGScheduler( */ private

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-01 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 @JoshRosen what do you think should we add the deprecated?

[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

2017-06-01 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18070 thanks for the updates. I was testing this out by running a large job with speculative tasks and I am still seeing the stage summary show failed tasks. It looks like it's due to this code

[GitHub] spark issue #17113: [SPARK-13669][SPARK-20898][Core] Improve the blacklist m...

2017-06-01 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 changes LGTM. @squito did you have any further comments?

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-01 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 It would be nice to get this into spark 2.2 if we can

[GitHub] spark issue #18162: [SPARK-20923] turn tracking of TaskMetrics._updatedBlock...

2017-06-01 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 Yeah I was figuring I would file another jira to remove it later. I can add the deprecated flag here if you guys agree.

[GitHub] spark pull request #18162: [SPARK-20923] turn tracking of TaskMetrics._updat...

2017-05-31 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r119525021 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -295,4 +295,12 @@ package object config { "above

[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

2017-05-31 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 taskMetrics doesn't take the sparkconf or anything to get at a config, so we would have to config out everywhere it's incrementing or adding things. I think that wouldn't be too hard.

[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

2017-05-31 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 Updated, I put the TaskMetrics api back with deprecated marking and just had it return Nil. @JoshRosen Were you thinking of adding more back?

[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

2017-05-31 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r119444253 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -368,8 +356,7 @@ private[spark] object JsonProtocol { ("Sh

[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

2017-05-31 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r119422345 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -368,8 +356,7 @@ private[spark] object JsonProtocol { ("Sh

[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

2017-05-31 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r119422001 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -110,15 +109,6 @@ class TaskMetrics private[spark] () extends

[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

2017-05-31 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18162#discussion_r119421473 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -368,8 +356,7 @@ private[spark] object JsonProtocol { ("Sh

[GitHub] spark issue #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses

2017-05-31 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18162 @JoshRosen

[GitHub] spark pull request #18162: [SPARK-20923] Remove TaskMetrics._updatedBlockSta...

2017-05-31 Thread tgravescs
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/18162 [SPARK-20923] Remove TaskMetrics._updatedBlockStatuses ## What changes were proposed in this pull request? Remove TaskMetrics._updatedBlockStatuses. As far as I can see its not used by

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-31 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r119366199 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -51,29 +51,19 @@ import org.apache.spark.util.{AccumulatorV2

[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-31 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 please add jira SPARK-20898 to the description since fixing that here

[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-30 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 so unfortunately I haven't actually been seeing this. You can see with external shuffle if something happens to the NM and it does cause job failure. NM crashes for OOM, something else kil

[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-26 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 @squito just double checking, are you ok with this change and did you have any comments?

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-26 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r118751271 --- Diff: docs/configuration.md --- @@ -1449,6 +1449,14 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-26 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r118751130 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -145,6 +146,72 @@ private[scheduler] class BlacklistTracker

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-26 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r118748969 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -145,6 +146,72 @@ private[scheduler] class BlacklistTracker

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-26 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r118747665 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -145,6 +146,72 @@ private[scheduler] class BlacklistTracker

[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-26 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 I'm curious, did you test the killing part on an actual yarn job? I was trying it on master and I don't think it works at all due to the way it's passing the allocation client. It's a sepa

[GitHub] spark pull request #18070: [SPARK-20713][Spark Core] Convert CommitDenied to...

2017-05-26 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18070#discussion_r118711377 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -459,7 +459,7 @@ private[spark] class Executor( case CausedBy

[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

2017-05-26 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/18070 sorry, the case I was talking about is with a fetch failure. The true abort stage doesn't happen until it retries 4 times. In the meantime you can have tasks from the same stage (diff

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-24 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r118315661 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -145,6 +146,75 @@ private[scheduler] class BlacklistTracker

[GitHub] spark issue #17113: [SPARK-13669][Core] Improve the blacklist mechanism to h...

2017-05-24 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17113 @jerryshao sorry for my delay on this. We have a rough design of what we want to do for future changes, but I think those are going to take a while and in the meantime I think this is a useful addition

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-24 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r118260480 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -145,6 +146,75 @@ private[scheduler] class BlacklistTracker

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-24 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r118261422 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -145,6 +146,75 @@ private[scheduler] class BlacklistTracker

[GitHub] spark pull request #17113: [SPARK-13669][Core] Improve the blacklist mechani...

2017-05-24 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17113#discussion_r118260789 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -145,6 +146,75 @@ private[scheduler] class BlacklistTracker

[GitHub] spark pull request #18070: [SPARK-20713][Spark Core] Convert CommitDenied to...

2017-05-23 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/18070#discussion_r117977100 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -338,6 +340,9 @@ private[spark] class Executor

[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...

2017-05-16 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17955 at a high level this definitely makes sense. I need to look at it in more detail; I'll try to do that in the next day or two. I am wondering what all testing you have done on this?

[GitHub] spark issue #16705: [SPARK-19354] [Core] Killed tasks are getting marked as ...

2017-05-11 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/16705 @devaraj-kavali thanks, it looks like this is already fixed in spark 2.2 with SPARK-20217 please close.

[GitHub] spark issue #16705: [SPARK-19354] [Core] Killed tasks are getting marked as ...

2017-05-10 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/16705 Jenkins, test this please

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-05-09 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 looks good, I'll merge. thanks @redsanket

[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-08 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17854 also what is the exact error/stack trace you see when you say "failed to connect"?

[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-08 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17854 what is your network timeout (spark.network.timeout) set to?

[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-05 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17854 > If that's what you mean, there's no need for retrying. No RPC calls retry anymore. See #16503 (comment) for an explanation. I see, I guess with the way we have the rpc i

[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-05 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17854 I took a quick look at the registerExecutor call in CoarseGrainedExecutorBackend and it's not retrying at all. We should change that to retry. We retry heartbeats and many other things so it

[GitHub] spark issue #17854: [SPARK-20564][Deploy] Reduce massive executor failures w...

2017-05-05 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17854 to slow down launching you could just set spark.yarn.containerLauncherMaxThreads to be smaller. that isn't guaranteed but neither is this really. Just an alternative or something you c

[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...

2017-05-05 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/15009 we should update the tags to 2.3.

[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...

2017-05-05 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114997435 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -283,10 +283,15 @@ private[spark] object EventLoggingListener

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-05-04 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 @srowen @vanzin do either of you know where the jenkins stuff is configured? wondering why this isn't working for me.

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-05-04 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 retest this please

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-05-03 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 there are also a few formatting things that look like they were just line wraps and extra new lines.

[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...

2017-05-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114647868 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -60,6 +60,8 @@ private[spark] class SparkUI private ( var appId

[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-05-03 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17744 Yes conceptually it could be removed but as you say is a bigger change. Are you still seeing memory issues after this change?

[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...

2017-05-02 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r114363985 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala --- @@ -742,6 +743,7 @@ private[history] object FsHistoryProvider

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-05-02 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 retest this please

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-05-02 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 test this please

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-05-02 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 Jenkins, okay to test

[GitHub] spark issue #17744: [SPARK-20426] Lazy initialization of FileSegmentManagedB...

2017-04-27 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17744 We see the same issue on some of our clusters. I was planning on doing 2 things. Something like this to reduce that memory usage and then on the other side you could change the shuffle fetcher

[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...

2017-04-27 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17744#discussion_r113697866 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -93,14 +92,25 @@ protected void

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-26 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 test this please

[GitHub] spark pull request #17748: [SPARK-19812] YARN shuffle service fails to reloc...

2017-04-26 Thread tgravescs
GitHub user tgravescs reopened a pull request: https://github.com/apache/spark/pull/17748 [SPARK-19812] YARN shuffle service fails to relocate recovery DB across NFS directories ## What changes were proposed in this pull request? Change from using java

[GitHub] spark pull request #17748: [SPARK-19812] YARN shuffle service fails to reloc...

2017-04-26 Thread tgravescs
Github user tgravescs closed the pull request at: https://github.com/apache/spark/pull/17748

[GitHub] spark issue #17748: [SPARK-19812] YARN shuffle service fails to relocate rec...

2017-04-26 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17748 Thanks for the review @mridulm merged to master and branch-2.2

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-26 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 Jenkins, test this please

[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...

2017-04-25 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17744#discussion_r113323069 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -93,14 +92,25 @@ protected void

[GitHub] spark pull request #17658: [SPARK-20355] Add per application spark version o...

2017-04-25 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17658#discussion_r113196872 --- Diff: core/src/main/scala/org/apache/spark/ui/SparkUI.scala --- @@ -139,6 +140,8 @@ private[spark] abstract class SparkUITab(parent: SparkUI, prefix

[GitHub] spark issue #17658: [SPARK-20355] Add per application spark version on the h...

2017-04-25 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/17658 Jenkins, test this please

[GitHub] spark pull request #17748: [SPARK-19812] YARN shuffle service fails to reloc...

2017-04-25 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/17748#discussion_r113192444 --- Diff: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java --- @@ -363,25 +362,29 @@ protected File

[GitHub] spark pull request #17748: [SPARK-19812] YARN shuffle service fails to reloc...

2017-04-24 Thread tgravescs
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/17748 [SPARK-19812] YARN shuffle service fails to relocate recovery DB across NFS directories ## What changes were proposed in this pull request? Change from using java

[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...

2017-04-24 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/15009 @kishorvpatil please fix documentation
