[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Please note that this applies only to the spark artifacts - not any of the others (which, as you mentioned, will be a simple disk lookup). Since I was moving it for spark-generated artifacts anyway

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Thanks for the link to 'skip' - will test it out! About test compilation: For example, in common/network-shuffle/pom.xml, we have

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 @srowen You are right; for a normal build, this should be a no-op. Currently, there is no way to suppress compilation of tests (we can suppress running tests via -DskipTests but it will still
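For context, a minimal sketch of the distinction being discussed, using the standard Maven flags (the Scala process invocation is only for illustration; `build/mvn` is Spark's bundled Maven wrapper):

```scala
import scala.sys.process._

// -DskipTests compiles test sources but does not run them;
// -Dmaven.test.skip=true skips compiling (and running) test sources entirely.
Seq("build/mvn", "-DskipTests", "package").!
Seq("build/mvn", "-Dmaven.test.skip=true", "package").!
```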

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84049757 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84047904 --- Diff: pom.xml --- @@ -2415,6 +2389,67 @@ + + docBuild + + + maven.javadoc.skip

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84047441 --- Diff: common/network-common/pom.xml --- @@ -77,27 +77,40 @@ compile - - - log4j - log4j

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/15553 [SPARK-18008] [build] Add support for -Dmaven.test.skip=true and -Dmaven.javadoc.skip=true ## What changes were proposed in this pull request? Add support for -Dmaven.test.skip=true

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 +CC @srowen

[GitHub] spark pull request #15550: [SPARK-18003][Spark Core] Fix bug of RDD zipWithI...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15550#discussion_r84004989 --- Diff: core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala --- @@ -64,8 +64,14 @@ class ZippedWithIndexRDD[T: ClassTag](prev: RDD[T

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 BTW, it was interesting that the earlier change did not trigger a test failure (the issue @viirya pointed out - about needing to move RemoveExecutor to receive)

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 LGTM, @zsxwing any comments?

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing Ah, then simply making it send() instead of askWithRetry() should do, no? That was actually in the initial PR - I was not sure if we want to change the behavior from askWithRetry

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @scwf I think the initial fix with a small change might be sufficient. What I meant was something like this: ``` protected def reset(): Unit = { val executors
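The preview above is truncated; a standalone sketch of the suggested shape, with stubbed-out names (`executorDataMap`, `removeExecutor`) standing in for the real scheduler backend internals - this is not the actual patch:

```scala
import scala.collection.mutable

// Hedged sketch: snapshot the executor ids under the lock, then act on the
// copy, so the caller never iterates state owned by the actor thread.
object ResetSketch {
  private val executorDataMap = mutable.HashMap[String, AnyRef]() // stub for the real map

  private def removeExecutor(execId: String, reason: String): Unit =
    println(s"removing $execId: $reason") // stub for the real send()-based removal

  def reset(): Unit = {
    val executors = executorDataMap.synchronized { executorDataMap.keys.toSet }
    executors.foreach(removeExecutor(_, "stale executor after master recovery"))
  }
}
```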

[GitHub] spark pull request #15382: [SPARK-17810] [SQL] Default spark.sql.warehouse.d...

2016-10-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15382#discussion_r83928417 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -741,7 +741,7 @@ private[sql] class SQLConf extends Serializable

[GitHub] spark issue #15531: [SQL][STREAMING][TEST] Follow up to remove Option.contai...

2016-10-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15531 @srowen Unfortunately the 'Some' in the == comparison is often left out, resulting in bugs (and we have had a fair share of them in the past - some more severe than others). Given
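A hypothetical illustration of the bug class being described (not code from the PR):

```scala
val port: Option[Int] = Some(8080)

port == Some(8080)  // true, but the Some wrapper is easy to forget
port == 8080        // compiles, yet is always false -- the classic bug
port.contains(8080) // true, and cannot be silently mis-typed this way
```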

[GitHub] spark pull request #15512: [SPARK-17930][CORE]The SerializerInstance instanc...

2016-10-17 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15512#discussion_r83752469 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala --- @@ -84,6 +90,7 @@ private[spark] class TaskResultGetter(sparkEnv

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 Merged to master, thanks @zhzhan!

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-15 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 I am assuming @kayousterhout does not have comments on this. Can you please fix the conflict, @zhzhan? I will merge it into master after that.

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 Would be cleaner to simply copy executorDataMap.keys and work off that, to ensure there is no coupling between the actor thread and the invoker.

[GitHub] spark pull request #15454: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-13 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15454#discussion_r83155316 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -171,7 +175,11 @@ class NewHadoopRDD[K, V]( override def

[GitHub] spark pull request #15454: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-13 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15454#discussion_r83155108 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -245,8 +248,7 @@ class HadoopRDD[K, V]( try { finished

[GitHub] spark issue #15422: [SPARK-17850][Core]Add a flag to ignore corrupt files

2016-10-12 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 Merged - had an issue with pip (new laptop, sigh), so the JIRA and PR did not get closed.

[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932947 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -588,6 +588,12 @@ object SQLConf { .doubleConf

[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82933077 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -170,4 +170,9 @@ package object config { .doc("Port t

[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932992 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -179,7 +183,16 @@ class NewHadoopRDD[K, V]( override def

[GitHub] spark pull request #15422: [SPARK-17850][Core]Add a flag to ignore corrupt f...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15422#discussion_r82932645 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -253,8 +256,12 @@ class HadoopRDD[K, V]( try { finished

[GitHub] spark issue #12524: [SPARK-12524][Core]DagScheduler may submit a task set fo...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12524 @seayi any progress on this? Would be good to add this in if consistently reproducible.

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @srowen Since this is happening 'below' the user code (in the hadoop rdd), is there another way to handle this? I agree that for a lot of use cases where it is critical to work off

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @zsxwing The map task is run by https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @marmbrus +1 on logging, that is definitely something which was probably missed here.

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @zsxwing You are right, NewHadoopRDD is not handling this case. Probably would be good to add exception handling there when nextKeyValue throws an exception? Context is, for large jobs
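A standalone sketch of the kind of guard being discussed (the real change would live in the HadoopRDD/NewHadoopRDD record-reader loop; the flag name here just mirrors the PR title):

```scala
import java.io.IOException

// Treat an IOException mid-stream as end-of-input when ignoreCorruptFiles is
// set, instead of failing the whole task -- e.g. a truncated compressed block
// at the tail of a file.
class CorruptionTolerantIterator[T](underlying: Iterator[T], ignoreCorruptFiles: Boolean)
    extends Iterator[T] {
  private var finished = false
  private var buffered: Option[T] = None

  override def hasNext: Boolean = {
    if (!finished && buffered.isEmpty) {
      try buffered = if (underlying.hasNext) Some(underlying.next()) else None
      catch {
        case _: IOException if ignoreCorruptFiles => buffered = None
      }
      if (buffered.isEmpty) finished = true
    }
    buffered.nonEmpty
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("end of (possibly corrupt) input")
    val v = buffered.get
    buffered = None
    v
  }
}
```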

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @srowen The tuples already returned would have been valid; it is the subsequent block decompression which has failed. For example, in a 1 GB file, the last few bytes missing (or corrupt) will cause

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 Why would a corrupt record cause an EOFException to be thrown?

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-11 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82816160 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,129 @@ +/* + * Licensed under the Apache License

[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15408 Barring the query to @rxin (regarding buffer pooling), I am fine with the change - pretty neat, thanks @sitalkedia! Would be good if more eyeballs looked at it though, given how fundamental

[GitHub] spark pull request #15371: [SPARK-17816] [Core] Fix ConcurrentModificationEx...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15371#discussion_r82683704 --- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala --- @@ -444,7 +444,9 @@ class CollectionAccumulator[T] extends AccumulatorV2[T

[GitHub] spark pull request #15371: [SPARK-17816] [Core] Fix ConcurrentModificationEx...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15371#discussion_r82670700 --- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala --- @@ -444,7 +444,9 @@ class CollectionAccumulator[T] extends AccumulatorV2[T

[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15408 There is a behavioral change with this PR, which I am not sure is relevant: BufferedInputStream supports mark/reset, while we are not doing so here - do deserialization and other codepaths
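For reference, the mark/reset contract in question (standard java.io behavior, not the PR's code):

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream}

val in = new BufferedInputStream(new ByteArrayInputStream("spark".getBytes("UTF-8")))
in.mark(8)                 // remember this position; valid while at most 8 bytes are read
val first = in.read()      // consume 's'
in.reset()                 // rewind to the mark
assert(in.read() == first) // the same byte is read again
```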

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r8279 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,127 @@ +/* + * Licensed under the Apache

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82665886 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,127 @@ +/* + * Licensed under the Apache

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82665543 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,127 @@ +/* + * Licensed under the Apache

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82665139 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,127 @@ +/* + * Licensed under the Apache

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82664592 --- Diff: core/src/main/java/org/apache/spark/io/NioBasedBufferedFileInputStream.java --- @@ -0,0 +1,120 @@ +/* + * Licensed under the Apache

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 @zhzhan I am curious why this is the case for the jobs being mentioned. This PR should have an impact if the locality preference of the taskset being run is fairly suboptimal to begin

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-07 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 As I mentioned before, this is definitely a huge step in the right direction! Having said that, I want to ensure we don't aggressively blacklist executors and nodes - at scale, I have seen

[GitHub] spark issue #15218: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15218 Btw, taking a step back, I am not sure this will work as you expect it to. Other than a few tasksets - those without locality information - the schedule is going to be highly biased

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 Thinking more, and based on what @squito mentioned, I was considering the following: Since we are primarily dealing with executors or nodes which are 'bad', as opposed to recoverable

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 If I understood the change correctly, a node can get blacklisted for a taskset if sufficient (even different) tasks fail on executors on it, which can potentially cause all nodes
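A toy model of the concern (hypothetical names, not Spark's implementation): distinct task failures are aggregated per node, so different tasks failing on different executors of one node can push it over the threshold:

```scala
case class TaskFailure(taskIndex: Int, execId: String, host: String)

// A node is blacklisted for the taskset once enough *distinct* tasks have
// failed on any of its executors -- even if each executor saw only one failure.
def nodeBlacklistedForTaskSet(failures: Seq[TaskFailure], host: String, threshold: Int): Boolean =
  failures.filter(_.host == host).map(_.taskIndex).distinct.size >= threshold
```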

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 @squito I am hoping we _can_ remove the old code/functionality actually (it is klunky and very specific to the single-executor resource contention/shutdown usecase - unfortunately common enough to warrant

[GitHub] spark pull request #15218: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-06 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15218#discussion_r82296969 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 @tgravescs re(1): It was typically observed when YARN was killing the executor, usually when it ran over the memory limits (not sure if it was happening during pre-emption also).

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 @kayousterhout Agree with (1) - a permanent blacklist will effectively work the same way for executor shutdown. Re (2) - a task failure is not necessarily only due to resource

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 @squito I agree, you are right - even though the functionality is blacklisting in both cases, the problems each is trying to tackle are quite different. Transient issues (resource, shutdown, etc

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 @kayousterhout When an executor or node is shutting down, it is actually at the driver level (not just the taskset level) - since all tasks would fail on executors when they are shutting down

[GitHub] spark issue #15374: [SPARK-17800] Introduce InterfaceStability annotation

2016-10-06 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15374 👍 LGTM

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 @squito Re:(a) The earlier behavior was used fairly extensively in various properties at Yahoo (web search, groups, bigml, image search, etc.). It was added for a specific failure mode

[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-10-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15249 @squito Thanks for clarifying, that makes some of the choices clearer! A few points to better understand: a) the timeout does not seem to be enforced in this PR, which means

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

2016-10-04 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12436 I am curious how this is resilient to epoch changes, which will be triggered due to executor loss for a shuffle task when its shuffle map task executor is gone. Won't it create issues

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81866263 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81862583 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81857297 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81859869 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ExecutorFailuresInTaskSet.scala --- @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81861056 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -243,8 +243,8 @@ private[spark] class TaskSchedulerImpl

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81857430 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81865635 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetBlacklist.scala --- @@ -0,0 +1,136 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...

2016-10-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15249#discussion_r81869821 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -766,9 +782,11 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-04 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r62129859 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -296,10 +290,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

2016-04-21 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/12506#issuecomment-213046545 No, you are right - this is called only from the event loop - which should ensure thread safety. I misread where the re-registration was happening as outside

[GitHub] spark pull request: [SPARK-14736][core] Deadlock in registering ap...

2016-04-20 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/12506#issuecomment-212546285 No. Updates to waitingAppsWhileRecovering happen from different threads, right? Hence you will need to protect it.
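A minimal sketch of the point being made (the field name comes from the discussion; the lock shown is just one way to protect it):

```scala
import scala.collection.mutable

// If several threads can register or re-register apps during recovery, the
// shared buffer needs a guard (or a concurrent collection).
object RecoveryState {
  private val waitingAppsWhileRecovering = mutable.ArrayBuffer[String]()

  def addWaitingApp(appId: String): Unit = waitingAppsWhileRecovering.synchronized {
    waitingAppsWhileRecovering += appId
  }
}
```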

[GitHub] spark pull request: fixing SPARK-14736 Deadlock in registering app...

2016-04-19 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/12506#issuecomment-212145151 You will need to make this thread-safe - the applications are added/re-registered from separate threads, right?

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-04-07 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/12113#issuecomment-207111462 Looks good to me; would be nice to get feedback from @rxin or @JoshRosen also ...

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-04-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r58953996 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -428,40 +503,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-04-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r58950637 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -492,16 +624,51 @@ private[spark] object MapOutputTracker extends Logging

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-04-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r58254590 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -477,12 +605,16 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-04-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r58252476 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -477,12 +605,16 @@ private[spark] class MapOutputTrackerWorker(conf: SparkConf

[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-04-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/12113#discussion_r58252199 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -428,40 +503,93 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf

[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

2016-02-01 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/10838#issuecomment-178141964 @zsxwing Other than the PR title - I think everything else says inflight in code/documentation. Did you see anything to the contrary?

[GitHub] spark pull request: [SPARK-10193] [core] [wip] Eliminate Skipped S...

2016-01-24 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/8427#issuecomment-174271331 Just a note about MapOutputTracker - it is fairly trivial to make it use a bare minimum amount of memory even if it does not get cleaned up for 'old' stages: using

[GitHub] spark pull request: [SPARK-6166] Limit number of concurrent outbou...

2016-01-19 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/10838#issuecomment-173009800 LGTM - though it would be better for someone else to also review this, since I might be too close to the code!

[GitHub] spark pull request: [SPARK-6166] [Spark Core] Limit number of conc...

2015-12-30 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/5852#issuecomment-168113370 Tom might be interested in this one. At our scale, without this change, Spark does not work. Since I went through this once already and it was largely

[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...

2015-12-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/8760#issuecomment-165304077 @squito Unless an executor or node is dying/has resource issues, in most cases, the failure is not repeatable across tasksets due to differing execution/resource

[GitHub] spark pull request: [SPARK-7313] [Spark CORE] Configurable limit t...

2015-12-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/5848#issuecomment-165168162 What the exact value should be, I am not sure; the current value does seem to work well for most users - except for degenerate cases like ours: hence hardcoding

[GitHub] spark pull request: [SPARK-4681] Enable executor blacklisting by d...

2015-12-02 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/10045#issuecomment-161370485 Sure, I am fine with it not being the default behavior - but there should be some way to override it. For example here, as long as there exists a way

[GitHub] spark pull request: [SPARK-4681] Enable executor blacklisting by d...

2015-12-01 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/10045#issuecomment-161130625 @kayousterhout Permanently disabling a task on an executor can be problematic for a few reasons: a) When locality timeouts are aggressively high (user config, so we

[GitHub] spark pull request: [SPARK-4681] Enable executor blacklisting by d...

2015-12-01 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/10045#issuecomment-161137821 @kayousterhout I have a job with 200k tasks where some tasks fail as many as 22 times (no kidding - actual number) and then succeed :-) Specific examples

[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...

2015-12-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/8760#discussion_r46249345 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,253 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...

2015-12-01 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/8760#issuecomment-160890985 Since I am no longer very familiar with the scheduler codebase, I am sort of giving up on my review halfway through - save the notes already added. Hopefully someone else

[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...

2015-11-30 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/8760#discussion_r46247212 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistStrategy.scala --- @@ -0,0 +1,151 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4681] Enable executor blacklisting by d...

2015-11-30 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/10045#issuecomment-160879820 @kayousterhout I would have preferred if this "feature" had actually been fixed properly by now - it was added as a temporary workaround (and so int

[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...

2015-11-30 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/8760#discussion_r46248240 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,253 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...

2015-11-30 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/8760#discussion_r46248217 --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala --- @@ -0,0 +1,253 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...

2015-11-30 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/8760#issuecomment-160884866 What would be more ideal is an implementation which allows for blacklisting a single task (as is current), up to multiple tasks for a taskset on a single executor (in case

[GitHub] spark pull request: [SPARK-8297] [YARN] Scheduler backend is not n...

2015-07-17 Thread mridulm
Github user mridulm closed the pull request at: https://github.com/apache/spark/pull/7243

[GitHub] spark pull request: [SPARK-8297] [YARN] Scheduler backend is not n...

2015-07-15 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/7431#issuecomment-121762507 Looks good to me! If Tom does not come back with objections, I am +1 on this.

[GitHub] spark pull request: [SPARK-8297] [YARN] Scheduler backend is not n...

2015-07-07 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/7243#issuecomment-119090177 @tgravescs pls take a look, thanks.

[GitHub] spark pull request: [SPARK-8297] [YARN] Scheduler backend is not n...

2015-07-07 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/7243#issuecomment-119088417 The failures (in YarnClusterSuite) are unrelated to this change - and happen with a clean checkout as well.

[GitHub] spark pull request: [SPARK-8297] [YARN] Scheduler backend is not n...

2015-07-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/7243#discussion_r34098979 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -218,11 +218,12 @@ private[spark] class ApplicationMaster

[GitHub] spark pull request: [SPARK-8297] [YARN] Scheduler backend is not n...

2015-07-07 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/7243#issuecomment-119359281 @vanzin The endpoint is for messages from core to yarn, if I am not wrong? This would flow the other way - from yarn to core. Or am I missing something

[GitHub] spark pull request: [SPARK-8297] [YARN] Scheduler backend is not n...

2015-07-07 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/7243#issuecomment-119375419 @vanzin I do not disagree with you - it would indeed be cleaner, and I do agree with your point of view. I had not looked at client mode when I fixed this, and my

[GitHub] spark pull request: [SPARK-8297] [YARN] Scheduler backend is not n...

2015-07-07 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/7243#issuecomment-119362544 Hmm, I think I will punt on fixing the client and defend against NPEs with null checks - and leave it to someone more familiar with yarn-client mode to patch
