[GitHub] spark pull request #16189: [SPARK-18761][CORE][WIP] Introduce "task reaper" ...

2016-12-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91257339 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,6 +435,57 @@ private[spark] class Executor

[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16098 @srowen This is where try-with-resources is really useful in Java. Utils.tryWithSafeFinallyAndFailureCallbacks is an attempt at doing that in Scala, with mixed results. I agree
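
For reference, the Java idiom being referred to can be sketched as follows; the Writer class here is a hypothetical stand-in for a record writer, not Spark's or Hadoop's actual API:

```java
// Sketch of the Java try-with-resources idiom: close() is guaranteed to
// run when the block exits, even on exception, and any exception thrown
// by close() is attached as suppressed rather than masking the original.
public class TryWithResourcesDemo {
    // Hypothetical stand-in for a record writer.
    static class Writer implements AutoCloseable {
        boolean closed = false;
        void write(String record) {
            if (closed) throw new IllegalStateException("writer already closed");
        }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        Writer w = new Writer();
        try (Writer managed = w) {
            managed.write("row-1");
        } // managed.close() has run by this point
        System.out.println(w.closed); // prints "true"
    }
}
```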

[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16098 That looks like a bug in ParquetRecordWriter (the contract of close() is unambiguous) ... but then, I guess there is no point in fighting against buggy code: we have to integrate with a lot

[GitHub] spark issue #16098: [SPARK-18672][CORE] Close recordwriter in SparkHadoopMap...

2016-12-01 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16098 close is idempotent - simply close it before 'committer.commitTask(taskContext)' and you should be done. The second close in the finally will become a no-op in case of a successful commit, and will handle
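
The pattern described - close before commit, with a second idempotent close in the finally - can be sketched like this; RecordWriterLike and CommitterLike are hypothetical stand-ins, not the Hadoop API:

```java
// Hypothetical sketch of "close before commit, with an idempotent close
// in the finally block". Only the first close() does real work (flush);
// the close in finally is a no-op on the success path but still cleans
// up on the failure path.
public class CloseBeforeCommit {
    static class RecordWriterLike {
        int closeCalls = 0;
        boolean flushed = false;
        void close() {
            closeCalls++;
            if (closeCalls == 1) flushed = true; // only the first close flushes
        }
    }
    static class CommitterLike {
        boolean committed = false;
        void commitTask(RecordWriterLike w) {
            if (!w.flushed) throw new IllegalStateException("commit before flush");
            committed = true;
        }
    }

    static void writeAndCommit(RecordWriterLike writer, CommitterLike committer) {
        try {
            // ... write records ...
            writer.close();               // flush output before committing
            committer.commitTask(writer); // commit sees fully-flushed output
        } finally {
            writer.close();               // no-op on success; cleanup on failure
        }
    }
}
```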

[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15736 (Particularly) as the number of partitions increases, "if (a._1 != b._1)" might be better for bpt reasons. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
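
The comparator shape under discussion can be sketched as below: testing partition inequality first keeps the hot branch predictable and skips the key comparison for the common case of differing partitions. This reads "bpt" as branch prediction, and the class is illustrative, not Spark's actual comparator:

```java
import java.util.Comparator;
import java.util.Map;

// Illustrative (partition, key) comparator: check "a._1 != b._1" first so
// the frequent differing-partition case takes one predictable branch and
// never invokes the key comparator.
public class PartitionFirstComparator {
    static <K> Comparator<Map.Entry<Integer, K>> of(Comparator<K> keyCmp) {
        return (a, b) -> {
            if (!a.getKey().equals(b.getKey())) {   // a._1 != b._1
                return Integer.compare(a.getKey(), b.getKey());
            }
            return keyCmp.compare(a.getValue(), b.getValue());
        };
    }
}
```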

[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...

2016-11-29 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15861 @jiangxb1987 I did a single-pass review - particularly given the similarities in both the codepaths and the class names, I will need to go over it again to ensure we don't miss anything

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r87708046 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90119536 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90124359 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90121670 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90129259 --- Diff: core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala --- @@ -561,7 +561,7 @@ class PairRDDFunctionsSuite extends SparkFunSuite

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r88077635 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90122251 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90121527 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90120144 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90127987 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala --- @@ -0,0 +1,408 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r88075283 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapRedCommitProtocol.scala --- @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90129155 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -1089,66 +1064,10 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark pull request #15861: [SPARK-18294][CORE] Implement commit protocol to ...

2016-11-29 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15861#discussion_r90116879 --- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala --- @@ -1016,11 +1013,6 @@ class PairRDDFunctions[K, V](self: RDD[(K, V

[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-29 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16038 Without understanding the specifics of the ML part here - won't the actual impact of a large dense vector on Task 'bytes' be minimal at best? We do compress the task binary; and 1B zeros should
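
The compression claim is easy to check with the JDK's Deflater: a large all-zero buffer (standing in for a dense vector of zeros inside a task binary) deflates to a tiny fraction of its size. 1 MiB is used below instead of 1B elements simply to keep the demo fast:

```java
import java.util.zip.Deflater;

// Demonstrates that an all-zero buffer, like a large dense vector of
// zeros embedded in a serialized task, compresses to almost nothing.
public class ZeroCompression {
    public static int compressedSize(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        byte[] out = new byte[input.length];
        int total = 0;
        while (!d.finished()) {
            total += d.deflate(out);   // accumulate compressed bytes
        }
        d.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] zeros = new byte[1 << 20];        // 1 MiB of zeros
        int size = compressedSize(zeros);
        System.out.println(size < 4096);         // prints "true": under 4 KiB
    }
}
```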

[GitHub] spark issue #15861: [SPARK-18294][CORE] Implement commit protocol to support...

2016-11-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15861 @rxin I did see this PR, unfortunately it is a bit big and I am tied up with other things - can't get to it for the next few days.

[GitHub] spark issue #15563: [SPARK-16759][CORE] Add a configuration property to pass...

2016-11-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15563 Merging into master.

[GitHub] spark issue #15563: [SPARK-16759][CORE] Add a configuration property to pass...

2016-11-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15563 Looks good to me.

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-11-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/14789 Interesting, thanks for clarifying. The way I was looking at the change was: given we have the ability for custom credential providers now, we would need to support out-of-band

[GitHub] spark issue #15563: [SPARK-16759][CORE] Add a configuration property to pass...

2016-11-09 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15563 Jenkins, test this please

[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15823 LGTM. Not at my laptop, would be great if you can merge @rxin, thanks.

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/14789 @vanzin You are right, this does look orthogonal to what you and Tom described. 'Where' the renewal is happening is the difference - from within spark or from outside of spark (oozie/from

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14789#discussion_r87116199 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -189,6 +190,39 @@ private[spark] abstract class

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/14789 @tgravescs Interesting, so it means that there won't be a --principal/--keytab, and what is being done in AMCredentialRenewer will be done in the launcher itself? So the driver behaves just like

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14789#discussion_r87072908 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -64,7 +64,7 @@ class

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14789#discussion_r87075067 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -138,6 +138,14 @@ class SparkHadoopUtil extends Logging

[GitHub] spark pull request #14789: [SPARK-17209][YARN] Add the ability to manually u...

2016-11-08 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14789#discussion_r87078341 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala --- @@ -189,6 +190,39 @@ private[spark] abstract class

[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/14789 Hi @tgravescs, This is specifically for the case where new input source/output destinations are getting added after the driver has already started. Notebooks, long running streaming

[GitHub] spark pull request #15563: [SPARK-16759][CORE] Add a configuration property ...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15563#discussion_r86874659 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2569,7 +2572,8 @@ private[spark] class CallerContext

[GitHub] spark pull request #15563: [SPARK-16759][CORE] Add a configuration property ...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15563#discussion_r86874432 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2591,6 +2595,16 @@ private[spark] class CallerContext

[GitHub] spark pull request #15563: [SPARK-16759][CORE] Add a configuration property ...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15563#discussion_r86874386 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2582,7 +2586,7 @@ private[spark] class CallerContext( val callerContext

[GitHub] spark pull request #15563: [SPARK-16759][CORE] Add a configuration property ...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15563#discussion_r86873923 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2523,6 +2524,8 @@ private[spark] class CallerContext( taskId: Option[Long

[GitHub] spark issue #15618: [SPARK-14914][CORE] Fix Resource not closed after using,...

2016-11-07 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15618 LGTM, merging it into master. Thx @HyukjinKwon

[GitHub] spark pull request #15769: [SPARK-18191][CORE] Port RDD API to use commit pr...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15769#discussion_r86855791 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala --- @@ -0,0 +1,243 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15769: [SPARK-18191][CORE] Port RDD API to use commit pr...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15769#discussion_r86859239 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala --- @@ -0,0 +1,243 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15769: [SPARK-18191][CORE] Port RDD API to use commit pr...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15769#discussion_r86858760 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala --- @@ -0,0 +1,243 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #15769: [SPARK-18191][CORE] Port RDD API to use commit pr...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15769#discussion_r86856937 --- Diff: core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala --- @@ -0,0 +1,243 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #15743: [SPARK-18236] Reduce duplicate objects in Spark UI and H...

2016-11-07 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15743 LGTM

[GitHub] spark pull request #15743: [SPARK-18236] Reduce duplicate objects in Spark U...

2016-11-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15743#discussion_r86841779 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala --- @@ -197,8 +196,32 @@ private[spark] object UIData { shuffleWriteMetrics

[GitHub] spark pull request #15752: [SPARK-18250] [SQL] Minor fixes to UTF8String

2016-11-03 Thread mridulm
Github user mridulm closed the pull request at: https://github.com/apache/spark/pull/15752

[GitHub] spark pull request #15752: [SPARK-18250] [SQL] Minor fixes to UTF8String

2016-11-03 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/15752 [SPARK-18250] [SQL] Minor fixes to UTF8String ## What changes were proposed in this pull request? A few minor changes to utf8 string which were observed in the code. ## How

[GitHub] spark issue #15743: [SPARK-18236] Reduce duplicate objects in Spark UI and H...

2016-11-03 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15743 Looks good to me, some minor queries and comments.

[GitHub] spark pull request #15743: [SPARK-18236] Reduce duplicate objects in Spark U...

2016-11-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15743#discussion_r86312868 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala --- @@ -197,8 +196,32 @@ private[spark] object UIData { shuffleWriteMetrics

[GitHub] spark pull request #15743: [SPARK-18236] Reduce duplicate objects in Spark U...

2016-11-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15743#discussion_r86312686 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala --- @@ -197,8 +196,32 @@ private[spark] object UIData { shuffleWriteMetrics

[GitHub] spark pull request #15743: [SPARK-18236] Reduce duplicate objects in Spark U...

2016-11-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15743#discussion_r86312418 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1089,7 +1089,7 @@ class DAGScheduler( // To avoid UI cruft

[GitHub] spark pull request #15743: [SPARK-18236] Reduce duplicate objects in Spark U...

2016-11-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15743#discussion_r86311221 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -885,8 +885,8 @@ private[spark] object JsonProtocol { if (json

[GitHub] spark pull request #15743: [SPARK-18236] Reduce duplicate objects in Spark U...

2016-11-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15743#discussion_r86310825 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -885,8 +885,8 @@ private[spark] object JsonProtocol { if (json

[GitHub] spark issue #15743: [SPARK-18236] Reduce duplicate objects in Spark UI and H...

2016-11-03 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15743 +1 on UseStringDeduplication! Though it is JDK 8 specific.

[GitHub] spark pull request #15741: [SPARK-18200][GRAPHX] Support zero as an initial ...

2016-11-03 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15741#discussion_r86296072 --- Diff: core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala --- @@ -271,8 +271,12 @@ class OpenHashSet[@specialized(Long, Int) T

[GitHub] spark issue #15731: [SPARK-18219] Move commit protocol API from sql/core to ...

2016-11-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15731 Thx for clarifying @rxin, looks good to me.

[GitHub] spark issue #15731: [SPARK-18219] Move commit protocol API from sql/core to ...

2016-11-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15731 I am not sure I follow - why do we need to redefine OutputCommitter from mapreduce with a slightly differing API?

[GitHub] spark issue #15713: [SPARK-18196] [CORE] Optimise CompactBuffer implementati...

2016-11-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15713 @andrewcraik If we are simply delegating to ArrayBuffer, it makes sense to remove the class - it is an additional object created for each key being aggregated. An alternative would be to implement

[GitHub] spark pull request #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuff...

2016-11-02 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15736#discussion_r86173509 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedAppendOnlyMap.scala --- @@ -30,7 +30,21 @@ private[spark] class

[GitHub] spark issue #15722: [SPARK-18208] [Shuffle] Executor OOM due to a growing Lo...

2016-11-01 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15722 I agree with @jiexiong's analysis: @davies, to support the scenario you mentioned, we should not be freeing the pages.

[GitHub] spark pull request #15722: [SPARK-18208] [Shuffle] Executor OOM due to a mem...

2016-11-01 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15722#discussion_r86046017 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -903,11 +906,12 @@ public void reset() { numKeys = 0

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-11-01 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Closing the PR given the pushback to committing it.

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-11-01 Thread mridulm
Github user mridulm closed the pull request at: https://github.com/apache/spark/pull/15553

[GitHub] spark issue #15713: [SPARK-18196] [CORE] Optimise CompactBuffer implementati...

2016-11-01 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15713 +CC @rxin who can elaborate better. The reason, IIRC, why the first two variables are inlined has to do with the usually low cardinality of the buffer - it effectively comes for "free" due
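
The layout being recalled can be sketched roughly as follows - a simplified stand-in, not Spark's actual CompactBuffer - where the first two elements sit in plain fields and only the third element onward forces an array allocation:

```java
import java.util.ArrayList;

// Simplified sketch of a compact buffer: the first two elements live in
// plain fields (no array allocation for the common 1-2 element case),
// and only the third element onward spills into a growable array.
public class CompactBufferSketch<T> {
    private T e0, e1;              // inlined first two elements
    private ArrayList<T> rest;     // allocated lazily, only if size > 2
    private int size = 0;

    public void append(T value) {
        if (size == 0) e0 = value;
        else if (size == 1) e1 = value;
        else {
            if (rest == null) rest = new ArrayList<>();
            rest.add(value);
        }
        size++;
    }

    public T get(int i) {
        if (i == 0) return e0;
        if (i == 1) return e1;
        return rest.get(i - 2);
    }

    public int size() { return size; }
}
```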

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85859191 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -55,6 +65,11 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85860115 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85872283 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -307,6 +387,7 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85859768 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,67 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85859164 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -46,6 +50,12 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85859942 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory

[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/13599#discussion_r85859906 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala --- @@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory

[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] SparkContext.addFile doesn't w...

2016-10-31 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15669 I agree with Tom - for yarn cluster mode we should rely on the distributed cache.

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-28 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Thanks for the comments. Specifically for our usecase, this is for automated full builds, where there is a container teardown after each (so that builds don't pollute each other: new

[GitHub] spark pull request #15618: [SPARK-14914][CORE] Fix Resource not closed after...

2016-10-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15618#discussion_r85436193 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -194,10 +194,13 @@ class ReceiverTracker(ssc

[GitHub] spark pull request #15618: [SPARK-14914][CORE] Fix Resource not closed after...

2016-10-28 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15618#discussion_r85435956 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala --- @@ -209,7 +209,8 @@ class TaskResultGetterSuite extends

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-27 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Since I don't have much experience with the build system, I will defer to your opinion about it. Having said that, I want to know what can be done to add support for avoiding test compilation

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-27 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 I will leave it open for a couple of days more to solicit suggestions /comments : if there is a way to reduce build complexity of the change please do let me know : would be great

[GitHub] spark issue #15632: [SPARK-18105] fix buffer overflow in LZ4

2016-10-27 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15632 @davies that check looks wrong - behaviorally we might not be hitting it due to other preconditions being enforced; worrying to see this.

[GitHub] spark issue #15612: [SPARK-18078] Add zipPartitionsWithPreferredLocation fun...

2016-10-26 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15612 @rxin I am not sure I like this better than the previous version. We have traded an RDD reference for a boolean, and enforced a constraint that the 'major' RDD must be the first to get zipped against

[GitHub] spark pull request #15612: [SPARK-18078] Add zipPartitionsWithPreferredLocat...

2016-10-26 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15612#discussion_r85227464 --- Diff: core/src/main/scala/org/apache/spark/rdd/ZippedPartitionsRDD.scala --- @@ -45,7 +45,8 @@ private[spark] class ZippedPartitionsPartition

[GitHub] spark issue #15618: [SPARK-14914][CORE] Fix Resource not closed after using,...

2016-10-25 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15618 @HyukjinKwon So the idea is that you acquire the resources required and don't need to track them, by wrapping them in Utils.tryWithResource (similar to memory management in the JVM). As an example

[GitHub] spark issue #15632: [SPARK-18105] fix buffer overflow in LZ4

2016-10-25 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15632 @srowen The last update to the upstream library was 1.5+ years ago. Is it still maintained? You are right, pushing it there and getting it released might be good - if possible

[GitHub] spark issue #15632: [SPARK-18105] fix buffer overflow in LZ4

2016-10-25 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15632 Ah, nice catch! LGTM, can you merge it when the build is done? Thx.

[GitHub] spark issue #15618: [SPARK-14914][CORE] Fix Resource not closed after using,...

2016-10-25 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15618 There are a bunch of methods in Utils which actually nicely apply to this PR:
* Utils.tryWithSafeFinally
* Utils.tryWithResource, etc.
They also handle exception suppression, etc.
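
As a rough sketch of what such a "safe finally" helper provides (modeled loosely on the behavior of Utils.tryWithSafeFinally; the Java signature below is invented for illustration): if both the body and the cleanup throw, the body's exception propagates and the cleanup's is attached as suppressed instead of masking it.

```java
import java.util.concurrent.Callable;

// Illustrative "safe finally": the primary exception from the body wins;
// an exception from the cleanup block is recorded as suppressed rather
// than replacing it (which a plain try/finally would do).
public class SafeFinally {
    public static <T> T tryWithSafeFinally(Callable<T> body, Runnable cleanup)
            throws Exception {
        Throwable primary = null;
        try {
            return body.call();
        } catch (Throwable t) {
            primary = t;          // remember the body's exception
            throw t;
        } finally {
            try {
                cleanup.run();
            } catch (RuntimeException | Error t) {
                if (primary != null) primary.addSuppressed(t); // don't mask body
                else throw t;     // cleanup failure is the only failure
            }
        }
    }
}
```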

[GitHub] spark pull request #15612: [SPARK-18078] Add option for customize zipPartiti...

2016-10-24 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15612#discussion_r84742854 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -848,39 +849,44 @@ abstract class RDD[T: ClassTag]( * of elements in each

[GitHub] spark pull request #15612: [SPARK-18078] Add option for customize zipPartiti...

2016-10-24 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15612#discussion_r84743321 --- Diff: core/src/main/scala/org/apache/spark/rdd/ZippedPartitionsRDD.scala --- @@ -58,10 +63,14 @@ private[spark] abstract class ZippedPartitionsBaseRDD

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-23 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Jenkins, test this please

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-23 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 @srowen This explicitly supports maven.test.skip=true - the rationale for using it can be project specific (generating clean builds for various profile combinations, internal release builds from

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-22 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 @srowen -Dmaven.test.skip=true is the documented way in Maven to skip test compilation (http://maven.apache.org/surefire/maven-surefire-plugin/examples/skipping-test.html) which we

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84533967 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark issue #15553: [SPARK-18008] [build] Add support for -Dmaven.test.skip=...

2016-10-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15553 Changes done, please do let me know of any other comments, thanks!

[GitHub] spark issue #15579: Added support for extra command in front of spark.

2016-10-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15579 Btw, curious if you have actually tested this on YARN - I have a feeling it won't work.

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-20 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84374076 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-20 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84225583 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,226 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 Ah! Apologies, I got confused. Yes, I agree, that is a better approach. It also means we can get rid of the RemoveExecutor pattern match from receive, right? As it stands now, that looks

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing I think the issue is that case RemoveExecutor() is not identical to what exists in receiveAndReply. Any reason 'executorDataMap.get(executorId).foreach(_.executorEndpoint.send

[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15481 @zsxwing To minimize the scope of the synchronized block. The way @scwf has it now, the synchronized block is limited to duplicating the key and setting some state. The rest can happen outside of the lock
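The locking discipline suggested here can be sketched as follows (ExecutorRegistry and its fields are illustrative stand-ins, not the actual CoarseGrainedSchedulerBackend code): hold the lock only while copying out the entry and mutating shared state, and run slower or re-entrant work, such as sending RPC messages, outside the lock.

```scala
import scala.collection.mutable

// Toy registry demonstrating a minimized synchronized block.
object ExecutorRegistry {
  private val lock = new Object
  private val executors = mutable.Map("exec-1" -> "hostA")
  val notifications = mutable.Buffer.empty[String]

  def removeExecutor(id: String): Unit = {
    // Inside the lock: only remove the entry and capture what we need.
    val removed = lock.synchronized { executors.remove(id) }
    // Outside the lock: anything slow or re-entrant (e.g. an endpoint send),
    // so a callback that re-acquires another lock cannot deadlock us.
    removed.foreach(host => notifications += s"removed $id on $host")
  }
}
```

Removing the same executor twice is harmless: the second call finds no entry and performs no post-lock work.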

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84160226 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15541: [SPARK-17637][Scheduler]Packed scheduling for Spa...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15541#discussion_r84154979 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala --- @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84081078 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark

[GitHub] spark pull request #15553: [SPARK-18008] [build] Add support for -Dmaven.tes...

2016-10-19 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/15553#discussion_r84057228 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -41,11 +41,8 @@ ${project.version} - org.apache.spark - spark
