[GitHub] spark pull request #14638: [SPARK-11374][SQL] Support `skip.header.line.coun...

2016-12-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/14638#discussion_r91331724 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -113,6 +113,9 @@ class HadoopTableReader( val tablePath

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914921 --- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala --- @@ -209,6 +209,41 @@ class JobCancellationSuite extends SparkFunSuite

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914715 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -161,12 +163,7 @@ private[spark] class Executor( * @param

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914906 --- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala --- @@ -209,6 +209,83 @@ class JobCancellationSuite extends SparkFunSuite

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914792 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -229,9 +259,12 @@ private[spark] class Executor

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914957 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,6 +465,93 @@ private[spark] class Executor

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914652 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -148,9 +161,23 @@ private[spark] class Executor( } def

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914887 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,6 +465,93 @@ private[spark] class Executor

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914831 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,6 +465,93 @@ private[spark] class Executor

[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r92914965 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -148,9 +161,23 @@ private[spark] class Executor( } def

[GitHub] spark pull request #16261: [SPARK-18836] [CORE] Serialize one copy of task m...

2016-12-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16261#discussion_r92914999 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1009,13 +1009,14 @@ class DAGScheduler( } val

[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...

2017-01-14 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16576#discussion_r96130960 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1539,6 +1539,9 @@ abstract class RDD[T: ClassTag]( // NOTE: we use a global

[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...

2017-01-14 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16576#discussion_r96131251 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1726,6 +1729,10 @@ abstract class RDD[T: ClassTag]( // checkpoint

[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...

2017-01-14 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16576#discussion_r96131096 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1539,6 +1539,9 @@ abstract class RDD[T: ClassTag]( // NOTE: we use a global

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107773629 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107797382 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -215,7 +215,7 @@ private[spark] class PythonRunner

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107797887 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +479,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-23 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107631487 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107232894 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -140,16 +140,22 @@ private[spark] class TaskContextImpl

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107233146 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -302,12 +298,12 @@ private[spark] class Executor

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107239694 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107235395 --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala --- @@ -569,8 +575,10 @@ class SparkContextSuite extends SparkFunSuite

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107237325 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -215,7 +215,8 @@ private[spark] class PythonRunner

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107235703 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,8 +212,8 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107234055 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -59,8 +59,8 @@ private[spark] class TaskContextImpl( /** List of callback

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-21 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107234445 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -160,15 +160,20 @@ private[spark] abstract class Task[T]( // A flag

[GitHub] spark issue #17290: [SPARK-16599][CORE] java.util.NoSuchElementException: No...

2017-03-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17290 I agree with @srowen I dont see how this change affects the test. `blocksWithReleasedLocks` should be unchanged w.r.t this test. --- If your project is set up for it, you can reply to this email

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-24 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17343 LGTM will wait a bit to allow for others to comment. @zsxwing can you also take a look ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-27 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108103956 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -169,6 +173,36 @@ public void write(Iterator<Produ

[GitHub] spark issue #17276: [SPARK-19937] Collect metrics of block sizes when shuffl...

2017-03-27 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17276 @jinxing64 If the intent behind these metrics is to help with SPARK-19659, it would be good to either add it as part of SPARK-19659 or subsequently (once the feature is merged

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-26 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052747 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -169,6 +173,36 @@ public void write(Iterator<Produ

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-26 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052753 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java --- @@ -169,6 +173,36 @@ public void write(Iterator<Produ

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-26 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052849 --- Diff: core/src/main/scala/org/apache/spark/executor/ShuffleReadMetrics.scala --- @@ -80,13 +86,15 @@ class ShuffleReadMetrics private[spark] () extends

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-26 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052758 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java --- @@ -228,6 +230,36 @@ void closeAndWriteOutput() throws IOException

[GitHub] spark pull request #17276: [SPARK-19937] Collect metrics of block sizes when...

2017-03-26 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17276#discussion_r108052804 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -72,6 +72,27 @@ private[spark] class SortShuffleWriter[K, V, C

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17343 If we make flush() noop, then buffered (uncommitted) data wont be written to the stream; am I missing something here, or is this change broken ? --- If your project is set up for it, you can reply

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17343 Ah, looks like I missed that CountingOutputStream was introduced after BOS and not before. Looks good to me. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark issue #17343: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-03-19 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17343 Background - you need to do a flush() to ensure the indices generated are valid. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark issue #16867: [SPARK-16929] Improve performance when check speculatabl...

2017-03-18 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16867 LGTM @kayousterhout , @squito. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106788877 --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala --- @@ -212,8 +212,8 @@ case object TaskResultLost extends TaskFailedReason

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106789380 --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala --- @@ -540,6 +540,39 @@ class SparkContextSuite extends SparkFunSuite

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106788727 --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala --- @@ -354,7 +354,7 @@ private[spark] object UIUtils extends Logging

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106789297 --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala --- @@ -540,6 +540,39 @@ class SparkContextSuite extends SparkFunSuite

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r106789004 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17290: [SPARK-16599][CORE] java.util.NoSuchElementExcept...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17290#discussion_r106788142 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala --- @@ -340,7 +340,7 @@ private[storage] class BlockInfoManager extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106268712 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -63,12 +83,40 @@ private[spark] object CryptoStreamUtils extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106063310 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -48,12 +50,30 @@ private[spark] object CryptoStreamUtils extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779317 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778005 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -56,6 +57,43 @@ private[spark] class BlockResult( val bytes: Long

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778650 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -79,6 +81,11 @@ private[spark] class DiskBlockManager(conf: SparkConf

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778688 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -17,48 +17,61 @@ package org.apache.spark.storage -import

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778760 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106264689 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -63,12 +83,40 @@ private[spark] object CryptoStreamUtils extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778914 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -17,48 +17,61 @@ package org.apache.spark.storage -import

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778813 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779213 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -17,48 +17,61 @@ package org.apache.spark.storage -import

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779546 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779004 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -34,6 +34,8 @@ import org.apache.spark.util.{ShutdownHookManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106269093 --- Diff: core/src/main/scala/org/apache/spark/security/CryptoStreamUtils.scala --- @@ -102,4 +150,34 @@ private[spark] object CryptoStreamUtils extends

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r10677 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -94,7 +101,11 @@ private[spark] class DiskBlockManager(conf: SparkConf

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106778932 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark pull request #17295: [SPARK-19556][core] Do not encrypt block manager ...

2017-03-18 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17295#discussion_r106779457 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskStore.scala --- @@ -73,55 +86,219 @@ private[spark] class DiskStore(conf: SparkConf, diskManager

[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-21 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17166 Hi @kayousterhout, Can you take over reviewing this PR ? I might be tied up with other things for next couple of weeks, and I dont want @ericl's work to be blocked on me. Thx

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r107345386 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...

2017-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17300#discussion_r107352668 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -555,12 +555,15 @@ private[spark] class BlockManager

[GitHub] spark pull request #17300: [SPARK-19956][Core]Optimize a location order of b...

2017-03-22 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17300#discussion_r107352378 --- Diff: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala --- @@ -500,6 +500,30 @@ class BlockManagerSuite extends SparkFunSuite

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17295 > First, keep in mind that there's no metadata that tells the receiver whether a block is encrypted or not. This means that methods like BlockManager.get, which can read block data from eit

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17295 Just to be clear, I would prefer if we consistently did things - either encrypt all blocks while transferring (irrespective of sasl being enabled or not); or depend only on sasl for channel

[GitHub] spark pull request #17109: [SPARK-19740][MESOS]Add support in Spark to pass ...

2017-03-14 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17109#discussion_r106052638 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -99,6 +99,26 @@ private

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17295 I have not looked at the implementation in detail, but can you comment on why the change w.r.t plain text block data to remote executor ? Isn't it not simpler to transmit block contents

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-03-14 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17295 > Not really sure what you mean here. But transferring encrypted data without RPC encryption is not really secure, since the encryption key is transferred to executors using an RPC. There's e

[GitHub] spark pull request #17531: [SPARK-20217][core] Executor should not fail stag...

2017-04-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17531#discussion_r109865317 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,7 +432,7 @@ private[spark] class Executor

[GitHub] spark issue #17533: [WIP][SPARK-20219] Schedule tasks based on size of input...

2017-04-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17533 Tasks are scheduled by locality (which includes shuffle tasks too to some extent). This is making a lot of state mutable within TSM - is there any tests done which show improvements due

[GitHub] spark issue #17596: [SPARK-12837][SQL] reduce the serialized size of accumul...

2017-04-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17596 The approach I took for this was slightly different. * Create a bitmask indicating which accumulators are required in TaskMetrics - that is, have non zero values, and emit this first

[GitHub] spark issue #17610: [SPARK-20131][Core]Use a separate lock for StandaloneSch...

2017-04-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17610 Isn't it not depending on 'this' being locked in super class method ? If it is not, then why not simply restrict lock to changing of `stopping` flag (if already set, return, else set and proceed

[GitHub] spark issue #17610: [SPARK-20131][Core]Use a separate lock for StandaloneSch...

2017-04-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17610 bq. I don't get it. But I think the stack trace shows why this dead-lock happens. Based on your description/stacktrace, I get why the deadlock happens - what I meant was, do any of the super

[GitHub] spark issue #17024: [SPARK-19525][CORE] Compressing checkpoints.

2017-04-12 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17024 @aramesh117 Unfortunately, since this heavily affects streaming, I cannot sign off on it without someone more familiar with spark streaming reviews it as well. --- If your project is set up

[GitHub] spark issue #17596: [SPARK-12837][CORE] reduce the serialized size of accumu...

2017-04-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17596 My comments were based on a fix in 1.6; actually lot of values were actually observed to be 0 for a lot of cases - just a few were not (even here it is relevant - resultSize, gctime, various bytes

[GitHub] spark issue #17088: [SPARK-19753][CORE] Un-register all shuffle output on a ...

2017-03-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17088 +CC @tgravescs You might be interested in this given your comments on on the blacklisting PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #16639: [SPARK-19276][CORE] Fetch Failure handling robust to use...

2017-03-02 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16639 @squito It looks good to me, thanks for the changes ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #17220: remove tungsten-sort.Because it is not represent 'org.ap...

2017-03-08 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17220 This was an exposed parameter, we cannot remove it - irrespective of the duplication. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r104769358 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r104772015 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala --- @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r104773405 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -732,6 +732,13 @@ class DAGScheduler

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-07 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17166#discussion_r104770757 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -168,7 +168,8 @@ private[spark] class Executor( case Some

[GitHub] spark issue #17244: [SPARK-19889][SQL] Make TaskContext callbacks thread saf...

2017-03-12 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17244 LGTM. Would be great if other reviewers can also take a look. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105489927 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105489232 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105489812 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105490080 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether

[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17238 This is something to be configured appropriately at the lower level (namely topology config), and should be handled there Special casing this in spark is not right. Two scenarios

[GitHub] spark pull request #17244: [SPARK-19889][SQL] Make TaskContext callbacks thr...

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17244#discussion_r105510888 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -57,57 +68,75 @@ private[spark] class TaskContextImpl( // Whether

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319503 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1028,7 +1028,7 @@ class DAGScheduler( val locs

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526180 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1006,7 +1006,7 @@ class DAGScheduler( runningStages

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526086 --- Diff: core/src/main/scala/org/apache/spark/scheduler/KerberosUser.scala --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16788 @tgravescs , @vanzin - this PR for mesos changes how spark handles kerberos tokens fundamentally; would be good to have your views. +CC @jerryshao to also look at the PR, since you have

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319198 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -151,9 +152,13 @@ object SparkSubmit extends CommandLineUtils { val

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105526225 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala --- @@ -60,9 +60,10 @@ private[spark] class ResultTask[T, U

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105319759 --- Diff: core/src/main/scala/org/apache/spark/scheduler/KerberosUtil.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #16788: [SPARK-16742] Kerberos impersonation support

2017-03-10 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/16788#discussion_r105318927 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -257,10 +257,6 @@ private[deploy] class SparkSubmitArguments(args

<    3   4   5   6   7   8   9   10   11   12   >