[GitHub] spark pull request: SPARK-5500. Document that feeding hadoopFile i...

2015-02-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4293#issuecomment-72509826 Updated patch adds instructions on how to avoid the exception and extends behavior to `NewHadoopRDD`. My opinion is still that this deserves an Exception rather

[GitHub] spark pull request: [Docs] Fix Building Spark link text

2015-02-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4312#issuecomment-72515475 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: SPARK-4687. [WIP] Add an addDirectory API

2015-02-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3670#issuecomment-72500076 This keeps failing random different streaming tests

[GitHub] spark pull request: SPARK-4687. [WIP] Add an addDirectory API

2015-02-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3670#issuecomment-72500090 retest this please

[GitHub] spark pull request: [SPARK-5530] Add executor container to executo...

2015-02-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4309#issuecomment-72500750 LGTM. It looks like I missed this in the shuffle of SPARK-1714.

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-02-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23942054 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -47,9 +49,13 @@ private[spark] class CacheManager(blockManager: BlockManager) extends

[GitHub] spark pull request: SPARK-5199. FS read metrics should support Com...

2015-02-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4050#issuecomment-72503592 @rxin this should be fixed by SPARK-5492

[GitHub] spark pull request: SPARK-5500. Document that feeding hadoopFile i...

2015-02-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4293#issuecomment-72520547 retest this please

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23899671 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -134,12 +136,29 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23899955 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala --- @@ -98,6 +121,9 @@ class YarnClusterSuite extends FunSuite

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72377952 Thanks for adding the test and getting it to work @lianhuiwang. Had a few more comments, but this is looking close to me.

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23899881 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +292,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23899688 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +292,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-02-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23899908 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +292,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: SPARK-5492. Thread statistics can break with o...

2015-02-01 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4305 SPARK-5492. Thread statistics can break with older Hadoop versions You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-5492

[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...

2015-02-01 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4258#issuecomment-72405667 LGTM

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23858269 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +277,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3976#issuecomment-72240509 This looks like the right approach. Added some comments inline. Are you able to add a test for this in `YarnClusterSuite`? Also, one last small thing

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23858127 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +277,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: remove redundant field childOutput from exec...

2015-01-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4291#issuecomment-72240710 Hi Kai, mind tagging this [SQL] so it can get properly sorted?

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23857948 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +277,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: [SPARK-5470][Core]use defaultClassLoader to lo...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4258#discussion_r23855340 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -97,7 +87,9 @@ class KryoSerializer(conf: SparkConf) // Use

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23856550 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -165,6 +168,13 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23856503 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -138,8 +140,9 @@ object SparkSubmit { (clusterManager, deployMode) match

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23857534 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -103,11 +104,15 @@ private[spark] class ClientArguments(args: Array

[GitHub] spark pull request: SPARK-4687. [WIP] Add an addDirectory API

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3670#discussion_r23883301 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -513,13 +516,44 @@ private[spark] object Utils extends Logging { Files.move

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23884482 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -241,21 +242,22 @@ object DataWriteMethod extends Enumeration

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23884519 --- Diff: core/src/main/scala/org/apache/spark/ui/ToolTips.scala --- @@ -29,14 +29,15 @@ private[spark] object ToolTips { val SHUFFLE_READ_BLOCKED_TIME

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23856753 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +277,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23857834 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -148,7 +151,7 @@ object SparkSubmit { } // If we're

[GitHub] spark pull request: SPARK-5500. Document that feeding hadoopFile i...

2015-01-30 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4293 SPARK-5500. Document that feeding hadoopFile into a shuffle operation will cause problems You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23857876 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +277,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23857919 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -267,10 +277,22 @@ object SparkSubmit { // In yarn-cluster mode, use

[GitHub] spark pull request: SPARK-5500. Document that feeding hadoopFile i...

2015-01-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4293#issuecomment-72301039 @rxin @JoshRosen I like both of those ideas. Updated patch implements Josh's. Reynold's is a little more involved, but would be good to implement down the line as well

[GitHub] spark pull request: SPARK-5500. Document that feeding hadoopFile i...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4293#discussion_r23885959 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -308,6 +309,14 @@ class HadoopRDD[K, V]( // Do nothing. Hadoop RDD should

[GitHub] spark pull request: [SPARK-5173]support python application running...

2015-01-30 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3976#discussion_r23856668 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -103,11 +104,15 @@ private[spark] class ClientArguments(args: Array

[GitHub] spark pull request: [SPARK-5388] Provide a stable application subm...

2015-01-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4216#issuecomment-72289589 @pwendell @mateiz I realize I'm chiming in late here, but I think the primary concern is that the protocol doesn't satisfy the principle of least astonishment with respect

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23801023 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -350,9 +351,20 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23805531 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -760,7 +756,13 @@ object Client extends Logging

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23804603 --- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala --- @@ -154,20 +158,69 @@ private[spark] object

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23805723 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -451,8 +452,20 @@ private[spark] class ApplicationMaster

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23803402 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorURLClassLoader.scala --- @@ -32,36 +36,52 @@ private[spark] trait MutableURLClassLoader extends

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23805787 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorURLClassLoader.scala --- @@ -32,36 +36,52 @@ private[spark] trait MutableURLClassLoader extends

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23804949 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorURLClassLoader.scala --- @@ -32,36 +36,52 @@ private[spark] trait MutableURLClassLoader extends

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23805094 --- Diff: docs/configuration.md --- @@ -285,13 +285,13 @@ Apart from these, the following properties are also available, and may be useful /td /tr

[GitHub] spark pull request: SPARK-5458. Refer to aggregateByKey instead of...

2015-01-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4251#issuecomment-71877377 Exactly

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3233#issuecomment-71943787 The new spark.driver.userClassPathFirst property seems a little strange to me in that, IIUC, it only takes effect when the driver is started through the application master

[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-28 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3233#discussion_r23738607 --- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala --- @@ -375,4 +390,64 @@ private[spark] object SparkConf { def isSparkPortConf(name

[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...

2015-01-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4228#issuecomment-71957158 @mengxr makes sense

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-28 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23699876 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -63,7 +63,7 @@ class DAGScheduler( mapOutputTracker

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-28 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23699799 --- Diff: core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala --- @@ -106,18 +107,30 @@ class SparkHadoopWriter(@transient jobConf: JobConf

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-28 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r2369 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -908,6 +912,11 @@ class DAGScheduler( val task = event.task

[GitHub] spark pull request: [SPARK-4879] Use the Spark driver to authorize...

2015-01-28 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4155#discussion_r23700080 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -0,0 +1,258 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-5458. Refer to aggregateByKey instead of...

2015-01-28 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4251 SPARK-5458. Refer to aggregateByKey instead of combineByKey in docs You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-5458

[GitHub] spark pull request: SPARK-5458. Refer to aggregateByKey instead of...

2015-01-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4251#issuecomment-71897120 Good point, updated the patch

[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...

2015-01-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4228#issuecomment-71765593 @rxin I mean making the `reduce` action able to do a tree reduce underneath. So all reduces are tree reduces, but the default number of levels is 1.
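
The idea in this comment (every reduce is a tree reduce whose default depth is 1) can be sketched in plain Python. This is only an illustration, not Spark's implementation: `tree_reduce`, its `num_levels` parameter, and the fan-in rule are hypothetical names and choices made for this example, and partitions are modeled as ordinary lists.

```python
from functools import reduce
from math import ceil


def tree_reduce(partitions, op, num_levels=1):
    """Reduce a list of partitions (plain lists) with an associative `op`.

    Hypothetical sketch: with num_levels=1 the per-partition results are
    combined in a single flat step; with more levels they are first merged
    in groups over intermediate rounds.
    """
    if num_levels < 1:
        raise ValueError("num_levels must be at least 1")

    # Step 1: reduce each non-empty partition locally.
    partials = [reduce(op, part) for part in partitions if part]
    if not partials:
        raise ValueError("cannot reduce an empty collection")

    # Step 2: choose a group size so that roughly num_levels rounds suffice.
    # The floating-point root is approximate; it only affects grouping,
    # never the result.
    fan_in = max(2, ceil(len(partials) ** (1.0 / num_levels)))

    # Step 3: intermediate rounds, skipped entirely when num_levels == 1.
    while num_levels > 1 and len(partials) > fan_in:
        partials = [
            reduce(op, partials[i:i + fan_in])
            for i in range(0, len(partials), fan_in)
        ]

    # Step 4: final combine of the remaining partial results.
    return reduce(op, partials)
```

Because groups are merged in their original order, the sketch only needs `op` to be associative; for example, `tree_reduce([[1, 2], [3], [4, 5, 6]], lambda a, b: a + b, num_levels=2)` gives the same sum as the flat call with the default depth.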

[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...

2015-01-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4228#issuecomment-71768075 My thinking was just to simplify the API. I.e. if numLevels is 1, we could branch to the old implementation. I don't have a strong opinion either way, but was thinking

[GitHub] spark pull request: [SPARK-5416] init Executor.threadPool before E...

2015-01-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4212#issuecomment-71740268 This looks reasonable. Was initially worried that changing the order might mess with the effect of classloader instantiation on the thread pool but on deeper inspection

[GitHub] spark pull request: [SPARK-5430] move treeReduce and treeAggregate...

2015-01-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4228#issuecomment-71756156 I think this would be a great API to add. Have you weighed adding a numLevels argument to `reduce` itself instead of a new method?

[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4051#issuecomment-71510560 What would be the advantage of limiting executor requests to the number of NMs in the cluster?

[GitHub] spark pull request: Spark 3789

2015-01-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4205#issuecomment-71496062 Hi @kdatta, mind giving this a descriptive title and the [GRAPHX] tag so it can get sorted properly?

[GitHub] spark pull request: Spark 3789

2015-01-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4205#issuecomment-71499617 Exactly

[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4050#issuecomment-71537583 If we use an InputFormat whose splits are not instances of org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}, then we can't get input metrics information

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4192#discussion_r23563321 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -60,6 +62,9 @@ private[yarn] class YarnAllocator( import

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4192#discussion_r23564448 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -60,6 +62,9 @@ private[yarn] class YarnAllocator( import

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4168#discussion_r23566205 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -199,14 +199,31 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4192#discussion_r23591435 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -60,6 +62,9 @@ private[yarn] class YarnAllocator( import

[GitHub] spark pull request: SPARK-5199. FS read metrics should support Com...

2015-01-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4050#issuecomment-71563355 Edited the JIRA title and added tests for the CombineFileSplits. Tested both against Hadoop 2.3 (which doesn't support getFSBytesReadCallback) and Hadoop 2.5 (which does

[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4050#discussion_r23563465 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -218,13 +219,14 @@ class HadoopRDD[K, V]( // Find a function

[GitHub] spark pull request: SPARK-4337. [YARN] Add ability to cancel pendi...

2015-01-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4141#discussion_r23564546 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -192,15 +186,32 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4051#issuecomment-71406518 @sryza Is the point of not requiring these configs that the users don't really know how many executors they actually want? Exactly. From my perspective, one

[GitHub] spark pull request: SPARK-5393. Flood of util.RackResolver log mes...

2015-01-24 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4192 SPARK-5393. Flood of util.RackResolver log messages after SPARK-1714 You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-5393

[GitHub] spark pull request: SPARK-4136. Under dynamic allocation, cancel o...

2015-01-22 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4168 SPARK-4136. Under dynamic allocation, cancel outstanding executor requests when no longer needed [WIP] This takes advantage of the changes made in SPARK-4337 to cancel pending requests to YARN when

[GitHub] spark pull request: [SPARK-5347][CORE] Change FileSplit to InputSp...

2015-01-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4150#issuecomment-71055028 I think this is a duplicate of #4050, which only adds support for `CombineFileSplit`s. We shouldn't add support for generic `InputSplit`s because many input formats do

[GitHub] spark pull request: SPARK-5370. [YARN] Remove some unnecessary syn...

2015-01-22 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4164 SPARK-5370. [YARN] Remove some unnecessary synchronization in YarnAllocator You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark

[GitHub] spark pull request: SPARK-4337. Add ability to cancel pending requ...

2015-01-21 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4141 SPARK-4337. Add ability to cancel pending requests to YARN You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-4337

[GitHub] spark pull request: SPARK-4786: Parquet filter pushdown for castab...

2015-01-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4156#issuecomment-70976336 Hi @saucam, mind tagging this with [SQL] so it can get properly sorted?

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-01-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23284264 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala --- @@ -82,7 +82,15 @@ private[hash] object

[GitHub] spark pull request: [SPARK-5336][YARN]spark.executor.cores must no...

2015-01-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4123#issuecomment-70799955 We should warn about this in standalone mode and mesos coarse grained mode as well, right?

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-01-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23284115 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -238,6 +245,10 @@ case class InputMetrics(readMethod: DataReadMethod.Value

[GitHub] spark pull request: [SPARK-5336][YARN]spark.executor.cores must no...

2015-01-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4123#discussion_r23283721 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -19,7 +19,7 @@ package org.apache.spark.deploy.yarn import

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-01-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23284251 --- Diff: core/src/main/scala/org/apache/spark/shuffle/hash/BlockStoreShuffleFetcher.scala --- @@ -82,7 +82,15 @@ private[hash] object

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-01-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23284154 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -317,4 +338,9 @@ class ShuffleWriteMetrics extends Serializable

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-01-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23284162 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -41,7 +41,7 @@ import org.apache.spark._ import

[GitHub] spark pull request: SPARK-1714. Take advantage of AMRMClient APIs ...

2015-01-20 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3765#discussion_r23263022 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -153,498 +154,241 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: SPARK-1714. Take advantage of AMRMClient APIs ...

2015-01-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3765#issuecomment-70762709 @tgravescs, uploaded a new patch that addresses your review comments. I just ran a bunch of manual tests on a 6-node, including * request more resources than

[GitHub] spark pull request: SPARK-1714. Take advantage of AMRMClient APIs ...

2015-01-20 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3765#discussion_r23261539 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -153,498 +154,241 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: SPARK-1714. Take advantage of AMRMClient APIs ...

2015-01-20 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/3765#discussion_r23261733 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -153,498 +154,241 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-19 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4051#issuecomment-70613708 I updated the patch to add a `spark.dynamicAllocation.initialExecutors` property. I also removed the requirement to set min/maxExecutors, so the user now only needs

[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-19 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4051#discussion_r23205045 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala --- @@ -73,12 +73,12 @@ private[spark] class ClientArguments(args: Array

[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...

2015-01-19 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70607997 @pwendell sorry, was out for the weekend, but this LGTM.

[GitHub] spark pull request: SPARK-4687. [WIP] Add an addDirectory API

2015-01-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3670#issuecomment-70227794 Ok here's a version that's ready for review. It still needs a little more doc, polish, and a test or two, but would like to get validation on the approach.

[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4051#issuecomment-70295188 To distill the motivation on the JIRA and make sure we're on the same page: in most situations (including Hive-on-Spark), users don't or can't know how many resources

[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...

2015-01-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4020#issuecomment-70189173 Had a look over, and this mostly looks good, but it looks like there are many places where the patch replaces assigning with incrementing. It would be good to take

[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...

2015-01-15 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4020#discussion_r23055589 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -257,8 +257,8 @@ private[spark] class Executor( val serviceTime

[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...

2015-01-15 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4020#discussion_r23057341 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -240,10 +284,18 @@ class ShuffleWriteMetrics extends Serializable

[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...

2015-01-15 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4067#discussion_r23061158 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -17,6 +17,8 @@ package org.apache.spark +import

[GitHub] spark pull request: [SPARK-3288] All fields in TaskMetrics should ...

2015-01-15 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4020#discussion_r23057216 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -257,8 +257,8 @@ private[spark] class Executor( val serviceTime

[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-14 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/4050 SPARK-5199. Input metrics should show up for InputFormats that return Co... ...mbineFileSplits You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4050#discussion_r22972019 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -219,6 +220,9 @@ class HadoopRDD[K, V]( val bytesReadCallback
