[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

2015-04-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5577#issuecomment-97134710 As it bumps numRecordsWritten both on key and on value, it looks like this change would make us double count the number of records written. --- If your project is set up

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-97237212 I sync'd up with @andrewor14 offline about this. To summarize my understanding of his concern, the current patch has non-optimal behavior in the following (common

[GitHub] spark pull request: SPARK-6954. [YARN] ExecutorAllocationManager c...

2015-04-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5704#issuecomment-97237326 I'll update the current patch to include this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

2015-04-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5403#issuecomment-97151475 My opinion is that the main criteria for including this are: * Is there an intention for and clear path to a state where this could be used in production? I think it's

[GitHub] spark pull request: [minor] [core] Warn users who try to cache RDD...

2015-04-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5751#issuecomment-97173195 Minor comment. Otherwise LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: [minor] [core] Warn users who try to cache RDD...

2015-04-28 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5751#discussion_r29277660 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -1387,6 +1387,11 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29190908 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29192947 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29191148 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -740,15 +723,29 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29191682 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29193245 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-7162][YARN]Launcher error in yarn-clien...

2015-04-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5716#issuecomment-96741889 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: [SPARK-6653][yarn] New config to specify port ...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5719#discussion_r29167101 --- Diff: docs/running-on-yarn.md --- @@ -134,6 +134,13 @@ Most of the configs are the same for Spark on YARN as for other deployment modes /td /tr

[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5242#issuecomment-96744404 That's correct. Only for executor, but I updated the config naming after your earlier comment so that we have room to add it for the AM in the future. I just filed SPARK

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29171158 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4450#issuecomment-96844045 Updated patch rebases on master and incorporates @pwendell 's review comments. --- If your project is set up for it, you can reply to this email and have your reply

[GitHub] spark pull request: SPARK-6954. ExecutorAllocationManager can end ...

2015-04-26 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/5704 SPARK-6954. ExecutorAllocationManager can end up requesting a negative n... ...umber of executors You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-26 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-96334096 #5704 demonstrates what I outlined in my above comment, and should supersede this PR. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112442 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-26 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29112697 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...

2015-04-24 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5676#issuecomment-95822131 This looks like a duplicate of SPARK-6954 (PR #5536) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-3376] Add in-memory shuffle option.

2015-04-23 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5403#issuecomment-95686597 I've seen a lot of experimental features go into Hadoop and then slowly wilt from lack of use and maintenance. My two cents are that we should only include

[GitHub] spark pull request: [SPARK-7046] Remove InputMetrics from BlockRes...

2015-04-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5627#issuecomment-95259018 LGTM. One tiny nit is that `import org.apache.spark.executor._` could go to `import org.apache.spark.executor.{DataReadMethod, ShuffleWriteMetrics}` --- If your

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-22 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r28892668 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -113,11 +114,21 @@ private[spark] class ExternalSorter[K, V, C

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-22 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r28892612 --- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala --- @@ -0,0 +1,254 @@ +/* + * Licensed

[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4450#issuecomment-95272806 Thanks for the review Patrick. Regarding Kryo: I would be really really surprised if Kryo were to change its serialization format in such a drastic way without

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-94915882 Thanks for putting together those graphs Cheolsoo. I think they're really helpful for thinking about this. The side effect that I'm worried about

[GitHub] spark pull request: [SPARK-6738] [CORE] Improve estimate the size ...

2015-04-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5608#issuecomment-94910837 @srowen I haven't had a chance to look at this in detail yet, but, with SizeEstimator, we're purely trying to understand the size of the object in memory - the disk

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-94945086 Thanks Andrew, I think you've nailed the part of the problem that I was missing: when executors die or are killed, executorIds.size decreases. If numPendingExecutors

[GitHub] spark pull request: [Minor] Comment improvements in ExternalSorter...

2015-04-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5620#issuecomment-95010096 This LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-7037] [CORE] Inconsistent behavior for ...

2015-04-21 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5617#discussion_r28842177 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -61,7 +61,16 @@ private[deploy] class SparkSubmitArguments(args: Seq

[GitHub] spark pull request: [WIP][SPARK-6986][CORE]Make SerializationStrea...

2015-04-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5577#issuecomment-94563008 My patch for SPARK-4550 ( #4450 ) makes a similar change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-94595477 Mind if I take a look before you merge? I think I'm the last one to have touched this code. On Apr 20, 2015, at 4:40 PM, andrewor14 notificati...@github.com wrote

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-94610887 I've thought about this more and think I'm able to articulate why I think the present fix is incorrect. Aside from the weirdness of the idea that there can

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-20 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-94611689 More broadly: I agree with you that the current code is really really difficult to reason about. When I added the ability to cancel executor requests, I spent

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-93665373 It would be good to come up with a test that can reproduce the issue. I believe that it is actually supposed to be acceptable for numExecutorsPending to sit below 0

[GitHub] spark pull request: [SPARK-6869][PySpark] Add pyspark archives pat...

2015-04-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-93778753 Regarding the performance issue, this can be solved with the YARN distributed cache in the same way it works for the Spark assembly jar. If the file is placed on HDFS

[GitHub] spark pull request: SPARK-5112. Expose SizeEstimator as a develope...

2015-04-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3913#issuecomment-93785353 @pwendell thought this would be preferable: One thing we could do that would be more isolated is have a function in SparkContext called estimateSizeOf(object: Any

[GitHub] spark pull request: SPARK-5888. [MLLIB]. Add OneHotEncoder as a Tr...

2015-04-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5500#issuecomment-93802687 @mengxr that makes sense to me, but did you mean to switch 1 and 2? I.e. in this PR we should implement `OneHotEncoder`, which takes an input column with category indices

[GitHub] spark pull request: [SPARK-6954] [YARN] Dynamic allocation: numExe...

2015-04-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5536#issuecomment-93801808 Awesome, sounds good Cheolsoo. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: SPARK-5888. [MLLIB]. Add OneHotEncoder as a Tr...

2015-04-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5500#issuecomment-93841917 Updated the patch conform to the approach described above. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-6869][PySpark] Pass PYTHONPATH to execu...

2015-04-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-93490885 That alternative solution makes sense to me. If it's not going to be added to the classpath, it might make more sense to use a zip than a jar. --- If your project is set

[GitHub] spark pull request: SPARK-4783 [CORE] System.exit() calls in Spark...

2015-04-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5492#issuecomment-93490325 This seems to only get called from two places: * In standalone mode when the driver fails to communicate with the master, or the master sends a message to kill

[GitHub] spark pull request: SPARK-5112. Expose SizeEstimator as a develope...

2015-04-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3913#issuecomment-93487382 Again, one of the main uses is estimating the size of variables you're considering broadcasting. Another is experimenting with different representations - e.g. how much

[GitHub] spark pull request: [CORE] [SPARK-6593] Provide a HadoopRDD varian...

2015-04-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5368#issuecomment-93492480 An RDD that handles read exceptions with `Try` does seem useful to me - I've seen a decent amount of boilerplate used to accomplish the same thing. Is there a reason

[GitHub] spark pull request: SPARK-4783 [CORE] System.exit() calls in Spark...

2015-04-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5492#issuecomment-93527717 Ok. LGTM. Could make sense for someone with Mesos expertise to look at it. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5242#issuecomment-93580862 @tgravescs the first will apply to the AM as well if there's not one for the AM specifically set: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project

[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-15 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5242#discussion_r28467525 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -231,6 +249,17 @@ private[yarn] class YarnAllocator

[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5242#issuecomment-93257613 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5242#issuecomment-92959186 Updated patch still only supports executor node labels, but renames the property so that adding driver/am node labels in the future won't look weird. I also switched

[GitHub] spark pull request: [SPARK-6894]spark.executor.extraLibraryOptions...

2015-04-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5506#issuecomment-92944389 LGTM as well --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5242#issuecomment-92947103 @srowen @tgravescs sorry for the delay here. Have been working on a new patch that adds support or AM label expressions, but ran into some YARN API issues that makes

[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5242#discussion_r28383831 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -123,6 +123,7 @@ private[spark] class Client( appContext.setQueue

[GitHub] spark pull request: [YARN] SPARK-6470. Add support for YARN node l...

2015-04-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5242#issuecomment-93119630 It turns out that setting the label expression when creating the `ApplicationSubmissionContext` sets it for the AM as well, which is undesirable. Updated patch reverts

[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...

2015-04-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-93040635 @andrewor14 this was a concern I had as well, and there's a long discussion above. The reason for not using `spark.executor.cores` is that the property introduced here

[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...

2015-04-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-93058391 My understanding based on the discussion here is that `spark.mesos.executor.cores` is the number of cores reserved by an executor *not* for use in running tasks. So

[GitHub] spark pull request: [SPARK-6869][PySpark] Pass PYTHONPATH to execu...

2015-04-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5478#issuecomment-93061403 IIUC, the motivation for this change is that the assembly jar distribution mechanism doesn't work for some Java versions. I agree with Andrew that, if at all

[GitHub] spark pull request: SPARK-6735:[YARN] Adding properties to disable...

2015-04-13 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5449#discussion_r28278175 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -94,6 +98,14 @@ private[yarn] class YarnAllocator( // Additional

[GitHub] spark pull request: SPARK-5888. [MLLIB]. Add OneHotEncoder as a Tr...

2015-04-13 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/5500 SPARK-5888. [MLLIB]. Add OneHotEncoder as a Transformer This patch adds a one hot encoder for categorical features. Planning to add documentation and another test after getting feedback

[GitHub] spark pull request: [SPARK-6868][YARN] Fix broken container log li...

2015-04-13 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5477#issuecomment-92235067 This LGTM as well --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: SPARK-6735:[YARN] Adding properties to disable...

2015-04-13 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5449#discussion_r28218090 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -94,6 +98,14 @@ private[yarn] class YarnAllocator( // Additional

[GitHub] spark pull request: SPARK-6735:[YARN] Adding properties to disable...

2015-04-13 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5449#discussion_r28217902 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -119,7 +131,27 @@ private[yarn] class YarnAllocator( def

[GitHub] spark pull request: [SPARK-6662][YARN] Allow variable substitution...

2015-04-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5321#issuecomment-91030087 Thanks @piaozhexiu. This seems reasonable to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-2669] [yarn] Distribute client configur...

2015-04-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4142#issuecomment-91031234 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: [SPARK-2669] [yarn] Distribute client configur...

2015-04-06 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4142#issuecomment-90232072 Is my understanding correct that the only change since the last version is that we bundle all the files together instead of uploading them individually? Can we

[GitHub] spark pull request: Replace use of .size with .length for Arrays

2015-04-06 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5376#issuecomment-90254901 This seems reasonable and mostly straightforward to me. What was your method for finding all the instances of this? --- If your project is set up for it, you can reply

[GitHub] spark pull request: [SPARK-2669] [yarn] Distribute client configur...

2015-04-06 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4142#issuecomment-90243071 I don't think the whole method needs to be refactored right now, but there's a 50-line contiguous addition to it. Can that go in its own method? --- If your project

[GitHub] spark pull request: [SPARK-2669] [yarn] Distribute client configur...

2015-04-06 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/4142#issuecomment-90299451 I'm advocating for a sub-method `private def createHadoopConfArchive(): File` to house collecting and zipping the conf files. This stuff obviously makes sense

[GitHub] spark pull request: SPARK-6676 [BUILD] Clarify that profile hadoop...

2015-04-05 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5363#discussion_r27778826 --- Diff: docs/building-spark.md --- @@ -98,10 +98,10 @@ mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package # Apache Hadoop 2.3.X

[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

2015-04-03 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5343#issuecomment-89343218 I see. Do Hive/Pig/Sqoop have active clients or are they just monitoring progress in the same way the Spark yarn-cluster client is? --- If your project is set up

[GitHub] spark pull request: [SPARK-6692][YARN] Make it possible to kill AM...

2015-04-03 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5343#issuecomment-89337784 I'm somewhat dubious about whether this is needed. It takes a single command to kill a job with yarn application -kill. Also, the advantage of the latter

[GitHub] spark pull request: SPARK-6698: where RandomForest input specifies...

2015-04-03 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5351#issuecomment-89379598 Mind tagging this with MLLIB? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark pull request: [SPARK-3591][YARN]fire and forget for YARN clu...

2015-04-03 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5297#issuecomment-89430618 LGTM as well --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-6700] disable flaky test

2015-04-03 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5356#discussion_r27760819 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala --- @@ -143,7 +143,8 @@ class YarnClusterSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3591][YARN]fire and forget for YARN clu...

2015-04-03 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5297#discussion_r27758229 --- Diff: docs/running-on-yarn.md --- @@ -196,6 +196,15 @@ Most of the configs are the same for Spark on YARN as for other deployment modes It should

[GitHub] spark pull request: [SPARK-6700] disable flaky test

2015-04-03 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5356#discussion_r27758595 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala --- @@ -143,7 +143,8 @@ class YarnClusterSuite extends FunSuite

[GitHub] spark pull request: [SPARK-3591][YARN]fire and forget for YARN clu...

2015-04-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5297#discussion_r27694636 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -609,6 +593,26 @@ private[spark] class Client( throw new SparkException

[GitHub] spark pull request: [MLLIB] Remove println in LogisticRegression.s...

2015-04-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5338#issuecomment-89097896 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: [SPARK-3591][YARN]fire and forget for YARN clu...

2015-04-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5297#discussion_r27694617 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -609,6 +593,26 @@ private[spark] class Client( throw new SparkException

[GitHub] spark pull request: [SPARK-3591][YARN]fire and forget for YARN clu...

2015-04-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5297#discussion_r27585630 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -66,6 +66,8 @@ private[spark] class Client( private val

[GitHub] spark pull request: [SPARK-3596][YARN]Support changing the yarn cl...

2015-04-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5292#discussion_r27589374 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala --- @@ -125,6 +125,7 @@ private[spark] class

[GitHub] spark pull request: [SPARK-6650] [core] Stop ExecutorAllocationMan...

2015-04-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5311#discussion_r27590421 --- Diff: core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala --- @@ -28,10 +28,20 @@ import org.apache.spark.util.ManualClock

[GitHub] spark pull request: [SPARK-6650] [core] Stop ExecutorAllocationMan...

2015-04-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5311#discussion_r27591507 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -173,32 +179,24 @@ private[spark] class ExecutorAllocationManager

[GitHub] spark pull request: [SPARK-6650] [core] Stop ExecutorAllocationMan...

2015-04-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5311#discussion_r27611599 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -17,10 +17,12 @@ package org.apache.spark +import

[GitHub] spark pull request: [SPARK-6650] [core] Stop ExecutorAllocationMan...

2015-04-01 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5311#issuecomment-88631842 One last nit. Otherwise LGTM. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: [SPARK-6640][Core] Fix the race condition of c...

2015-04-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5306#discussion_r27619143 --- Diff: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala --- @@ -71,12 +75,22 @@ private[spark] class HeartbeatReceiver(sc: SparkContext

[GitHub] spark pull request: [SPARK-3596][YARN]Support changing the yarn cl...

2015-03-31 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5292#discussion_r27539943 --- Diff: yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala --- @@ -125,6 +125,7 @@ private[spark] class

[GitHub] spark pull request: [SPARK-6614] OutputCommitCoordinator should cl...

2015-03-31 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5276#discussion_r27539637 --- Diff: core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala --- @@ -113,9 +113,11 @@ private[spark] class OutputCommitCoordinator

[GitHub] spark pull request: [SPARK-1502][YARN]Add config option to not inc...

2015-03-31 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5294#issuecomment-88165688 @srowen I assume this would be for running assemblies that already include the Hadoop classes. @Sephiroth-Lin do you mind going into detail about the situations you need

[GitHub] spark pull request: [SPARK-3591][YARN]fire and forget for YARN clu...

2015-03-31 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5297#issuecomment-88168832 If somebody has an existing script that runs spark-submit and then follows it with an action that expects the app to have completed, this would break it, right? I

[GitHub] spark pull request: [SPARK-1502][YARN]Add config option to not inc...

2015-03-31 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5294#discussion_r27530621 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -809,7 +809,13 @@ object Client extends Logging

[GitHub] spark pull request: [SPARK-1502][YARN]Add config option to not inc...

2015-03-31 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5294#discussion_r27530532 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -809,7 +809,13 @@ object Client extends Logging

[GitHub] spark pull request: [SPARK-1502][YARN]Add config option to not inc...

2015-03-31 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/5294#discussion_r27530361 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -809,7 +809,13 @@ object Client extends Logging

[GitHub] spark pull request: Master

2015-03-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5258#issuecomment-87581081 Did you mean to submit this as a pull request? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your

[GitHub] spark pull request: [SPARK-6592] fix filter for scaladoc to genera...

2015-03-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5252#issuecomment-87722975 Mind tagging this with SQL so it gets sorted properly? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-6606][CORE]Accumulator deserialized twi...

2015-03-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5259#issuecomment-87724903 Is this the same as the issue in SPARK-5360? @kayousterhout --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: Remove println in favor of log

2015-03-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5272#issuecomment-8269 Mind tagging this with [MLLIB]? Also, it seems weird to have any logging here - the equivalent in LinearRegression doesn't. I would just take the println out

[GitHub] spark pull request: [SPARK-6350][Mesos] Make mesosExecutorCores co...

2015-03-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5063#issuecomment-87791403 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...

2015-03-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5250#issuecomment-87793634 @rxin can you explain that a little more? I was envisioning an `InputFormat` that wrapped other `InputFormat`s and ignored exceptions, not something that would require

[GitHub] spark pull request: [CORE] [SPARK-6593] Provide option for HadoopR...

2015-03-30 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/5250#issuecomment-87791639 My opinion is that this is the kind of thing that should go into a custom `InputFormat`. --- If your project is set up for it, you can reply to this email and have your

<    1   2   3   4   5   6   7   8   9   10   >