[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56132627 Removing this sounds good to me too. Will upload a patch. I think a measure of how long a task spends in shuffle would be useful though, as it helps users understand

[GitHub] spark pull request: [SPARK-3319] [SPARK-3338] Resolve Spark submit...

2014-09-17 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2232#issuecomment-55858105 Could this change behavior in cases where the spark.yarn.dist.files is configured with no scheme? Without this change, it would interpret no scheme to mean that it's

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-17 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2440 SPARK-3574. Shuffle finish time always reported as -1 The included test waits 100 ms after job completion for task completion events to come in so it can verify they have reasonable finish times

[GitHub] spark pull request: [SPARK-3319] [SPARK-3338] Resolve Spark submit...

2014-09-17 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2232#issuecomment-55985896 Hmm. My feeling is that it's better to be consistent here and consider the old behavior a bug than to maintain compatibility to support a cornerish case

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-09-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-55831832 Had noticed that. Haven't had time to fix these but will get to them soon. --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2350#issuecomment-55309683 +1 for making Client private. This should go through SparkSubmit, and, as Patrick mentioned, I'd be surprised if we haven't broken any code that's relying on that already

[GitHub] spark pull request: [SPARK-3465] fix task metrics aggregation in l...

2014-09-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2338#issuecomment-55334340 I don't have any great ideas for how to write a test for it, but this looks good to me as well.

[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-55355339 Updated patch includes fallback to the split size

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-09-10 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-55182360 Thanks @davies for catching those. Did another pass to make sure I didn't miss any others.

[GitHub] spark pull request: SPARK-1713. Use a thread pool for launching ex...

2014-09-09 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/663#issuecomment-55066027 Upmerged

[GitHub] spark pull request: [SPARK-3465] fix task metrics aggregation in l...

2014-09-09 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2338#issuecomment-55066419 Hi @davies, sorry for causing this bug and thanks for picking it up. To avoid making the deep copy unnecessarily when running in non-local mode, we could instead make

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-08 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2274#discussion_r17222926 --- Diff: python/pyspark/tests.py --- @@ -405,22 +404,6 @@ def test_zip_with_different_number_of_items(self): self.assertEquals(a.count(), b.count

[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54780033 It looks like all the core tests are passing, but there are some failures in streaming and SQL tests. Have those been showing up elsewhere?

[GitHub] spark pull request: SPARK-1714. Take advantage of AMRMClient APIs ...

2014-09-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/655#issuecomment-54780393 Unfortunately the cleanup refactored a bunch of common code between yarn-alpha and yarn-stable that no longer would have been common after this patch (because, after 2.2

[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-09-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2087#issuecomment-54852759 Just to make sure it's clear, the issue isn't only that we can be a few bytes off when we're reading outside of split boundaries, but that it'll look like we read the full

[GitHub] spark pull request: SPARK-3422. JavaAPISuite.getHadoopInputSplits ...

2014-09-08 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2324 SPARK-3422. JavaAPISuite.getHadoopInputSplits isn't used anywhere. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-3422

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2274#issuecomment-54677627 Updated patch adds Python back in and adds the 's' at the end.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-05 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17198796 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +322,44 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-05 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/2274#discussion_r17201280 --- Diff: python/pyspark/rdd.py --- @@ -515,6 +515,30 @@ def __add__(self, other): raise TypeError return self.union(other

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-04 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2274 SPARK-2978. Transformation with MR shuffle semantics I didn't add this to the transformations list in the docs because it's kind of obscure, but would be happy to do so if others think it would

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-09-04 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1388#discussion_r17101588 --- Diff: python/pyspark/mllib/regression.py --- @@ -66,6 +66,9 @@ def weights(self): def intercept(self): return self._intercept

[GitHub] spark pull request: [SPARK-2140] Updating heap memory calculation ...

2014-09-04 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2253#issuecomment-54516530 I think it's preferable to give the user the size they actually request. This avoids them requesting the same size later under different conditions and unexpectedly

[GitHub] spark pull request: SPARK-2978. Transformation with MR shuffle sem...

2014-09-04 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/2274#issuecomment-54555319 Updated patch removes Python version, adds Java version, and adds some additional doc.

[GitHub] spark pull request: [WIP] SPARK-2450: Add YARN executor log links ...

2014-09-03 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1375#issuecomment-54383155 I think this would be extremely useful. Getting executor logs with Spark on YARN currently requires clicking like 6 links on the ResourceManager page.

[GitHub] spark pull request: SPARK-3052. Misleading and spurious FileSystem...

2014-09-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1956#issuecomment-54114366 Here's the exception: java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:703

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-09-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-54115928 Updated the patch to match existing conventions

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-09-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-54174652 I believe the failure is unrelated. I noticed it on SPARK-3052 as well.

[GitHub] spark pull request: SPARK-3052. Misleading and spurious FileSystem...

2014-09-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1956#issuecomment-54174772 I believe the failure is unrelated. I noticed it on SPARK-2461 as well.

[GitHub] spark pull request: SPARK-3014. Log a more informative messages in...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1934#discussion_r17005626 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -270,11 +270,9 @@ private[spark] class ApplicationMaster(args

[GitHub] spark pull request: SPARK-3014. Log a more informative messages in...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1934#discussion_r17005920 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -270,11 +270,9 @@ private[spark] class ApplicationMaster(args

[GitHub] spark pull request: SPARK-3014. Log a more informative messages in...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1934#discussion_r17006495 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -270,11 +270,9 @@ private[spark] class ApplicationMaster(args

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1388#discussion_r17006876 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/regression/GeneralizedLinearAlgorithm.scala --- @@ -74,6 +74,8 @@ abstract class GeneralizedLinearModel

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17020310 --- Diff: core/src/main/scala/org/apache/spark/rdd/PartitionLocation.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17020360 --- Diff: core/src/main/scala/org/apache/spark/rdd/PartitionLocation.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17020639 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1296,7 +1298,25 @@ class DAGScheduler( // If the RDD has some

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17020666 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1296,7 +1298,25 @@ class DAGScheduler( // If the RDD has some

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17020754 --- Diff: core/src/main/scala/org/apache/spark/rdd/PartitionLocation.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-02 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17020803 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +322,44 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-08-31 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1388#issuecomment-53994494 Sorry for the delay. Updated patch adds this for Python as well.

[GitHub] spark pull request: SPARK-2461. Add a toString method to Generaliz...

2014-08-31 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1388#discussion_r16936730 --- Diff: python/pyspark/mllib/regression.py --- @@ -66,6 +66,9 @@ def weights(self): def intercept(self): return self._intercept

[GitHub] spark pull request: Spark-2447 : Spark on HBase

2014-08-22 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1608#discussion_r16606710 --- Diff: external/hbase/pom.xml --- @@ -0,0 +1,140 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!-- + ~ Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request: SPARK-2621. Update task InputMetrics increment...

2014-08-21 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/2087 SPARK-2621. Update task InputMetrics incrementally The patch takes advantage of an API provided in Hadoop 2.5 that allows getting accurate data on Hadoop FileSystem bytes read. It eliminates the old

[GitHub] spark pull request: [SPARK-2849] Handle driver configs separately ...

2014-08-19 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1845#issuecomment-52713440 When we added spark-submit originally, we went with the current approach (Bash-Scala) because @mateiz had concerns about the overhead of starting two JVMs.

[GitHub] spark pull request: SPARK-3082. yarn.Client.logClusterResourceDeta...

2014-08-18 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1984#discussion_r16372445 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -103,13 +103,17 @@ class Client(clientArgs: ClientArguments, hadoopConf

[GitHub] spark pull request: SPARK-3082. yarn.Client.logClusterResourceDeta...

2014-08-18 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1984#discussion_r16378515 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -103,13 +103,17 @@ class Client(clientArgs: ClientArguments, hadoopConf

[GitHub] spark pull request: SPARK-3082. yarn.Client.logClusterResourceDeta...

2014-08-18 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1984#discussion_r16380717 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -103,13 +103,17 @@ class Client(clientArgs: ClientArguments, hadoopConf

[GitHub] spark pull request: SPARK-3082. yarn.Client.logClusterResourceDeta...

2014-08-18 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1984#issuecomment-52552878 Posted a patch that removes the queue resources log message entirely

[GitHub] spark pull request: [SPARK-2165] spark on yarn: add support for se...

2014-08-18 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1279#discussion_r16387993 --- Diff: docs/running-on-yarn.md --- @@ -125,6 +125,14 @@ Most of the configs are the same for Spark on YARN as for other deployment modes

[GitHub] spark pull request: SPARK-3082. yarn.Client.logClusterResourceDeta...

2014-08-16 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1984 SPARK-3082. yarn.Client.logClusterResourceDetails throws NPE if requeste... ...d queue doesn't exist You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark pull request: [SPARK-3074] [PySpark] support groupByKey() wi...

2014-08-15 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1977#issuecomment-52381086 Does / will the same functionality exist in Scala/Java?

[GitHub] spark pull request: SPARK-3014. Log a more informative messages in...

2014-08-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1934#discussion_r16271945 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -213,28 +213,22 @@ class ApplicationMaster(args

[GitHub] spark pull request: SPARK-3014. Log a more informative messages in...

2014-08-14 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1934#discussion_r16272089 --- Diff: yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala --- @@ -613,29 +593,6 @@ object YarnAllocationHandler

[GitHub] spark pull request: SPARK-3014. Log a more informative messages in...

2014-08-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1934#issuecomment-52256543 Updated patch fixes an issue that @andrewor14 pointed out.

[GitHub] spark pull request: SPARK-3052. Misleading and spurious FileSystem...

2014-08-14 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1956 SPARK-3052. Misleading and spurious FileSystem closed errors whenever a ... ...job fails while reading from Hadoop You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: SPARK-3052. Misleading and spurious FileSystem...

2014-08-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1956#issuecomment-52262225 This occurs when an executor process shuts down while tasks are executing (e.g. because the driver disassociated or an OOME). Hadoop FileSystems register

[GitHub] spark pull request: SPARK-3052. Misleading and spurious FileSystem...

2014-08-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1956#issuecomment-52267702 Ah and the order they should be shut down in is RecordReader then FileSystem? Right

[GitHub] spark pull request: SPARK-3028. sparkEventToJson should support Sp...

2014-08-14 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1961 SPARK-3028. sparkEventToJson should support SparkListenerExecutorMetrics... ...Update You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark

[GitHub] spark pull request: SPARK-3028. sparkEventToJson should support Sp...

2014-08-14 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1961#issuecomment-52274079 Ooops, fixed.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-08-13 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r16204968 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -243,10 +244,23 @@ class HadoopRDD[K, V]( new HadoopMapPartitionsWithSplitRDD

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-08-13 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r16205012 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -304,4 +318,48 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-08-13 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r16205064 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -304,4 +318,48 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-08-13 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r16205236 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -304,4 +318,48 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-08-13 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r16205425 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -304,4 +318,48 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-3014. Log a more informative messages in...

2014-08-13 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1934 SPARK-3014. Log a more informative messages in a couple failure scenario... ...s You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy

[GitHub] spark pull request: [SPARK-2878]: Fix custom spark.kryo.registrato...

2014-08-11 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1890#issuecomment-51805750 FWIW I think this is already what happens in YARN, as we use Hadoop's distributed cache to send out the jars and include them on the executor classpath at startup

[GitHub] spark pull request: [SPARK-2913] Place our log4j.properties at the...

2014-08-08 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1844#issuecomment-51653564 On the executor side, framework jars come first unless spark.files.userClassPathFirst is set to true. At least for Spark on YARN, executors are not launched with spark

[GitHub] spark pull request: [SPARK-2894] spark-shell doesn't accept flags

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1825#issuecomment-51435337 This will allow spark-shell to take spark-submit options, but will remove its ability to take spark-shell-specific options (currently there's only one, file). I'm unclear

[GitHub] spark pull request: [SPARK-2894] spark-shell doesn't accept flags

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1825#issuecomment-51436115 org.apache.spark.repl.SparkRunnerSettings

[GitHub] spark pull request: SPARK-2900. aggregate inputBytes per stage

2014-08-07 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1826 SPARK-2900. aggregate inputBytes per stage You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-2900 Alternatively you can

[GitHub] spark pull request: SPARK-2900. aggregate inputBytes per stage

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1826#issuecomment-51440037 The failure appears to be unrelated (something with connections and Kafka).

[GitHub] spark pull request: SPARK-2900. aggregate inputBytes per stage

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1826#issuecomment-51440066 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-2904] Remove non-used local variable in...

2014-08-07 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1834#issuecomment-51543219 LGTM

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-06 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51390403 thanks Patrick

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-06 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15900998 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockFetcherIterator.scala --- @@ -191,7 +184,7 @@ object BlockFetcherIterator

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-06 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15906274 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -73,11 +75,16 @@ class TaskMetrics extends Serializable { var

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-05 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51168899 Looking into it. I ran the test that it was hanging on and things completed fine. I also combed the code and didn't see anywhere where this patch had changed how things

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-04 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51025499 Updated patch addresses @pwendell's and @kayousterhout's comments and adds tests.

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-03 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-51001765 Updated patch keeps it as ShuffleWriteMetrics for now.

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1507#issuecomment-50976412 Just tested this and observed the shuffle bytes read going up for in-progress tasks.

[GitHub] spark pull request: SPARK-2566. Update ShuffleWriteMetrics increme...

2014-08-02 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1481#issuecomment-50977756 I hadn't noticed this before, but DiskObjectWriter is used for tracking bytes spilled by ExternalSorter and ExternalAppendOnlyMap in addition to shuffle bytes written. So

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-08-01 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-50853874 Thanks @pwendell and @andrewor14 for your continued reviews. 10 seconds sounds fine to me. Not that it's a shining beacon of performance, but MapReduce actually

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-08-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15684813 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -991,6 +994,9 @@ class SparkContext(config: SparkConf) extends Logging

[GitHub] spark pull request: SPARK-2641: Fixing how spark arguments are loa...

2014-08-01 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1657#issuecomment-50922542 This makes sense to me. However, we should also document it and mention that it only currently works for YARN.

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-08-01 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1507#discussion_r15716773 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -73,11 +75,16 @@ class TaskMetrics extends Serializable { var

[GitHub] spark pull request: SPARK-2664. Deal with `--conf` options in spar...

2014-07-31 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1665#discussion_r15631418 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -184,7 +184,7 @@ object SparkSubmit { OptionAssigner(args.archives

[GitHub] spark pull request: [SPARK-2678][Core] Prevents `spark-submit` fro...

2014-07-31 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1699#issuecomment-50837155 I'm worried that treating unknown args as app args would make typos difficult to debug. spark-submit --executor-croes 10 should print out an error
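The concern above can be illustrated with a minimal sketch, written in Python rather than Spark's Scala and not taken from spark-submit itself. The option table, `UnknownOptionError`, and `split_args` are all hypothetical names. The idea is that anything that looks like a launcher option before the application resource must be recognized, so a typo fails loudly instead of being passed through as an application argument.

```python
# Hypothetical option table: these names stand in for a launcher's real flags.
KNOWN_OPTIONS = {"--executor-cores", "--executor-memory", "--conf"}


class UnknownOptionError(ValueError):
    """Raised when a launcher option is not recognized or has no value."""


def split_args(argv):
    """Split argv into (launcher options, application arguments).

    Leading tokens that start with "--" are launcher options and must be
    known. The first token that does not start with "--" (for example the
    application jar) and everything after it belong to the application.
    """
    options = {}
    i = 0
    while i < len(argv) and argv[i].startswith("--"):
        flag = argv[i]
        if flag not in KNOWN_OPTIONS:
            # A typo such as --executor-croes ends up here instead of
            # being forwarded to the application.
            raise UnknownOptionError(f"unrecognized option: {flag}")
        if i + 1 >= len(argv):
            raise UnknownOptionError(f"missing value for option: {flag}")
        options[flag] = argv[i + 1]
        i += 2
    return options, argv[i:]
```

With this shape, `split_args(["--executor-croes", "10", "app.jar"])` raises an error, while options that appear after `app.jar` are still handed to the application untouched.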

[GitHub] spark pull request: SPARK-2664. Deal with `--conf` options in spar...

2014-07-30 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1665 SPARK-2664. Deal with `--conf` options in spark-submit that relate to flags You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15526822 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15526958 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -155,6 +156,23 @@ class DAGScheduler( eventProcessActor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15527486 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala --- @@ -56,7 +56,7 @@ private[jobs] object UIData { } case class

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15528611 --- Diff: docs/configuration.md --- @@ -524,6 +524,13 @@ Apart from these, the following properties are also available, and may be useful output

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15559871 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +353,48 @@ private[spark] class Executor

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-29 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-50556357 Latest patch incorporates latest feedback and adds BlockManagerSuite back in. I tested on a small cluster and saw executors shut down fine (but haven't run at scale

[GitHub] spark pull request: SPARK-2738. Remove redundant imports in BlockM...

2014-07-29 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1642 SPARK-2738. Remove redundant imports in BlockManagerSuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-2738

[GitHub] spark pull request: SPARK-1813. Add a utility to SparkConf that ma...

2014-07-28 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/789#issuecomment-50431290 @mateiz I posted a couple ideas and was waiting on feedback. Any thoughts?

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-07-27 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-50282111 I think reflection is definitely the right way to go here

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-25 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-50173304 As far as I can tell, you're right - I don't see why updateShuffleMetrics needs to be synchronized. Uploading a patch that: * Adds comments to TaskMetrics

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-24 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-49970751 I don't entirely understand the advantage of having a separate PartialTaskMetrics. Ultimately every field of TaskMetrics except for maybe shuffleFinishTime will be able

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-24 Thread sryza
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1056#discussion_r15333186 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -348,4 +352,46 @@ private[spark] class Executor

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-07-22 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1507#issuecomment-49778015 Exactly. The idea is to call mergeShuffleReadMetrics when we're about to send the metrics update.
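A rough sketch of that idea, in Python rather than Spark's Scala and with assumed names throughout: the classes, fields, and `build_metrics_update` below are illustrative and are not Spark's TaskMetrics API. Each shuffle dependency keeps its own counters while blocks are fetched, and the counters are summed into one total only at the moment an update is about to be sent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShuffleReadMetrics:
    # Illustrative counters; the real set of fields is an assumption here.
    remote_blocks_fetched: int = 0
    local_blocks_fetched: int = 0
    remote_bytes_read: int = 0


@dataclass
class TaskMetrics:
    # One entry per shuffle dependency, each updated by its own reader.
    per_dependency: List[ShuffleReadMetrics] = field(default_factory=list)
    # Aggregate view, refreshed only by merge_shuffle_read_metrics().
    merged: Optional[ShuffleReadMetrics] = None

    def new_dependency_metrics(self) -> ShuffleReadMetrics:
        metrics = ShuffleReadMetrics()
        self.per_dependency.append(metrics)
        return metrics

    def merge_shuffle_read_metrics(self) -> ShuffleReadMetrics:
        total = ShuffleReadMetrics()
        for m in self.per_dependency:
            total.remote_blocks_fetched += m.remote_blocks_fetched
            total.local_blocks_fetched += m.local_blocks_fetched
            total.remote_bytes_read += m.remote_bytes_read
        self.merged = total
        return total


def build_metrics_update(task_metrics: TaskMetrics) -> dict:
    """Merge just before sending, so the update reflects blocks read so far."""
    merged = task_metrics.merge_shuffle_read_metrics()
    return {
        "remote_blocks_fetched": merged.remote_blocks_fetched,
        "local_blocks_fetched": merged.local_blocks_fetched,
        "remote_bytes_read": merged.remote_bytes_read,
    }
```

Merging at send time keeps the per-block update path cheap, since readers only touch their own counters, at the cost of the aggregate being stale between updates.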
