[GitHub] spark pull request: [SPARK-3477] Clean up code in Yarn Client / Cl...

2014-09-11 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2350#issuecomment-55308825 Hey all - regarding backwards compatibility. I agree we definitely need to preserve all of the publicly documented interfaces, including environment variables etc

[GitHub] spark pull request: [SPARK-2778] [yarn] Add yarn integration tests...

2014-09-12 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2257#issuecomment-55464465 It's great to see us adding tests here. @vanzin how long do these tests take, roughly? We might have to only run these in certain situations if they take a long time

[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...

2014-09-12 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55464541 Can you submit a patch with a unicode character to show this not working? --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [SPARK-3452] Maven build should skip publishin...

2014-09-12 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2329#issuecomment-55464629 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3452] Maven build should skip publishin...

2014-09-12 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2329#issuecomment-55464648 LGTM as well pending tests. Thanks @ScrapCodes

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-12 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17504535 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -205,6 +208,54 @@ class SQLContext(@transient val sparkContext: SparkContext

[GitHub] spark pull request: [SPARK-3217] Add Guava to classpath when SPARK...

2014-09-12 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2141#issuecomment-55465673 Thanks, this looks good. It's really too bad we keep having to add complexity to the build to support this. For instance, I'm not sure whether it's safe in all cases

[GitHub] spark pull request: [SPARK-3437][BUILD] Support crossbuilding in m...

2014-09-12 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2357#issuecomment-55465992 Hey @ScrapCodes rather than forking the existing install plug-in, could we write our own plug-in that runs before the install plug-in? Forking seems like

[GitHub] spark pull request: SPARK-3039: Allow spark to be built using avro...

2014-09-12 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1945#issuecomment-55466291 Yeah - LGTM pending tests.

[GitHub] spark pull request: [SPARK-3007][SQL]Add Dynamic Partition suppo...

2014-09-12 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2226#issuecomment-55467338 This will be an awesome feature if it goes in!

[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-12 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2339#discussion_r17510336 --- Diff: ec2/spark_ec2.py --- @@ -608,14 +597,53 @@ def setup_spark_cluster(master, opts): print Ganglia started at http://%s:5080/ganglia

[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-12 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2339#discussion_r17510349 --- Diff: ec2/spark_ec2.py --- @@ -608,14 +597,53 @@ def setup_spark_cluster(master, opts): print Ganglia started at http://%s:5080/ganglia

[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-12 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2339#discussion_r17510355 --- Diff: ec2/spark_ec2.py --- @@ -608,14 +597,53 @@ def setup_spark_cluster(master, opts): print Ganglia started at http://%s:5080/ganglia

[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-12 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2339#discussion_r17510361 --- Diff: ec2/spark_ec2.py --- @@ -61,8 +62,8 @@ def parse_args(): -s, --slaves, type=int, default=1, help=Number of slaves

[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-13 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2339#issuecomment-55512497 @JoshRosen yeah we should try to fix that also, pretty sure that's the mystery reason why booting is taking so long. But in any case, do you agree this is a nicer way

[GitHub] spark pull request: Added support for accessing secured HDFS

2014-09-13 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2320#issuecomment-55515763 @vanzin there is currently a path where the `addFile` HTTP server is authenticated via a shared secret and this under the hood uses Diffie-Hellman. This is used in YARN

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-09-14 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1946#issuecomment-0076 In that case I'd propose merging this tentatively and if it causes issues in the 1.2 dev/QA cycle we can revert it. I dug around a bunch, it looks like

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-0386 Jenkins, test this please. @cmccabe - do you mean `PROCESS_LOCAL`? I'm pretty sure we want to have them be higher priority than `NODE_LOCAL`, which
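
The question above hinges on how the scheduler ranks locality levels. As a rough model (a Python sketch, not Spark's Scala scheduler code), the levels can be treated as an ordered enumeration where a smaller value means "more local", so `PROCESS_LOCAL` outranks `NODE_LOCAL`. Only the level names come from the discussion; the reduced four-level set and the `is_allowed` / `best_level` helpers are assumptions for illustration.

```python
from enum import IntEnum


class TaskLocality(IntEnum):
    """Simplified model of scheduler locality levels (assumption: four levels
    only). A smaller value means the data is closer to the task."""
    PROCESS_LOCAL = 0  # data already cached in the same executor process
    NODE_LOCAL = 1     # data on the same host, e.g. a local HDFS replica
    RACK_LOCAL = 2     # data on another host in the same rack
    ANY = 3            # no locality constraint at all


def is_allowed(constraint: TaskLocality, condition: TaskLocality) -> bool:
    """Hypothetical helper: a launch at level `condition` is permitted under a
    delay-scheduling constraint if it is at least as local as the constraint."""
    return condition <= constraint


def best_level(levels):
    """Hypothetical helper: the most local level among candidate placements."""
    return min(levels)


if __name__ == "__main__":
    print(best_level([TaskLocality.NODE_LOCAL, TaskLocality.PROCESS_LOCAL]).name)
    print(is_allowed(TaskLocality.NODE_LOCAL, TaskLocality.RACK_LOCAL))
```

Run as a script, this prints `PROCESS_LOCAL` and then `False`: a rack-local launch is not allowed while the scheduler is still holding out for node-local placement.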

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17525749 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -248,10 +250,22 @@ class HadoopRDD[K, V]( new

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17525822 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17525861 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17525907 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -208,8 +208,10 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17525928 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17525938 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17525946 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-14 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-1288 Added some more comments, mostly cosmetic. Right now the tests are failing because this makes an API breaking change.

[GitHub] spark pull request: Add a Community Projects page

2014-09-14 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2219#issuecomment-1813 @velvia so is this subsumed by the wiki page then?

[GitHub] spark pull request: Add unit test to spark_ec2 script

2014-09-14 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/134#issuecomment-2464 Let's close this issue for now and hopefully someone can take it up and bring it up to date.

[GitHub] spark pull request: [SPARK-3425] do not set MaxPermSize for OpenJD...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2301#issuecomment-55613432 Jenkins, test this please. LGTM pending tests.

[GitHub] spark pull request: [SPARK-3425] do not set MaxPermSize for OpenJD...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2301#issuecomment-55613491 Note: need to add Closes #2387 to the description.

[GitHub] spark pull request: Added support for accessing secured HDFS

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2320#issuecomment-55612902 Okay - then let's aim to provide this level of security with standalone mode, with some clear documentation about what it provides.

[GitHub] spark pull request: [SPARK-787] Add S3 configuration parameters to...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1120#issuecomment-55636453 Hey @danosipov - after speaking with @JoshRosen a bit offline we felt that credential copying was too big of a change to introduce silently. Could you add a `--copy-aws

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17560907 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17561092 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55638524 Added a few more comments after thinking about this some more. As it stands the current factoring opens up a bunch of things at `private[spark]` visibility. We always

[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55639865 Jenkins, retest this please.

[GitHub] spark pull request: Added support for accessing secured HDFS

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2320#issuecomment-55641646 I was responding to your comment about needing Diffie-Hellman. We already use an authentication approach based on DH. Encryption is an orthogonal issue

[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...

2014-09-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17564257 --- Diff: docs/building-spark.md --- @@ -159,4 +160,13 @@ then ship it over to the cluster. We are investigating the exact cause

[GitHub] spark pull request: [Minor]ignore all config files in conf

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55670955 Jenkins, test this please.

[GitHub] spark pull request: [Minor]ignore all config files in conf

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55670980 LGTM pending tests.

[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2339#issuecomment-55671190 Ah cool - if we could re-write the availability check to just use boto, that might be cleaner. In the past I've found I could SSH into the cluster once the status
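
A boto-based availability check would likely reduce to a polling loop over instance statuses. The Python sketch below (Python, like `spark_ec2.py`) shows only that loop. `wait_for_cluster_state` and its `get_statuses` callback are hypothetical names; the callback stands in for a boto status query that is deliberately not shown, and the `"ok"` status string, timeout and poll interval are assumptions.

```python
import time
from typing import Callable, Dict


def wait_for_cluster_state(
    get_statuses: Callable[[], Dict[str, str]],
    desired: str = "ok",
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every instance reports `desired`, or give up after timeout_s.

    get_statuses is a hypothetical callback returning {instance_id: status};
    in spark-ec2 it would wrap a boto status query (not shown here).
    Returns True if all instances became ready in time, otherwise False.
    """
    deadline = clock() + timeout_s
    while True:
        statuses = get_statuses()
        # An empty response is treated as "not ready yet".
        if statuses and all(s == desired for s in statuses.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated responses: the second instance becomes healthy on the third poll.
    responses = iter([
        {"i-1": "ok", "i-2": "initializing"},
        {"i-1": "ok", "i-2": "initializing"},
        {"i-1": "ok", "i-2": "ok"},
    ])
    print(wait_for_cluster_state(lambda: next(responses), sleep=lambda s: None))
```

Injecting `sleep` and `clock` keeps the loop testable without real instances; the simulated run above prints `True`.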

[GitHub] spark pull request: [Minor]ignore all config files in conf

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55693164 @srowen was your comment about something in the current patch or proposed in one of the comments? Does this LGTY?

[GitHub] spark pull request: SPARK-2895: Add mapPartitionsWithContext relat...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2194#issuecomment-55695570 I proposed a slightly different approach to this here: https://issues.apache.org/jira/browse/SPARK-3543 This would remove the need for special methods

[GitHub] spark pull request: [SPARK-2182] Scalastyle rule blocking non asci...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2358#issuecomment-55695611 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-3040] pick up a more proper local ip ad...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1946#issuecomment-55695674 Alright - let's see how this goes and we can roll back if we see any issues.

[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...

2014-09-15 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2014#discussion_r17583504 --- Diff: docs/_config.yml --- @@ -1,5 +1,7 @@ -pygments: true +highlighter: pygments markdown: kramdown +gems: + - jekyll-redirect

[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2014#issuecomment-55695994 @srowen are you planning to add more to this or is it GTG from your perspective?

[GitHub] spark pull request: [SPARK-3005] Fix spark driver hang in mesos fi...

2014-09-15 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1940#issuecomment-55696321 @kayousterhout thanks for the thorough analysis. Do you have any thoughts on just defining killTasks to be best effort? I think that would generally simplify the code

[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2014#issuecomment-55768975 Okay I can merge this. One thing though, we've typically had less-than-smooth experiences with jekyll and its dependencies. So if this feature causes issues for users

[GitHub] spark pull request: [Minor]ignore all config files in conf

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55769881 Jenkins LGTM.

[GitHub] spark pull request: [Minor]ignore all config files in conf

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55769901 Jenkins, test this please.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17612851 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,35 @@ package org.apache.spark.scheduler * of preference

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17612896 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,8 +181,24 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17612925 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -181,8 +181,24 @@ private[spark] class TaskSetManager

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-55771271 On the visibility stuff, understood. I actually forgot the old API is still supported in newer versions of Hadoop. Otherwise, you could put this all in the new hadoop

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17614113 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -23,12 +23,33 @@ package org.apache.spark.scheduler * of preference

[GitHub] spark pull request: [SPARK-787] Add S3 configuration parameters to...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1120#issuecomment-55774088 @danosipov Yeah, do you want to just add the docs in this patch? Our docs are versioned in the spark repo under `docs/`. I'd just add one sentence that says you can

[GitHub] spark pull request: [SPARK-3398] [EC2] Have spark-ec2 intelligentl...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2339#issuecomment-55777111 Yeah that sounds like a decent idea.

[GitHub] spark pull request: [SPARK-1477]: Add the lifecycle interface

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/991#issuecomment-55783768 I also don't think we should add extra interfaces that we aren't going to use. In Spark we never interact with these components in a generic way, so I don't see any

[GitHub] spark pull request: [SPARK-787] Add S3 configuration parameters to...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1120#issuecomment-55809887 Okay thanks, I'm merging this.

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/151#issuecomment-55810602 In that case let's close this issue. If there are a few users who are really dying to support this they can apply this patch manually.

[GitHub] spark pull request: Add a Community Projects page

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2219#issuecomment-55810790 Okay thanks we can pull this in.

[GitHub] spark pull request: [SPARK-3548] [WebUI] Display cache hit ratio o...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2411#issuecomment-55820856 Hey so I think there are a few issues with this. Given the semantics of persisting RDDs I don't think it's really possible to express a hit ratio that makes sense

[GitHub] spark pull request: [SPARK-3535][Mesos] Add 15% task memory overhe...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2401#issuecomment-55821489 Hey will this have compatibility issues for existing deployments? I know many clusters where they just have Spark request the entire amount of memory on the node
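
The compatibility concern can be made concrete with a little arithmetic. Assuming the overhead is a flat 15% added on top of the executor memory (as the PR title suggests; the exact formula in the patch is not shown here), a deployment that sets executor memory to the node's full capacity would start asking Mesos for more memory than any single offer from that node contains. The helper names and the 64 GB node size below are hypothetical.

```python
OVERHEAD_PERCENT = 15  # assumption: flat 15% overhead, per the PR title


def requested_memory_mb(executor_memory_mb: int) -> int:
    """Memory requested from Mesos: executor memory plus a 15% overhead,
    rounded up to a whole megabyte using exact integer arithmetic."""
    overhead_mb = (executor_memory_mb * OVERHEAD_PERCENT + 99) // 100
    return executor_memory_mb + overhead_mb


def fits_in_offer(executor_memory_mb: int, offer_mb: int) -> bool:
    """True if an offer of `offer_mb` can satisfy the padded request."""
    return requested_memory_mb(executor_memory_mb) <= offer_mb


if __name__ == "__main__":
    node_mb = 64 * 1024  # hypothetical 64 GB node, fully assigned to Spark
    print(requested_memory_mb(node_mb))     # 65536 + 9831 = 75367
    print(fits_in_offer(node_mb, node_mb))  # False: padded request exceeds the node
```

Under this assumption, such clusters would have to lower their configured executor memory by roughly 13% (to about 1/1.15 of the node) before any offer could be accepted again.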

[GitHub] spark pull request: SPARK-2932 [STREAMING] Move MasterFailureTest ...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2399#issuecomment-55821887 I'm guessing this is in the main package because it is an integration test rather than a unit test. I.e. it has a main method and is run using the full Spark assembly

[GitHub] spark pull request: [Minor]ignore all config files in conf

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55835334 Jenkins, test this please.

[GitHub] spark pull request: SPARK-3561 - Pluggable strategy to facilitate ...

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2422#issuecomment-55848794 I commented on the JIRA - but this would benefit from a more complete design proposal on JIRA. Traditionally the SparkContext API has not been a point of pluggability

[GitHub] spark pull request: [Minor]ignore all config files in conf

2014-09-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2395#issuecomment-55848847 Merged, thanks.

[GitHub] spark pull request: [Docs] minor grammar fix

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2430#issuecomment-55947370 Thanks looks good - I'll pull this in.

[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...

2014-09-17 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17688016 --- Diff: core/src/main/java/org/apache/spark/TaskContext.java --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-3543] Write TaskContext in Java and exp...

2014-09-17 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2425#discussion_r17688175 --- Diff: core/src/main/java/org/apache/spark/TaskContext.java --- @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2433#issuecomment-55964616 Is the idea here for this to be called by spark users or by spark internal components? If the former, this won't be visible because it's in a private object. It could

[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2433#issuecomment-55964698 One case where this could be useful is in the Spark shell: `scala> sc.setLoggingLevel("WARN")`

[GitHub] spark pull request: add a util method for changing the log level w...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2433#issuecomment-55966808 yeah, that works

[GitHub] spark pull request: SPARK-3579 Jekyll doc generation is different ...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2443#issuecomment-55995060 @srowen may want to review this since he recently modified this documentation.

[GitHub] spark pull request: SPARK-3579 Jekyll doc generation is different ...

2014-09-17 Thread pwendell
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/2443 SPARK-3579 Jekyll doc generation is different across environments. This patch makes some small changes to fix this problem: 1. We document specific versions of Jekyll/Kramdown to use that match

[GitHub] spark pull request: [SPARK-3565]Fix configuration item not consist...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2427#issuecomment-55995118 Yeah good catch, thanks. I'll merge this.

[GitHub] spark pull request: [SPARK-3547]Using a special exit code instead ...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2421#issuecomment-55995237 Jenkins, test this please.

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-55995502 Hey one potentially controversial idea - what about if we just remove this metric? It's not very useful because it's an absolute time and AFAIK we don't record any

[GitHub] spark pull request: SPARK-1793 - Heavily duplicated test setup cod...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/726#issuecomment-55996007 @ajtulloch any interest in updating? If not, we should close this issue and can re-open later.

[GitHub] spark pull request: [SPARK-2815]: Compilation failed upon the hado...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1754#issuecomment-55996182 I don't mind putting this one in (it's simple enough and might lower the bar for anyone trying to go this route). But the regex needs to be fixed, otherwise

[GitHub] spark pull request: [SPARK-3534] Add hive-thriftserver to SQL test...

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2442#issuecomment-55996902 Yeah good call, thanks.

[GitHub] spark pull request: [Minor] rat exclude dependency-reduced-pom.xml

2014-09-17 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2326#issuecomment-55997867 Okay thanks, merging.

[GitHub] spark pull request: SPARK-3580: New public method for RDD's to hav...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2447#discussion_r17740875 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -208,6 +208,23 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: SPARK-3580: New public method for RDD's to hav...

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2447#issuecomment-56071597 LGTM pending a minor comment and tests. Jenkins, test this please.

[GitHub] spark pull request: SPARK-3579 Jekyll doc generation is different ...

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2443#issuecomment-56073816 This is a very straightforward change, so I'm going to merge this. However, @srowen feel free to propose tweaks to this in a follow-on PR if you think there are any

[GitHub] spark pull request: [SPARK-1477]: Add the lifecycle interface

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/991#issuecomment-56074326 Let's close this issue for now given the comments from me and @rxin.

[GitHub] spark pull request: [SPARK-3584] sbin/slaves doesn't work when we ...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2444#discussion_r17743713 --- Diff: .gitignore --- @@ -19,6 +19,7 @@ conf/*.sh conf/*.properties conf/*.conf conf/*.xml +conf/slaves --- End diff

[GitHub] spark pull request: SPARK-3580: New public method for RDD's to hav...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2447#discussion_r17745104 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -208,6 +208,23 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: [SPARK-3005] Fix spark driver hang in mesos fi...

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1940#issuecomment-56084970 Yeah I think we should just change it to say that the kill request has been acknowledged, but since killing is asynchronous and best-effort, it may not have stopped

[GitHub] spark pull request: [SPARK-3566] [BUILD] .gitignore and .rat-exclu...

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2426#issuecomment-56086638 This seems fine to me. For very common file editors, it seems okay to support them.

[GitHub] spark pull request: [SPARK-3589][Minor]remove redundant code

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2445#discussion_r17747811 --- Diff: bin/spark-class --- @@ -169,7 +169,6 @@ if [ -n $SPARK_SUBMIT_BOOTSTRAP_DRIVER ]; then # This is used only if the properties file actually

[GitHub] spark pull request: [SPARK-3589][Minor]remove redundant code

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2445#issuecomment-56087106 Thanks - I merged this but I removed one of the changes which was incorrect.

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768467 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -309,4 +323,42 @@ private[spark] object HadoopRDD { f(inputSplit

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768479 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1486#discussion_r17768506 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskLocation.scala --- @@ -22,13 +22,35 @@ package org.apache.spark.scheduler * In the latter

[GitHub] spark pull request: SPARK-1767: Prefer HDFS-cached replicas when s...

2014-09-18 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1486#issuecomment-5619 Yes, this appears to be an issue with our checker and adding an exclusion is fine for now. The class is private. Just had really minor comments and I can

[GitHub] spark pull request: SPARK-3574. Shuffle finish time always reporte...

2014-09-19 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2440#issuecomment-56206573 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2337#discussion_r17796741 --- Diff: core/src/main/scala/org/apache/spark/FutureAction.scala --- @@ -83,6 +83,15 @@ trait FutureAction[T] extends Future[T] { */ @throws

[GitHub] spark pull request: [SPARK-3446] Expose underlying job ids in Futu...

2014-09-19 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/2337#discussion_r17796804 --- Diff: core/src/test/scala/org/apache/spark/FutureActionSuite.scala --- @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Foundation (ASF
