[GitHub] spark pull request: New FsPermission instance w/o FsPermission.cre...

2014-10-24 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2892#issuecomment-60394574 Aw, do we really still have to support Hadoop 1.0.x ... Looks reasonable since this object is never mutated or at risk of being changed accidentally. Make a JIRA
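A minimal sketch of the workaround under discussion, assuming the goal is only to avoid `FsPermission.createImmutable` (absent in Hadoop 1.0.x) by calling the plain constructor instead; the octal mode below is an arbitrary example, not the PR's actual value:

```scala
import org.apache.hadoop.fs.permission.FsPermission

// Parse an example octal mode and build the permission directly;
// the FsPermission(short) constructor exists back to Hadoop 1.0.x.
val mode: Short = Integer.parseInt("770", 8).toShort
val perm = new FsPermission(mode)
```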

[GitHub] spark pull request: Sbt and Maven builds pass on Linux boxes with ...

2014-10-24 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2880#issuecomment-60394639 Since this duplicates your other PR, can you close this one?

[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...

2014-10-24 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2846#issuecomment-60395100 Why does this change affect the output, and what is the change? `grep -e` already prints the whole line that matched. Why change the scalastyle settings too? This has

[GitHub] spark pull request: [SPARK-3629][Doc] improve spark on yarn doc

2014-10-24 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2813#issuecomment-60395462 It's not possible to see what text you changed since the text was also moved. I am not sure it helps to move the text. Is it possible to see your changes in place? You

[GitHub] spark pull request: SPARK-3359 [DOCS] sbt/sbt unidoc doesn't work ...

2014-10-24 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2909#issuecomment-60406786 @mengxr This is all I think is fixable without starting to take away from the valid scaladoc. For example we'd have to remove `@group` tags and remove several links

[GitHub] spark pull request: [SPARK-4078] New FsPermission instance w/o FsP...

2014-10-24 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2892#issuecomment-60416557 @GraceH I think for now it will be best to continue to support Hadoop 1.0.4. What is the new API you refer to, and does it work on 1.0.4? I thought you were saying

[GitHub] spark pull request: [SPARK-3787] Assembly jar name is wrong when w...

2014-10-24 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2647#issuecomment-60438442 I'd hate to have to hard-code more stuff like this. How about just documenting that `hadoop.version` is required, or failing in this situation? SBT isn't the build

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19363636 --- Diff: LICENSE --- @@ -1,4 +1,3 @@ - --- End diff -- Oops, didn't mean to change that. I don't know why that happened and I'll undo

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19364103 --- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala --- @@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19364277 --- Diff: core/src/main/scala/org/apache/spark/rdd/SampledRDD.scala --- @@ -53,9 +53,14 @@ private[spark] class SampledRDD[T: ClassTag

[GitHub] spark pull request: [Spark-4060] [MLlib] exposing special rdd func...

2014-10-24 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2907#issuecomment-60454077 Cool, you would know best if it's ready for external use. Looks good on unit tests. What if it returned `RDD[Array[T]]`? I experimented briefly with making

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] [WIP] Replace colt d...

2014-10-24 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2928#discussion_r19368910 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -87,15 +87,19 @@ class BernoulliSampler[T](lb: Double, ub: Double

[GitHub] spark pull request: SPARK-4022 [CORE] [MLLIB] Replace colt depende...

2014-10-26 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2928#issuecomment-60529854 @mengxr Oops, I missed that this failed when I saw the SQL tests failed. Should be OK. I rebased too for good measure.

[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...

2014-10-27 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2956#discussion_r19391269 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1204,6 +1204,8 @@ abstract class RDD[T: ClassTag]( } else

[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60559230 It seems much more desirable to just support 3g or 200m in this argument, as was intended. `Utils.memoryStringToMb` can do the conversion.
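A hedged sketch of that conversion; `Utils` is `private[spark]`, so this assumes code living inside the `org.apache.spark` package tree:

```scala
import org.apache.spark.util.Utils

// Accepts suffixed strings like "3g" or "200m" and returns whole megabytes.
val gigs: Int = Utils.memoryStringToMb("3g")   // 3072
val megs: Int = Utils.memoryStringToMb("200m") // 200
```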

[GitHub] spark pull request: [SPARK-3997][Build]scalastyle should output th...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2846#issuecomment-60559452 The `awk` command here is still a little wrong, as it matches any string `error`. `grep \[error\]` definitely works on `scalastyle.txt` to match `[error]`. There's

[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60563934 You can also just take a local reference to the thread, and operate on it. The local reference will of course be null in both cases or not-null in both cases
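A minimal sketch of that local-reference idiom (the field and method names here are hypothetical, not the actual Spark code):

```scala
class TaskRunner {
  // Written by the task thread, read by whoever calls kill().
  @volatile private var taskThread: Thread = _

  def kill(): Unit = {
    val t = taskThread   // a single read of the shared field
    if (t != null) {
      t.interrupt()      // t cannot become null between the check and here
    }
  }
}
```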

[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60564242 Hm, how do you mean? The rest of the code already expects this to be an `Int` and a number of megabytes. Nothing else can be further parsing this (or else that's a bug

[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60574364 @zsxwing You're trying to handle the case where `thread` changes between checking `thread != null` and calling `thread.interrupt()`, right? Copying `thread` to a local

[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60578902 How would running another task X change the value of `t` in T1's stack? I understand it could modify `thread`.

[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60579828 Ah, right. But wasn't this already a potential problem, and isn't it still one after this change? The caller's call to `cancel()` may be just about to happen when the old task finishes

[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60581156 But the call to interrupt the old task happens after the new task begins on the thread. The prior interrupt state doesn't matter; the thread is interrupted

[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60583305 Yes I get that this is not just about preventing an NPE, which is what I thought the intent was originally. I think this does not go far enough to prevent the problem

[GitHub] spark pull request: [SPARK-4097] Fix the race condition of 'thread...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2957#issuecomment-60584591 Yeah I think that would be enough to guarantee it, and it makes sense. Either interrupt happens before the task nulls the reference, in which case the task is not done
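A sketch of that guarantee, again with hypothetical names: if the task clears the reference and the killer interrupts under one lock, the interrupt either reaches the still-running task or hits null and does nothing, so it can never leak onto the thread's next task:

```scala
class TaskRunner {
  private var taskThread: Thread = _  // guarded by this

  // Runs on the task thread itself as the task completes.
  def onTaskFinished(): Unit = synchronized {
    taskThread = null
    Thread.interrupted()  // clear any interrupt that won the race
  }

  def kill(): Unit = synchronized {
    if (taskThread != null) taskThread.interrupt()
  }
}
```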

[GitHub] spark pull request: [SPARK-4096][YARN]Update executor memory descr...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60590304 @WangTaoTheTonic heh yes that's exactly what I meant. +1!

[GitHub] spark pull request: [SPARK-4096][YARN]let ApplicationMaster accept...

2014-10-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2955#issuecomment-60637972 It should not change overall behavior. It at least makes the `--executor-memory` flag act like all the others, even if it is pretty internal, as the comments suggest

[GitHub] spark pull request: add regression metrics

2014-10-28 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19465130 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: add regression metrics

2014-10-28 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2978#issuecomment-60743817 Update the title with SPARK- [MLLIB]

[GitHub] spark pull request: add regression metrics

2014-10-28 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19465228 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2978#issuecomment-60752144 This is picky now, but you might write out `meanAverageError` instead of saying `mae`. Is `r2_score` style-wise correct vs `r2Score`? (Sorry, should have thought of that.) Finally

[GitHub] spark pull request: SPARK-4111 [MLlib] add regression metrics

2014-10-28 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2978#discussion_r19468611 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala --- @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4109][CORE] Correctly deserialize Task....

2014-10-28 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2971#discussion_r19491335 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala --- @@ -89,13 +89,13 @@ private[spark] object ResultTask

[GitHub] spark pull request: [SPARK-4110] Wrong comments about default sett...

2014-10-28 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2972#issuecomment-60803920 +1 from me. The log directory change is correct, and while `SPARK_PREFIX` looks like it matches `SPARK_HOME`, it looks more common to use the latter in Spark.

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2014-10-29 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19523818 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ProducerCache.scala --- @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Delete jetty 6.1.26 from spark package

2014-10-29 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2989#issuecomment-60907459 Other libraries may be using old Jetty. It is not clearly safe to do this. Can you identify first what is bringing in Jetty 6? --- If your project is set up for it, you

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2014-10-29 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2994#discussion_r19577422 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ProducerCache.scala --- @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: Merge pull request #1 from apache/master

2014-10-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3016#issuecomment-61107082 I think you opened this accidentally. Can you close it?

[GitHub] spark pull request: SPARK-4040. Update documentation to exemplify ...

2014-10-30 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2964#discussion_r19636020 --- Diff: docs/configuration.md --- @@ -21,16 +21,19 @@ application. These properties can be set directly on a [SparkConf](api/scala/index.html

[GitHub] spark pull request: SPARK-4040. Update documentation to exemplify ...

2014-10-30 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/2964#discussion_r19636056 --- Diff: docs/streaming-programming-guide.md --- @@ -68,7 +68,9 @@ import org.apache.spark._ import org.apache.spark.streaming._ import

[GitHub] spark pull request: SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce...

2014-10-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2814#issuecomment-61169847 @andrewor14 Yep, rebased it.

[GitHub] spark pull request: [SPARK-4121] Set commons-math3 version based o...

2014-10-30 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3023#discussion_r19636626 --- Diff: pom.xml --- @@ -1191,6 +1196,7 @@ <hadoop.version>2.3.0</hadoop.version> <protobuf.version>2.5.0</protobuf.version>

[GitHub] spark pull request: SPARK-1209 [CORE] SparkHadoop{MapRed,MapReduce...

2014-10-31 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2814#issuecomment-61233478 Hm, that's weird since I thought I ran the test against the default Hadoop, and that's 1.0.4. Which test failed? (I'll go look around Jenkins too.) I can't find

[GitHub] spark pull request: [Core] Locale dependent code

2014-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3036#discussion_r19659505 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -1683,7 +1683,7 @@ private[spark] object Utils extends Logging { def

[GitHub] spark pull request: [Core] Locale dependent code

2014-10-31 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3036#discussion_r19659592 --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala --- @@ -103,14 +105,16 @@ class UtilsSuite extends FunSuite { val hour = minute

[GitHub] spark pull request: SPARK-2085: [MLlib] Apply user-specific regula...

2014-06-09 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1026#issuecomment-45558366 FWIW I support this change. It matches how I have used ALS and is the weighted regression (WR) in ALS-WR from the paper http://www.hpl.hp.com/personal/Robert_Schreiber
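For reference, the weighted-λ ("WR") regularization from that paper scales each factor's penalty by its rating count, where n_u and n_i count the ratings of user u and item i:

```latex
\min_{X,Y} \sum_{(u,i) \in R} \left( r_{ui} - x_u^\top y_i \right)^2
  + \lambda \left( \sum_u n_u \lVert x_u \rVert^2 + \sum_i n_i \lVert y_i \rVert^2 \right)
```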

[GitHub] spark pull request: SPARK-1668: Add implicit preference as an opti...

2014-06-10 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/597#issuecomment-45647190 The results depend a whole lot on the choice of parameters. Did you try some degree of search for the best lambda / # features? It's quite possible to make a model

[GitHub] spark pull request: SPARK-1668: Add implicit preference as an opti...

2014-06-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/597#issuecomment-45792263 You mentioned trying lots of values but what did you try? What about other test metrics -- to rule out some problem in the evaluation? Maybe you can share some of how you

[GitHub] spark pull request: [Minor] Fix style, formatting and naming in Bl...

2014-06-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1058#discussion_r13683274 --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala --- @@ -98,7 +102,7 @@ private[spark] class CacheManager(blockManager: BlockManager) extends

[GitHub] spark pull request: Resolve sbt warnings during build

2014-06-12 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1069#issuecomment-45918415 The changes you are removing were put in place to resolve warnings from Scala 2.10. IIRC the code does not even compile without these in Scala 2.11. What

[GitHub] spark pull request: SPARK-2058: Overriding config from SPARK_HOME ...

2014-06-12 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/997#discussion_r13731671 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -272,6 +287,24 @@ class SparkSubmitSuite extends FunSuite

[GitHub] spark pull request: Resolve sbt warnings during build

2014-06-13 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1069#issuecomment-46002724 Yes, but then what warnings is this resolving? I understand one compiler flag is shorter than imports. I also understand that these language features

[GitHub] spark pull request: Resolve sbt warnings during build

2014-06-13 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1069#issuecomment-46004821 Yes, I understand that, but these warnings are already suppressed with imports, and the advertised change here is to resolve warnings. It just exchanges mechanisms
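Concretely, the two interchangeable mechanisms in question, shown for implicit conversions (the other language-feature flags work the same way; the conversion itself is a made-up example):

```scala
// Option 1: a per-file import, explicit at each use site.
import scala.language.implicitConversions

implicit def intToLabel(i: Int): String = s"label-$i"

// Option 2: drop the import and enable the feature build-wide instead,
// e.g. in the sbt build definition:
//   scalacOptions += "-language:implicitConversions"
```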

[GitHub] spark pull request: Resolve sbt warnings during build

2014-06-13 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1069#issuecomment-46006402 I'm more concerned with consistency than anything. We shouldn't do it both ways, and I don't think that compiler arg should have been added. A compiler flag is not a bad

[GitHub] spark pull request: SPARK-2149. Univariate kernel density estimati...

2014-06-16 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1093#discussion_r13794553 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2149. Univariate kernel density estimati...

2014-06-16 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1093#discussion_r13794772 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2149. Univariate kernel density estimati...

2014-06-16 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1093#discussion_r13794760 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2149. Univariate kernel density estimati...

2014-06-16 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1093#discussion_r13829729 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: SPARK-2149. Univariate kernel density estimati...

2014-06-16 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1093#discussion_r13829881 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [WIP] SPARK-2157 Ability to write tight firewa...

2014-06-18 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1107#discussion_r13908816 --- Diff: core/src/main/scala/org/apache/spark/network/ConnectionManager.scala --- @@ -102,7 +102,24 @@ private[spark] class ConnectionManager(port: Int

[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-06-18 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-46443977 I think this is good to go. The initial test passed, but recent ones errored out. Just to double-check: Jenkins, test this please.

[GitHub] spark pull request: SPARK-2034. KafkaInputDStream doesn't close re...

2014-06-18 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/980#issuecomment-46444091 Pardon, could I ping this issue for review and consideration for commit? I think it's a clean fix and improvement.

[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-06-18 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-46448348 Jenkins, test this please.

[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code

2014-06-18 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1115#issuecomment-46464659 (This PR has way more than you intend -- thousands of files changed, hundreds of commits. You need to rebase the branch on master.)

[GitHub] spark pull request: Fix some tests.

2014-06-19 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/917#issuecomment-46526650 (Looks like this needs a rebase?)

[GitHub] spark pull request: Branch 1.0 Add ZLIBCompressionCodec code

2014-06-21 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1115#issuecomment-46755115 Assuming you have the `apache/spark` repository configured in your git repository as `upstream`, you can check out your branch for this PR and `git pull --rebase upstream

[GitHub] spark pull request: SPARK-1996. Remove use of special Maven repo f...

2014-06-21 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1170 SPARK-1996. Remove use of special Maven repo for Akka Just following up on Matei's suggestion to remove the Akka repo references. Builds and the audit-release script appear OK.

[GitHub] spark pull request: SPARK-1675. Make clear whether computePrincipa...

2014-06-21 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1171 SPARK-1675. Make clear whether computePrincipalComponents requires centered data Just closing out this small JIRA, resolving with a comment change.

[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-06-21 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-46767613 Yeah, it passes for me locally. And you wouldn't think this would cause trouble by excluding deps. But it does keep failing consistently, so it's probably an actual issue

[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO

2014-06-21 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1173 SPARK-1316. Remove use of Commons IO Commons IO is actually barely used, and is not a declared dependency. This just replaces it with equivalents from the JDK and Guava.

[GitHub] spark pull request: SPARK-2229: FileAppender throws an IllegalArgume...

2014-06-22 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1174#issuecomment-46778033 +1. I literally ran into this too 6 hours ago and had the same fix. It's from the change for SPARK-1940. I think it's a good idea that tests be run on Java 6 as a result

[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-06-22 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-46781514 Ah OK, it did fail for me locally with `sbt clean assembly test`. Sorry, this did in fact have a problem. I think Akka does need the old Netty; the second commit

[GitHub] spark pull request: SPARK-1316. Remove use of Commons IO

2014-06-23 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1173#discussion_r14062363 --- Diff: core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala --- @@ -83,7 +83,7 @@ private[spark] class RollingFileAppender

[GitHub] spark pull request: [SPARK-2308][MLLIB] Add Mini-Batch KMeans Clus...

2014-06-27 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1248#issuecomment-47395422 Broad question -- this seems to duplicate a lot of KMeans.scala. Can it not be a variant rather than a separate implementation? Or at least refactor the substantial

[GitHub] spark pull request: SPARK-2293. Replace RDD.zip usage by map with ...

2014-06-27 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1250 SPARK-2293. Replace RDD.zip usage by map with predict inside. This is the only occurrence of this pattern in the examples that needs to be replaced. It only addresses the example change.

[GitHub] spark pull request: SPARK-2332 [build] add exclusion for old servl...

2014-06-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1271#issuecomment-47579967 I've been wrestling with this dependency for a while. However, I thought we specifically wanted to retain servlet-api 3.0. What problem are you seeing? It's possible

[GitHub] spark pull request: SPARK-2332 [build] add exclusion for old servl...

2014-06-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1271#issuecomment-47580459 Yeah, saw your detail on the JIRA. Yes, 2.5 should not be in there; I'm just surprised it is. The concern is knocking out 3.0 accidentally. Have you checked that it is all OK

[GitHub] spark pull request: SPARK-2332 [build] add exclusion for old servl...

2014-06-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1271#issuecomment-47589343 Hey @pdmack as far as I can see, you're correct. javax.servlet:servlet-api is not currently a dependency for most (?) Hadoop versions. It seems to come in with versions

[GitHub] spark pull request: Resolve sbt warnings during build

2014-06-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1153#issuecomment-47591417 This would be nice to get in before 1.0.1 just to clean up those ugly compiler warnings.

[GitHub] spark pull request: SPARK-2332 [build] add exclusion for old servl...

2014-06-30 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1271#issuecomment-47594705 On a very related note, have a look at https://github.com/apache/spark/pull/906 -- SPARK-1949. It addresses something very similar and I fixed the problem in it vis-a-vis

[GitHub] spark pull request: [MLLIB] SPARK-2329 Add multi-label evaluation ...

2014-07-01 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1270#discussion_r14428136 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MultilabelMetrics.scala --- @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-07-02 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-47836900 Yeah @vanzin these are actually different changes. Incarnations of the same general issue. @pwendell if you have a moment to have a second look, I think this actually

[GitHub] spark pull request: fix bin offset in DecisionTree node aggregatio...

2014-07-07 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1316#issuecomment-48208094 This was reported as https://issues.apache.org/jira/browse/SPARK-2152 (and duplicated for some reason as https://issues.apache.org/jira/browse/SPARK-2160) but the author

[GitHub] spark pull request: Fix (some of the) warnings in the test suite

2014-07-07 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1323#issuecomment-48242653 This covers a lot of the same ground as https://github.com/apache/spark/pull/1153 . It would be great to get all of this in to stop the warnings.

[GitHub] spark pull request: Resolve sbt warnings during build

2014-07-08 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1069#issuecomment-48281479 @rxin It looks like Scala is requiring developers to be more explicit about intention to use these features; these warnings become errors in Scala 2.11 actually. So my

[GitHub] spark pull request: Resolve sbt warnings during build

2014-07-08 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1069#issuecomment-48287243 @witgo Go ahead. Right now I actually don't see any warnings appear when the compiler flag is removed, so it looks like all other warnings are suppressed locally already

[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-07-08 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-48292618 Jenkins, retest this please.

[GitHub] spark pull request: Resolve sbt warnings during build

2014-07-08 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1330#issuecomment-48372372 Were all of those imports being removed not required after all to avoid warnings? As long as that's the case (i.e. no build warnings), yes, this is great IMHO

[GitHub] spark pull request: Use the scala-logging wrapper instead of the d...

2014-07-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1369#issuecomment-48707227 Wasn't this already decided against in https://github.com/apache/spark/pull/332 and again in https://github.com/apache/spark/pull/1208? Or is this not another PR for https

[GitHub] spark pull request: [SPARK-2359][MLlib] Correlations

2014-07-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1367#discussion_r14842983 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala --- @@ -0,0 +1,121 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...

2014-07-12 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1387#issuecomment-48819579 The JVM will run GC before throwing OOM. System.gc() doesn't necessarily ever invoke GC. What is the expected additional benefit then?

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48835447 That makes sense, but then it doesn't explain why a constant amount works for a given job when executor memory is low, and then doesn't work when it is high. This has

[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...

2014-07-13 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-48835727 Yes of course, lots of settings' best or even usable values are ultimately app-specific. Ideally, defaults work for lots of cases. A flat value is the simplest of models

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1393 SPARK-2465. Use long as user / item ID for ALS I'd like to float this for consideration: use longs instead of ints for user and product IDs in the ALS implementation. The main reason

[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1393#issuecomment-48843270 I think the most significant change is the Rating object. It goes from 8 (ref) + 8 (object) + 4 (int) + 4 (int) + 8 (double) = 32 bytes to 8 (ref) + 8 (object) + 4
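Spelling that accounting out as rough, JVM-dependent estimates (headers and padding vary; the Long-based variant below is the proposal, not existing code):

```scala
// Current MLlib Rating: 8 (ref) + 8 (object header) + 4 + 4 + 8 = 32 bytes.
case class Rating(user: Int, product: Int, rating: Double)

// Proposed Long-keyed form: 8 (ref) + 8 (object header) + 8 + 8 + 8 = 40 bytes.
case class LongRating(user: Long, product: Long, rating: Double)
```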

[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-07-13 Thread srowen
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/906

[GitHub] spark pull request: SPARK-1949. Servlet 2.5 vs 3.0 conflict in SBT...

2014-07-13 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/906#issuecomment-48846227 Obsoleted by SBT build changes.

[GitHub] spark pull request: SPARK-2363. Clean MLlib's sample data files

2014-07-13 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1394 SPARK-2363. Clean MLlib's sample data files (Just made a PR for this, @mengxr was the reporter of:) MLlib has sample data under several folders: 1) data/mllib 2) data/ 3) mllib

[GitHub] spark pull request: [SPARK-2471] remove runtime scope for jets3t

2014-07-14 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1402#issuecomment-48877650 It's set that way just because it is only used by Hadoop's FileSystem to access S3. Code shouldn't call it directly. Maven should therefore include it in the runtime

[GitHub] spark pull request: [SPARK-2471] remove runtime scope for jets3t

2014-07-14 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1402#issuecomment-48883754 What is the assembly then? I always took this term to mean all of the runtime dependencies together. How would I make a runnable JAR in SBT in general?

[GitHub] spark pull request: [SPARK-2471] remove runtime scope for jets3t

2014-07-14 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1402#issuecomment-48885032 The SLF4J binding is a runtime dependency, and it may be one that a down-stream consumer wants to override. But leaving it out entirely yields no runtime binding at all
