Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2892#issuecomment-60394574
Aw, do we really still have to support Hadoop 1.0.x ...
Looks reasonable since this object is never mutated or at risk of being
changed accidentally. Make a JIRA
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2880#issuecomment-60394639
Since this duplicates your other PR can you close this one?
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2846#issuecomment-60395100
Why does this change affect the output, and what is the change? `grep -e`
already prints the whole line that matched. Why change the scalastyle settings
too? This has
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2813#issuecomment-60395462
It's not possible to see what text you changed since the text was also
moved. I am not sure it helps to move the text. Is it possible to see your
changes in place? You
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2909#issuecomment-60406786
@mengxr This is all I think is fixable without starting to take away from
the valid scaladoc. For example we'd have to remove `@group` tags and remove
several links
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2892#issuecomment-60416557
@GraceH I think for now it will be best to continue to support Hadoop
1.0.4. What is the new API you refer to, and does it work on 1.0.4? I thought
you were saying
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2647#issuecomment-60438442
I'd hate to have to hard-code more stuff like this. How about just
documenting that `hadoop.version` is required, or failing in this situation?
SBT isn't the build
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19363636
--- Diff: LICENSE ---
@@ -1,4 +1,3 @@
-
--- End diff ---
Oops, didn't mean to change that. I don't know why that happened and I'll
undo
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19364103
--- Diff: core/src/main/scala/org/apache/spark/partial/StudentTCacher.scala
---
@@ -35,7 +37,8 @@ private[spark] class StudentTCacher(confidence: Double
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19364277
--- Diff: core/src/main/scala/org/apache/spark/rdd/SampledRDD.scala ---
@@ -53,9 +53,14 @@ private[spark] class SampledRDD[T: ClassTag
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2907#issuecomment-60454077
Cool, you would know best if it's ready for external use. Looks good on
unit tests.
What if it returned `RDD[Array[T]]`? I experimented briefly with making
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2928#discussion_r19368910
--- Diff:
core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala ---
@@ -87,15 +87,19 @@ class BernoulliSampler[T](lb: Double, ub: Double
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2928#issuecomment-60529854
@mengxr Oops, I missed that this failed when I saw the SQL tests fail.
Should be OK. I rebased too for good measure.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2956#discussion_r19391269
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1204,6 +1204,8 @@ abstract class RDD[T: ClassTag](
} else
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2955#issuecomment-60559230
It seems much more desirable to just support `3g` or `200m` in this
argument, as was intended. `Utils.memoryStringToMb` can do the conversion.
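For reference, a minimal sketch of the kind of parsing involved — an illustrative re-implementation under assumed suffix rules, not Spark's actual code (the real logic lives in `org.apache.spark.util.Utils.memoryStringToMb`):

```scala
object MemoryStringDemo {
  // Illustrative "3g" / "200m" style parsing, normalized to megabytes.
  def memoryStringToMb(str: String): Int = {
    val s = str.trim.toLowerCase
    if (s.endsWith("g")) s.dropRight(1).toInt * 1024
    else if (s.endsWith("m")) s.dropRight(1).toInt
    else if (s.endsWith("k")) s.dropRight(1).toInt / 1024
    else s.toInt // assumption: a bare number is already in megabytes
  }

  def main(args: Array[String]): Unit = {
    println(memoryStringToMb("3g"))   // 3072
    println(memoryStringToMb("200m")) // 200
  }
}
```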
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2846#issuecomment-60559452
The `awk` command here is still a little wrong, as it matches any string
containing `error`. `grep \[error\]` definitely works on `scalastyle.txt` to
match `[error]`. There's
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2957#issuecomment-60563934
You can also just take a local reference to the thread, and operate on it.
The local reference will of course be null in both cases or not-null in both
cases
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2955#issuecomment-60564242
Hm, how do you mean? The rest of the code already expects this to be an
`Int` and a number of megabytes. Nothing else can be further parsing this (or
else that's a bug
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2957#issuecomment-60574364
@zsxwing You're trying to handle the case where `thread` changes between
checking `thread != null` and calling `thread.interrupt()`, right? Copying
`thread` to a local
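A minimal sketch of the local-copy pattern being suggested (hypothetical class and field names, not the PR's actual code):

```scala
class TaskRunner {
  @volatile private var thread: Thread = null

  def kill(): Unit = {
    val t = thread // read the volatile field exactly once
    if (t != null) {
      // cannot NPE even if another thread nulls `thread` now,
      // though `t` may refer to a task that has already finished
      t.interrupt()
    }
  }
}
```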
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2957#issuecomment-60578902
How would running another task X change the value of `t` in T1's stack? I
understand it could modify `thread`.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2957#issuecomment-60579828
Ah right. But wasn't this already a potential problem, and still after this
change? The caller's call to `cancel()` may be just about to happen when the
old task finishes
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2957#issuecomment-60581156
But the call to interrupt the old task happens after the new task begins on
the thread. The prior interrupt state doesn't matter; the thread is
interrupted
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2957#issuecomment-60583305
Yes I get that this is not just about preventing an NPE, which is what I
thought the intent was originally. I think this does not go far enough to
prevent the problem
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2957#issuecomment-60584591
Yeah, I think that would be enough to guarantee it, and it makes sense.
Either the interrupt happens before the task nulls the reference, in which
case the task is not done
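A sketch of the guarantee described, assuming the interrupt and the nulling of the reference are made mutually exclusive (hypothetical names, not the PR's actual code):

```scala
class TaskRunner {
  @volatile private var thread: Thread = null
  private val lock = new Object

  def kill(): Unit = lock.synchronized {
    val t = thread
    if (t != null) t.interrupt() // happens-before the reference is nulled
  }

  // called by the task itself as it completes
  def finished(): Unit = lock.synchronized {
    thread = null
    Thread.interrupted() // clear any stray interrupt before thread reuse
  }
}
```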
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2955#issuecomment-60590304
@WangTaoTheTonic heh yes that's exactly what I meant. +1!
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2955#issuecomment-60637972
It should not change overall behavior. It at least makes this
`--executor-memory` flag act like all the others, even if it is pretty
internal, as the comments suggest
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19465130
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2978#issuecomment-60743817
Update the title with SPARK- [MLLIB]
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19465228
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2978#issuecomment-60752144
This is picky now, but you might write out meanAbsoluteError instead of
saying mae. Is r2_score style-wise correct vs r2Score? (Sorry, I should have
thought of that.) Finally
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2978#discussion_r19468611
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2971#discussion_r19491335
--- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
---
@@ -89,13 +89,13 @@ private[spark] object ResultTask
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2972#issuecomment-60803920
+1 from me. The log directory change is correct, and while `SPARK_PREFIX`
looks like it matches `SPARK_HOME`, it looks more common to use the latter in
Spark.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2994#discussion_r19523818
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ProducerCache.scala
---
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2989#issuecomment-60907459
Other libraries may be using old Jetty. It is not clearly safe to do this.
Can you identify first what is bringing in Jetty 6?
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2994#discussion_r19577422
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ProducerCache.scala
---
@@ -0,0 +1,34 @@
+/*
+ * Licensed to the Apache
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/3016#issuecomment-61107082
I think you opened this accidentally. Can you close it?
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2964#discussion_r19636020
--- Diff: docs/configuration.md ---
@@ -21,16 +21,19 @@ application. These properties can be set directly on a
[SparkConf](api/scala/index.html
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/2964#discussion_r19636056
--- Diff: docs/streaming-programming-guide.md ---
@@ -68,7 +68,9 @@ import org.apache.spark._
import org.apache.spark.streaming._
import
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2814#issuecomment-61169847
@andrewor14 Yep, rebased it.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3023#discussion_r19636626
--- Diff: pom.xml ---
@@ -1191,6 +1196,7 @@
<hadoop.version>2.3.0</hadoop.version>
<protobuf.version>2.5.0</protobuf.version>
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/2814#issuecomment-61233478
Hm, that's weird, since I thought I ran the test against the default Hadoop,
and that's 1.0.4. Which test failed? (I'll go look around Jenkins too.)
I can't find
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3036#discussion_r19659505
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1683,7 +1683,7 @@ private[spark] object Utils extends Logging {
def
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/3036#discussion_r19659592
--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -103,14 +105,16 @@ class UtilsSuite extends FunSuite {
val hour = minute
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1026#issuecomment-45558366
FWIW I support this change. It matches how I have used ALS and is the
weighted regression (WR) in ALS-WR from the paper
http://www.hpl.hp.com/personal/Robert_Schreiber
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/597#issuecomment-45647190
The results depend a whole lot on the choice of parameters. Did you try
some degree of search for the best lambda / # features? It's quite possible to
make a model
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/597#issuecomment-45792263
You mentioned trying lots of values, but what did you try? What about other
test metrics -- to rule out some problem in the evaluation? Maybe you can share
some of how you
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1058#discussion_r13683274
--- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
@@ -98,7 +102,7 @@ private[spark] class CacheManager(blockManager:
BlockManager) extends
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1069#issuecomment-45918415
The changes you are removing were put in place to resolve warnings from
Scala 2.10. IIRC the code does not even compile without these in Scala 2.11.
What
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/997#discussion_r13731671
--- Diff:
core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala ---
@@ -272,6 +287,24 @@ class SparkSubmitSuite extends FunSuite
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1069#issuecomment-46002724
Yes, but then what warnings is this resolving?
I understand one compiler flag is shorter than imports. I also understand
that these language features
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1069#issuecomment-46004821
Yes, I understand that, but these warnings are already suppressed with
imports, and the advertised change here is to resolve warnings. It just
exchanges mechanisms
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1069#issuecomment-46006402
I'm more concerned with consistency than anything. We shouldn't do it both
ways, and I don't think that compiler arg should have been added. A compiler
flag is not a bad
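The two mechanisms being weighed look roughly like this (illustrative, using implicitConversions as the example feature):

```scala
// Mechanism 1: enable the language feature per file with an import,
// which suppresses the feature warning for this file only.
import scala.language.implicitConversions

object FeatureDemo {
  implicit def intToString(i: Int): String = i.toString
}
// Mechanism 2: omit the import and instead pass the compiler flag
//   -language:implicitConversions
// to scalac in the build, silencing the same warning project-wide.
```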
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1093#discussion_r13794553
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1093#discussion_r13794772
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1093#discussion_r13794760
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1093#discussion_r13829729
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1093#discussion_r13829881
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/KernelDensity.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1107#discussion_r13908816
--- Diff:
core/src/main/scala/org/apache/spark/network/ConnectionManager.scala ---
@@ -102,7 +102,24 @@ private[spark] class ConnectionManager(port: Int
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/906#issuecomment-46443977
I think this is good to go. The initial test passed, but recent ones
errored out. Just to double-check:
Jenkins, test this please.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/980#issuecomment-46444091
Pardon, could I ping this issue for review and consideration for commit? I
think it's a clean fix and improvement.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/906#issuecomment-46448348
Jenkins, test this please.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1115#issuecomment-46464659
(This PR has way more than you intend -- thousands of files changed,
hundreds of commits. You need to rebase the branch on master.)
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/917#issuecomment-46526650
(Looks like this needs a rebase?)
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1115#issuecomment-46755115
Assuming you have the `apache/spark` repository configured in your git
repository as `upstream`, you can check out your branch for this PR and `git
pull --rebase upstream
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/1170
SPARK-1996. Remove use of special Maven repo for Akka
Just following up Matei's suggestion to remove the Akka repo references.
Builds and the audit-release script appear OK.
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/1171
SPARK-1675. Make clear whether computePrincipalComponents requires centered
data
Just closing out this small JIRA, resolving with a comment change.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/906#issuecomment-46767613
Yeah, it passes for me locally. And you wouldn't think excluding deps would
cause trouble. But it does keep failing consistently, so it's probably an
actual issue
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/1173
SPARK-1316. Remove use of Commons IO
Commons IO is actually barely used, and is not a declared dependency. This
just replaces with equivalents from the JDK and Guava.
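One hypothetical example of the kind of swap described, assuming Guava's `Files.toString` as the replacement (not necessarily a change from this PR):

```scala
import java.io.File
import com.google.common.base.Charsets
import com.google.common.io.Files

object ReadFileDemo {
  // Where Commons IO would use FileUtils.readFileToString(file, "UTF-8"),
  // the Guava equivalent avoids the undeclared dependency:
  def readAll(file: File): String = Files.toString(file, Charsets.UTF_8)
}
```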
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1174#issuecomment-46778033
+1 I literally ran into this too 6 hours ago and had the same fix. It's
from the change for SPARK-1940. I think it's a good idea that tests be run on
Java 6 as a result
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/906#issuecomment-46781514
Ah OK, it did fail for me locally with `sbt clean assembly test`. Sorry,
this did in fact have a problem. I think akka does need the old Netty; the
second commit
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1173#discussion_r14062363
--- Diff:
core/src/main/scala/org/apache/spark/util/logging/RollingFileAppender.scala ---
@@ -83,7 +83,7 @@ private[spark] class RollingFileAppender
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1248#issuecomment-47395422
Broad question -- this seems to duplicate a lot of KMeans.scala. Can it not
be a variant rather than a separate implementation? Or at least refactor the
substantial
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/1250
SPARK-2293. Replace RDD.zip usage by map with predict inside.
This is the only occurrence of this pattern in the examples that needs to
be replaced. It only addresses the example change.
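The pattern change named in the title looks roughly like this — a sketch assuming an MLlib regression example with `data: RDD[LabeledPoint]` and a trained `model`:

```scala
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionModel}
import org.apache.spark.rdd.RDD

object ZipVsMapSketch {
  // Before: predict over a derived RDD, then zip the labels back on.
  def withZip(data: RDD[LabeledPoint],
              model: LinearRegressionModel): RDD[(Double, Double)] =
    data.map(_.label).zip(model.predict(data.map(_.features)))

  // After: a single map, with predict applied to each point inside it.
  def withMap(data: RDD[LabeledPoint],
              model: LinearRegressionModel): RDD[(Double, Double)] =
    data.map(p => (p.label, model.predict(p.features)))
}
```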
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1271#issuecomment-47579967
I've been wrestling with this dependency for a while. However, I thought we
specifically wanted to retain servlet-api 3.0. What problem are you seeing?
It's possible
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1271#issuecomment-47580459
Yeah, saw your detail on the JIRA. Yes, 2.5 should not be in there; I'm just
surprised it is. The concern is knocking out 3.0 accidentally. Have you checked
that it is all OK
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1271#issuecomment-47589343
Hey @pdmack, as far as I can see, you're correct. javax.servlet:servlet-api
is not currently a dependency for most (?) Hadoop versions. It seems to come in
with versions
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1153#issuecomment-47591417
This would be nice to get in before 1.0.1 just to clean up those ugly
compiler warnings.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1271#issuecomment-47594705
On a very related note, have a look at
https://github.com/apache/spark/pull/906 -- SPARK-1949. It addresses something
very similar and I fixed the problem in it vis-a-vis
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1270#discussion_r14428136
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/MultilabelMetrics.scala
---
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/906#issuecomment-47836900
Yeah @vanzin, these are actually different changes -- incarnations of the
same general issue. @pwendell if you have a moment to have a second look, I
think this actually
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1316#issuecomment-48208094
This was reported as https://issues.apache.org/jira/browse/SPARK-2152 (and
duplicated for some reason as https://issues.apache.org/jira/browse/SPARK-2160)
but the author
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1323#issuecomment-48242653
This covers a lot of the same ground as
https://github.com/apache/spark/pull/1153 . It would be great to get all of
this in to stop the warnings.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1069#issuecomment-48281479
@rxin It looks like Scala is requiring developers to be more explicit about
their intention to use these features; these warnings actually become errors in
Scala 2.11. So my
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1069#issuecomment-48287243
@witgo Go ahead. Right now I actually don't see any warnings appear when
the compiler flag is removed, so it looks like all other warnings are
suppressed locally already
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/906#issuecomment-48292618
Jenkins, retest this please.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1330#issuecomment-48372372
Were all of those imports being removed not required after all to avoid
warnings? As long as that's the case (i.e. no build warnings), yes, this is
great IMHO
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1369#issuecomment-48707227
Wasn't this already decided against in
https://github.com/apache/spark/pull/332 and again in
https://github.com/apache/spark/pull/1208? Or is this not another PR for
https
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/1367#discussion_r14842983
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/stat/correlation/Correlation.scala
---
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-48819579
The JVM will run GC before throwing OOM. `System.gc()` doesn't necessarily
ever invoke GC. What is the expected additional benefit, then?
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48835447
That makes sense, but then it doesn't explain why a constant amount works
for a given job when executor memory is low, and then doesn't work when it is
high. This has
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1391#issuecomment-48835727
Yes of course, lots of settings' best or even usable values are ultimately
app-specific. Ideally, defaults work for lots of cases. A flat value is the
simplest of models
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/1393
SPARK-2465. Use long as user / item ID for ALS
I'd like to float this for consideration: use longs instead of ints for
user and product IDs in the ALS implementation.
The main reason
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48843270
I think the most significant change is the Rating object. It goes from 8
(ref) + 8 (object) + 4 (int) + 4 (int) + 8 (double) = 32 bytes to 8 (ref) + 8
(object) + 4
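Working that arithmetic out (back-of-envelope, assuming a 64-bit JVM and an 8-byte object header; real layouts vary by JVM):

```scala
// Current: 8 (ref) + 8 (header) + 4 (Int) + 4 (Int) + 8 (Double) = 32 bytes
case class RatingInt(user: Int, product: Int, rating: Double)

// Proposed: 8 (ref) + 8 (header) + 8 + 8 + 8 = 40 bytes
case class RatingLong(user: Long, product: Long, rating: Double)
```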
Github user srowen closed the pull request at:
https://github.com/apache/spark/pull/906
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/906#issuecomment-48846227
Obsoleted by SBT build changes.
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/1394
SPARK-2363. Clean MLlib's sample data files
(Just made a PR for this, @mengxr was the reporter of:)
MLlib has sample data under several folders:
1) data/mllib
2) data/
3) mllib
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1402#issuecomment-48877650
It's set that way just because it is only used by Hadoop's FileSystem to
access S3. Code shouldn't call it directly. Maven should therefore include it
in the runtime
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1402#issuecomment-48883754
What is the assembly then? I always took this term to mean all of the
runtime dependencies together. How would I make a runnable JAR in SBT in
general?
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1402#issuecomment-48885032
The SLF4J binding is a runtime dependency, and it may be one that a
downstream consumer wants to override. But leaving it out entirely yields no
runtime binding at all