decomposition is pretty good here, yes.
--
Sean Owen | Director, Data Science | London
On Thu, Mar 6, 2014 at 3:05 PM, Debasish Das debasish.da...@gmail.com wrote:
Hi Sebastian,
Yes, Mahout ALS and Oryx run fine on the same matrix because Sean calls QR
decomposition.
But the ALS objective should
-revealing and can help detect this
situation.
That is why I had just used the QR decomposition. I agree that it is
almost surely a synthetic or flawed setup that causes this situation,
but those things do happen.
--
Sean Owen | Director, Data Science | London
On Thu, Mar 6, 2014 at 6:51 PM, Matei
in the parent pom, and ahead of the Cloudera repo. This
causes it to be tried first, which is appropriate.
Any +1 for either of those changes?
--
Sean Owen | Director, Data Science | London
On Fri, Mar 14, 2014 at 7:37 AM, Tom Graves tgraves...@yahoo.com wrote:
It appears the cloudera repo for the mqtt
their order can be controlled as desired. Child pom repos
come after parent repos and that, while it rarely makes any difference,
isn't actually desirable.
I'll prep a PR but wait for someone else to second a change like that.
--
Sean Owen | Director, Data Science | London
On Fri, Mar 14, 2014 at 7:57
PS the Cloudera cert issue was cleared up a few hours ago; give it a spin.
On Fri, Mar 14, 2014 at 8:22 AM, Sean Owen so...@cloudera.com wrote:
Yes, I'm using Maven 3.2.1. Actually, scratch that, it fails for me too
once it gets down into the MQTT module, with a clearer error
Much of this sounds related to the memory issue mentioned earlier in this
thread. Are you using a build that has fixed that? That would be by far
the most important thing here.
If the raw memory requirement is 8GB, the actual heap size necessary could
be a lot larger -- object overhead, all the other stuff.
--
Sean Owen | Director, Data Science | London
On Sat, Apr 5, 2014 at 11:06 PM, Patrick Wendell pwend...@gmail.com wrote:
If you want to submit a hot fix for this issue specifically please do. I'm
not sure why it didn't fail our build...
On Sat, Apr 5, 2014 at 2:30 PM, Debasish Das debasish.da
scala.None certainly isn't new in 2.10.4; it's ancient:
http://www.scala-lang.org/api/2.10.3/index.html#scala.None$
Surely this is some other problem?
On Sun, Apr 6, 2014 at 6:46 PM, Koert Kuipers ko...@tresata.com wrote:
also, I thought scala 2.10 was binary compatible, but it does not seem to
Good call -- indeed that same Files class has a move() method that
will try to use renameTo() and then fall back to copy() and delete()
if needed for this very reason.
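A minimal Scala sketch of that fallback pattern (not the library's own code,
just the shape of it; assumes Guava is on the classpath):

  import java.io.File
  import com.google.common.io.Files

  // Try an atomic rename first; renameTo() returns false across
  // filesystems, in which case fall back to copy-then-delete.
  def moveFile(src: File, dst: File): Unit = {
    if (!src.renameTo(dst)) {
      Files.copy(src, dst)
      if (!src.delete()) {
        throw new java.io.IOException(s"Could not delete $src after copy")
      }
    }
  }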
On Tue, Apr 15, 2014 at 6:34 AM, Ye Xianjin advance...@gmail.com wrote:
Hi, I think I have found the cause of the tests
On Mon, Apr 21, 2014 at 6:03 PM, Paul Brown p...@mult.ifario.us wrote:
- MLlib as Mahout.next would be unfortunate. There are some gems in
Mahout, but there are also lots of rocks. Setting a minimal bar of
working, correctly implemented, and documented requires a surprising amount
of work.
#1 and #2 are not relevant to the issue of jar size. These can be problems in
general, but I don't think there have been issues attributable to file
clashes. Shading has mechanisms to deal with this anyway.
#3 is a problem in general too, but is not specific to shading. Where
versions collide, build
On Tue, May 13, 2014 at 2:49 PM, Sean Owen so...@cloudera.com wrote:
On Tue, May 13, 2014 at 9:36 AM, Patrick Wendell pwend...@gmail.com wrote:
The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.0.0-rc5/
Good news
On this note, non-binding commentary:
Releases happen in local minima of change, usually created by
internally enforced code freeze. Spark is incredibly busy now due to
external factors -- recently a TLP, recently discovered by a large new
audience, ease of contribution enabled by Github. It's
On Sat, May 17, 2014 at 4:52 PM, Mark Hamstra m...@clearstorydata.com wrote:
Which of the unresolved bugs in spark-core do you think will require an
API-breaking change to fix? If there are none of those, then we are still
essentially on track for a 1.0.0 release.
I don't have a particular
I might be stating the obvious for everyone, but the issue here is not
reflection or the source of the JAR, but the ClassLoader. The basic
rules are this.
new Foo will use the ClassLoader that defines Foo. This is usually
the ClassLoader that loaded whatever it is that first referenced Foo
and
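A toy Scala illustration of that first rule, and of how reflection differs
(the class here is hypothetical, purely a sketch):

  class Foo

  // Rule 1: `new Foo` is resolved with the ClassLoader that defines Foo,
  // typically the loader of whatever class first referenced it. You don't
  // get to choose the loader at that point.
  val a = new Foo
  println(a.getClass.getClassLoader)

  // Reflection, by contrast, lets you name the loader explicitly.
  val loader = Thread.currentThread().getContextClassLoader
  val clazz = Class.forName("Foo", true, loader)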
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai
On Sun, May 18, 2014 at 11:57 PM, Sean Owen so...@cloudera.com wrote:
I might be stating the obvious for everyone, but the issue here is not
reflection or the source of the JAR, but the ClassLoader. The basic
rules
It's an Iterator in both Java and Scala. In both cases you need to
copy the stream of values into something List-like to sort it. An
Iterable would not change that (not sure the API can promise many
iterations anyway).
If you just want the equivalent of toArray, you can use a utility
method in
http://spark.apache.org/docs/0.9.1/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions
It becomes automagically available when your RDD contains pairs.
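A small sketch of both points (toy data; assumes an existing SparkContext sc):

  // Needed on older Spark versions for the implicit conversion that adds
  // PairRDDFunctions methods to an RDD of pairs.
  import org.apache.spark.SparkContext._

  val pairs = sc.parallelize(Seq(("a", 3), ("a", 1), ("b", 2)))

  // groupByKey hands you values you can only traverse; copy them into a
  // List-like collection in order to sort.
  val sortedByKey = pairs.groupByKey().mapValues(vs => vs.toSeq.sorted)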
On Tue, May 20, 2014 at 9:00 PM, GlennStrycker glenn.stryc...@gmail.com wrote:
I don't seem to have this function in my Spark
I'd like to resurrect this thread:
http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3c6d657d19-1ecf-4e92-bf15-cc4762ef9...@thekratos.com%3E
Basically when you call this particular Java-flavored overloading of
KafkaUtils.createStream:
This class was introduced in Servlet 3.0. We have in the dependency
tree some references to Servlet 2.5 and Servlet 3.0. The latter is a
superset of the former. So we standardized on depending on Servlet
3.0.
At least, that seems to have been successful in the Maven build, but
this is just
Thanks Nan, that does appear to fix it. I was using local. Can
anyone say whether that's to be expected or whether it could be a bug
somewhere?
On Fri, May 30, 2014 at 2:42 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
Hi, Sean
I had the same problem,
but when I changed MASTER="local" to
On Mon, Jun 2, 2014 at 6:05 PM, Marcelo Vanzin van...@cloudera.com wrote:
You mentioned something in your shading argument that kinda reminded
me of something. Spark currently depends on slf4j implementations and
log4j with compile scope. I'd argue that's the wrong approach if
we're talking
I suspect Patrick is right about the cause. The Maven artifact that
was released does contain this class (phew)
http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-core_2.10%7C1.0.0%7Cjar
As to the hadoop1 / hadoop2 artifact question -- agree that is often
done. Here the working
(The user@ list might be a bit better but I can see why it might look
like a dev@ question.)
Did you import org.apache.spark.mllib.linalg.Vector? I think you are
picking up Scala's Vector class instead.
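For example, the import that disambiguates the two (just an illustration):

  // Without this import, "Vector" resolves to scala.collection.immutable.Vector,
  // which is always in scope, rather than MLlib's linear algebra Vector.
  import org.apache.spark.mllib.linalg.{Vector, Vectors}

  val v: Vector = Vectors.dense(1.0, 2.0, 3.0)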
On Mon, Jun 9, 2014 at 11:57 AM, dataginjaninja
rickett.stepha...@gmail.com wrote:
The
(BCC dev@)
The example is out of date with respect to current Vector class. The
zeros() method is on Vectors. There is not currently a += operation
for Vector anymore.
To be fair the example doesn't claim this illustrates use of the Spark
Vector class but it did work with the now-deprecated
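A small sketch of what the current API looks like (the element-wise add below
is my own workaround for illustration, not something MLlib provides):

  import org.apache.spark.mllib.linalg.{Vector, Vectors}

  // zeros() lives on the Vectors factory object, not on Vector itself.
  val zero: Vector = Vectors.zeros(3)

  // There is no += on MLlib's Vector, so add element-wise via arrays.
  // Assumes both vectors are dense and the same length.
  def add(a: Vector, b: Vector): Vector =
    Vectors.dense(a.toArray.zip(b.toArray).map { case (x, y) => x + y })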
On Tue, Jul 8, 2014 at 7:29 AM, Lizhengbing (bing, BIPA)
zhengbing...@huawei.com wrote:
1) I download the imdb data from
http://komarix.org/ac/ds/Blanc__Mel.txt.bz2 and use this data to test
LBFGS
2) I find the imdb data are zero-based-index data
Since the method is for parsing the
Agree. You end up with a core and a corer core to distinguish
between and it ends up just being more complicated. This sounds like
something that doesn't need a module.
On Tue, Jul 15, 2014 at 5:59 AM, Patrick Wendell pwend...@gmail.com wrote:
Adding new build modules is pretty high overhead, so
Are you setting -Pyarn-alpha? ./sbt/sbt -Pyarn-alpha, followed by
projects, shows it as a module. You should only build yarn-stable
*or* yarn-alpha at any given time.
I don't remember the modules changing in a while. 'yarn-alpha' is for
YARN before it stabilized, circa early Hadoop 2.0.x.
This looks like a Jetty version problem actually. Are you bringing in
something that might be changing the version of Jetty used by Spark?
It depends a lot on how you are building things.
Good to specify exactly how you're building here.
On Thu, Jul 17, 2014 at 3:43 PM, Nathan Kronenfeld
Looks like a real problem. I see it too. I think the same workaround
found in ClientBase.scala needs to be used here. There, the fact that
this field can be a String or String[] is handled explicitly. In fact
I think you can just call to ClientBase for this? PR it, I say.
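The sort of explicit handling meant here, sketched in Scala (the field name and
delimiter are assumptions, not the actual ClientBase code):

  // A Hadoop field whose type differs across versions: sometimes a String,
  // sometimes a String[]. Handle both cases explicitly.
  def asEntries(field: Any): Seq[String] = field match {
    case s: String        => s.split(",").toSeq
    case a: Array[String] => a.toSeq
    case _                => Seq.empty
  }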
On Thu, Jul 17, 2014 at
, 2014 at 10:56 AM, Sean Owen so...@cloudera.com wrote:
This looks like a Jetty version problem actually. Are you bringing in
something that might be changing the version of Jetty used by Spark?
It depends a lot on how you are building things.
Good to specify exactly how you're building here
can make this change after 1642 is through.
On Thu, Jul 17, 2014 at 12:25 PM, Sean Owen so...@cloudera.com wrote:
CC tmalaska since he touched the line in question. This is a fun one.
So, here's the line of code added last week:
val channelFactory = new NioServerSocketChannelFactory
Good idea, although it gets difficult in the context of multiple
distributions. Say change X is not present in version A, but present
in version B. If you depend on X, what version can you look for to
detect it? The distribution will return A or A+X or somesuch, but
testing for A will give an
%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
Should a JIRA be opened so that dependency on hive-metastore can be
replaced by dependency on hive-exec ?
Cheers
On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com wrote:
The reason for org.spark-project.hive is that Spark relies
You missed the mllib artifact? That would certainly explain it! All I
see is core.
On Sun, Aug 3, 2014 at 10:03 AM, jun kit...@126.com wrote:
Hi,
I have started my spark exploration in IntelliJ IDEA local mode and want to
focus on the MLlib part.
but when I put some example code in IDEA, it
For any Hadoop 2.4 distro, yes, set hadoop.version but also set
-Phadoop-2.4. http://spark.apache.org/docs/latest/building-with-maven.html
On Mon, Aug 4, 2014 at 9:15 AM, Patrick Wendell pwend...@gmail.com wrote:
For hortonworks, I believe it should work to just link against the
corresponding
What would such a profile do though? In general building for a
specific vendor version means setting hadoop.version and/or
yarn.version. Any hard-coded value is unlikely to match what a
particular user needs. Setting protobuf versions and so on is already
done by the generic profiles.
In a
I think your best bet by far is to consume the Maven build as-is from
within Eclipse. I wouldn't try to export a project config from the
build as there is plenty to get lost in translation.
Certainly this works well with IntelliJ, and by the by, if you have a
choice, I would strongly recommend
(Don't use gen-idea, just open it directly as a Maven project in IntelliJ.)
On Thu, Aug 7, 2014 at 4:53 AM, Ron Gonzalez
zlgonza...@yahoo.com.invalid wrote:
So I downloaded community edition of IntelliJ, and ran sbt/sbt gen-idea.
I then imported the pom.xml file.
I'm still getting all sorts of
It's definitely just a typo. The ordered categories are A, C, B so the
other split can't be A | B, C. Just open a PR.
On Thu, Aug 7, 2014 at 2:11 AM, Matt Forbes m...@tellapart.com wrote:
I found the section on ordering categorical features really interesting,
but the A, B, C example seemed
A common approach is to separate unit tests from integration tests.
Maven has support for this distinction. I'm not sure it helps a lot
though, since it only helps you to not run integration tests all the
time. But lots of Spark tests are integration-test-like and are
important to run to know a
Try setting it to handle incremental compilation of Scala by itself
(IntelliJ) and to run its own compile server. This is in global
settings, under the Scala settings. It seems to compile incrementally
for me when I change a file or two.
On Mon, Aug 11, 2014 at 8:57 PM, Ron's Yahoo!
Maven is just telling you that there is no version 1.1.0 of
yarn-parent, and indeed, it has not been released. To build the branch
you would need to mvn install to compile and make available local
copies of artifacts along the way. (You may have these for
1.1.0-SNAPSHOT locally already). Use
Yes, master hasn't compiled for me for a few days. It's fixed in:
https://github.com/apache/spark/pull/1726
https://github.com/apache/spark/pull/2075
Could a committer sort this out?
Sean
On Fri, Aug 22, 2014 at 9:55 PM, Ted Yu yuzhih...@gmail.com wrote:
Hi,
Using the following command on
The examples aren't runnable quite like this. It's intended that they
are submitted to a cluster with spark-submit, which would among other
things provide Spark at runtime.
I think you might get them to run this way if you set master to
local[*] and indeed made a run profile that also included
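Roughly, the run profile would need to amount to something like this (names are
illustrative; Spark must also not be marked "provided" at runtime):

  import org.apache.spark.{SparkConf, SparkContext}

  // Run locally with one worker thread per core, so no cluster or
  // spark-submit is needed to supply the master.
  val conf = new SparkConf().setAppName("example").setMaster("local[*]")
  val sc = new SparkContext(conf)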
+1 I tested the source and Hadoop 2.4 release. Checksums and
signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
fail any more than usual.
FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another
project and have encountered no problems.
I notice that the 1.1.0
popular
Hadoop versions to lower the bar for users to build and test Spark.
- Patrick
On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com wrote:
+1 I tested the source and Hadoop 2.4 release. Checksums and
signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
fail
On Fri, Aug 29, 2014 at 7:42 AM, Patrick Wendell pwend...@gmail.com wrote:
In terms of vendor support for this approach - In the early days
Cloudera asked us to add CDH4 repository and more recently Pivotal and
MapR also asked us to allow linking against their hadoop-client
libraries. So we've
This isn't possible since the two versions of YARN are mutually
incompatible at compile-time. However see my comments about how this
could be restructured to be a little more standard, and so that
IntelliJ would parse it out of the box.
Still I imagine it is not worth it if YARN alpha will go
</sources>
</configuration>
</execution>
</executions>
</plugin>
On Aug 31, 2014, at 16:19, Sean Owen so...@cloudera.com wrote:
This isn't possible since the two versions of YARN are mutually
incompatible at compile-time. However see my comments
All the signatures are correct. The licensing all looks fine. The
source builds fine.
Now, let me ask about unit tests, since I had a more detailed look,
which I should have done before.
dev/run-tests fails two tests (1 Hive, 1 Kafka Streaming) for me
locally on 1.1.0-rc3. Does anyone else see
Fantastic. As it happens, I just fixed up Mahout's tests for Java 8
and observed a lot of the same type of failure.
I'm about to submit PRs for the two issues I identified. AFAICT these
3 then cover the failures I mentioned:
https://issues.apache.org/jira/browse/SPARK-3329
Hm, are you suggesting that the Spark distribution be a bag of 100
JARs? It doesn't quite seem reasonable. It does not remove version
conflicts, just pushes them to run-time, which isn't good. The
assembly is also necessary because that's where shading happens. In
development, you want to run
+1 signatures still fine, tests still pass. On Mac OS X I get the
following failure but I think it's spurious. Only mentioning it to see
if anyone else sees it. It doesn't happen on Linux.
[error] Test
org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream
failed:
Dumb question -- are you using a Spark build that includes the Kinesis
dependency? That build would have resolved conflicts like this for
you. Your app would need to use the same version of the Kinesis client
SDK, ideally.
All of these ideas are well-known, yes. In cases of super-common
This is just a line logging that one test succeeded right? I don't find
that noise. Recently I wanted to search test run logs for a test case
success and it was important that the individual test case was logged.
On Sep 6, 2014 4:13 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
It would help to point to your change. Are you sure it was only docs
and are you sure you're rebased, submitting against the right branch?
Jenkins is saying you are changing public APIs; it's not reporting
test failures. But it could well be a test/Jenkins problem.
On Sun, Sep 7, 2014 at 8:39 PM,
FWIW consensus from Cloudera folk seems to be that there's no need or
demand on this end for YARN alpha. It wouldn't have an impact even if it
were removed sooner.
It will be a small positive to reduce complexity by removing this
support, making it a little easier to develop for current YARN
On Thu, Sep 11, 2014 at 10:17 PM, Tom thubregt...@gmail.com wrote:
If I set SPARK_DRIVER_MEMORY to x GB, Spark reports
/14/09/11 15:36:41 INFO MemoryStore: MemoryStore started with capacity
~0.55*x GB/
*Question:*
Does this relate to spark.storage.memoryFraction (default 0.6), and is the
What has worked for me is to bundle log4j.properties in the root of
the application's .jar file, since log4j will look for it there, and
configuring log4j will turn off Spark's default log4j configuration.
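For instance, something like this at the jar root (src/main/resources/log4j.properties
in a Maven layout; the levels and appender are just an illustration):

  # Replaces Spark's default log4j configuration when found at the jar root.
  log4j.rootCategory=WARN, console
  log4j.appender.console=org.apache.log4j.ConsoleAppender
  log4j.appender.console.layout=org.apache.log4j.PatternLayout
  log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n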
I don't think conf/log4j.properties is going to do anything by itself,
but
I'm having trouble getting decision forests to work with categorical
features. I have a dataset with a categorical feature with 40 values.
It seems to be treated as a continuous/numeric value by the
implementation.
Digging deeper, I see there is some logic in the code that indicates
that
heuristic when storing histogram bins (and searching for
optimal splits) in the tree code.
Maybe Manish or Joseph can clarify?
On Oct 12, 2014, at 2:50 PM, Sean Owen so...@cloudera.com wrote:
I'm having trouble getting decision forests to work with categorical
features. I have a dataset
, Sean Owen so...@cloudera.com wrote:
I'm looking at this bit of code in DecisionTreeMetadata ...
val maxCategoriesForUnorderedFeature =
((math.log(maxPossibleBins / 2 + 1) / math.log(2.0)) + 1).floor.toInt
strategy.categoricalFeaturesInfo.foreach { case (featureIndex, numCategories
Great, we'll confer then. I'm using master / 1.2.0-SNAPSHOT. I'll send
some details directly under separate cover.
On Mon, Oct 13, 2014 at 7:12 PM, Joseph Bradley jos...@databricks.com wrote:
Hi Sean,
Sorry I didn't see this thread earlier! (Thanks Ameet for pinging me.)
Short version: That
The Gramian is at least positive semidefinite and will be positive definite if the
matrix is nonsingular, yes. That's usually but not always true.
The lambda*I matrix is positive definite, well, when lambda is positive.
Adding that makes the sum positive definite.
At least, lambda=0 could be rejected as invalid.
But
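Spelling out the standard argument (general linear algebra, not something
specific to this thread):

  $x^\top (A^\top A + \lambda I)\, x = \lVert Ax \rVert^2 + \lambda \lVert x \rVert^2 > 0 \quad \text{for all } x \neq 0,\ \lambda > 0$

so the regularized Gramian is positive definite whenever lambda > 0; with
lambda = 0 it is only guaranteed to be positive semidefinite.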
Maven is at least built in to OS X (well, with dev tools). You don't
even have to brew install it. Surely SBT isn't in the dev tools even?
I recall I had to install it. I'd be surprised to hear it required
zero setup.
On Mon, Oct 20, 2014 at 8:04 PM, Nicholas Chammas
nicholas.cham...@gmail.com
Oh right, we're talking about the bundled sbt of course.
And I didn't know Maven wasn't installed anymore!
On Mon, Oct 20, 2014 at 8:20 PM, Hari Shreedharan
hshreedha...@cloudera.com wrote:
The sbt executable that is in the spark repo can be used to build Spark
without any other set up (it will
This one can be resolved, I think, with a bit of help from someone who
understands SBT + plugin config:
https://issues.apache.org/jira/browse/SPARK-3359
Just a matter of figuring out how to set a property on the plugin.
This would make Java 8 javadoc work much more nicely. Minor but
useful!
Given the nature of the error, I would be really, really shocked if
Java 7u71 were actually being used in the failing build, so no, I do
not think the problem has to do with 7u71 per se.
As I'd expect I see no changes to javac in this update from 7u65, and
no chatter about crazy javac regressions.
On Fri, Oct 24, 2014 at 8:59 PM, Koert Kuipers ko...@tresata.com wrote:
mvn clean package -DskipTests takes about 30 mins for me. that's painful
since it's needed for the tests. does anyone know any tricks to speed it up?
(besides getting a better laptop). does zinc help?
Zinc helps by about
Here's a crude benchmark on a Linux box (GCE n1-standard-4). zinc gets
the assembly build in range of SBT's time.
mvn -DskipTests clean package: 15:27
(start zinc): 8:18
(rebuild): 7:08

./sbt/sbt -DskipTests clean assembly: 5:10
(start zinc): 5:11
(rebuild): 5:06
The dependencies were already
On Tue, Oct 28, 2014 at 6:18 PM, Niklas Wilcke
1wil...@informatik.uni-hamburg.de wrote:
1. via dev/run-tests script
This script executes all tests and takes several hours to finish.
Some tests failed but I can't say which of them. Should this really take
that long? Can I specify to run only
On Wed, Oct 29, 2014 at 6:02 PM, Niklas Wilcke
1wil...@informatik.uni-hamburg.de wrote:
The core tests seem to fail because of my German locale. Some tests are
locale-dependent, like the
UtilsSuite.scala
- string formatting of time durations - checks for locale-dependent
separators like .
for the failure. I tried some different configurations like
[1,1,512], [2,1,1024] etc. but couldn't get the tests to run without a
failure.
Could this be a configuration issue?
On 28.10.2014 19:03, Sean Owen wrote:
On Tue, Oct 28, 2014 at 6:18 PM, Niklas Wilcke
1wil...@informatik.uni
MAP is effectively an average over all k from 1 to min(#
recommendations, # items rated). Getting the first recommendations right is
more important than getting the last right.
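One common way to write it (the textbook form, not necessarily character-for-character
what MLlib computes):

  $\mathrm{MAP} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{\min(K, |R_u|)} \sum_{k=1}^{K} P_u(k)\, \mathrm{rel}_u(k)$

where $P_u(k)$ is precision at cutoff $k$ for user $u$, $\mathrm{rel}_u(k)$ is 1
if the $k$-th recommendation is relevant, and $R_u$ is the set of items the user
rated. An early hit raises $P_u(k)$ for every later $k$, which is why the first
recommendations matter more than the last.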
On Thu, Oct 30, 2014 at 10:21 PM, Debasish Das debasish.da...@gmail.com wrote:
Does it make sense to have a user specific K or K is
This might be a question for Xiangrui. Recently I was using
BinaryClassificationMetrics to build an AUC curve for a classifier
over a reasonably large number of points (~12M). The scores were all
probabilities, so tended to be almost entirely unique.
The computation does some operations by key,
/Partitioner.scala#L104
. Limiting the number of bins is definitely useful. Do you have time
to work on it? -Xiangrui
On Sun, Nov 2, 2014 at 9:34 AM, Sean Owen so...@cloudera.com wrote:
This might be a question for Xiangrui. Recently I was using
BinaryClassificationMetrics to build an AUC curve
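A hedged sketch of the interim workaround (quantizing scores before building the
metrics; the bin width is arbitrary and this is not the eventual built-in fix):

  import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
  import org.apache.spark.rdd.RDD

  // Round probabilities to 3 decimal places so there are at most ~1000
  // distinct thresholds, which bounds the per-key work.
  def coarseMetrics(scoreAndLabels: RDD[(Double, Double)]): BinaryClassificationMetrics = {
    val binned = scoreAndLabels.map { case (score, label) =>
      (math.rint(score * 1000) / 1000.0, label)
    }
    new BinaryClassificationMetrics(binned)
  }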
Let me crash this thread to suggest this *might* be related to this
problem I'm trying to solve:
https://issues.apache.org/jira/browse/SPARK-4196
Basically the question there is: this blank Configuration object gets
made on the driver in the saveAsNewAPIHadoopFiles call, and seems to
need to be
I don't think it's anything to do with AbstractParams. The problem is
MovieLensALS$Params, which is a case class without a default
constructor. It is not Serializable.
However you can see it gets used in an RDD function:
val ratings = sc.textFile(params.input).map { line =>
val fields =
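The usual fix, sketched (field name and parsing are assumptions; the point is only
that the non-serializable Params object must not be captured by the closure):

  // Copy the needed field into a local val so the closure captures a String,
  // not the whole Params instance.
  val inputPath = params.input
  val ratings = sc.textFile(inputPath).map { line =>
    val fields = line.split("::")
    fields
  }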
(Different topic, indulge me one more reply --)
Yes the number of JIRAs/PRs closed is unprecedented too and that
deserves big praise. The project has stuck to making all changes and
discussion in this public process, which is so powerful. Adjusted for
the sheer inbound volume, Spark is doing a
I noticed that this doesn't compile:
mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package
[error] warning: [options] bootstrap class path not set in conjunction
with -source 1.6
[error]
on isolating its inclusion to only the
newer YARN APIs.
- Patrick
On Fri, Nov 7, 2014 at 11:43 PM, Sean Owen so...@cloudera.com wrote:
I noticed that this doesn't compile:
mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean
package
[error] warning: [options] bootstrap class
Oops, that was my mistake. I moved network/shuffle into yarn, when
it's just that network/yarn should be removed from yarn-alpha. That
makes yarn-alpha work. I'll run tests and open a quick JIRA / PR for
the change.
On Sat, Nov 8, 2014 at 8:23 AM, Patrick Wendell pwend...@gmail.com wrote:
This
- Tip: when you rebase, IntelliJ will temporarily think things like the
Kafka module are being removed. Say 'no' when it asks if you want to remove
them.
- Can we go straight to Scala 2.11.4?
On Wed, Nov 12, 2014 at 5:47 AM, Patrick Wendell pwend...@gmail.com wrote:
Hey All,
I've just merged
LICENSE and NOTICE are fine. Signature and checksum is fine. I
unzipped and built the plain source distribution, which built.
However I am seeing a consistent test failure with mvn -DskipTests
clean package; mvn test. In the Hive module:
- SET commands semantics for a HiveContext *** FAILED ***
2014-11-13 11:26 GMT-08:00 Michael Armbrust mich...@databricks.com:
Hey Sean,
Thanks for pointing this out. Looks like a bad test where we should be
doing Set comparison instead of Array.
Michael
On Thu, Nov 13, 2014 at 2:05 AM, Sean Owen so...@cloudera.com wrote:
LICENSE and NOTICE
I don't think it's necessary. You're looking at the hadoop-2.4
profile, which works with anything >= 2.4. AFAIK there is no further
specialization needed beyond that. The profile sets hadoop.version to
2.4.0 by default, but this can be overridden.
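For example (the version number here is just an illustration of the override):

  mvn -Phadoop-2.4 -Dhadoop.version=2.4.1 -DskipTests clean package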
On Fri, Nov 14, 2014 at 3:43 PM, Corey Nolet
14, 2014 at 10:46 AM, Sean Owen so...@cloudera.com wrote:
I don't think it's necessary. You're looking at the hadoop-2.4
profile, which works with anything >= 2.4. AFAIK there is no further
specialization needed beyond that. The profile sets hadoop.version to
2.4.0 by default, but this can
FWIW I do not see this on master with mvn -DskipTests clean package.
I'm on OS X 10.10 and I build with Java 8 by default.
On Fri, Nov 14, 2014 at 8:17 PM, Patrick Wendell pwend...@gmail.com wrote:
A recent patch broke clean builds for me, I am trying to see how
widespread this issue is and
No, the Maven build is the main one. I would use it unless you have a need
to use the SBT build in particular.
On Nov 16, 2014 2:58 AM, Dinesh J. Weerakkody dineshjweerakk...@gmail.com
wrote:
Hi Yiming,
I believe that both SBT and MVN are supported in Spark, but SBT is preferred
(I'm not 100%
I thought I'd ask first since there's a good chance this isn't a
problem, but, I'm having a problem wherein the first batch that Spark
Streaming processes fails (due to an app problem), but then stop()
blocks for a very long time.
This bit of JobGenerator.stop() executes, since the message appears
])
Michael
On Sun, Nov 16, 2014 at 3:27 AM, Dinesh J. Weerakkody
dineshjweerakk...@gmail.com wrote:
Hi Stephen and Sean,
Thanks for correction.
On Sun, Nov 16, 2014 at 12:28 PM, Sean Owen so...@cloudera.com wrote:
No, the Maven build is the main one. I would use it unless you have a
need
I use randomSplit to make a train/CV/test set in one go. It definitely
produces disjoint data sets and is efficient. The problem is you can't
do it by key.
I am not sure why your subtract does not work. I suspect it is because
the values do not partition the same way, or they don't evaluate
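A minimal sketch of the one-pass split mentioned above (weights and seed are
arbitrary, and `data` is an assumed RDD; the three results are disjoint samples):

  val Array(train, cv, test) = data.randomSplit(Array(0.6, 0.2, 0.2), seed = 42L)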
+1 (non-binding)
Signatures and license look good. I built the plain-vanilla
distribution and ran tests. While I still see the Java 8 + Hive test
failure, I think we've established this is ignorable.
On Wed, Nov 19, 2014 at 11:51 PM, Andrew Or and...@databricks.com wrote:
I will start with a
For the interested, the SVN repo for the site is viewable at
http://svn.apache.org/viewvc/spark/site/ and to check it out, you can
svn co https://svn.apache.org/repos/asf/spark/site
I assume the best process is to make a diff and attach it to the JIRA.
How old school.
On Tue, Nov 25, 2014 at
I'm having no problems with the build or zinc on my Mac. I use zinc
from brew install zinc.
On Tue, Dec 2, 2014 at 3:02 AM, Stephen Boesch java...@gmail.com wrote:
Mac as well. Just found the problem: I had created an alias to zinc a
couple of months back. Apparently that is not happy with
Yes, they are compiled to classes in JVM bytecode just the same. You
may find the generated code from Scala looks a bit strange and uses
Scala-specific classes, but it's certainly possible to treat them like
other Java classes.
On Tue, Dec 2, 2014 at 5:22 AM, Niranda Perera nira...@wso2.com
You just run it once with zinc -start and leave it running as a
background process on your build machine. You don't have to do
anything for each build.
On Wed, Dec 3, 2014 at 3:44 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
(Nit: CDH *5.1.x*, including 5.1.3, is derived from Hadoop 2.3.x. 5.3
is based on 2.5.x)
On Fri, Dec 5, 2014 at 3:29 PM, DB Tsai dbt...@dbtsai.com wrote:
As Marcelo said, CDH5.3 is based on hadoop 2.3, so please try
https://github.com/apache/spark/blob/master/network/shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/BlockTransferMessage.java#L70
public byte[] toByteArray() {
  ByteBuf buf = Unpooled.buffer(encodedLength());
  buf.writeByte(type().id);
  encode(buf);
  assert buf.writableBytes()