Hi everyone,
The PMC recently voted to add two new committers and PMC members: Joey Gonzalez
and Andrew Or. Both have been huge contributors in the past year -- Joey on
much of GraphX as well as quite a bit of the initial work in MLlib, and Andrew
on Spark Core. Join me in welcoming them as committers and PMC members!
+1 Joey Andrew :)
--
Christopher T. Nguyen
Co-founder CEO, Adatao http://adatao.com [ah-'DAY-tao]
linkedin.com/in/ctnguyen
On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez jegon...@eecs.berkeley.edu
wrote:
Hi Everyone,
Thank you for inviting me to be a committer. I look forward to
Hi Jun,
Spark currently doesn't have that feature, i.e. it aims for a fixed number
of executors per application regardless of resource usage, but it's
definitely worth considering. We could start more executors when we have a
large backlog of tasks and shut some down when we're underutilized.
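For context, today the number of executors is simply fixed when the application is submitted; a minimal illustration, assuming YARN mode and the standard --num-executors flag (the class name and jar below are just placeholders):
  # executor count is chosen once at submit time and never revisited against the task backlog
  ./bin/spark-submit --master yarn --num-executors 8 \
    --executor-memory 4g --class com.example.MyApp myapp.jar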
Hi Patrick,
I am testing the 1.1 branch, but I see a lot of protobuf warnings while
building the jars:
[warn] Class com.google.protobuf.Parser not found - continuing with a stub.
[warn] Class com.google.protobuf.Parser not found - continuing with a stub.
[warn] Class com.google.protobuf.Parser
Yeah, this should be changed. You can change the banner in the repl's
printWelcome function. Mind sending a PR?
I think this should be a one-place change in the future (not sure how
feasible that is). Volunteers?
Prashant Sharma
On Fri, Aug 8, 2014 at 12:48 PM, Debasish Das
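For anyone who picks this up, a rough sketch of the kind of change being discussed: the banner lives in the repl's printWelcome, and SPARK_VERSION below just stands in for whatever single version constant the "one place" would be (it is an assumption, not an existing identifier):
  // sketch only: the repl's welcome banner, reading the version from one place
  override def printWelcome(): Unit = {
    echo(s"Welcome to Spark version $SPARK_VERSION")
  }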
You need to add the -Phadoop-2.3 profile,
e.g.:
./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.0.2
-Dyarn.version=2.3.0-cdh5.0.2 -Phadoop-2.3 -Pyarn
------------------ Original Message ------------------
From: Debasish Das <debasish.da...@gmail.com>
Sent: 2014-08-08 (Fri) 3:18
To:
Yes, I think we need both levels of resource control (container numbers and
dynamically changing container resources), which can make resource
utilization much more effective, especially when more types of workloads
share the same infrastructure.
Is there any way I can observe the tasks
I think that would be useful work. I don't know the minute details of this
code, but in general TaskSchedulerImpl keeps track of pending tasks. Tasks
are organized into TaskSets, each of which corresponds to a particular
stage. Each TaskSet has a TaskSetManager, which directly tracks the
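To make that bookkeeping concrete, here is a purely illustrative sketch (made-up stand-ins, not Spark's actual classes) of the relationship just described: one manager per TaskSet, one TaskSet per stage, and "pending" meaning tasks not yet launched:
  // illustrative stand-ins for TaskSchedulerImpl / TaskSetManager
  case class Task(id: Long)
  case class TaskSet(stageId: Int, tasks: Seq[Task])

  class TaskSetManagerSketch(val taskSet: TaskSet) {
    private var launched = 0
    def launchNext(): Option[Task] =
      if (launched < taskSet.tasks.size) { launched += 1; Some(taskSet.tasks(launched - 1)) }
      else None
    def pending: Int = taskSet.tasks.size - launched   // tasks not yet handed to executors
  }

  class TaskSchedulerSketch {
    private val managers = scala.collection.mutable.ArrayBuffer.empty[TaskSetManagerSketch]
    def submitTaskSet(ts: TaskSet): Unit = managers += new TaskSetManagerSketch(ts)
    def totalPending: Int = managers.map(_.pending).sum   // the number you'd want to observe
  }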
Congrats, Joey Andrew!!
-Xiangrui
On Fri, Aug 8, 2014 at 12:14 AM, Christopher Nguyen c...@adatao.com wrote:
+1 Joey Andrew :)
--
Christopher T. Nguyen
Co-founder CEO, Adatao http://adatao.com [ah-'DAY-tao]
linkedin.com/in/ctnguyen
On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez
Congratulations Andrew and Joey.
Prashant Sharma
On Fri, Aug 8, 2014 at 2:10 PM, Xiangrui Meng men...@gmail.com wrote:
Congrats, Joey Andrew!!
-Xiangrui
On Fri, Aug 8, 2014 at 12:14 AM, Christopher Nguyen c...@adatao.com
wrote:
+1 Joey Andrew :)
--
Christopher T. Nguyen
Howdy,
Do we think it's both feasible and worthwhile to invest in getting our unit
tests to finish in under 5 minutes (or something similarly brief) when run
by Jenkins?
Unit tests currently seem to take anywhere from 30 min to 2 hours. As
people add more tests, I imagine this time will only grow.
A common approach is to separate unit tests from integration tests.
Maven has support for this distinction. I'm not sure it helps a lot
though, since it only helps you to not run integration tests all the
time. But lots of Spark tests are integration-test-like and are
important to run to know a
How about using the parallel execution feature of maven-surefire-plugin
(assuming all the tests were made parallel-friendly)?
http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html
Cheers
On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen
ScalaTest actually has support for parallelization built-in. We can use
that.
The main challenge is to make sure all the test suites can work in parallel
when running alongside each other.
On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu yuzhih...@gmail.com wrote:
How about using parallel execution
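As a reference point, a minimal sketch of the ScalaTest route, assuming the suites really are independent (which is exactly the open question above): suite-level parallelism is an sbt setting, and within-suite parallelism is a mixin.
  // build.sbt: parallelExecution in Test := true   (run suites in parallel)

  import org.scalatest.{FunSuite, ParallelTestExecution}

  // ParallelTestExecution runs the tests inside one suite concurrently
  class ExampleSuite extends FunSuite with ParallelTestExecution {
    test("independent case 1") { assert(1 + 1 === 2) }
    test("independent case 2") { assert("ab".reverse === "ba") }
  }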
Nick,
Would you like to file a ticket to track this?
I think the first baby step is to log the amount of time each test case
takes. This is supposed to happen already (see the flag), but somehow the
times are not showing. If you have some time to figure that out, that'd be
great.
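One hedged pointer, in case the flag in question is ScalaTest's duration reporting: the usual way to surface per-test times through sbt is the D option on the stdout reporter, e.g.
  // build.sbt: ask ScalaTest's stdout reporter to print each test's duration
  testOptions in Test += Tests.Argument(TestFrameworks.ScalaTest, "-oD")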
Hi,
I have a running spark app against the released version of 1.0.1. I recently
decided to try and upgrade to the trunk version. Interestingly enough, after
building the 1.1.0-SNAPSHOT assembly, replacing it as my assembly in my app
caused errors. In particular, it seems Kryo serialization
Looks like you didn't actually paste the exception message. Do you mind
doing that?
On Fri, Aug 8, 2014 at 10:14 AM, Reynold Xin r...@databricks.com wrote:
Pasting a better formatted trace:
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180)
at
Pasting a better formatted trace:
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at
scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:137)
at
fwiw, when we did this work in HBase, we categorized the tests. Some tests
can then share a single JVM, while others need to be isolated in their own
JVM. Nevertheless, Surefire can still run them in parallel by
starting/stopping several JVMs.
Nicolas
On Fri, Aug 8, 2014 at 7:10 PM, Reynold
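The ScalaTest analogue of the JUnit categories HBase uses would be tags; a minimal sketch (the tag name below is made up, not an existing Spark identifier) of how a suite could mark its slow, integration-style tests so CI can schedule or isolate them differently:
  import org.scalatest.{FunSuite, Tag}

  // hypothetical tag name, for illustration only
  object SlowIntegrationTest extends Tag("org.apache.spark.tags.SlowIntegrationTest")

  class MixedSuite extends FunSuite {
    test("fast unit-level check") { assert(Seq(1, 2, 3).sum === 6) }
    test("spins up heavier machinery", SlowIntegrationTest) {
      // slow, isolation-sensitive work would go here
    }
  }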
Oops, the exception is below.
In local mode it works, which makes sense since TorrentBroadcast checks if !isLocal,
and that's the only time the broadcast actually happens. It really seems as if
the Kryo wrapper didn't kick in for some reason. Do we have a unit test that
tests the Kryo serialization
Yes, I'm pretty sure it doesn't actually use the right serializer in
TorrentBroadcast:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L232
And TorrentBroadcast is turned on by default for 1.1 right now. Do you want
to submit a
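For anyone picking this up, a hedged sketch (not the actual patch) of the general shape of the fix: obtain the configured serializer from SparkEnv rather than hardcoding Java serialization when the broadcast value is chunked.
  object BroadcastSerializationSketch {
    import java.nio.ByteBuffer
    import scala.reflect.ClassTag
    import org.apache.spark.SparkEnv

    // sketch: serialize a broadcast value with whatever spark.serializer is configured
    def serializeBroadcastValue[T: ClassTag](value: T): ByteBuffer = {
      val ser = SparkEnv.get.serializer.newInstance()  // honors Kryo when it is the configured serializer
      ser.serialize(value)
    }
  }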
I created a JIRA ticket to track this:
https://issues.apache.org/jira/browse/SPARK-2928
Let me know if you need help with it.
On Fri, Aug 8, 2014 at 10:40 AM, Reynold Xin r...@databricks.com wrote:
Yes, I'm pretty sure it doesn't actually use the right serializer in
TorrentBroadcast:
Actually apparently there is a pull request for it. Thanks for reporting!
https://github.com/apache/spark/pull/1836
On Fri, Aug 8, 2014 at 10:50 AM, Ron Gonzalez zlgonza...@yahoo.com wrote:
Sure let me give it a try. Any tips? I've only started looking at Spark
code more closely recently.
Just as a note, when you're developing stuff, you can use test-only in sbt,
or the equivalent feature in Maven, to run just some of the tests. This is what
I do; I don't wait for Jenkins to run things. 90% of the time, if it passes the
tests that I know could break stuff, it will pass all of
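For the record, the invocation looks roughly like this (the suite name is only an illustration); Maven has an equivalent via the ScalaTest plugin's suite selection:
  sbt/sbt "test-only org.apache.spark.rdd.RDDSuite"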
I dug around this a bit a while ago. I think if someone sat down and
profiled the tests, it's likely we could find some things to optimize.
In particular, there may be overheads in starting up a local Spark
context that could be minimized, which would speed up all the tests. Also,
there are some tests
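One concrete way to attack the context-startup overhead, sketched under the assumption that suites can tolerate sharing (the trait below is illustrative, not an existing helper): create one local SparkContext per suite rather than per test.
  import org.scalatest.{BeforeAndAfterAll, Suite}
  import org.apache.spark.SparkContext

  // hypothetical helper: share one local context across all tests in a suite
  trait SharedLocalSparkContext extends BeforeAndAfterAll { self: Suite =>
    @transient var sc: SparkContext = _
    override def beforeAll(): Unit = { sc = new SparkContext("local[2]", suiteName) }
    override def afterAll(): Unit = { if (sc != null) sc.stop() }
  }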
One simple optimization might be to disable the application web UI in tests
that don’t need it. When running tests on my local machine while also running
another Spark shell, I’ve noticed that the test logs fill up with errors when
the web UI attempts to bind to the default port, fails, and
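A minimal sketch of that idea, assuming the standard spark.ui.enabled flag (suites that actually exercise the UI would of course keep it on):
  import org.apache.spark.{SparkConf, SparkContext}

  val conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("SomeSuite")
    .set("spark.ui.enabled", "false")   // don't start the web UI, so no port-binding noise in test logs
  val sc = new SparkContext(conf)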
The API change doesn't seem major. I have changed it locally and compiled, but
not tested yet. The major problem is still how to solve the hive-exec jar
dependency. I am willing to help on this issue. Is it better to stick to the
same approach as hive-0.12 until hive-exec is cleaned up enough to switch back?
I can compile with no error, but my patch also includes other stuff.
Here is the patch. Please ignore the pom.xml-related change, which is just for
compilation purposes. I need to do further work on this one based on Wandou's
previous work.
Sorry, forgot to upload the file. I have never posted before :) hive.diff
http://apache-spark-developers-list.1001551.n3.nabble.com/file/n/hive.diff
Does no one use spark-shell on the master branch?
I created a PR as a follow-up commit to SPARK-2678 and PR #1801:
https://github.com/apache/spark/pull/1861
Could you make a PR as described here:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
On Fri, Aug 8, 2014 at 1:57 PM, Zhan Zhang zhaz...@gmail.com wrote:
Sorry, forgot to upload the file. I have never posted before :) hive.diff
Attached is the diff for the SPARK-2706 PR. I am currently working on this problem.
If somebody else is also working on this, we can share the load.
Josh - that was actually fixed recently (we just bind to a random port
when running tests).
On Fri, Aug 8, 2014 at 12:00 PM, Josh Rosen rosenvi...@gmail.com wrote:
One simple optimization might be to disable the application web UI in tests
that don't need it. When running tests on my local
Just opened a PR based on the branch Patrick mentioned for this issue
https://github.com/apache/spark/pull/1864
On Sat, Aug 9, 2014 at 6:48 AM, Patrick Wendell pwend...@gmail.com wrote:
Cheng Lian also has a fix for this. I've asked him to make a PR - he
is on China time so it probably won't