Welcoming two new committers

2014-08-08 Thread Matei Zaharia
Hi everyone, The PMC recently voted to add two new committers and PMC members: Joey Gonzalez and Andrew Or. Both have been huge contributors in the past year -- Joey on much of GraphX as well as quite a bit of the initial work in MLlib, and Andrew on Spark Core. Join me in welcoming them as

Re: Welcoming two new committers

2014-08-08 Thread Christopher Nguyen
+1 Joey & Andrew :) -- Christopher T. Nguyen Co-founder & CEO, Adatao http://adatao.com [ah-'DAY-tao] linkedin.com/in/ctnguyen On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez jegon...@eecs.berkeley.edu wrote: Hi Everyone, Thank you for inviting me to be a committer. I look forward to

Re: Fine-Grained Scheduler on Yarn

2014-08-08 Thread Sandy Ryza
Hi Jun, Spark currently doesn't have that feature, i.e. it aims for a fixed number of executors per application regardless of resource usage, but it's definitely worth considering. We could start more executors when we have a large backlog of tasks and shut some down when we're underutilized.
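
The policy Sandy outlines (grow when there is a task backlog, shrink when underutilized) could be sketched roughly as below. This is only an illustration and not Spark code: the function names, the `tasks_per_executor` parameter, and the min/max bounds are all hypothetical assumptions.

```python
import math


def target_executors(pending_tasks, running_tasks, tasks_per_executor,
                     min_executors=1, max_executors=100):
    """Return how many executors a hypothetical scheduler would aim for.

    Assumption (not from Spark): each executor can run `tasks_per_executor`
    tasks at once, and we want just enough executors to cover all pending
    and running tasks, clamped to [min_executors, max_executors].
    """
    if tasks_per_executor <= 0:
        raise ValueError("tasks_per_executor must be positive")
    needed = math.ceil((pending_tasks + running_tasks) / tasks_per_executor)
    return max(min_executors, min(max_executors, needed))


def scaling_decision(current_executors, target):
    """Positive result: executors to start; negative: executors to shut down."""
    return target - current_executors
```

A real policy would probably also need some damping (for example, waiting for a backlog to persist before scaling up) so that executors are not started and stopped on every small fluctuation.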

Re: [SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-08 Thread Debasish Das
Hi Patrick, I am testing the 1.1 branch but I see a lot of protobuf warnings while building the jars: [warn] Class com.google.protobuf.Parser not found - continuing with a stub. [warn] Class com.google.protobuf.Parser not found - continuing with a stub. [warn] Class com.google.protobuf.Parser

Re: [SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-08 Thread Prashant Sharma
Yeah, this should be changed. You can change the banner in the repl, in the printWelcome function. Mind sending a PR? I think this should be a one-place change in the future (not sure how feasible it is). Volunteers? Prashant Sharma On Fri, Aug 8, 2014 at 12:48 PM, Debasish Das

Re: [SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-08 Thread witgo
You need the -Phadoop-2.3 parameter, e.g.: ./make-distribution.sh -Dhadoop.version=2.3.0-cdh5.0.2 -Dyarn.version=2.3.0-cdh5.0.2 -Phadoop-2.3 -Pyarn -- Original message -- From: Debasish Das debasish.da...@gmail.com; Sent: Fri, Aug 8, 2014, 3:18; To:

Re: Fine-Grained Scheduler on Yarn

2014-08-08 Thread Jun Feng Liu
Yes, I think we need resource control at both levels (the number of containers, and dynamically changing container resources), which can make resource utilization much more effective, especially when more types of workloads share the same infrastructure. Is there any way I can observe the tasks

Re: Fine-Grained Scheduler on Yarn

2014-08-08 Thread Sandy Ryza
I think that would be useful work. I don't know the minute details of this code, but in general TaskSchedulerImpl keeps track of pending tasks. Tasks are organized into TaskSets, each of which corresponds to a particular stage. Each TaskSet has a TaskSetManager, which directly tracks the

Re: Welcoming two new committers

2014-08-08 Thread Xiangrui Meng
Congrats, Joey & Andrew!! -Xiangrui On Fri, Aug 8, 2014 at 12:14 AM, Christopher Nguyen c...@adatao.com wrote: +1 Joey & Andrew :) -- Christopher T. Nguyen Co-founder & CEO, Adatao http://adatao.com [ah-'DAY-tao] linkedin.com/in/ctnguyen On Thu, Aug 7, 2014 at 10:39 PM, Joseph Gonzalez

Re: Welcoming two new committers

2014-08-08 Thread Prashant Sharma
Congratulations Andrew and Joey. Prashant Sharma On Fri, Aug 8, 2014 at 2:10 PM, Xiangrui Meng men...@gmail.com wrote: Congrats, Joey & Andrew!! -Xiangrui On Fri, Aug 8, 2014 at 12:14 AM, Christopher Nguyen c...@adatao.com wrote: +1 Joey & Andrew :) -- Christopher T. Nguyen

Unit tests in 5 minutes

2014-08-08 Thread Nicholas Chammas
Howdy, Do we think it's both feasible and worthwhile to invest in getting our unit tests to finish in under 5 minutes (or something similarly brief) when run by Jenkins? Unit tests currently seem to take anywhere from 30 min to 2 hours. As people add more tests, I imagine this time will only

Re: Unit tests in 5 minutes

2014-08-08 Thread Sean Owen
A common approach is to separate unit tests from integration tests. Maven has support for this distinction. I'm not sure it helps a lot though, since it only helps you to not run integration tests all the time. But lots of Spark tests are integration-test-like and are important to run to know a

Re: Unit tests in 5 minutes

2014-08-08 Thread Ted Yu
How about using the parallel execution feature of maven-surefire-plugin (assuming all the tests were made parallel-friendly)? http://maven.apache.org/surefire/maven-surefire-plugin/examples/fork-options-and-parallel-execution.html Cheers On Fri, Aug 8, 2014 at 9:14 AM, Sean Owen

Re: Unit tests in 5 minutes

2014-08-08 Thread Reynold Xin
ScalaTest actually has support for parallelization built-in. We can use that. The main challenge is to make sure all the test suites can work in parallel when running alongside each other. On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu yuzhih...@gmail.com wrote: How about using parallel execution

Re: Unit tests in 5 minutes

2014-08-08 Thread Reynold Xin
Nick, Would you like to file a ticket to track this? I think the first baby step is to log the amount of time each test case takes. This is supposed to happen already (see the flag), but somehow the times are not showing. If you have some time to figure that out, that'd be great.
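
As a rough illustration of the "log how long each test case takes" step, here is a minimal sketch using Python's standard `unittest` module. It is an analogy only: Spark's tests run under ScalaTest, and the names `TimingResult`, `slowest`, `run_with_timings`, and `ExampleSuite` are hypothetical.

```python
import time
import unittest


class TimingResult(unittest.TestResult):
    """Test result that records the wall-clock duration of each test case."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.timings = {}    # test id -> elapsed seconds
        self._started = {}   # test id -> start timestamp

    def startTest(self, test):
        self._started[test.id()] = time.perf_counter()
        super().startTest(test)

    def stopTest(self, test):
        super().stopTest(test)
        started = self._started.pop(test.id())
        self.timings[test.id()] = time.perf_counter() - started


def slowest(timings, n=10):
    """Return the n slowest (test id, seconds) pairs, slowest first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:n]


class ExampleSuite(unittest.TestCase):
    """Toy suite used only to demonstrate the timing report."""

    def test_fast(self):
        self.assertEqual(1 + 1, 2)

    def test_slow(self):
        time.sleep(0.05)  # stand-in for an expensive setup step
        self.assertTrue(True)


def run_with_timings(test_case_class):
    """Run every test in the given TestCase class and return the result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case_class)
    result = TimingResult()
    suite.run(result)
    return result
```

Calling `slowest(run_with_timings(ExampleSuite).timings)` lists the tests ordered by duration, which is the kind of report that would show where optimization effort is best spent.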

1.1.0-SNAPSHOT possible regression

2014-08-08 Thread Ron Gonzalez
Hi, I have a running spark app against the released version of 1.0.1. I recently decided to try and upgrade to the trunk version. Interestingly enough, after building the 1.1.0-SNAPSHOT assembly, replacing it as my assembly in my app caused errors. In particular, it seems Kryo serialization

Re: 1.1.0-SNAPSHOT possible regression

2014-08-08 Thread Reynold Xin
Looks like you didn't actually paste the exception message. Do you mind doing that? On Fri, Aug 8, 2014 at 10:14 AM, Reynold Xin r...@databricks.com wrote: Pasting a better formatted trace: at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180) at

Re: 1.1.0-SNAPSHOT possible regression

2014-08-08 Thread Reynold Xin
Pasting a better formatted trace: at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347) at scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:137) at

Re: Unit tests in 5 minutes

2014-08-08 Thread Nicolas Liochon
fwiw, when we did this work in HBase, we categorized the tests. Then some tests can share a single jvm, while some others need to be isolated in their own jvm. Nevertheless surefire can still run them in parallel by starting/stopping several jvm. Nicolas On Fri, Aug 8, 2014 at 7:10 PM, Reynold

Re: 1.1.0-SNAPSHOT possible regression

2014-08-08 Thread Ron Gonzalez
Oops, the exception is below. It works in local mode because TorrentBroadcast checks if !isLocal, and that's the only time the broadcast actually happens. It really seems as if the Kryo wrapper didn't kick in for some reason. Do we have a unit test that tests the Kryo serialization

Re: 1.1.0-SNAPSHOT possible regression

2014-08-08 Thread Reynold Xin
Yes, I'm pretty sure it doesn't actually use the right serializer in TorrentBroadcast: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L232 And TorrentBroadcast is turned on by default for 1.1 right now. Do you want to submit a

Re: 1.1.0-SNAPSHOT possible regression

2014-08-08 Thread Reynold Xin
I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-2928 Let me know if you need help with it. On Fri, Aug 8, 2014 at 10:40 AM, Reynold Xin r...@databricks.com wrote: Yes, I'm pretty sure it doesn't actually use the right serializer in TorrentBroadcast:

Re: 1.1.0-SNAPSHOT possible regression

2014-08-08 Thread Reynold Xin
Actually apparently there is a pull request for it. Thanks for reporting! https://github.com/apache/spark/pull/1836 On Fri, Aug 8, 2014 at 10:50 AM, Ron Gonzalez zlgonza...@yahoo.com wrote: Sure let me give it a try. Any tips? I've only started looking at Spark code more closely recently.

Re: Unit tests in 5 minutes

2014-08-08 Thread Matei Zaharia
Just as a note, when you're developing stuff, you can use test-only in sbt, or the equivalent feature in Maven, to run just some of the tests. This is what I do, I don't wait for Jenkins to run things. 90% of the time if it passes the tests that I know could break stuff, it will pass all of

Re: Unit tests in 5 minutes

2014-08-08 Thread Patrick Wendell
I dug around this a bit a while ago, I think if someone sat down and profiled the tests it's likely we could find some things to optimize. In particular, there may be overheads in starting up a local spark context that could be minimized and speed up all the tests. Also, there are some tests

Re: Unit tests in 5 minutes

2014-08-08 Thread Josh Rosen
One simple optimization might be to disable the application web UI in tests that don’t need it.  When running tests on my local machine while also running another Spark shell, I’ve noticed that the test logs fill up with errors when the web UI attempts to bind to the default port, fails, and

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
The API change doesn't seem major. I have changed it locally and compiled it, but have not tested it yet. The major problem is still how to solve the hive-exec jar dependency. I am willing to help on this issue. Is it better to stick to the same approach as hive-0.12 until hive-exec is cleaned up enough to switch back? --

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
I can compile with no error, but my patch also includes other stuff. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p7775.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Here is the patch. Please ignore the pom.xml-related change, which is just for compiling purposes. I need to work on this further, based on Wandou's previous work. -- View this message in context:

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Sorry, I forgot to upload the files. I have never posted before :) hive.diff http://apache-spark-developers-list.1001551.n3.nabble.com/file/n/hive.diff -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p.html

Re: spark-shell is broken! (bad option: '--master')

2014-08-08 Thread chutium
Does no one use spark-shell in the master branch? I created a PR as a follow-up commit to SPARK-2678 and PR #1801: https://github.com/apache/spark/pull/1861 -- View this message in context:

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Michael Armbrust
Could you make a PR as described here: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Fri, Aug 8, 2014 at 1:57 PM, Zhan Zhang zhaz...@gmail.com wrote: Sorry, forget to upload files. I have never posted before :) hive.diff

Re: Working Formula for Hive 0.13?

2014-08-08 Thread Zhan Zhang
Attached the diff to the PR SPARK-2706. I am currently working on this problem. If somebody else is also working on this, we can share the load. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Working-Formula-for-Hive-0-13-tp7551p7782.html Sent from the

Re: Unit tests in 5 minutes

2014-08-08 Thread Patrick Wendell
Josh - that was actually fixed recently (we just bind to a random port when running tests). On Fri, Aug 8, 2014 at 12:00 PM, Josh Rosen rosenvi...@gmail.com wrote: One simple optimization might be to disable the application web UI in tests that don't need it. When running tests on my local
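
The general technique of binding to a random port in tests can be sketched as follows. This is a generic illustration of asking the operating system for a free port, not the actual Spark change (which the thread does not show); the helper name `bind_to_free_port` is hypothetical.

```python
import socket


def bind_to_free_port(host="127.0.0.1"):
    """Bind a listening TCP socket to a port chosen by the operating system.

    Passing port 0 asks the OS to pick any unused port, so two test runs
    on the same machine do not fight over a fixed default port.
    The caller owns the returned socket and must close it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, 0))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    port = sock.getsockname()[1]  # the port the OS actually assigned
    return sock, port
```

Returning the bound socket together with the port (rather than only the port number) avoids the race in which another process grabs the port between discovery and use.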

Re: spark-shell is broken! (bad option: '--master')

2014-08-08 Thread Cheng Lian
Just opened a PR based on the branch Patrick mentioned for this issue https://github.com/apache/spark/pull/1864 On Sat, Aug 9, 2014 at 6:48 AM, Patrick Wendell pwend...@gmail.com wrote: Cheng Lian also has a fix for this. I've asked him to make a PR - he is on China time so it probably won't