Are you using an SSD? We found that the bottleneck is not computation but disk I/O. During assembly, sbt moves a lot of class files and jars and packages them into a single flat jar. I can run assembly on my MacBook in 10 minutes; before I upgraded to an SSD, it took 30 to 40 minutes.
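As a rough sanity check on the timings quoted below, dividing CPU time by wall-clock time gives the average number of cores that were busy during the build. This is only a minimal sketch: `cores_busy` is a hypothetical helper name, not part of sbt or Spark, and the inputs are the minutes reported in the quoted message.

```shell
# Average busy cores = CPU time / wall-clock time (same units for both).
# cores_busy is a hypothetical helper for illustration only.
cores_busy() {
  awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.1f\n", cpu / wall }'
}

cores_busy 88 43   # plain 'sbt/sbt assembly': prints 2.0
cores_busy 73 25   # assembly with SPARK_YARN=true: prints 2.9
```

Both runs average roughly two to three busy cores out of 24, which is consistent with the build waiting on something other than CPU, although the ratio alone does not prove that disk I/O is the cause.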
Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Fri, Apr 25, 2014 at 12:53 PM, Williams, Ken <[email protected]> wrote:

> I’ve cloned the github repo and I’m building Spark on a pretty beefy
> machine (24 CPUs, 78GB of RAM) and it takes a pretty long time.
>
> For instance, today I did a ‘git pull’ for the first time in a week or
> two, and then doing ‘sbt/sbt assembly’ took 43 minutes of wallclock time
> (88 minutes of CPU time). After that, I did ‘SPARK_HADOOP_VERSION=2.2.0
> SPARK_YARN=true sbt/sbt assembly’ and that took 25 minutes wallclock, 73
> minutes CPU.
>
> Is that typical? Or does that indicate some setup problem in my
> environment?
>
> --
> Ken Williams, Senior Research Scientist
> WindLogics
> http://windlogics.com
>
> ------------------------------
>
> CONFIDENTIALITY NOTICE: This e-mail message is for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution of
> any kind is strictly prohibited. If you are not the intended recipient,
> please contact the sender via reply e-mail and destroy all copies of the
> original message. Thank you.
