Re: Spark performance optimization

Andrew Ash Mon, 24 Feb 2014 22:47:22 -0800

Have you tried a standalone Spark cluster instead of a YARN one?  My
impression is that standalone responds faster (the JVMs are already all
running), but I haven't done any rigorous testing and have only used
standalone so far.
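
One way to test that impression rather than guess would be to time the
first job separately from later ones on each kind of cluster.  Below is a
minimal sketch in plain Python.  It assumes a hypothetical `submit_job`
callable (not part of Spark's API) that launches one matching job and
blocks until the result comes back; `time_runs` is likewise just an
illustrative name.

```python
import time
from statistics import median


def time_runs(submit_job, runs=5):
    """Return (first_run_seconds, median_of_later_runs_seconds).

    submit_job is a hypothetical zero-argument callable that launches one
    job and blocks until its result is available. The first run stands in
    for the "cold" case; the remaining runs approximate steady state.
    """
    if runs < 2:
        raise ValueError("need at least 2 runs to compare cold vs warm")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        submit_job()
        durations.append(time.perf_counter() - start)
    # Keep the first run apart so startup cost is not averaged away.
    return durations[0], median(durations[1:])
```

Running the same helper against both setups should show whether the gap
is in startup or in the work itself.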



On Mon, Feb 24, 2014 at 10:43 PM, polkosity <polkos...@gmail.com> wrote:

> As mentioned in a previous post, I have an application which relies on a
> quick response.  The application matches a client's image against a set of
> stored images.  Image features are stored in a SequenceFile and passed over
> JNI to match in OpenCV, along with the features for the client's image.  An
> id for the matched image is returned.
>
> I was using Hadoop 1.2.1 and achieved some pretty good results, but the job
> initialization was taking about 15 seconds, and we'd hoped to have a
> response in ~5 seconds.  So we moved to Hadoop 2.2, YARN & Spark.  Sadly,
> job initialization is still taking over 10 seconds (on a cluster of 10 EC2
> m1.large).
>
> Any suggestions on what I can do to bring this initialization time down?
>
> Once the executors begin work, the performance is quite good, but any
> general performance optimization tips are also welcome!
>
> Thanks.
> - Dan
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-performance-optimization-tp2017.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
