Re: Performance of Spark when the compute and storage are separated

vincent gromakowski Sat, 14 Apr 2018 13:07:00 -0700

Not with Hadoop, but with Cassandra I have seen a 20x improvement from
data locality on partition-optimized Spark jobs.
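
A rough back-of-the-envelope sketch (not a measurement) of why the full
scan described in the quoted question can slow down once storage is
remote: with local disks, read bandwidth grows with the number of nodes,
while a remote NAS is capped by whatever the shared link or filer can
deliver. Every figure here (10 TB of data, 20 nodes, 1 GB/s per node, a
5 GB/s shared link) and the helper name scan_seconds are hypothetical,
chosen only for illustration.

```python
def scan_seconds(data_gb, nodes, per_node_gb_per_s, shared_cap_gb_per_s=None):
    """Estimate wall-clock seconds for a full scan read in parallel by `nodes`.

    per_node_gb_per_s   -- assumed read throughput each node can sustain
    shared_cap_gb_per_s -- assumed ceiling imposed by a shared network link
                           or NAS head; None means no shared bottleneck
    """
    aggregate = nodes * per_node_gb_per_s
    if shared_cap_gb_per_s is not None:
        # Remote storage: total throughput cannot exceed the shared cap.
        aggregate = min(aggregate, shared_cap_gb_per_s)
    return data_gb / aggregate


# Hypothetical cluster: 10 TB scanned by 20 nodes at 1 GB/s each.
local = scan_seconds(10_000, 20, 1.0)
# Same cluster reading through an assumed 5 GB/s shared path to the NAS.
remote = scan_seconds(10_000, 20, 1.0, shared_cap_gb_per_s=5.0)
slowdown = remote / local

print(f"local scan:  {local:.0f} s")
print(f"remote scan: {remote:.0f} s")
print(f"slowdown:    {slowdown:.1f}x")
```

With these made-up numbers the remote scan is 4x slower; real results
depend on the actual network, the storage backend, and how much of the
data the job can skip through partition pruning or caching.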


On Sat, 14 Apr 2018 at 21:17, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> This is something of a "your mileage may vary" type of question.
>
> In a classic Hadoop cluster, one has data locality when each node includes
> the Spark libraries and HDFS data. This helps certain queries like
> interactive BI.
>
> However, running Spark over remote storage, say Isilon scale-out NAS,
> instead of local HDFS becomes problematic. The full scan that Spark needs
> to do will take much longer when it is done over the network (accessing
> the remote Isilon storage) instead of as local I/O requests to HDFS.
>
> Has anyone done any comparative studies on this?
>
>
> Thanks
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
