Not with Hadoop, but with Cassandra I have seen a 20x data locality improvement on partition-optimized Spark jobs.

On Sat, 14 Apr 2018 at 21:17, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> This is a sort of "your mileage may vary" type of question.
>
> In a classic Hadoop cluster, one has data locality when each node hosts
> both the Spark libraries and the HDFS data. This helps certain queries,
> such as interactive BI.
>
> However, running Spark over remote storage, say an Isilon scale-out NAS,
> instead of local HDFS becomes problematic. The full scan that Spark needs
> to do will take much longer when it is done over the network (accessing
> the remote Isilon storage) instead of through local I/O requests to HDFS.
>
> Has anyone done any comparative studies on this?
>
> Thanks,
>
> Dr Mich Talebzadeh
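
To make the local-versus-remote scan argument in the quoted message concrete, here is a rough back-of-envelope sketch in Python. It is not a benchmark: the function `scan_seconds` and every figure in it (data size, node count, per-node read rate, shared link rate) are hypothetical assumptions chosen only to illustrate the reasoning. The model assumes that with local HDFS each node reads its own share in parallel, while with remote storage the same nodes are additionally capped by one shared link to the storage array.

```python
def scan_seconds(data_gb, nodes, per_node_gb_per_s, shared_link_gb_per_s=None):
    """Estimate wall-clock seconds for a full scan (very simplified model).

    Local case: every node reads its share in parallel at per_node_gb_per_s.
    Remote case: aggregate throughput is also capped by the shared link
    between the compute nodes and the storage array.
    """
    aggregate = nodes * per_node_gb_per_s
    if shared_link_gb_per_s is not None:
        # The slower of the two (node reads vs. shared link) is the bottleneck.
        aggregate = min(aggregate, shared_link_gb_per_s)
    return data_gb / aggregate


# Hypothetical figures, for illustration only (not measurements).
local = scan_seconds(data_gb=10_000, nodes=20, per_node_gb_per_s=1.0)
remote = scan_seconds(data_gb=10_000, nodes=20, per_node_gb_per_s=1.0,
                      shared_link_gb_per_s=5.0)

print(f"local: {local:.0f}s, remote: {remote:.0f}s, "
      f"ratio: {remote / local:.1f}x")
```

Under this model the remote penalty only appears once the shared link is slower than the combined per-node read rate, so a real comparison depends on the actual network and storage throughput of the cluster in question.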