Hi,
We wrote a Spark Streaming app that receives HDFS file names from Kafka
and opens those files using Hadoop's libraries.
The problem with this approach is that we're not taking advantage of data
locality: any worker might open any file, regardless of which nodes hold
that file's blocks.
I can't open the files using SparkContext because it's only available on
the driver.

Is there a way I could open files at runtime and benefit from data locality?

Thanks,
Daniel
