Hi, We wrote a Spark Streaming app that receives the names of HDFS files from Kafka and opens them using Hadoop's libraries. The problem with this approach is that I'm not utilizing data locality: any worker might open any file, regardless of which nodes hold its blocks. I can't open the files using the SparkContext because it is only available on the driver.
Is there a way I could open files at runtime and benefit from data locality? Thanks, Daniel