How to access line fileName in loading file using the textFile method

Soheil Pourbafrani Mon, 24 Sep 2018 05:54:35 -0700

Hi, my data is stored as text files. In the processing logic, I need to
know which file each word comes from. Specifically, I need to tokenize the
words and create <fileName, word> pairs. The naive solution is to call
sc.textFile for each file, keep the fileName in a variable, and create the
pairs, but this is not efficient, and I got a StackOverflow error as the
dataset grew.
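
To make the target output concrete, here is a minimal plain-Python sketch
(no Spark involved) of the pairs I am after. The function name
file_word_pairs is hypothetical, and splitting on whitespace is only an
assumption standing in for the real tokenizer.

```python
import os


def file_word_pairs(directory):
    """Return (fileName, word) pairs for every word in every file of a directory."""
    pairs = []
    # Sort the names so the output order is deterministic.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                # Assumption: whitespace splitting stands in for the tokenizer.
                for word in line.split():
                    pairs.append((name, word))
    return pairs
```

This is the result I would like to get from Spark in a distributed way,
without looping over the files on the driver.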


So my question is: supposing all the files are in one directory and I read
them using sc.textFile("path/*"), how can I tell which file each line came
from?

Is it possible (and needed) to customize the textFile method?

