I have large files that need to be imported into HDFS for further Spark
processing. Obviously I can import them using hadoop fs; however, there is
some minor processing that needs to be performed along the way: a few
transformations, stripping the header line, and other such stuff.
I would like to stay within Spark for this rather than pre-process the
files with a separate tool.
If the file is not present on each node, Spark may not find it.
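For the header-stripping part, one common approach is to drop the first line
of the first partition after reading with sc.textFile. A minimal sketch of
the per-partition function (written in plain Python so it runs without a
cluster; in PySpark you would pass it to RDD.mapPartitionsWithIndex — the
field splitting and names here are illustrative, not from the original post):

```python
def strip_header(partition_index, lines):
    """Drop the first line of partition 0 (the CSV header),
    then apply light per-line transformations."""
    it = iter(lines)
    if partition_index == 0:
        next(it, None)  # skip the header line
    for line in it:
        # place for other minor transformations, e.g. splitting fields
        yield line.rstrip("\n").split(",")

# Simulate two partitions of a small CSV, as Spark would hand them over:
part0 = ["name,age\n", "alice,30\n"]
part1 = ["bob,25\n"]
rows = list(strip_header(0, part0)) + list(strip_header(1, part1))
# rows == [["alice", "30"], ["bob", "25"]]
```

On a cluster the same function would be applied with
sc.textFile(path).mapPartitionsWithIndex(strip_header), so the header is
removed exactly once without collecting the file to the driver.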
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Building-a-hash-table-from-a-csv-file-using-yarn-cluster-and-giving-it-to-each-executor-tp18850p18877.html
Sent from the Apache Spark User List