I built my Spark Streaming app on my local machine, and an initial step in
log processing is filtering out rows with spam IPs. I use the following
code, which works locally:
// Build a HashSet of bad IPs read in from the file
val badIpSource = scala.io.Source.fromFile("wrongIPlist.csv")
val badIPs = badIpSource.getLines().toSet
badIpSource.close()
On yarn-cluster, the driver and executors run on arbitrary cluster nodes, so a
local path like this only resolves if the file exists at that same path on
every node. Either ship the file alongside the job with spark-submit --files,
or read it once on the driver and broadcast the resulting set to the executors.
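A minimal sketch of the broadcast approach follows. The input path, the
`logLines` name, and the assumption that the first space-delimited field of
each log line is the client IP are all placeholders, not from the original
message:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SpamFilter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SpamFilter"))

    // Read the bad-IP list once, on the driver only
    val src = scala.io.Source.fromFile("wrongIPlist.csv")
    val badIPs: Set[String] = src.getLines().map(_.trim).toSet
    src.close()

    // Ship the set to every executor exactly once
    val badIPsBc = sc.broadcast(badIPs)

    // Hypothetical input; assumes the first space-delimited field is the IP
    val logLines = sc.textFile("hdfs:///logs/access.log")
    val clean =
      logLines.filter(line => !badIPsBc.value.contains(line.split(" ")(0)))

    clean.saveAsTextFile("hdfs:///logs/access-clean")
    sc.stop()
  }
}
```

With --files instead, the file is copied into each executor's working
directory and can be opened there by its bare name via SparkFiles.get.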
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Building-a-hash-table-from-a-csv-file-using-yarn-cluster-and-giving-it-to-each-executor-tp18850p18877.html
Sent from the Apache Spark User List