Hello,
I did not fully understand your question.
However, I can tell you that calling .collect() on an RDD brings all of its
data to the driver node. For this reason, you should use it only when the
RDD is very small.
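
As a rough illustration of the point about .collect(), here is a minimal sketch. It assumes a local SparkSession and an RDD of plain strings; the sample data and the looksValid predicate are hypothetical stand-ins, not your case classes or your validate_hostname. It shows take(n) for peeking at a few rows, and filtering on the executors so that only the surviving rows are collected.

```scala
import org.apache.spark.sql.SparkSession

object CollectSketch {
  // Hypothetical predicate used only for illustration: rejects empty
  // strings and strings containing a space.
  def looksValid(host: String): Boolean =
    host.nonEmpty && !host.contains(" ")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just so the sketch can run on one machine.
    val spark = SparkSession.builder()
      .appName("collect-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical sample data standing in for the lines of the text file.
    val hosts = sc.parallelize(Seq("host-a.example.com", "bad host", "host-b.example.com"))

    // take(n) brings at most n elements to the driver, unlike collect().
    hosts.take(2).foreach(println)

    // filter runs on the executors; collect() then moves only the rows
    // that passed the predicate to the driver.
    hosts.filter(looksValid).collect().foreach(println)

    spark.stop()
  }
}
```

If the filtered result may still be large, writing it out with saveAsTextFile instead of collecting it keeps the data off the driver entirely.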
Your function "validate_hostname" depends on a DataFrame. It's not
Hi,
My Spark app is mapping lines from a text file to case classes stored within an
RDD.
When I run the following code on this rdd:
    .collect.map(line => if (validate_hostname(line, data_frame)) line).foreach(println)
It correctly calls the method validate_hostname by passing the case class and