Hey,
In spark-shell, I'm doing:
import com.amazonaws.services.s3.AmazonS3Client

val s3 = new AmazonS3Client() // connection to S3 via aws-java-sdk

val mapping: Map[String, String] = {
  // use s3 to load a file and build a plain Map
}

val rdd = sc.loadSomeData().map {
  // uses the mapping local var, but *not* s3
}

rdd.count()
This blows up with "Task not serializable" on the AmazonS3Client.
My RDD's map closure genuinely does not reference s3, so I'm assuming
the REPL is pulling all of its top-level vals along for the ride,
whether the closure uses them or not.
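If it helps, here's a minimal non-REPL reproduction of what I think is
happening (the class and names are mine, just to illustrate the capture):

import com.amazonaws.services.s3.AmazonS3Client
import org.apache.spark.SparkContext

class Session extends Serializable {
  val s3 = new AmazonS3Client() // not serializable
  val mapping = Map("1" -> "one")

  // Referencing the `mapping` field inside map() makes the closure
  // capture `this`, which drags s3 along and fails task serialization.
  def run(sc: SparkContext): Long =
    sc.parallelize(1 to 10).map(i => mapping.getOrElse(i.toString, "?")).count()
}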
I'm pretty sure this used to work, pre-0.9. Is this a known issue?
Anything I can do to get around it?
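For now I'm experimenting with keeping s3 local to the block that builds
the map, so it never becomes a top-level val in the REPL (a sketch, body
elided as above):

val mapping: Map[String, String] = {
  val s3 = new AmazonS3Client() // local to this block only
  // use s3 to load the file and build the map
}

But I'd still like to know whether the capture itself is expected.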
Thanks!
- Stephen