Just getting started with Spark, so hopefully this is all there and I just haven't found it yet.
I have a driver program on my client machine, and I can use addFile to distribute files to the remote worker nodes of the cluster. They are there to be found by my code running in the executors, so all is good. But...

1) It also makes a copy on the local machine - is there a way to indicate this isn't needed? I only need it on the cluster.
   - If I send a .tar file, it unpacks it for me, which is nice, but again, that's extra work on the client machine when I'm not using the file there.

2) It copies the files to spark_installdir/work/.
   - That's fine, I suppose, though is there any way to designate a different location?

3) They don't get cleaned up - I don't see anything ever getting removed from the work/ location; it just keeps adding up.
   - There was a cleanFiles() call, but I don't know that it actually cleaned up rather than just stopped copying (which is how it was documented). It's deprecated now anyway, so that's moot.
   - Is there a removeFiles() call to clean up? What's the expected use case? Can my code clean up manually, or will it hit permission issues if it tries?

Again, I searched the archives but didn't see any of this covered; then again, I'm just getting started, so I may very well be missing it somewhere.

Thanks!
Tom