I forgot to mention that we will be scheduling this job using Oozie, so we will not know in advance which worker node is going to run it. Anything we write to the local file system would be lost. This is why I’m looking for an approach that does not touch the local file system.
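To make the idea concrete, here is a rough sketch of what "no local file" could look like: open the remote file over SFTP as a stream, decompress it in memory, and parse the CSV rows directly. This is only a sketch under assumptions. It uses the third-party paramiko library (not spark-sftp or spark-csv), which would have to be installed on whichever node runs the job, and the function names, host, credentials, and remote path are hypothetical placeholders.

```python
import csv
import gzip
import io


def read_gzip_csv(fileobj, encoding="utf-8"):
    """Decompress a gzip stream from any binary file-like object
    and return the parsed CSV rows as a list of lists of strings."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        text = io.TextIOWrapper(gz, encoding=encoding, newline="")
        return list(csv.reader(text))


def fetch_rows_over_sftp(host, username, password, remote_path, port=22):
    """Hypothetical helper: stream a remote .csv.gz over SFTP without
    writing anything to the local file system."""
    # Assumption: paramiko is installed. It is imported lazily so that
    # read_gzip_csv can be used (and tested) without it.
    import paramiko

    transport = paramiko.Transport((host, port))
    try:
        transport.connect(username=username, password=password)
        sftp = paramiko.SFTPClient.from_transport(transport)
        # The SFTP file handle is file-like, so it can feed gzip directly.
        with sftp.open(remote_path, "rb") as remote_file:
            return read_gzip_csv(remote_file)
    finally:
        transport.close()
```

The returned rows could then be handed to Spark on the driver, for example with something like sqlContext.createDataFrame(rows[1:], rows[0]) if the first row is a header. Note that this holds the whole file in memory on a single node, so it only suits files that fit comfortably there.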
> On Mar 2, 2016, at 11:17 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> I wonder if anyone has opened a SFTP connection to open a remote GZIP CSV
> file? I am able to download the file first locally using the SFTP Client in
> the spark-sftp package. Then, I load the file into a dataframe using the
> spark-csv package, which automatically decompresses the file. I just want to
> remove the "downloading file to local" step and directly have the remote file
> decompressed, read, and loaded. Can someone give me any hints?
>
> Thanks,
> Ben