Sumedh,
How would this work? The only server that we have is the Oozie server, with no
resources to run anything except Oozie, and we have no sudo permissions. If we
run the mount command using the shell action, which can run on any node of the
cluster via YARN, then the Spark job will not be able to see the mount, since
it may be scheduled on a different node.
I forgot to mention that we will be scheduling this job using Oozie. So, we
will not be able to know which worker node is going to be running this. If
we try to do anything local, it would get lost. This is why I'm looking for
something that does not deal with the local file system.
On Thursday 03 March 2016 12:47 AM, Benjamin Kim wrote:
I wonder if anyone has opened an SFTP connection to open a remote GZIP CSV file? I am able
to download the file first locally using the SFTP client in the spark-sftp package. Then,
I load the file into a dataframe using the spark-csv package.
The Apache Commons VFS library will let you access files on an SFTP server via a
Java library, with no local file handling involved:
https://commons.apache.org/proper/commons-vfs/filesystems.html
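A rough sketch of what that looks like, assuming commons-vfs2 (and JSch for the sftp provider) is on the classpath; the user, host, and path below are placeholders, and the actual VFS calls are commented out since they need a live SFTP server:

```java
// Requires org.apache.commons:commons-vfs2 on the classpath (plus JSch for sftp).
// import org.apache.commons.vfs2.FileObject;
// import org.apache.commons.vfs2.VFS;

public class SftpUriDemo {
    // Build the sftp:// URI that Commons VFS expects; all details are placeholders.
    static String sftpUri(String user, String host, String path) {
        return "sftp://" + user + "@" + host + path;
    }

    public static void main(String[] args) throws Exception {
        String uri = sftpUri("user", "sftp.example.com", "/data/file.csv.gz");
        System.out.println(uri);
        // FileObject remote = VFS.getManager().resolveFile(uri);
        // java.io.InputStream in = remote.getContent().getInputStream();
        // ^ a stream straight off the server -- nothing is written locally
    }
}
```

resolveFile hands back a FileObject whose content you read as an InputStream, which is why no local file system is involved.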
Hope this helps,
Ewan
From a quick look through the README and code for spark-sftp, it seems that
this connector works by downloading the file locally on the driver program,
and this is not configurable. So you would probably need to find a different
connector (and you probably shouldn't use spark-sftp for large files anyway).