Hi Mark,
On Wed, Jul 22, 2020 at 4:49 PM Mark Bidewell wrote:
>
> Sorry if this is the wrong place for this. I am trying to debug an issue
> with this library:
> https://github.com/springml/spark-sftp
>
> When I attempt to create a dataframe:
>
> spark.read.
> format("com.springml.s
You'd probably do best to ask that project, but scanning the source
code, that looks like it's how it's meant to work. It downloads to a
temp file on the driver, copies it to distributed storage, and then
returns a DataFrame for that copy. I can't see how it would be implemented
directly over sftp as there
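
A minimal sketch of the download-then-stage pattern described in the reply above, assuming nothing about the library's real code. The names `stage_remote_file` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for an SFTP download so the example runs without a network or a Spark installation.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def stage_remote_file(fetch: Callable[[Path], None], staging_dir: Path, name: str) -> Path:
    """Download to a local temp file, copy it to shared storage, return the staged path."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        local_copy = Path(tmp) / name
        # Step 1: driver-side download (an SFTP "get" in the real case).
        fetch(local_copy)
        # Step 2: copy to storage that executors could read (a plain directory here).
        staged = staging_dir / name
        shutil.copyfile(local_copy, staged)
    # Step 3: the caller would hand this path to a DataFrame reader.
    return staged


def fake_fetch(dest: Path) -> None:
    # Hypothetical stand-in for the remote download: writes a tiny CSV locally.
    dest.write_text("id,name\n1,a\n2,b\n")
```

With Spark available, the returned path could then be passed to a reader such as `spark.read.csv(str(staged))`. The whole file still passes through the driver first, which is the cost the reply points out.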
Sorry if this is the wrong place for this. I am trying to debug an issue
with this library:
https://github.com/springml/spark-sftp
When I attempt to create a dataframe:
spark.read.
format("com.springml.spark.sftp").
option("host", "...").
option("username", ".
Hi all,
I am new to the Spark community. Please ignore if this question doesn't make
sense.
My PySpark DataFrame spends only a fraction of the time (in ms) on sorting,
but moving the data is much more expensive (> 14 sec).
Explanation:
I have a huge Arrow RecordBatches collection which is equally