Hi,
We are saving a DataFrame as Parquet (using
DirectParquetOutputCommitter) as follows:
dfWriter.format("parquet")
  .mode(SaveMode.Overwrite)
  .save(outputPath)
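For context, this is roughly how the direct committer gets enabled in the first place; a minimal sketch, assuming the Spark 1.x config key, with placeholder app name and paths (with the direct committer, tasks write output files in place rather than committing from a temp directory, which is why a retried task can leave partial files behind):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

// Sketch: enable DirectParquetOutputCommitter (Spark 1.x config key).
val conf = new SparkConf()
  .setAppName("parquet-writer") // placeholder app name
  .set("spark.sql.parquet.output.committer.class",
       "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// outputPath is a placeholder; there is no task-level rollback with the
// direct committer, so a failed-then-retried write can leave stray files.
val outputPath = "hdfs:///tmp/output.parquet"
sqlContext.read.json("hdfs:///tmp/input.json") // placeholder input
  .write.format("parquet")
  .mode(SaveMode.Overwrite)
  .save(outputPath)
```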
The problem is that even if an executor fails once while writing a file
(say, due to some transient HDFS issue), when its …
To avoid this, save state in your own data store.
On Sat, Jul 4, 2015 at 9:01 PM, Vinoth Chandar vin...@uber.com wrote:
Hi,
Just looking for some clarity on the below 1.4 documentation.
And restarting from earlier checkpoint information of pre-upgrade code
cannot be done. The checkpoint information essentially contains serialized
Scala/Java/Python objects and trying to deserialize objects with new,
modified
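The limitation quoted above comes from how checkpoint recovery works: `StreamingContext.getOrCreate` only calls your factory when no checkpoint exists, and otherwise deserializes the saved DStream graph, which is where new/modified classes break. A minimal sketch of that pattern, with placeholder paths, app name, and batch interval:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/checkpoint" // placeholder path

// Factory used only when no checkpoint exists yet. On restart, Spark
// instead deserializes the DStream graph stored in checkpointDir, so
// code changed after the checkpoint was written cannot be loaded.
def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("upgrade-example") // placeholder
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... define DStream transformations here ...
  ssc
}

val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```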
Hi all,
As I understand from the docs and talks, the streaming state is held in
memory as an RDD (optionally checkpointed to disk). SPARK-2629 hints that
this in-memory structure is not indexed efficiently?
I am wondering what my performance would be like if the streaming state
does not fit in memory (say 100GB
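For reference, the kind of state in question is what `updateStateByKey` builds; a minimal sketch with placeholder names and source (the point being that the full per-key state lives in an RDD of pairs that is regenerated, and held in memory, every batch, which is the structure SPARK-2629 aims to make more efficient):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("state-example") // placeholder
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///tmp/checkpoint") // required for stateful ops

val events = ssc.socketTextStream("localhost", 9999) // placeholder source
val pairs = events.map(word => (word, 1))

// Per-key running counts: the entire (key, count) state RDD is carried
// forward and updated each batch.
val counts = pairs.updateStateByKey[Int] {
  (newValues: Seq[Int], runningCount: Option[Int]) =>
    Some(newValues.sum + runningCount.getOrElse(0))
}

counts.print()
```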
Thanks for confirming!
On Wed, Apr 1, 2015 at 12:33 PM, Tathagata Das t...@databricks.com wrote:
In the current state yes there will be performance issues. It can be done
much more efficiently and we are working on doing that.
TD
On Wed, Apr 1, 2015 at 7:49 AM, Vinoth Chandar vin