DataFrameWriter.save fails job with one executor failure

2016-03-25 Thread Vinoth Chandar
Hi, we are saving a dataframe to parquet (using DirectParquetOutputCommitter) as follows: dfWriter.format("parquet").mode(SaveMode.Overwrite).save(outputPath). The problem is that even if an executor fails just once while writing a file (say, due to some transient HDFS issue), when its …
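For context, here is a minimal, self-contained version of the write described above, in the Spark 1.x API current at the time; the input DataFrame and the outputPath value are placeholder assumptions, not taken from the thread.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object ParquetWriteExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-write"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val outputPath = "/tmp/parquet-out" // placeholder for the thread's outputPath
    val df = sc.parallelize(1 to 1000).toDF("id")

    // The write pattern from the thread. With DirectParquetOutputCommitter,
    // tasks write straight to outputPath (no _temporary staging directory),
    // so a failed-then-retried task can leave partial output that the retry
    // cannot reconcile; this is the failure mode being reported.
    df.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .save(outputPath)
  }
}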

Re: Restarting Spark Streaming Application with new code

2015-07-08 Thread Vinoth Chandar
… To avoid this, save state in your own data store. On Sat, Jul 4, 2015 at 9:01 PM, Vinoth Chandar vin...@uber.com wrote: Hi, just looking for some clarity on the 1.4 documentation below: "And restarting from earlier checkpoint information of pre-upgrade code cannot be done." The checkpoint …
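A sketch of that advice, assuming a hypothetical saveToStore sink (not a Spark API): instead of relying on the checkpoint to carry state across an upgrade, write each batch's derived state to an external store that the new version of the application can rebuild from.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("upgradable-app"), Seconds(10))

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1L))
  .reduceByKey(_ + _)

// Persist each batch's results to your own store (HBase, MySQL, etc.).
counts.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // saveToStore(records) // hypothetical sink: write (word, count) pairs
  }
}

ssc.start()
ssc.awaitTermination()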

Restarting Spark Streaming Application with new code

2015-07-04 Thread Vinoth Chandar
Hi, just looking for some clarity on the 1.4 documentation below: "And restarting from earlier checkpoint information of pre-upgrade code cannot be done. The checkpoint information essentially contains serialized Scala/Java/Python objects and trying to deserialize objects with new, modified …"
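To illustrate the limitation being asked about, here is a generic sketch (not from the thread; the path and app name are placeholders) of the standard checkpoint-recovery pattern. Recovery deserializes the whole DStream graph from the checkpoint directory, which is why restarting with new or modified classes can fail.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///tmp/app-checkpoint" // placeholder path

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf().setAppName("my-app"), Seconds(10))
  ssc.checkpoint(checkpointDir)
  // DStream transformations defined here get serialized into the checkpoint.
  ssc
}

// On restart, this rebuilds the context by deserializing the old DStream
// graph from checkpointDir; with an upgraded jar this typically fails,
// which is the documented limitation.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()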

Size of arbitrary state managed via DStream updateStateByKey

2015-04-01 Thread Vinoth Chandar
Hi all, as I understand from the docs and talks, the streaming state is held in memory as an RDD (optionally checkpointable to disk). SPARK-2629 hints that this in-memory structure is not indexed efficiently? I am wondering what my performance would be if the streaming state does not fit in memory (say 100GB …
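For reference, a minimal sketch of the updateStateByKey pattern under discussion (host, port, and paths are placeholders). The full state lives in a state RDD that is cogrouped with each batch, so every key's state is touched on every interval; that is the source of the scaling concern here.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("stateful-counts"), Seconds(10))
ssc.checkpoint("/tmp/ckpt") // stateful transformations require a checkpoint dir

// Merge each batch's new values into the running per-key state.
val updateFn: (Seq[Long], Option[Long]) => Option[Long] =
  (newValues, state) => Some(newValues.sum + state.getOrElse(0L))

val counts = ssc.socketTextStream("localhost", 9999)
  .map(word => (word, 1L))
  .updateStateByKey(updateFn)

counts.print()
ssc.start()
ssc.awaitTermination()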

Re: Size of arbitrary state managed via DStream updateStateByKey

2015-04-01 Thread Vinoth Chandar
Thanks for confirming! On Wed, Apr 1, 2015 at 12:33 PM, Tathagata Das t...@databricks.com wrote: In the current state, yes, there will be performance issues. It can be done much more efficiently, and we are working on doing that. TD On Wed, Apr 1, 2015 at 7:49 AM, Vinoth Chandar vin…