Re: SPARK-20325 - Spark Structured Streaming documentation Update: checkpoint configuration

2017-04-14 Thread Katherin Eri
Thank you for your reply. I will open a pull request for this doc issue. The
logic is clear.

Fri, 14 Apr 2017, 23:34 Michael Armbrust:

>> 1) Could we update the documentation for Structured Streaming to describe
>> that checkpointing can be specified by
>> spark.sql.streaming.checkpointLocation at the SparkSession level, so that
>> checkpoint dirs are created automatically per foreach query?
>>
>>
> Sure, please open a pull request.
>
>
>> 2) Do we really need to specify the checkpoint dir per query? What is the
>> reason for this? In the end we will be forced to write some checkpointDir
>> name generator, for example one that associates the dir with a particular
>> named query, and so on.
>>
>
> Every query needs to have a unique checkpoint as this is how we track what
> has been processed.  If we don't have this, we can't restart the query
> where it left off.  In your example, I would suggest including the metric
> name in the checkpoint location path.
>
-- 

*Yours faithfully, *

*Kate Eri.*


Re: SPARK-20325 - Spark Structured Streaming documentation Update: checkpoint configuration

2017-04-14 Thread Michael Armbrust
> 1) Could we update the documentation for Structured Streaming to describe
> that checkpointing can be specified by
> spark.sql.streaming.checkpointLocation at the SparkSession level, so that
> checkpoint dirs are created automatically per foreach query?
>
>
Sure, please open a pull request.


> 2) Do we really need to specify the checkpoint dir per query? What is the
> reason for this? In the end we will be forced to write some checkpointDir
> name generator, for example one that associates the dir with a particular
> named query, and so on.
>

Every query needs to have a unique checkpoint as this is how we track what
has been processed.  If we don't have this, we can't restart the query
where it left off.  In your example, I would suggest including the metric
name in the checkpoint location path.
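
A minimal sketch of that suggestion, in Python: derive one checkpoint
directory per metric from a shared base directory. The helper name
checkpoint_location, the base path /checkpoints/metrics, and the metric names
are all hypothetical, chosen only for illustration. The sanitizing rule is
likewise an assumption, not something Spark requires.

```python
import posixpath
import re


def checkpoint_location(base_dir: str, metric_name: str) -> str:
    """Build a checkpoint directory for the query that handles one metric.

    Hypothetical helper: characters outside [A-Za-z0-9_.-] are replaced
    with "_" so the metric name is safe to use as a path component.
    """
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", metric_name)
    if not safe:
        raise ValueError("metric_name must not be empty")
    return posixpath.join(base_dir, safe)


# Hypothetical metric names and base directory.
metrics = ["cpu_load", "disk io"]
locations = {m: checkpoint_location("/checkpoints/metrics", m) for m in metrics}

# Sanitizing can map two different names to the same directory, so verify
# that every query really gets its own checkpoint location.
if len(set(locations.values())) != len(metrics):
    raise ValueError("two metrics map to the same checkpoint directory")

# With PySpark, each path would then be passed to its own query, e.g.:
#   df.writeStream.queryName(m).option("checkpointLocation", locations[m]).start()
```

The uniqueness check matters because two queries sharing a checkpoint
directory would defeat the purpose described above: each query needs its own
record of what has been processed.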