From: Joseph Torres
Sent: Tuesday, May 1, 2018 1:58:54 PM
To: Ryan Blue
Cc: Thakrar, Jayesh; dev@spark.apache.org
Subject: Re: Datasource API V2 and checkpointing
I agree that Spark should fully handle state serialization and recovery for
most sources. This is how
>>>> needs to be re-run, the execution engine just asks the source for the
>>>>>> same
>>>>>> offset range again. Sources also get a handle to their own subfolder of
>>>>>> the
>>>>>> checkpoint, which they can use as scratch space if they n
ts can be simply indices into the log rather than huge strings
>>>>> containing all the paths.
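[Editor's sketch] The replay behaviour described above (when a batch needs to be re-run, the engine asks the source for the same offset range again) can be illustrated with a toy model. Everything here is an assumption made for illustration: `CountingSource`, `ToyEngine`, `get_batch`, `run_new_batch`, and `rerun_batch` are hypothetical names, not part of the DataSourceV2 API, and the in-memory list stands in for the engine's checkpointed offset log.

```python
class CountingSource:
    """Hypothetical source: the record at offset i is the string 'row-i'."""

    def get_batch(self, start, end):
        # Deterministic for a given range, which is what makes replay safe.
        return [f"row-{i}" for i in range(start, end)]


class ToyEngine:
    """Hypothetical engine that records each offset range before reading it."""

    def __init__(self, source):
        self.source = source
        self.offset_log = []  # stand-in for the checkpointed offset log

    def run_new_batch(self, size):
        # Start where the previous batch ended, or at 0 for the first batch.
        start = self.offset_log[-1][1] if self.offset_log else 0
        end = start + size
        self.offset_log.append((start, end))  # log the range first, then read
        return self.source.get_batch(start, end)

    def rerun_batch(self, batch_id):
        # Recovery path: look up the logged range and ask the source again.
        start, end = self.offset_log[batch_id]
        return self.source.get_batch(start, end)
```

In this sketch the source keeps no recovery state of its own. It only has to return the same records for the same range, and the engine owns the log of which ranges were planned.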
>>>>>
>>>>> SPARK-23323 is orthogonal. That commit coordinator is responsible for
>>>>> ensuring that, within a single Spark job, two different tasks can't commit
>>>> the same partition.
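[Editor's sketch] The guarantee described above (within one job, two different tasks can't commit the same partition) amounts to a first-committer-wins rule. The following is a minimal single-threaded illustration under that assumption. `ToyCommitCoordinator` and `can_commit` are hypothetical names, and this does not reproduce Spark's actual coordinator, which also has to deal with concurrency and failed attempts.

```python
class ToyCommitCoordinator:
    """Hypothetical first-committer-wins arbiter for one job."""

    def __init__(self):
        # partition id -> id of the task attempt authorized to commit it
        self._winners = {}

    def can_commit(self, partition, attempt):
        # The first attempt to ask for a partition is recorded as the winner;
        # any other attempt asking for the same partition is refused.
        winner = self._winners.setdefault(partition, attempt)
        return winner == attempt
```

For example, if attempts 1 and 2 both process partition 0, whichever asks first is authorized and the other is told not to commit. The same attempt asking again still gets a yes.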
>>>>
>>>> On Fri, Apr 27, 2018 at 8:53 AM, Thakrar, Jayesh <
>>>> jthak...@conversantmedia.com> wrote:
>>>>
>>>> Wondering if this issue is related to SPARK-23323?
>>>>
>>>>
>>>>
>>>> Any pointers will be greatly appreciated….
>>>>
>>> Thanks,
>>>
>>> Jayesh
>>>
>>>
>>>
>>> *From: *"Thakrar, Jayesh"
Thanks Joseph!
From: Joseph Torres
Date: Friday, April 27, 2018 at 11:23 AM
To: "Thakrar, Jayesh"
Cc: "dev@spark.apache.org"
Subject: Re: Datasource API V2 and checkpointing
The precise interactions with the DataSourceV2 API haven't yet been hammered
out in desig
>
>
>
> Thanks,
>
> Jayesh
>
>
>
> *From: *"Thakrar, Jayesh"
> *Date: *Monday, April 23, 2018 at 9:49 PM
> *To: *"dev@spark.apache.org"
> *Subject: *Datasource API V2 and checkpointing
>
>
>
> I was wondering when checkpointing i
Wondering if this issue is related to SPARK-23323?
Any pointers will be greatly appreciated….
Thanks,
Jayesh
From: "Thakrar, Jayesh"
Date: Monday, April 23, 2018 at 9:49 PM
To: "dev@spark.apache.org"
Subject: Datasource API V2 and checkpointing
I was wondering when checkpointing is enabled, who does the actual work?
The streaming datasource or the execution engine/driver?
I have written a small/trivial datasource that just generates strings.
After enabling checkpointing, I do see a folder being created under the
checkpoint folder, but t