Re: Datasource API V2 and checkpointing

2018-05-01 Thread Thakrar, Jayesh
From: Joseph Torres
Sent: Tuesday, May 1, 2018 1:58:54 PM
To: Ryan Blue
Cc: Thakrar, Jayesh; dev@spark.apache.org
Subject: Re: Datasource API V2 and checkpointing

I agree that Spark should fully handle state serialization and recovery for most sources. This is how

Re: Datasource API V2 and checkpointing

2018-05-01 Thread Joseph Torres
needs to be re-run, the execution engine just asks the source for the same offset range again. Sources also get a handle to their own subfolder of the checkpoint, which they can use as scratch space if they n
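
The replay behaviour described in this message can be sketched roughly as follows. This is an illustrative toy only, not the Spark DataSourceV2 API: `CountingSource`, `Engine`, `run_batch`, and `rerun_batch` are hypothetical names, and the in-memory `offset_log` stands in for whatever the engine persists in the checkpoint.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class CountingSource:
    """Hypothetical source: record i is the string 'row-i'."""

    def read(self, start: int, end: int) -> list[str]:
        # Deterministic: the same [start, end) range always yields the same rows.
        return [f"row-{i}" for i in range(start, end)]


@dataclass
class Engine:
    """Hypothetical engine that owns the offset log (assumed stand-in for the checkpoint)."""

    source: CountingSource
    offset_log: dict[int, tuple[int, int]] = field(default_factory=dict)

    def run_batch(self, batch_id: int, start: int, end: int) -> list[str]:
        # Record the offset range first, so the batch can be replayed after a failure.
        self.offset_log[batch_id] = (start, end)
        return self.source.read(start, end)

    def rerun_batch(self, batch_id: int) -> list[str]:
        # Recovery: look up the logged range and ask the source for it again.
        start, end = self.offset_log[batch_id]
        return self.source.read(start, end)
```

In this sketch the source keeps no recovery state of its own; it only has to return the same data whenever it is given the same range.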

Re: Datasource API V2 and checkpointing

2018-05-01 Thread Ryan Blue
ts can be simply indices into the log rather than huge strings containing all the paths. SPARK-23323 is orthogonal. That commit coordinator is responsible for ensuring that, within a single Spark job, two different tasks
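
One way to picture offsets as indices into a log, rather than strings listing every path, is the toy below. It is a sketch under assumptions: `FileLog`, `append`, and `paths_between` are hypothetical names and do not correspond to any Spark class.

```python
class FileLog:
    """Hypothetical append-only log; each entry lists the paths seen in one batch."""

    def __init__(self):
        self._entries = []

    def append(self, paths):
        # Store the paths once; the returned offset is just the entry's index.
        self._entries.append(list(paths))
        return len(self._entries) - 1

    def paths_between(self, start_exclusive, end_inclusive):
        # Resolve a small (start, end] pair of indices back into the full path list.
        selected = self._entries[start_exclusive + 1 : end_inclusive + 1]
        return [path for entry in selected for path in entry]
```

Here the offset that gets recorded is a single integer, while the potentially long list of paths lives only in the log.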

Re: Datasource API V2 and checkpointing

2018-04-30 Thread Joseph Torres
in a single Spark job, two different tasks can't commit the same partition. On Fri, Apr 27, 2018 at 8:53 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote: Wondering if this

Re: Datasource API V2 and checkpointing

2018-04-30 Thread Ryan Blue
, 2018 at 8:53 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote: Wondering if this issue is related to SPARK-23323? Any pointers will be greatly appreciated….

Re: Datasource API V2 and checkpointing

2018-04-30 Thread Joseph Torres
PARK-23323? Any pointers will be greatly appreciated…. Thanks, Jayesh From: "Thakrar, Jayesh"

Re: Datasource API V2 and checkpointing

2018-04-30 Thread Ryan Blue
on. On Fri, Apr 27, 2018 at 8:53 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote: Wondering if this issue is related to SPARK-23323? Any pointers will be greatly appreciated…. Thank

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Thakrar, Jayesh
Thanks Joseph!

From: Joseph Torres
Date: Friday, April 27, 2018 at 11:23 AM
To: "Thakrar, Jayesh"
Cc: "dev@spark.apache.org"
Subject: Re: Datasource API V2 and checkpointing

The precise interactions with the DataSourceV2 API haven't yet been hammered out in desig

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Joseph Torres
Thanks, Jayesh

From: "Thakrar, Jayesh"
Date: Monday, April 23, 2018 at 9:49 PM
To: "dev@spark.apache.org"
Subject: Datasource API V2 and checkpointing

I was wondering when checkpointing i

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Thakrar, Jayesh
Wondering if this issue is related to SPARK-23323? Any pointers will be greatly appreciated…. Thanks, Jayesh

From: "Thakrar, Jayesh"
Date: Monday, April 23, 2018 at 9:49 PM
To: "dev@spark.apache.org"
Subject: Datasource API V2 and checkpointing

I was wondering when chec

Datasource API V2 and checkpointing

2018-04-23 Thread Thakrar, Jayesh
I was wondering: when checkpointing is enabled, who does the actual work, the streaming datasource or the execution engine/driver? I have written a small, trivial datasource that just generates strings. After enabling checkpointing, I do see a folder being created under the checkpoint folder, but t