Is there a reason you can't trust the runner to be durable storage for
in-process work?

I can understand that the DirectRunner only stores things in memory, but
other runners have stronger durability guarantees.

On Tue, Aug 21, 2018 at 9:58 AM Raghu Angadi <[email protected]> wrote:

> I think by 'KafkaUnboundedSource checkpointing' you mean enabling
> 'commitOffsetsInFinalize()' on KafkaIO source.
> It is a better option than enable.auto.commit, but it does not do exactly
> what you want here. The commit is invoked after the first stage ('Simple
> Transformation' in your case), not after the data is persisted. This is
> certainly true for Dataflow, and I think it is also the case for
> DirectRunner.
>
> I don't see a way to leverage the built-in checkpointing for consistency
> with an external system. You would have to commit offsets manually.
>
> On Tue, Aug 21, 2018 at 8:55 AM Micah Whitacre <[email protected]>
> wrote:
>
>> I'm starting with a very simple pipeline that will read from Kafka ->
>> Simple Transformation -> GroupByKey -> Persist the data.  We are also
>> applying some simple windowing/triggering that will persist the data after
>> every 100 elements or every 60 seconds, so that slow trickles of data still
>> get written and we don't store too much in memory.  For now I'm just
>> running with the DirectRunner since this is just a small processing
>> problem.
>>
>> With the potential for failure during the persisting of the data, we want
>> to ensure that the Kafka offsets are not updated until we have successfully
>> persisted the data.  Looking at KafkaIO it seems like our two options for
>> persisting offsets are:
>> * Kafka's enable.auto.commit
>> * KafkaUnboundedSource checkpointing.
>>
>> The first option would commit prematurely, before we could guarantee the
>> data was persisted.  Unfortunately, I can't find many details about the
>> checkpointing, so I was wondering if there is a way to configure or tune
>> it more appropriately.
>>
>> Specifically, I'm hoping to understand the flow so I can rely on the
>> built-in KafkaIO functionality without having to write our own offset
>> management.  Or is it more common to write your own?
>>
>> Thanks,
>> Micah
>>
>

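As a rough illustration of the ordering discussed in this thread (persist
first, and only then advance the committed offset), here is a minimal
in-memory sketch in plain Java. It does not use the Kafka client or KafkaIO;
OffsetTracker, processBatch, and the list standing in for a partition are all
hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: the offset advances only after the sink accepts a batch. */
final class OffsetTracker {
    private final List<String> log;   // stands in for one Kafka partition
    private int committedOffset = 0;  // position to resume from after a restart

    OffsetTracker(List<String> log) {
        this.log = log;
    }

    int committedOffset() {
        return committedOffset;
    }

    /**
     * Reads up to batchSize records starting at the committed offset, hands
     * them to the sink, and commits only if the sink did not throw.
     * Returns true when the offset was advanced.
     */
    boolean processBatch(int batchSize, Consumer<List<String>> sink) {
        int start = committedOffset;
        int end = Math.min(log.size(), start + batchSize);
        List<String> batch = new ArrayList<>(log.subList(start, end));
        try {
            sink.accept(batch);       // persistence step; may fail
        } catch (RuntimeException e) {
            return false;             // offset untouched, batch is re-read later
        }
        committedOffset = end;        // commit strictly after a successful persist
        return true;
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker(Arrays.asList("a", "b", "c"));
        List<String> stored = new ArrayList<>();

        // A failing sink leaves the offset where it was.
        tracker.processBatch(2, batch -> { throw new RuntimeException("sink down"); });
        System.out.println("after failure: offset=" + tracker.committedOffset());

        // A working sink persists the same records, then the offset moves.
        tracker.processBatch(2, stored::addAll);
        System.out.println("after success: offset=" + tracker.committedOffset()
                + " stored=" + stored);
    }
}
```

The trade-off in this ordering is at-least-once delivery: a crash between the
persist and the commit replays the batch, so the sink has to tolerate
duplicates.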