I see, I wasn’t sure if that would work as expected. The docs seem to
suggest being careful before turning off that option, and I’m not sure why
failOnDataLoss is true by default.
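
For reference, a minimal sketch of what the suggestion below might look like
in PySpark. The helper names (kafka_source_options, build_stream) and the
broker/topic values are hypothetical, not from this thread; only the option
keys are the documented Kafka source options. My understanding, which I have
not verified here, is that with failOnDataLoss=false the query logs a warning
and continues from the earliest offset still available instead of failing, so
records removed by retention are silently skipped.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build options for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only consulted when the query starts without an existing checkpoint.
        "startingOffsets": "earliest",
        # Do not fail the query when requested offsets are no longer available.
        "failOnDataLoss": "false",
    }


def build_stream(spark, bootstrap_servers, topic):
    """Create the streaming DataFrame; `spark` is an existing SparkSession."""
    return (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
```

Usage would be something like build_stream(spark, "localhost:9092", "events"),
followed by the usual writeStream with a checkpointLocation.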

On Tue, Apr 14, 2020 at 5:16 PM Burak Yavuz <brk...@gmail.com> wrote:

> Just set `failOnDataLoss=false` as an option in readStream?
>
> On Tue, Apr 14, 2020 at 4:33 PM Ruijing Li <liruijin...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a Spark Structured Streaming app that is consuming from a Kafka
>> topic with retention set up. Sometimes my query has not finished
>> processing a message when retention kicks in and deletes the offset.
>> Since I use the default setting of “failOnDataLoss=true”, this causes my
>> query to fail. My current solution is manual: deleting the offsets
>> directory and rerunning.
>>
>> I'd instead like to have Spark automatically fall back to the earliest
>> offset available. The solutions I saw recommend setting
>> auto.offset.reset = earliest, but for Structured Streaming, you cannot
>> set that. How do I do this for Structured Streaming?
>>
>> Thanks!
>> --
>> Cheers,
>> Ruijing Li
>>
> --
Cheers,
Ruijing Li
