Hi Nimrod,
I am also interested in your first point: what exactly does "false alarm"
mean?
Today I had the following scenario, which in my opinion is a false alarm.
For example:
- Topic contains 'N' messages
- Spark Streaming application consumed all 'N' messages successfully
- Checkpoints of s
1. I think a false alarm in this context means you are OK with losing data,
as in dev and test environments.
2. Not sure.
3. Sorry, not sure again, but my guess would be that during your failover
the checkpoint got out of sync.
Sorry, that is all I used this feature for. If you think you can smoothly
fail over to other clust
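
As a rough illustration of point 3: the streaming checkpoint records Kafka
offsets against the cluster it was written from, so reusing it after a
failover can leave it out of sync with the new cluster. Below is a minimal
sketch, assuming a hypothetical topic "events", replica broker address, and
checkpoint path (none of these come from the thread):

import org.apache.spark.sql.SparkSession

object FailoverSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("failover-sketch")
      .master("local[*]") // illustrative; drop when submitting to a cluster
      .getOrCreate()

    // After failover the job points at the replica cluster...
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "replica-kafka:9092") // switched from the primary
      .option("subscribe", "events")
      .option("failOnDataLoss", "false") // otherwise out-of-range offsets fail the query
      .load()

    // ...but the checkpoint still holds offsets recorded against the primary
    // cluster, which may not line up with offsets on the replica.
    df.writeStream
      .format("console")
      .option("checkpointLocation", "/checkpoints/events-job") // unchanged across failover
      .start()
      .awaitTermination()
  }
}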
Thanks Khalid,
Some follow-ups:
1. I'm still unsure what would count as "false alarms".
2. When there is data loss on some partitions, will that cause all
partitions to be reset?
3. I had an occurrence where I set failOnDataLoss to false and set the
policy to earliest (which was about 24 h
I use this option in development environments where jobs are not actively
running and the Kafka topic has a retention policy enabled. This means that
when a streaming job runs, it may find that the last offset it read is no
longer available, and in this case it falls back to the starting position
(i.e. earliest or latest) sp
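
A minimal sketch of that development setup, assuming a hypothetical topic
"events", local broker, and checkpoint path (all illustrative, not from the
thread); the comments describe the fallback behavior outlined above:

import org.apache.spark.sql.SparkSession

object DevRetentionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dev-retention-sketch")
      .master("local[*]") // illustrative; drop when submitting to a cluster
      .getOrCreate()

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      // Where to start when there are no usable checkpointed offsets.
      .option("startingOffsets", "earliest")
      // With "false", offsets that have expired under the retention policy
      // are logged as a warning and the source resets instead of failing.
      .option("failOnDataLoss", "false")
      .load()

    df.selectExpr("CAST(value AS STRING)")
      .writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/events-dev")
      .start()
      .awaitTermination()
  }
}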
Hi everyone,
I'm currently working with Spark Structured Streaming integrated with Kafka
and had some questions regarding the failOnDataLoss option.
The current documentation states:
*"Whether to fail the query when it's possible that data is lost (e.g.,
topics are deleted, or offsets are out of range). This may be a false
alarm. You can disable it when it doesn't work as you expected."*
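
For reference, failOnDataLoss is a source option on the Kafka reader. A
minimal sketch showing where it is set, with a hypothetical topic and broker
address (the default is "true"):

import org.apache.spark.sql.SparkSession

object OptionPlacementSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("option-placement-sketch")
      .master("local[*]") // illustrative
      .getOrCreate()

    // "true" (the default) fails the query when data appears to be lost;
    // "false" logs a warning and keeps the query running.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("failOnDataLoss", "true")
      .load()

    df.printSchema() // key, value, topic, partition, offset, timestamp, ...
  }
}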