Hi Kishore,
Sorry for the long disappearance. There are no other errors regarding the
checkpoint file. This problem occurred in our production environment; I'll
try to reproduce it, but I haven't had time yet.
I'll let you know if something comes up. Thanks again!
2015-08-21 14:15 GMT+08:00 Kishore
Hi Zhao, do you see any other errors regarding the checkpoint file? Is this
reproducible on your end, and if so, can you enable the debug log level to
get more info?
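In case it helps: enabling debug usually just means raising the Kafka
logger in config/log4j.properties (the logger names below assume the stock
file shipped with the broker) and restarting:

    # config/log4j.properties - stock layout assumed
    log4j.logger.kafka=DEBUG
    log4j.logger.kafka.controller=DEBUG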
On Thu, Aug 20, 2015 at 7:44 AM, Zhao Weinan zhaow...@gmail.com wrote:
Hi Kishore Senji,
I've been busy recovering some data these two days... and found that I may
have hit a more serious problem than I thought. I lost almost all data on
one broker, at least at some point; here is some log from server.log pasted
below, much like the situation described by Jason and
Hi Kishore Senji,
The segment file size is the default 1GB.
According to LogManager.scala#cleanupExpiredSegments, Kafka will only
delete segments whose lastModTime is older than retention.ms, so I don't
think this is the reason for my data loss. Actually, I lost some data in a
topic other than
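For anyone following along, that check boils down to a last-modified-time
predicate on the segment file. A minimal Scala sketch, simplified and
renamed from the actual LogManager.scala code (names here are illustrative):

    import java.io.File

    // Simplified from LogManager.scala#cleanupExpiredSegments: a segment is
    // deletable only once its file's last-modified time is older than
    // retention.ms; the timestamps of the messages inside are not consulted.
    def isExpired(segmentFile: File, retentionMs: Long, nowMs: Long): Boolean =
      nowMs - segmentFile.lastModified > retentionMs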
Yes, you are right. I misread the code. So the only thing that can explain
the behavior you are seeing is that maybe there are many segments that need
to be deleted all at once. Can you try reducing retention.ms in smaller
intervals - like reduce it to 9 days from 10 days - and see if the
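To make the "all at once" point concrete, a rough back-of-the-envelope in
Scala (the one-segment-per-hour roll rate is purely an assumption for
illustration):

    // Illustrative only: assume ~24 segments rolled per day per partition.
    val segmentsPerDay   = 24
    val deletableAtOnce  = (10 - 1) * segmentsPerDay // 10d -> 1d in one jump: 216 segments
    val deletablePerStep = 1 * segmentsPerDay        // 10d -> 9d -> ... : only 24 per step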
Hi Kishore Senji,
Thanks for the reply.
Do you have any suggestions before the fix comes out? Try not to modify
retention.ms? Or disable the auto rebalance? Because this problem is 100%
reproducible in my scenario (I hit the deadlock both times, in two
retention.ms modifications), and I even found
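For what it's worth, the auto rebalance workaround mentioned above is a
broker-level setting in server.properties (restart required):

    # server.properties - turns off the controller's automatic
    # preferred-leader rebalancing
    auto.leader.rebalance.enable=false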
Hi guys,
I ran into this problem: after changing one topic's config to
retention.ms=86,400,000 from 864,000,000, the brokers started to schedule
and perform deletions of that topic's outdated log segments.
Then, for some reason, some brokers' connections with ZooKeeper expired,
and suddenly lots of ERRORs showed up
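For context, the override was applied with the usual per-topic config
command for this Kafka version; the topic name and ZooKeeper address below
are placeholders:

    bin/kafka-topics.sh --zookeeper zk1:2181 --alter --topic my-topic \
      --config retention.ms=86400000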
Interesting problem you ran into. It seems like this broker was chosen as
the Controller and the onControllerFailover() method was called. This will
schedule the checkAndTriggerPartitionRebalance method to execute after 5
seconds (when auto rebalance is enabled). In the meantime this broker lost
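A minimal Scala sketch of the scheduling being described, using a plain JDK
scheduler instead of Kafka's internal one (names simplified from
KafkaController, so treat this as an illustration, not the exact code):

    import java.util.concurrent.{Executors, TimeUnit}

    // Stand-in for KafkaController.checkAndTriggerPartitionRebalance, which
    // moves partition leadership back to preferred replicas when imbalanced.
    def checkAndTriggerPartitionRebalance(): Unit = { /* elided */ }

    // On controller failover, with auto.leader.rebalance.enable=true, the new
    // controller schedules the rebalance check after an initial 5 second delay.
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.schedule(new Runnable {
      override def run(): Unit = checkAndTriggerPartitionRebalance()
    }, 5L, TimeUnit.SECONDS)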