Re: Possible DEAD LOCK for one day at broker controller?

2015-09-12 Thread Zhao Weinan
Hi Kishore, Sorry for disappearing for so long. There are no other errors regarding the checkpoint file. This problem occurred in our production environment; I'll try to reproduce it, but I haven't had time yet. I'll let you know if something comes up. Thanks again! 2015-08-21 14:15 GMT+08:00 Kishore

Re: Possible DEAD LOCK for one day at broker controller?

2015-08-21 Thread Kishore Senji
Hi Zhao, Do you see any other errors regarding the checkpoint file? Is this reproducible by you, and if so, can you enable the debug log level to get more info? On Thu, Aug 20, 2015 at 7:44 AM, Zhao Weinan zhaow...@gmail.com wrote: Hi Kishore Senji, I've been busy recovering some data these two

Re: Possible DEAD LOCK for one day at broker controller?

2015-08-20 Thread Zhao Weinan
Hi Kishore Senji, I've been busy recovering some data these two days... and found that I may have hit a more serious problem than I thought. I lost almost all data on one broker, at least for some period of time. Some log output from server.log is pasted below; it looks very much like the situation described by Jason and

Re: Possible DEAD LOCK for one day at broker controller?

2015-08-18 Thread Zhao Weinan
Hi Kishore Senji, The segment file size is the default 1GB. According to LogManager.scala#cleanupExpiredSegments, Kafka will only delete segments whose lastModTime is older than retention.ms, so I don't think this is the reason for my data loss. Actually I lost some data in topic other than
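The retention rule described in this message can be illustrated with a minimal sketch. This is not Kafka's code: `expired_segments`, the `(name, last_modified_ms)` tuples and the segment names are hypothetical, and the sketch only models the rule as stated above (a segment is eligible for deletion when its last-modified time is older than retention.ms).

```python
import time

DAY_MS = 24 * 60 * 60 * 1000  # 86,400,000 ms


def expired_segments(segments, retention_ms, now_ms=None):
    """Return names of segments whose last-modified time is older than retention_ms.

    `segments` is a list of (name, last_modified_ms) tuples; this shape is an
    assumption made for illustration, not Kafka's data model.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return [name for name, last_mod in segments if now_ms - last_mod > retention_ms]


# Hypothetical segments aged 12 days, 5 days and 12 hours.
now = 20 * DAY_MS
segments = [
    ("seg-a", now - 12 * DAY_MS),
    ("seg-b", now - 5 * DAY_MS),
    ("seg-c", now - 12 * 60 * 60 * 1000),
]

ten_days = expired_segments(segments, 10 * DAY_MS, now)
one_day = expired_segments(segments, 1 * DAY_MS, now)
```

Under this model, lowering retention from 10 days to 1 day makes every segment last modified between 1 and 10 days ago eligible at once, while segments modified within the last day are never selected.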

Re: Possible DEAD LOCK for one day at broker controller?

2015-08-18 Thread Kishore Senji
Yes, you are right. I misread the code. So the only thing that can explain the behavior you are seeing is that maybe there are many segments that need to be deleted all at once. Can you try reducing retention.ms in smaller intervals - like reduce it to 9 days from 10 days and see if the
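As a rough sketch of the stepwise reduction suggested here, the intermediate retention.ms values can be computed as below. The helper `retention_steps` and the one-day step size are assumptions for illustration; the thread does not confirm that stepping down avoids the problem, and applying each value to the topic is left out.

```python
DAY_MS = 24 * 60 * 60 * 1000  # 86,400,000 ms


def retention_steps(start_days, target_days, step_days=1):
    """List the retention.ms values to apply, in order, when stepping down.

    Hypothetical helper: starts one step below `start_days` and always ends
    exactly at `target_days`.
    """
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    if target_days > start_days:
        raise ValueError("target_days must not exceed start_days")
    steps = []
    days = start_days - step_days
    while days > target_days:
        steps.append(days * DAY_MS)
        days -= step_days
    steps.append(target_days * DAY_MS)
    return steps


# From 10 days (864,000,000 ms) down to 1 day (86,400,000 ms), one day at a time.
steps = retention_steps(10, 1)
```

Each value would be applied in turn, waiting for the resulting deletions to finish before moving on to the next one.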

Re: Possible DEAD LOCK for one day at broker controller?

2015-08-17 Thread Zhao Weinan
Hi Kishore Senji, Thanks for the reply. Do you have any suggestions until a fix is available? Should I avoid modifying retention.ms? Or disable auto rebalance? Because this problem is 100% reproducible in my scenario (I got a deadlock both times I modified retention.ms), and I even found

Possible DEAD LOCK for one day at broker controller?

2015-08-16 Thread Zhao Weinan
Hi guys, I ran into this problem: after changing one topic's config to retention.ms=86,400,000 from 864,000,000, the brokers started to schedule and perform deletions of that topic's outdated index files. Then for some reason some brokers' connections with ZooKeeper expired, and suddenly lots of ERRORs showed up

Re: Possible DEAD LOCK for one day at broker controller?

2015-08-16 Thread Kishore Senji
Interesting problem you ran into. It seems like this broker was chosen as the Controller and the onControllerFailover() method was called. This will schedule the checkAndTriggerPartitionRebalance method to execute after 5 seconds (when auto rebalance is enabled). In the meantime this broker lost