https://bugzilla.wikimedia.org/show_bug.cgi?id=28144
--- Comment #6 from Tim Starling <[email protected]> 2011-03-22 06:20:04 UTC ---
Since I wasn't making much progress finding the bug by code review, I decided
to have a crack at debugging the running process with gdb, despite the lack of
symbols. I've determined the following:

* The hashtable hasn't grown; it still has hashpower=16.
* There are 134 entries still in the hashtable, so this wasn't an isolated case.
* I looked at three entries; they all had count = processing = 2.
* By subtracting an appropriate offset from the address of the locks structure,
  I could look at the client_data structure. For one of the hashtable entries,
  there were two clients, with FDs 387 and 466. lsof says:

    COMMAND    PID USER        FD   TYPE DEVICE SIZE/OFF NODE     NAME
    poolcount 1838 poolcounter 387u sock 0,6    0t0      64556254 can't identify protocol
    poolcount 1838 poolcounter 466u sock 0,6    0t0      64552887 can't identify protocol

There are 275 FDs which give "can't identify protocol", which is suspiciously
close to double the number of hashtable entries. Maybe these FDs were closed on
the remote side, but poolcounterd never called close() on them. That wouldn't
be a surprising scenario, since the structures seem to indicate that
free_client_data() was never called on them, and on_client() never calls
close() without first calling free_client_data().

--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
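The reasoning in comment #6 rests on one invariant: the only path to close() runs through free_client_data(). The following is a hypothetical C sketch of that teardown pattern, not poolcounterd's actual code. The struct layouts, the count/processing bookkeeping, and the signatures given to on_client() and free_client_data() are all invented for illustration. Under those assumptions, a peer hangup (read() returning 0) releases the lock counts and closes the FD in the same step. An FD that never gets closed would therefore also leave its entry's counters untouched. That is consistent with the gdb and lsof observations if the handler never ran for those hangups.

```c
/* Hypothetical sketch only: NOT the real poolcounterd source.
 * Struct fields and function signatures are invented for illustration. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a hashtable entry's counters. */
struct locks {
    int count;      /* clients attached to this entry */
    int processing; /* clients currently holding the lock */
};

/* Hypothetical stand-in for per-connection state. */
struct client_data {
    int fd;
    int holds_lock; /* 1 while counted in a locks entry */
};

/* Release everything the client owns in the shared structures, then free it. */
static void free_client_data(struct client_data *cd, struct locks *l) {
    if (cd->holds_lock) {
        assert(l->count > 0 && l->processing > 0);
        l->count--;
        l->processing--;
        cd->holds_lock = 0;
    }
    free(cd);
}

/* Readable-event handler. Returns 1 if the connection was torn down. */
static int on_client(struct client_data *cd, struct locks *l) {
    char buf[256];
    ssize_t n = read(cd->fd, buf, sizeof buf);
    if (n > 0) {
        /* ... parse and act on commands here ... */
        return 0;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0; /* spurious wakeup: keep the connection */
    /* n == 0 (peer closed) or a hard error: free state first, then close.
     * This is the only place close() is called, so if this branch never
     * runs, the lock counts and the FD leak together. */
    int fd = cd->fd;
    free_client_data(cd, l);
    close(fd);
    return 1;
}
```

In this sketch, a hangup by one of two clients attached to an entry drops count and processing from 2 to 1. An entry stuck at 2 with both sockets still open is what you would expect if neither hangup was ever delivered to the handler.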
