On Thu, Sep 13, 2012 at 12:30 AM, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:
> 12.09.2012 10:35, Andrew Beekhof wrote:
>> On Tue, Sep 11, 2012 at 6:14 PM, Vladislav Bogdanov
>> <bub...@hoster-ok.com> wrote:
>>> 07.09.2012 09:25, Vladislav Bogdanov wrote:
>>>> 06.09.2012 12:58, Vladislav Bogdanov wrote:
>>>> ...
>>>>> lrmd seems not to clean up gio channels properly:
>>>>
>>>> I prefer to call g_io_channel_unref() right after g_io_add_watch_full()
>>>> instead of doing so when deleting the descriptor (g_source_remove() is
>>>> enough there). That way the channel is freed automatically once the
>>>> watched fd is removed from the poll by g_source_remove(). If I need to
>>>> replace the watch flags, I call g_io_channel_ref()/g_io_channel_unref()
>>>> around the g_source_remove()/g_io_add_watch_full() pair. This would
>>>> require some care and testing, though. I spent two days on this when
>>>> writing my first non-blocking server based on glib and gio. That code
>>>> now works like a charm with the approach outlined above.
>>>
>>> Does it make some sense?
>>
>> Yes, I just needed to read it a few times before it sunk in :)
>> The interactions are pretty complex; before I commit anything I'd like
>> to be able to verify the refcounts are correct... how do you find out
>> the refcount for these glib objects?
>
> Frankly speaking, just by trial and error while reading through the glib
> code. glib structures contain a ref_count field, and it is possible to
> reach it with some hacks.
> The main point with GIOChannel is that both g_io_channel_unix_new() and
> g_io_add_watch_full() increment the refcount; IIRC, this is documented
> in the glib sources.
>
> You can send me patches if you prefer some independent testing before
> committing. Hopefully I'll be able to test them quickly.
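(For reference, the ownership pattern described above looks roughly like the
sketch below. It is a minimal illustration in plain glib C; the function
names, condition flags and user_data handling are made up for the example
and are not taken from the Pacemaker sources.)

#include <glib.h>

/* Illustrative dispatch callback; stands in for the real handler. */
static gboolean
fd_dispatch(GIOChannel *channel, GIOCondition condition, gpointer user_data)
{
    /* ... read from g_io_channel_unix_get_fd(channel) ... */
    return TRUE; /* keep the watch */
}

/* g_io_channel_unix_new() returns the channel with one reference and
 * g_io_add_watch_full() takes a second one, so we drop ours right away;
 * from here on the watch is the only owner of the channel. */
static guint
watch_fd(int fd, gpointer user_data)
{
    GIOChannel *channel = g_io_channel_unix_new(fd);
    guint id = g_io_add_watch_full(channel, G_PRIORITY_DEFAULT,
                                   G_IO_IN | G_IO_HUP | G_IO_ERR,
                                   fd_dispatch, user_data, NULL);

    g_io_channel_unref(channel); /* refcount back to 1, held by the watch */
    return id;
}

/* Tear-down: removing the source drops the watch's reference and the
 * channel is freed; no extra g_io_channel_unref() is needed here. */
static void
unwatch_fd(guint source_id)
{
    g_source_remove(source_id);
}

/* Replacing the watch flags: take a temporary reference so the channel
 * survives the g_source_remove()/g_io_add_watch_full() pair. */
static guint
rewatch_fd(GIOChannel *channel, guint old_id, GIOCondition new_flags,
           gpointer user_data)
{
    guint new_id;

    g_io_channel_ref(channel);   /* keep the channel alive across the swap */
    g_source_remove(old_id);     /* drops the old watch's reference */
    new_id = g_io_add_watch_full(channel, G_PRIORITY_DEFAULT, new_flags,
                                 fd_dispatch, user_data, NULL);
    g_io_channel_unref(channel); /* the new watch owns the channel again */
    return new_id;
}
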
These are the two patches I'm going with (the second builds on the first):

+ Andrew Beekhof (16 hours ago) 6554777: High: Core: Ensure file descriptors and IPC channels attached to mainloop are correctly cleaned up on disconnection
+ Andrew Beekhof (15 hours ago) 2a160e9: High: Core: Ensure server-side IPC connections are cleaned up on client exit (HEAD, origin/master, origin/HEAD, master)

Both are in my repo (not ClusterLabs yet). They include terrible, terrible
hacks for displaying the ref_count (only if tracing is enabled), and after
reading the glib sources and running some tests through valgrind (covering
both graceful and non-graceful termination of both sides), I'm quite
confident the combination is correct.

>
>>
>>>
>>> g_io_add_watch_full() refs the channel, so after the channel is created
>>> and its fd is handed to the mainloop with g_io_add_watch_full(), the
>>> channel has a refcount of 2. Thus it is not freed after
>>> g_source_remove() is called once.
>>>
>>>>
>>>>>
>>>>> ==1734== 8,946 (8,520 direct, 426 indirect) bytes in 71 blocks are definitely lost in loss record 147 of 152
>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>> ==1734==    by 0x5067870: services_os_action_execute (services_linux.c:456)
>>>>> ==1734==    by 0x403AA6: lrmd_rsc_dispatch (lrmd.c:696)
>>>>> ==1734==    by 0x4E513C2: crm_trigger_dispatch (mainloop.c:105)
>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>> ==1734==
>>>>> ==1734== 8,946 (8,520 direct, 426 indirect) bytes in 71 blocks are definitely lost in loss record 148 of 152
>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>> ==1734==    by 0x50678AE: services_os_action_execute (services_linux.c:465)
>>>>> ==1734==    by 0x403AA6: lrmd_rsc_dispatch (lrmd.c:696)
>>>>> ==1734==    by 0x4E513C2: crm_trigger_dispatch (mainloop.c:105)
>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>> ==1734==
>>>>> ==1734== 65,394 (62,280 direct, 3,114 indirect) bytes in 519 blocks are definitely lost in loss record 151 of 152
>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>> ==1734==    by 0x5067870: services_os_action_execute (services_linux.c:456)
>>>>> ==1734==    by 0x50676B4: recurring_action_timer (services_linux.c:212)
>>>>> ==1734==    by 0x719161A: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x402427: main (main.c:302)
>>>>> ==1734==
>>>>> ==1734== 65,394 (62,280 direct, 3,114 indirect) bytes in 519 blocks are definitely lost in loss record 152 of 152
>>>>> ==1734==    at 0x4C26FDE: malloc (vg_replace_malloc.c:236)
>>>>> ==1734==    by 0x71997D2: g_malloc (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x71C67F4: g_io_channel_unix_new (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x4E52470: mainloop_add_fd (mainloop.c:660)
>>>>> ==1734==    by 0x50678AE: services_os_action_execute (services_linux.c:465)
>>>>> ==1734==    by 0x50676B4: recurring_action_timer (services_linux.c:212)
>>>>> ==1734==    by 0x719161A: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7190F0D: g_main_context_dispatch (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194937: ??? (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x7194D54: g_main_loop_run (in /lib64/libglib-2.0.so.0.2200.5)
>>>>> ==1734==    by 0x402427: main (main.c:302)

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org