[Gluster-devel] rdma: misuse of ibv_ack_cq_events api
Hello folks,

I think ibv_ack_cq_events may be used incorrectly in the gf_rdma_recv_completion_proc and gf_rdma_send_completion_proc functions in rpc/rpc-transport/rdma/src/rdma.c. gf_rdma_recv_completion_proc calls ibv_ack_cq_events(event_cq, num_wr) before it returns, where num_wr is the number of work completions returned by ibv_poll_cq. In other words, it treats the number of work completions (ibv_wc) as the number of completion queue events. But if I have understood correctly, completion queue events are produced by ibv_get_cq_event, not by ibv_poll_cq. So after ibv_get_cq_event returns an event, users should call `ibv_ack_cq_events(event_cq, 1)` to ack that one event.

Looking forward to your reply. Thanks.

Lorne

--
Here is the example code from the ibv_get_cq_event man page (https://linux.die.net/man/3/ibv_get_cq_event), which supports my reading:

    /* Wait for the completion event */
    if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) {
            fprintf(stderr, "Failed to get cq_event\n");
            return 1;
    }

    /* Ack the event */
    ibv_ack_cq_events(ev_cq, 1);

    /* Request notification upon the next completion event */
    if (ibv_req_notify_cq(ev_cq, 0)) {
            fprintf(stderr, "Couldn't request CQ notification\n");
            return 1;
    }

___ Gluster-devel mailing list Gluster-devel@gluster.org https://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] fedora smoke failure on 3.12
On Wed, 2018-09-05 at 16:08 +0530, Anoop C S wrote:
> On Wed, 2018-09-05 at 15:44 +0530, Pranith Kumar Karampuri wrote:
> > It also failed on 4.1 https://build.gluster.org/job/fedora-smoke/1665/console
> >
> > Looks like quite a few changes need to be ported for them to pass?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1625489 was opened to track this change in behaviour.

This is fixed now. You can recheck to verify that fedora-smoke is skipped (while voting) for release branches.

> > On Wed, Sep 5, 2018 at 3:41 PM Pranith Kumar Karampuri wrote:
> > > https://build.gluster.org/job/fedora-smoke/1668/console
> > >
> > > I think it is happening because of missing tirpc changes in configure.ac? There are a series of patches for libtirpc starting with https://review.gluster.org/c/glusterfs/+/19235. I am not very good at reading configure.ac except for the straightforward ones, so I would need help making sure 3.12 works fine. Could one of you help please?
> > >
> > > --
> > > Pranith
Re: [Gluster-devel] fedora smoke failure on 3.12
Thanks a lot!

On Wed, Sep 5, 2018 at 4:55 PM Anoop C S wrote:
> This is fixed now. You can recheck to verify that fedora-smoke is skipped (while voting) for release branches.

--
Pranith
Re: [Gluster-devel] fedora smoke failure on 3.12
It also failed on 4.1: https://build.gluster.org/job/fedora-smoke/1665/console

Looks like quite a few changes need to be ported for them to pass?

On Wed, Sep 5, 2018 at 3:41 PM Pranith Kumar Karampuri wrote:
> https://build.gluster.org/job/fedora-smoke/1668/console
>
> I think it is happening because of missing tirpc changes in configure.ac? There are a series of patches for libtirpc starting with https://review.gluster.org/c/glusterfs/+/19235. I am not very good at reading configure.ac except for the straightforward ones, so I would need help making sure 3.12 works fine. Could one of you help please?
>
> --
> Pranith

--
Pranith
[Gluster-devel] fedora smoke failure on 3.12
https://build.gluster.org/job/fedora-smoke/1668/console

I think it is happening because of missing tirpc changes in configure.ac? There are a series of patches for libtirpc starting with https://review.gluster.org/c/glusterfs/+/19235. I am not very good at reading configure.ac except for the straightforward ones, so I would need help making sure 3.12 works fine. Could one of you help please?

--
Pranith
Re: [Gluster-devel] remaining entry in gluster volume heal info command even after reboot
Looks like the test case is a bit involved and also has modifications directly on the brick. Could you let us know if there is any reason to touch the brick directly?

On Wed, Sep 5, 2018 at 2:53 PM Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com> wrote:
> I will try to reproduce (reboot + ftest) and tell you later, but with the following steps you can also simulate this issue locally; at least the remaining entry appears and the entry heal quits because all three healed_sinks are empty. (I am not sure this is exactly the same as the reboot+ftest reproduction, but I guess it should be.)
>
> 1> Disable client quorum with "gluster v set cluster.quorum-type none"
> 2> Isolate sn-0 from sn-1 and sn-2:
>    iptables -I OUTPUT -d sn-1.local -j DROP
>    iptables -I OUTPUT -d sn-2.local -j DROP
>    iptables -I INPUT -s sn-2.local -j DROP
>    iptables -I INPUT -s sn-1.local -j DROP
> 3> Touch /mnt/export/testdir/common.txt on sn-0
> 4> Touch /mnt/export/testdir/common.txt on sn-1
> 5> On sn-1, delete all /mnt/bricks/export/brick/testdir/common.txt metadata until getfattr returns empty:
>    setfattr -x trusted.afr.dirty /mnt/bricks/export/brick/testdir/common.txt
>    setfattr -x trusted.afr.export-client-0 /mnt/bricks/export/brick/testdir/common.txt
>    setfattr -x trusted.gfid /mnt/bricks/export/brick/testdir/common.txt
>    setfattr -x trusted.gfid2path.53be37be7f01389d /mnt/bricks/export/brick/testdir/common.txt
>    Then getfattr returns empty:
>    [root@sn-1:/home/robot]
>    # getfattr -m . -d -e hex /mnt/bricks/export/brick/testdir/common.txt
>    [root@sn-1:/home/robot]
> 6> Then delete the corresponding entry (common.txt) in /mnt/bricks/export/brick/.glusterfs/indices/xattrop/:
>    [root@sn-1:/home/robot]
>    # rm -rf /mnt/bricks/export/brick/.glusterfs/indices/xattrop/d0d237f7-0c43-4828-8720-dfb3792fe5fb
> 7> Restore the network on sn-0:
>    iptables -D OUTPUT -d sn-1.local -j DROP
>    iptables -D OUTPUT -d sn-2.local -j DROP
>    iptables -D INPUT -s sn-1.local -j DROP
>    iptables -D INPUT -s sn-2.local -j DROP
> 8> Touch /mnt/export/testdir/common.txt on sn-0
> 9> "gluster v heal export info" will then show the following and keep it for a long time:
>
> # gluster v heal export info
> Brick sn-0.local:/mnt/bricks/export/brick
> /testdir
> Status: Connected
> Number of entries: 1
>
> Brick sn-1.local:/mnt/bricks/export/brick
> Status: Connected
> Number of entries: 0
>
> Brick sn-2.local:/mnt/bricks/export/brick
> /testdir
> Status: Connected
> Number of entries: 1
>
> From: Pranith Kumar Karampuri
> Sent: Wednesday, September 05, 2018 4:56 PM
> To: Zhou, Cynthia (NSB - CN/Hangzhou)
> Cc: Gluster Devel; Ravishankar N <ravishan...@redhat.com>
> Subject: Re: remaining entry in gluster volume heal info command even after reboot
>
> On Wed, Sep 5, 2018 at 1:27 PM Zhou, Cynthia (NSB - CN/Hangzhou) <cynthia.z...@nokia-sbell.com> wrote:
> > Hi glusterfs experts:
> > Good day!
> > Recently when I ran some tests on my gluster env, I found that there are some remaining entries in the "gluster v heal mstate info" output *even after reboot*.
> > fstest_035ffc492ec43551a64087f9280ffe3e is a folder in /mnt/mstate, and in this folder only one file (fstest_458bb82d8884ed5c9dadec4ed93bec4e) exists.
> > When I debugged with gdb on sn-0, I found that the parent dir changelog (fstest_035ffc492ec43551a64087f9280ffe3e) says an entry needs heal, but the changelog/gfid/filetype of the only entry in the parent dir shows there is nothing to be healed, so glustershd does nothing in every round of heal, and this entry remains.
> > My gdb session shows that in each round of heal on sn-0, it exits in function __afr_selfheal_entry at "if (AFR_COUNT(healed_sinks, priv->child_count) == 0)", because in this case all three healed_sinks are zero.
> > > > What is the return value of this function in gdb? > > > >Have you any idea how to solve this issue from glusterfs pov? > Thanks! > > > > [test steps] > >Reboot three sn nodes( sn-0, sn-1, sn-2(arbiter)) sequentially, and > on another node (with glusterfs clients) run fstest. > > > > [problem description] > > > > Remaining entries in “gluster v heal mstate info” command even after > reboot sn-0 many times, the entries are still there! > > > > [root@sn-0:/home/robot] > > # gluster v heal mstate info > > Brick sn-0.local:/mnt/bricks/mstate/brick > > /fstest_035ffc492ec43551a64087f9280ffe3e > > Status: Connected > > Number of entries: 1 > > > > Brick sn-1.local:/mnt/bricks/mstate/brick > > Status: Connected > > Number of entries: 0 > > > > Brick sn-2.local:/mnt/bricks/mstate/brick > > /fstest_035ffc492ec43551a64087f9280ffe3e > > Status: Connected > > Number of entries: 1 > > > > > > > > some > env
Re: [Gluster-devel] remaining entry in gluster volume heal info command even after reboot
On Wed, Sep 5, 2018 at 1:27 PM Zhou, Cynthia (NSB - CN/Hangzhou) < cynthia.z...@nokia-sbell.com> wrote: > Hi glusterfs experts: > >Good day! > >Recently when I do some test on my gluster env, I found that there > are some remaining entries in command “gluster v heal mstate info” *even > after reboot*. > > fstest_035ffc492ec43551a64087f9280ffe3e is a folder in /mnt/mstate > and in this folder only one file(fstest_458bb82d8884ed5c9dadec4ed93bec4e) > exists. > > When I dbg by gdb on sn-0 I found that: > > Parent dir changelog(fstest_035ffc492ec43551a64087f9280ffe3e) says need > to heal entry, but the changelog/gfid/filetype of the only entry in parent > dir shows there is nothing to be healed, so glustershd does nothing every > round of heal. And this entry will remain. > > My gdb shows that each round of heal on sn-0 , it exits in function > __afr_selfheal_entry (if (AFR_COUNT(healed_sinks, priv->child_count) == > 0)), because in this case all three healed_sinks are zero. > What is the return value of this function in gdb? >Have you any idea how to solve this issue from glusterfs pov? > Thanks! > > > > [test steps] > >Reboot three sn nodes( sn-0, sn-1, sn-2(arbiter)) sequentially, and > on another node (with glusterfs clients) run fstest. > > > > [problem description] > > > > Remaining entries in “gluster v heal mstate info” command even after > reboot sn-0 many times, the entries are still there! 
> > > > [root@sn-0:/home/robot] > > # gluster v heal mstate info > > Brick sn-0.local:/mnt/bricks/mstate/brick > > /fstest_035ffc492ec43551a64087f9280ffe3e > > Status: Connected > > Number of entries: 1 > > > > Brick sn-1.local:/mnt/bricks/mstate/brick > > Status: Connected > > Number of entries: 0 > > > > Brick sn-2.local:/mnt/bricks/mstate/brick > > /fstest_035ffc492ec43551a64087f9280ffe3e > > Status: Connected > > Number of entries: 1 > > > > > > > > some > env informations/// > > # gluster v info mstate > > Volume Name: mstate > > Type: Replicate > > Volume ID: 1d896674-17a2-4ae7-aa7c-c6e22013df99 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x (2 + 1) = 3 > > Transport-type: tcp > > Bricks: > > Brick1: sn-0.local:/mnt/bricks/mstate/brick > > Brick2: sn-1.local:/mnt/bricks/mstate/brick > > Brick3: sn-2.local:/mnt/bricks/mstate/brick (arbiter) > > Options Reconfigured: > > performance.client-io-threads: off > > nfs.disable: on > > transport.address-family: inet > > cluster.server-quorum-type: none > > cluster.quorum-reads: no > > cluster.favorite-child-policy: mtime > > cluster.consistent-metadata: on > > network.ping-timeout: 42 > > cluster.quorum-type: auto > > server.allow-insecure: on > > cluster.server-quorum-ratio: 51% > > [root@sn-1:/home/robot] > > > > > > [root@sn-2:/root] > > # getfattr -m . -d -e hex > /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/ > > getfattr: Removing leading '/' from absolute path names > > # file: mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/ > > trusted.afr.dirty=0x > > trusted.afr.mstate-client-0=0x00010003 > > trusted.gfid=0xa0975560eaef4cb299467101de00446a > > trusted.glusterfs.dht=0x0001 > > > > [root@sn-2:/root] > > # getfattr -m . 
-d -e hex > /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e > > getfattr: Removing leading '/' from absolute path names > > # file: > mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e > > trusted.afr.mstate-client-0=0x > > trusted.gfid=0x9fc20f587f094182816390f056f7370f > > > trusted.gfid2path.864159d77373ad5f=0x61303937353536302d656165662d346362322d393934362d3731303164653030343436612f6673746573745f3435386262383264383838346564356339646164656334656439336265633465 > > > > # cd /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e > > [root@sn-2 > :/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e] > > # ls > > fstest_458bb82d8884ed5c9dadec4ed93bec4e > > [root@sn-2 > :/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e] > > # stat fstest_458bb82d8884ed5c9dadec4ed93bec4e > > File: fstest_458bb82d8884ed5c9dadec4ed93bec4e > > Size: 0 Blocks: 8 IO Block: 4096 fifo > > Device: fd31h/64817d Inode: 22086 Links: 2 > > Access: (0644/prw-r--r--) Uid: (0/root) Gid: (0/root) > > Access: 2018-08-30 04:33:17.552870661 +0300 > > Modify: 2018-08-30 04:33:17.552870661 +0300 > > Change: 2018-08-30 04:33:17.553870661 +0300 > > Birth: - > > [root@sn-2 > :/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e] > > [root@sn-2:/root] > > # exit > > logout > > Connection to sn-2.local closed. > > [root@sn-0:/home/robot] > > # getfattr -m . -d -e hex >
Re: [Gluster-devel] remaining entry in gluster volume heal info command even after reboot
Which version of gluster is this? On Wed, Sep 5, 2018 at 1:27 PM Zhou, Cynthia (NSB - CN/Hangzhou) < cynthia.z...@nokia-sbell.com> wrote: > Hi glusterfs experts: > >Good day! > >Recently when I do some test on my gluster env, I found that there > are some remaining entries in command “gluster v heal mstate info” *even > after reboot*. > > fstest_035ffc492ec43551a64087f9280ffe3e is a folder in /mnt/mstate > and in this folder only one file(fstest_458bb82d8884ed5c9dadec4ed93bec4e) > exists. > > When I dbg by gdb on sn-0 I found that: > > Parent dir changelog(fstest_035ffc492ec43551a64087f9280ffe3e) says need > to heal entry, but the changelog/gfid/filetype of the only entry in parent > dir shows there is nothing to be healed, so glustershd does nothing every > round of heal. And this entry will remain. > > My gdb shows that each round of heal on sn-0 , it exits in function > __afr_selfheal_entry (if (AFR_COUNT(healed_sinks, priv->child_count) == > 0)), because in this case all three healed_sinks are zero. > >Have you any idea how to solve this issue from glusterfs pov? > Thanks! > > > > [test steps] > >Reboot three sn nodes( sn-0, sn-1, sn-2(arbiter)) sequentially, and > on another node (with glusterfs clients) run fstest. > > > > [problem description] > > > > Remaining entries in “gluster v heal mstate info” command even after > reboot sn-0 many times, the entries are still there! 
> > > > [root@sn-0:/home/robot] > > # gluster v heal mstate info > > Brick sn-0.local:/mnt/bricks/mstate/brick > > /fstest_035ffc492ec43551a64087f9280ffe3e > > Status: Connected > > Number of entries: 1 > > > > Brick sn-1.local:/mnt/bricks/mstate/brick > > Status: Connected > > Number of entries: 0 > > > > Brick sn-2.local:/mnt/bricks/mstate/brick > > /fstest_035ffc492ec43551a64087f9280ffe3e > > Status: Connected > > Number of entries: 1 > > > > > > > > some > env informations/// > > # gluster v info mstate > > Volume Name: mstate > > Type: Replicate > > Volume ID: 1d896674-17a2-4ae7-aa7c-c6e22013df99 > > Status: Started > > Snapshot Count: 0 > > Number of Bricks: 1 x (2 + 1) = 3 > > Transport-type: tcp > > Bricks: > > Brick1: sn-0.local:/mnt/bricks/mstate/brick > > Brick2: sn-1.local:/mnt/bricks/mstate/brick > > Brick3: sn-2.local:/mnt/bricks/mstate/brick (arbiter) > > Options Reconfigured: > > performance.client-io-threads: off > > nfs.disable: on > > transport.address-family: inet > > cluster.server-quorum-type: none > > cluster.quorum-reads: no > > cluster.favorite-child-policy: mtime > > cluster.consistent-metadata: on > > network.ping-timeout: 42 > > cluster.quorum-type: auto > > server.allow-insecure: on > > cluster.server-quorum-ratio: 51% > > [root@sn-1:/home/robot] > > > > > > [root@sn-2:/root] > > # getfattr -m . -d -e hex > /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/ > > getfattr: Removing leading '/' from absolute path names > > # file: mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/ > > trusted.afr.dirty=0x > > trusted.afr.mstate-client-0=0x00010003 > > trusted.gfid=0xa0975560eaef4cb299467101de00446a > > trusted.glusterfs.dht=0x0001 > > > > [root@sn-2:/root] > > # getfattr -m . 
-d -e hex > /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e > > getfattr: Removing leading '/' from absolute path names > > # file: > mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e/fstest_458bb82d8884ed5c9dadec4ed93bec4e > > trusted.afr.mstate-client-0=0x > > trusted.gfid=0x9fc20f587f094182816390f056f7370f > > > trusted.gfid2path.864159d77373ad5f=0x61303937353536302d656165662d346362322d393934362d3731303164653030343436612f6673746573745f3435386262383264383838346564356339646164656334656439336265633465 > > > > # cd /mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e > > [root@sn-2 > :/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e] > > # ls > > fstest_458bb82d8884ed5c9dadec4ed93bec4e > > [root@sn-2 > :/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e] > > # stat fstest_458bb82d8884ed5c9dadec4ed93bec4e > > File: fstest_458bb82d8884ed5c9dadec4ed93bec4e > > Size: 0 Blocks: 8 IO Block: 4096 fifo > > Device: fd31h/64817d Inode: 22086 Links: 2 > > Access: (0644/prw-r--r--) Uid: (0/root) Gid: (0/root) > > Access: 2018-08-30 04:33:17.552870661 +0300 > > Modify: 2018-08-30 04:33:17.552870661 +0300 > > Change: 2018-08-30 04:33:17.553870661 +0300 > > Birth: - > > [root@sn-2 > :/mnt/bricks/mstate/brick/fstest_035ffc492ec43551a64087f9280ffe3e] > > [root@sn-2:/root] > > # exit > > logout > > Connection to sn-2.local closed. > > [root@sn-0:/home/robot] > > # getfattr -m . -d -e hex >