Re: [Gluster-devel] Release 4.0: Unable to complete rolling upgrade tests
On 03/02/2018 12:34 AM, Anoop C S wrote:
> On Fri, 2018-03-02 at 10:11 +0530, Ravishankar N wrote:
>> + Anoop.
>>
>> It looks like clients on the old (3.12) nodes are not able to talk to
>> the upgraded (4.0) node. I see messages like these on the old clients:
>>
>> [2018-03-02 03:49:13.483458] W [MSGID: 114007]
>> [client-handshake.c:1197:client_setvolume_cbk] 0-testvol-client-2:
>> failed to find key 'clnt-lk-version' in the options
>
> Seems like we need to set clnt-lk-version from the server side too,
> similar to what we did for the client via
> https://review.gluster.org/#/c/19560/. Can you try with the attached patch?

Used the same, and tests on my end pass as well. Backported it to 4.0
(so that by EOD it is ready for a merge).

Thanks, Anoop and Ravi, for the quick turnaround, considering the 4.0
release timeline.

Shyam
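For anyone re-running this check once the patch lands: a rough sketch to
confirm from an old (3.12) client that the handshake warning is gone. The
log path is an assumption; FUSE mount logs typically live under
/var/log/glusterfs/, named after the mount point.

    # Assumed mount point /mnt/testvol -> log /var/log/glusterfs/mnt-testvol.log
    CLIENT_LOG=/var/log/glusterfs/mnt-testvol.log

    if grep -q "failed to find key 'clnt-lk-version'" "$CLIENT_LOG"; then
        echo "warning still present: server-side patch not in effect"
    else
        echo "no clnt-lk-version warning: handshake looks clean"
    fi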
Re: [Gluster-devel] Release 4.0: Unable to complete rolling upgrade tests
On 03/02/2018 02:41 AM, Ravishankar N wrote:
> I still got the mkdir error on a plain distribute volume that I referred
> to in the other email in this thread. For anyone interested in trying
> it out, the steps are:
> - Create a 2-node 2x1 plain distribute volume on 3.13 and fuse mount it on node-1
> - Upgrade the 2nd node to 4.0 and, once it is up and running,
> - Perform mkdir from the mount on node-1 --> this returns EIO

Thanks, Ravi. I tried this test in my setup and did not face any issues.
This was with Anoop's patch (same case as yours).

Further, since we do not support rolling upgrades on pure distribute
volumes, I am willing to not consider this a blocker (given the
predictability and the nature of the test).

DHT folks, please check once and ensure this is not a critical failure,
and that we are not assuming otherwise for the release.

Shyam
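Since rolling upgrades are only supported where redundancy exists, a
pre-flight check along these lines could catch this case before an upgrade
is attempted. A rough sketch: the volume name is an example, and the
parsing relies on the standard "Type:" line in `gluster volume info`
output.

    VOL=testvol

    # Pull the volume type from the standard "Type: ..." line
    TYPE=$(gluster volume info "$VOL" | awk -F': ' '/^Type:/ {print $2}')

    case "$TYPE" in
        *Replicate*|*Disperse*)
            echo "$VOL is $TYPE: rolling upgrade feasible" ;;
        *)
            echo "$VOL is $TYPE: no redundancy, offline upgrade recommended" ;;
    esac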
Re: [Gluster-devel] Release 4.0: Unable to complete rolling upgrade tests
On 03/02/2018 11:04 AM, Anoop C S wrote:
> On Fri, 2018-03-02 at 10:11 +0530, Ravishankar N wrote:
>> + Anoop.
>>
>> It looks like clients on the old (3.12) nodes are not able to talk to
>> the upgraded (4.0) node. I see messages like these on the old clients:
>>
>> [2018-03-02 03:49:13.483458] W [MSGID: 114007]
>> [client-handshake.c:1197:client_setvolume_cbk] 0-testvol-client-2:
>> failed to find key 'clnt-lk-version' in the options
>
> Seems like we need to set clnt-lk-version from the server side too,
> similar to what we did for the client via
> https://review.gluster.org/#/c/19560/. Can you try with the attached patch?

Thanks, self-heal works with this. You might want to get it merged in
4.0 ASAP.

I still got the mkdir error on a plain distribute volume that I referred
to in the other email in this thread. For anyone interested in trying it
out, the steps are:
- Create a 2-node 2x1 plain distribute volume on 3.13 and fuse mount it on node-1
- Upgrade the 2nd node to 4.0 and, once it is up and running,
- Perform mkdir from the mount on node-1 --> this returns EIO

Thanks,
Ravi

PS: Feeling a bit under the weather, so I might not be online today again.

> [...]
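To watch whether the heal backlog drains after the upgraded node rejoins,
something like the following can be polled on any server node. A sketch:
the volume name, iteration count, and interval are examples.

    VOL=testvol

    # Print per-brick pending-entry counts every 30 seconds
    for i in $(seq 1 10); do
        echo "--- check $i ---"
        gluster volume heal "$VOL" info | grep -E '^(Brick|Number of entries)'
        sleep 30
    done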
Re: [Gluster-devel] Release 4.0: Unable to complete rolling upgrade tests
On 03/02/2018 10:11 AM, Ravishankar N wrote:
> + Anoop.
>
> It looks like clients on the old (3.12) nodes are not able to talk to
> the upgraded (4.0) node. I see messages like these on the old clients:
>
> [2018-03-02 03:49:13.483458] W [MSGID: 114007]
> [client-handshake.c:1197:client_setvolume_cbk] 0-testvol-client-2:
> failed to find key 'clnt-lk-version' in the options

I see this in a 2x1 plain distribute also. I see ENOTCONN for the
upgraded brick on the old client:

[2018-03-02 04:58:54.559446] E [MSGID: 114058]
[client-handshake.c:1571:client_query_portmap_cbk] 0-testvol-client-1:
failed to get the port number for remote subvolume. Please run 'gluster
volume status' on server to see if brick process is running.
[2018-03-02 04:58:54.559618] I [MSGID: 114018]
[client.c:2285:client_rpc_notify] 0-testvol-client-1: disconnected from
testvol-client-1. Client process will keep trying to connect to glusterd
until brick's port is available
[2018-03-02 04:58:56.973199] I [rpc-clnt.c:1994:rpc_clnt_reconfig]
0-testvol-client-1: changing port to 49152 (from 0)
[2018-03-02 04:58:56.975844] I [MSGID: 114057]
[client-handshake.c:1484:select_server_supported_programs]
0-testvol-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-03-02 04:58:56.978114] W [MSGID: 114007]
[client-handshake.c:1197:client_setvolume_cbk] 0-testvol-client-1:
failed to find key 'clnt-lk-version' in the options
[2018-03-02 04:58:46.618036] E [MSGID: 114031]
[client-rpc-fops.c:2768:client3_3_opendir_cbk] 0-testvol-client-1:
remote operation failed. Path: / (----0001) [Transport endpoint is not
connected]
The message "W [MSGID: 114031]
[client-rpc-fops.c:2577:client3_3_readdirp_cbk] 0-testvol-client-1:
remote operation failed [Transport endpoint is not connected]" repeated
3 times between [2018-03-02 04:58:46.609529] and [2018-03-02 04:58:46.618683]

Also, mkdir fails on the old mount with EIO, though it physically
succeeds on both bricks. Can the RPC folks offer a helping hand?

-Ravi

> [...]
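The portmap error above suggests checking, on the upgraded node, whether
the brick registered its port with glusterd and is actually listening. A
sketch: the volume name is an example, and 49152 is the port from the
client log above.

    VOL=testvol

    # Brick should show Online "Y" with a TCP port in the status output
    gluster volume status "$VOL"

    # Confirm a brick process is listening on the advertised port
    ss -tlnp | grep ':49152'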
Re: [Gluster-devel] Release 4.0: Unable to complete rolling upgrade tests
+ Anoop.

It looks like clients on the old (3.12) nodes are not able to talk to
the upgraded (4.0) node. I see messages like these on the old clients:

[2018-03-02 03:49:13.483458] W [MSGID: 114007]
[client-handshake.c:1197:client_setvolume_cbk] 0-testvol-client-2:
failed to find key 'clnt-lk-version' in the options

Is there something more to be done on BZ 1544366?

-Ravi

On 03/02/2018 08:44 AM, Ravishankar N wrote:
> [...]
Re: [Gluster-devel] Release 4.0: Unable to complete rolling upgrade tests
On 03/02/2018 07:26 AM, Shyam Ranganathan wrote:
> Hi Pranith/Ravi,
>
> So, to keep a long story short: after upgrading 1 node in a 3-node 3.13
> cluster, self-heal is not able to catch up with the heal backlog. This
> is a very simple synthetic test, but the end result is that upgrade
> testing is failing.

Let me try this now and get back. I had done something similar when
testing the FIPS patch, and the rolling upgrade had worked.

Thanks,
Ravi

> Here are the details,
>
> - Using https://hackmd.io/GYIwTADCDsDMCGBaArAUxAY0QFhBAbIgJwCMySIwJmAJvGMBvNEA#
> I set up 3 server containers with 3.13 installed, as follows:
>
> (inside the 3 server containers)
> yum -y update
> yum -y install centos-release-gluster313
> yum install glusterfs-server
> glusterd
>
> (inside centos-glfs-server1)
> gluster peer probe centos-glfs-server2
> gluster peer probe centos-glfs-server3
> gluster peer status
> gluster v create patchy replica 3 centos-glfs-server1:/d/brick1 centos-glfs-server2:/d/brick2 centos-glfs-server3:/d/brick3 centos-glfs-server1:/d/brick4 centos-glfs-server2:/d/brick5 centos-glfs-server3:/d/brick6 force
> gluster v start patchy
> gluster v status
>
> Create a client container as per the document above, mount the above
> volume, and create 1 file, 1 directory, and a file within that directory.
>
> Now we start the upgrade process (as laid out for 3.13 here:
> http://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_3.13/):
>
> - killall glusterfs glusterfsd glusterd
> - yum install http://cbs.centos.org/kojifiles/work/tasks/1548/311548/centos-release-gluster40-0.9-1.el7.centos.x86_64.rpm
> - yum upgrade --enablerepo=centos-gluster40-test glusterfs-server
>
> <Go back to the client, edit the contents of one of the files, and
> change the permissions of a directory, so that there are things to heal
> when we bring up the newly upgraded server.>
>
> - gluster --version
> - glusterd
> - gluster v status
> - gluster v heal patchy
>
> The above starts failing as follows:
>
> [root@centos-glfs-server1 /]# gluster v heal patchy
> Launching heal operation to perform index self heal on volume patchy
> has been unsuccessful: Commit failed on centos-glfs-server2.glfstest20.
> Please check log file for details. Commit failed on
> centos-glfs-server3. Please check log file for details.
>
> From here, if further files or directories are created from the client,
> they just get added to the heal backlog, and heal does not catch up.
>
> As is obvious, I cannot proceed, as the upgrade procedure is broken.
> The issue itself may not be the self-heal daemon, but something around
> connections; as the process fails here, I am looking to you folks to
> unblock this as soon as possible, as we are already running a day's
> slip on the release.
>
> Thanks,
> Shyam
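For convenience, the reproduction above condenses to roughly the following.
Container names, brick paths, and repo URLs are taken verbatim from the
message; the client-side mount and file edits still need to be done by
hand in between.

    #!/bin/bash
    # Condensed sketch of the rolling-upgrade reproduction above.

    # -- on all three server containers (3.13 baseline) --
    yum -y update
    yum -y install centos-release-gluster313
    yum -y install glusterfs-server
    glusterd

    # -- on centos-glfs-server1 only: form the cluster and the volume --
    gluster peer probe centos-glfs-server2
    gluster peer probe centos-glfs-server3
    gluster v create patchy replica 3 \
        centos-glfs-server1:/d/brick1 centos-glfs-server2:/d/brick2 \
        centos-glfs-server3:/d/brick3 centos-glfs-server1:/d/brick4 \
        centos-glfs-server2:/d/brick5 centos-glfs-server3:/d/brick6 force
    gluster v start patchy

    # (mount from the client container and create a file, a directory,
    #  and a file within that directory)

    # -- on the node being upgraded to 4.0 --
    killall glusterfs glusterfsd glusterd
    yum -y install http://cbs.centos.org/kojifiles/work/tasks/1548/311548/centos-release-gluster40-0.9-1.el7.centos.x86_64.rpm
    yum -y upgrade --enablerepo=centos-gluster40-test glusterfs-server

    # (mutate file contents/permissions from the client, so there is
    #  something to heal)
    glusterd
    gluster v status

    # -- verify: this is the step that fails in the report above --
    gluster v heal patchy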