Re: [Gluster-users] trashcan on dist. repl. volume with geo-replication
Hi Kotresh,

...another test. this time the trashcan was enabled on the master only. as in the previous test, this is gfs 3.12.6 on ubuntu 16.04.4.
the geo-rep error appeared again, and disabling the trashcan afterwards does not change anything. as in the previous test, the error appears when i try to list files in the trashcan.
the gfid shown belongs to a directory in the trashcan with just one file in it, as in the previous test.

[2018-03-13 11:08:30.777489] E [master(/brick1/mvol1):784:log_failures] _GMaster: ENTRY FAILED data=({'uid': 0, 'gfid': '71379ee0-c40a-49db-b3ed-9f3145ed409a', 'gid': 0, 'mode': 16877, 'entry': '.gfid/4f59c068-6c77-40f2-b556-aa761834caf1/dir1', 'op': 'MKDIR'}, 2, {'gfid_mismatch': False, 'dst': False})

below are the setup, further information and all activities.
is there anything else i could test or check...?

a general question: is there a recommendation for using the trashcan feature in geo-replication environments...?
for my use case it's not necessary to activate it on the slave... but does it need to be activated on both master and slave?
best regards
Dietmar

master volume :

root@gl-node1:~# gluster volume info mvol1

Volume Name: mvol1
Type: Distributed-Replicate
Volume ID: 7590b6a0-520b-4c51-ad63-3ba5be0ed0df
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gl-node1-int:/brick1/mvol1
Brick2: gl-node2-int:/brick1/mvol1
Brick3: gl-node3-int:/brick1/mvol1
Brick4: gl-node4-int:/brick1/mvol1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.trash-max-filesize: 2GB
features.trash: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
root@gl-node1:~#

slave volume :

root@gl-node5:~# gluster volume info mvol1

Volume Name: mvol1
Type: Distributed-Replicate
Volume ID: aba4e057-7374-4a62-bcd7-c1c6f71e691b
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gl-node5-int:/brick1/mvol1
Brick2: gl-node6-int:/brick1/mvol1
Brick3: gl-node7-int:/brick1/mvol1
Brick4: gl-node8-int:/brick1/mvol1
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
root@gl-node5:~#

root@gl-node1:~# gluster volume geo-replication mvol1 gl-node5-int::mvol1 config
special_sync_mode: partial
state_socket_unencoded: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.socket
gluster_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
ignore_deletes: false
change_detector: changelog
gluster_command_dir: /usr/sbin/
state_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.status
remote_gsyncd: /nonexistent/gsyncd
log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.log
changelog_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-changes.log
socketdir: /var/run/gluster
working_dir: /var/lib/misc/glusterfsd/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1
state_detail_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-detail.status
use_meta_volume: true
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
pid_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/
access_mount: true
gluster_params: aux-gfid-mount acl
root@gl-node1:~#

root@gl-node1:~# gluster volume geo-replication mvol1 gl-node5-int::mvol1 status

MASTER NODE     MASTER VOL    MASTER BRICK     SLAVE USER    SLAVE                  SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED
gl-node1-int    mvol1         /brick1/mvol1    root          gl-node5-int::mvol1    gl-node5-int    Active     Changelog Crawl    2018-03-13 09:43:46
gl-node4-int    mvol1         /brick1/mvol1    root          gl-node5-int::mvol1    gl-node8-int    Active     Changelog Crawl    2018-03-13 09:43:47
gl-node2-int    mvol1         /brick1/mvol1    root          gl-node5-int::mvol1    gl-node6-int    Passive    N/A                N/A
gl-node3-int    mvol1         /brick1/mvol1    root          gl-node5-int::mvol1    gl-node7-int    Passive    N/A                N/A
root@gl-node1:~#

the volumes are locally mounted as :

gl-node1:/mvol1 20G 65M 20G 1% /m_vol
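
For anyone who wants to pull the fields out of such a log line, here is a minimal Python sketch. It assumes that everything after "data=" is a plain Python literal (it looks like one in the line quoted in this message) and that the tuple is laid out as (entry record, numeric code, info dict). That layout is inferred from this one log line, not from the gsyncd source, and the helper names are made up for illustration.

```python
import ast

# Log line copied from the message above; only the text after "data=" is parsed.
LOG_LINE = (
    "[2018-03-13 11:08:30.777489] E [master(/brick1/mvol1):784:log_failures] "
    "_GMaster: ENTRY FAILED data=({'uid': 0, "
    "'gfid': '71379ee0-c40a-49db-b3ed-9f3145ed409a', 'gid': 0, 'mode': 16877, "
    "'entry': '.gfid/4f59c068-6c77-40f2-b556-aa761834caf1/dir1', "
    "'op': 'MKDIR'}, 2, {'gfid_mismatch': False, 'dst': False})"
)


def parse_entry_failure(line):
    """Return (entry, code, info) from an 'ENTRY FAILED data=(...)' line.

    ASSUMPTION: the three-element layout is inferred from the log text.
    """
    marker = "data="
    start = line.index(marker) + len(marker)
    # literal_eval only accepts literals, so nothing in the log gets executed.
    entry, code, info = ast.literal_eval(line[start:])
    return entry, code, info


def split_entry_path(entry_path):
    """Split '.gfid/<parent-gfid>/<name>' into (parent_gfid, name)."""
    prefix, parent_gfid, name = entry_path.split("/", 2)
    if prefix != ".gfid":
        raise ValueError("unexpected entry path: %r" % entry_path)
    return parent_gfid, name


entry, code, info = parse_entry_failure(LOG_LINE)
parent_gfid, name = split_entry_path(entry["entry"])
print(entry["op"], name, "under parent gfid", parent_gfid)
print("code:", code, "info:", info)
```

The parent gfid is the part worth looking up on both master and slave, since the failed operation is a MKDIR of the entry below that parent.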
Re: [Gluster-users] trashcan on dist. repl. volume with geo-replication
Hi Dietmar,

I am trying to understand the problem and have a few questions.

1. Is trashcan enabled only on the master volume?
2. Was the 'rm -rf' done on the master volume synced to the slave?
3. If trashcan is disabled, does the issue go away?

The geo-rep error just says that it failed to create the directory "Oracle_VM_VirtualBox_Extension" on the slave. Usually this would be because of a gfid mismatch, but I don't see that in your case. So I am a little more interested in the present state of the geo-rep. Is it still throwing the same errors and failing to sync the same directory? If so, does the parent 'test1/b1' exist on the slave?
Also, doing ls on the trashcan should not affect geo-rep. Is there an easy reproducer for this?

Thanks,
Kotresh HR

On Mon, Mar 12, 2018 at 10:13 PM, Dietmar Putz wrote:

> Hello,
>
> in regard to
> https://bugzilla.redhat.com/show_bug.cgi?id=1434066
> i have been faced to another issue when using the trashcan feature on a
> dist. repl. volume running a geo-replication. (gfs 3.12.6 on ubuntu 16.04.4)
> for e.g. removing an entire directory with subfolders :
> tron@gl-node1:/myvol-1/test1/b1$ rm -rf *
>
> afterwards listing files in the trashcan :
> tron@gl-node1:/myvol-1/test1$ ls -la /myvol-1/.trashcan/test1/b1/
>
> leads to an outage of the geo-replication.
> error on master-01 and master-02 :
>
> [2018-03-12 13:37:14.827204] I [master(/brick1/mvol1):1385:crawl] _GMaster: slave's time stime=(1520861818, 0)
> [2018-03-12 13:37:14.835535] E [master(/brick1/mvol1):784:log_failures] _GMaster: ENTRY FAILED data=({'uid': 0, 'gfid': 'c38f75e3-194a-4d22-9094-50ac8f8756e7', 'gid': 0, 'mode': 16877, 'entry': '.gfid/5531bd64-ac50-462b-943e-c0bf1c52f52c/Oracle_VM_VirtualBox_Extension', 'op': 'MKDIR'}, 2, {'gfid_mismatch': False, 'dst': False})
> [2018-03-12 13:37:14.835911] E [syncdutils(/brick1/mvol1):299:log_raise_exception] : The above directory failed to sync. Please fix it to proceed further.
>
>
> both gfid's of the directories as shown in the log :
> brick1/mvol1/.trashcan/test1/b1 0x5531bd64ac50462b943ec0bf1c52f52c
> brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension 0xc38f75e3194a4d22909450ac8f8756e7
>
> the shown directory contains just one file which is stored on gl-node3 and
> gl-node4 while node1 and 2 are in geo replication error.
> since the filesize limitation of the trashcan is obsolete i'm really
> interested to use the trashcan feature but i'm concerned it will interrupt
> the geo-replication entirely.
> does anybody else have been faced with this situation...any hints,
> workarounds... ?
>
> best regards
> Dietmar Putz
>
>
> root@gl-node1:~/tmp# gluster volume info mvol1
>
> Volume Name: mvol1
> Type: Distributed-Replicate
> Volume ID: a1c74931-568c-4f40-8573-dd344553e557
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: gl-node1-int:/brick1/mvol1
> Brick2: gl-node2-int:/brick1/mvol1
> Brick3: gl-node3-int:/brick1/mvol1
> Brick4: gl-node4-int:/brick1/mvol1
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> features.trash-max-filesize: 2GB
> features.trash: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> root@gl-node1:/myvol-1/test1# gluster volume geo-replication mvol1 gl-node5-int::mvol1 config
> special_sync_mode: partial
> gluster_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.gluster.log
> ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> change_detector: changelog
> use_meta_volume: true
> session_owner: a1c74931-568c-4f40-8573-dd344553e557
> state_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.status
> gluster_params: aux-gfid-mount acl
> remote_gsyncd: /nonexistent/gsyncd
> working_dir: /var/lib/misc/glusterfsd/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1
> state_detail_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-detail.status
> gluster_command_dir: /usr/sbin/
> pid_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.pid
> georep_session_working_dir: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/
> ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
> master.stime_xattr_name: trusted.glusterfs.a1c74931-568c-4f40-8573-dd344553e557.d62bda3a-1396-492a-ad99-7c6238d93c6a.stime
> changelog_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-changes.log
> socketdir: /var/run/gluster
> volume_id: a1c74931-568c-4f40-8573-dd344553e557
> ignore_deletes: false
>
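
As a side note on reading the ENTRY FAILED record quoted above, two of its numbers can be decoded with the Python standard library. This is only a sketch: the 'mode' value is a normal st_mode, but treating the bare 2 after the entry dict as an errno value is an assumption that nothing in this thread confirms.

```python
import errno
import stat

MODE = 16877      # 'mode' field of the ENTRY FAILED record
RETURN_CODE = 2   # second tuple element; ASSUMED here to be an errno value


def describe_mode(mode):
    """Return (ls-style string, octal permission bits, is-directory flag)."""
    return stat.filemode(mode), oct(stat.S_IMODE(mode)), stat.S_ISDIR(mode)


def errno_name(code):
    """Map a numeric errno to its symbolic name, or 'unknown'."""
    return errno.errorcode.get(code, "unknown")


print(describe_mode(MODE))
print(errno_name(RETURN_CODE))
```

If the code really is an errno, ENOENT would fit the question of whether the parent 'test1/b1' exists on the slave, since a MKDIR below a missing parent fails that way.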
[Gluster-users] trashcan on dist. repl. volume with geo-replication
Hello,

in regard to https://bugzilla.redhat.com/show_bug.cgi?id=1434066
i ran into another issue when using the trashcan feature on a dist. repl. volume running geo-replication (gfs 3.12.6 on ubuntu 16.04.4).
for example, removing an entire directory with subfolders :
tron@gl-node1:/myvol-1/test1/b1$ rm -rf *

and afterwards listing files in the trashcan :
tron@gl-node1:/myvol-1/test1$ ls -la /myvol-1/.trashcan/test1/b1/

leads to an outage of the geo-replication.
error on master-01 and master-02 :

[2018-03-12 13:37:14.827204] I [master(/brick1/mvol1):1385:crawl] _GMaster: slave's time stime=(1520861818, 0)
[2018-03-12 13:37:14.835535] E [master(/brick1/mvol1):784:log_failures] _GMaster: ENTRY FAILED data=({'uid': 0, 'gfid': 'c38f75e3-194a-4d22-9094-50ac8f8756e7', 'gid': 0, 'mode': 16877, 'entry': '.gfid/5531bd64-ac50-462b-943e-c0bf1c52f52c/Oracle_VM_VirtualBox_Extension', 'op': 'MKDIR'}, 2, {'gfid_mismatch': False, 'dst': False})
[2018-03-12 13:37:14.835911] E [syncdutils(/brick1/mvol1):299:log_raise_exception] : The above directory failed to sync. Please fix it to proceed further.

the gfids of both directories shown in the log :
brick1/mvol1/.trashcan/test1/b1 0x5531bd64ac50462b943ec0bf1c52f52c
brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension 0xc38f75e3194a4d22909450ac8f8756e7

the directory shown contains just one file, which is stored on gl-node3 and gl-node4, while node1 and node2 are in geo-replication error.
since the filesize limitation of the trashcan is obsolete, i'm really interested in using the trashcan feature, but i'm concerned it will interrupt the geo-replication entirely.
has anybody else been faced with this situation... any hints, workarounds...?
best regards
Dietmar Putz

root@gl-node1:~/tmp# gluster volume info mvol1

Volume Name: mvol1
Type: Distributed-Replicate
Volume ID: a1c74931-568c-4f40-8573-dd344553e557
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gl-node1-int:/brick1/mvol1
Brick2: gl-node2-int:/brick1/mvol1
Brick3: gl-node3-int:/brick1/mvol1
Brick4: gl-node4-int:/brick1/mvol1
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
features.trash-max-filesize: 2GB
features.trash: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

root@gl-node1:/myvol-1/test1# gluster volume geo-replication mvol1 gl-node5-int::mvol1 config
special_sync_mode: partial
gluster_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.gluster.log
ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
change_detector: changelog
use_meta_volume: true
session_owner: a1c74931-568c-4f40-8573-dd344553e557
state_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.status
gluster_params: aux-gfid-mount acl
remote_gsyncd: /nonexistent/gsyncd
working_dir: /var/lib/misc/glusterfsd/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1
state_detail_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-detail.status
gluster_command_dir: /usr/sbin/
pid_file: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/monitor.pid
georep_session_working_dir: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/
ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
master.stime_xattr_name: trusted.glusterfs.a1c74931-568c-4f40-8573-dd344553e557.d62bda3a-1396-492a-ad99-7c6238d93c6a.stime
changelog_log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1-changes.log
socketdir: /var/run/gluster
volume_id: a1c74931-568c-4f40-8573-dd344553e557
ignore_deletes: false
state_socket_unencoded: /var/lib/glusterd/geo-replication/mvol1_gl-node5-int_mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.socket
log_file: /var/log/glusterfs/geo-replication/mvol1/ssh%3A%2F%2Froot%40192.168.178.65%3Agluster%3A%2F%2F127.0.0.1%3Amvol1.log
access_mount: true
root@gl-node1:/myvol-1/test1#

--
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
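
The message above lists the directory gfids as raw hex (0x...), while the geo-rep log prints them in dashed UUID form. A small Python sketch for converting one form into the other and checking them against the log follows. The hex values and log gfids are copied from the message; the function name is made up for illustration, and that the hex values come from the bricks' gfid extended attribute is an assumption.

```python
import uuid

# Hex values as posted in the message above (path on the brick -> gfid).
XATTR_VALUES = {
    "brick1/mvol1/.trashcan/test1/b1":
        "0x5531bd64ac50462b943ec0bf1c52f52c",
    "brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension":
        "0xc38f75e3194a4d22909450ac8f8756e7",
}

# Values taken from the ENTRY FAILED log line in the same message.
LOG_PARENT_GFID = "5531bd64-ac50-462b-943e-c0bf1c52f52c"
LOG_ENTRY_GFID = "c38f75e3-194a-4d22-9094-50ac8f8756e7"


def xattr_hex_to_gfid(value):
    """Convert '0x' + 32 hex digits into canonical 8-4-4-4-12 UUID text."""
    hex_digits = value[2:] if value.lower().startswith("0x") else value
    return str(uuid.UUID(hex=hex_digits))


gfids = {path: xattr_hex_to_gfid(value) for path, value in XATTR_VALUES.items()}
for path, gfid in gfids.items():
    print(gfid, path)

parent_matches = gfids["brick1/mvol1/.trashcan/test1/b1"] == LOG_PARENT_GFID
entry_matches = gfids[
    "brick1/mvol1/.trashcan/test1/b1/Oracle_VM_VirtualBox_Extension"
] == LOG_ENTRY_GFID
print("parent matches log:", parent_matches, "entry matches log:", entry_matches)
```

Both comparisons only confirm what the message already states: the failed MKDIR refers to the directory below .trashcan/test1/b1 on the master bricks.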