Re: [Gluster-users] gfid entries in volume heal info that do not heal
That took a while! I have the following stats:

4085169 files in both bricks
3162940 files only have a single hard link.

All of the files exist on both servers. bmidata2 (below) WAS running when
bmidata1 died.

gluster volume heal clifford statistics heal-count
Gathering count of entries to be healed on volume clifford has been successful

Brick bmidata1:/data/glusterfs/clifford/brick/brick
Number of entries: 0

Brick bmidata2:/data/glusterfs/clifford/brick/brick
Number of entries: 296252

Brick bmidata1:/data/glusterfs/clifford3/brick/brick
Number of entries: 1

Brick bmidata2:/data/glusterfs/clifford3/brick/brick
Number of entries: 182407

Why those numbers are so much smaller than the data from a stat run through
the entire brick, I have no idea.

The 20TB of space for this mount point is composed of four 10TB bricks in a
2x2. As this is all from a large copy in from a backup source, I'm thinking
of rerunning rsync to overwrite files with the same create/modify times on
the mount to realign things (maybe?). I ran a giant ls/stat on the mount but
nothing changed. Ran it again with no changes.

gluster-health-report

Loaded reports: glusterd-op-version, georep, gfid-mismatch-dht-report,
glusterd-peer-disconnect, disk_usage, errors_in_logs, coredump, glusterd,
glusterd_volume_version_cksum_errors, kernel_issues, errors_in_logs,
ifconfig, nic-health, process_status

[     OK] Disk used percentage  path=/  percentage=7
[     OK] Disk used percentage  path=/var  percentage=7
[     OK] Disk used percentage  path=/tmp  percentage=7
[  ERROR] Report failure  report=report_check_errors_in_glusterd_log
[     OK] All peers are in connected state  connected_count=1  total_peer_count=1
[     OK] no gfid mismatch
[ NOT OK] Failed to check op-version
[ NOT OK] The maximum size of core files created is NOT set to unlimited.
[  ERROR] Report failure  report=report_check_worker_restarts
[  ERROR] Report failure  report=report_non_participating_bricks
[     OK] Glusterd is running  uptime_sec=5177509
[  ERROR] Report failure  report=report_check_version_or_cksum_errors_in_glusterd_log
[  ERROR] Report failure  report=report_check_errors_in_glusterd_log
[ NOT OK] Receive errors in "ifconfig enp131s0" output
[ NOT OK] Receive errors in "ifconfig eth0" output
[ NOT OK] Receive errors in "ifconfig eth3" output
[ NOT OK] Receive errors in "ifconfig mlx_ib0" output
[ NOT OK] Transmission errors in "ifconfig mlx_ib0" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- eth0" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- eth3" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- mlx_ib0" output
[ NOT OK] Errors seen in "cat /proc/net/dev -- enp131s0" output

High CPU usage by self-heal.

NOTE: bmidata2 up for over 300 days; due for a reboot.

On Tue, 2017-10-24 at 12:35 +0530, Karthik Subrahmanya wrote:
> Hi Jim,
>
> Can you check whether the same hardlinks are present on both the
> bricks & both of them have the link count 2?
> If the link count is 2 then "find <brickpath> -samefile
> <brickpath>/.glusterfs/<first 2 bits of gfid>/<next 2 bits of
> gfid>/<full gfid>" should give you the file path.
>
> Regards,
> Karthik
>
> On Tue, Oct 24, 2017 at 3:28 AM, Jim Kinney wrote:
> >
> > I'm not so lucky. ALL of mine show 2 links and none have the attr
> > data that supplies the path to the original.
> >
> > I have the inode from stat. Looking now to dig out the
> > path/filename from xfs_db on the specific inodes individually.
> >
> > Is the hash of the filename or <path>/filename, and if so, relative
> > to where? /, <mount>, <brick>?
> >
> > On Mon, 2017-10-23 at 18:54 +, Matt Waymack wrote:
> > > In my case I was able to delete the hard links in the .glusterfs
> > > folders of the bricks and it seems to have done the trick,
> > > thanks!
> > >
> > > From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
> > > Sent: Monday, October 23, 2017 1:52 AM
> > > To: Jim Kinney; Matt Waymack
> > > Cc: gluster-users
> > > Subject: Re: [Gluster-users] gfid entries in volume heal info
> > > that do not heal
> > >
> > > Hi Jim & Matt,
> > >
> > > Can you also check the link count in the stat output of those
> > > hardlink entries in the .glusterfs folder on the bricks?
> > > If the link count is 1 on all the bricks for those entries, then
> > > they are orphaned entries and you can delete those hardlinks.
> > >
> > > To be on the safer side, have a backup before deleting any of the
> > > entries.
> > >
> > > Regards,
> > > Karthik
> > >
> > > On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney wrote:
> > > > I've been following this particular thread as I have a similar
> > > > issue (RAID6 array failed out with 3 dead drives at once while
> > > > a 12 TB load was being copied into one mounted space - what a
> > > > mess).
> > > >
> > > > I have
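For reference, the checks Karthik describes can be scripted. A minimal
sketch in shell; the brick root is taken from the heal-count output above,
and the gfid in the -samefile example is a hypothetical placeholder:

    #!/bin/bash
    # Report gfid entries on a brick with a link count of 1 (orphan
    # candidates, safe to delete only after taking a backup, as advised).
    BRICK=/data/glusterfs/clifford/brick/brick   # adjust to your brick root
    find "$BRICK/.glusterfs" \( -path "*/indices/*" -o -path "*/changelogs/*" \) \
        -prune -o -type f -links 1 -print

    # For an entry with link count 2, resolve the gfid back to the real
    # file path; the first two hex-character pairs of the gfid are the two
    # subdirectory levels under .glusterfs. Hypothetical gfid shown:
    GFID=0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5d
    find "$BRICK" -samefile "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" \
        -not -path "*/.glusterfs/*"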
Re: [Gluster-users] [Gluster-devel] Request for Comments: Upgrades from 3.x to 4.0+
Ahh OK, I see. Thanks!

On 6 November 2017 at 00:54, Kaushal M wrote:
> On Fri, Nov 3, 2017 at 8:50 PM, Alastair Neil wrote:
> > Just so I am clear, the upgrade process will be as follows:
> >
> > upgrade all clients to 4.0
> >
> > rolling upgrade all servers to 4.0 (with GD1)
> >
> > kill all GD1 daemons on all servers and run the upgrade script (new
> > clients unable to connect at this point)
> >
> > start GD2 (necessary, or does the upgrade script do this?)
> >
> > I assume that once the cluster has been migrated to GD2 the glusterd
> > startup script will be smart enough to start the correct version?
>
> This should be the process, mostly.
> The upgrade script needs GD2 running on all nodes before it can
> begin migration.
> But they don't need to have a cluster formed; the script should take
> care of forming the cluster.
>
> > -Thanks
> >
> > On 3 November 2017 at 04:06, Kaushal M wrote:
> >>
> >> On Thu, Nov 2, 2017 at 7:53 PM, Darrell Budic wrote:
> >> > Will the various client packages (CentOS in my case) be able to
> >> > automatically handle the upgrade vs new install decision, or will
> >> > we be required to do something manually to determine that?
> >>
> >> We should be able to do this with CentOS (and other RPM based distros)
> >> which have well split glusterfs packages currently.
> >> At this moment, I don't know exactly how much can be handled
> >> automatically, but I expect the amount of manual intervention to be
> >> minimal.
> >> The minimum amount of manual work needed would be enabling and
> >> starting GD2 and starting the migration script.
> >>
> >> > It’s a little unclear that things will continue without interruption
> >> > because of the way you describe the change from GD1 to GD2, since it
> >> > sounds like it stops GD1.
> >>
> >> With the described upgrade strategy, we can ensure continuous volume
> >> access to clients during the whole process (provided volumes have been
> >> set up with replication or ec).
> >>
> >> During the migration from GD1 to GD2, any existing clients still
> >> retain access, and can continue to work without interruption.
> >> This is possible because gluster keeps the management (glusterds) and
> >> data (bricks and clients) parts separate.
> >> So it is possible to interrupt the management parts without
> >> interrupting data access to existing clients.
> >> Clients and the server side brick processes need GlusterD to start up.
> >> But once they're running, they can run without GlusterD. GlusterD is
> >> only required again if something goes wrong.
> >> Stopping GD1 during the migration process will not lead to any
> >> interruptions for existing clients.
> >> The brick processes continue to run, and any connected clients continue
> >> to remain connected to the bricks.
> >> Any new clients which try to mount the volumes during this migration
> >> will fail, as a GlusterD will not be available (either GD1 or GD2).
> >>
> >> > Early days, obviously, but if you could clarify if that’s what
> >> > we’re used to as a rolling upgrade or how it works, that would be
> >> > appreciated.
> >>
> >> A Gluster rolling upgrade process allows data access to volumes
> >> during the process, while upgrading the brick processes as well.
> >> Rolling upgrades with uninterrupted access require that volumes have
> >> redundancy (replicate or ec).
> >> Rolling upgrades involve upgrading servers belonging to a redundancy
> >> set (replica set or ec set), one at a time.
> >> One at a time:
> >> - A server is picked from a redundancy set.
> >> - All Gluster processes are killed on the server: glusterd, bricks and
> >>   other daemons included.
> >> - Gluster is upgraded and restarted on the server.
> >> - A heal is performed to heal new data onto the bricks.
> >> - Move onto the next server after the heal finishes.
> >>
> >> Clients maintain uninterrupted access, because a full redundancy set
> >> is never taken offline all at once.
> >>
> >> > Also, clarification that we’ll be able to upgrade from 3.x
> >> > (3.1x?) to 4.0, manually or automatically?
> >>
> >> Rolling upgrades from 3.1x to 4.0 are a manual process. But I believe
> >> gdeploy has playbooks to automate it.
> >> At the end of this you will be left with a 4.0 cluster, but still be
> >> running GD1.
> >> Upgrading from GD1 to GD2 in 4.0 will be a manual process. A script
> >> that automates this is planned only for 4.1.
> >>
> >> > From: Kaushal M
> >> > Subject: [Gluster-users] Request for Comments: Upgrades from 3.x to 4.0+
> >> > Date: November 2, 2017 at 3:56:05 AM CDT
> >> > To: gluster-users@gluster.org; Gluster Devel
> >> >
> >> > We're fast approaching the time for Gluster-4.0. And we would like to
> >> > set out the expected upgrade strategy and try to polish it to be as
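For reference, the per-server steps described above might look like this in
practice. This is a sketch assuming systemd, RPM packages and a volume
named myvol; it is not the official upgrade script:

    # On one server of a redundancy set:
    systemctl stop glusterd
    killall glusterfs glusterfsd       # bricks, self-heal and other daemons
    yum update "glusterfs*"            # assumption: RPM-based distro
    systemctl start glusterd           # bricks and daemons come back with it
    # Wait for self-heal to finish before moving to the next server:
    gluster volume heal myvol info     # repeat until no entries remain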
[Gluster-users] Gluster Summit BOF - Rebalance
Hi,

We had a BOF on Rebalance at the Gluster Summit to get feedback from
Gluster users:

- Performance has improved over the last few releases, and rebalance works
  well for large files.
- However, it is still not fast enough on volumes which contain a lot of
  directories and small files. The bottleneck appears to be the
  single-threaded filesystem crawl.
- Scripts that use the fix-layout and file-migration virtual xattrs
  (available via the mount point) to rebalance volumes would be helpful;
  see the sketch after this message.
- Rebalance is currently broken on volumes with ZFS bricks (and other
  filesystems where fallocate is not available). A fix for this is being
  worked on [1] and should be ready soon.
- The rebalance status output is satisfactory.

Amar, Susant, Raghavendra, please add if I have missed something.

Regards,
Nithya

[1] https://review.gluster.org/18573
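A sketch of what such a script could look like. The xattr names are DHT's
virtual xattrs for triggering fix-layout and file migration from the mount
point; verify them against your gluster version, and treat the mount path
and the full-volume crawl as assumptions:

    #!/bin/bash
    # Client-side rebalance: fix directory layouts first, then ask DHT to
    # migrate files that no longer hash to the brick they live on.
    MOUNT=/mnt/glustervol            # placeholder fuse mount point
    find "$MOUNT" -type d -exec setfattr -n distribute.fix.layout -v yes {} \;
    find "$MOUNT" -type f -exec setfattr -n trusted.distribute.migrate-data -v force {} \;

On a large volume this single-threaded crawl is itself the bottleneck noted
above, so in practice one would scope the find to specific subdirectories.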
Re: [Gluster-users] how to verify bitrot signed file manually?
Any update?

On Fri, Oct 13, 2017 at 1:14 PM, Amudhan P wrote:
> any update?
>
> why is it marked bad?
>
> Any way to find out what happened to the file?
>
> On Tue, Oct 3, 2017 at 12:44 PM, Amudhan P wrote:
>>
>> my volume is a distributed disperse volume, 8+2 EC.
>> file1 and file2 are different files lying in the same brick. I am able
>> to read the file from the mount point without any issue, because with
>> EC it reads the rest of the available blocks on the other nodes.
>>
>> my question is: "file1" sha256 value matches the bitrot signature value,
>> but still it is marked as bad by the scrubber daemon. why is that?
>>
>> On Fri, Sep 29, 2017 at 12:52 PM, Kotresh Hiremath Ravishankar <
>> khire...@redhat.com> wrote:
>>
>>> Hi Amudhan,
>>>
>>> Sorry for the late response, as I was busy with other things. You are
>>> right, bitrot uses sha256 for the checksum.
>>> If file-1, file-2 are marked bad, the I/O should be errored out with
>>> EIO. If that is not happening, we need to look further into it.
>>> But what are the file contents of file-1 and file-2 on the replica
>>> bricks? Are they matching?
>>>
>>> Thanks and Regards,
>>> Kotresh HR
>>>
>>> On Mon, Sep 25, 2017 at 4:19 PM, Amudhan P wrote:
>>>
>>>> resending mail.
>>>>
>>>> On Fri, Sep 22, 2017 at 5:30 PM, Amudhan P wrote:
>>>>
>>>>> ok, from the bitrot code I figured out gluster is using the sha256
>>>>> hashing algo.
>>>>>
>>>>> Now coming to the problem: during a scrub run in my cluster, some of
>>>>> my files were marked as bad on a few sets of nodes.
>>>>> I just wanted to confirm the bad files, so I used the "sha256sum"
>>>>> tool in Linux to manually get the file hash.
>>>>>
>>>>> here is the result.
>>>>>
>>>>> file-1, file-2 marked as bad by scrub, and file-3 is healthy.
>>>>>
>>>>> file-1: sha256 and bitrot signature value match, but still it's been
>>>>> marked as bad.
>>>>>
>>>>> file-2: sha256 and bitrot signature value don't match; could be a
>>>>> victim of bitrot or bitflip. The file is still readable without any
>>>>> issue and no errors found on the drive.
>>>>>
>>>>> file-3: sha256 and bitrot signature match, and healthy.
>>>>>
>>>>> file-1 output from
>>>>>
>>>>> "sha256sum" = "71eada9352b1352aaef0f806d3d561768ce2df905ded1668f665e06eca2d0bd4"
>>>>>
>>>>> "getfattr -m. -e hex -d"
>>>>> # file: file-1
>>>>> trusted.bit-rot.bad-file=0x3100
>>>>> trusted.bit-rot.signature=0x01020071eada9352b1352aaef0f806d3d561768ce2df905ded1668f665e06eca2d0bd4
>>>>> trusted.bit-rot.version=0x020058e4f3b40006793d
>>>>> trusted.ec.config=0x080a02000200
>>>>> trusted.ec.dirty=0x
>>>>> trusted.ec.size=0x000718996701
>>>>> trusted.ec.version=0x00038c4c00038c4d
>>>>> trusted.gfid=0xf078a24134fe4f9bb953eca8c28dea9a
>>>>>
>>>>> output scrub log:
>>>>> [2017-09-02 13:02:20.311160] A [MSGID: 118023]
>>>>> [bit-rot-scrub.c:244:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
>>>>> CORRUPTION DETECTED: Object /file-1 {Brick: /media/disk16/brick16 |
>>>>> GFID: f078a241-34fe-4f9b-b953-eca8c28dea9a}
>>>>> [2017-09-02 13:02:20.311579] A [MSGID: 118024]
>>>>> [bit-rot-scrub.c:264:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
>>>>> Marking /file-1 [GFID: f078a241-34fe-4f9b-b953-eca8c28dea9a | Brick:
>>>>> /media/disk16/brick16] as corrupted..
>>>>>
>>>>> file-2 output from
>>>>>
>>>>> "sha256sum" = "c41ef9c81faed4f3e6010ea67984c3cfefd842f98ee342939151f9250972dcda"
>>>>>
>>>>> "getfattr -m. -e hex -d"
>>>>> # file: file-2
>>>>> trusted.bit-rot.bad-file=0x3100
>>>>> trusted.bit-rot.signature=0x0102009162cb17d4f0bee676fcb7830c5286d05b8e8940d14f3d117cb90b7b1defc129
>>>>> trusted.bit-rot.version=0x020058e4f3b400019bb2
>>>>> trusted.ec.config=0x080a02000200
>>>>> trusted.ec.dirty=0x
>>>>> trusted.ec.size=0x403433f6
>>>>> trusted.ec.version=0x201a201b
>>>>> trusted.gfid=0xa50012b0a632477c99232313928d239a
>>>>>
>>>>> output scrub log:
>>>>> [2017-09-02 05:18:14.003156] A [MSGID: 118023]
>>>>> [bit-rot-scrub.c:244:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
>>>>> CORRUPTION DETECTED: Object /file-2 {Brick: /media/disk13/brick13 |
>>>>> GFID: a50012b0-a632-477c-9923-2313928d239a}
>>>>> [2017-09-02 05:18:14.006629] A [MSGID: 118024]
>>>>> [bit-rot-scrub.c:264:bitd_compare_ckum] 0-qubevaultdr-bit-rot-0:
>>>>> Marking /file-2 [GFID: a50012b0-a632-477c-9923-2313928d239a | Brick:
>>>>> /media/disk13/brick13] as corrupted..
>>>>>
>>>>> file-3 output from
>>>>>
>>>>> "sha256sum" = "a590735b3c8936cc7ca9835128a19c38a3f79c8fd53fddc031a9349b7e273f27"
>>>>>
>>>>> "getfattr -m. -e hex -d"
>>>>> # file: file-3
>>>>> trusted.bit-rot.signature=0x010200a590735b3c8936cc7ca9835128a19c38a3f79c8fd53fddc031a9349b7e273f27
>>>>> trusted.bit-rot.version=0x020058e4f3b400019bb2
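For manual verification along these lines, a short shell sketch. It assumes
the signature layout visible in the dumps above (a 0x010200 header followed
by the 64 hex characters of the sha256 digest) and uses a placeholder
brick-side path:

    #!/bin/bash
    # Compare a file's sha256 with the digest embedded in its
    # trusted.bit-rot.signature xattr on the brick.
    f=/media/disk16/brick16/file-1   # placeholder: brick path of the file
    sig=$(getfattr -d -m . -e hex "$f" 2>/dev/null \
          | awk -F= '/trusted.bit-rot.signature/ {print $2}')
    want=${sig: -64}                 # last 64 hex chars = sha256 digest
    have=$(sha256sum "$f" | awk '{print $1}')
    [ "$want" = "$have" ] && echo "signature matches" || echo "MISMATCH"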
[Gluster-users] Re: glusterfs segmentation fault in rdma mode
Hi, all,

We found a strange problem. Some clients worked normally while some clients
couldn't access special files. For example, Client A couldn't create the
directory xxx, but Client B could. However, if Client B created the
directory, Client A could access it and even delete it. But Client A still
couldn't create the same directory later. If I changed the directory name,
Client A worked without problems. It seemed that there were problems with
specific bricks on specific clients, but all the bricks were online.

I saw this in the GlusterFS client log after a directory creation failure:

[2017-11-06 11:55:18.420610] W [MSGID: 109011]
[dht-layout.c:186:dht_layout_search] 0-data-dht: no subvolume for hash
(value) = 4148753024
[2017-11-06 11:55:18.457744] W [fuse-bridge.c:521:fuse_entry_cbk]
0-glusterfs-fuse: 488: MKDIR() /xxx => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:186:dht_layout_search]
0-data-dht: no subvolume for hash (value) = 4148753024" repeated 3 times
between [2017-11-06 11:55:18.420610] and [2017-11-06 11:55:18.457731]

------------------ Original message ------------------
From: "Ben Turner"
Date: Sunday, 5 November 2017, 3:00
To: "acfreeman" <21291...@qq.com>
Cc: "gluster-users"
Subject: Re: [Gluster-users] glusterfs segmentation fault in rdma mode

This looks like there could be some problem requesting / leaking / whatever
memory, but without looking at the core it's tough to tell for sure. Note:

/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7f95bc54e618]

Can you open up a bugzilla and get us the core file to review?

-b

- Original Message -
> From: "??" <21291...@qq.com>
> To: "gluster-users"
> Sent: Saturday, November 4, 2017 5:27:50 AM
> Subject: [Gluster-users] glusterfs segmentation fault in rdma mode
>
> Hi, All,
>
> I used Infiniband to connect all the GlusterFS nodes and the clients.
> Previously I ran IP over IB and everything was OK. Now I used the rdma
> transport mode instead. And then I ran the traffic. After a while, the
> glusterfs process exited because of a segmentation fault.
>
> Here were the messages when I saw the segmentation fault:
>
> pending frames:
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(1) op(WRITE)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
>
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 11
> time of crash:
> 2017-11-01 11:11:23
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.11.0
> /usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7f95bc54e618]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f95bc557834]
> /lib64/libc.so.6(+0x32510)[0x7f95bace2510]
>
> The client OS was CentOS 7.3. The server OS was CentOS 6.5. The GlusterFS
> version was 3.11.0 on both clients and servers. The Infiniband card was
> Mellanox. The Mellanox IB driver version was v4.1-1.0.2 (27 Jun 2017) on
> both clients and servers.
>
> Is the rdma code stable for GlusterFS? Do I need to upgrade the IB driver
> or apply a patch?
>
> Thanks!
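A note on the "no subvolume for hash" warnings at the top of this thread:
they mean the client's view of the parent directory's DHT layout has a hole
that contains the computed hash (4148753024 = 0xf748f280), so the mkdir
fails with EIO. A hedged way to inspect this is to dump the layout xattr of
the parent directory from every brick and compare the ranges. The brick
path below is a placeholder, and the reading of the xattr's trailing bytes
as the start/end of the hash range is an assumption to verify against your
version:

    # Run on each server, once per brick, against the parent directory of
    # the failing mkdir (the volume root, given the "MKDIR /xxx" log line):
    getfattr -n trusted.glusterfs.dht -e hex /bricks/brick1
    # The last 16 hex characters of each value are two 32-bit words: the
    # start and end of that brick's hash range. Across all bricks the
    # ranges should cover 0x00000000-0xffffffff with no gaps or overlaps;
    # a gap containing 0xf748f280 would explain the EIO on mkdir.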