Re: [ceph-users] PG inconsistency
For #1, it depends what you mean by fast. I wouldn't worry about it taking 15 minutes. If you mark the old OSD out, Ceph will start remapping data immediately, including a bunch of PGs on unrelated OSDs. Once you replace the disk and put the same OSD ID back in the same host, the CRUSH map will be back to what it was before you started. All of those remaps on unrelated OSDs will reverse. They'll complete fairly quickly, because they only have to backfill the data that was written during the remap.

I prefer #1. ceph pg repair will just overwrite the replicas with whatever the primary OSD has, which may copy bad data from your bad OSD over good replicas. So #2 has the potential to corrupt the data. #1 will delete the data you know is bad, leaving only good data behind to replicate. Once ceph pg repair gets more intelligent, I'll revisit this. I also prefer the simplicity: whether the disk is dead or corrupt, it's treated the same way.

On Sun, Nov 9, 2014 at 7:25 PM, GuangYang wrote:
> In terms of disk replacement, to avoid migrating data back and forth, are the below two approaches reasonable?
> 1. Keep the OSD in, do an ad-hoc disk replacement and provision a new OSD (keeping the OSD ID the same), and then trigger data migration. This way the data migration only happens once; however, it does require operators to replace the disk very quickly.
> 2. Move the data on the broken disk to a new disk completely and use Ceph to repair bad objects.
>
> Thanks,
> Guang

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
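Approach #1 can be sketched as a command sequence. Treat this strictly as a sketch: the OSD id (12), the sysvinit-style service commands, and the keyring path are assumptions for a Firefly-era deployment and will vary by release, distro, and provisioning tool.

```shell
# Sketch: replace a failing disk while reusing the same OSD id (osd.12 assumed).
ceph osd out 12                  # PGs start remapping away immediately
service ceph stop osd.12         # stop the daemon before pulling the disk

# ... physically swap the disk, mkfs a new filesystem, mount it at the old path ...

ceph-osd -i 12 --mkfs --mkkey    # rebuild an empty data directory for the SAME id
ceph auth del osd.12             # drop the old key, then register the new one
ceph auth add osd.12 osd 'allow *' mon 'allow rwx' \
    -i /var/lib/ceph/osd/ceph-12/keyring

service ceph start osd.12
ceph osd in 12                   # CRUSH map returns to its prior state, so the
                                 # remaps on unrelated OSDs reverse
```

Because the OSD comes back with the same id and CRUSH position, the cluster settles with a single round of backfill rather than two rebalance events.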
Re: [ceph-users] PG inconsistency
Thanks Sage!

> Date: Fri, 7 Nov 2014 02:19:06 -0800
> From: s...@newdream.net
> To: yguan...@outlook.com
> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: PG inconsistency
>
> On Thu, 6 Nov 2014, GuangYang wrote:
>> Hello Cephers,
>> Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.
>>
>> While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).
>>
>> In more detail, I have the following questions:
>> 1. When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?
>
> I'm not really an operator so I'm not as familiar with these tools as I should be :(, but I suspect the prudent route is to check the SMART info on the disk, and/or trigger a scrub of everything else on the OSD (ceph osd scrub N). For DreamObjects, I think they usually just fail the OSD once it starts throwing bad sectors (most of the hardware is already reasonably aged).

Google's data also shows a strong correlation between scrub errors (and several SMART parameters) and disk failure: https://www.usenix.org/legacy/event/fast07/tech/full_papers/pinheiro/pinheiro.pdf

>> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?
>
> That might not be a bad idea, but I would urge caution if xfs_repair finds any issues or makes any changes, as subtle changes to the fs contents can confuse ceph-osd. At an absolute minimum, do a full scrub afterwards, but even better would be to fail the OSD.
> (FWIW I think we should document a recommended "safe" process for failing/replacing an OSD that takes the suspect data offline but waits for the cluster to heal before destroying any data. Simply marking the OSD out will work, but then when a fresh drive is added there will be a second repair/rebalance event, which isn't ideal.)

Yeah, that would be very helpful. I think the first decision to make is whether we should replace the disk. In our clusters we have seen data corruption (EIO) along with SMART warnings, which is an indicator of a bad disk; meanwhile, we also observed lost xattrs (http://tracker.ceph.com/issues/10018) without any SMART warnings. After talking to Sam, we suspected the latter might be due to an unexpected host reboot (or a mis-configured RAID controller), in which case we probably don't need to replace the disk, only repair via Ceph.

In terms of disk replacement, to avoid migrating data back and forth, are the below two approaches reasonable?
1. Keep the OSD in, do an ad-hoc disk replacement and provision a new OSD (keeping the OSD ID the same), and then trigger data migration. This way the data migration only happens once; however, it does require operators to replace the disk very quickly.
2. Move the data on the broken disk to a new disk completely and use Ceph to repair bad objects.

Thanks,
Guang

> sage
>
>> It would be great to hear your experience and suggestions.
>>
>> BTW, we are using XFS in the cluster.
>>
>> Thanks,
>> Guang
Re: [ceph-users] PG inconsistency
On Thu, 6 Nov 2014, GuangYang wrote:
> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?

I'm not really an operator so I'm not as familiar with these tools as I should be :(, but I suspect the prudent route is to check the SMART info on the disk, and/or trigger a scrub of everything else on the OSD (ceph osd scrub N). For DreamObjects, I think they usually just fail the OSD once it starts throwing bad sectors (most of the hardware is already reasonably aged).

> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?

That might not be a bad idea, but I would urge caution if xfs_repair finds any issues or makes any changes, as subtle changes to the fs contents can confuse ceph-osd. At an absolute minimum, do a full scrub after, but even better would be to fail the OSD.

(FWIW I think we should document a recommended "safe" process for failing/replacing an OSD that takes the suspect data offline but waits for the cluster to heal before destroying any data. Simply marking the OSD out will work, but then when a fresh drive is added there will be a second repair/rebalance event, which isn't ideal.)

sage

> It would be great to hear your experience and suggestions.
>
> BTW, we are using XFS in the cluster.
> Thanks,
> Guang
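Sage's triage suggestion (check SMART, then scrub the rest of the OSD) might look like the following on a typical node. The device path and OSD id here are assumptions, and SMART attribute names vary by drive vendor:

```shell
# Check the SMART health of the disk backing osd.12 (device path is an assumption).
smartctl -H /dev/sdb             # overall health self-assessment
smartctl -A /dev/sdb | grep -E \
    'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'

# Scrub everything else on the OSD to surface any further inconsistencies.
ceph osd scrub 12                # lightweight metadata scrub; use
                                 # "ceph osd deep-scrub 12" to also read and
                                 # checksum object contents
```

Growing counts in the reallocated/pending/uncorrectable attributes are the strongest single-drive failure predictors in the Pinheiro et al. study cited above.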
Re: [ceph-users] PG inconsistency
IIRC, the EIO we had also correlated with a SMART status that showed the disk was bad enough for a warranty replacement -- so yes, I replaced the disk in these cases.

Cheers, Dan

On Thu Nov 06 2014 at 2:44:08 PM GuangYang wrote:
> Thanks Dan. By "killed/formatted/replaced the OSD", did you replace the disk? Not a filesystem expert here, but I would like to understand what happened behind the EIO and whether it reveals something (e.g. a hardware issue).
>
> In our case we are using 6TB drives, so there is a lot of data to migrate, and since backfilling/recovery increases latency, we hope to avoid that as much as we can.
>
> Thanks,
> Guang
>
> > From: daniel.vanders...@cern.ch
> > Date: Thu, 6 Nov 2014 13:36:46 +
> > Subject: Re: PG inconsistency
> > To: yguan...@outlook.com; ceph-users@lists.ceph.com
> >
> > Hi,
> > I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication "fixes" the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us.
> > Cheers, Dan
> >
> > On Thu Nov 06 2014 at 2:24:32 PM GuangYang wrote:
> > Hello Cephers,
> > Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.
> >
> > While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).
> >
> > In more detail, I have the following questions:
> > 1.
> > When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?
> > 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?
> >
> > It would be great to hear your experience and suggestions.
> >
> > BTW, we are using XFS in the cluster.
> >
> > Thanks,
> > Guang
Re: [ceph-users] PG inconsistency
Thu Nov 06 2014 at 16:44:09, GuangYang:
> Thanks Dan. By "killed/formatted/replaced the OSD", did you replace the disk? Not a filesystem expert here, but I would like to understand what happened behind the EIO and whether it reveals something (e.g. a hardware issue).
>
> In our case we are using 6TB drives, so there is a lot of data to migrate, and since backfilling/recovery increases latency, we hope to avoid that as much as we can.

For example, use the following parameters:

osd recovery delay start = 10
osd recovery op priority = 2
osd max backfills = 1
osd recovery max active = 1
osd recovery threads = 1

> Thanks,
> Guang
>
> > From: daniel.vanders...@cern.ch
> > Date: Thu, 6 Nov 2014 13:36:46 +
> > Subject: Re: PG inconsistency
> > To: yguan...@outlook.com; ceph-users@lists.ceph.com
> >
> > Hi,
> > I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication "fixes" the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us.
> > Cheers, Dan
> >
> > On Thu Nov 06 2014 at 2:24:32 PM GuangYang wrote:
> > Hello Cephers,
> > Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.
> >
> > While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).
> >
> > In more detail, I have the following questions:
> > 1.
> > When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?
> > 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?
> >
> > It would be great to hear your experience and suggestions.
> >
> > BTW, we are using XFS in the cluster.
> >
> > Thanks,
> > Guang
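The throttling values Irek suggests belong in the [osd] section of ceph.conf. Shown here as a config fragment (the values are his; tune them for your hardware):

```
[osd]
    osd recovery delay start = 10
    osd recovery op priority = 2
    osd max backfills = 1
    osd recovery max active = 1
    osd recovery threads = 1
```

The same settings can also be pushed at runtime without restarting daemons, e.g. ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'; note that injected values revert when the daemons restart.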
Re: [ceph-users] PG inconsistency
We are using v0.80.4; just asking for general suggestions here :)

Thanks,
Guang

> From: malm...@gmail.com
> Date: Thu, 6 Nov 2014 13:46:12 +
> Subject: Re: [ceph-users] PG inconsistency
> To: yguan...@outlook.com; ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
>
> Which version of Ceph are you running? For 0.80.0 - 0.80.3, see https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b
>
> Thu Nov 06 2014 at 16:24:21, GuangYang:
> > Hello Cephers,
> > Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.
> >
> > While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).
> >
> > In more detail, I have the following questions:
> > 1. When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?
> > 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?
> >
> > It would be great to hear your experience and suggestions.
> >
> > BTW, we are using XFS in the cluster.
> >
> > Thanks,
> > Guang
Re: [ceph-users] PG inconsistency
Which version of Ceph are you running? For 0.80.0 - 0.80.3, see https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b

Thu Nov 06 2014 at 16:24:21, GuangYang:
> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?
> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?
>
> It would be great to hear your experience and suggestions.
>
> BTW, we are using XFS in the cluster.
>
> Thanks,
> Guang
Re: [ceph-users] PG inconsistency
Thanks Dan. By "killed/formatted/replaced the OSD", did you replace the disk? Not a filesystem expert here, but I would like to understand what happened behind the EIO and whether it reveals something (e.g. a hardware issue).

In our case we are using 6TB drives, so there is a lot of data to migrate, and since backfilling/recovery increases latency, we hope to avoid that as much as we can.

Thanks,
Guang

> From: daniel.vanders...@cern.ch
> Date: Thu, 6 Nov 2014 13:36:46 +
> Subject: Re: PG inconsistency
> To: yguan...@outlook.com; ceph-users@lists.ceph.com
>
> Hi,
> I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication "fixes" the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us.
> Cheers, Dan
>
> On Thu Nov 06 2014 at 2:24:32 PM GuangYang wrote:
> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?
> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?
>
> It would be great to hear your experience and suggestions.
>
> BTW, we are using XFS in the cluster.
> Thanks,
> Guang
Re: [ceph-users] PG inconsistency
Hi,
I've only ever seen (1), EIO to read a file. In this case I've always just killed / formatted / replaced that OSD completely -- that moves the PG to a new master and the new replication "fixes" the inconsistency. This way, I've never had to pg repair. I don't know if this is a best or even good practise, but it works for us.

Cheers, Dan

On Thu Nov 06 2014 at 2:24:32 PM GuangYang wrote:
> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?
> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?
>
> It would be great to hear your experience and suggestions.
>
> BTW, we are using XFS in the cluster.
>
> Thanks,
> Guang
[ceph-users] PG inconsistency
Hello Cephers,
Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error.

While Ceph has built-in tool sets to repair the inconsistencies, I would also like to check with the community about the best way to handle such issues (e.g. should we run fsck / xfs_repair when such an issue happens).

In more detail, I have the following questions:
1. When an inconsistency is detected, what is the chance that there is a hardware issue which needs to be repaired physically, or should I run some disk/filesystem tools to check further?
2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we rely solely on Ceph's repair tool sets?

It would be great to hear your experience and suggestions.

BTW, we are using XFS in the cluster.

Thanks,
Guang