Re: [ceph-users] PG inconsistency

2014-11-10 Thread Craig Lewis
For #1, it depends on what you mean by fast.  I wouldn't worry about it taking
15 minutes.

If you mark the old OSD out, Ceph will start remapping data immediately,
including a bunch of PGs on unrelated OSDs.  Once you replace the disk and
put the same OSD ID back in the same host, the CRUSH map will be back to
what it was before you started.  All of those remaps on unrelated OSDs will
reverse.  They'll complete fairly quickly, because they only have to
backfill the data that was written during the remap.


I prefer #1.  ceph pg repair will just overwrite the replicas with whatever
the primary OSD has, which may copy bad data from your bad OSD over good
replicas.  So #2 has the potential to corrupt the data.  #1 will delete the
data you know is bad, leaving only good data behind to replicate.  Once
ceph pg repair gets more intelligent, I'll revisit this.
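
If you do go the repair route, it's worth checking which OSD is primary for
the inconsistent PG first, since that's the copy that wins.  A quick sketch
(the PG id 3.45 is a placeholder for whatever the cluster actually reports):

# list the PGs that scrubbing has flagged as inconsistent
ceph health detail | grep inconsistent

# show the up/acting set for that PG; the first OSD in the acting set is the
# primary, i.e. the copy "ceph pg repair" would propagate to the replicas
ceph pg map 3.45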

I also prefer the simplicity.  If it's dead or corrupt, they're treated the
same.
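
For what it's worth, a rough sketch of what #1 can look like on the command
line.  Treat it as an outline only -- the OSD id 12, the device, and the paths
are placeholders, and the exact mkfs/auth steps depend on how the OSD was
originally deployed (ceph-disk vs. manual):

# keep the cluster from marking the OSD out during a quick swap; alternatively
# just mark it out and let the remaps reverse as described above
ceph osd set noout
/etc/init.d/ceph stop osd.12      # or: stop ceph-osd id=12 (upstart)

# physically replace the disk, make a new filesystem, and mount it at the old
# OSD's data path (placeholder device/path)
mkfs.xfs /dev/sdX
mount /dev/sdX /var/lib/ceph/osd/ceph-12

# re-initialize the OSD under the same id and re-register its key
ceph-osd -i 12 --mkfs --mkkey
ceph auth del osd.12
ceph auth add osd.12 osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-12/keyring

# start it again and let backfill repopulate the disk
/etc/init.d/ceph start osd.12
ceph osd unset noout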




On Sun, Nov 9, 2014 at 7:25 PM, GuangYang  wrote:

>
> In terms of disk replacement, to avoid migrating data back and forth, are
> the below two approaches reasonable?
>  1. Keep the OSD in, do an ad-hoc disk replacement, and provision a new
> OSD (so that the OSD id stays the same), then trigger data migration.
> In this way the data migration only happens once; however, it does require
> operators to replace the disk very fast.
>  2. Move the data on the broken disk to a new disk completely and use Ceph
> to repair the bad objects.
>
> Thanks,
> Guang
>


Re: [ceph-users] PG inconsistency

2014-11-09 Thread GuangYang
Thanks Sage!


> Date: Fri, 7 Nov 2014 02:19:06 -0800
> From: s...@newdream.net
> To: yguan...@outlook.com
> CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
> Subject: Re: PG inconsistency
>
> On Thu, 6 Nov 2014, GuangYang wrote:
>> Hello Cephers,
>> Recently we observed a couple of inconsistencies in our Ceph cluster.
>> There were two major patterns leading to the inconsistencies, as I
>> observed: 1) EIO when reading the file, 2) the digest is inconsistent
>> (for EC) even when there is no read error.
>>
>> While Ceph has built-in tool sets to repair the inconsistencies, I would
>> also like to check with the community about the best ways to handle
>> such issues (e.g. should we run fsck / xfs_repair when such an issue
>> happens?).
>>
>> In more detail, I have the following questions:
>> 1. When an inconsistency is detected, what is the chance that there is
>> a hardware issue which needs to be repaired physically, or should I
>> run some disk/filesystem tools to check further?
>
> I'm not really an operator so I'm not as familiar with these tools as I
> should be :(, but I suspect the prudent route is to check the SMART info
> on the disk, and/or trigger a scrub of everything else on the OSD (ceph
> osd scrub N). For DreamObjects, I think they usually just fail the OSD
> once it starts throwing bad sectors (most of the hardware is already
> reasonably aged).
Google's data also shows a strong correlation between scan errors (and
several SMART parameters) and disk failure:
https://www.usenix.org/legacy/event/fast07/tech/full_papers/pinheiro/pinheiro.pdf
>
>> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should
>> we rely solely on Ceph's repair tool sets?
>
> That might not be a bad idea, but I would urge caution if xfs_repair finds
> any issues or makes any changes, as subtle changes to the fs contents can
> confuse ceph-osd. At an absolute minimum, do a full scrub after, but
> even better would be to fail the OSD.
>
> (FWIW I think we should document a recommended "safe" process for
> failing/replacing an OSD that takes the suspect data offline but waits for
> the cluster to heal before destroying any data. Simply marking the OSD
> out will work, but then when a fresh drive is added there will be a second
> repair/rebalance event, which isn't ideal.)
Yeah, that would be very helpful. I think the first decision to make is
whether we should replace the disk. In our clusters there is data corruption
(EIO) along with SMART warnings, which is an indicator of a bad disk.
Meanwhile, we also observed that xattrs were lost
(http://tracker.ceph.com/issues/10018) without any SMART warnings; after
talking to Sam, we suspect it might be due to an unexpected host reboot (or a
mis-configured RAID controller), in which case we probably don't need to
replace the disk, but only repair it through Ceph.

In terms of disk replacement, to avoid migrating data back and forth, are the
below two approaches reasonable?
 1. Keep the OSD in, do an ad-hoc disk replacement, and provision a new OSD
(so that the OSD id stays the same), then trigger data migration. In this way
the data migration only happens once; however, it does require operators to
replace the disk very fast.
 2. Move the data on the broken disk to a new disk completely and use Ceph to
repair the bad objects.

Thanks,
Guang

>
> sage
>
>>
>> It would be great to hear your experience and suggestions.
>>
>> BTW, we are using XFS in the cluster.
>>
>> Thanks,
>> Guang 


Re: [ceph-users] PG inconsistency

2014-11-07 Thread Sage Weil
On Thu, 6 Nov 2014, GuangYang wrote:
> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster.
> There were two major patterns leading to the inconsistencies, as I
> observed: 1) EIO when reading the file, 2) the digest is inconsistent
> (for EC) even when there is no read error.
> 
> While Ceph has built-in tool sets to repair the inconsistencies, I would
> also like to check with the community about the best ways to handle
> such issues (e.g. should we run fsck / xfs_repair when such an issue
> happens?).
> 
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is
> a hardware issue which needs to be repaired physically, or should I
> run some disk/filesystem tools to check further?

I'm not really an operator so I'm not as familiar with these tools as I 
should be :(, but I suspect the prudent route is to check the SMART info 
on the disk, and/or trigger a scrub of everything else on the OSD (ceph 
osd scrub N).  For DreamObjects, I think they usually just fail the OSD 
once it starts throwing bad sectors (most of the hardware is already 
reasonably aged).
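
Concretely, something along these lines is probably a reasonable starting
point (the device name and OSD id are placeholders; smartctl comes from the
smartmontools package):

# check overall SMART health and the attributes that tend to predict failure
smartctl -H /dev/sdX
smartctl -a /dev/sdX | grep -i -e reallocated -e pending -e uncorrectable

# ask the OSD to scrub (or deep-scrub) everything else it holds
ceph osd scrub 12
ceph osd deep-scrub 12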

> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or should
> we rely solely on Ceph's repair tool sets?

That might not be a bad idea, but I would urge caution if xfs_repair finds 
any issues or makes any changes, as subtle changes to the fs contents can 
confuse ceph-osd.  At an absolute minimum, do a full scrub after, but 
even better would be to fail the OSD.

(FWIW I think we should document a recommended "safe" process for 
failing/replacing an OSD that takes the suspect data offline but waits for 
the cluster to heal before destroying any data.  Simply marking the OSD 
out will work, but then when a fresh drive is added there will be a second 
repair/rebalance event, which isn't ideal.)
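
Until that's written down, a hedged sketch of the process described above --
take the suspect OSD out, wait for the cluster to heal, and only then destroy
anything (the OSD id 12 is a placeholder):

# stop the suspect OSD and let its data re-replicate elsewhere
ceph osd out 12
/etc/init.d/ceph stop osd.12

# wait until every PG is active+clean again before touching the disk
ceph -s
ceph pg dump_stuck unclean

# only after the cluster has healed, remove the OSD for good
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12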

sage

> 
> It would be great to hear your experience and suggestions.
> 
> BTW, we are using XFS in the cluster.
> 
> Thanks,
> Guang   


Re: [ceph-users] PG inconsistency

2014-11-06 Thread Dan van der Ster
IIRC, the EIO we had also correlated with a SMART status that showed the
disk was bad enough for a warranty replacement -- so yes, I replaced the
disk in these cases.

Cheers, Dan

On Thu Nov 06 2014 at 2:44:08 PM GuangYang  wrote:

> Thanks Dan. By "killed/formatted/replaced the OSD", did you replace the
> disk? I'm not a filesystem expert, but I would like to understand what
> happened behind the EIO and whether it reveals something (e.g. a hardware
> issue).
>
> In our case we are using 6TB drives, so there is a lot of data to
> migrate, and since backfilling/recovery increases latency, we hope to
> avoid that as much as we can.
>
> Thanks,
> Guang
>
> 
> > From: daniel.vanders...@cern.ch
> > Date: Thu, 6 Nov 2014 13:36:46 +
> > Subject: Re: PG inconsistency
> > To: yguan...@outlook.com; ceph-users@lists.ceph.com
> >
> > Hi,
> > I've only ever seen (1), EIO to read a file. In this case I've always
> > just killed / formatted / replaced that OSD completely -- that moves
> > the PG to a new master and the new replication "fixes" the
> > inconsistency. This way, I've never had to pg repair. I don't know if
> > this is a best or even good practice, but it works for us.
> > Cheers, Dan
> >
> > On Thu Nov 06 2014 at 2:24:32 PM GuangYang <yguan...@outlook.com> wrote:
> > Hello Cephers,
> > Recently we observed a couple of inconsistencies in our Ceph cluster.
> > There were two major patterns leading to the inconsistencies, as I
> > observed: 1) EIO when reading the file, 2) the digest is inconsistent
> > (for EC) even when there is no read error.
> >
> > While Ceph has built-in tool sets to repair the inconsistencies, I would
> > also like to check with the community about the best ways to handle
> > such issues (e.g. should we run fsck / xfs_repair when such an issue
> > happens?).
> >
> > In more detail, I have the following questions:
> > 1. When an inconsistency is detected, what is the chance that there is
> > a hardware issue which needs to be repaired physically, or should I
> > run some disk/filesystem tools to check further?
> > 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> > should we rely solely on Ceph's repair tool sets?
> >
> > It would be great to hear your experience and suggestions.
> >
> > BTW, we are using XFS in the cluster.
> >
> > Thanks,
> > Guang
>


Re: [ceph-users] PG inconsistency

2014-11-06 Thread Irek Fasikhov
Thu Nov 06 2014 at 16:44:09, GuangYang :

> Thanks Dan. By "killed/formatted/replaced the OSD", did you replace the
> disk? I'm not a filesystem expert, but I would like to understand what
> happened behind the EIO and whether it reveals something (e.g. a hardware
> issue).
>
> In our case we are using 6TB drives, so there is a lot of data to
> migrate, and since backfilling/recovery increases latency, we hope to
> avoid that as much as we can.
>

For example, use the following parameters:
osd_recovery_delay_start = 10
osd recovery op priority = 2
osd max backfills = 1
osd recovery max active =1
osd recovery threads = 1
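
These normally live in the [osd] section of ceph.conf; as a sketch, they can
also be injected into running OSDs without a restart (injectargs takes effect
immediately but does not persist across daemon restarts):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 2'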



>
> Thanks,
> Guang
>
> 
> > From: daniel.vanders...@cern.ch
> > Date: Thu, 6 Nov 2014 13:36:46 +
> > Subject: Re: PG inconsistency
> > To: yguan...@outlook.com; ceph-users@lists.ceph.com
> >
> > Hi,
> > I've only ever seen (1), EIO to read a file. In this case I've always
> > just killed / formatted / replaced that OSD completely -- that moves
> > the PG to a new master and the new replication "fixes" the
> > inconsistency. This way, I've never had to pg repair. I don't know if
> > this is a best or even good practice, but it works for us.
> > Cheers, Dan
> >
> > On Thu Nov 06 2014 at 2:24:32 PM GuangYang <yguan...@outlook.com> wrote:
> > Hello Cephers,
> > Recently we observed a couple of inconsistencies in our Ceph cluster.
> > There were two major patterns leading to the inconsistencies, as I
> > observed: 1) EIO when reading the file, 2) the digest is inconsistent
> > (for EC) even when there is no read error.
> >
> > While Ceph has built-in tool sets to repair the inconsistencies, I would
> > also like to check with the community about the best ways to handle
> > such issues (e.g. should we run fsck / xfs_repair when such an issue
> > happens?).
> >
> > In more detail, I have the following questions:
> > 1. When an inconsistency is detected, what is the chance that there is
> > a hardware issue which needs to be repaired physically, or should I
> > run some disk/filesystem tools to check further?
> > 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> > should we rely solely on Ceph's repair tool sets?
> >
> > It would be great to hear your experience and suggestions.
> >
> > BTW, we are using XFS in the cluster.
> >
> > Thanks,
> > Guang
>


Re: [ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
We are using v0.80.4. I just wanted to ask for general suggestions here :)

Thanks,
Guang


> From: malm...@gmail.com 
> Date: Thu, 6 Nov 2014 13:46:12 + 
> Subject: Re: [ceph-users] PG inconsistency 
> To: yguan...@outlook.com; ceph-de...@vger.kernel.org; 
> ceph-users@lists.ceph.com 
> 
> What is your version of Ceph? 
> 0.80.0 - 0.80.3 
> https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b 
> 
> Thu Nov 06 2014 at 16:24:21, GuangYang <yguan...@outlook.com>:
> Hello Cephers, 
> Recently we observed a couple of inconsistencies in our Ceph cluster.
> There were two major patterns leading to the inconsistencies, as I
> observed: 1) EIO when reading the file, 2) the digest is inconsistent
> (for EC) even when there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would
> also like to check with the community about the best ways to handle
> such issues (e.g. should we run fsck / xfs_repair when such an issue
> happens?).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is
> a hardware issue which needs to be repaired physically, or should I
> run some disk/filesystem tools to check further?
> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> should we rely solely on Ceph's repair tool sets?
>
> It would be great to hear your experience and suggestions.
> 
> BTW, we are using XFS in the cluster. 
> 
> Thanks, 
> Guang 


Re: [ceph-users] PG inconsistency

2014-11-06 Thread Irek Fasikhov
What is your version of Ceph?
0.80.0 - 0.80.3
https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b
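
If it isn't clear which version the daemons are actually running, a quick
check (osd.0 is just an example daemon):

# version of the locally installed ceph binaries
ceph --version

# version reported by a running daemon
ceph tell osd.0 version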

Thu Nov 06 2014 at 16:24:21, GuangYang :

> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster.
> There were two major patterns leading to the inconsistencies, as I
> observed: 1) EIO when reading the file, 2) the digest is inconsistent
> (for EC) even when there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would
> also like to check with the community about the best ways to handle
> such issues (e.g. should we run fsck / xfs_repair when such an issue
> happens?).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is
> a hardware issue which needs to be repaired physically, or should I
> run some disk/filesystem tools to check further?
> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> should we rely solely on Ceph's repair tool sets?
>
> It would be great to hear your experience and suggestions.
>
> BTW, we are using XFS in the cluster.
>
> Thanks,
> Guang


Re: [ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
Thanks Dan. By "killed/formatted/replaced the OSD", did you replace the disk?
I'm not a filesystem expert, but I would like to understand what happened
behind the EIO and whether it reveals something (e.g. a hardware issue).

In our case we are using 6TB drives, so there is a lot of data to migrate,
and since backfilling/recovery increases latency, we hope to avoid that as
much as we can.

Thanks,
Guang


> From: daniel.vanders...@cern.ch 
> Date: Thu, 6 Nov 2014 13:36:46 + 
> Subject: Re: PG inconsistency 
> To: yguan...@outlook.com; ceph-users@lists.ceph.com 
> 
> Hi, 
> I've only ever seen (1), EIO to read a file. In this case I've always 
> just killed / formatted / replaced that OSD completely -- that moves 
> the PG to a new master and the new replication "fixes" the 
> inconsistency. This way, I've never had to pg repair. I don't know if 
> this is a best or even good practice, but it works for us. 
> Cheers, Dan 
> 
> On Thu Nov 06 2014 at 2:24:32 PM GuangYang <yguan...@outlook.com> wrote:
> Hello Cephers, 
> Recently we observed a couple of inconsistencies in our Ceph cluster.
> There were two major patterns leading to the inconsistencies, as I
> observed: 1) EIO when reading the file, 2) the digest is inconsistent
> (for EC) even when there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would
> also like to check with the community about the best ways to handle
> such issues (e.g. should we run fsck / xfs_repair when such an issue
> happens?).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is
> a hardware issue which needs to be repaired physically, or should I
> run some disk/filesystem tools to check further?
> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> should we rely solely on Ceph's repair tool sets?
>
> It would be great to hear your experience and suggestions.
> 
> BTW, we are using XFS in the cluster. 
> 
> Thanks, 
> Guang 
  


Re: [ceph-users] PG inconsistency

2014-11-06 Thread Dan van der Ster
Hi,
I've only ever seen (1), EIO to read a file. In this case I've always just
killed / formatted / replaced that OSD completely -- that moves the PG to a
new master and the new replication "fixes" the inconsistency. This way,
I've never had to pg repair. I don't know if this is a best or even good
practice, but it works for us.
Cheers, Dan

On Thu Nov 06 2014 at 2:24:32 PM GuangYang  wrote:

> Hello Cephers,
> Recently we observed a couple of inconsistencies in our Ceph cluster.
> There were two major patterns leading to the inconsistencies, as I
> observed: 1) EIO when reading the file, 2) the digest is inconsistent
> (for EC) even when there is no read error.
>
> While Ceph has built-in tool sets to repair the inconsistencies, I would
> also like to check with the community about the best ways to handle
> such issues (e.g. should we run fsck / xfs_repair when such an issue
> happens?).
>
> In more detail, I have the following questions:
> 1. When an inconsistency is detected, what is the chance that there is
> a hardware issue which needs to be repaired physically, or should I
> run some disk/filesystem tools to check further?
> 2. Should we use fsck / xfs_repair to fix the inconsistencies, or
> should we rely solely on Ceph's repair tool sets?
>
> It would be great to hear your experience and suggestions.
>
> BTW, we are using XFS in the cluster.
>
> Thanks,
> Guang


[ceph-users] PG inconsistency

2014-11-06 Thread GuangYang
Hello Cephers,
Recently we observed a couple of inconsistencies in our Ceph cluster. There
were two major patterns leading to the inconsistencies, as I observed: 1) EIO
when reading the file, 2) the digest is inconsistent (for EC) even when there
is no read error.

While Ceph has built-in tool sets to repair the inconsistencies, I would also
like to check with the community about the best ways to handle such issues
(e.g. should we run fsck / xfs_repair when such an issue happens?).
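
(The built-in tool set I'm referring to is roughly the following workflow;
the PG id 3.45 below is just a placeholder for whatever scrubbing reports.)

# list the PGs that scrubbing has flagged as inconsistent
ceph health detail | grep inconsistent

# re-run a deep scrub on the PG and check the primary OSD's log for the
# failing object / EIO before deciding how to act
ceph pg deep-scrub 3.45

# ask Ceph to repair the PG once the bad copy has been dealt with
ceph pg repair 3.45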

In more detail, I have the following questions:
1. When an inconsistency is detected, what is the chance that there is a
hardware issue which needs to be repaired physically, or should I run some
disk/filesystem tools to check further?
2. Should we use fsck / xfs_repair to fix the inconsistencies, or should we
rely solely on Ceph's repair tool sets?

It would be great to hear your experience and suggestions.

BTW, we are using XFS in the cluster.

Thanks,
Guang 