Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Christian Wuerdig
I'm not a big expert, but the OP said he's suspecting bitrot is at least part of the issue, in which case you can have the situation where the drive has ACK'ed the write but a later scrub discovered checksum errors. Plus you don't need to actually lose a drive to get inconsistent pgs with size=2 min_size
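For reference, once a scrub has flagged a PG this way, the mismatches it found can be listed with the rados inconsistency tooling. A minimal sketch; the PG id 2.1f below is only a placeholder for one taken from ceph health detail:

  $ rados list-inconsistent-obj 2.1f --format=json-pretty
  # per-object report of what the scrub found (e.g. data digest mismatches, missing copies)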

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Denes Dolhay
Hi Greg,
Accepting the fact that an osd with outdated data can never accept a write, or io of any kind, how is it possible that the system goes into this state?
-All osds are Bluestore, with checksums, mtime etc.
-All osds are up and in
-No hw failures, lost disks, damaged journals or databases e

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Gregory Farnum
On Thu, Nov 2, 2017 at 1:21 AM koukou73gr wrote:
> The scenario is actually a bit different, see:
>
> Let's assume size=2, min_size=1
> -We are looking at pg "A" acting [1, 2]
> -osd 1 goes down
> -osd 2 accepts a write for pg "A"
> -osd 2 goes down
> -osd 1 comes back up, while osd 2 still down

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Hans van den Bogert
Never mind, I should’ve read the whole thread first.
> On Nov 2, 2017, at 10:50 AM, Hans van den Bogert wrote:
>
>> On Nov 1, 2017, at 4:45 PM, David Turner wrote:
>>
>> All it takes for data loss is that an osd on server 1 is marked down and a
>> write happ

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Hans van den Bogert
> On Nov 1, 2017, at 4:45 PM, David Turner wrote:
>
> All it takes for data loss is that an osd on server 1 is marked down and a
> write happens to an osd on server 2. Now the osd on server 2 goes down
> before the osd on server 1 has finished backfilling and the first osd
> receives a reque

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread koukou73gr
The scenario is actually a bit different, see:
Let's assume size=2, min_size=1
-We are looking at pg "A" acting [1, 2]
-osd 1 goes down
-osd 2 accepts a write for pg "A"
-osd 2 goes down
-osd 1 comes back up, while osd 2 still down
-osd 1 has no way to know osd 2 accepted a write in pg "A"
-osd 1
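As an aside, which OSDs a pg such as "A" currently maps to, and which of them are actually serving it, can be checked directly on the cluster. A quick sketch; the PG id 1.a is only a placeholder:

  $ ceph pg map 1.a
  # prints the osdmap epoch plus the current "up" and "acting" OSD sets for that PG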

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread David Turner
In that thread, I really like how Wido puts it. He takes out any bit of code paths, bugs, etc... In reference to size=3 min_size=1, he says, "Losing two disks at the same time is something which doesn't happen that much, but if it happens you don't want to modify any data on the only copy which y

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread David Turner
I don't know. I've seen several cases where people have inconsistent pgs that they can't recover from and they didn't lose any disks. The most common thread between them is min_size=1. My postulated scenario might not be the actual path in the code that leads to it, but something does... and min
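For what it's worth, the usual way out of that corner is to raise the replication settings on the pool. A sketch, assuming a replicated pool that happens to be called rbd (substitute your own pool name):

  $ ceph osd pool set rbd size 3        # keep three copies of every object
  $ ceph osd pool set rbd min_size 2    # refuse client IO unless at least two copies are available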

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread Gregory Farnum
On Wed, Nov 1, 2017 at 11:27 AM Denes Dolhay wrote:
> Hello,
> I have a trick question for Mr. Turner's scenario:
> Let's assume size=2, min_size=1
> -We are looking at pg "A" acting [1, 2]
> -osd 1 goes down, OK
> -osd 1 comes back up, backfill of pg "A" commences from osd 2 to osd 1, OK
> -osd

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread Denes Dolhay
Hello,
I have a trick question for Mr. Turner's scenario:
Let's assume size=2, min_size=1
-We are looking at pg "A" acting [1, 2]
-osd 1 goes down, OK
-osd 1 comes back up, backfill of pg "A" commences from osd 2 to osd 1, OK
-osd 2 goes down (and therefore pg "A" 's backfill to osd 1 is incompl

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread David Turner
RAID may make it likely that disk failures aren't going to be the cause of your data loss, but none of my examples referred to hardware failure. The daemon and the code can have issues that cause OSDs to restart, or simply to stop responding long enough to be marked down. Data loss in this case isn't talking ab

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread Mario Giammarco
I have read your post, then read the thread you suggested; very interesting. Then I read your post again and understood it better. The most important thing is that even with min_size=1, writes are acknowledged only after ceph has written size=2 copies. In the thread above there is: As David already said, when all
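Put differently: a write is acknowledged once every OSD currently in the acting set has it (normally size copies), while min_size only decides whether the PG is allowed to serve IO at all. Whether any PGs are currently running below size can be checked with something like:

  $ ceph pg dump pgs_brief | grep -E 'degraded|undersized'
  # PGs whose acting set currently holds fewer than 'size' copies yet may still be active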

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread Gregory Farnum
Okay, so just to be clear you *haven't* run pg repair yet? These PG copies look wildly different, but maybe I'm misunderstanding something about the output. I would run the repair first and see if that makes things happy. If you're running on Bluestore, it will *not* break anything or "repair" wi
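For reference, the repair itself is a per-PG command. A sketch, with 2.1f again standing in for one of the inconsistent PG ids reported by ceph health detail:

  $ ceph pg repair 2.1f        # ask the primary OSD to repair the inconsistent objects in this PG
  $ ceph pg deep-scrub 2.1f    # optionally re-verify the PG afterwards
  $ ceph -s                    # watch the scrub-error / inconsistent counters go down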

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread David Turner
It looks like you're running with a size = 2 and min_size = 1 (the min_size is a guess, the size is based on how many osds belong to your problem PGs). Here's some good reading for you: https://www.spinics.net/lists/ceph-users/msg32895.html Basically the gist is that when running with size = 2 yo
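Those settings can also be confirmed, rather than guessed, straight from the cluster. A minimal sketch:

  $ ceph osd pool ls detail     # size, min_size and other flags for every pool
  $ ceph osd dump | grep pool   # the same information from the osdmap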

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-01 Thread Mario Giammarco
Sure, here it is, ceph -s:

  cluster:
    id:     8bc45d9a-ef50-4038-8e1b-1f25ac46c945
    health: HEALTH_ERR
            100 scrub errors
            Possible data damage: 56 pgs inconsistent

  services:
    mon: 3 daemons, quorum 0,1,pve3
    mgr: pve3(active)
    osd: 3 osds: 3 up, 3 in

  data:
    pools:
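ceph health detail is also worth running here, since it names the affected PGs rather than just counting them. A sketch:

  $ ceph health detail                          # one line per inconsistent PG, with its acting set
  $ ceph pg dump pgs_brief | grep inconsistent  # the same list in a compact, scriptable form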

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-10-30 Thread Gregory Farnum
You'll need to tell us exactly what error messages you're seeing, what the output of ceph -s is, and the output of pg query for the relevant PGs. There's not a lot of documentation because much of this tooling is new, it's changing quickly, and most people don't have the kinds of problems that turn
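For the archives, the query Greg mentions is issued per PG. A sketch, with 2.1f again as a placeholder PG id:

  $ ceph -s                                   # overall cluster state
  $ ceph pg 2.1f query > pg-2.1f-query.json   # full peering and recovery state of one inconsistent PG
  # the "info", "peer_info" and "recovery_state" sections are usually the interesting parts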

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-10-30 Thread Mario Giammarco
>[Questions to the list]
>How is it possible that the cluster cannot repair itself with ceph pg repair?
>No good copies are remaining?
>Cannot decide which copy is valid or up-to-date?
>If so, why not, when there is a checksum and mtime for everything?
>In this inconsistent state which object does th

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-10-30 Thread Mario Giammarco
>In general you should find that clusters running bluestore are much more
>effective about doing a repair automatically (because bluestore has
>checksums on all data, it knows which object is correct!), but there are
>still some situations where they won't. If that happens to you, I would not
>f

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-10-30 Thread Gregory Farnum
On Sat, Oct 28, 2017 at 5:38 AM Denes Dolhay wrote:
> Hello,
>
> First of all, I would recommend that you use ceph pg repair wherever you
> can.
>
> When you have size=3 the cluster can compare 3 instances, therefore it is
> easier for it to spot which two are good, and which one is bad.
>
> Wh

Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-10-28 Thread Denes Dolhay
Hello,
First of all, I would recommend that you use ceph pg repair wherever you can.
When you have size=3 the cluster can compare 3 instances, therefore it is easier for it to spot which two are good, and which one is bad.
When you use size=2 the case is harder in oh-so-many ways:
-Accord

[ceph-users] PGs inconsistent, do I fear data loss?

2017-10-28 Thread Mario Giammarco
Hello, we recently upgraded two clusters to Ceph luminous with bluestore and we discovered that we have many more pgs in state active+clean+inconsistent. (Possible data damage, xx pgs inconsistent) This is probably due to checksums in bluestore that discover more errors. We have some pools with r
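A quick way to see which pools and PGs are affected before deciding on repairs; the pool name rbd below is only an example:

  $ ceph osd pool ls               # names of the pools in the cluster
  $ rados list-inconsistent-pg rbd # prints the inconsistent PGs of that pool as a JSON list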