Re: [ceph-users] Another cluster completely hang

2016-06-30 Thread Brian ::
Hi Mario

Perhaps it's covered under Proxmox support. Do you have support for your
Proxmox install from the Proxmox team?

Otherwise you can always buy support from Red Hat:

https://www.redhat.com/en/technologies/storage/ceph



On Thu, Jun 30, 2016 at 7:37 AM, Mario Giammarco  wrote:
> Last two questions:
> 1) I have used other systems in the past. In case of split brain or serious
> problems they offered me to choose which copy is "good" and then work again.
> Is there a way to tell ceph that all is ok? This morning again I have 19
> incomplete pgs after recovery
> 2) Where can I find paid support? I mean someone that logs in to my cluster
> and tell cephs that all is active+clean
>
> Thanks,
> Mario
>
> Il giorno mer 29 giu 2016 alle ore 16:08 Mario Giammarco
>  ha scritto:
>>
>> This time at the end of recovery procedure you described it was like most
>> pgs active+clean 20 pgs incomplete.
>> After that when trying to use the cluster I got "request blocked more
>> than" and no vm can start.
>> I know that something has happened after the broken disk, probably a
>> server reboot. I am investigating.
>> But even if I find the origin of the problem it will not help in finding a
>> solution now.
>> So I am using my time in repairing the pool only to save the production
>> data and I will throw away the rest.
>> Now after marking all pgs as complete with ceph_objectstore_tool I see
>> that:
>>
>> 1) ceph has put out three hdds ( I suppose due to scrub but it is my only
>> my idea, I will check logs) BAD
>> 2) it is recovering for objects degraded and misplaced GOOD
>> 3) vm are not usable yet BAD
>> 4) I see some pgs in state down+peering (I hope is not BAD)
>>
>> Regarding 1) how I can put again that three hdds in the cluster? Should I
>> remove them from crush and start again?
>> Can I tell ceph that they are not bad?
>> Mario
>>
>> Il giorno mer 29 giu 2016 alle ore 15:34 Lionel Bouton
>>  ha scritto:
>>>
>>> Hi,
>>>
>>> Le 29/06/2016 12:00, Mario Giammarco a écrit :
>>> > Now the problem is that ceph has put out two disks because scrub  has
>>> > failed (I think it is not a disk fault but due to mark-complete)
>>>
>>> There is something odd going on. I've only seen deep-scrub failing (ie
>>> detect one inconsistency and marking the pg so) so I'm not sure what
>>> happens in the case of a "simple" scrub failure but what should not
>>> happen is the whole OSD going down on scrub of deepscrub fairure which
>>> you seem to imply did happen.
>>> Do you have logs for these two failures giving a hint at what happened
>>> (probably /var/log/ceph/ceph-osd..log) ? Any kernel log pointing to
>>> hardware failure(s) around the time these events happened ?
>>>
>>> Another point : you said that you had one disk "broken". Usually ceph
>>> handles this case in the following manner :
>>> - the OSD detects the problem and commit suicide (unless it's configured
>>> to ignore IO errors which is not the default),
>>> - your cluster is then in degraded state with one OSD down/in,
>>> - after a timeout (several minutes), Ceph decides that the OSD won't
>>> come up again soon and marks the OSD "out" (so one OSD down/out),
>>> - as the OSD is out, crush adapts pg positions based on the remaining
>>> available OSDs and bring back all degraded pg to clean state by creating
>>> missing replicas while moving pgs around. You see a lot of IO, many pg
>>> in wait_backfill/backfilling states at this point,
>>> - when all is done the cluster is back to HEALTH_OK
>>>
>>> When your disk was broken and you waited 24 hours how far along this
>>> process was your cluster ?
>>>
>>> Best regards,
>>>
>>> Lionel
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-30 Thread Mario Giammarco
Last two questions:
1) I have used other systems in the past. In case of split brain or serious
problems they let me choose which copy is "good" and then continue working.
Is there a way to tell ceph that all is ok? This morning I again have 19
incomplete pgs after recovery.
2) Where can I find paid support? I mean someone who logs in to my cluster
and brings everything back to active+clean.

Thanks,
Mario

On Wed, 29 Jun 2016 at 16:08, Mario Giammarco <mgiamma...@gmail.com> wrote:

> This time at the end of recovery procedure you described it was like most
> pgs active+clean 20 pgs incomplete.
> After that when trying to use the cluster I got "request blocked more
> than" and no vm can start.
> I know that something has happened after the broken disk, probably a
> server reboot. I am investigating.
> But even if I find the origin of the problem it will not help in finding a
> solution now.
> So I am using my time in repairing the pool only to save the production
> data and I will throw away the rest.
> Now after marking all pgs as complete with ceph_objectstore_tool I see
> that:
>
> 1) ceph has put out three hdds ( I suppose due to scrub but it is my only
> my idea, I will check logs) BAD
> 2) it is recovering for objects degraded and misplaced GOOD
> 3) vm are not usable yet BAD
> 4) I see some pgs in state down+peering (I hope is not BAD)
>
> Regarding 1) how I can put again that three hdds in the cluster? Should I
> remove them from crush and start again?
> Can I tell ceph that they are not bad?
> Mario
>
> Il giorno mer 29 giu 2016 alle ore 15:34 Lionel Bouton <
> lionel+c...@bouton.name> ha scritto:
>
>> Hi,
>>
>> Le 29/06/2016 12:00, Mario Giammarco a écrit :
>> > Now the problem is that ceph has put out two disks because scrub  has
>> > failed (I think it is not a disk fault but due to mark-complete)
>>
>> There is something odd going on. I've only seen deep-scrub failing (ie
>> detect one inconsistency and marking the pg so) so I'm not sure what
>> happens in the case of a "simple" scrub failure but what should not
>> happen is the whole OSD going down on scrub of deepscrub fairure which
>> you seem to imply did happen.
>> Do you have logs for these two failures giving a hint at what happened
>> (probably /var/log/ceph/ceph-osd..log) ? Any kernel log pointing to
>> hardware failure(s) around the time these events happened ?
>>
>> Another point : you said that you had one disk "broken". Usually ceph
>> handles this case in the following manner :
>> - the OSD detects the problem and commit suicide (unless it's configured
>> to ignore IO errors which is not the default),
>> - your cluster is then in degraded state with one OSD down/in,
>> - after a timeout (several minutes), Ceph decides that the OSD won't
>> come up again soon and marks the OSD "out" (so one OSD down/out),
>> - as the OSD is out, crush adapts pg positions based on the remaining
>> available OSDs and bring back all degraded pg to clean state by creating
>> missing replicas while moving pgs around. You see a lot of IO, many pg
>> in wait_backfill/backfilling states at this point,
>> - when all is done the cluster is back to HEALTH_OK
>>
>> When your disk was broken and you waited 24 hours how far along this
>> process was your cluster ?
>>
>> Best regards,
>>
>> Lionel
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
This time, at the end of the recovery procedure you described, it ended with
most pgs active+clean and 20 pgs incomplete.
After that, when trying to use the cluster, I got "request blocked more than"
warnings and no VM can start.
I know that something happened after the broken disk, probably a server
reboot. I am investigating.
But even if I find the origin of the problem it will not help in finding a
solution now.
So I am spending my time repairing the pool only to save the production
data, and I will throw away the rest.
Now, after marking all pgs as complete with ceph-objectstore-tool, I see that:

1) ceph has put out three hdds (I suppose due to scrub, but that is only my
guess; I will check the logs) BAD
2) it is recovering degraded and misplaced objects GOOD
3) VMs are not usable yet BAD
4) I see some pgs in state down+peering (I hope that is not BAD)

Regarding 1): how can I put those three hdds back in the cluster? Should I
remove them from crush and start again?
Can I tell ceph that they are not bad?
Mario

On Wed, 29 Jun 2016 at 15:34, Lionel Bouton <lionel+c...@bouton.name> wrote:

> Hi,
>
> Le 29/06/2016 12:00, Mario Giammarco a écrit :
> > Now the problem is that ceph has put out two disks because scrub  has
> > failed (I think it is not a disk fault but due to mark-complete)
>
> There is something odd going on. I've only seen deep-scrub failing (ie
> detect one inconsistency and marking the pg so) so I'm not sure what
> happens in the case of a "simple" scrub failure but what should not
> happen is the whole OSD going down on scrub of deepscrub fairure which
> you seem to imply did happen.
> Do you have logs for these two failures giving a hint at what happened
> (probably /var/log/ceph/ceph-osd..log) ? Any kernel log pointing to
> hardware failure(s) around the time these events happened ?
>
> Another point : you said that you had one disk "broken". Usually ceph
> handles this case in the following manner :
> - the OSD detects the problem and commit suicide (unless it's configured
> to ignore IO errors which is not the default),
> - your cluster is then in degraded state with one OSD down/in,
> - after a timeout (several minutes), Ceph decides that the OSD won't
> come up again soon and marks the OSD "out" (so one OSD down/out),
> - as the OSD is out, crush adapts pg positions based on the remaining
> available OSDs and bring back all degraded pg to clean state by creating
> missing replicas while moving pgs around. You see a lot of IO, many pg
> in wait_backfill/backfilling states at this point,
> - when all is done the cluster is back to HEALTH_OK
>
> When your disk was broken and you waited 24 hours how far along this
> process was your cluster ?
>
> Best regards,
>
> Lionel
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Lionel Bouton
Hi,

On 29/06/2016 12:00, Mario Giammarco wrote:
> Now the problem is that ceph has put out two disks because scrub  has
> failed (I think it is not a disk fault but due to mark-complete)

There is something odd going on. I've only seen deep-scrub failing (i.e.
detecting one inconsistency and marking the pg accordingly), so I'm not sure
what happens in the case of a "simple" scrub failure, but what should not
happen is the whole OSD going down on a scrub or deep-scrub failure, which
you seem to imply did happen.
Do you have logs for these two failures giving a hint at what happened
(probably /var/log/ceph/ceph-osd.<id>.log)? Any kernel log pointing to
hardware failure(s) around the time these events happened?

Another point: you said that you had one disk "broken". Usually ceph
handles this case in the following manner:
- the OSD detects the problem and commits suicide (unless it's configured
to ignore IO errors, which is not the default),
- your cluster is then in a degraded state with one OSD down/in,
- after a timeout (several minutes), Ceph decides that the OSD won't
come up again soon and marks the OSD "out" (so one OSD down/out),
- as the OSD is out, crush adapts pg positions based on the remaining
available OSDs and brings all degraded pgs back to a clean state by creating
missing replicas while moving pgs around. You see a lot of IO and many pgs
in wait_backfill/backfilling states at this point,
- when all is done the cluster is back to HEALTH_OK
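For reference, a minimal way of watching this process from the command line
(just a sketch, nothing cluster-specific assumed):

ceph osd tree        # which OSDs are up/down and in/out
ceph -w              # live view of pgs going from degraded/backfilling to active+clean
ceph health detail   # which pgs are still degraded, incomplete or blocked

The down-to-out timeout is controlled by the mon_osd_down_out_interval
option (300 seconds by default, according to Zoltan's note elsewhere in the
thread).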

When your disk was broken and you waited 24 hours, how far along this
process was your cluster?

Best regards,

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Oliver Dzombic
Hi,

it does not.

But in your case, you have 10 OSDs, and 7 of them hold incomplete PGs.

Since your Proxmox VMs are not stored on single PGs but spread across
many PGs, there is a good chance that at least some data of every VM is
on one of the defective PGs.
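As a rough illustration (a sketch only; the image and object names below are
examples, not taken from this cluster), you can check which PG and OSDs a
given RBD object lands on:

rbd info rbd/vm-100-disk-1       # shows the image's block_name_prefix
rados -p rbd ls | head -5        # sample a few object names in the pool
ceph osd map rbd rbd_data.112233445566.0000000000000000   # substitute a real object name; prints its PG and acting OSDs

If that PG is one of the incomplete ones, I/O touching that object blocks.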

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 29.06.2016 at 13:09, Mario Giammarco wrote:
> Just one question: why when ceph has some incomplete pgs it refuses to
> do I/o on good pgs?
> 
> Il giorno mer 29 giu 2016 alle ore 12:55 Oliver Dzombic
> > ha scritto:
> 
> Hi,
> 
> again:
> 
> You >must< check all your logs ( as fucky as it is for sure ).
> 
> Means on the ceph nodes in /var/log/ceph/*
> 
> And go back to the time where things went down the hill.
> 
> There must be something else going on, beyond normal osd crash.
> 
> And your manual pg repair/pg remove/pg set complete is, most probably,
> just getting your situation worst.
> 
> So really, if you want to have a chance to find out whats going on, you
> must check all the logs. Especially the OSD logs, especially the OSD log
> of the OSD you removed, and then the OSD logs of those pg, which are
> incomplete/stuck/what_ever_not_good.
> 
> --
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de 
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
> Am 29.06.2016 um 12:33 schrieb Mario Giammarco:
> > Thanks,
> > I can put in osds but the do not stay in, and I am pretty sure
> that are
> > not broken.
> >
> > Il giorno mer 29 giu 2016 alle ore 12:07 Oliver Dzombic
> > 
> >> ha
> scritto:
> >
> > hi,
> >
> > ceph osd set noscrub
> > ceph osd set nodeep-scrub
> >
> > ceph osd in 
> >
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:i...@ip-interactive.de 
> >
> >
> > Anschrift:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 beim Amtsgericht Hanau
> > Geschäftsführung: Oliver Dzombic
> >
> > Steuer Nr.: 35 236 3622 1
> > UST ID: DE274086107
> >
> >
> > Am 29.06.2016 um 12:00 schrieb Mario Giammarco:
> > > Now the problem is that ceph has put out two disks because
> scrub  has
> > > failed (I think it is not a disk fault but due to mark-complete)
> > > How can I:
> > > - disable scrub
> > > - put in again the two disks
> > >
> > > I will wait anyway the end of recovery to be sure it really
> works
> > again
> > >
> > > Il giorno mer 29 giu 2016 alle ore 11:16 Mario Giammarco
> > > 
> >
> > 
>  scritto:
> > >
> > > Infact I am worried because:
> > >
> > > 1) ceph is under proxmox, and proxmox may decide to reboot a
> > server
> > > if it is not responding
> > > 2) probably a server was rebooted while ceph was
> reconstructing
> > > 3) even using max=3 do not help
> > >
> > > Anyway this is the "unofficial" procedure that I am
> using, much
> > > simpler than blog post:
> > >
> > > 1) find host where is pg
> > > 2) stop ceph in that host
> > > 3) ceph-objectstore-tool --pgid 1.98 --op mark-complete
> > --data-path
> > > /var/lib/ceph/osd/ceph-9 --journal-path
> > > /var/lib/ceph/osd/ceph-9/journal
> > > 4) start ceph
> > > 5) look finally it reconstructing
> > >
> > > Il giorno mer 29 giu 2016 alle ore 11:11 Oliver Dzombic
> > > 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
Just one question: why does ceph refuse to do I/O on the good pgs when it
has some incomplete pgs?

On Wed, 29 Jun 2016 at 12:55, Oliver Dzombic <i...@ip-interactive.de> wrote:

> Hi,
>
> again:
>
> You >must< check all your logs ( as fucky as it is for sure ).
>
> Means on the ceph nodes in /var/log/ceph/*
>
> And go back to the time where things went down the hill.
>
> There must be something else going on, beyond normal osd crash.
>
> And your manual pg repair/pg remove/pg set complete is, most probably,
> just getting your situation worst.
>
> So really, if you want to have a chance to find out whats going on, you
> must check all the logs. Especially the OSD logs, especially the OSD log
> of the OSD you removed, and then the OSD logs of those pg, which are
> incomplete/stuck/what_ever_not_good.
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 29.06.2016 um 12:33 schrieb Mario Giammarco:
> > Thanks,
> > I can put in osds but the do not stay in, and I am pretty sure that are
> > not broken.
> >
> > Il giorno mer 29 giu 2016 alle ore 12:07 Oliver Dzombic
> > > ha scritto:
> >
> > hi,
> >
> > ceph osd set noscrub
> > ceph osd set nodeep-scrub
> >
> > ceph osd in 
> >
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:i...@ip-interactive.de 
> >
> > Anschrift:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 beim Amtsgericht Hanau
> > Geschäftsführung: Oliver Dzombic
> >
> > Steuer Nr.: 35 236 3622 1
> > UST ID: DE274086107
> >
> >
> > Am 29.06.2016 um 12:00 schrieb Mario Giammarco:
> > > Now the problem is that ceph has put out two disks because scrub
> has
> > > failed (I think it is not a disk fault but due to mark-complete)
> > > How can I:
> > > - disable scrub
> > > - put in again the two disks
> > >
> > > I will wait anyway the end of recovery to be sure it really works
> > again
> > >
> > > Il giorno mer 29 giu 2016 alle ore 11:16 Mario Giammarco
> > > 
> > >> ha
> scritto:
> > >
> > > Infact I am worried because:
> > >
> > > 1) ceph is under proxmox, and proxmox may decide to reboot a
> > server
> > > if it is not responding
> > > 2) probably a server was rebooted while ceph was reconstructing
> > > 3) even using max=3 do not help
> > >
> > > Anyway this is the "unofficial" procedure that I am using, much
> > > simpler than blog post:
> > >
> > > 1) find host where is pg
> > > 2) stop ceph in that host
> > > 3) ceph-objectstore-tool --pgid 1.98 --op mark-complete
> > --data-path
> > > /var/lib/ceph/osd/ceph-9 --journal-path
> > > /var/lib/ceph/osd/ceph-9/journal
> > > 4) start ceph
> > > 5) look finally it reconstructing
> > >
> > > Il giorno mer 29 giu 2016 alle ore 11:11 Oliver Dzombic
> > > 
> > >> ha
> > scritto:
> > >
> > > Hi,
> > >
> > > removing ONE disk while your replication is 2, is no
> problem.
> > >
> > > You dont need to wait a single second to replace of remove
> > it. Its
> > > anyway not used and out/down. So from ceph's point of view
> its
> > > not existent.
> > >
> > > 
> > >
> > > But as christian told you already, what we see now fits to
> a
> > > szenario
> > > where you lost the osd and eighter you did something, or
> > > something else
> > > happens, but the data were not recovered again.
> > >
> > > Eighter because another OSD was broken, or because you did
> > > something.
> > >
> > > Maybe, because of the "too many PGs per OSD (307 > max
> 300)"
> > > ceph never
> > > recovered.
> > >
> > > What i can see from http://pastebin.com/VZD7j2vN is that
> > >
> > > OSD 5,13,9,0,6,2,3 and maybe others, are the OSD's holding
> the
> > > incomplete data.
> > >
> > > This are 7 OSD's from 10. So something happend to that
> > OSD's or
> > > the data
> >   

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Oliver Dzombic
Hi,

again:

You >must< check all your logs ( as tedious as that is, for sure ).

Means on the ceph nodes in /var/log/ceph/*

And go back to the time when things went downhill.

There must be something else going on, beyond normal osd crash.

And your manual pg repair / pg remove / pg set-complete is, most probably,
just making your situation worse.

So really, if you want to have a chance to find out what's going on, you
must check all the logs. Especially the OSD logs: in particular the log of
the OSD you removed, and then the logs of the OSDs holding the pgs which are
incomplete/stuck/otherwise not good.
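A rough starting point for that search (a sketch; adjust the paths and the
time window to your incident):

grep -iE 'error|fail|abort|suicide' /var/log/ceph/ceph-osd.*.log
grep -iE 'osd\..*(down|out|failed)' /var/log/ceph/ceph.log   # cluster log on a monitor node
dmesg | grep -iE 'i/o error|ata|sd[a-z]'                     # kernel-side hints of disk trouble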

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 29.06.2016 at 12:33, Mario Giammarco wrote:
> Thanks,
> I can put in osds but the do not stay in, and I am pretty sure that are
> not broken.
> 
> Il giorno mer 29 giu 2016 alle ore 12:07 Oliver Dzombic
> > ha scritto:
> 
> hi,
> 
> ceph osd set noscrub
> ceph osd set nodeep-scrub
> 
> ceph osd in 
> 
> 
> --
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de 
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
> Am 29.06.2016 um 12:00 schrieb Mario Giammarco:
> > Now the problem is that ceph has put out two disks because scrub  has
> > failed (I think it is not a disk fault but due to mark-complete)
> > How can I:
> > - disable scrub
> > - put in again the two disks
> >
> > I will wait anyway the end of recovery to be sure it really works
> again
> >
> > Il giorno mer 29 giu 2016 alle ore 11:16 Mario Giammarco
> > 
> >> ha scritto:
> >
> > Infact I am worried because:
> >
> > 1) ceph is under proxmox, and proxmox may decide to reboot a
> server
> > if it is not responding
> > 2) probably a server was rebooted while ceph was reconstructing
> > 3) even using max=3 do not help
> >
> > Anyway this is the "unofficial" procedure that I am using, much
> > simpler than blog post:
> >
> > 1) find host where is pg
> > 2) stop ceph in that host
> > 3) ceph-objectstore-tool --pgid 1.98 --op mark-complete
> --data-path
> > /var/lib/ceph/osd/ceph-9 --journal-path
> > /var/lib/ceph/osd/ceph-9/journal
> > 4) start ceph
> > 5) look finally it reconstructing
> >
> > Il giorno mer 29 giu 2016 alle ore 11:11 Oliver Dzombic
> > 
> >> ha
> scritto:
> >
> > Hi,
> >
> > removing ONE disk while your replication is 2, is no problem.
> >
> > You dont need to wait a single second to replace of remove
> it. Its
> > anyway not used and out/down. So from ceph's point of view its
> > not existent.
> >
> > 
> >
> > But as christian told you already, what we see now fits to a
> > szenario
> > where you lost the osd and eighter you did something, or
> > something else
> > happens, but the data were not recovered again.
> >
> > Eighter because another OSD was broken, or because you did
> > something.
> >
> > Maybe, because of the "too many PGs per OSD (307 > max 300)"
> > ceph never
> > recovered.
> >
> > What i can see from http://pastebin.com/VZD7j2vN is that
> >
> > OSD 5,13,9,0,6,2,3 and maybe others, are the OSD's holding the
> > incomplete data.
> >
> > This are 7 OSD's from 10. So something happend to that
> OSD's or
> > the data
> > in them. And that had nothing to do with a single disk
> failing.
> >
> > Something else must have been happend.
> >
> > And as christian already wrote: you will have to go
> through your
> > logs
> > back until the point were things going down.
> >
> > Because a fail of a single OSD, no matter what your
> replication
> > size is,
> > can ( 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
Thanks,
I can put the OSDs in but they do not stay in, and I am pretty sure they are
not broken.
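A quick way to see why they drop out again (a sketch; osd.5 is just an
example id):

ceph osd tree                              # current up/down and in/out state
tail -n 200 /var/log/ceph/ceph-osd.5.log   # why the daemon stops or gets marked down
ceph tell osd.5 version                    # does the daemon answer at all?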

On Wed, 29 Jun 2016 at 12:07, Oliver Dzombic <i...@ip-interactive.de> wrote:

> hi,
>
> ceph osd set noscrub
> ceph osd set nodeep-scrub
>
> ceph osd in 
>
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 29.06.2016 um 12:00 schrieb Mario Giammarco:
> > Now the problem is that ceph has put out two disks because scrub  has
> > failed (I think it is not a disk fault but due to mark-complete)
> > How can I:
> > - disable scrub
> > - put in again the two disks
> >
> > I will wait anyway the end of recovery to be sure it really works again
> >
> > Il giorno mer 29 giu 2016 alle ore 11:16 Mario Giammarco
> > > ha scritto:
> >
> > Infact I am worried because:
> >
> > 1) ceph is under proxmox, and proxmox may decide to reboot a server
> > if it is not responding
> > 2) probably a server was rebooted while ceph was reconstructing
> > 3) even using max=3 do not help
> >
> > Anyway this is the "unofficial" procedure that I am using, much
> > simpler than blog post:
> >
> > 1) find host where is pg
> > 2) stop ceph in that host
> > 3) ceph-objectstore-tool --pgid 1.98 --op mark-complete --data-path
> > /var/lib/ceph/osd/ceph-9 --journal-path
> > /var/lib/ceph/osd/ceph-9/journal
> > 4) start ceph
> > 5) look finally it reconstructing
> >
> > Il giorno mer 29 giu 2016 alle ore 11:11 Oliver Dzombic
> > > ha scritto:
> >
> > Hi,
> >
> > removing ONE disk while your replication is 2, is no problem.
> >
> > You dont need to wait a single second to replace of remove it.
> Its
> > anyway not used and out/down. So from ceph's point of view its
> > not existent.
> >
> > 
> >
> > But as christian told you already, what we see now fits to a
> > szenario
> > where you lost the osd and eighter you did something, or
> > something else
> > happens, but the data were not recovered again.
> >
> > Eighter because another OSD was broken, or because you did
> > something.
> >
> > Maybe, because of the "too many PGs per OSD (307 > max 300)"
> > ceph never
> > recovered.
> >
> > What i can see from http://pastebin.com/VZD7j2vN is that
> >
> > OSD 5,13,9,0,6,2,3 and maybe others, are the OSD's holding the
> > incomplete data.
> >
> > This are 7 OSD's from 10. So something happend to that OSD's or
> > the data
> > in them. And that had nothing to do with a single disk failing.
> >
> > Something else must have been happend.
> >
> > And as christian already wrote: you will have to go through your
> > logs
> > back until the point were things going down.
> >
> > Because a fail of a single OSD, no matter what your replication
> > size is,
> > can ( normally ) not harm the consistency of 7 other OSD's,
> > means 70% of
> > your total cluster.
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:i...@ip-interactive.de 
> >
> > Anschrift:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 beim Amtsgericht Hanau
> > Geschäftsführung: Oliver Dzombic
> >
> > Steuer Nr.: 35 236 3622 1
> > UST ID: DE274086107
> >
> >
> > Am 29.06.2016 um 10:56 schrieb Mario Giammarco:
> > > Yes I have removed it from crush because it was broken. I have
> > waited 24
> > > hours to see if cephs would like to heals itself. Then I
> > removed the
> > > disk completely (it was broken...) and I waited 24 hours
> > again. Then I
> > > start getting worried.
> > > Are you saying to me that I should not remove a broken disk
> from
> > > cluster? 24 hours were not enough?
> > >
> > > Il giorno mer 29 giu 2016 alle ore 10:53 Zoltan Arnold Nagy
> > > 
> >  > >> ha scritto:
> > >
> > > Just loosing one disk doesn’t automagically delete it from
> > CRUSH,
> > > but in the output you had 10 disks listed, so there 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Oliver Dzombic
hi,

ceph osd set noscrub
ceph osd set nodeep-scrub

ceph osd in <osd-id>
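And once recovery has finished, the flags can presumably be removed again
(a sketch):

ceph osd unset noscrub
ceph osd unset nodeep-scrub
ceph osd tree    # verify the OSDs stay up and in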


-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 29.06.2016 at 12:00, Mario Giammarco wrote:
> Now the problem is that ceph has put out two disks because scrub  has
> failed (I think it is not a disk fault but due to mark-complete)
> How can I:
> - disable scrub
> - put in again the two disks
> 
> I will wait anyway the end of recovery to be sure it really works again
> 
> Il giorno mer 29 giu 2016 alle ore 11:16 Mario Giammarco
> > ha scritto:
> 
> Infact I am worried because:
> 
> 1) ceph is under proxmox, and proxmox may decide to reboot a server
> if it is not responding
> 2) probably a server was rebooted while ceph was reconstructing
> 3) even using max=3 do not help
> 
> Anyway this is the "unofficial" procedure that I am using, much
> simpler than blog post:
> 
> 1) find host where is pg
> 2) stop ceph in that host
> 3) ceph-objectstore-tool --pgid 1.98 --op mark-complete --data-path
> /var/lib/ceph/osd/ceph-9 --journal-path
> /var/lib/ceph/osd/ceph-9/journal 
> 4) start ceph
> 5) look finally it reconstructing
> 
> Il giorno mer 29 giu 2016 alle ore 11:11 Oliver Dzombic
> > ha scritto:
> 
> Hi,
> 
> removing ONE disk while your replication is 2, is no problem.
> 
> You dont need to wait a single second to replace of remove it. Its
> anyway not used and out/down. So from ceph's point of view its
> not existent.
> 
> 
> 
> But as christian told you already, what we see now fits to a
> szenario
> where you lost the osd and eighter you did something, or
> something else
> happens, but the data were not recovered again.
> 
> Eighter because another OSD was broken, or because you did
> something.
> 
> Maybe, because of the "too many PGs per OSD (307 > max 300)"
> ceph never
> recovered.
> 
> What i can see from http://pastebin.com/VZD7j2vN is that
> 
> OSD 5,13,9,0,6,2,3 and maybe others, are the OSD's holding the
> incomplete data.
> 
> This are 7 OSD's from 10. So something happend to that OSD's or
> the data
> in them. And that had nothing to do with a single disk failing.
> 
> Something else must have been happend.
> 
> And as christian already wrote: you will have to go through your
> logs
> back until the point were things going down.
> 
> Because a fail of a single OSD, no matter what your replication
> size is,
> can ( normally ) not harm the consistency of 7 other OSD's,
> means 70% of
> your total cluster.
> 
> --
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de 
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
> Am 29.06.2016 um 10:56 schrieb Mario Giammarco:
> > Yes I have removed it from crush because it was broken. I have
> waited 24
> > hours to see if cephs would like to heals itself. Then I
> removed the
> > disk completely (it was broken...) and I waited 24 hours
> again. Then I
> > start getting worried.
> > Are you saying to me that I should not remove a broken disk from
> > cluster? 24 hours were not enough?
> >
> > Il giorno mer 29 giu 2016 alle ore 10:53 Zoltan Arnold Nagy
> > 
>  >> ha scritto:
> >
> > Just loosing one disk doesn’t automagically delete it from
> CRUSH,
> > but in the output you had 10 disks listed, so there must be
> > something else going - did you delete the disk from the
> crush map as
> > well?
> >
> > Ceph waits by default 300 secs AFAIK to mark an OSD out
> after it
> > will start to recover.
> >
> >
> >> On 29 Jun 2016, at 10:42, Mario Giammarco
> 
> >> 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
Now the problem is that ceph has put out two disks because scrub has
failed (I think it is not a disk fault but due to the mark-complete).
How can I:
- disable scrub
- put the two disks back in

Anyway, I will wait for the end of recovery to be sure it really works again.

On Wed, 29 Jun 2016 at 11:16, Mario Giammarco <mgiamma...@gmail.com> wrote:

> Infact I am worried because:
>
> 1) ceph is under proxmox, and proxmox may decide to reboot a server if it
> is not responding
> 2) probably a server was rebooted while ceph was reconstructing
> 3) even using max=3 do not help
>
> Anyway this is the "unofficial" procedure that I am using, much simpler
> than blog post:
>
> 1) find host where is pg
> 2) stop ceph in that host
> 3) ceph-objectstore-tool --pgid 1.98 --op mark-complete --data-path
> /var/lib/ceph/osd/ceph-9 --journal-path /var/lib/ceph/osd/ceph-9/journal
> 4) start ceph
> 5) look finally it reconstructing
>
> Il giorno mer 29 giu 2016 alle ore 11:11 Oliver Dzombic <
> i...@ip-interactive.de> ha scritto:
>
>> Hi,
>>
>> removing ONE disk while your replication is 2, is no problem.
>>
>> You dont need to wait a single second to replace of remove it. Its
>> anyway not used and out/down. So from ceph's point of view its not
>> existent.
>>
>> 
>>
>> But as christian told you already, what we see now fits to a szenario
>> where you lost the osd and eighter you did something, or something else
>> happens, but the data were not recovered again.
>>
>> Eighter because another OSD was broken, or because you did something.
>>
>> Maybe, because of the "too many PGs per OSD (307 > max 300)" ceph never
>> recovered.
>>
>> What i can see from http://pastebin.com/VZD7j2vN is that
>>
>> OSD 5,13,9,0,6,2,3 and maybe others, are the OSD's holding the
>> incomplete data.
>>
>> This are 7 OSD's from 10. So something happend to that OSD's or the data
>> in them. And that had nothing to do with a single disk failing.
>>
>> Something else must have been happend.
>>
>> And as christian already wrote: you will have to go through your logs
>> back until the point were things going down.
>>
>> Because a fail of a single OSD, no matter what your replication size is,
>> can ( normally ) not harm the consistency of 7 other OSD's, means 70% of
>> your total cluster.
>>
>> --
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:i...@ip-interactive.de
>>
>> Anschrift:
>>
>> IP Interactive UG ( haftungsbeschraenkt )
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402 beim Amtsgericht Hanau
>> Geschäftsführung: Oliver Dzombic
>>
>> Steuer Nr.: 35 236 3622 1
>> UST ID: DE274086107
>>
>>
>> Am 29.06.2016 um 10:56 schrieb Mario Giammarco:
>> > Yes I have removed it from crush because it was broken. I have waited 24
>> > hours to see if cephs would like to heals itself. Then I removed the
>> > disk completely (it was broken...) and I waited 24 hours again. Then I
>> > start getting worried.
>> > Are you saying to me that I should not remove a broken disk from
>> > cluster? 24 hours were not enough?
>> >
>> > Il giorno mer 29 giu 2016 alle ore 10:53 Zoltan Arnold Nagy
>> > > ha
>> scritto:
>> >
>> > Just loosing one disk doesn’t automagically delete it from CRUSH,
>> > but in the output you had 10 disks listed, so there must be
>> > something else going - did you delete the disk from the crush map as
>> > well?
>> >
>> > Ceph waits by default 300 secs AFAIK to mark an OSD out after it
>> > will start to recover.
>> >
>> >
>> >> On 29 Jun 2016, at 10:42, Mario Giammarco > >> > wrote:
>> >>
>> >> I thank you for your reply so I can add my experience:
>> >>
>> >> 1) the other time this thing happened to me I had a cluster with
>> >> min_size=2 and size=3 and the problem was the same. That time I
>> >> put min_size=1 to recover the pool but it did not help. So I do
>> >> not understand where is the advantage to put three copies when
>> >> ceph can decide to discard all three.
>> >> 2) I started with 11 hdds. The hard disk failed. Ceph waited
>> >> forever for hard disk coming back. But hard disk is really
>> >> completelly broken so I have followed the procedure to really
>> >> delete from cluster. Anyway ceph did not recover.
>> >> 3) I have 307 pgs more than 300 but it is due to the fact that I
>> >> had 11 hdds now only 10. I will add more hdds after I repair the
>> pool
>> >> 4) I have reduced the monitors to 3
>> >>
>> >>
>> >>
>> >> Il giorno mer 29 giu 2016 alle ore 10:25 Christian Balzer
>> >> > ha scritto:
>> >>
>> >>
>> >> Hello,
>> >>
>> >> On Wed, 29 Jun 2016 06:02:59 + Mario Giammarco wrote:
>> >>
>> >> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
>> >> object_hash
>> 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
In fact I am worried because:

1) ceph is under proxmox, and proxmox may decide to reboot a server if it
is not responding
2) probably a server was rebooted while ceph was reconstructing
3) even using max=3 does not help

Anyway, this is the "unofficial" procedure that I am using, much simpler
than the blog post:

1) find the host where the pg is (see the sketch after this list)
2) stop ceph on that host
3) ceph-objectstore-tool --pgid 1.98 --op mark-complete --data-path
/var/lib/ceph/osd/ceph-9 --journal-path /var/lib/ceph/osd/ceph-9/journal
4) start ceph
5) finally watch it reconstructing
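For step 1, a sketch of how to locate the pg (using pg 1.98 and osd.9 from
the example above):

ceph health detail | grep incomplete   # incomplete pgs and their acting OSDs
ceph pg map 1.98                       # up/acting OSD ids for that pg
ceph osd find 9                        # host and location of osd.9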

On Wed, 29 Jun 2016 at 11:11, Oliver Dzombic <i...@ip-interactive.de> wrote:

> Hi,
>
> removing ONE disk while your replication is 2, is no problem.
>
> You dont need to wait a single second to replace of remove it. Its
> anyway not used and out/down. So from ceph's point of view its not
> existent.
>
> 
>
> But as christian told you already, what we see now fits to a szenario
> where you lost the osd and eighter you did something, or something else
> happens, but the data were not recovered again.
>
> Eighter because another OSD was broken, or because you did something.
>
> Maybe, because of the "too many PGs per OSD (307 > max 300)" ceph never
> recovered.
>
> What i can see from http://pastebin.com/VZD7j2vN is that
>
> OSD 5,13,9,0,6,2,3 and maybe others, are the OSD's holding the
> incomplete data.
>
> This are 7 OSD's from 10. So something happend to that OSD's or the data
> in them. And that had nothing to do with a single disk failing.
>
> Something else must have been happend.
>
> And as christian already wrote: you will have to go through your logs
> back until the point were things going down.
>
> Because a fail of a single OSD, no matter what your replication size is,
> can ( normally ) not harm the consistency of 7 other OSD's, means 70% of
> your total cluster.
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 29.06.2016 um 10:56 schrieb Mario Giammarco:
> > Yes I have removed it from crush because it was broken. I have waited 24
> > hours to see if cephs would like to heals itself. Then I removed the
> > disk completely (it was broken...) and I waited 24 hours again. Then I
> > start getting worried.
> > Are you saying to me that I should not remove a broken disk from
> > cluster? 24 hours were not enough?
> >
> > Il giorno mer 29 giu 2016 alle ore 10:53 Zoltan Arnold Nagy
> > > ha
> scritto:
> >
> > Just loosing one disk doesn’t automagically delete it from CRUSH,
> > but in the output you had 10 disks listed, so there must be
> > something else going - did you delete the disk from the crush map as
> > well?
> >
> > Ceph waits by default 300 secs AFAIK to mark an OSD out after it
> > will start to recover.
> >
> >
> >> On 29 Jun 2016, at 10:42, Mario Giammarco  >> > wrote:
> >>
> >> I thank you for your reply so I can add my experience:
> >>
> >> 1) the other time this thing happened to me I had a cluster with
> >> min_size=2 and size=3 and the problem was the same. That time I
> >> put min_size=1 to recover the pool but it did not help. So I do
> >> not understand where is the advantage to put three copies when
> >> ceph can decide to discard all three.
> >> 2) I started with 11 hdds. The hard disk failed. Ceph waited
> >> forever for hard disk coming back. But hard disk is really
> >> completelly broken so I have followed the procedure to really
> >> delete from cluster. Anyway ceph did not recover.
> >> 3) I have 307 pgs more than 300 but it is due to the fact that I
> >> had 11 hdds now only 10. I will add more hdds after I repair the
> pool
> >> 4) I have reduced the monitors to 3
> >>
> >>
> >>
> >> Il giorno mer 29 giu 2016 alle ore 10:25 Christian Balzer
> >> > ha scritto:
> >>
> >>
> >> Hello,
> >>
> >> On Wed, 29 Jun 2016 06:02:59 + Mario Giammarco wrote:
> >>
> >> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
> >> object_hash
> >>^
> >> And that's the root cause of all your woes.
> >> The default replication size is 3 for a reason and while I do
> >> run pools
> >> with replication of 2 they are either HDD RAIDs or extremely
> >> trustworthy
> >> and well monitored SSD.
> >>
> >> That said, something more than a single HDD failure must have
> >> happened
> >> here, you should check the logs and backtrace all the 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Oliver Dzombic
Hi,

removing ONE disk while your replication is 2 is no problem.

You don't need to wait a single second to replace or remove it. It's
anyway not used and out/down, so from ceph's point of view it doesn't exist.



But as Christian told you already, what we see now fits a scenario
where you lost the OSD and either you did something, or something else
happened, but the data were not recovered again.

Either because another OSD was broken, or because you did something.

Maybe, because of the "too many PGs per OSD (307 > max 300)" warning, ceph
never recovered.
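(That 307 figure is consistent with the pool layout posted further down the
thread - a quick back-of-the-envelope check:

3 pools x 512 pgs x 2 replicas = 3072 pg copies
3072 pg copies / 10 OSDs       = ~307 pgs per OSD)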

What I can see from http://pastebin.com/VZD7j2vN is that

OSDs 5, 13, 9, 0, 6, 2, 3, and maybe others, are the OSDs holding the
incomplete data.

These are 7 OSDs out of 10. So something happened to those OSDs or the data
on them, and that had nothing to do with a single disk failing.

Something else must have happened.

And as Christian already wrote: you will have to go back through your logs
until the point where things went down.

Because the failure of a single OSD, no matter what your replication size is,
can ( normally ) not harm the consistency of 7 other OSDs, i.e. 70% of
your total cluster.

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 29.06.2016 at 10:56, Mario Giammarco wrote:
> Yes I have removed it from crush because it was broken. I have waited 24
> hours to see if cephs would like to heals itself. Then I removed the
> disk completely (it was broken...) and I waited 24 hours again. Then I
> start getting worried.
> Are you saying to me that I should not remove a broken disk from
> cluster? 24 hours were not enough?
> 
> Il giorno mer 29 giu 2016 alle ore 10:53 Zoltan Arnold Nagy
> > ha scritto:
> 
> Just loosing one disk doesn’t automagically delete it from CRUSH,
> but in the output you had 10 disks listed, so there must be
> something else going - did you delete the disk from the crush map as
> well?
> 
> Ceph waits by default 300 secs AFAIK to mark an OSD out after it
> will start to recover.
> 
> 
>> On 29 Jun 2016, at 10:42, Mario Giammarco > > wrote:
>>
>> I thank you for your reply so I can add my experience:
>>
>> 1) the other time this thing happened to me I had a cluster with
>> min_size=2 and size=3 and the problem was the same. That time I
>> put min_size=1 to recover the pool but it did not help. So I do
>> not understand where is the advantage to put three copies when
>> ceph can decide to discard all three.
>> 2) I started with 11 hdds. The hard disk failed. Ceph waited
>> forever for hard disk coming back. But hard disk is really
>> completelly broken so I have followed the procedure to really
>> delete from cluster. Anyway ceph did not recover.
>> 3) I have 307 pgs more than 300 but it is due to the fact that I
>> had 11 hdds now only 10. I will add more hdds after I repair the pool
>> 4) I have reduced the monitors to 3
>>
>>
>>
>> Il giorno mer 29 giu 2016 alle ore 10:25 Christian Balzer
>> > ha scritto:
>>
>>
>> Hello,
>>
>> On Wed, 29 Jun 2016 06:02:59 + Mario Giammarco wrote:
>>
>> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash
>>^
>> And that's the root cause of all your woes.
>> The default replication size is 3 for a reason and while I do
>> run pools
>> with replication of 2 they are either HDD RAIDs or extremely
>> trustworthy
>> and well monitored SSD.
>>
>> That said, something more than a single HDD failure must have
>> happened
>> here, you should check the logs and backtrace all the step you
>> did after
>> that OSD failed.
>>
>> You said there were 11 HDDs and your first ceph -s output showed:
>> ---
>>  osdmap e10182: 10 osds: 10 up, 10 in
>> 
>> And your crush map states the same.
>>
>> So how and WHEN did you remove that OSD?
>> My suspicion would be it was removed before recovery was complete.
>>
>> Also, as I think was mentioned before, 7 mons are overkill 3-5
>> would be a
>> saner number.
>>
>> Christian
>>
>> > rjenkins pg_num 512 pgp_num 512 last_change 9313 flags
>> hashpspool
>> > stripe_width 0
>> >removed_snaps [1~3]
>> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0
>> object_hash
>> > rjenkins pg_num 512 pgp_num 512 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
Yes, I removed it from crush because it was broken. I waited 24 hours to
see if ceph would heal itself. Then I removed the disk completely (it was
broken...) and I waited 24 hours again. Then I started getting worried.
Are you saying that I should not remove a broken disk from the cluster?
Were 24 hours not enough?

On Wed, 29 Jun 2016 at 10:53, Zoltan Arnold Nagy <zol...@linux.vnet.ibm.com> wrote:

> Just loosing one disk doesn’t automagically delete it from CRUSH, but in
> the output you had 10 disks listed, so there must be something else going -
> did you delete the disk from the crush map as well?
>
> Ceph waits by default 300 secs AFAIK to mark an OSD out after it will
> start to recover.
>
>
> On 29 Jun 2016, at 10:42, Mario Giammarco  wrote:
>
> I thank you for your reply so I can add my experience:
>
> 1) the other time this thing happened to me I had a cluster with
> min_size=2 and size=3 and the problem was the same. That time I put
> min_size=1 to recover the pool but it did not help. So I do not understand
> where is the advantage to put three copies when ceph can decide to discard
> all three.
> 2) I started with 11 hdds. The hard disk failed. Ceph waited forever for
> hard disk coming back. But hard disk is really completelly broken so I have
> followed the procedure to really delete from cluster. Anyway ceph did not
> recover.
> 3) I have 307 pgs more than 300 but it is due to the fact that I had 11
> hdds now only 10. I will add more hdds after I repair the pool
> 4) I have reduced the monitors to 3
>
>
>
On Wed, 29 Jun 2016 at 10:25, Christian Balzer wrote:
>
>>
>> Hello,
>>
>> On Wed, 29 Jun 2016 06:02:59 + Mario Giammarco wrote:
>>
>> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>>^
>> And that's the root cause of all your woes.
>> The default replication size is 3 for a reason and while I do run pools
>> with replication of 2 they are either HDD RAIDs or extremely trustworthy
>> and well monitored SSD.
>>
>> That said, something more than a single HDD failure must have happened
>> here, you should check the logs and backtrace all the step you did after
>> that OSD failed.
>>
>> You said there were 11 HDDs and your first ceph -s output showed:
>> ---
>>  osdmap e10182: 10 osds: 10 up, 10 in
>> 
>> And your crush map states the same.
>>
>> So how and WHEN did you remove that OSD?
>> My suspicion would be it was removed before recovery was complete.
>>
>> Also, as I think was mentioned before, 7 mons are overkill 3-5 would be a
>> saner number.
>>
>> Christian
>>
>> > rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
>> > stripe_width 0
>> >removed_snaps [1~3]
>> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> > rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
>> > stripe_width 0
>> >removed_snaps [1~3]
>> > pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>> > rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
>> > stripe_width 0
>> >removed_snaps [1~3]
>> >
>> >
>> > ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
>> > 5 1.81000  1.0  1857G  984G  872G 53.00 0.86
>> > 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
>> > 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
>> > 3 1.35999  1.0  1391G  906G  485G 65.12 1.06
>> > 4 0.8  1.0   926G  702G  223G 75.88 1.23
>> > 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
>> > 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
>> > 9 0.8  1.0   926G  573G  352G 61.91 1.01
>> > 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
>> > 13 0.45000  1.0   460G  307G  153G 66.74 1.08
>> >  TOTAL 14846G 9136G 5710G 61.54
>> > MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
>> >
>> >
>> >
>> > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>> >
>> > http://pastebin.com/SvGfcSHb
>> > http://pastebin.com/gYFatsNS
>> > http://pastebin.com/VZD7j2vN
>> >
>> > I do not understand why I/O on ENTIRE cluster is blocked when only few
>> > pgs are incomplete.
>> >
>> > Many thanks,
>> > Mario
>> >
>> >
>> > Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe - Profihost AG <
>> > s.pri...@profihost.ag> ha scritto:
>> >
>> > > And ceph health detail
>> > >
>> > > Stefan
>> > >
>> > > Excuse my typo sent from my mobile phone.
>> > >
>> > > Am 28.06.2016 um 19:28 schrieb Oliver Dzombic > >:
>> > >
>> > > Hi Mario,
>> > >
>> > > please give some more details:
>> > >
>> > > Please the output of:
>> > >
>> > > ceph osd pool ls detail
>> > > ceph osd df
>> > > ceph --version
>> > >
>> > > ceph -w for 10 seconds ( use http://pastebin.com/ please )
>> > >
>> > > ceph osd crush dump ( also pastebin pls )
>> > >
>> > > --
>> > > Mit freundlichen Gruessen / Best regards
>> > >
>> > > Oliver Dzombic
>> > > IP-Interactive

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Zoltan Arnold Nagy
Just losing one disk doesn't automagically delete it from CRUSH, but in the
output you had 10 disks listed, so there must be something else going on - did
you delete the disk from the crush map as well?

Ceph waits by default 300 secs AFAIK to mark an OSD out, after which it will
start to recover.
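(The setting involved is mon_osd_down_out_interval; a sketch of how to check
it on a monitor node, assuming the admin socket is in its default location
and the mon id is the short hostname:

ceph daemon mon.$(hostname -s) config get mon_osd_down_out_interval)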


> On 29 Jun 2016, at 10:42, Mario Giammarco  wrote:
> 
> I thank you for your reply so I can add my experience:
> 
> 1) the other time this thing happened to me I had a cluster with min_size=2 
> and size=3 and the problem was the same. That time I put min_size=1 to 
> recover the pool but it did not help. So I do not understand where is the 
> advantage to put three copies when ceph can decide to discard all three.
> 2) I started with 11 hdds. The hard disk failed. Ceph waited forever for hard 
> disk coming back. But hard disk is really completelly broken so I have 
> followed the procedure to really delete from cluster. Anyway ceph did not 
> recover.
> 3) I have 307 pgs more than 300 but it is due to the fact that I had 11 hdds 
> now only 10. I will add more hdds after I repair the pool
> 4) I have reduced the monitors to 3
> 
> 
> 
> Il giorno mer 29 giu 2016 alle ore 10:25 Christian Balzer  > ha scritto:
> 
> Hello,
> 
> On Wed, 29 Jun 2016 06:02:59 + Mario Giammarco wrote:
> 
> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>^
> And that's the root cause of all your woes.
> The default replication size is 3 for a reason and while I do run pools
> with replication of 2 they are either HDD RAIDs or extremely trustworthy
> and well monitored SSD.
> 
> That said, something more than a single HDD failure must have happened
> here, you should check the logs and backtrace all the step you did after
> that OSD failed.
> 
> You said there were 11 HDDs and your first ceph -s output showed:
> ---
>  osdmap e10182: 10 osds: 10 up, 10 in
> 
> And your crush map states the same.
> 
> So how and WHEN did you remove that OSD?
> My suspicion would be it was removed before recovery was complete.
> 
> Also, as I think was mentioned before, 7 mons are overkill 3-5 would be a
> saner number.
> 
> Christian
> 
> > rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
> > stripe_width 0
> >removed_snaps [1~3]
> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> > rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
> > stripe_width 0
> >removed_snaps [1~3]
> > pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> > rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
> > stripe_width 0
> >removed_snaps [1~3]
> >
> >
> > ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
> > 5 1.81000  1.0  1857G  984G  872G 53.00 0.86
> > 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
> > 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
> > 3 1.35999  1.0  1391G  906G  485G 65.12 1.06
> > 4 0.8  1.0   926G  702G  223G 75.88 1.23
> > 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
> > 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
> > 9 0.8  1.0   926G  573G  352G 61.91 1.01
> > 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
> > 13 0.45000  1.0   460G  307G  153G 66.74 1.08
> >  TOTAL 14846G 9136G 5710G 61.54
> > MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
> >
> >
> >
> > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> >
> > http://pastebin.com/SvGfcSHb 
> > http://pastebin.com/gYFatsNS 
> > http://pastebin.com/VZD7j2vN 
> >
> > I do not understand why I/O on ENTIRE cluster is blocked when only few
> > pgs are incomplete.
> >
> > Many thanks,
> > Mario
> >
> >
> > Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe - Profihost AG <
> > s.pri...@profihost.ag > ha scritto:
> >
> > > And ceph health detail
> > >
> > > Stefan
> > >
> > > Excuse my typo sent from my mobile phone.
> > >
> > > Am 28.06.2016 um 19:28 schrieb Oliver Dzombic  > > >:
> > >
> > > Hi Mario,
> > >
> > > please give some more details:
> > >
> > > Please the output of:
> > >
> > > ceph osd pool ls detail
> > > ceph osd df
> > > ceph --version
> > >
> > > ceph -w for 10 seconds ( use http://pastebin.com/  
> > > please )
> > >
> > > ceph osd crush dump ( also pastebin pls )
> > >
> > > --
> > > Mit freundlichen Gruessen / Best regards
> > >
> > > Oliver Dzombic
> > > IP-Interactive
> > >
> > > mailto:i...@ip-interactive.de  
> > > >
> > >
> > > Anschrift:
> > >
> > > IP Interactive UG ( haftungsbeschraenkt )
> > > Zum Sonnenberg 1-3
> > > 63571 Gelnhausen
> > >
> > > HRB 93402 beim Amtsgericht Hanau
> > > 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
Thank you for your reply; let me add my experience:

1) the other time this thing happened to me I had a cluster with min_size=2
and size=3 and the problem was the same. That time I put min_size=1 to
recover the pool but it did not help, so I do not understand the advantage
of keeping three copies when ceph can decide to discard all three (see the
commands after this list).
2) I started with 11 hdds. One hard disk failed. Ceph waited forever for the
hard disk to come back, but the hard disk is really completely broken, so I
followed the procedure to actually delete it from the cluster. Anyway, ceph
did not recover.
3) I have 307 pgs per OSD, more than 300, but that is because I had 11 hdds
and now only 10. I will add more hdds after I repair the pool.
4) I have reduced the monitors to 3
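For reference, a sketch of how the size/min_size values in 1) can be checked
and changed (using the pool name 'rbd' from the output quoted below):

ceph osd pool get rbd size
ceph osd pool get rbd min_size
ceph osd pool set rbd min_size 1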



On Wed, 29 Jun 2016 at 10:25, Christian Balzer wrote:

>
> Hello,
>
> On Wed, 29 Jun 2016 06:02:59 + Mario Giammarco wrote:
>
> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
>^
> And that's the root cause of all your woes.
> The default replication size is 3 for a reason, and while I do run pools
> with replication of 2, they are either on HDD RAIDs or on extremely
> trustworthy and well-monitored SSDs.
>
> That said, something more than a single HDD failure must have happened
> here; you should check the logs and retrace all the steps you took after
> that OSD failed.
>
> You said there were 11 HDDs and your first ceph -s output showed:
> ---
>  osdmap e10182: 10 osds: 10 up, 10 in
> 
> And your crush map states the same.
>
> So how and WHEN did you remove that OSD?
> My suspicion would be it was removed before recovery was complete.
>
> Also, as I think was mentioned before, 7 mons are overkill; 3-5 would be a
> saner number.
>
> Christian
>
> > rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
> > stripe_width 0
> >removed_snaps [1~3]
> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> > rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
> > stripe_width 0
> >removed_snaps [1~3]
> > pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> > rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
> > stripe_width 0
> >removed_snaps [1~3]
> >
> >
> > ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
> > 5 1.81000  1.0  1857G  984G  872G 53.00 0.86
> > 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
> > 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
> > 3 1.35999  1.0  1391G  906G  485G 65.12 1.06
> > 4 0.8  1.0   926G  702G  223G 75.88 1.23
> > 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
> > 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
> > 9 0.8  1.0   926G  573G  352G 61.91 1.01
> > 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
> > 13 0.45000  1.0   460G  307G  153G 66.74 1.08
> >  TOTAL 14846G 9136G 5710G 61.54
> > MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
> >
> >
> >
> > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> >
> > http://pastebin.com/SvGfcSHb
> > http://pastebin.com/gYFatsNS
> > http://pastebin.com/VZD7j2vN
> >
> > I do not understand why I/O on ENTIRE cluster is blocked when only few
> > pgs are incomplete.
> >
> > Many thanks,
> > Mario
> >
> >
> > Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe - Profihost AG <
> > s.pri...@profihost.ag> ha scritto:
> >
> > > And ceph health detail
> > >
> > > Stefan
> > >
> > > Excuse my typo sent from my mobile phone.
> > >
> > > Am 28.06.2016 um 19:28 schrieb Oliver Dzombic :
> > >
> > > Hi Mario,
> > >
> > > please give some more details:
> > >
> > > Please the output of:
> > >
> > > ceph osd pool ls detail
> > > ceph osd df
> > > ceph --version
> > >
> > > ceph -w for 10 seconds ( use http://pastebin.com/ please )
> > >
> > > ceph osd crush dump ( also pastebin pls )
> > >
> > > --
> > > Mit freundlichen Gruessen / Best regards
> > >
> > > Oliver Dzombic
> > > IP-Interactive
> > >
> > > mailto:i...@ip-interactive.de 
> > >
> > > Anschrift:
> > >
> > > IP Interactive UG ( haftungsbeschraenkt )
> > > Zum Sonnenberg 1-3
> > > 63571 Gelnhausen
> > >
> > > HRB 93402 beim Amtsgericht Hanau
> > > Geschäftsführung: Oliver Dzombic
> > >
> > > Steuer Nr.: 35 236 3622 1
> > > UST ID: DE274086107
> > >
> > >
> > > Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
> > >
> > > Hello,
> > >
> > > this is the second time that happens to me, I hope that someone can
> > >
> > > explain what I can do.
> > >
> > > Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
> > >
> > >
> > > One hdd goes down due to bad sectors.
> > >
> > > Ceph recovers but it ends with:
> > >
> > >
> > > cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
> > >
> > > health HEALTH_WARN
> > >
> > >3 pgs down
> > >
> > >19 pgs incomplete
> > >
> > >19 pgs stuck inactive
> > >
> > >

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Christian Balzer

Hello,

On Wed, 29 Jun 2016 06:02:59 + Mario Giammarco wrote:

> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
   ^
And that's the root cause of all your woes.
The default replication size is 3 for a reason, and while I do run pools
with replication of 2, they are either on HDD RAIDs or on extremely
trustworthy and well-monitored SSDs.
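
For reference, raising the replication level is a per-pool change; a minimal
sketch for the 'rbd' pool shown above (the same applies to rbd2 and rbd3, and
note that going to size 3 triggers a full extra-copy backfill and needs the
free space for it):

ceph osd pool set rbd size 3       # keep three copies of every object
ceph osd pool set rbd min_size 2   # require at least two live copies before serving I/O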

That said, something more than a single HDD failure must have happened
here; you should check the logs and retrace all the steps you took after
that OSD failed.

You said there were 11 HDDs and your first ceph -s output showed:
---
 osdmap e10182: 10 osds: 10 up, 10 in

And your crush map states the same.

So how and WHEN did you remove that OSD?
My suspicion would be it was removed before recovery was complete.

Also, as I think was mentioned before, 7 mons are overkill; 3-5 would be a
saner number.
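
A sketch of trimming the monitor count down to three, using the mon ids from
the monmap quoted below (remove one at a time so quorum is never lost, and
remove the matching mon entries from ceph.conf as well):

ceph mon remove 6
ceph mon remove 5
ceph mon remove 4
ceph mon remove 3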

Christian

> rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
> stripe_width 0
>removed_snaps [1~3]
> pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
> stripe_width 0
>removed_snaps [1~3]
> pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
> stripe_width 0
>removed_snaps [1~3]
> 
> 
> ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
> 5 1.81000  1.0  1857G  984G  872G 53.00 0.86
> 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
> 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
> 3 1.35999  1.0  1391G  906G  485G 65.12 1.06
> 4 0.8  1.0   926G  702G  223G 75.88 1.23
> 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
> 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
> 9 0.8  1.0   926G  573G  352G 61.91 1.01
> 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
> 13 0.45000  1.0   460G  307G  153G 66.74 1.08
>  TOTAL 14846G 9136G 5710G 61.54
> MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
> 
> 
> 
> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> 
> http://pastebin.com/SvGfcSHb
> http://pastebin.com/gYFatsNS
> http://pastebin.com/VZD7j2vN
> 
> I do not understand why I/O on ENTIRE cluster is blocked when only few
> pgs are incomplete.
> 
> Many thanks,
> Mario
> 
> 
> Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe - Profihost AG <
> s.pri...@profihost.ag> ha scritto:
> 
> > And ceph health detail
> >
> > Stefan
> >
> > Excuse my typo sent from my mobile phone.
> >
> > Am 28.06.2016 um 19:28 schrieb Oliver Dzombic :
> >
> > Hi Mario,
> >
> > please give some more details:
> >
> > Please the output of:
> >
> > ceph osd pool ls detail
> > ceph osd df
> > ceph --version
> >
> > ceph -w for 10 seconds ( use http://pastebin.com/ please )
> >
> > ceph osd crush dump ( also pastebin pls )
> >
> > --
> > Mit freundlichen Gruessen / Best regards
> >
> > Oliver Dzombic
> > IP-Interactive
> >
> > mailto:i...@ip-interactive.de 
> >
> > Anschrift:
> >
> > IP Interactive UG ( haftungsbeschraenkt )
> > Zum Sonnenberg 1-3
> > 63571 Gelnhausen
> >
> > HRB 93402 beim Amtsgericht Hanau
> > Geschäftsführung: Oliver Dzombic
> >
> > Steuer Nr.: 35 236 3622 1
> > UST ID: DE274086107
> >
> >
> > Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
> >
> > Hello,
> >
> > this is the second time that happens to me, I hope that someone can
> >
> > explain what I can do.
> >
> > Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
> >
> >
> > One hdd goes down due to bad sectors.
> >
> > Ceph recovers but it ends with:
> >
> >
> > cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
> >
> > health HEALTH_WARN
> >
> >3 pgs down
> >
> >19 pgs incomplete
> >
> >19 pgs stuck inactive
> >
> >19 pgs stuck unclean
> >
> >7 requests are blocked > 32 sec
> >
> > monmap e11: 7 mons at
> >
> > {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
> >
> > 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
> >
> > 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
> >
> >election epoch 722, quorum
> >
> > 0,1,2,3,4,5,6 1,4,2,0,3,5,6
> >
> > osdmap e10182: 10 osds: 10 up, 10 in
> >
> >  pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
> >
> >9136 GB used, 5710 GB / 14846 GB avail
> >
> >1005 active+clean
> >
> >  16 incomplete
> >
> >   3 down+incomplete
> >
> >
> > Unfortunately "7 requests blocked" means no virtual machine can boot
> >
> > because ceph has stopped i/o.
> >
> >
> > I can accept to lose some data, but not ALL data!
> >
> > Can you help me please?
> >
> > Thanks,
> >
> > Mario
> >
> >
> > ___
> >
> > ceph-users mailing list
> >
> > ceph-users@lists.ceph.com
> >
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Oliver Dzombic
Hi Mario,

in my opinion you should

1. fix

 too many PGs per OSD (307 > max 300)

2. stop scrubbing / deep scrubbing
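
For point 2, a minimal sketch, assuming the usual noscrub / nodeep-scrub
flags (clear them again with "ceph osd unset ..." once the cluster has
settled):

ceph osd set noscrub          # no new regular scrubs are started
ceph osd set nodeep-scrub     # no new deep scrubs are started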

--

What does your current

ceph osd tree

look like?



-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 29.06.2016 um 09:50 schrieb Mario Giammarco:
> I have searched google and I see that there is no official procedure.
> 
> Il giorno mer 29 giu 2016 alle ore 09:43 Mario Giammarco
> > ha scritto:
> 
> I have read many times the post "incomplete pgs, oh my"
> I think my case is different. 
> The broken disk is completely broken.
> So how can I simply mark incomplete pgs as complete? 
> Should I stop ceph before?
> 
> 
> Il giorno mer 29 giu 2016 alle ore 09:36 Tomasz Kuzemko
> >
> ha scritto:
> 
> Hi,
> if you need fast access to your remaining data you can use
> ceph-objectstore-tool to mark those PGs as complete, however
> this will
> irreversibly lose the missing data.
> 
> If you understand the risks, this procedure is pretty good
> explained here:
> http://ceph.com/community/incomplete-pgs-oh-my/
> 
> Since this article was written, ceph-objectstore-tool gained a
> feature
> that was not available at that time, that is "--op mark-complete". I
> think it will be necessary in your case to call --op
> mark-complete after
> you import the PG to temporary OSD (between steps 12 and 13).
> 
> On 29.06.2016 09:09, Mario Giammarco wrote:
> > Now I have also discovered that, by mistake, someone has put
> production
> > data on a virtual machine of the cluster. I need that ceph
> starts I/O so
> > I can boot that virtual machine.
> > Can I mark the incomplete pgs as valid?
> > If needed, where can I buy some paid support?
> > Thanks again,
> > Mario
> >
> > Il giorno mer 29 giu 2016 alle ore 08:02 Mario Giammarco
> > 
> >> ha
> scritto:
> >
> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
> > object_hash rjenkins pg_num 512 pgp_num 512 last_change
> 9313 flags
> > hashpspool stripe_width 0
> >removed_snaps [1~3]
> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0
> > object_hash rjenkins pg_num 512 pgp_num 512 last_change
> 9314 flags
> > hashpspool stripe_width 0
> >removed_snaps [1~3]
> > pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0
> > object_hash rjenkins pg_num 512 pgp_num 512 last_change
> 10537 flags
> > hashpspool stripe_width 0
> >removed_snaps [1~3]
> >
> >
> > ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
> > 5 1.81000  1.0  1857G  984G  872G 53.00 0.86
> > 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
> > 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
> > 3 1.35999  1.0  1391G  906G  485G 65.12 1.06
> > 4 0.8  1.0   926G  702G  223G 75.88 1.23
> > 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
> > 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
> > 9 0.8  1.0   926G  573G  352G 61.91 1.01
> > 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
> > 13 0.45000  1.0   460G  307G  153G 66.74 1.08
> >  TOTAL 14846G 9136G 5710G 61.54
> > MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
> >
> >
> >
> > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> >
> > http://pastebin.com/SvGfcSHb
> > http://pastebin.com/gYFatsNS
> > http://pastebin.com/VZD7j2vN
> >
> > I do not understand why I/O on ENTIRE cluster is blocked
> when only
> > few pgs are incomplete.
> >
> > Many thanks,
> > Mario
> >
> >
> > Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe -
> Profihost
> > AG 
> >>
> ha scritto:
> >
> > And ceph health detail
> >
> > 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Tomasz Kuzemko
As far as I know there isn't, which is a shame. We have covered a
situation like this in our dev environment to be ready for it in
production, and it worked; however, be aware that the data that Ceph
believes is missing will be lost after you mark a PG complete.

In your situation I would find the OSD which has the most complete copy of
the incomplete PG by looking at the files in /var/lib/ceph/osd/*/current
(based on size or maybe mtime of the files) and export it using
ceph-objectstore-tool. After that you can follow the procedure described in
"incomplete pgs, oh my" with the addition of "--op mark-complete"
between steps 12 and 13.
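
A rough sketch of that first step, assuming filestore OSDs and a purely
hypothetical incomplete pg 0.2a whose best copy sits on osd.5 (adjust the pg
id, OSD id and paths to your cluster; the OSD must be stopped before running
ceph-objectstore-tool against it):

du -sh /var/lib/ceph/osd/ceph-*/current/0.2a_head   # run on each node; the largest copy is usually the most complete
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 \
  --journal-path /var/lib/ceph/osd/ceph-5/journal \
  --pgid 0.2a --op export --file /root/0.2a.export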

On 29.06.2016 09:50, Mario Giammarco wrote:
> I have searched google and I see that there is no official procedure.
> 
> Il giorno mer 29 giu 2016 alle ore 09:43 Mario Giammarco
> > ha scritto:
> 
> I have read many times the post "incomplete pgs, oh my"
> I think my case is different. 
> The broken disk is completely broken.
> So how can I simply mark incomplete pgs as complete? 
> Should I stop ceph before?
> 
> 
> Il giorno mer 29 giu 2016 alle ore 09:36 Tomasz Kuzemko
> >
> ha scritto:
> 
> Hi,
> if you need fast access to your remaining data you can use
> ceph-objectstore-tool to mark those PGs as complete, however
> this will
> irreversibly lose the missing data.
> 
> If you understand the risks, this procedure is pretty good
> explained here:
> http://ceph.com/community/incomplete-pgs-oh-my/
> 
> Since this article was written, ceph-objectstore-tool gained a
> feature
> that was not available at that time, that is "--op mark-complete". I
> think it will be necessary in your case to call --op
> mark-complete after
> you import the PG to temporary OSD (between steps 12 and 13).
> 
> On 29.06.2016 09:09, Mario Giammarco wrote:
> > Now I have also discovered that, by mistake, someone has put
> production
> > data on a virtual machine of the cluster. I need that ceph
> starts I/O so
> > I can boot that virtual machine.
> > Can I mark the incomplete pgs as valid?
> > If needed, where can I buy some paid support?
> > Thanks again,
> > Mario
> >
> > Il giorno mer 29 giu 2016 alle ore 08:02 Mario Giammarco
> > 
> >> ha
> scritto:
> >
> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
> > object_hash rjenkins pg_num 512 pgp_num 512 last_change
> 9313 flags
> > hashpspool stripe_width 0
> >removed_snaps [1~3]
> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0
> > object_hash rjenkins pg_num 512 pgp_num 512 last_change
> 9314 flags
> > hashpspool stripe_width 0
> >removed_snaps [1~3]
> > pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0
> > object_hash rjenkins pg_num 512 pgp_num 512 last_change
> 10537 flags
> > hashpspool stripe_width 0
> >removed_snaps [1~3]
> >
> >
> > ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
> > 5 1.81000  1.0  1857G  984G  872G 53.00 0.86
> > 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
> > 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
> > 3 1.35999  1.0  1391G  906G  485G 65.12 1.06
> > 4 0.8  1.0   926G  702G  223G 75.88 1.23
> > 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
> > 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
> > 9 0.8  1.0   926G  573G  352G 61.91 1.01
> > 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
> > 13 0.45000  1.0   460G  307G  153G 66.74 1.08
> >  TOTAL 14846G 9136G 5710G 61.54
> > MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
> >
> >
> >
> > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> >
> > http://pastebin.com/SvGfcSHb
> > http://pastebin.com/gYFatsNS
> > http://pastebin.com/VZD7j2vN
> >
> > I do not understand why I/O on ENTIRE cluster is blocked
> when only
> > few pgs are incomplete.
> >
> > Many thanks,
> > Mario
> >
> >
> > Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe -
> Profihost
> > AG 
> 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
I have searched google and I see that there is no official procedure.

Il giorno mer 29 giu 2016 alle ore 09:43 Mario Giammarco <
mgiamma...@gmail.com> ha scritto:

> I have read many times the post "incomplete pgs, oh my"
> I think my case is different.
> The broken disk is completely broken.
> So how can I simply mark incomplete pgs as complete?
> Should I stop ceph before?
>
>
> Il giorno mer 29 giu 2016 alle ore 09:36 Tomasz Kuzemko <
> tomasz.kuze...@corp.ovh.com> ha scritto:
>
>> Hi,
>> if you need fast access to your remaining data you can use
>> ceph-objectstore-tool to mark those PGs as complete, however this will
>> irreversibly lose the missing data.
>>
>> If you understand the risks, this procedure is pretty good explained here:
>> http://ceph.com/community/incomplete-pgs-oh-my/
>>
>> Since this article was written, ceph-objectstore-tool gained a feature
>> that was not available at that time, that is "--op mark-complete". I
>> think it will be necessary in your case to call --op mark-complete after
>> you import the PG to temporary OSD (between steps 12 and 13).
>>
>> On 29.06.2016 09:09, Mario Giammarco wrote:
>> > Now I have also discovered that, by mistake, someone has put production
>> > data on a virtual machine of the cluster. I need that ceph starts I/O so
>> > I can boot that virtual machine.
>> > Can I mark the incomplete pgs as valid?
>> > If needed, where can I buy some paid support?
>> > Thanks again,
>> > Mario
>> >
>> > Il giorno mer 29 giu 2016 alle ore 08:02 Mario Giammarco
>> > > ha scritto:
>> >
>> > pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
>> > object_hash rjenkins pg_num 512 pgp_num 512 last_change 9313 flags
>> > hashpspool stripe_width 0
>> >removed_snaps [1~3]
>> > pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0
>> > object_hash rjenkins pg_num 512 pgp_num 512 last_change 9314 flags
>> > hashpspool stripe_width 0
>> >removed_snaps [1~3]
>> > pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0
>> > object_hash rjenkins pg_num 512 pgp_num 512 last_change 10537 flags
>> > hashpspool stripe_width 0
>> >removed_snaps [1~3]
>> >
>> >
>> > ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
>> > 5 1.81000  1.0  1857G  984G  872G 53.00 0.86
>> > 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
>> > 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
>> > 3 1.35999  1.0  1391G  906G  485G 65.12 1.06
>> > 4 0.8  1.0   926G  702G  223G 75.88 1.23
>> > 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
>> > 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
>> > 9 0.8  1.0   926G  573G  352G 61.91 1.01
>> > 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
>> > 13 0.45000  1.0   460G  307G  153G 66.74 1.08
>> >  TOTAL 14846G 9136G 5710G 61.54
>> > MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
>> >
>> >
>> >
>> > ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>> >
>> > http://pastebin.com/SvGfcSHb
>> > http://pastebin.com/gYFatsNS
>> > http://pastebin.com/VZD7j2vN
>> >
>> > I do not understand why I/O on ENTIRE cluster is blocked when only
>> > few pgs are incomplete.
>> >
>> > Many thanks,
>> > Mario
>> >
>> >
>> > Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe - Profihost
>> > AG > ha
>> scritto:
>> >
>> > And ceph health detail
>> >
>> > Stefan
>> >
>> > Excuse my typo sent from my mobile phone.
>> >
>> > Am 28.06.2016 um 19:28 schrieb Oliver Dzombic :
>> >
>> >> Hi Mario,
>> >>
>> >> please give some more details:
>> >>
>> >> Please the output of:
>> >>
>> >> ceph osd pool ls detail
>> >> ceph osd df
>> >> ceph --version
>> >>
>> >> ceph -w for 10 seconds ( use http://pastebin.com/ please )
>> >>
>> >> ceph osd crush dump ( also pastebin pls )
>> >>
>> >> --
>> >> Mit freundlichen Gruessen / Best regards
>> >>
>> >> Oliver Dzombic
>> >> IP-Interactive
>> >>
>> >> mailto:i...@ip-interactive.de
>> >>
>> >> Anschrift:
>> >>
>> >> IP Interactive UG ( haftungsbeschraenkt )
>> >> Zum Sonnenberg 1-3
>> >> 63571 Gelnhausen
>> >>
>> >> HRB 93402 beim Amtsgericht Hanau
>> >> Geschäftsführung: Oliver Dzombic
>> >>
>> >> Steuer Nr.: 35 236 3622 1
>> >> UST ID: DE274086107
>> >>
>> >>
>> >> Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
>> >>> Hello,
>> >>> this is the second time that happens to me, I hope that
>> >>> someone can
>> >>> explain what I can do.
>> >>> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1,
>> size=2.
>> 

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Tomasz Kuzemko
Hi,
if you need fast access to your remaining data you can use
ceph-objectstore-tool to mark those PGs as complete; however, this will
irreversibly lose the missing data.

If you understand the risks, this procedure is explained pretty well here:
http://ceph.com/community/incomplete-pgs-oh-my/

Since this article was written, ceph-objectstore-tool has gained a feature
that was not available at that time: "--op mark-complete". I think it
will be necessary in your case to call --op mark-complete after you
import the PG into the temporary OSD (between steps 12 and 13).
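
As a sketch of that extra step, with everything hypothetical (a temporary
osd.20 as in the article's approach, and an export file produced earlier with
--op export; the temporary OSD must be stopped while the tool runs):

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
  --journal-path /var/lib/ceph/osd/ceph-20/journal \
  --op import --file /root/0.2a.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
  --journal-path /var/lib/ceph/osd/ceph-20/journal \
  --pgid 0.2a --op mark-complete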

On 29.06.2016 09:09, Mario Giammarco wrote:
> Now I have also discovered that, by mistake, someone has put production
> data on a virtual machine of the cluster. I need that ceph starts I/O so
> I can boot that virtual machine.
> Can I mark the incomplete pgs as valid?
> If needed, where can I buy some paid support?
> Thanks again,
> Mario
> 
> Il giorno mer 29 giu 2016 alle ore 08:02 Mario Giammarco
> > ha scritto:
> 
> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 512 pgp_num 512 last_change 9313 flags
> hashpspool stripe_width 0
>removed_snaps [1~3]
> pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 512 pgp_num 512 last_change 9314 flags
> hashpspool stripe_width 0
>removed_snaps [1~3]
> pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0
> object_hash rjenkins pg_num 512 pgp_num 512 last_change 10537 flags
> hashpspool stripe_width 0
>removed_snaps [1~3]
> 
> 
> ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR   
> 5 1.81000  1.0  1857G  984G  872G 53.00 0.86  
> 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05  
> 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01  
> 3 1.35999  1.0  1391G  906G  485G 65.12 1.06  
> 4 0.8  1.0   926G  702G  223G 75.88 1.23  
> 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93  
> 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88  
> 9 0.8  1.0   926G  573G  352G 61.91 1.01  
> 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07  
> 13 0.45000  1.0   460G  307G  153G 66.74 1.08  
>  TOTAL 14846G 9136G 5710G 61.54   
> MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
> 
> 
> 
> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
> 
> http://pastebin.com/SvGfcSHb
> http://pastebin.com/gYFatsNS
> http://pastebin.com/VZD7j2vN
> 
> I do not understand why I/O on ENTIRE cluster is blocked when only
> few pgs are incomplete.
> 
> Many thanks,
> Mario
> 
> 
> Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe - Profihost
> AG > ha scritto:
> 
> And ceph health detail
> 
> Stefan
> 
> Excuse my typo sent from my mobile phone.
> 
> Am 28.06.2016 um 19:28 schrieb Oliver Dzombic :
> 
>> Hi Mario,
>>
>> please give some more details:
>>
>> Please the output of:
>>
>> ceph osd pool ls detail
>> ceph osd df
>> ceph --version
>>
>> ceph -w for 10 seconds ( use http://pastebin.com/ please )
>>
>> ceph osd crush dump ( also pastebin pls )
>>
>> -- 
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:i...@ip-interactive.de
>>
>> Anschrift:
>>
>> IP Interactive UG ( haftungsbeschraenkt )
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402 beim Amtsgericht Hanau
>> Geschäftsführung: Oliver Dzombic
>>
>> Steuer Nr.: 35 236 3622 1
>> UST ID: DE274086107
>>
>>
>> Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
>>> Hello,
>>> this is the second time that happens to me, I hope that
>>> someone can
>>> explain what I can do.
>>> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
>>>
>>> One hdd goes down due to bad sectors.
>>> Ceph recovers but it ends with:
>>>
>>> cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>>> health HEALTH_WARN
>>>3 pgs down
>>>19 pgs incomplete
>>>19 pgs stuck inactive
>>>19 pgs stuck unclean
>>>7 requests are blocked > 32 sec
>>> monmap e11: 7 mons at
>>> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
>>> 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
>>> 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}

Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
Now I have also discovered that, by mistake, someone has put production
data on a virtual machine in the cluster. I need ceph to resume I/O so I
can boot that virtual machine.
Can I mark the incomplete pgs as valid?
If needed, where can I buy some paid support?
Thanks again,
Mario

Il giorno mer 29 giu 2016 alle ore 08:02 Mario Giammarco <
mgiamma...@gmail.com> ha scritto:

> pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
> stripe_width 0
>removed_snaps [1~3]
> pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
> stripe_width 0
>removed_snaps [1~3]
> pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
> rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
> stripe_width 0
>removed_snaps [1~3]
>
>
> ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
> 5 1.81000  1.0  1857G  984G  872G 53.00 0.86
> 6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
> 2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
> 3 1.35999  1.0  1391G  906G  485G 65.12 1.06
> 4 0.8  1.0   926G  702G  223G 75.88 1.23
> 7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
> 8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
> 9 0.8  1.0   926G  573G  352G 61.91 1.01
> 0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
> 13 0.45000  1.0   460G  307G  153G 66.74 1.08
>  TOTAL 14846G 9136G 5710G 61.54
> MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47
>
>
>
> ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>
> http://pastebin.com/SvGfcSHb
> http://pastebin.com/gYFatsNS
> http://pastebin.com/VZD7j2vN
>
> I do not understand why I/O on ENTIRE cluster is blocked when only few pgs
> are incomplete.
>
> Many thanks,
> Mario
>
>
> Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe - Profihost AG <
> s.pri...@profihost.ag> ha scritto:
>
>> And ceph health detail
>>
>> Stefan
>>
>> Excuse my typo sent from my mobile phone.
>>
>> Am 28.06.2016 um 19:28 schrieb Oliver Dzombic :
>>
>> Hi Mario,
>>
>> please give some more details:
>>
>> Please the output of:
>>
>> ceph osd pool ls detail
>> ceph osd df
>> ceph --version
>>
>> ceph -w for 10 seconds ( use http://pastebin.com/ please )
>>
>> ceph osd crush dump ( also pastebin pls )
>>
>> --
>> Mit freundlichen Gruessen / Best regards
>>
>> Oliver Dzombic
>> IP-Interactive
>>
>> mailto:i...@ip-interactive.de 
>>
>> Anschrift:
>>
>> IP Interactive UG ( haftungsbeschraenkt )
>> Zum Sonnenberg 1-3
>> 63571 Gelnhausen
>>
>> HRB 93402 beim Amtsgericht Hanau
>> Geschäftsführung: Oliver Dzombic
>>
>> Steuer Nr.: 35 236 3622 1
>> UST ID: DE274086107
>>
>>
>> Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
>>
>> Hello,
>>
>> this is the second time that happens to me, I hope that someone can
>>
>> explain what I can do.
>>
>> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
>>
>>
>> One hdd goes down due to bad sectors.
>>
>> Ceph recovers but it ends with:
>>
>>
>> cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>>
>> health HEALTH_WARN
>>
>>3 pgs down
>>
>>19 pgs incomplete
>>
>>19 pgs stuck inactive
>>
>>19 pgs stuck unclean
>>
>>7 requests are blocked > 32 sec
>>
>> monmap e11: 7 mons at
>>
>> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
>>
>> 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
>>
>> 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
>>
>>election epoch 722, quorum
>>
>> 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>>
>> osdmap e10182: 10 osds: 10 up, 10 in
>>
>>  pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
>>
>>9136 GB used, 5710 GB / 14846 GB avail
>>
>>1005 active+clean
>>
>>  16 incomplete
>>
>>   3 down+incomplete
>>
>>
>> Unfortunately "7 requests blocked" means no virtual machine can boot
>>
>> because ceph has stopped i/o.
>>
>>
>> I can accept to lose some data, but not ALL data!
>>
>> Can you help me please?
>>
>> Thanks,
>>
>> Mario
>>
>>
>> ___
>>
>> ceph-users mailing list
>>
>> ceph-users@lists.ceph.com
>>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-29 Thread Mario Giammarco
pool 0 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 9313 flags hashpspool
stripe_width 0
   removed_snaps [1~3]
pool 1 'rbd2' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 9314 flags hashpspool
stripe_width 0
   removed_snaps [1~3]
pool 2 'rbd3' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 10537 flags hashpspool
stripe_width 0
   removed_snaps [1~3]


ID WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR
5 1.81000  1.0  1857G  984G  872G 53.00 0.86
6 1.81000  1.0  1857G 1202G  655G 64.73 1.05
2 1.81000  1.0  1857G 1158G  698G 62.38 1.01
3 1.35999  1.0  1391G  906G  485G 65.12 1.06
4 0.8  1.0   926G  702G  223G 75.88 1.23
7 1.81000  1.0  1857G 1063G  793G 57.27 0.93
8 1.81000  1.0  1857G 1011G  846G 54.44 0.88
9 0.8  1.0   926G  573G  352G 61.91 1.01
0 1.81000  1.0  1857G 1227G  629G 66.10 1.07
13 0.45000  1.0   460G  307G  153G 66.74 1.08
 TOTAL 14846G 9136G 5710G 61.54
MIN/MAX VAR: 0.86/1.23  STDDEV: 6.47



ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)

http://pastebin.com/SvGfcSHb
http://pastebin.com/gYFatsNS
http://pastebin.com/VZD7j2vN

I do not understand why I/O on the ENTIRE cluster is blocked when only a few
pgs are incomplete.

Many thanks,
Mario


Il giorno mar 28 giu 2016 alle ore 19:34 Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> ha scritto:

> And ceph health detail
>
> Stefan
>
> Excuse my typo sent from my mobile phone.
>
> Am 28.06.2016 um 19:28 schrieb Oliver Dzombic :
>
> Hi Mario,
>
> please give some more details:
>
> Please the output of:
>
> ceph osd pool ls detail
> ceph osd df
> ceph --version
>
> ceph -w for 10 seconds ( use http://pastebin.com/ please )
>
> ceph osd crush dump ( also pastebin pls )
>
> --
> Mit freundlichen Gruessen / Best regards
>
> Oliver Dzombic
> IP-Interactive
>
> mailto:i...@ip-interactive.de 
>
> Anschrift:
>
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
>
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
>
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
>
>
> Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
>
> Hello,
>
> this is the second time that happens to me, I hope that someone can
>
> explain what I can do.
>
> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
>
>
> One hdd goes down due to bad sectors.
>
> Ceph recovers but it ends with:
>
>
> cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>
> health HEALTH_WARN
>
>3 pgs down
>
>19 pgs incomplete
>
>19 pgs stuck inactive
>
>19 pgs stuck unclean
>
>7 requests are blocked > 32 sec
>
> monmap e11: 7 mons at
>
> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
>
> 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
>
> 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
>
>election epoch 722, quorum
>
> 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>
> osdmap e10182: 10 osds: 10 up, 10 in
>
>  pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
>
>9136 GB used, 5710 GB / 14846 GB avail
>
>1005 active+clean
>
>  16 incomplete
>
>   3 down+incomplete
>
>
> Unfortunately "7 requests blocked" means no virtual machine can boot
>
> because ceph has stopped i/o.
>
>
> I can accept to lose some data, but not ALL data!
>
> Can you help me please?
>
> Thanks,
>
> Mario
>
>
> ___
>
> ceph-users mailing list
>
> ceph-users@lists.ceph.com
>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-28 Thread Stefan Priebe - Profihost AG
And ceph health detail

Stefan

Excuse my typo sent from my mobile phone.

> Am 28.06.2016 um 19:28 schrieb Oliver Dzombic :
> 
> Hi Mario,
> 
> please give some more details:
> 
> Please the output of:
> 
> ceph osd pool ls detail
> ceph osd df
> ceph --version
> 
> ceph -w for 10 seconds ( use http://pastebin.com/ please )
> 
> ceph osd crush dump ( also pastebin pls )
> 
> -- 
> Mit freundlichen Gruessen / Best regards
> 
> Oliver Dzombic
> IP-Interactive
> 
> mailto:i...@ip-interactive.de
> 
> Anschrift:
> 
> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen
> 
> HRB 93402 beim Amtsgericht Hanau
> Geschäftsführung: Oliver Dzombic
> 
> Steuer Nr.: 35 236 3622 1
> UST ID: DE274086107
> 
> 
>> Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
>> Hello,
>> this is the second time that happens to me, I hope that someone can 
>> explain what I can do.
>> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
>> 
>> One hdd goes down due to bad sectors. 
>> Ceph recovers but it ends with:
>> 
>> cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>> health HEALTH_WARN
>>3 pgs down
>>19 pgs incomplete
>>19 pgs stuck inactive
>>19 pgs stuck unclean
>>7 requests are blocked > 32 sec
>> monmap e11: 7 mons at
>> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
>> 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
>> 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
>>election epoch 722, quorum 
>> 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>> osdmap e10182: 10 osds: 10 up, 10 in
>>  pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
>>9136 GB used, 5710 GB / 14846 GB avail
>>1005 active+clean
>>  16 incomplete
>>   3 down+incomplete
>> 
>> Unfortunately "7 requests blocked" means no virtual machine can boot 
>> because ceph has stopped i/o.
>> 
>> I can accept to lose some data, but not ALL data!
>> Can you help me please?
>> Thanks,
>> Mario
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Another cluster completely hang

2016-06-28 Thread Oliver Dzombic
Hi Mario,

please give some more details:

Please the output of:

ceph osd pool ls detail
ceph osd df
ceph --version

ceph -w for 10 seconds ( use http://pastebin.com/ please )

ceph osd crush dump ( also pastebin pls )

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


Am 28.06.2016 um 18:59 schrieb Mario Giammarco:
> Hello,
> this is the second time that happens to me, I hope that someone can 
> explain what I can do.
> Proxmox ceph cluster with 8 servers, 11 hdd. Min_size=1, size=2.
> 
> One hdd goes down due to bad sectors. 
> Ceph recovers but it ends with:
> 
> cluster f2a8dd7d-949a-4a29-acab-11d4900249f4
>  health HEALTH_WARN
> 3 pgs down
> 19 pgs incomplete
> 19 pgs stuck inactive
> 19 pgs stuck unclean
> 7 requests are blocked > 32 sec
>  monmap e11: 7 mons at
> {0=192.168.0.204:6789/0,1=192.168.0.201:6789/0,
> 2=192.168.0.203:6789/0,3=192.168.0.205:6789/0,4=192.168.0.202:
> 6789/0,5=192.168.0.206:6789/0,6=192.168.0.207:6789/0}
> election epoch 722, quorum 
> 0,1,2,3,4,5,6 1,4,2,0,3,5,6
>  osdmap e10182: 10 osds: 10 up, 10 in
>   pgmap v3295880: 1024 pgs, 2 pools, 4563 GB data, 1143 kobjects
> 9136 GB used, 5710 GB / 14846 GB avail
> 1005 active+clean
>   16 incomplete
>3 down+incomplete
> 
> Unfortunately "7 requests blocked" means no virtual machine can boot 
> because ceph has stopped i/o.
> 
> I can accept to lose some data, but not ALL data!
> Can you help me please?
> Thanks,
> Mario
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com