
Re: [ceph-users] [Help: pool not responding] Now osd crash

2016-03-08 Thread Mario Giammarco

h::buffer::list*)+0x4ab) [0x7c616b]
11: (OSD::load_pgs()+0xa20) [0x6a9170]
12: (OSD::init()+0xc84) [0x6ac204]
13: (main()+0x2839) [0x632459]
14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
15: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
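Read from the bottom up, the frames that survive in this trace give the call chain main() -> OSD::init() -> OSD::load_pgs(). That suggests the OSD crashes while loading its PGs during startup. The innermost frames were cut off, so the exact failing call is not visible here. The Python sketch below pulls the numbered frames out of such a trace. Its frame pattern is inferred from these lines only, and it skips frames that have no resolved symbol (like frame 15).

```python
import re
from typing import List, NamedTuple

# Surviving tail of the ceph-osd backtrace quoted above (earlier frames were cut off).
BACKTRACE = """\
11: (OSD::load_pgs()+0xa20) [0x6a9170]
12: (OSD::init()+0xc84) [0x6ac204]
13: (main()+0x2839) [0x632459]
14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
15: /usr/bin/ceph-osd() [0x64c087]
"""


class Frame(NamedTuple):
    index: int    # frame number as printed
    symbol: str   # function name, e.g. "OSD::init()"
    offset: str   # offset into that function
    address: str  # instruction address


# Matches "N: (symbol+0xOFFSET) [0xADDRESS]"; lines without that shape are ignored.
FRAME_RE = re.compile(r"^\s*(\d+): \((.+)\+(0x[0-9a-f]+)\) \[(0x[0-9a-f]+)\]\s*$")


def parse_frames(text: str) -> List[Frame]:
    """Return the frames whose symbol and offset could be parsed, in printed order."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append(Frame(int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return frames


if __name__ == "__main__":
    # Print the outermost caller first, ending with OSD::load_pgs().
    for frame in reversed(parse_frames(BACKTRACE)):
        print(f"#{frame.index} {frame.symbol} +{frame.offset}")
```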


2016-03-02 9:38 GMT+01:00 Mario Giammarco <mgiamma...@gmail.com>:

> Here it is:
>
>     cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>      health HEALTH_WARN
>             4 pgs incomplete
>             4 pgs stuck inactive
>             4 pgs stuck unclean
>             1 requests are blocked > 32 sec
>      monmap e8: 3 mons at {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>             election epoch 840, quorum 0,1,2 0,1,2
>      osdmap e2405: 3 osds: 3 up, 3 in
>       pgmap v5904430: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>             1090 GB used, 4481 GB / 5571 GB avail
>                  284 active+clean
>                    4 incomplete
>   client io 4008 B/s rd, 446 kB/s wr, 23 op/s
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Help: pool not responding

2016-03-05 Thread Mario Giammarco

I have tried every way to recover the pool (including marking OSDs out, scrubbing, etc.).
If there is no way to reset those four PGs, or to understand why they are not
repairing themselves, I will destroy the pool.
But destroying an entire pool just to unblock 4 incomplete PGs is
unbelievable.

Mario


Re: [ceph-users] Help: pool not responding

2016-03-04 Thread Mario Giammarco

I have restarted each host using init scripts. Is there another way?


Re: [ceph-users] Help: pool not responding

2016-03-03 Thread Dimitar Boichev

But did you restart the whole cluster, or what?

Regards.

Dimitar Boichev
SysAdmin Team Lead
AXSMarine Sofia
Phone: +359 889 22 55 42
Skype: dimitar.boichev.axsmarine
E-mail: dimitar.boic...@axsmarine.com

On Mar 3, 2016, at 22:47, Mario Giammarco <mgiamma...@gmail.com> wrote:

I used the init script to restart.

From: Dimitar Boichev
Sent: Thursday, March 3, 2016 21:44
To: Mario Giammarco
Cc: Oliver Dzombic; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Help: pool not responding


I see a lot of people (including myself) ending up with PGs stuck in the
“creating” state when they force-create them.

How did you restart Ceph?
Mine were created fine after I restarted the monitor nodes following a minor
version upgrade.
Did you restart the monitors first, the OSDs second, and so on?

Regards.


On Mar 3, 2016, at 13:13, Mario Giammarco <mgiamma...@gmail.com> wrote:

I have tried "force create". It says "creating", but in the end the problem persists.
I have restarted Ceph as usual.
I am evaluating Ceph and I am shocked, because it seemed a very robust
filesystem, and now, because of a glitch, I have an entire pool blocked and there is no
simple procedure to force a recovery.

2016-03-02 18:31 GMT+01:00 Oliver Dzombic <i...@ip-interactive.de>:
Hi,

I could not find any delete either, only a create.

I found this thread; it is basically your situation:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/032412.html


On 02.03.2016 at 18:28, Mario Giammarco wrote:
> Thanks for the info, even if it is bad news.
> Anyway, I am reading the docs again and I do not see a way to delete PGs.
> How can I remove them?
> Thanks,
> Mario
>
> 2016-03-02 17:59 GMT+01:00 Oliver Dzombic <i...@ip-interactive.de>:
>
> Hi,
>
> As I see your situation, these 4 PGs somehow got lost.
>
> They will not recover, because they are incomplete, so there is no data
> from which they could be recovered.
>
> So all that is left is to delete these PGs.
>
> Since all 3 OSDs are up and in, it does not seem like you can somehow
> access these lost PGs.

Re: [ceph-users] Help: pool not responding

2016-03-02 Thread Shinobu Kinjo

Is "ceph -s" still showing you the same output?

> cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>  health HEALTH_WARN
> 4 pgs incomplete
> 4 pgs stuck inactive
> 4 pgs stuck unclean
>  monmap e8: 3 mons at
> {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
> election epoch 832, quorum 0,1,2 0,1,2
>  osdmap e2400: 3 osds: 3 up, 3 in
>   pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
> 1090 GB used, 4481 GB / 5571 GB avail
>  284 active+clean
>4 incomplete

Cheers,
S

- Original Message -
From: "Mario Giammarco" <mgiamma...@gmail.com>
To: "Lionel Bouton" <lionel-subscript...@bouton.name>
Cc: "Shinobu Kinjo" <ski...@redhat.com>, ceph-users@lists.ceph.com
Sent: Wednesday, March 2, 2016 4:27:15 PM
Subject: Re: [ceph-users] Help: pool not responding

I tried setting min_size=1, but unfortunately nothing has changed.
Thanks for the idea.


Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Shinobu Kinjo

> Probably not (unless they reveal themselves extremely unreliable with
> Ceph OSD usage patterns which would be surprising to me).

Thank you for letting me know your thoughts.
That does make sense.

Cheers,


Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Nmz


In my free time I'm trying to understand how Ceph detects corrupted data.
You can look here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-February/007680.html

Can you try running md5sum on the stuck PGs' files on every OSD?
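Nmz's suggestion can be scripted roughly as follows. The script hashes every file under each replica's copy of a PG and reports files that are missing or that differ between OSDs. It is a sketch under assumptions. It assumes FileStore OSDs, where a PG's objects are plain files under a directory such as /var/lib/ceph/osd/ceph-0/current/0.3f_head; that path is an assumption, so check your own layout. It also assumes the three copies have been copied to (or mounted on) one machine. It compares file contents only, not xattrs or omap data.

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict, Optional


def md5_tree(root: str) -> Dict[str, str]:
    """Map every file under `root` (as a relative path) to the MD5 of its contents."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            h = hashlib.md5()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[str(path.relative_to(base))] = h.hexdigest()
    return digests


def compare_copies(copies: Dict[str, str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Given {osd_name: pg_directory}, report files that are missing or differ on some OSD.

    The result maps each such relative path to {osd_name: md5, or None if absent}.
    """
    trees = {osd: md5_tree(d) for osd, d in copies.items()}
    mismatches = {}
    for rel in sorted(set().union(*trees.values())):
        seen = {osd: tree.get(rel) for osd, tree in trees.items()}
        if len(set(seen.values())) > 1:
            mismatches[rel] = seen
    return mismatches


if __name__ == "__main__":
    # Usage (paths hypothetical): compare_pg.py osd.0=/copies/osd0/0.3f_head osd.1=... osd.3=...
    pairs = dict(arg.split("=", 1) for arg in sys.argv[1:])
    for rel, seen in compare_copies(pairs).items():
        print(rel, seen)
```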

> Oliver Dzombic  writes:


>> Hi,

>> i dont know, but as it seems to me:

>> incomplete = not enough data

>> the only solution would be to drop it ( delete )

>> so the cluster get in active healthy state.

>> How many copies do you do from each data ?



> Do you mean dropping the pg not working or the entire pool?

> It is a pool with replication=3 and I had alway at least two osd on.

> Is replication=3 not enough?



Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Lionel Bouton

On 29/02/2016 22:50, Shinobu Kinjo wrote:
>> the fact that they are optimized for benchmarks and certainly not
>> Ceph OSD usage patterns (with or without internal journal).
> Are you assuming that SSHD is causing the issue?
> If you could elaborate on this more, it would be helpful.

Probably not (unless they reveal themselves extremely unreliable with
Ceph OSD usage patterns which would be surprising to me).

For incomplete PGs, the documentation seems clear enough about what should be
done:
http://docs.ceph.com/docs/master/rados/operations/pg-states/

The relevant text:

/Incomplete/
Ceph detects that a placement group is missing information about
writes that may have occurred, or does not have any healthy copies.
If you see this state, try to start any failed OSDs that may contain
the needed information or temporarily adjust min_size to allow recovery.

We don't have the full history, but the most probable cause of these
incomplete PGs is that min_size is set to 2 or 3 and at some point the 4
incomplete PGs didn't have as many replicas as the min_size value. So if
setting min_size to 2 isn't enough, setting it to 1 should unfreeze them.
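Lionel's hypothesis can be stated as a toy model: a PG serves I/O only while at least min_size replicas are live. The sketch below checks a hypothetical history of live-replica counts against several min_size values. It deliberately ignores peering history, which is what actually marks a PG incomplete. So it illustrates the hypothesis, not Ceph's real logic. As Mario reports elsewhere in this thread, lowering min_size to 1 did not help in his case. The setting itself is changed per pool with `ceph osd pool set <pool> min_size <n>`.

```python
from typing import List


def can_serve_io(live_replicas: int, min_size: int) -> bool:
    """Simplified rule: a replicated PG accepts I/O only with at least min_size live replicas."""
    return live_replicas >= min_size


def blocked_intervals(live_per_interval: List[int], min_size: int) -> List[int]:
    """Indices of the intervals in which the PG could not have served I/O."""
    return [i for i, live in enumerate(live_per_interval) if not can_serve_io(live, min_size)]


if __name__ == "__main__":
    # Hypothetical history of a size=3 PG on 3 OSDs where one OSD was down at times.
    history = [3, 2, 3, 2, 3]
    for min_size in (3, 2, 1):
        print(f"min_size={min_size}: blocked in intervals {blocked_intervals(history, min_size)}")
```

With this history, min_size=3 blocks intervals 1 and 3, while min_size=2 and min_size=1 block nothing. That matches Mario's point that at least two OSDs were always up.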

Lionel

Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Shinobu Kinjo

> the fact that they are optimized for benchmarks and certainly not
> Ceph OSD usage patterns (with or without internal journal).

Are you assuming that SSHD is causing the issue?
If you could elaborate on this more, it would be helpful.

Cheers,
Shinobu

- Original Message -
From: "Lionel Bouton" <lionel-subscript...@bouton.name>
To: "Mario Giammarco" <mgiamma...@gmail.com>, ceph-users@lists.ceph.com
Sent: Tuesday, March 1, 2016 5:29:38 AM
Subject: Re: [ceph-users] Help: pool not responding

On 29/02/2016 20:43, Mario Giammarco wrote:
> [...]
> I said SSHD, which is a standard HDD with an SSD cache. It spins at 7200 rpm,
> but in benchmarks it is better than a 1rpm disk.

Lies, damn lies and benchmarks...
SSHDs usually have very small flash caches (16GB or less for 500GB of
data or more), and AFAIK no distribution supports cache hints (and, to be
of any use here, Ceph OSDs would need cache hint support too): the drive
decides on its own when to use the cache, and you can trust only one
thing: that they are optimized for benchmarks and certainly not for
Ceph OSD usage patterns (with or without an internal journal).

There are probably 2 kinds of optimizations that SSHDs can perform:
- buffering random writes with a writeback cache algorithm targeting
random writes. With only 8 to 16GB of flash, this would probably mean
that under heavy random-write usage (typical for an OSD) the flash would
wear out very fast, which would kill the entire drive or lose data, so
it's unlikely that this is what they use.
- writing the most used data (what is loaded first on system boot and what
is used most) to the flash cache to speed up the OS boot sequence and
access to the most used applications or data. As OSDs don't have any
recognizable access pattern, this is useless in most cases.

So SSHDs for OSDs are almost certainly useless. You are better off saving
money by buying more ordinary SATA HDDs, or the same number of HDDs plus a
few good SSDs for journals if you can afford them.

In fact, if the SSHD tries to cache writes and doesn't die early in the
process, you may get even worse performance than a pure HDD setup, because
most consumer-level SSDs (and probably SSHDs) are absolute crap for the
type of access Ceph OSDs do with journals (see
http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
for the horror stories).
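As we understand the linked post, it judges a journal device by issuing small direct, synchronous writes (with `dd` and `oflag=direct,dsync`) and looking at the sustained rate. The Python sketch below only approximates that idea and is not the post's procedure. It writes 4 KiB blocks to a scratch file with O_DSYNC, and falls back to one fsync() per write where O_DSYNC is unavailable. Because it writes to a file, it measures the filesystem as well as the drive. It skips O_DIRECT because that needs aligned buffers, and it never touches a raw device.

```python
import os
import tempfile
import time

BLOCK_SIZE = 4096  # 4 KiB, a typical small synchronous journal write


def sync_write_rate(path: str, count: int = 200) -> float:
    """Write `count` 4 KiB blocks to `path`, each synchronously, and return writes per second.

    With O_DSYNC every write() returns only once the data is on stable storage,
    which is the pattern an OSD journal imposes. Where O_DSYNC is unavailable,
    an explicit fsync() after each write is used instead.
    """
    dsync = getattr(os, "O_DSYNC", 0)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | dsync, 0o600)
    block = b"\0" * BLOCK_SIZE
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            if not dsync:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Scratch file in a temporary directory; pass dir=... to place it on the drive under test.
    with tempfile.TemporaryDirectory() as workdir:
        rate = sync_write_rate(os.path.join(workdir, "sync-test.bin"))
        print(f"{rate:.0f} synchronous 4 KiB writes/s")
```

Comparing the printed rate for a candidate SSHD or SSD against a plain HDD on the same machine gives a rough idea of whether the flash actually helps with journal-style writes.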

Best regards,

Lionel

Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Mario Giammarco

Oliver Dzombic  writes:

> 
> Hi,
> 
> i dont know, but as it seems to me:
> 
> incomplete = not enough data
> 
> the only solution would be to drop it ( delete )
> 
> so the cluster get in active healthy state.
> 
> How many copies do you do from each data ?
> 

Do you mean dropping the non-working PGs, or the entire pool?

It is a pool with replication=3, and I always had at least two OSDs up.

Is replication=3 not enough?


Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Oliver Dzombic

Hi,

I don't know, but as it seems to me:

incomplete = not enough data

The only solution would be to drop it (delete it), so that the cluster
gets back to an active, healthy state.

How many copies do you keep of each piece of data?

-- 
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:i...@ip-interactive.de

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107



Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Mario Giammarco

Mario Giammarco  writes:

Sorry, ceph health detail is:


HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean
pg 0.0 is stuck inactive for 4836623.776873, current state incomplete, last acting [0,1,3]
pg 0.40 is stuck inactive for 2773379.028048, current state incomplete, last acting [1,0,3]
pg 0.3f is stuck inactive for 4836763.332907, current state incomplete, last acting [0,3,1]
pg 0.3b is stuck inactive for 4836777.230337, current state incomplete, last acting [0,3,1]
pg 0.0 is stuck unclean for 4850437.633464, current state incomplete, last acting [0,1,3]
pg 0.40 is stuck unclean for 4850437.633467, current state incomplete, last acting [1,0,3]
pg 0.3f is stuck unclean for 4850456.399217, current state incomplete, last acting [0,3,1]
pg 0.3b is stuck unclean for 4850490.534154, current state incomplete, last acting [0,3,1]
pg 0.40 is incomplete, acting [1,0,3]
pg 0.3f is incomplete, acting [0,3,1]
pg 0.3b is incomplete, acting [0,3,1]
pg 0.0 is incomplete, acting [0,1,3]
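The stuck PGs in this output can be tabulated with a short Python sketch. It assumes the "stuck ... for N" figure is in seconds. Read that way, pgs 0.0, 0.3b and 0.3f have been inactive for roughly 56 days, and pg 0.40 for about 32 days, which may help date the event that caused the problem. The sketch also shows that every acting set is exactly OSDs 0, 1 and 3, all of which are up and in. So the documentation's advice to start any failed OSDs has no stopped OSD to act on among the acting sets.

```python
import re
from typing import Dict, List

# The "stuck inactive" lines of the `ceph health detail` output posted above.
HEALTH_DETAIL = """\
pg 0.0 is stuck inactive for 4836623.776873, current state incomplete, last acting [0,1,3]
pg 0.40 is stuck inactive for 2773379.028048, current state incomplete, last acting [1,0,3]
pg 0.3f is stuck inactive for 4836763.332907, current state incomplete, last acting [0,3,1]
pg 0.3b is stuck inactive for 4836777.230337, current state incomplete, last acting [0,3,1]
"""

STUCK_RE = re.compile(
    r"pg (?P<pgid>\S+) is stuck (?P<kind>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>[\w+]+), last acting \[(?P<acting>[\d,]+)\]"
)


def parse_stuck(text: str) -> List[Dict]:
    """Extract PG id, stuck kind, stuck duration in days, state and (sorted) acting set."""
    rows = []
    for m in STUCK_RE.finditer(text):
        rows.append({
            "pgid": m["pgid"],
            "kind": m["kind"],
            "days": float(m["secs"]) / 86400,  # assumes the figure is in seconds
            "state": m["state"],
            "acting": sorted(int(x) for x in m["acting"].split(",")),
        })
    return rows


if __name__ == "__main__":
    for row in parse_stuck(HEALTH_DETAIL):
        print(f"pg {row['pgid']}: {row['state']} for ~{row['days']:.0f} days, acting OSDs {row['acting']}")
```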




Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Mario Giammarco

Thank you for your time.
Dimitar Boichev  writes:

> 
> I am sure that I speak for the majority of people reading this when I say
> that I didn't get anything from your emails.
> Could you provide more debug information?
> Like (but not limited to):
> ceph -s
> ceph health detail
> ceph osd tree

I did in fact ask what I need to provide, because honestly I do not know.

Here is ceph -s:

    cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
     health HEALTH_WARN
            4 pgs incomplete
            4 pgs stuck inactive
            4 pgs stuck unclean
     monmap e8: 3 mons at {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
            election epoch 832, quorum 0,1,2 0,1,2
     osdmap e2400: 3 osds: 3 up, 3 in
      pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
            1090 GB used, 4481 GB / 5571 GB avail
                 284 active+clean
                   4 incomplete

ceph health detail:

cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
 health HEALTH_WARN
4 pgs incomplete
4 pgs stuck inactive
4 pgs stuck unclean
 monmap e8: 3 mons at
{0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
election epoch 832, quorum 0,1,2 0,1,2
 osdmap e2400: 3 osds: 3 up, 3 in
  pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
1090 GB used, 4481 GB / 5571 GB avail
 284 active+clean
   4 incomplete

ceph osd tree:

ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 5.42999 root default 
-2 1.81000 host proxmox-quad3   
 0 1.81000 osd.0   up  1.0  1.0 
-3 1.81000 host proxmox-zotac   
 1 1.81000 osd.1   up  1.0  1.0 
-4 1.81000 host proxmox-hp  
 3 1.81000 osd.3   up  1.0  1.0 
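Grouping this tree by host shows one OSD per host and no osd.2, which is consistent with Mario having removed the failed disk's OSD. Mario's pool has replication=3. Assuming it uses the default CRUSH rule, which places each replica on a different host, every PG must keep one copy on each of osd.0, osd.1 and osd.3, which is what the acting sets in `ceph health detail` show. While any one host is down, every PG runs on two copies. The parser below is a sketch written against this exact column layout.

```python
from typing import Dict, List, Tuple

# `ceph osd tree` output as posted above.
OSD_TREE = """\
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.42999 root default
-2 1.81000 host proxmox-quad3
 0 1.81000 osd.0   up  1.0  1.0
-3 1.81000 host proxmox-zotac
 1 1.81000 osd.1   up  1.0  1.0
-4 1.81000 host proxmox-hp
 3 1.81000 osd.3   up  1.0  1.0
"""


def osds_by_host(tree: str) -> Dict[str, List[Tuple[str, str, float]]]:
    """Group (osd name, up/down status, CRUSH weight) under the host bucket that precedes them."""
    hosts: Dict[str, List[Tuple[str, str, float]]] = {}
    current = None
    for line in tree.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        if fields[2] == "host":
            current = fields[3]
            hosts[current] = []
        elif fields[2].startswith("osd.") and current is not None:
            # OSD rows have no TYPE column: ID WEIGHT NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
            hosts[current].append((fields[2], fields[3], float(fields[1])))
    return hosts


if __name__ == "__main__":
    layout = osds_by_host(OSD_TREE)
    for host, osds in layout.items():
        print(host, osds)
    total = sum(weight for osds in layout.values() for _, _, weight in osds)
    print(f"total OSD weight: {total:.2f}")  # 5.43, matching the root's 5.42999
```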


> 
> I am really having a bad time trying to decode the exact problems.
> First you had network issues, then an OSD failed (at the same time or after?),
> then the cluster did not have enough free space to recover, I suppose?
> 
It is a three server/osd test/evaluation system with Ceph and Proxmox PVE.
The load is very light and there is a lot of free space.

So:

- I NEVER had network issues. People TOLD me that I must have network
problems. I changed cables and switches just in case, but nothing improved.
- One disk had bad sectors, so I added another disk/OSD and then removed the
old OSD, following the official documentation. After that the cluster ran fine
for two months, so there was enough free space and the cluster had recovered.
- Then one day I discovered that the Proxmox backup had hung, and I saw that
this was because Ceph was not responding.


> Regarding the slow SSD disks, what disks are you using ?

I said SSHD, which is a standard HDD with an SSD cache. It spins at 7200 rpm,
but in benchmarks it is better than a 1rpm disk.

Thanks again,
Mario



Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Dimitar Boichev

I am sure that I speak for the majority of people reading this, when I say that 
I didn't get anything from your emails.
Could you provide more debug information?
Like (but not limited to):
ceph -s
ceph health detail
ceph osd tree
...

I am really having a bad time trying to decode the exact problems.
First you had network issues, then an OSD failed (at the same time or after?),
then the cluster did not have enough free space to recover, I suppose?

Regarding the slow SSD disks, what disks are you using?
Most issues with SSD disks arise because people use consumer-grade disks
that are not optimized for the load that Ceph produces.

Regards.

Dimitar Boichev
SysAdmin Team Lead
AXSMarine Sofia
Phone: +359 889 22 55 42
Skype: dimitar.boichev.axsmarine
E-mail: dimitar.boic...@axsmarine.com

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Shinobu Kinjo
Sent: Monday, February 29, 2016 1:32 PM
To: Mario Giammarco
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Help: pool not responding

> What can I do now? How can I debug?

I would also like to know a more specific procedure for fixing the issue in this
situation.

Cheers,
Shinobu

- Original Message -
From: "Mario Giammarco" <mgiamma...@gmail.com>
To: ceph-users@lists.ceph.com
Sent: Monday, February 29, 2016 6:39:16 PM
Subject: Re: [ceph-users] Help: pool not responding

Ferhat Ozkasgarli <ozkasgarli@...> writes:

> 1-) One of the OSD nodes has a network problem.
> 2-) Disk failure
> 3-) Not enough resources for OSD nodes
> 4-) Slow OSD disks

I have replaced cables and switches. I am sure that there are no network
problems. The disks are SSHDs, so they are fast. The nodes have plenty of free
memory. I have a simple cluster with three nodes just to experiment. One
brand-new disk failed some time ago, so I added a new OSD and removed the old
one using the official procedure in the documentation.

What can I do now? How can I debug?

Thanks again,
Mario


Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Shinobu Kinjo

> What can I do now? How can I debug?

I would also like to know a more specific procedure for fixing the issue in this
situation.

Cheers,
Shinobu

- Original Message -
From: "Mario Giammarco" <mgiamma...@gmail.com>
To: ceph-users@lists.ceph.com
Sent: Monday, February 29, 2016 6:39:16 PM
Subject: Re: [ceph-users] Help: pool not responding

Ferhat Ozkasgarli <ozkasgarli@...> writes:

> 1-) One of the OSD nodes has a network problem.
> 2-) Disk failure
> 3-) Not enough resources for OSD nodes
> 4-) Slow OSD disks

I have replaced cables and switches. I am sure that there are no network
problems. The disks are SSHDs, so they are fast. The nodes have plenty of free
memory. I have a simple cluster with three nodes just to experiment. One
brand-new disk failed some time ago, so I added a new OSD and removed the old
one using the official procedure in the documentation.

What can I do now? How can I debug?

Thanks again,
Mario


Re: [ceph-users] Help: pool not responding

2016-02-29 Thread Mario Giammarco

Ferhat Ozkasgarli  writes:

> 1-) One of the OSD nodes has a network problem.
> 2-) Disk failure
> 3-) Not enough resources for OSD nodes
> 4-) Slow OSD disks

I have replaced cables and switches. I am sure that there are no network
problems. The disks are SSHDs, so they are fast. The nodes have plenty of free
memory. I have a simple cluster with three nodes just to experiment. One
brand-new disk failed some time ago, so I added a new OSD and removed the old
one using the official procedure in the documentation.

What can I do now? How can I debug?

Thanks again,
Mario


Re: [ceph-users] Help: pool not responding

2016-02-16 Thread Mario Giammarco

Mark Nelson  writes:

> PGs are pool-specific, so the other pool may be totally healthy while
> the first is not.  If it turns out it's a hardware problem, it's also
> possible that the second pool may not hit all of the same OSDs as the first
> pool, especially if it has a low PG count.
> 

Just to be clear: I have a cluster with three servers and three OSDs. The
replica count is three, so every pool necessarily touches all three OSDs.

How can I tell Ceph to discard those PGs?

Thanks again for help,
Mario
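
Mark's point (PGs are pool-specific, and a pool with few PGs may not land on every OSD) and Mario's reply (with three OSDs and three replicas, every PG lands on all of them) can be checked with a small sketch. It treats each acting set as a plain list of OSD ids and does not model CRUSH. The acting sets of the incomplete PGs are copied from the health report in this thread; the six-OSD pool is a hypothetical example, and `osds_touched` is an illustrative helper, not a Ceph command.

```python
from itertools import combinations


def osds_touched(acting_sets):
    """Return the set of OSD ids that appear in any of the given acting sets."""
    touched = set()
    for acting in acting_sets:
        touched.update(acting)
    return touched


# Acting sets of the four incomplete PGs (0.0, 0.40, 0.3f, 0.3b) from `ceph health detail`.
rbd_incomplete = [[0, 1, 3], [1, 0, 3], [0, 3, 1], [0, 3, 1]]
print(sorted(osds_touched(rbd_incomplete)))  # [0, 1, 3]

# With 3 OSDs and size=3, an acting set needs 3 distinct OSDs, and the only
# way to pick 3 distinct OSDs out of 3 is to pick all of them.
print(list(combinations([0, 1, 3], 3)))  # [(0, 1, 3)]

# Hypothetical larger cluster (6 OSDs) with a pool of only 2 PGs of size 3:
# here a pool can easily miss some OSDs entirely.
small_pool = [[0, 1, 2], [0, 1, 4]]
missed = set(range(6)) - osds_touched(small_pool)
print(sorted(missed))  # [3, 5]
```

So, assuming the second pool also uses three replicas, both pools place their PGs on all three OSDs, and the difference between them lies in the state of the individual PGs. The `0.` prefix shared by all four incomplete PG ids is the pool id, which is consistent with only one pool being affected.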


Re: [ceph-users] Help: pool not responding

2016-02-15 Thread Mark Nelson


On 02/15/2016 07:34 AM, Mario Giammarco wrote:

> Karan Singh  writes:
>
>> Agreed to Ferhat.
>>
>> Recheck your network (bonds, interfaces, network switches, even cables)
>
> I use Gigabit Ethernet, and I am checking the network.
> But I am using another pool on the same cluster and it works perfectly: why?


PGs are pool-specific, so the other pool may be totally healthy while
the first is not.  If it turns out it's a hardware problem, it's also
possible that the second pool may not hit all of the same OSDs as the first
pool, especially if it has a low PG count.


Mark



> Thanks again,
> Mario


Re: [ceph-users] Help: pool not responding

2016-02-15 Thread Mario Giammarco

Karan Singh  writes:

> Agreed to Ferhat.
> 
> Recheck your network (bonds, interfaces, network switches, even cables)

I use Gigabit Ethernet, and I am checking the network.
But I am using another pool on the same cluster and it works perfectly: why?

Thanks again,
Mario


Re: [ceph-users] Help: pool not responding

2016-02-15 Thread Karan Singh

Hey Mario

Agreed to Ferhat.

Recheck your network (bonds, interfaces, network switches, even cables). I
have seen this several times before, and in most cases it was because of the
network.
BTW, are you using Mellanox?

- Karan -

> On 15 Feb 2016, at 10:12, Mario Giammarco  wrote:
> 
> koukou73gr  writes:
> 
>> 
>> Have you tried restarting osd.0?
>> 
> Yes, I have restarted all OSDs many times.
> I also launched repair and scrub.

Re: [ceph-users] Help: pool not responding

2016-02-15 Thread Mario Giammarco

koukou73gr  writes:

> 
> Have you tried restarting osd.0?
> 
Yes, I have restarted all OSDs many times.
I also launched repair and scrub.


Re: [ceph-users] Help: pool not responding

2016-02-14 Thread koukou73gr

Have you tried restarting osd.0?

-K.

On 02/14/2016 09:56 PM, Mario Giammarco wrote:
> Hello,
> I am using Ceph Hammer under Proxmox.
> I have a working cluster that I have been using for several months.
> For reasons I have yet to discover, I am now in this situation:
> 
> HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean; 7 requests are blocked > 32 sec; 1 osds have slow requests
> pg 0.0 is stuck inactive for 3541712.92, current state incomplete, last acting [0,1,3]
> pg 0.40 is stuck inactive for 1478467.695684, current state incomplete, last acting [1,0,3]
> pg 0.3f is stuck inactive for 3541852.000546, current state incomplete, last acting [0,3,1]
> pg 0.3b is stuck inactive for 3541865.897979, current state incomplete, last acting [0,3,1]
> pg 0.0 is stuck unclean for 326.301120, current state incomplete, last acting [0,1,3]
> pg 0.40 is stuck unclean for 326.301128, current state incomplete, last acting [1,0,3]
> pg 0.3f is stuck unclean for 345.066879, current state incomplete, last acting [0,3,1]
> pg 0.3b is stuck unclean for 379.201819, current state incomplete, last acting [0,3,1]
> pg 0.40 is incomplete, acting [1,0,3]
> pg 0.3f is incomplete, acting [0,3,1]
> pg 0.3b is incomplete, acting [0,3,1]
> pg 0.0 is incomplete, acting [0,1,3]
> 7 ops are blocked > 2097.15 sec
> 7 ops are blocked > 2097.15 sec on osd.0
> 1 osds have slow requests
> 
> 
> The problem is that when I try to read from or write to the pool "rbd" (where I have all
> my virtual machines), Ceph starts to log "slow reads" and the system hangs.
> If I create another pool in the same cluster and create an image inside it,
> I can read and write correctly (and fast), so the cluster seems to be
> working and only this pool is not.
> 
> Can you help me?
> Thanks,
> Mario

Re: [ceph-users] Help: pool not responding

2016-02-14 Thread Ferhat Ozkasgarli

Hello Mario,

This kind of problem usually happens for the following reasons:

1-) One of the OSD nodes has a network problem.
2-) Disk failure
3-) Not enough resources for OSD nodes
4-) Slow OSD disks

This has happened to me before. The problem was a bad network cable. As soon as
I replaced the cable, everything was fine and dandy.

On Sun, Feb 14, 2016 at 9:56 PM, Mario Giammarco wrote:

> Hello,
> I am using Ceph Hammer under Proxmox.
> I have a working cluster that I have been using for several months.
> For reasons I have yet to discover, I am now in this situation:
>
> HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean; 7 requests are blocked > 32 sec; 1 osds have slow requests
> pg 0.0 is stuck inactive for 3541712.92, current state incomplete, last acting [0,1,3]
> pg 0.40 is stuck inactive for 1478467.695684, current state incomplete, last acting [1,0,3]
> pg 0.3f is stuck inactive for 3541852.000546, current state incomplete, last acting [0,3,1]
> pg 0.3b is stuck inactive for 3541865.897979, current state incomplete, last acting [0,3,1]
> pg 0.0 is stuck unclean for 326.301120, current state incomplete, last acting [0,1,3]
> pg 0.40 is stuck unclean for 326.301128, current state incomplete, last acting [1,0,3]
> pg 0.3f is stuck unclean for 345.066879, current state incomplete, last acting [0,3,1]
> pg 0.3b is stuck unclean for 379.201819, current state incomplete, last acting [0,3,1]
> pg 0.40 is incomplete, acting [1,0,3]
> pg 0.3f is incomplete, acting [0,3,1]
> pg 0.3b is incomplete, acting [0,3,1]
> pg 0.0 is incomplete, acting [0,1,3]
> 7 ops are blocked > 2097.15 sec
> 7 ops are blocked > 2097.15 sec on osd.0
> 1 osds have slow requests
>
>
> The problem is that when I try to read from or write to the pool "rbd" (where I have all
> my virtual machines), Ceph starts to log "slow reads" and the system hangs.
> If I create another pool in the same cluster and create an image inside it,
> I can read and write correctly (and fast), so the cluster seems to be
> working and only this pool is not.
>
> Can you help me?
> Thanks,
> Mario

[ceph-users] Help: pool not responding

2016-02-14 Thread Mario Giammarco

Hello,
I am using Ceph Hammer under Proxmox.
I have a working cluster that I have been using for several months.
For reasons I have yet to discover, I am now in this situation:

HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck unclean; 7 requests are blocked > 32 sec; 1 osds have slow requests
pg 0.0 is stuck inactive for 3541712.92, current state incomplete, last acting [0,1,3]
pg 0.40 is stuck inactive for 1478467.695684, current state incomplete, last acting [1,0,3]
pg 0.3f is stuck inactive for 3541852.000546, current state incomplete, last acting [0,3,1]
pg 0.3b is stuck inactive for 3541865.897979, current state incomplete, last acting [0,3,1]
pg 0.0 is stuck unclean for 326.301120, current state incomplete, last acting [0,1,3]
pg 0.40 is stuck unclean for 326.301128, current state incomplete, last acting [1,0,3]
pg 0.3f is stuck unclean for 345.066879, current state incomplete, last acting [0,3,1]
pg 0.3b is stuck unclean for 379.201819, current state incomplete, last acting [0,3,1]
pg 0.40 is incomplete, acting [1,0,3]
pg 0.3f is incomplete, acting [0,3,1]
pg 0.3b is incomplete, acting [0,3,1]
pg 0.0 is incomplete, acting [0,1,3]
7 ops are blocked > 2097.15 sec
7 ops are blocked > 2097.15 sec on osd.0
1 osds have slow requests


The problem is that when I try to read from or write to the pool "rbd" (where I have all
my virtual machines), Ceph starts to log "slow reads" and the system hangs.
If I create another pool in the same cluster and create an image inside it,
I can read and write correctly (and fast), so the cluster seems to be
working and only this pool is not.

Can you help me?
Thanks,
Mario
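
The `ceph health detail` output above is easier to reason about once each stuck PG is collected into a single record. The sketch below assumes the line format shown in this message (`pg <id> is stuck <kind> for <seconds>, current state <state>, last acting [<osds>]`); other Ceph releases may word these lines differently, and `parse_stuck_pgs` is a hypothetical helper, not part of any Ceph tool. Converting the durations also shows that pg 0.0 had been stuck inactive for about 41 days (3541712.92 s / 86400 s per day).

```python
import re
from collections import defaultdict

# One "stuck" record from `ceph health detail` (format as seen in this thread).
STUCK_RE = re.compile(
    r"pg (?P<pgid>\S+) is stuck (?P<kind>\w+) for (?P<secs>[\d.]+), "
    r"current state (?P<state>[\w+]+), last acting \[(?P<acting>[\d,]*)\]"
)


def parse_stuck_pgs(report):
    """Map each PG id to its acting set, state and stuck durations (seconds)."""
    pgs = defaultdict(lambda: {"acting": [], "state": "", "stuck": {}})
    # Mail clients wrap long lines, so collapse all whitespace before matching.
    text = " ".join(report.split())
    for m in STUCK_RE.finditer(text):
        pg = pgs[m["pgid"]]
        pg["acting"] = [int(x) for x in m["acting"].split(",") if x]
        pg["state"] = m["state"]
        pg["stuck"][m["kind"]] = float(m["secs"])
    return dict(pgs)


# Excerpt of the report, still wrapped the way the mail client sent it.
SAMPLE = """
pg 0.0 is stuck inactive for 3541712.92, current state incomplete, last
acting [0,1,3]
pg 0.40 is stuck inactive for 1478467.695684, current state incomplete,
last acting [1,0,3]
pg 0.0 is stuck unclean for 326.301120, current state incomplete, last
acting [0,1,3]
"""

for pgid, info in sorted(parse_stuck_pgs(SAMPLE).items()):
    days = info["stuck"].get("inactive", 0.0) / 86400
    print(f"pg {pgid}: {info['state']}, acting {info['acting']}, "
          f"inactive for ~{days:.1f} days")
```

Collapsing all whitespace first is what lets the pattern match records that the mail client split across two lines.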



