Hi,
Something interesting: the osd with problems eats much more memory.
A normal osd uses about 300 MB, but this one eats as much as 30 GB.
Can I run any tests to help find where the problem is?
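One test I could try myself, assuming the osds are linked against tcmalloc, is the built-in heap profiler (osd.87 is used here only because it was the earlier suspect; the exact syntax may differ between versions):
  ceph tell osd.87 heap stats
  ceph tell osd.87 heap start_profiler
  ceph tell osd.87 heap dump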
--
Regards
Dominik
Hi,
I noticed that the problem is more frequent at night, when traffic is smaller.
Maybe it is caused by scrubbing (multiple scrubs running on one osd) combined
with a too-small "filestore op threads" value, or some other thread-count
setting in my config:
osd heartbeat grace = 15
filestore flush min = 0
I reported a bug: http://tracker.ceph.com/issues/5504
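For reference, the kind of [osd] section I have in mind - the added values are only illustrative guesses on my side, not tested recommendations:
  [osd]
      osd heartbeat grace = 15
      filestore flush min = 0
      filestore op threads = 4    # default is 2; purely an illustrative bump
      osd max scrubs = 1          # limit concurrent scrubs per osd (1 is the default)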
--
Regards
Dominik
Hi,
Some osd.87 performance graphs:
https://www.dropbox.com/s/o07wae2041hu06l/osd_87_performance.PNG
After 11.05 I restarted it.
The mons... maybe they are the problem.
--
Regards
Dominik
Hi,
I got it.
ceph health detail
HEALTH_WARN 3 pgs peering; 3 pgs stuck inactive; 5 pgs stuck unclean;
recovery 64/38277874 degraded (0.000%)
pg 5.df9 is stuck inactive for 138669.746512, current state peering,
last acting [87,2,151]
pg 5.a82 is stuck inactive for 138638.121867, current state pee
Hi Dominik,
What about performance on osd.87 at this moment - do you have any
related measurements?
As for my version of this issue, it seems that the quorum degrades
over time: when I restarted the mons, the problem went away and peering
time dropped by a factor of ten or so. Also see
That's not a loop as it looks, sorry - I have reproduced the issue many
times and there is no such cpu-eating behavior in most cases; only
locked pgs are present. Also, I may celebrate the return of the 'wrong
down mark' bug, at least for the 0.61.4 tag. For the first one, I'll send
a link with a core as quick a
I have only the 'ceph health detail' output from the previous crash.
ceph health detail
HEALTH_WARN 6 pgs peering; 9 pgs stuck unclean
pg 3.c62 is stuck unclean for 583.220063, current state active, last
acting [57,23,51]
pg 4.269 is stuck unclean for 4842.519837, current state peering, last
acting [23,57,106]
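Next time it happens, I believe the stuck pgs can be re-listed directly with something like this (output format may vary between versions):
  ceph pg dump_stuck unclean
  ceph pg dump_stuck inactive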
> Ver. 0.56.6
> Hmm, the osd did not die; 1 or more pgs are stuck peering on it.
Can you get a pgid from 'ceph health detail' and then do 'ceph pg
query' and attach that output?
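Concretely that would be something along these lines, using one of the pgids from the health output earlier in the thread (5.df9) purely as an example:
  ceph health detail
  ceph pg 5.df9 query > pg_5.df9_query.txt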
Thanks!
sage
Ver. 0.56.6
Hmm, the osd did not die; 1 or more pgs are stuck peering on it.
Regards
Dominik
Today I have the peering problem not when I put osd.71 out, but during normal Ceph operation.
Regards
Dominik
There is almost the same problem with the 0.61 cluster, at least with the same
symptoms. It can be reproduced quite easily - remove an osd and then
mark it as out, and with quite high probability one of its neighbors will
get stuck at the end of the peering process, with a couple of peering pgs
whose primary copy is on it.
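A minimal sketch of the out-marking step in command form (the osd id is only an example; 'ceph -w' is just to watch whether a neighbor's pgs get stuck in peering):
  ceph osd out 71
  ceph -w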
Hi,
We took osd.71 out and now the problem is on osd.57.
Something curious: op_rw on osd.57 is much higher than on the others.
See here: https://www.dropbox.com/s/o5q0xi9wbvpwyiz/op_rw_osd57.PNG
In the data directory on this osd I found:
> data/osd.57/current# du -sh omap/
> 2.3G    omap/
Again, a much higher op_rw on this one osd.
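To compare the raw op_rw counters directly, the osd admin socket should work, something like the following (default socket path assumed, and the data/osd.N layout from above for the omap check):
  ceph --admin-daemon /var/run/ceph/ceph-osd.57.asok perf dump | python -m json.tool | grep op_rw
  du -sh data/osd.*/current/omap/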
On Thu, Jun 13, 2013 at 6:33 AM, Sławomir Skowron wrote:
> Hi, sorry for the late response.
>
> https://docs.google.com/file/d/0B9xDdJXMieKEdHFRYnBfT3lCYm8/view
>
> Logs in attachment, and on google drive, from today.
>
> https://docs.google.com/file/d/0B9xDdJXMieKEQzVNVHJ1RXFXZlU/view
>
> We have suc
We don't have your logs (vger doesn't forward them). Can you describe
the situation more completely in terms of what failures occurred and
what steps you took?
(Also, this should go on ceph-users. Adding that to the recipients list.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com