On Mon, 23 Nov 2015, Robert LeBlanc wrote:
> Thanks for the log dump command, I'll keep that in the back pocket, it
> would have been helpful in a few situations.
>
> I'm trying to microbenchmark the new Weighted Round Robin queue I've
> been working on and just trying to dump the info to the logs
Thanks for the log dump command, I'll keep that in the back pocket, it
would have been helpful in a few situations.
I'm trying to microbenchmark the new Weighted Round Robin queue I've
been working on and just trying to dump the info to the logs so that I
can see it at runtime. So this is in a bra
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
I saw posts about that in the mailing lists. According to SAR, there
wasn't an abnormal amount of page faults. We have swap disabled and
have min_free_kbytes set to 6GB, which has worked well for us so far.
We kicked around still setting swappiness to
On Mon, 23 Nov 2015, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> Is there a way through the admin socket or inject args that can tell
> the OSD process to dump the in memory logs without crashing? Do you
Yep, 'ceph daemon osd.NN log dump'.
> have an idea of the
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
Is there a way through the admin socket or inject args that can tell
the OSD process to dump the in memory logs without crashing? Do you
have an idea of the overhead? From the code it looks like it is always
evaluated, just depends on if it is stored
FWIW, if you've got collectl per-process logs, you might look for major
pagefaults associated with the osd processes. I've seen process
swapping cause heartbeat timeouts in the past. Not to say that's the
issue, but worth confirming it's not happening.
Mark
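
One way to spot-check major page faults when collectl logs are not at hand is to read the majflt counter from /proc/<pid>/stat for each OSD process. The sketch below is only an illustration and not something posted in this thread: the function names are hypothetical, and it assumes a Linux /proc filesystem and that the OSD processes are named ceph-osd. The counter is cumulative since process start, so sample it twice and compare the values.

```python
import glob


def parse_stat(stat_line):
    """Return (comm, majflt) from one /proc/<pid>/stat line."""
    # comm (field 2) is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    head, tail = stat_line.rsplit(")", 1)
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is field 3 (state); majflt is field 12, i.e. index 9.
    return comm, int(fields[9])


def osd_major_faults(proc_root="/proc", name="ceph-osd"):
    """Map pid -> cumulative major page faults for matching processes."""
    result = {}
    for path in glob.glob(f"{proc_root}/[0-9]*/stat"):
        try:
            with open(path) as f:
                comm, majflt = parse_stat(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited or the line was unreadable
        if comm == name:
            pid = int(path.split("/")[-2])
            result[pid] = majflt
    return result


if __name__ == "__main__":
    for pid, faults in sorted(osd_major_faults().items()):
        print(f"pid {pid}: {faults} major faults since start")
```

A count that keeps rising between two samples taken around a heartbeat timeout would be worth a closer look; a flat count suggests paging is not the cause.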
On 11/23/2015 01:03 PM, Robert Le
On Mon, 23 Nov 2015, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> We set the debugging to 0/0, but are you talking about lines like:
>
> -12> 2015-11-20 20:59:47.138746 7f70067de700 -1 osd.177 103793
> heartbeat_check: no reply from osd.133 since back 2015-11-2
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
We set the debugging to 0/0, but are you talking about lines like:
-12> 2015-11-20 20:59:47.138746 7f70067de700 -1 osd.177 103793
heartbeat_check: no reply from osd.133 since back 2015-11-20
20:57:32.413156 front 2015-11-20 20:57:32.413156 (cutof
On Mon, Nov 23, 2015 at 12:03 PM, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> This is one of our production clusters which is dual 40 Gb Ethernet
> using VLANs for cluster and public networks. I don't think this is
> unusual, not like my dev cluster which runs Inf
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
This is one of our production clusters which is dual 40 Gb Ethernet
using VLANs for cluster and public networks. I don't think this is
unusual, not like my dev cluster which runs Infiniband and IPoIB. The
client nodes are connected at 10 Gb Ethernet.
On Mon, Nov 23, 2015 at 11:27 AM, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> I checked the SAR data and the disks for all the OSDs showed usual
> performance until 20:57:32 when over the next few minutes the IOPS,
> bandwidth and latency all decreased. The only
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
I checked the SAR data and the disks for all the OSDs showed usual
performance until 20:57:32 when over the next few minutes the IOPS,
bandwidth and latency all decreased. The only thing that I can think
of is that some replies to the client got hun
On Mon, Nov 23, 2015 at 11:03 AM, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> The backtrace is:
>
> 2015-11-20 20:59:48.856679 7f7012ff7700 -1 common/HeartbeatMap.cc: In
> function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*,
> const char*, time_t)'
On Sat, Nov 21, 2015 at 1:34 AM, Robert LeBlanc wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> We had two interesting issues today. In both cases multiple OSDs
> suicided at the exact same moment. The first incident had four OSDs,
> the second had 12.
>
> First set:
> 145,159,79,17
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256
We had two interesting issues today. In both cases multiple OSDs
suicided at the exact same moment. The first incident had four OSDs,
the second had 12.
First set:
145,159,79,176
Second Set:
osd.177 down at 20:59:48,
osd.131, osd.136, osd.133, osd.