Success! Hopefully my notes from the process will help:
In the event of multiple disk failures, the cluster could lose PGs. Should this
occur, it is best to attempt to restart the OSD process and have the drive
marked as up+out. Marking the drive as out will cause data to flow off the
drive to
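A minimal sketch of that sequence (osd.12 is a placeholder id; the Upstart form
assumes the Ubuntu nodes described elsewhere in the thread):

    # bring the daemon back up so it can serve its PGs while data drains
    sudo start ceph-osd id=12        # or: sudo service ceph start osd.12
    # mark it out; CRUSH will begin migrating its data onto other OSDs
    ceph osd out 12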
On Apr 7, 2015, at 7:44 PM, Francois Lafont wrote:
Chris Kitzmiller wrote:
I graph aggregate stats for `ceph --admin-daemon
/var/run/ceph/ceph-osd.$osdid.asok perf dump`. If the max latency strays too far
outside of my mean latency, I know to go look for the troublemaker. My graphs look
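As a sketch, one number worth graphing out of that dump (counter names vary by
release; jq and the mean calculation are my additions, not from the original mail):

    # mean op latency in seconds is sum / avgcount
    ceph --admin-daemon /var/run/ceph/ceph-osd.$osdid.asok perf dump \
        | jq '.osd.op_latency'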
I'm not having much luck here. Is there a possibility that the imported PGs
aren't being picked up because the MONs think that they're older than the empty
PGs I find on the up OSDs?
I feel that I'm so close to *not* losing my RBD volume because I only have two
bad PGs and I've successfully
and size = 2.
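For context, imports like the ones referred to above are typically done with
ceph_objectstore_tool along these lines (the paths and pg id are placeholders,
and the OSDs must be stopped first):

    # export the pg from a drive that still holds a good copy...
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-12 \
        --journal-path /var/lib/ceph/osd/ceph-12/journal \
        --op export --pgid 2.5f --file /root/pg.2.5f.export
    # ...and import it into an OSD in the pg's acting set
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-7 \
        --journal-path /var/lib/ceph/osd/ceph-7/journal \
        --op import --file /root/pg.2.5f.export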
On Thu, Apr 2, 2015 at 10:20 PM, Chris Kitzmiller ca...@hampshire.edu wrote:
On Apr 3, 2015, at 12:37 AM, LOPEZ Jean-Charles jelo...@redhat.com wrote:
according to your ceph osd tree capture, although the OSD reweight is set
to 1, the OSD CRUSH weight is set to 0 (2nd column
On Apr 6, 2015, at 7:04 PM, Robert LeBlanc rob...@leblancnet.us wrote:
I see that ceph has 'ceph osd perf' that gets the latency of the OSDs.
Is there a similar command that would provide some performance data
about RBDs in use? I'm concerned about our ability to determine which
RBD(s) may be
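As far as I know there is no per-image counter in this release; one crude
workaround (my sketch, osd.3 is a placeholder) is to sample in-flight ops on a
busy OSD and tally the RBD image prefixes in the object names:

    # rbd_data.<id> (format 2) / rb.0.<id> (format 1) prefixes identify the image
    ceph daemon osd.3 dump_ops_in_flight \
        | grep -oE 'rbd_data\.[0-9a-f]+|rb\.0\.[0-9a-f.]+' | sort | uniq -c | sort -rn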
On Apr 3, 2015, at 12:37 AM, LOPEZ Jean-Charles jelo...@redhat.com wrote:
according to your ceph osd tree capture, although the OSD reweight is set to
1, the OSD CRUSH weight is set to 0 (2nd column). You need to assign the OSD
a CRUSH weight so that it can be selected by CRUSH: ceph osd
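The truncated command is presumably the CRUSH reweight; as a sketch (osd.20 and
the weight are placeholders; the weight is conventionally the disk size in TB):

    ceph osd crush reweight osd.20 1.0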
On Oct 28, 2014, at 5:20 PM, Lincoln Bryant wrote:
Hi Greg, Loic,
I think we have seen this as well (sent a mail to the list a week or so ago
about incomplete pgs). I ended up giving up on the data and doing a
force_create_pgs after doing a find on my OSDs and deleting the relevant pg
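Roughly, that procedure looks like this (pg 2.5f and the paths are placeholders;
this abandons whatever data the pg held):

    # with the relevant OSDs stopped, locate (then remove) the stale pg dirs
    find /var/lib/ceph/osd/ceph-*/current -maxdepth 1 -name '2.5f_head'
    # then have the monitors recreate the pg empty
    ceph pg force_create_pg 2.5f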
I have a number of PGs which are marked as incomplete. I'm at a loss for how to
go about recovering these PGs and believe they're suffering from the lost
time symptom. How do I recover these PGs? I'd settle for sacrificing the lost
time and just going with what I've got. I've lost the ability
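For anyone digging into the same state, the usual first stops (the pg id is a
placeholder):

    ceph health detail | grep incomplete     # list the affected pgs
    ceph pg 2.5f query                       # inspect recovery_state / past intervals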
On Oct 22, 2014, at 8:22 PM, Craig Lewis wrote:
Shot in the dark: try manually deep-scrubbing the PG. You could also try
marking various OSDs out, in an attempt to get the acting set to include
osd.25 again, then do the deep-scrub again. That probably won't help though,
because the pg
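In command form the suggestion amounts to (ids are placeholders):

    ceph pg deep-scrub 2.5f
    # or nudge the acting set by marking another of its members out, then rescrub
    ceph osd out 30
    ceph pg deep-scrub 2.5f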
What's up with osd.-1?
On Tue, Oct 21, 2014 at 7:04 PM, Chris Kitzmiller ckitzmil...@hampshire.edu
wrote:
I've gotten myself into the position of having ~100 incomplete PGs. All of
my OSDs are up+in (and I've restarted them all one by one).
I was in the process of rebalancing after
On Oct 22, 2014, at 7:51 PM, Craig Lewis wrote:
On Wed, Oct 22, 2014 at 3:09 PM, Chris Kitzmiller ckitzmil...@hampshire.edu
wrote:
On Oct 22, 2014, at 1:50 PM, Craig Lewis wrote:
Incomplete means Ceph detects that a placement group is missing a
necessary period of history from its log
On Aug 5, 2014, at 12:43 PM, Mark Nelson wrote:
On 08/05/2014 08:42 AM, Mariusz Gronczewski wrote:
On Mon, 04 Aug 2014 15:32:50 -0500, Mark Nelson mark.nel...@inktank.com
wrote:
On 08/04/2014 03:28 PM, Chris Kitzmiller wrote:
On Aug 1, 2014, at 1:31 PM, Mariusz Gronczewski wrote:
I got
On Aug 2, 2014, at 12:03 AM, Christian Balzer wrote:
On Fri, 1 Aug 2014 14:23:28 -0400 Chris Kitzmiller wrote:
I have 3 nodes each running a MON and 30 OSDs.
Given the HW you list below, that might be a tall order, particularly CPU-wise
in certain situations.
I'm not seeing any dramatic
On Aug 1, 2014, at 1:31 PM, Mariusz Gronczewski wrote:
I got weird stalling during writes: sometimes I get the same write speed
for a few minutes, and after some time it starts stalling at 0 MB/s for
minutes
I'm getting very similar behavior on my cluster. My writes start well but then
just kinda
I have 3 nodes each running a MON and 30 OSDs. When I test my cluster with
either rados bench or fio (via a 10GbE client using RBD), I get great initial
speeds (900 MBps) and I max out my 10GbE links for a while. Then something goes
wrong: the performance falters and the cluster stops responding
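For reference, tests along these lines (pool and image names are placeholders,
not the original invocations):

    rados bench -p rbd 60 write --no-cleanup
    # or from the 10GbE client via fio's rbd engine:
    fio --name=rbdbench --ioengine=rbd --clientname=admin --pool=rbd \
        --rbdname=test --rw=write --bs=4M --iodepth=32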
I found this article very interesting:
http://techreport.com/review/26523/the-ssd-endurance-experiment-casualties-on-the-way-to-a-petabyte
I've got Samsung 840 Pros and, while I'm thinking that I wouldn't go with them
again, I am interested in the fact that (in this anecdotal experiment) it
I've got a 3 node cluster where ceph osd perf reports reasonable
fs_apply_latency for 2 out of 3 of my nodes (~30ms). But on the third node I've
got latencies averaging 15000+ms for all OSDs.
Running ceph 0.72.2 on Ubuntu 13.10. Each node has 30 HDDs with 6 SSDs for
journals. iperf reports full
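A quick way to make those outliers jump out (fs_apply_latency is the third
column in this release):

    ceph osd perf | sort -nk3 | tail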