Agreed on not zapping the disks until your cluster is healthy again. Marking
them out and seeing how healthy you can get in the meantime is a good idea.
On Sun, Sep 2, 2018, 1:18 PM Ronny Aasen wrote:
> On 02.09.2018 17:12, Lee wrote:
> > Should I just out the OSD's first or completely zap them
OK, rather than going gung-ho at this..
1. I have set out 31,24,21,18,15,14,13,6 and 7,5 (10 is a new OSD)
Which gives me
ID  WEIGHT   TYPE NAME            UP/DOWN  REWEIGHT  PRIMARY-AFFINITY
-1  23.65970 root default
-5   8.18990     host data33-a4
13   0.90999         osd.13          up       0
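Step 1 above can be scripted rather than done by hand. A minimal sketch (my own, not from the thread), using the OSD IDs listed above; it only prints the commands as a dry run, so drop the `echo` to actually apply them:

```shell
# Dry run: print the 'ceph osd out' command for each suspect OSD.
# Marking an OSD out tells CRUSH to start remapping its PGs elsewhere.
# Remove the leading 'echo' to execute for real.
for id in 31 24 21 18 15 14 13 6 7 5; do
  echo ceph osd out "$id"
done
```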
Should I just out the OSD's first or completely zap them and recreate? Or
delete and let the cluster repair itself?
On the second node, when it started back up, I had problems with the journals
for IDs 5 and 7; they were also recreated. All the rest are still the
originals.
I know that some PG's are
The problem is that you never got a successful run of `ceph-osd
--flush-journal` on the old SSD journal drive. All of the OSDs that used
the dead journal need to be removed from the cluster, wiped, and added back
in. The data on them is not 100% consistent because the old journal died.
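For reference, the removal David describes usually follows the standard Hammer-era procedure. A hedged sketch (my summary, not commands from this thread); `osd=13` and `/dev/sdX` are placeholders, and each step is only printed as a dry run:

```shell
# Dry-run sketch of removing one OSD that used the dead journal.
# Drop the 'echo' on each line to execute; repeat per affected OSD.
osd=13                                    # placeholder example ID
echo ceph osd out "$osd"                  # stop new data landing on it
echo sudo service ceph stop osd."$osd"    # stop the daemon (Hammer/sysvinit)
echo ceph osd crush remove osd."$osd"     # remove it from the CRUSH map
echo ceph auth del osd."$osd"             # delete its cephx key
echo ceph osd rm "$osd"                   # remove the OSD entry itself
echo sudo ceph-disk zap /dev/sdX          # wipe the disk; /dev/sdX is a placeholder
```

After the zap, the disk can be prepared again as a brand-new OSD and backfilled from the surviving replicas.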
Any word
I followed:
$ journal_uuid=$(sudo cat /var/lib/ceph/osd/ceph-0/journal_uuid)
$ sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal' \
    --partition-guid=1:$journal_uuid \
    --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdk
Then
$ sudo ceph-osd --mkjournal -i 20
$ sudo
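The message cuts off here, but the usual continuation after recreating a journal partition (my assumption, not something stated in the thread) is to relink the OSD's journal to the new partition by its partuuid and restart the daemon. A dry-run sketch, reusing `$journal_uuid` from the first step above:

```shell
# Assumed continuation (not from the thread): relink the journal and
# restart the OSD. Dry run via 'echo'; drop it to execute for real.
osd=20
journal_uuid="<uuid-from-journal_uuid-file>"   # placeholder for the value read above
echo sudo ln -sf /dev/disk/by-partuuid/"$journal_uuid" /var/lib/ceph/osd/ceph-"$osd"/journal
echo sudo ceph-osd --mkjournal -i "$osd"
echo sudo service ceph start osd."$osd"
```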
Hi David,
Yes, health detail outputs all the errors etc. and recovery / backfill is
going on, just taking time: 25% misplaced and 1.5% degraded.
I can list out the pools and see sizes etc..
My main problem is I have no client IO from a read perspective; I cannot
start VMs in OpenStack, and ceph -w
When the first node went offline with a dead SSD journal, all of the data
on the OSDs was useless. Unless you can flush the journals, you can't
guarantee that a write the cluster thinks happened actually made it to the
disk. The proper procedure here is to remove those OSDs and add them again
as new OSDs.
Does "ceph health detail" work?
Have you manually confirmed the OSDs on the nodes are working?
What was the replica size of the pools?
Are you seeing any progress with the recovery?
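One way to run the checks asked about above with the standard ceph CLI (again only printed as a dry run; drop the `echo` to run them):

```shell
# Dry run of the diagnostic checks: health detail, per-node OSD state,
# pool replica sizes, and overall recovery/backfill progress.
echo ceph health detail
echo ceph osd tree
echo "ceph osd dump | grep 'replicated size'"
echo ceph -s
```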
On Sun, Sep 2, 2018 at 9:42 AM Lee wrote:
Running 0.94.5 as part of an OpenStack environment; our Ceph setup is 3x OSD
nodes and 3x MON nodes. Yesterday we had an aircon outage in our hosting
environment: 1 OSD node failed (offline with the journal SSD dead), leaving
2 nodes running correctly. 2 hours later a second OSD node failed
complaining