On Fri, Mar 17, 2017 at 7:43 PM, Laszlo Budai wrote:
> Hello Brad,
>
> I've found the reason for the segfault. On the OSD servers the
> /etc/ceph/ceph.client.admin.keyring file was missing. This showed up when
> I set the debugging parameters you suggested.
That
Hello Brad,
I've found the reason for the segfault. On the OSD servers the
/etc/ceph/ceph.client.admin.keyring file was missing. This showed up when I
set the debugging parameters you suggested.
Once I copied the file over from the monitor, the import-rados worked.
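A quick pre-flight check for the keyring could look like the sketch below; the path is the standard admin keyring location, and the demonstration runs against a throwaway file rather than a live OSD node:

```shell
# Minimal sketch: verify the admin keyring is present and readable before
# running import-rados. Adjust the path for non-default cluster names.
check_keyring() {
    if [ -r "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Demonstration against a throwaway file instead of a real keyring:
tmp=$(mktemp)
check_keyring "$tmp"      # prints "ok: <path>"
rm -f "$tmp"
check_keyring "$tmp"      # prints "missing: <path>"
```

On a real node you would call `check_keyring /etc/ceph/ceph.client.admin.keyring` on each OSD host before attempting the import.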
Now the cluster
So I've tested this procedure locally and it works successfully for me.
$ ./ceph -v
*** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
$ ./ceph-objectstore-tool import-rados rbd 0.3.export
Importing from pgid 0.3
My mistake, I ran it on the wrong system ...
I've attached the terminal output.
I've run this on a test system where I was getting the same segfault when
trying import-rados.
Kind regards,
Laszlo
On 16.03.2017 07:41, Laszlo Budai wrote:
[root@storage2 ~]# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args
ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
Can you install the debuginfo for ceph (how this works depends on your
distro) and run the following?
# gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args ceph-objectstore-tool
import-rados volumes pg.3.367.export.OSD.35
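Getting the debuginfo in place varies by distro; the sketch below is a dry run for RHEL/CentOS (the `run` wrapper only echoes the commands instead of executing them, and the package/tool names are an assumption for other distros):

```shell
# Dry-run sketch: install ceph debug symbols and capture a full backtrace.
# Remove the `run` wrapper to execute for real (requires root).
run() { echo "+ $*"; }

run yum install -y yum-utils          # provides debuginfo-install on RHEL/CentOS
run debuginfo-install -y ceph
run gdb -ex 'r' -ex 't a a bt full' -ex 'q' --args \
    ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35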
On Thu, Mar 16, 2017 at 12:02 AM, Laszlo Budai wrote:
Hello,
the ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35 command
crashes.
~# ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
*** Caught signal (Segmentation fault) **
in thread 7f85b60e28c0
ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
Ok.
Delete the dirs using the ceph-objectstore-tool. DONE
ceph pg force_create_pg 3.367 lead me to this state:
HEALTH_WARN 1 pgs stuck inactive; 1 pgs stuck unclean; 16 requests are blocked
> 32 sec; 2 osds have slow requests; noout flag(s) set
pg 3.367 is stuck inactive since forever, current
Hello,
So, I've done the following steps:
1. set noout
2. stop osd2
3. ceph-objectstore-tool remove
4. start osd2
5. repeat steps 2-4 on osds 28 and 35
then I ran ceph pg force_create_pg 3.367.
This has left the PG in creating state:
# ceph -s
cluster
1. Decide which copy you want to keep and export that with ceph-objectstore-tool.
2. Delete all copies on all OSDs with ceph-objectstore-tool (not by
   deleting the directory on the disk).
3. Use force_create_pg to recreate the pg empty.
4. Use ceph-objectstore-tool to do a rados import on the exported pg.
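The steps above might look like the following dry run (the `run` wrapper only echoes the commands; the pgid, pool, export filename, and osd id are the values from this thread, the data/journal paths assume the default layout, and the service commands vary by init system):

```shell
# Dry-run sketch of the recovery outline; remove the `run` wrapper to execute.
run() { echo "+ $*"; }

run ceph osd set noout
# For each OSD still holding a (stale) copy, e.g. osd.2:
run service ceph stop osd.2           # or: stop ceph-osd id=2 / systemctl stop ceph-osd@2
run ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
    --journal-path /var/lib/ceph/osd/ceph-2/journal \
    --pgid 3.367 --op remove
run service ceph start osd.2
# After the pg has been removed from every OSD that had it:
run ceph pg force_create_pg 3.367
# Import the chosen export back through librados:
run ceph-objectstore-tool import-rados volumes pg.3.367.export.OSD.35
run ceph osd unset noout
```

This is a sketch of the sequence, not a tested runbook; double-check pgid and paths against your own `ceph pg dump` and OSD mount points before running anything destructive.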
Hello,
I have tried to recover the pg using the following steps:
Preparation:
1. set noout
2. stop osd.2
3. use ceph-objectstore-tool to export from osd2
4. start osd.2
5. repeat steps 2-4 on osds 35, 28, 63 (I did this hoping to be able to use
one of those exports to recover the PG)
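The preparation loop could be sketched as a dry run like this (the `run` wrapper only echoes; osd ids and pgid are the ones from this thread, and the data/journal paths assume the default OSD layout):

```shell
# Dry-run sketch of the export preparation (steps 1-5 above).
run() { echo "+ $*"; }

run ceph osd set noout
for id in 2 35 28 63; do
    run service ceph stop osd.$id
    run ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$id \
        --journal-path /var/lib/ceph/osd/ceph-$id/journal \
        --pgid 3.367 --op export --file pg.3.367.export.OSD.$id
    run service ceph start osd.$id
done
```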
First
On Sun, Mar 12, 2017 at 7:51 PM, Laszlo Budai wrote:
> Hello,
>
> I have already done the export with ceph_objectstore_tool. I just have to
> decide which OSDs to keep.
> Can you tell me why the directory structure in the OSDs is different for the
> same PG when checking
Hello,
I have already done the export with ceph_objectstore_tool. I just have to
decide which OSDs to keep.
Can you tell me why the directory structure in the OSDs is different for the
same PG when checking on different OSDs?
For instance, in OSD 2 and 63 there are NO subdirectories in the
On Sat, Mar 11, 2017 at 7:43 PM, Laszlo Budai wrote:
> Hello,
>
> Thank you for your answer.
>
> indeed the min_size is 1:
>
> # ceph osd pool get volumes size
> size: 3
> # ceph osd pool get volumes min_size
> min_size: 1
> #
> I'm gonna try to find the mentioned
So this is why it happened I guess.
pool 3 'volumes' replicated size 3 min_size 1
min_size = 1 is a recipe for disasters like this and there are plenty
of ML threads about not setting it below 2.
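Raising it is a one-liner; sketched here as a dry run (the `run` wrapper only prints the commands), using the pool name from this thread:

```shell
# Dry-run sketch: raise min_size so a single surviving replica can no longer go rw.
run() { echo "+ $*"; }

run ceph osd pool set volumes min_size 2
run ceph osd pool get volumes min_size   # verify the change afterwards
```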
The past intervals in the pg query show several intervals where a
single OSD may have gone rw.
How
The OSDs are all there.
$ sudo ceph osd stat
osdmap e60609: 72 osds: 72 up, 72 in
and I have attached the results of the ceph osd tree and ceph osd dump commands.
I got some extra info about the network problem. A faulty network device had
flooded the network, eating up all the bandwidth, so the
To me it looks like someone may have done an "rm" on these OSDs but
not removed them from the crushmap. This does not happen
automatically.
Do these OSDs show up in "ceph osd tree" and "ceph osd dump" ? If so,
paste the output.
Without knowing what exactly happened here it may be difficult to
Hello,
I was informed that due to a networking issue the ceph cluster network was
affected. There was a huge packet loss, and network interfaces were flipping.
That's all I got.
This outage lasted a long period of time. So I assume that some OSDs may
have been considered dead and the
Can you explain more about what happened?
The query shows progress is blocked by the following OSDs.
"blocked_by": [
14,
17,
51,
58,
63,
64,
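To pull those ids out of your own query output, a small sketch: it runs against a stand-in JSON fragment (a hypothetical structure modeled on `ceph pg <pgid> query` output) rather than a live cluster, and `jq` would be the cleaner tool where available:

```shell
# Sketch: extract the "blocked_by" osd ids from a saved pg query dump.
# The JSON below is a stand-in; on a real cluster you would use
#   ceph pg 3.367 query > /tmp/pg_query.json
cat > /tmp/pg_query.json <<'EOF'
{ "recovery_state": [ { "name": "Started/Primary/Peering",
    "blocked_by": [ 14, 17, 51, 58, 63, 64 ] } ] }
EOF

# Capture the array body and print one osd id per line:
sed -n 's/.*"blocked_by": \[\([^]]*\)\].*/\1/p' /tmp/pg_query.json \
    | tr ', ' '\n' | grep .
```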
Hello,
On Thu, 8 Oct 2015 11:27:46 +0800 (CST) wikison wrote:
> Hi,
> I've removed the rbd pool and created it again. It picked up my
> default settings but there are still some problems. After running "sudo
> ceph -s", the output is as follows:
> cluster
Here, like this :
esta@monitorOne:~$ sudo ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-3 4.39996 root defualt
-2 1.0 host storageTwo
0 0.0 osd.0 up 1.0 1.0
1 1.0 osd.1 up 1.0
Hello,
On Thu, 8 Oct 2015 12:21:40 +0800 (CST) wikison wrote:
> Here, like this :
> esta@monitorOne:~$ sudo ceph osd tree
> ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -3 4.39996 root defualt
That's your problem. It should be "default"
Your
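One way to repair a misspelled bucket name is to round-trip the crushmap through crushtool; sketched below as a dry run (the `run` wrapper only echoes the commands, and the temp paths are illustrative):

```shell
# Dry-run sketch: decompile the crushmap, fix the bucket name, re-inject it.
run() { echo "+ $*"; }

run ceph osd getcrushmap -o /tmp/crush.bin
run crushtool -d /tmp/crush.bin -o /tmp/crush.txt
run sed -i 's/defualt/default/g' /tmp/crush.txt
run crushtool -c /tmp/crush.txt -o /tmp/crush.new.bin
run ceph osd setcrushmap -i /tmp/crush.new.bin
```

Inspect /tmp/crush.txt by hand before recompiling; the rename must also be reflected in any rules that reference the root bucket by name.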
Hi,
I've removed the rbd pool and created it again. It picked up my default
settings but there are still some problems.
After running "sudo ceph -s", the output is as follows:
cluster 0b9b05db-98fe-49e6-b12b-1cce0645c015
health HEALTH_WARN
512 pgs stuck
One possibility: the crush map may not be getting updated. Look at your
/etc/ceph/ceph.conf file and see if you have something under the OSD
section (it could actually be in [global] too) that looks like the following:
osd crush update on start = false
If that line is there and if you're not
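A quick way to check for that line, sketched against a throwaway conf file (on a real node the file to grep is /etc/ceph/ceph.conf):

```shell
# Sketch: count occurrences of the option in a conf file.
# A throwaway file stands in for /etc/ceph/ceph.conf here.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[osd]
osd crush update on start = false
EOF

grep -c 'osd crush update on start' "$conf"   # prints 1 if present, 0 otherwise
```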
Hello,
On Wed, 7 Oct 2015 12:57:58 +0800 (CST) wikison wrote:
This is a very old bug/misfeature. It creeps up every week or so here;
google is your friend.
> Hi,
> I have a cluster of one monitor and eight OSDs. These OSDs are running
> on four hosts(each host has two OSDs). When I set up