Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-04 Thread Bryan Henderson
>You can export and import PGs using ceph_objectstore_tool, but if the OSD
>won't start you may have trouble exporting a PG.

I believe the very purpose of ceph-objectstore-tool is to manipulate OSDs
while they aren't running.

If the crush map says these PGs that are on the broken OSD belong on another
OSD (which I guess it ought to, since the OSD is out), ceph-objectstore-tool
is what you would use to move them over there manually, since ordinary
peering can't do it.
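
Roughly, and assuming the firefly-era tool (named ceph_objectstore_tool, with
underscores, on older releases) and the filestore paths shown later in this
thread -- the pgid 2.1f and target osd.30 below are only placeholders -- the
export/import looks something like:

    # on the broken OSD's host, with osd.21 stopped
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-21 \
        --journal-path /var/lib/ceph/osd/ceph-21/journal \
        --pgid 2.1f --op export --file /tmp/pg.2.1f.export

    # on a healthy OSD's host, with that OSD stopped as well
    ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-30 \
        --journal-path /var/lib/ceph/osd/ceph-30/journal \
        --op import --file /tmp/pg.2.1f.export

Take a backup first and check the exact option names against
"ceph_objectstore_tool --help" on your installed version; the tool has
changed between releases.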

-- 
Bryan Henderson   San Jose, California


Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Patronis
Forgive the wall of text; I shortened it a little. Here is the OSD log from
when I attempt to start the OSD:

2018-08-04 03:53:28.917418 7f3102aa87c0  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-21) detect_feature: extsize is
disabled by conf
2018-08-04 03:53:28.977564 7f3102aa87c0  0
filestore(/var/lib/ceph/osd/ceph-21) mount: WRITEAHEAD journal mode
explicitly enabled in conf
2018-08-04 03:53:29.001967 7f3102aa87c0 -1 journal FileJournal::_open:
disabling aio for non-block journal.  Use journal_force_aio to force use of
aio anyway
2018-08-04 03:53:29.001981 7f3102aa87c0  1 journal _open
/var/lib/ceph/osd/ceph-21/journal fd 21: 2147483648 bytes, block size 4096
bytes, directio = 1, aio = 0
2018-08-04 03:53:29.002030 7f3102aa87c0  1 journal _open
/var/lib/ceph/osd/ceph-21/journal fd 21: 2147483648 bytes, block size 4096
bytes, directio = 1, aio = 0
2018-08-04 03:53:29.255501 7f3102aa87c0  0 
cls/hello/cls_hello.cc:271: loading cls_hello
2018-08-04 03:53:29.335038 7f3102aa87c0  0 osd.21 19579 crush map has
features 1107558400, adjusting msgr requires for clients
2018-08-04 03:53:29.335058 7f3102aa87c0  0 osd.21 19579 crush map has
features 1107558400, adjusting msgr requires for mons
2018-08-04 03:53:29.335062 7f3102aa87c0  0 osd.21 19579 crush map has
features 1107558400, adjusting msgr requires for osds
2018-08-04 03:53:29.335077 7f3102aa87c0  0 osd.21 19579 load_pgs
2018-08-04 03:54:00.275885 7f3102aa87c0 -1 osd/PG.cc: In function 'static
epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::bufferlist*)' thread 7f3102aa87c0 time 2018-08-04 03:54:00.274454
osd/PG.cc: 2577: FAILED assert(values.size() == 1)

 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::buffer::list*)+0x578) [0x741a18]
 2: (OSD::load_pgs()+0x1993) [0x655d13]
 3: (OSD::init()+0x1ba1) [0x65fff1]
 4: (main()+0x1ea7) [0x602fd7]
 5: (__libc_start_main()+0xed) [0x7f31008a276d]
 6: /usr/bin/ceph-osd() [0x607119]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed
to interpret this.

--- begin dump of recent events ---
 -3406> 2018-08-04 03:53:24.680985 7f3102aa87c0  5 asok(0x3c40230)
register_command perfcounters_dump hook 0x3c1c010
 -3405> 2018-08-04 03:53:24.681040 7f3102aa87c0  5 asok(0x3c40230)
register_command 1 hook 0x3c1c010
 -3404> 2018-08-04 03:53:24.681046 7f3102aa87c0  5 asok(0x3c40230)
register_command perf dump hook 0x3c1c010
 -3403> 2018-08-04 03:53:24.681052 7f3102aa87c0  5 asok(0x3c40230)
register_command perfcounters_schema hook 0x3c1c010
 -3402> 2018-08-04 03:53:24.681055 7f3102aa87c0  5 asok(0x3c40230)
register_command 2 hook 0x3c1c010
 -3401> 2018-08-04 03:53:24.681058 7f3102aa87c0  5 asok(0x3c40230)
register_command perf schema hook 0x3c1c010
 -3400> 2018-08-04 03:53:24.681061 7f3102aa87c0  5 asok(0x3c40230)
register_command config show hook 0x3c1c010
 -3399> 2018-08-04 03:53:24.681064 7f3102aa87c0  5 asok(0x3c40230)
register_command config set hook 0x3c1c010
 -3398> 2018-08-04 03:53:24.681095 7f3102aa87c0  5 asok(0x3c40230)
register_command config get hook 0x3c1c010
 -3397> 2018-08-04 03:53:24.681101 7f3102aa87c0  5 asok(0x3c40230)
register_command log flush hook 0x3c1c010
 -3396> 2018-08-04 03:53:24.681108 7f3102aa87c0  5 asok(0x3c40230)
register_command log dump hook 0x3c1c010
 -3395> 2018-08-04 03:53:24.681116 7f3102aa87c0  5 asok(0x3c40230)
register_command log reopen hook 0x3c1c010
 -3394> 2018-08-04 03:53:24.689976 7f3102aa87c0  0 ceph version 0.80.4
(7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f), process ceph-osd, pid 51827
 -3393> 2018-08-04 03:53:24.727583 7f3102aa87c0  1 -- 192.168.0.4:0/0
learned my addr 192.168.0.4:0/0
 -3392> 2018-08-04 03:53:24.727613 7f3102aa87c0  1 accepter.accepter.bind
my_inst.addr is 192.168.0.4:6801/51827 need_addr=0
 -3391> 2018-08-04 03:53:24.727638 7f3102aa87c0  1 -- 192.168.1.3:0/0
learned my addr 192.168.1.3:0/0
 -3390> 2018-08-04 03:53:24.727652 7f3102aa87c0  1 accepter.accepter.bind
my_inst.addr is 192.168.1.3:6800/51827 need_addr=0
 -3389> 2018-08-04 03:53:24.727676 7f3102aa87c0  1 -- 192.168.1.3:0/0
learned my addr 192.168.1.3:0/0
 -3388> 2018-08-04 03:53:24.727687 7f3102aa87c0  1 accepter.accepter.bind
my_inst.addr is 192.168.1.3:6801/51827 need_addr=0
 -3387> 2018-08-04 03:53:24.727722 7f3102aa87c0  1 -- 192.168.0.4:0/0
learned my addr 192.168.0.4:0/0
 -3386> 2018-08-04 03:53:24.727732 7f3102aa87c0  1 accepter.accepter.bind
my_inst.addr is 192.168.0.4:6810/51827 need_addr=0
 -3385> 2018-08-04 03:53:24.727767 7f3102aa87c0  1 -- 192.168.0.4:0/0
learned my addr 192.168.0.4:0/0
 -3384> 2018-08-04 03:53:24.72 7f3102aa87c0  1 accepter.accepter.bind
my_inst.addr is 192.168.0.4:6811/51827 need_addr=0
 -3383> 2018-08-04 03:53:24.728871 7f3102aa87c0  1 finished
global_init_daemonize
 -3382> 2018-08-04 03:53:24.761702 7f3102aa87c0  5 asok(0x3c40230) init
/var/run/ceph/ceph-osd.21.asok
 -3381> 2018-08-04 03:53:24.761737 7f3102aa87c0  5 asok(0x3c40230)
bind_and_listen 

Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Redmond
Hi,

You can export and import PGs using ceph_objectstore_tool, but if the OSD
won't start you may have trouble exporting a PG.

It may be useful to share the errors you get when trying to start the OSD.
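
If it is not obvious from /var/log/ceph, one rough way to capture the failure
is to run the daemon in the foreground so the assert and backtrace land on
the terminal (osd.21 here just matches the OSD discussed in this thread):

    # run the OSD in the foreground, logging to stderr
    ceph-osd -i 21 -d

    # or grab the tail of the normal log file
    tail -n 200 /var/log/ceph/ceph-osd.21.log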

Thanks

On Fri, Aug 3, 2018 at 10:13 PM, Sean Patronis  wrote:

>
>
> Hi all.
>
> We have an issue with some down+peering PGs (I think); when I try to mount
> or access data, the requests are blocked:
>
> 114891/7509353 objects degraded (1.530%)
>   887 stale+active+clean
>     1 peering
>    54 active+recovery_wait
> 19609 active+clean
>    91 active+remapped+wait_backfill
>    10 active+recovering
>     1 active+clean+scrubbing+deep
>     9 down+peering
>    10 active+remapped+backfilling
> recovery io 67324 kB/s, 10 objects/s
>
> When I query one of these down+peering PGs, I can see the following:
>
>  "peering_blocked_by": [
> { "osd": 7,
>   "current_lost_at": 0,
>   "comment": "starting or marking this osd lost may let us 
> proceed"},
> { "osd": 21,
>   "current_lost_at": 0,
>   "comment": "starting or marking this osd lost may let us 
> proceed"}]},
> { "name": "Started",
>   "enter_time": "2018-08-01 07:06:16.806339"}],
>
>
>
> Both of these OSDs (7 and 21) will not come back up and in with ceph due
> to some errors, but I can mount the disks and read data off of them.  Can I
> manually move/copy these PGs off of these down and out OSDs and put them on
> a good OSD?
>
> This is an older ceph cluster running firefly.
>
> Thanks.
>
>
>
>


[ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Patronis
Hi all.

We have an issue with some down+peering PGs (I think); when I try to mount
or access data, the requests are blocked:

114891/7509353 objects degraded (1.530%)
  887 stale+active+clean
    1 peering
   54 active+recovery_wait
19609 active+clean
   91 active+remapped+wait_backfill
   10 active+recovering
    1 active+clean+scrubbing+deep
    9 down+peering
   10 active+remapped+backfilling
recovery io 67324 kB/s, 10 objects/s

When I query one of these down+peering PGs, I can see the following:

 "peering_blocked_by": [
{ "osd": 7,
  "current_lost_at": 0,
  "comment": "starting or marking this osd lost may
let us proceed"},
{ "osd": 21,
  "current_lost_at": 0,
  "comment": "starting or marking this osd lost may
let us proceed"}]},
{ "name": "Started",
  "enter_time": "2018-08-01 07:06:16.806339"}],



Both of these OSDs (7 and 21) will not come back up and in with ceph due to
some errors, but I can mount the disks and read data off of them.  Can I
manually move/copy these PGs off of these down and out OSDs and put them on
a good OSD?
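
Since the disks still mount and are readable, one rough way to see which PGs
those OSDs actually hold (assuming a filestore OSD, as the log earlier in
this thread suggests) is to list the PG directories:

    # each PG on a filestore OSD appears as a <pgid>_head directory
    ls /var/lib/ceph/osd/ceph-21/current | grep _head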

This is an older ceph cluster running firefly.

Thanks.
