Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Patronis
89/16452/16452) [21,16] r=0 lpr=0
crt=16449'19948380 lcod 0'0 mlcod 0'0 inactive] enter Reset
-3> 2018-08-04 03:54:00.195526 7f3102aa87c0  5 osd.21 pg_epoch: 19579
pg[6.ab(unlocked)] enter Initial
-2> 2018-08-04 03:54:00.254812 7f3102aa87c0  5 osd.21 pg_epoch: 19579
pg[6.ab( v 19579'1116897 (18464'1113896,19579'1116897] local-les=13378
n=217 ec=5 les/c 13378/13378 13286/13377/13377) [4,21] r=1 lpr=0
pi=12038-13376/4 crt=709'35663 lcod 0'0 inactive NOTIFY] exit Initial
0.059287 0 0.00
-1> 2018-08-04 03:54:00.254842 7f3102aa87c0  5 osd.21 pg_epoch: 19579
pg[6.ab( v 19579'1116897 (18464'1113896,19579'1116897] local-les=13378
n=217 ec=5 les/c 13378/13378 13286/13377/13377) [4,21] r=1 lpr=0
pi=12038-13376/4 crt=709'35663 lcod 0'0 inactive NOTIFY] enter Reset
 0> 2018-08-04 03:54:00.275885 7f3102aa87c0 -1 osd/PG.cc: In function
'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::bufferlist*)' thread 7f3102aa87c0 time 2018-08-04 03:54:00.274454
osd/PG.cc: 2577: FAILED assert(values.size() == 1)

 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::buffer::list*)+0x578) [0x741a18]
 2: (OSD::load_pgs()+0x1993) [0x655d13]
 3: (OSD::init()+0x1ba1) [0x65fff1]
 4: (main()+0x1ea7) [0x602fd7]
 5: (__libc_start_main()+0xed) [0x7f31008a276d]
 6: /usr/bin/ceph-osd() [0x607119]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/ceph-osd.21.log
--- end dump of recent events ---
2018-08-04 03:54:00.314451 7f3102aa87c0 -1 *** Caught signal (Aborted) **
 in thread 7f3102aa87c0

 ceph version 0.80.4 (7c241cfaa6c8c068bc9da8578ca00b9f4fc7567f)
 1: /usr/bin/ceph-osd() [0x98aa3a]
 2: (()+0xfcb0) [0x7f3101cd0cb0]
 3: (gsignal()+0x35) [0x7f31008b70d5]
 4: (abort()+0x17b) [0x7f31008ba83b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f310120869d]
 6: (()+0xb5846) [0x7f3101206846]
 7: (()+0xb5873) [0x7f3101206873]
 8: (()+0xb596e) [0x7f310120696e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1df) [0xa6adcf]
 10: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
ceph::buffer::list*)+0x578) [0x741a18]
 11: (OSD::load_pgs()+0x1993) [0x655d13]
 12: (OSD::init()+0x1ba1) [0x65fff1]
 13: (main()+0x1ea7) [0x602fd7]
 14: (__libc_start_main()+0xed) [0x7f31008a276d]
 15: /usr/bin/ceph-osd() [0x607119]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

[ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-03 Thread Sean Patronis
Hi all.

We have an issue with some down+peering PGs (I think). When I try to
mount or access data, the requests are blocked:

114891/7509353 objects degraded (1.530%)
  887 stale+active+clean
    1 peering
   54 active+recovery_wait
19609 active+clean
   91 active+remapped+wait_backfill
   10 active+recovering
    1 active+clean+scrubbing+deep
    9 down+peering
   10 active+remapped+backfilling
recovery io 67324 kB/s, 10 objects/s
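
For anyone digging into a state like this, the usual starting points are
standard ceph CLI commands; a rough sketch, nothing below is specific to
this cluster:

   ceph health detail            # per-PG detail behind the summary counts above
   ceph pg dump_stuck inactive   # stuck inactive PGs (the down+peering ones fall in here)
   ceph pg dump_stuck stale      # the stale+active+clean PGs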

When I query one of these down+peering PGs, I see the following:

 "peering_blocked_by": [
{ "osd": 7,
  "current_lost_at": 0,
  "comment": "starting or marking this osd lost may
let us proceed"},
{ "osd": 21,
  "current_lost_at": 0,
  "comment": "starting or marking this osd lost may
let us proceed"}]},
{ "name": "Started",
  "enter_time": "2018-08-01 07:06:16.806339"}],



Both of these OSDs (7 and 21) will not come back up and in due to some
errors, but I can mount the disks and read data off of them. Can I
manually move/copy these PGs off of these down-and-out OSDs and put them
onto a good OSD?

This is an older Ceph cluster running Firefly.
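
What is being asked for here is roughly the objectstore tool's PG
export/import. A hedged sketch, assuming a build that ships the tool (it
appeared as ceph_objectstore_tool in later 0.80.x point releases and was
renamed ceph-objectstore-tool afterwards); the data paths and PG id below
are examples, and both the source and destination OSD daemons must be
stopped first:

   # Export the PG from the down-but-readable OSD's store.
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-21 \
       --journal-path /var/lib/ceph/osd/ceph-21/journal \
       --pgid 6.ab --op export --file /tmp/pg6.ab.export

   # Import it into a healthy OSD's store, then start that OSD and let it peer.
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
       --journal-path /var/lib/ceph/osd/ceph-4/journal \
       --op import --file /tmp/pg6.ab.export

Whether this is sufficient also depends on why osd.7 and osd.21 refuse to
start, so keep the export files around and test with a single PG first.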

Thanks.



[ceph-users] Remove PGs that are stuck-unclean-stale

2013-12-03 Thread Sean Patronis

Background

New Ceph setup with 3 nodes and a mon running on each node. OSDs are
split across the nodes. This is a brand-new cluster and no data has been
added yet.


I zapped osd.0 and re-added it and now I am stuck with:

health HEALTH_WARN 12 pgs degraded; 12 pgs stale; 12 pgs stuck stale; 12 pgs stuck unclean



What is the best way to clean this up so everything reads hunky-dory?  I 
do not care about data loss since there is no data in the cluster yet.
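
Since the cluster is empty, one option (only a sketch, and only reasonable
because there is no data to lose) is to have the monitors recreate the
stuck PGs; on releases of that era the command was ceph pg force_create_pg:

   ceph health detail               # lists the 12 stuck PG ids
   ceph pg dump_stuck stale
   ceph pg force_create_pg 0.d      # hypothetical PG id; repeat for each stuck PG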


Thanks.

--Sean
