[ceph-users] Scrub shutdown the OSD process
Hi,

I have an OSD process which is regularly shut down by scrub, if I understand this trace correctly:

     0> 2013-04-15 09:29:53.708141 7f5a8e3cc700 -1 *** Caught signal (Aborted) ** in thread 7f5a8e3cc700
 ceph version 0.56.4-4-gd89ab0e (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)
 1: /usr/bin/ceph-osd() [0x7a6289]
 2: (()+0xeff0) [0x7f5aa08faff0]
 3: (gsignal()+0x35) [0x7f5a9f3841b5]
 4: (abort()+0x180) [0x7f5a9f386fc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5a9fc18dc5]
 6: (()+0xcb166) [0x7f5a9fc17166]
 7: (()+0xcb193) [0x7f5a9fc17193]
 8: (()+0xcb28e) [0x7f5a9fc1728e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x8f9549]
 10: (ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]
 11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
 12: (PG::chunky_scrub()+0x2d9) [0x6c37f9]
 13: (PG::scrub()+0x145) [0x6c4e55]
 14: (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]
 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]
 16: (ThreadPool::WorkThread::entry()+0x10) [0x817980]
 17: (()+0x68ca) [0x7f5aa08f28ca]
 18: (clone()+0x6d) [0x7f5a9f421b6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -1/-1 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/osd.25.log
--- end dump of recent events ---

I tried to format that OSD and re-inject it into the cluster, but after the recovery the problem still occurs. Since I don't see any hard drive errors in the kernel logs, what could the problem be?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
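The assert fires in frame 10, ReplicatedPG::_scrub. As an aside, pulling the symbol out of such a backtrace line is easy with standard tools; a minimal sketch (the trace line is quoted from the log above; the addr2line step assumes the matching debug symbols are installed, which may not be the case here):

```shell
# Extract the function name from a backtrace frame quoted above.
trace='10: (ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]'
func=$(echo "$trace" | sed 's/^[0-9]*: (\(.*\)+0x[0-9a-f]*) .*/\1/')
echo "$func"    # prints: ReplicatedPG::_scrub(ScrubMap)

# With an unstripped /usr/bin/ceph-osd (an assumption), the address can be
# resolved to a source line:
# addr2line -C -f -e /usr/bin/ceph-osd 0x57a038
```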
Re: [ceph-users] SL4500 as a storage machine
On 04/15/2013 03:25 AM, Stas Oskin wrote:

Hi,

Like I said, it's just my instinct. For a 180TB (raw) cluster you've got some tough choices to make. Some options might include:

1) High density and low cost, by just sticking a bunch of 3TB drives in 5 2U nodes and making sure you don't fill the cluster past ~75% (which you probably don't want to do from a performance perspective anyway). Just acknowledge that during failure/recovery there's going to be a ton of traffic flying around between the remaining 4 nodes.

2) Lower-density (1-2TB) drives and more 2U nodes, for higher performance but lower density and greater expense.

3) High eventual density and low eventual cost, by buying 2U nodes that are only partially filled with 3TB drives, with the assumption that the cluster is going to grow larger down the road.

4) 15 4-drive 1U nodes, for less impact during recovery but greater expense and lower density.

All of these options have benefits and downsides. For a production cluster I'd want more than 5 nodes, but that wouldn't be the only consideration (cost, density, performance, etc. would all play a part).

To summarize, you recommend to focus on 2U servers rather than 4U (HP, SuperMicro and so on), and the best strategy seems to be to start filling them with 3TB disks, spreading the data over the servers evenly.

It's not so much about the chassis size, but how much capacity you lose (and data you have to re-replicate) on the rest of the cluster during an outage. An SL4500 chassis with 2 nodes is going to be different than an SL4500 chassis with 1 node, even though in both cases the package is still 4U. You lose density with 2 nodes per chassis, but double the overall number of nodes and potentially improve performance.

If you'd like more in-depth recommendations about your cluster design, we (Inktank) do provide consulting services to look at your specific requirements and help you weigh all of these different factors when building your cluster.

By the way, why are 5 servers so important? Why not 3 or 7, for that matter?

It's not really. You just said 180TB, so with 2U servers and 3TB drives you can do that in 5 nodes. You could also do that in 15-20 1U nodes, or 2 36-drive 4U nodes. The fewer servers you have, the greater the impact of a server outage is. I.e. if you have 2 36-drive servers and you lose one, you've lost half the cluster capacity, which is a big deal if you were already over 50% disk utilization. The trade-off is that 20 1U servers take up a lot more space and cost more than 2 4U boxes.

Mark

Thanks again,
Stas.
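The trade-off above is easy to put in numbers. A back-of-the-envelope sketch, assuming 12 x 3TB drives per 2U node (illustrative figures of mine, not stated in the thread):

```shell
nodes=5; drives=12; tb=3            # assumed layout: 5 x 2U nodes, 12 x 3TB each
raw=$((nodes * drives * tb))        # raw cluster capacity in TB
lost=$((drives * tb))               # raw capacity lost when one node fails
left=$((raw - lost))
echo "raw=${raw}TB lost_per_node=${lost}TB remaining=${left}TB"
# prints: raw=180TB lost_per_node=36TB remaining=144TB
# With 2x replication, the 4 survivors must absorb up to 36TB of
# re-replication traffic -- hence the advice to stay under ~75% full.
```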
[ceph-users] RBD snapshots are not «readable», because of LVM ?
Hi,

I'm trying to map an RBD snapshot which contains an LVM PV. The «map» works:

    rbd map hdd3copies/jason@20130415-065314 --id alg

Then pvscan works:

    # pvscan | grep rbd
    PV /dev/rbd58   VG vg-jason   lvm2 [19,94 GiB / 1,44 GiB free]

But activating the LVs doesn't work:

    # vgchange -ay vg-jason
    device-mapper: reload ioctl failed: Argument invalide
    Internal error: Maps lock 13746176 unlock 13889536
    device-mapper: reload ioctl failed: Argument invalide
    device-mapper: reload ioctl failed: Argument invalide
    device-mapper: reload ioctl failed: Argument invalide
    device-mapper: reload ioctl failed: Argument invalide
    device-mapper: reload ioctl failed: Argument invalide
    device-mapper: reload ioctl failed: Argument invalide
    7 logical volume(s) in volume group vg-jason now active

    # blockdev --getsize64 /dev/mapper/vg--jason-*
    0
    0
    0
    0
    0
    0
    0

Is this a problem with LVM, or should I not be using snapshots like that?
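One thing worth checking: RBD snapshots map as read-only block devices, and device-mapper table loads can fail against a read-only backing device. A possible workaround, sketched below and untested against this setup, is to clone the snapshot into a writable image first (this assumes a format-2 image, since cloning needs one; the clone name is made up, the pool/snapshot names are from the post):

```shell
# Sketch: get a writable view of the snapshot via a clone (format-2 images only).
rbd snap protect hdd3copies/jason@20130415-065314
rbd clone hdd3copies/jason@20130415-065314 hdd3copies/jason-restore
rbd map hdd3copies/jason-restore --id alg
# Beware: if the original PV is also mapped, LVM will see two PVs with the
# same UUID/VG name, which needs vgimportclone or unmapping the original.
vgchange -ay vg-jason
```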
Re: [ceph-users] ceph -w question
Can you post the output of ceph osd tree?
-Sam

On Mon, Apr 15, 2013 at 9:52 AM, Jeppesen, Nelson <nelson.jeppe...@disney.com> wrote:

Thanks for the help, but how do I track down this issue? If data is inaccessible, that's a very bad thing given this is production.

# ceph osd dump | grep pool
pool 13 '.rgw.buckets' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 4800 pgp_num 4800 last_change 1198 owner 0
pool 14 '.rgw' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 242 owner 18446744073709551615
pool 15 '.rgw.gc' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 243 owner 18446744073709551615
pool 16 '.rgw.control' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 244 owner 18446744073709551615
pool 17 '.users.uid' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 246 owner 0
pool 18 '.users.email' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 248 owner 0
pool 19 '.users' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 250 owner 0
pool 20 '.usage' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 256 owner 18446744073709551615
pool 21 '.users.swift' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 1138 owner 0

Nelson Jeppesen
Disney Technology Solutions and Services
Phone 206-588-5001

-----Original Message-----
From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Monday, April 15, 2013 9:34 AM
To: Jeppesen, Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ceph -w question

Incomplete means that there are fewer than the minimum number of copies of the placement group (by default, half of the requested size, rounded up). In general, rebooting one node shouldn't do that unless you've changed your minimum size on the pool, and it does mean that data in those PGs is inaccessible.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

On Mon, Apr 15, 2013 at 9:01 AM, Jeppesen, Nelson <nelson.jeppe...@disney.com> wrote:

When I reboot any node in my prod environment with no activity, I see incomplete pgs. Is that a concern? Does that mean some data is unavailable? Thank you.

# ceph -v
ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)

# ceph -w
2013-04-15 08:57:27.712065 mon.0 [INF] pgmap v585220: 4864 pgs: 4443 active+clean, 1 active+degraded, 420 incomplete; 3177 GB data, 6504 GB used, 38186 GB / 44691 GB avail; 252/8168154 degraded (0.003%)
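Greg's rule of thumb (minimum copies = half of the requested size, rounded up) is easy to sanity-check; a small sketch (the arithmetic is mine, the rule is quoted from the message above):

```shell
# ceil(size / 2) using integer arithmetic
min_copies() { echo $(( ($1 + 1) / 2 )); }

echo "size=2 -> $(min_copies 2)"   # prints: size=2 -> 1
echo "size=3 -> $(min_copies 3)"   # prints: size=3 -> 2
# With rep size 2 the minimum is 1, so rebooting one node should
# normally leave every PG with at least one complete copy.
```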
Re: [ceph-users] Scrub shutdown the OSD process
On Mon, Apr 15, 2013 at 2:42 AM, Olivier Bonvalet <ceph.l...@daevel.fr> wrote:

Hi,

I have an OSD process which is regularly shut down by scrub, if I understand this trace correctly:

     0> 2013-04-15 09:29:53.708141 7f5a8e3cc700 -1 *** Caught signal (Aborted) ** in thread 7f5a8e3cc700
 ceph version 0.56.4-4-gd89ab0e (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)
 1: /usr/bin/ceph-osd() [0x7a6289]
 2: (()+0xeff0) [0x7f5aa08faff0]
 3: (gsignal()+0x35) [0x7f5a9f3841b5]
 4: (abort()+0x180) [0x7f5a9f386fc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5a9fc18dc5]
 6: (()+0xcb166) [0x7f5a9fc17166]
 7: (()+0xcb193) [0x7f5a9fc17193]
 8: (()+0xcb28e) [0x7f5a9fc1728e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x7c9) [0x8f9549]
 10: (ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]
 11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
 12: (PG::chunky_scrub()+0x2d9) [0x6c37f9]
 13: (PG::scrub()+0x145) [0x6c4e55]
 14: (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]
 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]
 16: (ThreadPool::WorkThread::entry()+0x10) [0x817980]
 17: (()+0x68ca) [0x7f5aa08f28ca]
 18: (clone()+0x6d) [0x7f5a9f421b6d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

I tried to format that OSD and re-inject it into the cluster, but after the recovery the problem still occurs. Since I don't see any hard drive errors in the kernel logs, what could the problem be?

Are you saying you saw this problem more than once, and so you completely wiped the OSD in question, then brought it back into the cluster, and now it's seeing this error again? Are any other OSDs experiencing this issue?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
Re: [ceph-users] ceph -w question
Which host did you reboot to cause the incomplete pgs? Do you happen to have the output of ceph -s or, even better, ceph pg dump from the period with the incomplete pgs? From what I can see, the pgs should not have gone incomplete (at least, not for long).
-Sam

On Mon, Apr 15, 2013 at 10:17 AM, Jeppesen, Nelson <nelson.jeppe...@disney.com> wrote:

OSD tree and OSD map inline. Thanks for the help. OSD 32 was removed due to a bad drive.

# id  weight  type name           up/down  reweight
-1    96      root default
-3    48        rack rack_aa12
-2    4           host osdhost001
0     1             osd.0         up       1
1     1             osd.1         up       1
2     1             osd.2         up       1
3     1             osd.3         up       1
-4    4           host osdhost003
10    1             osd.10        up       1
11    1             osd.11        up       1
8     1             osd.8         up       1
9     1             osd.9         up       1
-5    4           host osdhost004
12    1             osd.12        up       1
13    1             osd.13        up       1
14    1             osd.14        up       1
15    1             osd.15        up       1
-6    4           host osdhost005
16    1             osd.16        up       1
17    1             osd.17        up       1
18    1             osd.18        up       1
19    1             osd.19        up       1
-7    4           host osdhost006
20    1             osd.20        up       1
21    1             osd.21        up       1
22    1             osd.22        up       1
23    1             osd.23        up       1
-8    4           host osdhost007
24    1             osd.24        up       1
25    1             osd.25        up       1
26    1             osd.26        up       1
27    1             osd.27        up       1
-9    4           host osdhost008
28    1             osd.28        up       1
29    1             osd.29        up       1
30    1             osd.30        up       1
31    1             osd.31        up       1
-10   4           host osdhost009
33    1             osd.33        up       1
34    1             osd.34        up       1
35    1             osd.35        up       1
-11   4           host osdhost010
36    1             osd.36        up       1
37    1             osd.37        up       1
38    1             osd.38        up       1
39    1             osd.39        up       1
-12   4           host osdhost002
4     1             osd.4         up       1
5     1             osd.5         up       1
6     1             osd.6         up       1
7     1             osd.7         up       1
-13   4           host osdhost011
40    1             osd.40        up       1
41    1             osd.41        up       1
42    1             osd.42        up       1
43    1             osd.43        up       1
-14   4           host osdhost012
44    1             osd.44        up       1
45    1             osd.45        up       1
46    1             osd.46        up       1
47    1             osd.47        up       1
-27   48        rack rack_aa20
-15   4           host osdhost013
48    1             osd.48        up       1
49    1             osd.49        up       1
50    1             osd.50        up       1
51    1             osd.51        up       1
-16   4           host osdhost014
52    1             osd.52        up       1
53    1             osd.53        up       1
54    1             osd.54        up       1
55    1             osd.55        up       1
-17   4           host osdhost015
56    1             osd.56        up       1
57    1             osd.57        up       1
58    1             osd.58        up       1
59    1             osd.59        up       1
-18   4           host osdhost016
60    1             osd.60        up       1
61    1             osd.61        up       1
62    1
Re: [ceph-users] Scrub shutdown the OSD process
On Monday, April 15, 2013 at 10:57 -0700, Gregory Farnum wrote:

On Mon, Apr 15, 2013 at 10:19 AM, Olivier Bonvalet <ceph.l...@daevel.fr> wrote:

On Monday, April 15, 2013 at 10:16 -0700, Gregory Farnum wrote:

Are you saying you saw this problem more than once, and so you completely wiped the OSD in question, then brought it back into the cluster, and now it's seeing this error again?

Yes, it's exactly that.

Are any other OSDs experiencing this issue?

No, only this one has the problem.

Did you run scrubs while this node was out of the cluster? If you wiped the data and this is recurring, then this is apparently an issue with the cluster state, not just one node, and any other primary for the broken PG(s) should crash as well. Can you verify by taking this one down and then doing a full scrub?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

Also note that no PG is marked corrupted. I only have PGs in active+remapped or active+degraded.
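Greg's verification step could look something like this (a sketch only; the PG id is a made-up example and the exact subcommand set should be checked against the running version, though scrub commands of this shape exist in bobtail):

```shell
# Take the suspect OSD out (osd.25, per the log path in the original post)
ceph osd out 25

# After recovery settles, scrub the PGs it used to host, e.g.:
ceph pg scrub 3.8          # hypothetical PG id -- substitute a real one
ceph pg deep-scrub 3.8

# Or kick off scrubs on a whole OSD at a time:
ceph osd scrub 12
```

If another primary crashes on the same PG with the suspect OSD out, the corruption lives in the cluster state rather than on that one disk.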
Re: [ceph-users] Upgrade stale PG
Ping, any ideas? A week later and it is still the same: 300 pgs stuck stale. I have since seen a few references recommending that there be no gaps in the OSD numbers. Mine has gaps. Might this be the cause of my problem?

Darryl

On 04/05/13 07:27, Darryl Bond wrote:

I have a 3-node ceph cluster with 6 disks in each node. I upgraded from Bobtail 0.56.3 to 0.56.4 last night. Before I started the upgrade, ceph status reported HEALTH_OK. After upgrading and restarting the first node, the status ended up at HEALTH_WARN 133 pgs stale; 133 pgs stuck stale.

After checking ceph health detail, I checked a few random stuck pgs; they all said:

# ceph pg 3.8 query
pgid currently maps to no osd

I decided to continue with the upgrade; after upgrading the second node there were 200 total stuck, and after the 3rd, 300. The cluster is now at 0.56.4 but still reports 300 pgs stuck stale after 12 hours.

# ceph status
   health HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
   monmap e1: 3 mons at {a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0}, election epoch 8668, quorum 0,1,2 a,b,c
   osdmap e976: 18 osds: 18 up, 18 in
   pgmap v428986: 5148 pgs: 4848 active+clean, 300 stale+active+clean; 5643 GB data, 11305 GB used, 35831 GB / 47137 GB avail; 0B/s rd, 1136KB/s wr, 146op/s
   mdsmap e1: 0/0/1 up

Strangely, the stuck pgs all start with 3, e.g.:

HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
pg 3.f is stuck stale for 41062.735988, current state stale+active+clean, last acting [11,13]
pg 3.8 is stuck stale for 46905.375678, current state stale+active+clean, last acting [21,35]
pg 3.9 is stuck stale for 46905.375680, current state stale+active+clean, last acting [21,14]
pg 3.a is stuck stale for 46905.375681, current state stale+active+clean, last acting [21,24]
pg 3.b is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,33]
pg 3.4 is stuck stale for 46905.375683, current state stale+active+clean, last acting [21,33]
pg 3.5 is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,34]
pg 3.6 is stuck stale for 46905.375682, current state stale+active+clean, last acting [21,35]
pg 3.7 is stuck stale for 46905.375680, current state stale+active+clean, last acting [21,13]
pg 3.0 is stuck stale for 46905.375681, current state stale+active+clean, last acting [20,32]
pg 3.1 is stuck stale for 46905.375683, current state stale+active+clean, last acting [20,35]
pg 3.2 is stuck stale for 41965.928295, current state stale+active+clean, last acting [31,13]
pg 3.3 is stuck stale for 46905.375685, current state stale+active+clean, last acting [20,34]
pg 3.128 is stuck stale for 41965.928924, current state stale+active+clean, last acting [31,22]
pg 3.129 is stuck stale for 41062.736776, current state stale+active+clean, last acting [11,32]
pg 3.12a is stuck stale for 41062.736779, current state stale+active+clean, last acting [10,34]
pg 3.12b is stuck stale for 46905.376313, current state stale+active+clean, last acting [21,15]
pg 3.124 is stuck stale for 46905.376315, current state stale+active+clean, last acting [21,14]
pg 3.125 is stuck stale for 41062.736787, current state stale+active+clean, last acting [11,34]
pg 3.126 is stuck stale for 41062.736788, current state stale+active+clean, last acting [10,15]
pg 3.127 is stuck stale for 41965.928942, current state stale+active+clean, last acting [31,35]
pg 3.120 is stuck stale for 41965.928944, current state stale+active+clean, last acting [30,35]
pg 3.121 is stuck stale for 41062.736795, current state stale+active+clean, last acting [10,33]
pg 3.122 is stuck stale for 41062.736796, current state stale+active+clean, last acting [10,12]
pg 3.123 is stuck stale for 41965.928918, current state stale+active+clean, last acting [30,13]
pg 3.11c is stuck stale for 41965.928921, current state stale+active+clean, last acting [30,33]
pg 3.11d is stuck stale for 41965.928921, current state stale+active+clean, last acting [30,24]
pg 3.11e is stuck stale for 46905.376347, current state stale+active+clean, last acting [21,32]
pg 3.11f is stuck stale for 41965.928927, current state stale+active+clean, last acting [31,33]
pg 3.118 is stuck stale for 41062.736804, current state stale+active+clean, last acting [10,14]
pg 3.119 is stuck stale for 41062.736804, current state stale+active+clean, last acting [10,15]
etc

[root@ceph1 ~]# ceph pg dump_stuck stale
ok
pg_stat  objects  mip  degr  unf  bytes  log  disklog  state  state_stamp  v  reported  up  acting  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
3.f  250  0  0  10485760  0  8162  8162  stale+active+clean  2013-03-21 10:08:15.888399  57'53  46'1250  [11,13]  [11,13]  57'53  2013-03-21 10:08:15.888347  0'0  2013-03-20 10:08:04.434172
3.8  21  0  0  0  88080384  6314  6314  stale+active+clean  2013-03-21 16:09:05.501311  57'41  67'1544  [21,35]  [21,35]  57'41  2013-03-21 11:56:20.557416
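Since pg query says the pgid "currently maps to no osd", one quick check is whether the OSD ids in the stale acting sets still exist in the cluster. A sketch that pulls the acting OSDs out of a stuck-stale report (only two sample lines are embedded here; against the live cluster one would feed it the full output of ceph pg dump_stuck stale and compare the list with ceph osd ls):

```shell
# Extract the distinct OSD ids referenced by "last acting [x,y]" entries.
report='pg 3.f is stuck stale, last acting [11,13]
pg 3.8 is stuck stale, last acting [21,35]'
echo "$report" | grep -o '\[[0-9,]*\]' | tr -d '[]' | tr ',' '\n' | sort -un
# prints (one per line): 11 13 21 35
# Any id in this list that is missing from `ceph osd ls` would explain
# a PG that maps to no OSD.
```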
Re: [ceph-users] mon crash
Two is a strange choice for the number of monitors; you really want an odd number. With two, if either one fails (or you have a network fault), the cluster is dead because there's no majority.

That said, we certainly don't expect monitors to die when the network fault goes away. Searching the bug database reveals http://tracker.ceph.com/issues/4175, but that fix should have been included in v0.60. I'll ask Joao to have a look. Joao, can you have a look? :)

On 04/15/2013 01:56 PM, Craig Lewis wrote:

I'm doing a test of Ceph in two colo facilities. Since it's just a test, I only have 2 VMs running, one in each colo. Both VMs are running mon, mds, a single osd, and the RADOS gw. Cephx is disabled. I'm testing whether the latency between the two facilities (~20ms) is low enough that I can run a single Ceph cluster in both locations. If it doesn't work out, I'll run two independent Ceph clusters with manual replication.

This weekend, the connection between the two locations was degraded. The link had 37% packet loss for less than a minute. When the link returned to normal, the re-elected mon leader crashed.

Is this a real bug, or did this happen because I'm only running 2 nodes? I'm trying to avoid bringing more nodes into this test. My VM infrastructure is pretty weak, and I'm afraid that more nodes would introduce more noise into the test.

I saw this happen once before (the primary colo had a UPS failure, causing a switch reboot). The same process crashed, with the same stack trace. When that happened, I ran sudo service ceph restart on the machine with the crashed mon, and everything started up fine. I haven't restarted anything this time.

I tried to recreate the problem by stopping and starting the VPN between the two locations, but that didn't trigger the crash. I have some more ideas on how to trigger it; I'll continue trying today.
arnulf@ceph0:~$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.2 LTS
Release:        12.04
Codename:       precise

arnulf@ceph0:~$ uname -a
Linux ceph0 3.5.0-27-generic #46~precise1-Ubuntu SMP Tue Mar 26 19:33:21 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

arnulf@ceph0:~$ cat /etc/apt/sources.list.d/ceph.list
deb http://ceph.com/debian-testing/ precise main

arnulf@ceph0:~$ ceph -v
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)

ceph-mon.log from the non-elected master, mon.b:

2013-04-13 07:57:39.445098 7fde958f4700 0 mon.b@1(peon).data_health(20) update_stats avail 85% total 17295768 used 1679152 avail 14738024
2013-04-13 07:58:35.150603 7fde950f3700 0 log [INF] : mon.b calling new monitor election
2013-04-13 07:58:35.150876 7fde950f3700 1 mon.b@1(electing).elector(20) init, last seen epoch 20
2013-04-13 07:58:39.445355 7fde958f4700 0 mon.b@1(electing).data_health(20) update_stats avail 85% total 17295768 used 1679152 avail 14738024
2013-04-13 07:58:40.192514 7fde958f4700 1 mon.b@1(electing).elector(21) init, last seen epoch 21
2013-04-13 07:58:43.748907 7fde93dee700 0 -- 192.168.22.62:6789/0 >> 192.168.2.62:6789/0 pipe(0x2c56500 sd=25 :6789 s=2 pgs=108 cs=1 l=0).fault, initiating reconnect
2013-04-13 07:58:43.786209 7fde93ff0700 0 -- 192.168.22.62:6789/0 >> 192.168.2.62:6789/0 pipe(0x2c56500 sd=8 :6789 s=1 pgs=108 cs=2 l=0).fault
2013-04-13 07:59:13.050245 7fde958f4700 1 mon.b@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-13 07:59:13.050277 7fde958f4700 1 mon.b@1(probing) e1 discarding message auth(proto 0 34 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-13 07:59:13.050285 7fde958f4700 1 mon.b@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
...

ceph-mon.log from the elected master, mon.a:

2013-04-13 07:57:41.756844 7f162be82700 0 mon.a@0(leader).data_health(20) update_stats avail 84% total 17295768 used 1797312 avail 14619864
2013-04-13 07:58:35.210875 7f162b681700 0 log [INF] : mon.a calling new monitor election
2013-04-13 07:58:35.211081 7f162b681700 1 mon.a@0(electing).elector(20) init, last seen epoch 20
2013-04-13 07:58:40.270547 7f162be82700 1 mon.a@0(electing).elector(21) init, last seen epoch 21
2013-04-13 07:58:41.757032 7f162be82700 0 mon.a@0(electing).data_health(20) update_stats avail 84% total 17295768 used 1797312 avail 14619864
2013-04-13 07:58:43.441306 7f162b681700 0 log [INF] : mon.a@0 won leader election with quorum 0,1
2013-04-13 07:58:43.560319 7f162b681700 0 log [INF] : pgmap v1684: 632 pgs: 632 active+clean; 9982 bytes data, 2079 MB used, 100266 MB / 102346 MB avail; 0B/s rd, 0B/s wr, 0op/s
2013-04-13 07:58:43.561722 7f162b681700 -1 mon/PaxosService.cc: In function 'void PaxosService::propose_pending()' thread 7f162b681700 time 2013-04-13 07:58:43.560456
mon/PaxosService.cc: 127: FAILED assert(have_pending)
 ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
 1: (PaxosService::propose_pending()+0x46d)
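Per the advice above about odd monitor counts, a third monitor would avoid the dead-cluster-on-partition problem regardless of this particular bug. A rough sketch of adding one (the monitor name, host, and address are hypothetical; cephx is disabled in this setup, so no keyring step is shown — check the deployment docs for your exact version before running):

```shell
# On the new host, bootstrap mon.c from the current monmap:
ceph mon getmap -o /tmp/monmap
ceph-mon --mkfs -i c --monmap /tmp/monmap

# Register it with the cluster and start it (address is an example):
ceph mon add c 192.168.2.63:6789
service ceph start mon.c
```

With three monitors, losing any one of them (or a clean two-way partition) still leaves a 2-of-3 majority.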
Re: [ceph-users] mon crash
I'd bet that's 3495; it looks and sounds really, really similar. A lot of the devs are at a conference, but if you see Joao on IRC he'd know for sure.

On 04/15/2013 04:56 PM, Craig Lewis wrote:

I'm doing a test of Ceph in two colo facilities. Since it's just a test, I only have 2 VMs running, one in each colo. Both VMs are running mon, mds, a single osd, and the RADOS gw. Cephx is disabled. I'm testing whether the latency between the two facilities (~20ms) is low enough that I can run a single Ceph cluster in both locations. If it doesn't work out, I'll run two independent Ceph clusters with manual replication.

This weekend, the connection between the two locations was degraded. The link had 37% packet loss for less than a minute. When the link returned to normal, the re-elected mon leader crashed.

Is this a real bug, or did this happen because I'm only running 2 nodes? I'm trying to avoid bringing more nodes into this test. My VM infrastructure is pretty weak, and I'm afraid that more nodes would introduce more noise into the test.

I saw this happen once before (the primary colo had a UPS failure, causing a switch reboot). The same process crashed, with the same stack trace. When that happened, I ran sudo service ceph restart on the machine with the crashed mon, and everything started up fine. I haven't restarted anything this time.

I tried to recreate the problem by stopping and starting the VPN between the two locations, but that didn't trigger the crash. I have some more ideas on how to trigger it; I'll continue trying today.
arnulf@ceph0:~$ lsb_release -a
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.2 LTS
Release:        12.04
Codename:       precise

arnulf@ceph0:~$ uname -a
Linux ceph0 3.5.0-27-generic #46~precise1-Ubuntu SMP Tue Mar 26 19:33:21 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

arnulf@ceph0:~$ cat /etc/apt/sources.list.d/ceph.list
deb http://ceph.com/debian-testing/ precise main

arnulf@ceph0:~$ ceph -v
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)

ceph-mon.log from the non-elected master, mon.b:

2013-04-13 07:57:39.445098 7fde958f4700 0 mon.b@1(peon).data_health(20) update_stats avail 85% total 17295768 used 1679152 avail 14738024
2013-04-13 07:58:35.150603 7fde950f3700 0 log [INF] : mon.b calling new monitor election
2013-04-13 07:58:35.150876 7fde950f3700 1 mon.b@1(electing).elector(20) init, last seen epoch 20
2013-04-13 07:58:39.445355 7fde958f4700 0 mon.b@1(electing).data_health(20) update_stats avail 85% total 17295768 used 1679152 avail 14738024
2013-04-13 07:58:40.192514 7fde958f4700 1 mon.b@1(electing).elector(21) init, last seen epoch 21
2013-04-13 07:58:43.748907 7fde93dee700 0 -- 192.168.22.62:6789/0 >> 192.168.2.62:6789/0 pipe(0x2c56500 sd=25 :6789 s=2 pgs=108 cs=1 l=0).fault, initiating reconnect
2013-04-13 07:58:43.786209 7fde93ff0700 0 -- 192.168.22.62:6789/0 >> 192.168.2.62:6789/0 pipe(0x2c56500 sd=8 :6789 s=1 pgs=108 cs=2 l=0).fault
2013-04-13 07:59:13.050245 7fde958f4700 1 mon.b@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-13 07:59:13.050277 7fde958f4700 1 mon.b@1(probing) e1 discarding message auth(proto 0 34 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
2013-04-13 07:59:13.050285 7fde958f4700 1 mon.b@1(probing) e1 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client elsewhere; we are not in quorum
...

ceph-mon.log from the elected master, mon.a:

2013-04-13 07:57:41.756844 7f162be82700 0 mon.a@0(leader).data_health(20) update_stats avail 84% total 17295768 used 1797312 avail 14619864
2013-04-13 07:58:35.210875 7f162b681700 0 log [INF] : mon.a calling new monitor election
2013-04-13 07:58:35.211081 7f162b681700 1 mon.a@0(electing).elector(20) init, last seen epoch 20
2013-04-13 07:58:40.270547 7f162be82700 1 mon.a@0(electing).elector(21) init, last seen epoch 21
2013-04-13 07:58:41.757032 7f162be82700 0 mon.a@0(electing).data_health(20) update_stats avail 84% total 17295768 used 1797312 avail 14619864
2013-04-13 07:58:43.441306 7f162b681700 0 log [INF] : mon.a@0 won leader election with quorum 0,1
2013-04-13 07:58:43.560319 7f162b681700 0 log [INF] : pgmap v1684: 632 pgs: 632 active+clean; 9982 bytes data, 2079 MB used, 100266 MB / 102346 MB avail; 0B/s rd, 0B/s wr, 0op/s
2013-04-13 07:58:43.561722 7f162b681700 -1 mon/PaxosService.cc: In function 'void PaxosService::propose_pending()' thread 7f162b681700 time 2013-04-13 07:58:43.560456
mon/PaxosService.cc: 127: FAILED assert(have_pending)
 ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
 1: (PaxosService::propose_pending()+0x46d) [0x4dee3d]
 2: (MDSMonitor::tick()+0x1c62) [0x51cdd2]
 3: (MDSMonitor::on_active()+0x1a) [0x512ada]
 4: (PaxosService::_active()+0x31d) [0x4e067d]
 5: (Context::complete(int)+0xa) [0x4b7b4a]
 6:
[ceph-users] Ceph error: active+clean+scrubbing+deep
Dear all,

I use Ceph Storage. Recently, I often get this message:

mon.0 [INF] pgmap v277690: 640 pgs: 639 active+clean, 1 active+clean+scrubbing+deep; 14384 GB data, 14409 GB used, 90007 GB / 107 TB avail

It doesn't seem right. I tried restarting, but that didn't help, and it slows my system down. I use ceph 0.56.4, kernel 3.8.6-1.el6.elrepo.x86_64.

How can I fix it?
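For what it's worth, active+clean+scrubbing+deep is a normal PG state rather than an error: deep scrub periodically reads and verifies object contents, which is I/O-heavy. If the concern is the load, a sketch of throttling it at runtime (these option names exist in bobtail, but the values below are illustrative assumptions, and the wildcard form of osd tell may not be supported on every version; per-OSD ceph osd tell 0 injectargs ... also works):

```shell
# Back off scrubbing when the host load is already high
ceph osd tell \* injectargs '--osd-scrub-load-threshold 0.5'

# Spread deep scrubs out over a longer interval (value in seconds; 30 days here)
ceph osd tell \* injectargs '--osd-deep-scrub-interval 2592000'
```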