[ceph-users] Scrub shutdown the OSD process

2013-04-15 Thread Olivier Bonvalet
Hi,

I have an OSD process which is regularly shut down by scrub, if I
understand this trace correctly:

 0 2013-04-15 09:29:53.708141 7f5a8e3cc700 -1 *** Caught signal (Aborted) 
**
 in thread 7f5a8e3cc700

 ceph version 0.56.4-4-gd89ab0e (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)
 1: /usr/bin/ceph-osd() [0x7a6289]
 2: (()+0xeff0) [0x7f5aa08faff0]
 3: (gsignal()+0x35) [0x7f5a9f3841b5]
 4: (abort()+0x180) [0x7f5a9f386fc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5a9fc18dc5]
 6: (()+0xcb166) [0x7f5a9fc17166]
 7: (()+0xcb193) [0x7f5a9fc17193]
 8: (()+0xcb28e) [0x7f5a9fc1728e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x7c9) [0x8f9549]
 10: (ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]
 11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
 12: (PG::chunky_scrub()+0x2d9) [0x6c37f9]
 13: (PG::scrub()+0x145) [0x6c4e55]
 14: (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]
 15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]
 16: (ThreadPool::WorkThread::entry()+0x10) [0x817980]
 17: (()+0x68ca) [0x7f5aa08f28ca]
 18: (clone()+0x6d) [0x7f5a9f421b6d]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   0/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 hadoop
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -1/-1 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 1
  max_new 1000
  log_file /var/log/ceph/osd.25.log
--- end dump of recent events ---


I tried to reformat that OSD and re-inject it into the cluster, but after
the recovery the problem still occurs.

Since I don't see any hard drive errors in the kernel logs, what could the
problem be?
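
Before the next crash I plan to raise the debug level on that OSD and
trigger a scrub by hand to capture more detail, assuming osd.25 is the
affected daemon (per the log path above), that this injectargs syntax works
on 0.56, and with <pgid> as a placeholder:

# raise logging on the affected OSD (and revert it afterwards, the log grows fast)
ceph osd tell 25 injectargs '--debug-osd 20 --debug-ms 1'
# force a scrub of one PG hosted on osd.25 to try to reproduce the crash
ceph pg scrub <pgid>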





Re: [ceph-users] SL4500 as a storage machine

2013-04-15 Thread Mark Nelson

On 04/15/2013 03:25 AM, Stas Oskin wrote:

Hi,

Like I said, it's just my instinct.  For a 180TB (raw) cluster
you've got some tough choices to make.  Some options might include:

1) high density and low cost by just sticking a bunch of 3TB drives in
5 2U nodes and make sure you don't fill the cluster past ~75% (which
you probably don't want to do from a performance perspective
anyway).  Just acknowledge that during failure/recovery there's
going to be a ton of traffic flying around between the remaining 4
nodes.

2) Lower density (1-2TB) drives and more 2U nodes for higher
performance but lower density and greater expense.

3) high eventual density and low eventual cost by buying 2U nodes
that are only partially filled with 3TB drives with the assumption
that the cluster is going to grow larger down the road.

4) 15 4-drive 1U nodes for less impact during recovery but greater
expense and lower density.

All of these options have benefits and downsides.  For a production
cluster I'd want more than 5 nodes, but node count wouldn't be the only
consideration (cost, density, performance, etc. would all play a part).


  To summarize, you recommend focusing on 2U servers rather than 4U
(HP, SuperMicro and so on), and the best strategy seems to be to start filling
them with 3TB disks, spreading them over the servers evenly.


It's not so much about the chassis size, but how much capacity you lose 
(and data you have to re-replicate) on the rest of the cluster during an 
outage.  An SL4500 chassis with 2 nodes is going to be different from a 
SL4500 chassis with 1 node even though in both cases the package is 
still 4U.  You lose density with 2 nodes per chassis but double the 
overall number of nodes and potentially improve performance.  If you'd 
like more in-depth recommendations about your cluster design, we 
(Inktank) do provide consulting services to look at your specific 
requirements and help you weigh all of these different factors when 
building your cluster.




By the way, why are 5 servers so important? Why not 3 or 7, for that matter?


It's not, really.  You just said 180TB, so with 2U servers and 3TB drives 
you can do that in 5 nodes.  You could also do it in 15-20 1U nodes, 
or 2 36-drive 4U nodes.  The fewer servers you have, the greater the 
impact of a server outage.  I.e. if you have 2 36-drive servers and you 
lose one, you've lost half the cluster capacity, which is a big deal if 
you were already over 50% disk utilization.  The trade-off is that 20 1U 
servers take up a lot more space and cost more than 2 4U boxes.
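
To make the 5-node math concrete, here's the back-of-the-envelope version
(the 12-drives-per-2U figure is just an assumed density, not something you
told me):

# rough sizing and failure-impact math
drives_per_node=12; drive_tb=3; raw_tb=180
nodes=$(( raw_tb / (drives_per_node * drive_tb) ))   # 180 / 36 = 5 nodes
echo "$nodes nodes; losing 1 node means ~$(( 100 / nodes ))% of raw capacity must re-replicate"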


Mark



Thanks again,
Stas.




[ceph-users] RBD snapshots are not "readable" because of LVM?

2013-04-15 Thread Olivier Bonvalet
Hi,

I'm trying to map an RBD snapshot which contains an LVM PV.

I can do the "map":
rbd map hdd3copies/jason@20130415-065314 --id alg

Then pvscan works:
pvscan | grep rbd
  PV /dev/rbd58   VG vg-jason   lvm2 [19,94 GiB / 1,44 GiB free]

But activating the LVs doesn't work:
# vgchange -ay vg-jason
  device-mapper: reload ioctl failed: Argument invalide
  Internal error: Maps lock 13746176  unlock 13889536
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  device-mapper: reload ioctl failed: Argument invalide
  7 logical volume(s) in volume group vg-jason now active

# blockdev --getsize64 /dev/mapper/vg--jason-*
0
0
0
0
0
0
0

Is this a problem with LVM, or should I not be using snapshots like that?
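
Or should I clone the snapshot into a writable image first and map that
instead? Something like this, assuming the image is format 2, that the
kernel client can map clones, and with "jason-restore" as an arbitrary
name:

# protect the snapshot, clone it, then activate LVM on the clone
rbd snap protect hdd3copies/jason@20130415-065314
rbd clone hdd3copies/jason@20130415-065314 hdd3copies/jason-restore
rbd map hdd3copies/jason-restore --id alg
vgchange -ay vg-jason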




Re: [ceph-users] ceph -w question

2013-04-15 Thread Samuel Just
Can you post the output of ceph osd tree?
-Sam

On Mon, Apr 15, 2013 at 9:52 AM, Jeppesen, Nelson
nelson.jeppe...@disney.com wrote:
 Thanks for the help but how do I track down this issue? If data is 
 inaccessible, that's a very bad thing given this is production.

 # ceph osd dump | grep pool
 pool 13 '.rgw.buckets' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 
 4800 pgp_num 4800 last_change 1198 owner 0
 pool 14 '.rgw' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 
 pgp_num 8 last_change 242 owner 18446744073709551615
 pool 15 '.rgw.gc' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 
 pgp_num 8 last_change 243 owner 18446744073709551615
 pool 16 '.rgw.control' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 
 8 pgp_num 8 last_change 244 owner 18446744073709551615
 pool 17 '.users.uid' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 
 pgp_num 8 last_change 246 owner 0
 pool 18 '.users.email' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 
 8 pgp_num 8 last_change 248 owner 0
 pool 19 '.users' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 
 pgp_num 8 last_change 250 owner 0
 pool 20 '.usage' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 
 pgp_num 8 last_change 256 owner 18446744073709551615
 pool 21 '.users.swift' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 
 8 pgp_num 8 last_change 1138 owner 0

 Nelson Jeppesen
Disney Technology Solutions and Services
Phone 206-588-5001

 -Original Message-
 From: Gregory Farnum [mailto:g...@inktank.com]
 Sent: Monday, April 15, 2013 9:34 AM
 To: Jeppesen, Nelson
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] ceph -w question

 Incomplete means that there are fewer than the minimum number of copies of the 
 placement group (by default, half of the requested size, rounded up).
 In general, rebooting one node shouldn't do that unless you've changed your 
 minimum size on the pool, and it does mean that data in those PGs is 
 inaccessible.
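 You can check what a pool is set to with something like this (assuming
 0.56 exposes min_size through the pool get command; <pool> is a
 placeholder):

 ceph osd pool get <pool> size
 ceph osd pool get <pool> min_size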
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Mon, Apr 15, 2013 at 9:01 AM, Jeppesen, Nelson 
 nelson.jeppe...@disney.com wrote:
 When I reboot any node in my prod environment with no activity I see
 incomplete pgs. Is that a concern? Does that mean some data is unavailable?
 Thank you.



 # ceph -v

 ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca)



 # ceph -w

 2013-04-15 08:57:27.712065 mon.0 [INF] pgmap v585220: 4864 pgs: 4443
 active+clean, 1 active+degraded, 420 incomplete; 3177 GB data, 6504 GB
 used, 38186 GB / 44691 GB avail; 252/8168154 degraded (0.003%)




Re: [ceph-users] Scrub shutdown the OSD process

2013-04-15 Thread Gregory Farnum
On Mon, Apr 15, 2013 at 2:42 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote:
 Hi,

 I have an OSD process which is regularly shut down by scrub, if I
 understand this trace correctly:

  0 2013-04-15 09:29:53.708141 7f5a8e3cc700 -1 *** Caught signal 
 (Aborted) **
  in thread 7f5a8e3cc700

  ceph version 0.56.4-4-gd89ab0e (d89ab0ea6fa8d0961cad82f6a81eccbd3bbd3f55)
  1: /usr/bin/ceph-osd() [0x7a6289]
  2: (()+0xeff0) [0x7f5aa08faff0]
  3: (gsignal()+0x35) [0x7f5a9f3841b5]
  4: (abort()+0x180) [0x7f5a9f386fc0]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f5a9fc18dc5]
  6: (()+0xcb166) [0x7f5a9fc17166]
  7: (()+0xcb193) [0x7f5a9fc17193]
  8: (()+0xcb28e) [0x7f5a9fc1728e]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
 const*)+0x7c9) [0x8f9549]
  10: (ReplicatedPG::_scrub(ScrubMap)+0x1a78) [0x57a038]
  11: (PG::scrub_compare_maps()+0xeb8) [0x696c18]
  12: (PG::chunky_scrub()+0x2d9) [0x6c37f9]
  13: (PG::scrub()+0x145) [0x6c4e55]
  14: (OSD::ScrubWQ::_process(PG*)+0xc) [0x64048c]
  15: (ThreadPool::worker(ThreadPool::WorkThread*)+0x879) [0x815179]
  16: (ThreadPool::WorkThread::entry()+0x10) [0x817980]
  17: (()+0x68ca) [0x7f5aa08f28ca]
  18: (clone()+0x6d) [0x7f5a9f421b6d]
  NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
 interpret this.

 --- logging levels ---
0/ 5 none
0/ 1 lockdep
0/ 1 context
1/ 1 crush
1/ 5 mds
1/ 5 mds_balancer
1/ 5 mds_locker
1/ 5 mds_log
1/ 5 mds_log_expire
1/ 5 mds_migrator
0/ 1 buffer
0/ 1 timer
0/ 1 filer
0/ 1 striper
0/ 1 objecter
0/ 5 rados
0/ 5 rbd
0/ 5 journaler
0/ 5 objectcacher
0/ 5 client
0/ 5 osd
0/ 5 optracker
0/ 5 objclass
1/ 3 filestore
1/ 3 journal
0/ 5 ms
1/ 5 mon
0/10 monc
0/ 5 paxos
0/ 5 tp
1/ 5 auth
1/ 5 crypto
1/ 1 finisher
1/ 5 heartbeatmap
1/ 5 perfcounter
1/ 5 rgw
1/ 5 hadoop
1/ 5 javaclient
1/ 5 asok
1/ 1 throttle
   -1/-1 (syslog threshold)
   -1/-1 (stderr threshold)
   max_recent 1
   max_new 1000
   log_file /var/log/ceph/osd.25.log
 --- end dump of recent events ---


 I tried to reformat that OSD and re-inject it into the cluster, but after
 the recovery the problem still occurs.

 Since I don't see any hard drive errors in the kernel logs, what could the
 problem be?

Are you saying you saw this problem more than once, and so you
completely wiped the OSD in question, then brought it back into the
cluster, and now it's seeing this error again?
Are any other OSDs experiencing this issue?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


Re: [ceph-users] ceph -w question

2013-04-15 Thread Samuel Just
Which host did you reboot to cause the incomplete pgs?  Do you happen
to have the output of ceph -s or, even better, ceph pg dump from the
period with the incomplete pgs?  From what I can see, the pgs should
not have gone incomplete (at least, not for long).
-Sam

On Mon, Apr 15, 2013 at 10:17 AM, Jeppesen, Nelson
nelson.jeppe...@disney.com wrote:
 OSD tree and OSD map inline. Thanks for the help. OSD 32 was removed due to a 
 bad drive.

 # id  weight  type name   up/down reweight
 -1  96  root default
 -3  48  rack rack_aa12
 -2  4   host osdhost001
 0   1   osd.0   up  1
 1   1   osd.1   up  1
 2   1   osd.2   up  1
 3   1   osd.3   up  1
 -4  4   host osdhost003
 10  1   osd.10  up  1
 11  1   osd.11  up  1
 8   1   osd.8   up  1
 9   1   osd.9   up  1
 -5  4   host osdhost004
 12  1   osd.12  up  1
 13  1   osd.13  up  1
 14  1   osd.14  up  1
 15  1   osd.15  up  1
 -6  4   host osdhost005
 16  1   osd.16  up  1
 17  1   osd.17  up  1
 18  1   osd.18  up  1
 19  1   osd.19  up  1
 -7  4   host osdhost006
 20  1   osd.20  up  1
 21  1   osd.21  up  1
 22  1   osd.22  up  1
 23  1   osd.23  up  1
 -8  4   host osdhost007
 24  1   osd.24  up  1
 25  1   osd.25  up  1
 26  1   osd.26  up  1
 27  1   osd.27  up  1
 -9  4   host osdhost008
 28  1   osd.28  up  1
 29  1   osd.29  up  1
 30  1   osd.30  up  1
 31  1   osd.31  up  1
 -10 4   host osdhost009
 33  1   osd.33  up  1
 34  1   osd.34  up  1
 35  1   osd.35  up  1
 -11 4   host osdhost010
 36  1   osd.36  up  1
 37  1   osd.37  up  1
 38  1   osd.38  up  1
 39  1   osd.39  up  1
 -12 4   host osdhost002
 4   1   osd.4   up  1
 5   1   osd.5   up  1
 6   1   osd.6   up  1
 7   1   osd.7   up  1
 -13 4   host osdhost011
 40  1   osd.40  up  1
 41  1   osd.41  up  1
 42  1   osd.42  up  1
 43  1   osd.43  up  1
 -14 4   host osdhost012
 44  1   osd.44  up  1
 45  1   osd.45  up  1
 46  1   osd.46  up  1
 47  1   osd.47  up  1
 -27 48  rack rack_aa20
 -15 4   host osdhost013
 48  1   osd.48  up  1
 49  1   osd.49  up  1
 50  1   osd.50  up  1
 51  1   osd.51  up  1
 -16 4   host osdhost014
 52  1   osd.52  up  1
 53  1   osd.53  up  1
 54  1   osd.54  up  1
 55  1   osd.55  up  1
 -17 4   host osdhost015
 56  1   osd.56  up  1
 57  1   osd.57  up  1
 58  1   osd.58  up  1
 59  1   osd.59  up  1
 -18 4   host osdhost016
 60  1   osd.60  up  1
 61  1   osd.61  up  1
 62  1   

Re: [ceph-users] Scrub shutdown the OSD process

2013-04-15 Thread Olivier Bonvalet
On Monday, April 15, 2013 at 10:57 -0700, Gregory Farnum wrote:
 On Mon, Apr 15, 2013 at 10:19 AM, Olivier Bonvalet ceph.l...@daevel.fr 
 wrote:
  On Monday, April 15, 2013 at 10:16 -0700, Gregory Farnum wrote:
  Are you saying you saw this problem more than once, and so you
  completely wiped the OSD in question, then brought it back into the
  cluster, and now it's seeing this error again?
 
  Yes, it's exactly that.
 
 
  Are any other OSDs experiencing this issue?
 
  No, only this one has the problem.
 
 Did you run scrubs while this node was out of the cluster? If you
 wiped the data and this is recurring then this is apparently an issue
 with the cluster state, not just one node, and any other primary for
 the broken PG(s) should crash as well. Can you verify by taking this
 one down and then doing a full scrub?
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 

Also note that no PG is marked corrupted; I only have PGs in active+remapped
or active+degraded states.
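
I will try what you suggest: take this OSD down and out, then force a scrub
of the remaining OSDs, roughly like this (the other OSD ids are
placeholders, and I'm assuming "ceph osd scrub" accepts a single OSD id on
0.56):

# stop and mark out the suspect OSD
service ceph stop osd.25
ceph osd out 25
# once recovery settles, scrub every other OSD
for id in <other osd ids>; do ceph osd scrub $id; done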



Re: [ceph-users] Upgrade stale PG

2013-04-15 Thread Darryl Bond

Ping,
Any ideas? A week later and it is still the same: 300 pgs stuck stale.
I have since seen a few references recommending that there be no gaps
in the OSD numbering. Mine has gaps. Might that be the cause of my problem?
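
In the meantime, I was going to check whether pool 3 and its CRUSH rule
still look sane, along these lines (file paths are just examples):

# which pool is id 3, and which crush_ruleset does it use?
ceph osd dump | grep 'pool 3 '
# decompile the crush map to look for anything odd around the OSD id gaps
ceph osd getcrushmap -o /tmp/crushmap
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt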


Darryl

On 04/05/13 07:27, Darryl Bond wrote:

I have a 3 node ceph cluster with 6 disks in each node.
I upgraded from Bobtail 0.56.3 to 0.56.4 last night.
Before I started the upgrade, ceph status reported HEALTH_OK.
After upgrading and restarting the first node the status ended up at
HEALTH_WARN 133 pgs stale; 133 pgs stuck stale
After checking ceph health detail, I queried a few random stuck pgs; all said:
# ceph pg 3.8 query
pgid currently maps to no osd

I decided to continue with the upgrade and after upgrading the second
node there were 200 total stuck and after the 3rd 300.

The cluster is now at 0.56.4 but still reports 300 pgs stuck stale after
12 hours
# ceph status
 health HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
 monmap e1: 3 mons at
{a=192.168.6.101:6789/0,b=192.168.6.102:6789/0,c=192.168.6.103:6789/0},
election epoch 8668, quorum 0,1,2 a,b,c
 osdmap e976: 18 osds: 18 up, 18 in
  pgmap v428986: 5148 pgs: 4848 active+clean, 300 stale+active+clean;
5643 GB data, 11305 GB used, 35831 GB / 47137 GB avail; 0B/s rd,
1136KB/s wr, 146op/s
 mdsmap e1: 0/0/1 up

Strangely, the stuck pgs all start with 3, e.g.
HEALTH_WARN 300 pgs stale; 300 pgs stuck stale
pg 3.f is stuck stale for 41062.735988, current state
stale+active+clean, last acting [11,13]
pg 3.8 is stuck stale for 46905.375678, current state
stale+active+clean, last acting [21,35]
pg 3.9 is stuck stale for 46905.375680, current state
stale+active+clean, last acting [21,14]
pg 3.a is stuck stale for 46905.375681, current state
stale+active+clean, last acting [21,24]
pg 3.b is stuck stale for 46905.375682, current state
stale+active+clean, last acting [21,33]
pg 3.4 is stuck stale for 46905.375683, current state
stale+active+clean, last acting [21,33]
pg 3.5 is stuck stale for 46905.375682, current state
stale+active+clean, last acting [21,34]
pg 3.6 is stuck stale for 46905.375682, current state
stale+active+clean, last acting [21,35]
pg 3.7 is stuck stale for 46905.375680, current state
stale+active+clean, last acting [21,13]
pg 3.0 is stuck stale for 46905.375681, current state
stale+active+clean, last acting [20,32]
pg 3.1 is stuck stale for 46905.375683, current state
stale+active+clean, last acting [20,35]
pg 3.2 is stuck stale for 41965.928295, current state
stale+active+clean, last acting [31,13]
pg 3.3 is stuck stale for 46905.375685, current state
stale+active+clean, last acting [20,34]
pg 3.128 is stuck stale for 41965.928924, current state
stale+active+clean, last acting [31,22]
pg 3.129 is stuck stale for 41062.736776, current state
stale+active+clean, last acting [11,32]
pg 3.12a is stuck stale for 41062.736779, current state
stale+active+clean, last acting [10,34]
pg 3.12b is stuck stale for 46905.376313, current state
stale+active+clean, last acting [21,15]
pg 3.124 is stuck stale for 46905.376315, current state
stale+active+clean, last acting [21,14]
pg 3.125 is stuck stale for 41062.736787, current state
stale+active+clean, last acting [11,34]
pg 3.126 is stuck stale for 41062.736788, current state
stale+active+clean, last acting [10,15]
pg 3.127 is stuck stale for 41965.928942, current state
stale+active+clean, last acting [31,35]
pg 3.120 is stuck stale for 41965.928944, current state
stale+active+clean, last acting [30,35]
pg 3.121 is stuck stale for 41062.736795, current state
stale+active+clean, last acting [10,33]
pg 3.122 is stuck stale for 41062.736796, current state
stale+active+clean, last acting [10,12]
pg 3.123 is stuck stale for 41965.928918, current state
stale+active+clean, last acting [30,13]
pg 3.11c is stuck stale for 41965.928921, current state
stale+active+clean, last acting [30,33]
pg 3.11d is stuck stale for 41965.928921, current state
stale+active+clean, last acting [30,24]
pg 3.11e is stuck stale for 46905.376347, current state
stale+active+clean, last acting [21,32]
pg 3.11f is stuck stale for 41965.928927, current state
stale+active+clean, last acting [31,33]
pg 3.118 is stuck stale for 41062.736804, current state
stale+active+clean, last acting [10,14]
pg 3.119 is stuck stale for 41062.736804, current state
stale+active+clean, last acting [10,15]
etc
[root@ceph1 ~]# ceph pg dump_stuck stale
ok
pg_stat  objects  mip  degr  unf  bytes  log  disklog
  state  state_stamp  v  reported  up  acting  last_scrub
  scrub_stamp  last_deep_scrub  deep_scrub_stamp
3.f  25  0  0  0  104857600  8162  8162
  stale+active+clean  2013-03-21 10:08:15.888399  57'53  46'1250
  [11,13]  [11,13]  57'53  2013-03-21 10:08:15.888347  0'0
  2013-03-20 10:08:04.434172
3.8  21  0  0  0  88080384  6314  6314
  stale+active+clean  2013-03-21 16:09:05.501311  57'41  67'1544
  [21,35]  [21,35]  57'41  2013-03-21 11:56:20.557416

Re: [ceph-users] mon crash

2013-04-15 Thread Dan Mick
Two is a strange choice for the number of monitors; you really want an odd 
number.  With two, if either one fails (or you have a network fault),
the cluster is dead because there's no majority.
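
If you can spare even a tiny third VM, adding a mon is roughly the
following (the mon name "c" and the address are placeholders, and I haven't
retested this on 0.60, so treat it as a sketch; with cephx disabled there is
no keyring step):

# on the new host: fetch the current monmap and initialize the new mon
ceph mon getmap -o /tmp/monmap
ceph-mon -i c --mkfs --monmap /tmp/monmap
# tell the cluster about it, then start mon.c and add it to ceph.conf everywhere
ceph mon add c <ip-of-third-host>:6789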

That said, we certainly don't expect monitors to die when the network 
fault goes away.  Searching the bug database reveals 
http://tracker.ceph.com/issues/4175, but that fix should have been included 
in v0.60.


I'll ask Joao to have a look.  Joao, can you have a look?  :)



On 04/15/2013 01:56 PM, Craig Lewis wrote:


I'm doing a test of Ceph in two colo facilities.  Since it's just a
test, I only have 2 VMs running, one in each colo.  Both VMs are running
mon, mds, a single osd, and the RADOS gw.  Cephx is disabled.  I'm
testing if the latency between the two facilities (~20ms) is low enough
that I can run a single Ceph cluster in both locations.  If it doesn't
work out, I'll run two independent Ceph clusters with manual replication.

This weekend, the connection between the two locations was degraded.
The link had 37% packet loss, for less than a minute. When the link
returned to normal, the re-elected mon leader crashed.

Is this a real bug, or did this happen because I'm only running 2
nodes?  I'm trying to avoid bringing more nodes into this test. My VM
infrastructure is pretty weak, and I'm afraid that more nodes would
introduce more noise in the test.

I saw this happen once before (the primary colo had a UPS failure,
causing a switch reboot).  The same process crashed, with the same stack
trace.  When that happened, I ran sudo service ceph restart on the
machine with the crashed mon, and everything started up fine.  I haven't
restarted anything this time.

I tried to recreate the problem by stopping and starting the VPN between
the two locations, but that didn't trigger the crash.  I have some more
ideas on how to trigger, I'll continue trying today.



arnulf@ceph0:~$ lsb_release -a
Distributor ID:Ubuntu
Description:Ubuntu 12.04.2 LTS
Release:12.04
Codename:precise

arnulf@ceph0:~$ uname -a
Linux ceph0 3.5.0-27-generic #46~precise1-Ubuntu SMP Tue Mar 26 19:33:21
UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

arnulf@ceph0:~$ cat /etc/apt/sources.list.d/ceph.list
deb http://ceph.com/debian-testing/ precise main

arnulf@ceph0:~$ ceph -v
ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)


ceph-mon.log from the non-elected master, mon.b:
2013-04-13 07:57:39.445098 7fde958f4700  0 mon.b@1(peon).data_health(20)
update_stats avail 85% total 17295768 used 1679152 avail 14738024
2013-04-13 07:58:35.150603 7fde950f3700  0 log [INF] : mon.b calling new
monitor election
2013-04-13 07:58:35.150876 7fde950f3700  1 mon.b@1(electing).elector(20)
init, last seen epoch 20
2013-04-13 07:58:39.445355 7fde958f4700  0
mon.b@1(electing).data_health(20) update_stats avail 85% total 17295768
used 1679152 avail 14738024
2013-04-13 07:58:40.192514 7fde958f4700  1 mon.b@1(electing).elector(21)
init, last seen epoch 21
2013-04-13 07:58:43.748907 7fde93dee700  0 -- 192.168.22.62:6789/0 >>
192.168.2.62:6789/0 pipe(0x2c56500 sd=25 :6789 s=2 pgs=108 cs=1
l=0).fault, initiating reconnect
2013-04-13 07:58:43.786209 7fde93ff0700  0 -- 192.168.22.62:6789/0 >>
192.168.2.62:6789/0 pipe(0x2c56500 sd=8 :6789 s=1 pgs=108 cs=2 l=0).fault
2013-04-13 07:59:13.050245 7fde958f4700  1 mon.b@1(probing) e1
discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client
elsewhere; we are not in quorum
2013-04-13 07:59:13.050277 7fde958f4700  1 mon.b@1(probing) e1
discarding message auth(proto 0 34 bytes epoch 1) v1 and sending client
elsewhere; we are not in quorum
2013-04-13 07:59:13.050285 7fde958f4700  1 mon.b@1(probing) e1
discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client
elsewhere; we are not in quorum
...

ceph-mon.log from the elected master, mon.a:
2013-04-13 07:57:41.756844 7f162be82700  0
mon.a@0(leader).data_health(20) update_stats avail 84% total 17295768
used 1797312 avail 14619864
2013-04-13 07:58:35.210875 7f162b681700  0 log [INF] : mon.a calling new
monitor election
2013-04-13 07:58:35.211081 7f162b681700  1 mon.a@0(electing).elector(20)
init, last seen epoch 20
2013-04-13 07:58:40.270547 7f162be82700  1 mon.a@0(electing).elector(21)
init, last seen epoch 21
2013-04-13 07:58:41.757032 7f162be82700  0
mon.a@0(electing).data_health(20) update_stats avail 84% total 17295768
used 1797312 avail 14619864
2013-04-13 07:58:43.441306 7f162b681700  0 log [INF] : mon.a@0 won
leader election with quorum 0,1
2013-04-13 07:58:43.560319 7f162b681700  0 log [INF] : pgmap v1684: 632
pgs: 632 active+clean; 9982 bytes data, 2079 MB used, 100266 MB / 102346
MB avail; 0B/s rd, 0B/s wr, 0op/s
2013-04-13 07:58:43.561722 7f162b681700 -1 mon/PaxosService.cc: In
function 'void PaxosService::propose_pending()' thread 7f162b681700 time
2013-04-13 07:58:43.560456
mon/PaxosService.cc: 127: FAILED assert(have_pending)

  ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
  1: (PaxosService::propose_pending()+0x46d) 

Re: [ceph-users] mon crash

2013-04-15 Thread Matthew Roy
I'd bet that's issue 3495; it looks and sounds really, really similar. A lot
of the devs are at a conference, but if you see Joao on IRC he'd know
for sure.


On 04/15/2013 04:56 PM, Craig Lewis wrote:
 
 I'm doing a test of Ceph in two colo facilities.  Since it's just a
 test, I only have 2 VMs running, one in each colo.  Both VMs are running
 mon, mds, a single osd, and the RADOS gw.  Cephx is disabled.  I'm
 testing if the latency between the two facilities (~20ms) is low enough
 that I can run a single Ceph cluster in both locations.  If it doesn't
 work out, I'll run two independent Ceph clusters with manual replication.
 
 This weekend, the connection between the two locations was degraded. 
 The link had 37% packet loss, for less than a minute.  When the link
 returned to normal, the re-elected mon leader crashed.
 
 Is this a real bug, or did this happen because I'm only running 2
 nodes?  I'm trying to avoid bringing more nodes into this test.  My VM
 infrastructure is pretty weak, and I'm afraid that more nodes would
 introduce more noise in the test.
 
 I saw this happen once before (the primary colo had a UPS failure,
 causing a switch reboot).  The same process crashed, with the same stack
 trace.  When that happened, I ran sudo service ceph restart on the
 machine with the crashed mon, and everything started up fine.  I haven't
 restarted anything this time.
 
 I tried to recreate the problem by stopping and starting the VPN between
 the two locations, but that didn't trigger the crash.  I have some more
 ideas on how to trigger, I'll continue trying today.
 
 
 
 arnulf@ceph0:~$ lsb_release -a
 Distributor ID:Ubuntu
 Description:Ubuntu 12.04.2 LTS
 Release:12.04
 Codename:precise
 
 arnulf@ceph0:~$ uname -a
 Linux ceph0 3.5.0-27-generic #46~precise1-Ubuntu SMP Tue Mar 26 19:33:21
 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
 
 arnulf@ceph0:~$ cat /etc/apt/sources.list.d/ceph.list
 deb http://ceph.com/debian-testing/ precise main
 
 arnulf@ceph0:~$ ceph -v
 ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
 
 
 ceph-mon.log from the non-elected master, mon.b:
 2013-04-13 07:57:39.445098 7fde958f4700  0 mon.b@1(peon).data_health(20)
 update_stats avail 85% total 17295768 used 1679152 avail 14738024
 2013-04-13 07:58:35.150603 7fde950f3700  0 log [INF] : mon.b calling new
 monitor election
 2013-04-13 07:58:35.150876 7fde950f3700  1 mon.b@1(electing).elector(20)
 init, last seen epoch 20
 2013-04-13 07:58:39.445355 7fde958f4700  0
 mon.b@1(electing).data_health(20) update_stats avail 85% total 17295768
 used 1679152 avail 14738024
 2013-04-13 07:58:40.192514 7fde958f4700  1 mon.b@1(electing).elector(21)
 init, last seen epoch 21
 2013-04-13 07:58:43.748907 7fde93dee700  0 -- 192.168.22.62:6789/0 >>
 192.168.2.62:6789/0 pipe(0x2c56500 sd=25 :6789 s=2 pgs=108 cs=1
 l=0).fault, initiating reconnect
 2013-04-13 07:58:43.786209 7fde93ff0700  0 -- 192.168.22.62:6789/0 >>
 192.168.2.62:6789/0 pipe(0x2c56500 sd=8 :6789 s=1 pgs=108 cs=2 l=0).fault
 2013-04-13 07:59:13.050245 7fde958f4700  1 mon.b@1(probing) e1
 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client
 elsewhere; we are not in quorum
 2013-04-13 07:59:13.050277 7fde958f4700  1 mon.b@1(probing) e1
 discarding message auth(proto 0 34 bytes epoch 1) v1 and sending client
 elsewhere; we are not in quorum
 2013-04-13 07:59:13.050285 7fde958f4700  1 mon.b@1(probing) e1
 discarding message auth(proto 0 26 bytes epoch 1) v1 and sending client
 elsewhere; we are not in quorum
 ...
 
 ceph-mon.log from the elected master, mon.a:
 2013-04-13 07:57:41.756844 7f162be82700  0
 mon.a@0(leader).data_health(20) update_stats avail 84% total 17295768
 used 1797312 avail 14619864
 2013-04-13 07:58:35.210875 7f162b681700  0 log [INF] : mon.a calling new
 monitor election
 2013-04-13 07:58:35.211081 7f162b681700  1 mon.a@0(electing).elector(20)
 init, last seen epoch 20
 2013-04-13 07:58:40.270547 7f162be82700  1 mon.a@0(electing).elector(21)
 init, last seen epoch 21
 2013-04-13 07:58:41.757032 7f162be82700  0
 mon.a@0(electing).data_health(20) update_stats avail 84% total 17295768
 used 1797312 avail 14619864
 2013-04-13 07:58:43.441306 7f162b681700  0 log [INF] : mon.a@0 won
 leader election with quorum 0,1
 2013-04-13 07:58:43.560319 7f162b681700  0 log [INF] : pgmap v1684: 632
 pgs: 632 active+clean; 9982 bytes data, 2079 MB used, 100266 MB / 102346
 MB avail; 0B/s rd, 0B/s wr, 0op/s
 2013-04-13 07:58:43.561722 7f162b681700 -1 mon/PaxosService.cc: In
 function 'void PaxosService::propose_pending()' thread 7f162b681700 time
 2013-04-13 07:58:43.560456
 mon/PaxosService.cc: 127: FAILED assert(have_pending)
 
  ceph version 0.60 (f26f7a39021dbf440c28d6375222e21c94fe8e5c)
  1: (PaxosService::propose_pending()+0x46d) [0x4dee3d]
  2: (MDSMonitor::tick()+0x1c62) [0x51cdd2]
  3: (MDSMonitor::on_active()+0x1a) [0x512ada]
  4: (PaxosService::_active()+0x31d) [0x4e067d]
  5: (Context::complete(int)+0xa) [0x4b7b4a]
  6: 

[ceph-users] Ceph error: active+clean+scrubbing+deep

2013-04-15 Thread kakito
Dear all,

I use Ceph storage.

Recently, I often get this error:

mon.0 [INF] pgmap v277690: 640 pgs: 639 active+clean, 1 
active+clean+scrubbing+deep; 14384 GB data, 14409 GB used, 90007 GB / 107 TB 
avail. 

It doesn't seem correct.

I tried restarting, but that didn't help.

It slows my system down.

I use ceph 0.56.4, kernel 3.8.6-1.el6.elrepo.x86_64.

How can I fix it?!
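
Is there a way to at least see which pg is being deep scrubbed, and on
which OSDs? This is what I was thinking of, not sure it is the right
approach:

# show the pg(s) whose state includes "scrubbing" and their acting OSDs
ceph pg dump | grep scrubbing
ceph -s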


