Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Gregory Farnum
Has the OSD actually been detected as down yet?

You'll also need to set that min size on your existing pools (ceph
osd osd pool set <pool> min_size 1 or similar) to change their behavior;
the config option only takes effect for newly-created pools. (Thus the
default.)
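
For example, something like this should work, repeated for each pool (the pool
name rbd below is just a placeholder), and you can verify the result afterwards:

$ ceph osd pool set rbd min_size 1
$ ceph osd pool get rbd min_size
$ ceph osd dump | grep min_size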

On Thu, Mar 26, 2015 at 1:29 PM, Lee Revell rlrev...@gmail.com wrote:
 I added the osd pool default min size = 1 to test the behavior when 2 of 3
 OSDs are down, but the behavior is exactly the same as without it: when the
 2nd OSD is killed, all client writes start to block and these
 pipe.(stuff).fault messages begin:

 2015-03-26 16:08:50.775848 7fce177fe700  0 monclient: hunting for new mon
 2015-03-26 16:08:53.781133 7fce1c2f9700  0 -- 192.168.122.111:0/1011003 >>
 192.168.122.131:6789/0 pipe(0x7fce0c01d260 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7fce0c01d4f0).fault
 2015-03-26 16:09:00.009092 7fce1c3fa700  0 -- 192.168.122.111:0/1011003 >>
 192.168.122.141:6789/0 pipe(0x7fce1802dab0 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7fce1802dd40).fault
 2015-03-26 16:09:12.013147 7fce1c2f9700  0 -- 192.168.122.111:0/1011003 >>
 192.168.122.131:6789/0 pipe(0x7fce1802e740 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7fce1802e9d0).fault
 2015-03-26 16:10:06.013113 7fce1c2f9700  0 -- 192.168.122.111:0/1011003 >>
 192.168.122.131:6789/0 pipe(0x7fce1802df80 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7fce1801e600).fault
 2015-03-26 16:10:36.013166 7fce1c3fa700  0 -- 192.168.122.111:0/1011003 >>
 192.168.122.141:6789/0 pipe(0x7fce1802ebc0 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7fce1802ee50).fault

 Here is my ceph.conf:

 [global]
 fsid = db460aa2-5129-4aaa-8b2e-43eac727124e
 mon_initial_members = ceph-node-1
 mon_host = 192.168.122.121
 auth_cluster_required = cephx
 auth_service_required = cephx
 auth_client_required = cephx
 filestore_xattr_use_omap = true
 osd pool default size = 3
 osd pool default min size = 1
 public network = 192.168.122.0/24


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Lee Revell
Ah, thanks, got it. I wasn't thinking about the fact that running mons and OSDs
on the same node isn't a likely real world thing.

You have to admit that pipe/fault log message is a bit cryptic.

Thanks,

Lee
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Lee Revell
On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:

 Has the OSD actually been detected as down yet?


I believe it has, however I can't directly check because ceph health
starts to hang when I down the second node.


 You'll also need to set that min size on your existing pools (ceph
 osd pool set <pool> min_size 1 or similar) to change their behavior;
 the config option only takes effect for newly-created pools. (Thus the
 default.)


I've done this, however the behavior is the same:

$ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do ceph
osd pool set $f min_size 1; done
set pool 0 min_size to 1
set pool 1 min_size to 1
set pool 2 min_size to 1
set pool 3 min_size to 1
set pool 4 min_size to 1
set pool 5 min_size to 1
set pool 6 min_size to 1
set pool 7 min_size to 1
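
A slightly more defensive version of that loop, assuming ceph osd lspools
prints comma-separated id/name pairs as above, splits on the commas and takes
the name field instead of stripping digits:

$ for f in $(ceph osd lspools | tr ',' '\n' | awk '{print $2}'); do ceph osd pool set "$f" min_size 1; done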

$ ceph -w
cluster db460aa2-5129-4aaa-8b2e-43eac727124e
 health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
 monmap e3: 3 mons at {ceph-node-1=
192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0},
election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
 mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
 osdmap e362: 3 osds: 2 up, 2 in
  pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
25329 MB used, 12649 MB / 40059 MB avail
 840 active+clean

2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840
active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840
active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s
rd, 260 kB/s wr, 13 op/s
2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840
active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s
rd, 943 kB/s wr, 38 op/s
2015-03-26 17:26:01.058167 mon.0 [INF] pgmap v5916: 840 pgs: 840
active+clean; 7441 MB data, 25335 MB used, 12643 MB / 40059 MB avail; 0 B/s
rd, 10699 kB/s wr, 621 op/s

This is where I kill the second OSD:

2015-03-26 17:26:26.778461 7f4ebeffd700  0 monclient: hunting for new mon
2015-03-26 17:26:30.701099 7f4ec45f5700  0 -- 192.168.122.111:0/1007741 >>
192.168.122.141:6789/0 pipe(0x7f4ec0023200 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7f4ec0023490).fault
2015-03-26 17:26:42.701154 7f4ec44f4700  0 -- 192.168.122.111:0/1007741 >>
192.168.122.131:6789/0 pipe(0x7f4ec00251b0 sd=3 :0 s=1 pgs=0 cs=0 l=1
c=0x7f4ec0025440).fault

And all writes block until I bring back an OSD.

Lee
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Gregory Farnum
On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote:
 On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:

 Has the OSD actually been detected as down yet?


 I believe it has, however I can't directly check because ceph health
 starts to hang when I down the second node.

Oh. You need to keep a quorum of your monitors running (just the
monitor processes, not of everything in the system) or nothing at all
is going to work. That's how we prevent split brain issues.
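
If ceph health hangs because there is no quorum, you can still ask the
surviving monitor directly over its admin socket; a sketch, assuming the
default socket path and that ceph-node-1 is the mon still running:

$ ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node-1.asok mon_status

That prints the monitor's state (probing, electing, leader, peon), so you can
see it is stuck outside a quorum rather than just unresponsive.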



 You'll also need to set that min size on your existing pools (ceph
 osd pool set <pool> min_size 1 or similar) to change their behavior;
 the config option only takes effect for newly-created pools. (Thus the
 default.)


 I've done this, however the behavior is the same:

 $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do ceph osd
 pool set $f min_size 1; done
 set pool 0 min_size to 1
 set pool 1 min_size to 1
 set pool 2 min_size to 1
 set pool 3 min_size to 1
 set pool 4 min_size to 1
 set pool 5 min_size to 1
 set pool 6 min_size to 1
 set pool 7 min_size to 1

 $ ceph -w
 cluster db460aa2-5129-4aaa-8b2e-43eac727124e
  health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
  monmap e3: 3 mons at
 {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0},
 election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
  mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
  osdmap e362: 3 osds: 2 up, 2 in
   pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
 25329 MB used, 12649 MB / 40059 MB avail
  840 active+clean

 2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
 2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s
 rd, 260 kB/s wr, 13 op/s
 2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840
 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s
 rd, 943 kB/s wr, 38 op/s
 2015-03-26 17:26:01.058167 mon.0 [INF] pgmap v5916: 840 pgs: 840
 active+clean; 7441 MB data, 25335 MB used, 12643 MB / 40059 MB avail; 0 B/s
 rd, 10699 kB/s wr, 621 op/s

 this is where i kill the second OSD

 2015-03-26 17:26:26.778461 7f4ebeffd700  0 monclient: hunting for new mon
 2015-03-26 17:26:30.701099 7f4ec45f5700  0 -- 192.168.122.111:0/1007741 >>
 192.168.122.141:6789/0 pipe(0x7f4ec0023200 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7f4ec0023490).fault
 2015-03-26 17:26:42.701154 7f4ec44f4700  0 -- 192.168.122.111:0/1007741 >>
 192.168.122.131:6789/0 pipe(0x7f4ec00251b0 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7f4ec0025440).fault

 And all writes block until I bring back an OSD.

 Lee
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Steffen W Sørensen

 On 26/03/2015, at 23.36, Somnath Roy somnath@sandisk.com wrote:
 
 Got most portion of it, thanks !
 But, still not able to get when second node is down why with single monitor 
 in the cluster client is not able to connect ? 
 1 monitor can form a quorum and should be sufficient for a cluster to run.
To have quorum you need more than 50% of the monitors, which isn't possible with
one out of two, since 1 < (0.5*2 + 1), hence at least 3 monitors.
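
In general the quorum size is floor(n/2) + 1, so the number of monitor
failures you can tolerate only grows with every second monitor added; a quick
shell illustration:

$ for n in 1 2 3 4 5; do echo "$n mons: quorum $((n/2 + 1)), tolerates $(((n-1)/2)) down"; done
1 mons: quorum 1, tolerates 0 down
2 mons: quorum 2, tolerates 0 down
3 mons: quorum 2, tolerates 1 down
4 mons: quorum 3, tolerates 1 down
5 mons: quorum 3, tolerates 2 down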

 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: Gregory Farnum [mailto:g...@gregs42.com] 
 Sent: Thursday, March 26, 2015 3:29 PM
 To: Somnath Roy
 Cc: Lee Revell; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down
 
 On Thu, Mar 26, 2015 at 3:22 PM, Somnath Roy somnath@sandisk.com wrote:
 Greg,
 Couple of dumb question may be.
 
 1. If you see , the clients are connecting fine with two monitors in the 
 cluster. 2 monitors can never form a quorum, but, 1 can, so, why with 1 
 monitor  (which is I guess happening after making 2 nodes down) it is not 
 able to connect ?
 
 A quorum is a strict majority of the total membership. 2 monitors can form a 
 quorum just fine if there are either 2 or 3 total membership.
 (As long as those two agree on every action, it cannot be lost.)
 
 We don't *recommend* configuring systems with an even number of monitors, 
 because it increases the number of total possible failures without increasing 
 the number of failures that can be tolerated. (3 monitors requires 2 in 
 quorum, 4 does too. Same for 5 and 6, 7 and 8, etc etc.)
 
 
 2. Also, my understanding is while IO is going on *no* monitor interaction 
 will be on that path, so, why the client io will be stopped because the 
 monitor quorum is not there ? If the min_size =1 is properly set it should 
 able to serve IO as long as 1 OSD (node) is up, isn't it ?
 
 Well, the remaining OSD won't be able to process IO because it's lost its 
 peers, and it can't reach any monitors to do updates or get new maps. 
 (Monitors which are not in quorum will not allow clients to
 connect.)
 The clients will eventually stop serving IO if they know they can't reach a 
 monitor, although I don't remember exactly how that's triggered.
 
 In this particular case, though, the client probably just tried to do an op 
 against the dead osd, realized it couldn't, and tried to fetch a map from the 
 monitors. When that failed it went into search mode, which is what the logs 
 are showing you.
 -Greg
 
 
 Thanks & Regards
 Somnath
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Gregory Farnum
 Sent: Thursday, March 26, 2015 2:40 PM
 To: Lee Revell
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs 
 down
 
 On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote:
 On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:
 
 Has the OSD actually been detected as down yet?
 
 
 I believe it has, however I can't directly check because ceph health
 starts to hang when I down the second node.
 
 Oh. You need to keep a quorum of your monitors running (just the monitor 
 processes, not of everything in the system) or nothing at all is going to 
 work. That's how we prevent split brain issues.
 
 
 
 You'll also need to set that min size on your existing pools (ceph 
 osd pool set <pool> min_size 1 or similar) to change their 
 behavior; the config option only takes effect for newly-created 
 pools. (Thus the
 default.)
 
 
 I've done this, however the behavior is the same:
 
 $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do
 ceph osd pool set $f min_size 1; done
 set pool 0 min_size to 1
 set pool 1 min_size to 1
 set pool 2 min_size to 1
 set pool 3 min_size to 1
 set pool 4 min_size to 1
 set pool 5 min_size to 1
 set pool 6 min_size to 1
 set pool 7 min_size to 1
 
 $ ceph -w
cluster db460aa2-5129-4aaa-8b2e-43eac727124e
 health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
 monmap e3: 3 mons at
 {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/
 0 ,ceph-node-3=192.168.122.141:6789/0},
 election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
 mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
 osdmap e362: 3 osds: 2 up, 2 in
  pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
25329 MB used, 12649 MB / 40059 MB avail
 840 active+clean
 
 2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
 2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s
 rd, 260 kB/s wr, 13 op/s
 2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840
 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s
 rd, 943 kB/s

Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Gregory Farnum
On Thu, Mar 26, 2015 at 3:22 PM, Somnath Roy somnath@sandisk.com wrote:
 Greg,
 Couple of dumb question may be.

 1. If you see , the clients are connecting fine with two monitors in the 
 cluster. 2 monitors can never form a quorum, but, 1 can, so, why with 1 
 monitor  (which is I guess happening after making 2 nodes down) it is not 
 able to connect ?

A quorum is a strict majority of the total membership. 2 monitors can
form a quorum just fine if the total membership is either 2 or 3.
(As long as those two agree on every action, it cannot be lost.)

We don't *recommend* configuring systems with an even number of
monitors, because it increases the number of total possible failures
without increasing the number of failures that can be tolerated. (3
monitors requires 2 in quorum, 4 does too. Same for 5 and 6, 7 and 8,
etc etc.)
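
When a quorum does exist you can see who is in it and what the total monmap
membership is with the usual commands, e.g.:

$ ceph mon stat
$ ceph quorum_status --format json-pretty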


 2. Also, my understanding is while IO is going on *no* monitor interaction 
 will be on that path, so, why the client io will be stopped because the 
 monitor quorum is not there ? If the min_size =1 is properly set it should 
 able to serve IO as long as 1 OSD (node) is up, isn't it ?

Well, the remaining OSD won't be able to process IO because it's lost
its peers, and it can't reach any monitors to do updates or get new
maps. (Monitors which are not in quorum will not allow clients to
connect.)
The clients will eventually stop serving IO if they know they can't
reach a monitor, although I don't remember exactly how that's
triggered.
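
You can watch that from the surviving node itself through the OSD's admin
socket, which answers even when the monitors don't; a sketch assuming osd.0 and
the default socket path:

$ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok status
$ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight

The first shows the daemon's state and the newest osdmap epoch it has seen; the
second shows client ops stuck waiting on it.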

In this particular case, though, the client probably just tried to do
an op against the dead osd, realized it couldn't, and tried to fetch a
map from the monitors. When that failed it went into search mode,
which is what the logs are showing you.
-Greg


 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
 Gregory Farnum
 Sent: Thursday, March 26, 2015 2:40 PM
 To: Lee Revell
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down

 On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote:
 On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:

 Has the OSD actually been detected as down yet?


 I believe it has, however I can't directly check because ceph health
 starts to hang when I down the second node.

 Oh. You need to keep a quorum of your monitors running (just the monitor 
 processes, not of everything in the system) or nothing at all is going to 
 work. That's how we prevent split brain issues.



 You'll also need to set that min size on your existing pools (ceph
 osd pool set <pool> min_size 1 or similar) to change their behavior;
 the config option only takes effect for newly-created pools. (Thus
 the
 default.)


 I've done this, however the behavior is the same:

 $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do
 ceph osd pool set $f min_size 1; done
 set pool 0 min_size to 1
 set pool 1 min_size to 1
 set pool 2 min_size to 1
 set pool 3 min_size to 1
 set pool 4 min_size to 1
 set pool 5 min_size to 1
 set pool 6 min_size to 1
 set pool 7 min_size to 1

 $ ceph -w
 cluster db460aa2-5129-4aaa-8b2e-43eac727124e
  health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
  monmap e3: 3 mons at
 {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0},
 election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
  mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
  osdmap e362: 3 osds: 2 up, 2 in
   pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
 25329 MB used, 12649 MB / 40059 MB avail
  840 active+clean

 2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
 2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s
 rd, 260 kB/s wr, 13 op/s
 2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840
 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s
 rd, 943 kB/s wr, 38 op/s
 2015-03-26 17:26:01.058167 mon.0 [INF] pgmap v5916: 840 pgs: 840
 active+clean; 7441 MB data, 25335 MB used, 12643 MB / 40059 MB avail; 0 B/s
 rd, 10699 kB/s wr, 621 op/s

 this is where i kill the second OSD

 2015-03-26 17:26:26.778461 7f4ebeffd700  0 monclient: hunting for new
 mon
 2015-03-26 17:26:30.701099 7f4ec45f5700  0 --
 192.168.122.111:0/1007741 >>
 192.168.122.141:6789/0 pipe(0x7f4ec0023200 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7f4ec0023490).fault
 2015-03-26 17:26:42.701154 7f4ec44f4700  0 --
 192.168.122.111:0/1007741 >>
 192.168.122.131:6789/0 pipe(0x7f4ec00251b0 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7f4ec0025440).fault

 And all writes block until I bring back an OSD.

 Lee
 ___
 ceph-users mailing list
 ceph

Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Gregory Farnum
On Thu, Mar 26, 2015 at 3:36 PM, Somnath Roy somnath@sandisk.com wrote:
 Got most portion of it, thanks !
 But, still not able to get when second node is down why with single monitor 
 in the cluster client is not able to connect ?
 1 monitor can form a quorum and should be sufficient for a cluster to run.

The whole point of the monitor cluster is to ensure a globally
consistent view of the cluster state that will never be reversed by a
different group of up nodes. If one monitor (out of three) could make
changes to the maps by itself, then there's nothing to prevent all
three monitors from staying up but getting a net split, and then each
issuing different versions of the osdmaps to whichever clients or OSDs
happen to be connected to them.

If you want to get down into the math proofs and things then the Paxos
papers do all the proofs. Or you can look at the CAP theorem about the
tradeoff between consistency and availability. The monitors are a
Paxos cluster and Ceph is a 100% consistent system.
-Greg


 Thanks & Regards
 Somnath

 -Original Message-
 From: Gregory Farnum [mailto:g...@gregs42.com]
 Sent: Thursday, March 26, 2015 3:29 PM
 To: Somnath Roy
 Cc: Lee Revell; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down

 On Thu, Mar 26, 2015 at 3:22 PM, Somnath Roy somnath@sandisk.com wrote:
 Greg,
 Couple of dumb question may be.

 1. If you see , the clients are connecting fine with two monitors in the 
 cluster. 2 monitors can never form a quorum, but, 1 can, so, why with 1 
 monitor  (which is I guess happening after making 2 nodes down) it is not 
 able to connect ?

 A quorum is a strict majority of the total membership. 2 monitors can form a 
 quorum just fine if there are either 2 or 3 total membership.
 (As long as those two agree on every action, it cannot be lost.)

 We don't *recommend* configuring systems with an even number of monitors, 
 because it increases the number of total possible failures without increasing 
 the number of failures that can be tolerated. (3 monitors requires 2 in 
 quorum, 4 does too. Same for 5 and 6, 7 and 8, etc etc.)


 2. Also, my understanding is while IO is going on *no* monitor interaction 
 will be on that path, so, why the client io will be stopped because the 
 monitor quorum is not there ? If the min_size =1 is properly set it should 
 able to serve IO as long as 1 OSD (node) is up, isn't it ?

 Well, the remaining OSD won't be able to process IO because it's lost its 
 peers, and it can't reach any monitors to do updates or get new maps. 
 (Monitors which are not in quorum will not allow clients to
 connect.)
 The clients will eventually stop serving IO if they know they can't reach a 
 monitor, although I don't remember exactly how that's triggered.

 In this particular case, though, the client probably just tried to do an op 
 against the dead osd, realized it couldn't, and tried to fetch a map from the 
 monitors. When that failed it went into search mode, which is what the logs 
 are showing you.
 -Greg


 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of Gregory Farnum
 Sent: Thursday, March 26, 2015 2:40 PM
 To: Lee Revell
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs
 down

 On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote:
 On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:

 Has the OSD actually been detected as down yet?


 I believe it has, however I can't directly check because ceph health
 starts to hang when I down the second node.

 Oh. You need to keep a quorum of your monitors running (just the monitor 
 processes, not of everything in the system) or nothing at all is going to 
 work. That's how we prevent split brain issues.



 You'll also need to set that min size on your existing pools (ceph
 osd pool set <pool> min_size 1 or similar) to change their
 behavior; the config option only takes effect for newly-created
 pools. (Thus the
 default.)


 I've done this, however the behavior is the same:

 $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do
 ceph osd pool set $f min_size 1; done
 set pool 0 min_size to 1
 set pool 1 min_size to 1
 set pool 2 min_size to 1
 set pool 3 min_size to 1
 set pool 4 min_size to 1
 set pool 5 min_size to 1
 set pool 6 min_size to 1
 set pool 7 min_size to 1

 $ ceph -w
 cluster db460aa2-5129-4aaa-8b2e-43eac727124e
  health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
  monmap e3: 3 mons at
 {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0},
 election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
  mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
  osdmap e362: 3 osds: 2 up, 2 in
   pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
 25329 MB

Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Somnath Roy
Got most of it, thanks!
But I still don't get it: when the second node is down, why is the client not
able to connect with a single monitor left in the cluster?
1 monitor can form a quorum and should be sufficient for a cluster to run.

Thanks & Regards
Somnath

-Original Message-
From: Gregory Farnum [mailto:g...@gregs42.com] 
Sent: Thursday, March 26, 2015 3:29 PM
To: Somnath Roy
Cc: Lee Revell; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down

On Thu, Mar 26, 2015 at 3:22 PM, Somnath Roy somnath@sandisk.com wrote:
 Greg,
 Couple of dumb question may be.

 1. If you see , the clients are connecting fine with two monitors in the 
 cluster. 2 monitors can never form a quorum, but, 1 can, so, why with 1 
 monitor  (which is I guess happening after making 2 nodes down) it is not 
 able to connect ?

A quorum is a strict majority of the total membership. 2 monitors can form a 
quorum just fine if there are either 2 or 3 total membership.
(As long as those two agree on every action, it cannot be lost.)

We don't *recommend* configuring systems with an even number of monitors, 
because it increases the number of total possible failures without increasing 
the number of failures that can be tolerated. (3 monitors requires 2 in quorum, 
4 does too. Same for 5 and 6, 7 and 8, etc etc.)


 2. Also, my understanding is while IO is going on *no* monitor interaction 
 will be on that path, so, why the client io will be stopped because the 
 monitor quorum is not there ? If the min_size =1 is properly set it should 
 able to serve IO as long as 1 OSD (node) is up, isn't it ?

Well, the remaining OSD won't be able to process IO because it's lost its 
peers, and it can't reach any monitors to do updates or get new maps. (Monitors 
which are not in quorum will not allow clients to
connect.)
The clients will eventually stop serving IO if they know they can't reach a 
monitor, although I don't remember exactly how that's triggered.

In this particular case, though, the client probably just tried to do an op 
against the dead osd, realized it couldn't, and tried to fetch a map from the 
monitors. When that failed it went into search mode, which is what the logs are 
showing you.
-Greg


 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Gregory Farnum
 Sent: Thursday, March 26, 2015 2:40 PM
 To: Lee Revell
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs 
 down

 On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote:
 On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:

 Has the OSD actually been detected as down yet?


 I believe it has, however I can't directly check because ceph health
 starts to hang when I down the second node.

 Oh. You need to keep a quorum of your monitors running (just the monitor 
 processes, not of everything in the system) or nothing at all is going to 
 work. That's how we prevent split brain issues.



 You'll also need to set that min size on your existing pools (ceph 
 osd pool set <pool> min_size 1 or similar) to change their
 behavior; the config option only takes effect for newly-created 
 pools. (Thus the
 default.)


 I've done this, however the behavior is the same:

 $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do
 ceph osd pool set $f min_size 1; done
 set pool 0 min_size to 1
 set pool 1 min_size to 1
 set pool 2 min_size to 1
 set pool 3 min_size to 1
 set pool 4 min_size to 1
 set pool 5 min_size to 1
 set pool 6 min_size to 1
 set pool 7 min_size to 1

 $ ceph -w
 cluster db460aa2-5129-4aaa-8b2e-43eac727124e
  health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
  monmap e3: 3 mons at
 {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0},
 election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
  mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
  osdmap e362: 3 osds: 2 up, 2 in
   pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
 25329 MB used, 12649 MB / 40059 MB avail
  840 active+clean

 2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
 2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s
 rd, 260 kB/s wr, 13 op/s
 2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840
 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s
 rd, 943 kB/s wr, 38 op/s
 2015-03-26 17:26:01.058167 mon.0 [INF] pgmap v5916: 840 pgs: 840
 active+clean; 7441 MB data, 25335 MB used, 12643 MB / 40059 MB avail; 0 B/s
 rd, 10699 kB/s wr, 621 op/s

 this is where i kill the second OSD

 2015-03-26 17:26:26.778461

Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Somnath Roy
Greg,
I think you got me wrong. I am not saying each monitor in a group of 3 should
be able to change the map. Here is the scenario.

1. Cluster up and running with 3 mons (quorum of 3), all fine.

2. One node (and mon) is down, quorum of 2, still connecting.

3. 2 nodes (and 2 mons) are down; shouldn't it be a quorum of 1 now, with the
client still able to connect?

A cluster with a single monitor is able to form a quorum and should work fine,
so why not in the case of point 3?
If this is the way Paxos works, should we say that a cluster with, say, 3
monitors can tolerate only one mon failure?

Let me know if I am missing a point here.

Thanks & Regards
Somnath

-Original Message-
From: Gregory Farnum [mailto:g...@gregs42.com] 
Sent: Thursday, March 26, 2015 3:41 PM
To: Somnath Roy
Cc: Lee Revell; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down

On Thu, Mar 26, 2015 at 3:36 PM, Somnath Roy somnath@sandisk.com wrote:
 Got most portion of it, thanks !
 But, still not able to get when second node is down why with single monitor 
 in the cluster client is not able to connect ?
 1 monitor can form a quorum and should be sufficient for a cluster to run.

The whole point of the monitor cluster is to ensure a globally consistent view 
of the cluster state that will never be reversed by a different group of up 
nodes. If one monitor (out of three) could make changes to the maps by itself, 
then there's nothing to prevent all three monitors from staying up but getting 
a net split, and then each issuing different versions of the osdmaps to 
whichever clients or OSDs happen to be connected to them.

If you want to get down into the math proofs and things then the Paxos papers 
do all the proofs. Or you can look at the CAP theorem about the tradeoff 
between consistency and availability. The monitors are a Paxos cluster and Ceph 
is a 100% consistent system.
-Greg


 Thanks & Regards
 Somnath

 -Original Message-
 From: Gregory Farnum [mailto:g...@gregs42.com]
 Sent: Thursday, March 26, 2015 3:29 PM
 To: Somnath Roy
 Cc: Lee Revell; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs 
 down

 On Thu, Mar 26, 2015 at 3:22 PM, Somnath Roy somnath@sandisk.com wrote:
 Greg,
 Couple of dumb question may be.

 1. If you see , the clients are connecting fine with two monitors in the 
 cluster. 2 monitors can never form a quorum, but, 1 can, so, why with 1 
 monitor  (which is I guess happening after making 2 nodes down) it is not 
 able to connect ?

 A quorum is a strict majority of the total membership. 2 monitors can form a 
 quorum just fine if there are either 2 or 3 total membership.
 (As long as those two agree on every action, it cannot be lost.)

 We don't *recommend* configuring systems with an even number of 
 monitors, because it increases the number of total possible failures 
 without increasing the number of failures that can be tolerated. (3 
 monitors requires 2 in quorum, 4 does too. Same for 5 and 6, 7 and 8, 
 etc etc.)


 2. Also, my understanding is while IO is going on *no* monitor interaction 
 will be on that path, so, why the client io will be stopped because the 
 monitor quorum is not there ? If the min_size =1 is properly set it should 
 able to serve IO as long as 1 OSD (node) is up, isn't it ?

 Well, the remaining OSD won't be able to process IO because it's lost 
 its peers, and it can't reach any monitors to do updates or get new 
 maps. (Monitors which are not in quorum will not allow clients to
 connect.)
 The clients will eventually stop serving IO if they know they can't reach a 
 monitor, although I don't remember exactly how that's triggered.

 In this particular case, though, the client probably just tried to do an op 
 against the dead osd, realized it couldn't, and tried to fetch a map from the 
 monitors. When that failed it went into search mode, which is what the logs 
 are showing you.
 -Greg


 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
 Of Gregory Farnum
 Sent: Thursday, March 26, 2015 2:40 PM
 To: Lee Revell
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs 
 down

 On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote:
 On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:

 Has the OSD actually been detected as down yet?


 I believe it has, however I can't directly check because ceph health
 starts to hang when I down the second node.

 Oh. You need to keep a quorum of your monitors running (just the monitor 
 processes, not of everything in the system) or nothing at all is going to 
 work. That's how we prevent split brain issues.



 You'll also need to set that min size on your existing pools (ceph 
 osd pool set <pool> min_size 1 or similar) to change

Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Gregory Farnum
On Thu, Mar 26, 2015 at 3:54 PM, Somnath Roy somnath@sandisk.com wrote:
 Greg,
 I think you got me wrong. I am not saying each monitor of a group of 3 should 
 be able to change the map. Here is the scenario.

 1. Cluster up and running with 3 mons (quorum of 3), all fine.

 2. One node (and mon) is down, quorum of 2 , still connecting.

 3. 2 nodes (and 2 mons) are down, should be quorum of 1 now and client should 
 still be able to connect. Isn't it ?

No. The monitors can't tell the difference between dead monitors and
monitors they can't reach over the network. So they say "there are
three monitors in my map; therefore it requires two to make any
change." That's the case regardless of whether all of them are
running, or only one.
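
That membership lives in the monmap, which you can inspect and, if a majority
of monitors is ever permanently lost, shrink to the survivors as a last-resort
recovery; a sketch, assuming the surviving mon is ceph-node-1 and its daemon is
stopped while you run these:

$ ceph-mon -i ceph-node-1 --extract-monmap /tmp/monmap
$ monmaptool --print /tmp/monmap
$ monmaptool /tmp/monmap --rm ceph-node-2 --rm ceph-node-3
$ ceph-mon -i ceph-node-1 --inject-monmap /tmp/monmap

Only the first two are needed just to look; the --rm/--inject pair is the
documented way to drop monitors that are never coming back.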


 Cluster with single monitor is able to form a quorum and should be working 
 fine. So, why not in case of point 3 ?
 If this is the way Paxos works, should we say that in a cluster with say 3 
 monitors it should be able to tolerate only one mon failure ?

Yes, that is the case.


 Let me know if I am missing a point here.

 Thanks & Regards
 Somnath

 -Original Message-
 From: Gregory Farnum [mailto:g...@gregs42.com]
 Sent: Thursday, March 26, 2015 3:41 PM
 To: Somnath Roy
 Cc: Lee Revell; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down

 On Thu, Mar 26, 2015 at 3:36 PM, Somnath Roy somnath@sandisk.com wrote:
 Got most portion of it, thanks !
 But, still not able to get when second node is down why with single monitor 
 in the cluster client is not able to connect ?
 1 monitor can form a quorum and should be sufficient for a cluster to run.

 The whole point of the monitor cluster is to ensure a globally consistent 
 view of the cluster state that will never be reversed by a different group of 
 up nodes. If one monitor (out of three) could make changes to the maps by 
 itself, then there's nothing to prevent all three monitors from staying up 
 but getting a net split, and then each issuing different versions of the 
 osdmaps to whichever clients or OSDs happen to be connected to them.

 If you want to get down into the math proofs and things then the Paxos papers 
 do all the proofs. Or you can look at the CAP theorem about the tradeoff 
 between consistency and availability. The monitors are a Paxos cluster and 
 Ceph is a 100% consistent system.
 -Greg


 Thanks & Regards
 Somnath

 -Original Message-
 From: Gregory Farnum [mailto:g...@gregs42.com]
 Sent: Thursday, March 26, 2015 3:29 PM
 To: Somnath Roy
 Cc: Lee Revell; ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs
 down

 On Thu, Mar 26, 2015 at 3:22 PM, Somnath Roy somnath@sandisk.com wrote:
 Greg,
 Couple of dumb question may be.

 1. If you see , the clients are connecting fine with two monitors in the 
 cluster. 2 monitors can never form a quorum, but, 1 can, so, why with 1 
 monitor  (which is I guess happening after making 2 nodes down) it is not 
 able to connect ?

 A quorum is a strict majority of the total membership. 2 monitors can form a 
 quorum just fine if there are either 2 or 3 total membership.
 (As long as those two agree on every action, it cannot be lost.)

 We don't *recommend* configuring systems with an even number of
 monitors, because it increases the number of total possible failures
 without increasing the number of failures that can be tolerated. (3
 monitors requires 2 in quorum, 4 does too. Same for 5 and 6, 7 and 8,
 etc etc.)


 2. Also, my understanding is while IO is going on *no* monitor interaction 
 will be on that path, so, why the client io will be stopped because the 
 monitor quorum is not there ? If the min_size =1 is properly set it should 
 able to serve IO as long as 1 OSD (node) is up, isn't it ?

 Well, the remaining OSD won't be able to process IO because it's lost
 its peers, and it can't reach any monitors to do updates or get new
 maps. (Monitors which are not in quorum will not allow clients to
 connect.)
 The clients will eventually stop serving IO if they know they can't reach a 
 monitor, although I don't remember exactly how that's triggered.

 In this particular case, though, the client probably just tried to do an op 
 against the dead osd, realized it couldn't, and tried to fetch a map from 
 the monitors. When that failed it went into search mode, which is what the 
 logs are showing you.
 -Greg


 Thanks & Regards
 Somnath

 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
 Of Gregory Farnum
 Sent: Thursday, March 26, 2015 2:40 PM
 To: Lee Revell
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs
 down

 On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote:
 On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:

 Has the OSD actually been detected as down yet?


 I believe it has, however I can't directly

Re: [ceph-users] All client writes block when 2 of 3 OSDs down

2015-03-26 Thread Somnath Roy
Greg,
A couple of dumb questions, maybe.

1. If you see, the clients are connecting fine with two monitors in the
cluster. 2 monitors can never form a quorum, but 1 can, so why, with 1 monitor
(which I guess is what happens after taking 2 nodes down), is the client not
able to connect?

2. Also, my understanding is that while IO is going on there is *no* monitor
interaction on that path, so why would client IO be stopped because the monitor
quorum is not there? If min_size = 1 is properly set, it should be able to
serve IO as long as 1 OSD (node) is up, isn't it?

Thanks & Regards
Somnath

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Gregory Farnum
Sent: Thursday, March 26, 2015 2:40 PM
To: Lee Revell
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down

On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote:
 On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote:

 Has the OSD actually been detected as down yet?


 I believe it has, however I can't directly check because ceph health
 starts to hang when I down the second node.

Oh. You need to keep a quorum of your monitors running (just the monitor 
processes, not of everything in the system) or nothing at all is going to work. 
That's how we prevent split brain issues.



 You'll also need to set that min size on your existing pools (ceph
 osd pool set <pool> min_size 1 or similar) to change their behavior;
 the config option only takes effect for newly-created pools. (Thus
 the
 default.)


 I've done this, however the behavior is the same:

 $ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do
 ceph osd pool set $f min_size 1; done
 set pool 0 min_size to 1
 set pool 1 min_size to 1
 set pool 2 min_size to 1
 set pool 3 min_size to 1
 set pool 4 min_size to 1
 set pool 5 min_size to 1
 set pool 6 min_size to 1
 set pool 7 min_size to 1

 $ ceph -w
 cluster db460aa2-5129-4aaa-8b2e-43eac727124e
  health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
  monmap e3: 3 mons at
 {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0},
 election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
  mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
  osdmap e362: 3 osds: 2 up, 2 in
   pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects
 25329 MB used, 12649 MB / 40059 MB avail
  840 active+clean

 2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
 2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840
 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s
 rd, 260 kB/s wr, 13 op/s
 2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840
 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s
 rd, 943 kB/s wr, 38 op/s
 2015-03-26 17:26:01.058167 mon.0 [INF] pgmap v5916: 840 pgs: 840
 active+clean; 7441 MB data, 25335 MB used, 12643 MB / 40059 MB avail; 0 B/s
 rd, 10699 kB/s wr, 621 op/s

 this is where i kill the second OSD

 2015-03-26 17:26:26.778461 7f4ebeffd700  0 monclient: hunting for new
 mon
 2015-03-26 17:26:30.701099 7f4ec45f5700  0 --
 192.168.122.111:0/1007741 >>
 192.168.122.141:6789/0 pipe(0x7f4ec0023200 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7f4ec0023490).fault
 2015-03-26 17:26:42.701154 7f4ec44f4700  0 --
 192.168.122.111:0/1007741 >>
 192.168.122.131:6789/0 pipe(0x7f4ec00251b0 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7f4ec0025440).fault

 And all writes block until I bring back an OSD.

 Lee

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com