Re: [ceph-users] ceph-mgr fails to restart after upgrade to mimic

2019-01-04 Thread Steve Taylor
I can't think of why the upgrade would have broken your keys, but have you 
verified that the mons still have the correct mgr keys configured? 'ceph auth 
ls' should list an mgr.<id> key for each mgr whose key matches the contents 
of /var/lib/ceph/mgr/<cluster>-<id>/keyring on the mgr host, along with caps 
that should minimally include '[mon] allow profile mgr' and '[osd] allow *', I 
would think.
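
If it helps, the check might look something like this, with <id> and <cluster> 
standing in for your mgr ID and cluster name:

# on a mon host
ceph auth get mgr.<id>
# on the mgr host
cat /var/lib/ceph/mgr/<cluster>-<id>/keyring
# if the caps look wrong, something like this should restore the usual profile
ceph auth caps mgr.<id> mon 'allow profile mgr' osd 'allow *' mds 'allow *'

The mds cap in that last command is just the usual default; match whatever your 
working mgrs have.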

Again, it seems unlikely that this would have broken with the upgrade if it had 
been working previously, but if you're seeing auth errors it might be something 
to check out.






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Fri, 2019-01-04 at 07:26 -0700, Randall Smith wrote:
Greetings,

I'm upgrading my cluster from luminous to mimic. I've upgraded my monitors and 
am attempting to upgrade the mgrs. Unfortunately, after an upgrade the mgr 
daemon exits immediately with error code 1.

I've tried running ceph-mgr in debug mode to try to see what's happening but 
the output (below) is a bit cryptic for me. It looks like authentication might 
be failing but it was working prior to the upgrade.

I do have "auth supported = cephx" in the global section of ceph.conf.

What do I need to do to fix this?

Thanks.

/usr/bin/ceph-mgr -f --cluster ceph --id 8 --setuser ceph --setgroup ceph -d 
--debug_ms 5
2019-01-04 07:01:38.457 7f808f83f700  2 Event(0x30c42c0 nevent=5000 
time_id=1).set_owner idx=0 owner=140190140331776
2019-01-04 07:01:38.457 7f808f03e700  2 Event(0x30c4500 nevent=5000 
time_id=1).set_owner idx=1 owner=140190131939072
2019-01-04 07:01:38.457 7f808e83d700  2 Event(0x30c4e00 nevent=5000 
time_id=1).set_owner idx=2 owner=140190123546368
2019-01-04 07:01:38.457 7f809dd5b380  1  Processor -- start
2019-01-04 07:01:38.477 7f809dd5b380  1 -- - start start
2019-01-04 07:01:38.481 7f809dd5b380  1 -- - --> 192.168.253.147:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6780 con 0
2019-01-04 07:01:38.481 7f809dd5b380  1 -- - --> 192.168.253.148:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6a00 con 0
2019-01-04 07:01:38.481 7f808e83d700  1 -- 192.168.253.148:0/1359135487 learned_addr learned my addr 192.168.253.148:0/1359135487
2019-01-04 07:01:38.481 7f808e83d700  2 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got newly_acked_seq 0 vs out_seq 0
2019-01-04 07:01:38.481 7f808f03e700  2 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got newly_acked_seq 0 vs out_seq 0
2019-01-04 07:01:38.481 7f808f03e700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1 seq 1 0x30c5440 mon_map magic: 0 v1
2019-01-04 07:01:38.481 7f808e83d700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2 seq 1 0x30c5680 mon_map magic: 0 v1
2019-01-04 07:01:38.481 7f808f03e700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1 seq 2 0x32a6780 auth_reply(proto 2 0 (0) Success) v1
2019-01-04 07:01:38.481 7f808e83d700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2 seq 2 0x32a6a00 auth_reply(proto 2 0 (0) Success) v1
2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6

Re: [ceph-users] Balancer module not balancing perfectly

2018-11-06 Thread Steve Taylor
I ended up balancing my osdmap myself offline to figure out why the balancer 
couldn't do better. I had similar issues with osdmaptool, which of course is 
what I expected, but it's a lot easier to run osdmaptool in a debugger to see 
what's happening. When I dug into the upmap code I discovered that my problem 
was due to the way that code balances OSDs. In my case the average PG count per 
OSD is 56.882, so as soon as any OSD had 56 PGs it wouldn't get any more no 
matter what I used as my max deviation. I got into a state where each OSD had 
56-61 PGs, and the upmap code wouldn't do any better because there were no 
"underfull" OSDs onto which to move PGs.

I made some changes to the osdmap code to ensure the computed "overfull" and 
"underfull" OSD lists are the same size, even when the least- or most-full OSDs 
are within the expected deviation, so that the OSDs outside the expected 
deviation get some relief. It worked nicely: I have two independent production 
pools that were both in this state, and now every OSD across both pools has 56 
or 57 PGs as expected.

I intend to put together a pull request to push this upstream. I haven't 
reviewed the balancer module code to see how it's doing things, but assuming it 
uses osdmaptool or the same upmap code as osdmaptool this should also improve 
the balancer module.






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Tue, 2018-11-06 at 12:23 +0700, Konstantin Shalygin wrote:

From the balancer module's code for v 12.2.7 I noticed [1] these lines
which reference [2] these 2 config options for upmap. You might try using
more max iterations or a smaller max deviation to see if you can get a
better balance in your cluster. I would try to start with [3] these
commands/values and see if it improves your balance and/or allows you to
generate a better map.

[1] https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
[2] upmap_max_iterations (default 10)
    upmap_max_deviation (default .01)
[3] ceph config-key set mgr/balancer/upmap_max_iterations 50
    ceph config-key set mgr/balancer/upmap_max_deviation .005


This did not help my 12.2.8 cluster. While the first iterations of balancing 
were running I decreased max_misplaced from the default 0.05 to 0.01, after 
which the balancing operations stopped.

After the cluster reached HEALTH_OK, I have not seen any further balancer runs. 
I tried lowering the balancer variables and restarting the mgr, but the message 
is still: "Error EALREADY: Unable to find further optimization,or distribution 
is already perfect"

# ceph config-key dump | grep balancer
"mgr/balancer/active": "1",
"mgr/balancer/max_misplaced": ".50",
"mgr/balancer/mode": "upmap",
"mgr/balancer/upmap_max_deviation": ".001",
"mgr/balancer/upmap_max_iterations": "100",


So maybe I need to delete the upmaps and start over?


ID  CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
 -1       414.0          - 445TiB  129TiB  316TiB  29.01 1.00   - root default
 -7       414.0          - 445TiB  129TiB  316TiB  29.01 1.00   -     datacenter rtcloud
 -8       138.0          - 148TiB  42.9TiB 105TiB  28.93 1.00   -         rack rack2
 -2        69.0          - 74.2TiB 21.5TiB 52.7TiB 28.93 1.00   -             host ceph-osd0
  0   hdd    5.0   1.0     5.46TiB 1.64TiB 3.82TiB 30.06 1.04  62                 osd.0
  4   hdd    5.0   1.0     5.46TiB 1.65TiB 3.80TiB 30.29 1.04  64                 osd.4
  7   hdd    5.0   1.0     5.46TiB 1.61TiB 3.85TiB 29.44 1.01  63                 osd.7
  9   hdd    5.0   1.0     5.46TiB 1.68TiB 3.78TiB 30.77 1.06  63                 osd.9
 46   hdd    5.0   1.0     5.46TiB 1.68TiB 3.77TiB 30.86 1.06  65                 osd.46
 47   hdd    5.0   1.0     5.46TiB 1.68TiB 3.78TiB 30.73 1.06  66                 osd.47
 48   hdd    5.0   1.0     5.46TiB 1.65TiB 3.81TiB 30.22 1.04  66                 osd.48
 49   hdd    5.0   1.0     5.46TiB 1.71TiB 3.74TiB 31.41 1.08  65                 osd.49
 54   hdd    5.0   1.0     5.46TiB 1.64TiB 3.82TiB 30.08 1.04  65                 osd.54
 55   hdd    5.0   1.0     5.46TiB 1.65TiB 3.80TiB 30.30 1.04  64                 osd.55
 56   hdd    5.0   1.0     5.46TiB 1.66TiB 3.80TiB 30.35 1.05  64                 osd.56
 57   hdd    5

Re: [ceph-users] Balancer module not balancing perfectly

2018-10-31 Thread Steve Taylor
I think I pretty well have things figured out at this point, but I'm not sure 
how to proceed.

The config-key settings were not effective because I had not restarted the 
active mgr after setting them. Once I restarted the mgr the settings became 
effective.
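
For anyone else who runs into this, the sequence that ended up working for me 
was roughly the following (assuming a systemd-managed mgr; adjust the restart 
step for your deployment):

ceph config-key set mgr/balancer/upmap_max_deviation .0001
ceph config-key set mgr/balancer/upmap_max_iterations 1000
systemctl restart ceph-mgr@<id>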

Once I had the config-key settings working I quickly discovered that they 
didn't make any difference, so I downloaded an osdmap and started trying to use 
osdmaptool offline to see if it would behave differently. It didn't, but when I 
specified '--debug-osd 20' on the osdmaptool command line things got 
interesting.

It looks like osdmaptool generates lists of overfull and underfull OSDs and 
then uses those lists to move PGs in order to achieve a perfect balance. In my 
case the expected PG count range per OSD is 56-57, but the actual range is 
56-61. The problem seems to lie in the fact that all of my OSDs have at least 
56 PGs and are therefore not considered underfull. The debug output from 
osdmaptool shows a decent list of overfull OSDs and an empty list of underfull 
OSDs, then says there is nothing to be done.

Perhaps the next step is to modify osdmaptool to allow OSDs that are not 
underfull but will not be made overfull by the move to take new PGs? That seems 
like it should be the expected behavior in this scenario.
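
For reference, the offline workflow I used was roughly this (the flag values 
are just the ones I was experimenting with; adjust for your pools):

ceph osd getmap -o om
osdmaptool om --upmap out.txt --upmap-pool <pool> \
    --upmap-max 100 --upmap-deviation 0.0001 --debug-osd 20
# review out.txt, then apply the proposed pg-upmap-items commands with:
source out.txt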


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

-Original Message-
From: Steve Taylor 
Sent: Tuesday, October 30, 2018 1:40 PM
To: drakonst...@gmail.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Balancer module not balancing perfectly

I was having a difficult time getting debug logs from the active mgr, but I 
finally got it. Apparently injecting debug_mgr doesn't work, even though the 
change is reflected when you query the running config.
Modifying the config file and restarting the mgr got it to log for me.

Now that I have some debug logging, I think I may see the problem.

'ceph config-key dump'
...
"mgr/balancer/active": "1",
"mgr/balancer/max_misplaced": "1",
"mgr/balancer/mode": "upmap",
"mgr/balancer/upmap_max_deviation": "0.0001",
"mgr/balancer/upmap_max_iterations": "1000"

Mgr log excerpt:
2018-10-30 13:25:52.523117 7f08b47ff700  4 mgr[balancer] Optimize plan 
upmap-balance
2018-10-30 13:25:52.523135 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/mode
2018-10-30 13:25:52.523141 7f08b47ff700 10 ceph_config_get mode found:
upmap
2018-10-30 13:25:52.523144 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/max_misplaced
2018-10-30 13:25:52.523145 7f08b47ff700 10 ceph_config_get max_misplaced found: 
1
2018-10-30 13:25:52.523178 7f08b47ff700  4 mgr[balancer] Mode upmap, max 
misplaced 1.00
2018-10-30 13:25:52.523241 7f08b47ff700 20 mgr[balancer] unknown
0.00 degraded 0.00 inactive 0.00 misplaced
0
2018-10-30 13:25:52.523288 7f08b47ff700  4 mgr[balancer] do_upmap
2018-10-30 13:25:52.523296 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/upmap_max_iterations
2018-10-30 13:25:52.523298 7f08b47ff700  4 ceph_config_get upmap_max_iterations 
not found
2018-10-30 13:25:52.523301 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/upmap_max_deviation
2018-10-30 13:25:52.523305 7f08b47ff700  4 ceph_config_get upmap_max_deviation 
not found
2018-10-30 13:25:52.523339 7f08b47ff700  4 mgr[balancer] pools ['rbd- data']
2018-10-30 13:25:52.523350 7f08b47ff700 10 osdmap_calc_pg_upmaps osdmap
0x7f08b1884280 inc 0x7f0898bda800 max_deviation
0.01 max_iterations 10 pools 3
2018-10-30 13:25:52.579669 7f08bbffc700  4 mgr ms_dispatch active mgrdigest v1
2018-10-30 13:25:52.579671 7f08bbffc700  4 mgr ms_dispatch mgrdigest v1
2018-10-30 13:25:52.579673 7f08bbffc700 10 mgr handle_mgr_digest 1364
2018-10-30 13:25:52.579674 7f08bbffc700 10 mgr handle_mgr_digest 501
2018-10-30 13:25:52.579677 7f08bbffc700 10 mgr notify_all notify_all:
notify_all mon_status
2018-10-30 13:25:52.579681 7f08bbffc700 10 mgr notify_all notify_all:
notify_all health
2018-10-30 13:25:52.579683 7f08bbffc700 10 mgr notify_all notify_all:
notify_all pg_summary
2018-10-30 13:25:52.579684 7f08bbffc700 10 mgr handle_mgr_digest done.
2018-10-30 13:25:52.603867 7f08b47ff700 10 osdmap_calc_pg_upmaps r = 0
2018-10-30 13:25:52.603982 7f08b47ff700  4 mgr[balancer] prepared 0/10 changes

The mgr claims that mgr/balancer/upmap_max_iterations and 
mgr/balancer/upmap_max_deviation aren't found in the config even though they 
have been set and appear in the config-key dump. It seems to be picking up the 
other config options correctly. Am I doing so

Re: [ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
I was having a difficult time getting debug logs from the active mgr,
but I finally got it. Apparently injecting debug_mgr doesn't work, even
though the change is reflected when you query the running config.
Modifying the config file and restarting the mgr got it to log for me.
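
In case it's useful to anyone else, what worked was simply something like this
in ceph.conf on the mgr host (the level is just an example), followed by a
restart of the active mgr:

[mgr]
    debug mgr = 20

systemctl restart ceph-mgr@<id>   # assuming a systemd-managed mgr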

Now that I have some debug logging, I think I may see the problem.

'ceph config-key dump'
...
"mgr/balancer/active": "1",
"mgr/balancer/max_misplaced": "1",
"mgr/balancer/mode": "upmap",
"mgr/balancer/upmap_max_deviation": "0.0001",
"mgr/balancer/upmap_max_iterations": "1000"

Mgr log excerpt:
2018-10-30 13:25:52.523117 7f08b47ff700  4 mgr[balancer] Optimize plan
upmap-balance
2018-10-30 13:25:52.523135 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/mode
2018-10-30 13:25:52.523141 7f08b47ff700 10 ceph_config_get mode found:
upmap
2018-10-30 13:25:52.523144 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/max_misplaced
2018-10-30 13:25:52.523145 7f08b47ff700 10 ceph_config_get
max_misplaced found: 1
2018-10-30 13:25:52.523178 7f08b47ff700  4 mgr[balancer] Mode upmap,
max misplaced 1.00
2018-10-30 13:25:52.523241 7f08b47ff700 20 mgr[balancer] unknown
0.00 degraded 0.00 inactive 0.00 misplaced 
0
2018-10-30 13:25:52.523288 7f08b47ff700  4 mgr[balancer] do_upmap
2018-10-30 13:25:52.523296 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/upmap_max_iterations
2018-10-30 13:25:52.523298 7f08b47ff700  4 ceph_config_get
upmap_max_iterations not found 
2018-10-30 13:25:52.523301 7f08b47ff700  4 mgr get_config
get_configkey: mgr/balancer/upmap_max_deviation
2018-10-30 13:25:52.523305 7f08b47ff700  4 ceph_config_get
upmap_max_deviation not found 
2018-10-30 13:25:52.523339 7f08b47ff700  4 mgr[balancer] pools ['rbd-
data']
2018-10-30 13:25:52.523350 7f08b47ff700 10 osdmap_calc_pg_upmaps osdmap
0x7f08b1884280 inc 0x7f0898bda800 max_deviation 
0.01 max_iterations 10 pools 3
2018-10-30 13:25:52.579669 7f08bbffc700  4 mgr ms_dispatch active
mgrdigest v1
2018-10-30 13:25:52.579671 7f08bbffc700  4 mgr ms_dispatch mgrdigest v1
2018-10-30 13:25:52.579673 7f08bbffc700 10 mgr handle_mgr_digest 1364
2018-10-30 13:25:52.579674 7f08bbffc700 10 mgr handle_mgr_digest 501
2018-10-30 13:25:52.579677 7f08bbffc700 10 mgr notify_all notify_all:
notify_all mon_status
2018-10-30 13:25:52.579681 7f08bbffc700 10 mgr notify_all notify_all:
notify_all health
2018-10-30 13:25:52.579683 7f08bbffc700 10 mgr notify_all notify_all:
notify_all pg_summary
2018-10-30 13:25:52.579684 7f08bbffc700 10 mgr handle_mgr_digest done.
2018-10-30 13:25:52.603867 7f08b47ff700 10 osdmap_calc_pg_upmaps r = 0
2018-10-30 13:25:52.603982 7f08b47ff700  4 mgr[balancer] prepared 0/10
changes

The mgr claims that mgr/balancer/upmap_max_iterations and
mgr/balancer/upmap_max_deviation aren't found in the config even though
they have been set and appear in the config-key dump. It seems to be
picking up the other config options correctly. Am I doing something
wrong? I feel like I must have a typo or something, but I'm not seeing
it.


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

On Tue, 2018-10-30 at 10:11 -0600, Steve Taylor wrote:
> I had played with those settings some already, but I just tried again
> with max_deviation set to 0.0001 and max_iterations set to 1000. Same
> result. Thanks for the suggestion though.
> 
> On Tue, 2018-10-30 at 12:06 -0400, David Turner wrote:
> 
> From the balancer module's code for v 12.2.7 I noticed [1] these
> lines which reference [2] these 2 config options for upmap. You might
> try using more max iterations or a smaller max deviation to see if
> you can get a better balance in your cluster. I would try to start
> with [3] these commands/values and see if it improves your balance
> and/or allows you to generate a better map.
> 
> [1] 
> 
https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
> [2] upmap_max_iterations (default 10)
> upmap_max_deviation (default .01)
> 
> [3] ceph config-key set mgr/balancer/upmap_max_iterations 50
> ceph config-key set mgr/balancer/upmap_max_deviation .005
> 
> On Tue, Oct 30, 2018 at 11:14 AM Steve Taylor <
> steve.tay...@storagecraft.com> wrote:
> 
> I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8
> and
> m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each.
> Each
> pool has 2048 PGs and is distributed across its 360 OSDs with host
> failure domains. The OSDs are 

Re: [ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
I had played with those settings some already, but I just tried again
with max_deviation set to 0.0001 and max_iterations set to 1000. Same
result. Thanks for the suggestion though.


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

On Tue, 2018-10-30 at 12:06 -0400, David Turner wrote:
> From the balancer module's code for v 12.2.7 I noticed [1] these
> lines which reference [2] these 2 config options for upmap. You might
> try using more max iterations or a smaller max deviation to see if
> you can get a better balance in your cluster. I would try to start
> with [3] these commands/values and see if it improves your balance
> and/or allows you to generate a better map.
> 
> [1] 
> https://github.com/ceph/ceph/blob/v12.2.7/src/pybind/mgr/balancer/module.py#L671-L672
> [2] upmap_max_iterations (default 10)
> upmap_max_deviation (default .01)
> 
> [3] ceph config-key set mgr/balancer/upmap_max_iterations 50
> ceph config-key set mgr/balancer/upmap_max_deviation .005
> 
> On Tue, Oct 30, 2018 at 11:14 AM Steve Taylor <
> steve.tay...@storagecraft.com> wrote:
> > I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8
> > and
> > m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each.
> > Each
> > pool has 2048 PGs and is distributed across its 360 OSDs with host
> > failure domains. The OSDs are identical (4TB) and are weighted with
> > default weights (3.73).
> > 
> > Initially, and not surprisingly, the PG distribution was all over
> > the
> > place with PG counts per OSD ranging from 40 to 83. I enabled the
> > balancer module in upmap mode and let it work its magic, which
> > reduced
> > the range of the per-OSD PG counts to 56-61.
> > 
> > While 56-61 is obviously a whole lot better than 40-83, with upmap
> > I
> > expected the range to be 56-57. If I run 'ceph balancer optimize
> > ' again to attempt to create a new plan I get 'Error
> > EALREADY:
> > Unable to find further optimization,or distribution is already
> > perfect.' I set the balancer's max_misplaced value to 1 in case
> > that
> > was preventing further optimization, but I still get the same
> > error.
> > 
> > I'm sure I'm missing some config option or something that will
> > allow it
> > to do better, but thus far I haven't been able to find anything in
> > the
> > docs, mailing list archives, or balancer source code that helps.
> > Any
> > ideas?
> > 
> > 
> > Steve Taylor | Senior Software Engineer | StorageCraft Technology
> > Corporation
> > 380 Data Drive Suite 300 | Draper | Utah | 84020
> > Office: 801.871.2799 | 
> > 
> > If you are not the intended recipient of this message or received
> > it erroneously, please notify the sender and delete it, together
> > with any attachments, and be advised that any dissemination or
> > copying of this message is prohibited.
> > 
> > 
> > 
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Balancer module not balancing perfectly

2018-10-30 Thread Steve Taylor
I have a Luminous 12.2.7 cluster with 2 EC pools, both using k=8 and
m=2. Each pool lives on 20 dedicated OSD hosts with 18 OSDs each. Each
pool has 2048 PGs and is distributed across its 360 OSDs with host
failure domains. The OSDs are identical (4TB) and are weighted with
default weights (3.73).

Initially, and not surprisingly, the PG distribution was all over the
place with PG counts per OSD ranging from 40 to 83. I enabled the
balancer module in upmap mode and let it work its magic, which reduced
the range of the per-OSD PG counts to 56-61.

While 56-61 is obviously a whole lot better than 40-83, with upmap I
expected the range to be 56-57. If I run 'ceph balancer optimize
' again to attempt to create a new plan I get 'Error EALREADY:
Unable to find further optimization,or distribution is already
perfect.' I set the balancer's max_misplaced value to 1 in case that
was preventing further optimization, but I still get the same error.

I'm sure I'm missing some config option or something that will allow it
to do better, but thus far I haven't been able to find anything in the
docs, mailing list archives, or balancer source code that helps. Any
ideas?
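
For reference, the sequence I used to turn it on was roughly the following (the
plan name is arbitrary, and the eval/execute steps are just how I would apply a
plan by hand rather than letting the automatic balancer run):

ceph balancer mode upmap
ceph balancer on
# or, to drive it manually:
ceph balancer optimize myplan
ceph balancer eval myplan
ceph balancer execute myplan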

 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Strange Ceph host behaviour

2018-10-02 Thread Steve Taylor
Unless this is related to load and the OSDs really are unresponsive, it is
almost certainly some sort of network issue. Duplicate IP address
maybe?


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

On Tue, 2018-10-02 at 17:17 +0200, Vincent Godin wrote:
> Ceph cluster in Jewel 10.2.11
> Mons & Hosts are on CentOS 7.5.1804 kernel 3.10.0-862.6.3.el7.x86_64
> 
> Everyday, we can see in ceph.log on Monitor a lot of logs like these :
> 
> 2018-10-02 16:07:08.882374 osd.478 192.168.1.232:6838/7689 386 :
> cluster [WRN] map e612590 wrongly marked me down
> 2018-10-02 16:07:06.462653 osd.464 192.168.1.232:6830/6650 317 :
> cluster [WRN] map e612588 wrongly marked me down
> 2018-10-02 16:07:10.717673 osd.470 192.168.1.232:6836/7554 371 :
> cluster [WRN] map e612591 wrongly marked me down
> 2018-10-02 16:14:51.179945 osd.414 192.168.1.227:6808/4767 670 :
> cluster [WRN] map e612599 wrongly marked me down
> 2018-10-02 16:14:48.422442 osd.403 192.168.1.227:6832/6727 509 :
> cluster [WRN] map e612597 wrongly marked me down
> 2018-10-02 16:15:13.198180 osd.436 192.168.1.228:6828/6402 533 :
> cluster [WRN] map e612608 wrongly marked me down
> 2018-10-02 16:15:08.792369 osd.433 192.168.1.228:6832/6732 515 :
> cluster [WRN] map e612604 wrongly marked me down
> 2018-10-02 16:15:11.680405 osd.429 192.168.1.228:6838/7393 536 :
> cluster [WRN] map e612607 wrongly marked me down
> 2018-10-02 16:15:14.246717 osd.431 192.168.1.228:6822/5937 474 :
> cluster [WRN] map e612609 wrongly marked me down
> 
> On the server 192.168.1.228 for example, the /var/log/messages looks like :
> 
> Oct  2 16:15:02 bd-ceph-22 ceph-osd: 2018-10-02 16:15:02.935658
> 7f716f16e700 -1 osd.432 612603 heartbeat_check: no reply from
> 192.168.1.215:6815 osd.242 since back 2018-10-02 16:14:59.065582 front
> 2018-10-02 16:14:42.046092 (cutoff 2018-10-02 16:14:42.935642)
> Oct  2 16:15:03 bd-ceph-22 ceph-osd: 2018-10-02 16:15:03.935841
> 7f716f16e700 -1 osd.432 612603 heartbeat_check: no reply from
> 192.168.1.215:6815 osd.242 since back 2018-10-02 16:14:59.065582 front
> 2018-10-02 16:14:42.046092 (cutoff 2018-10-02 16:14:43.935824)
> Oct  2 16:15:04 bd-ceph-22 ceph-osd: 2018-10-02 16:15:04.283822
> 7fe378c13700 -1 osd.426 612603 heartbeat_check: no reply from
> 192.168.1.215:6807 osd.240 since back 2018-10-02 16:15:00.450196 front
> 2018-10-02 16:14:43.433054 (cutoff 2018-10-02 16:14:44.283811)
> Oct  2 16:15:04 bd-ceph-22 ceph-osd: 2018-10-02 16:15:04.353645
> 7f1110a32700 -1 osd.438 612603 heartbeat_check: no reply from
> 192.168.1.212:6807 osd.186 since back 2018-10-02 16:14:59.700105 front
> 2018-10-02 16:14:43.884248 (cutoff 2018-10-02 16:14:44.353612)
> Oct  2 16:15:04 bd-ceph-22 ceph-osd: 2018-10-02 16:15:04.373905
> 7f71375de700 -1 osd.432 612603 heartbeat_check: no reply from
> 192.168.1.215:6815 osd.242 since back 2018-10-02 16:14:59.065582 front
> 2018-10-02 16:14:42.046092 (cutoff 2018-10-02 16:14:44.373897)
> Oct  2 16:15:04 bd-ceph-22 ceph-osd: 2018-10-02 16:15:04.935997
> 7f716f16e700 -1 osd.432 612603 heartbeat_check: no reply from
> 192.168.1.215:6815 osd.242 since back 2018-10-02 16:15:04.369740 front
> 2018-10-02 16:14:42.046092 (cutoff 2018-10-02 16:14:44.935981)
> Oct  2 16:15:05 bd-ceph-22 ceph-osd: 2018-10-02 16:15:05.007484
> 7f10d97ec700 -1 osd.438 612603 heartbeat_check: no reply from
> 192.168.1.212:6807 osd.186 since back 2018-10-02 16:14:59.700105 front
> 2018-10-02 16:14:43.884248 (cutoff 2018-10-02 16:14:45.007477)
> Oct  2 16:15:05 bd-ceph-22 ceph-osd: 2018-10-02 16:15:05.017154
> 7fd4cee4d700 -1 osd.435 612603 heartbeat_check: no reply from
> 192.168.1.212:6833 osd.195 since back 2018-10-02 16:15:03.273909 front
> 2018-10-02 16:14:44.648411 (cutoff 2018-10-02 16:14:45.017106)
> Oct  2 16:15:05 bd-ceph-22 ceph-osd: 2018-10-02 16:15:05.158580
> 7fe343c96700 -1 osd.426 612603 heartbeat_check: no reply from
> 192.168.1.215:6807 osd.240 since back 2018-10-02 16:15:00.450196 front
> 2018-10-02 16:14:43.433054 (cutoff 2018-10-02 16:14:45.158567)
> Oct  2 16:15:05 bd-ceph-22 ceph-osd: 2018-10-02 16:15:05.283983
> 7fe378c13700 -1 osd.426 612603 heartbeat_check: no reply from
> 192.168.1.215:6807 osd.240 since back 2018-10-02 16:15:05.154458 front
> 2018-10-02 16:14:43.433054 (cutoff 2018-10-02 16:14:45.283975)
> 
> There is no network problem at that time (I checked the logs on the
> host and on the switch). OSD logs show nothing but "wrongly marked me
> down" and

Re: [ceph-users] move rbd image (with snapshots) to different pool

2018-06-15 Thread Steve Taylor
I have done this with Luminous by deep-flattening a clone in a different pool. 
It seemed to do what I wanted, but the RBD appeared to lose its sparseness in 
the process. Can anyone verify that and/or comment on whether Mimic's "rbd deep 
copy" does the same?


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

-Original Message-
From: ceph-users  On Behalf Of Jason Dillaman
Sent: Friday, June 15, 2018 7:45 AM
To: Marc Roos 
Cc: ceph-users 
Subject: Re: [ceph-users] move rbd image (with snapshots) to different pool

The "rbd clone" command will just create a copy-on-write cloned child of the 
source image. It will not copy any snapshots from the original image to the 
clone.

With the Luminous release, you can use "rbd export --export-format 2 
 - | rbd import --export-format 2 - " to export / 
import an image (and all its snapshots) to a different pool.
Additionally, with the Mimic release, you can run "rbd deep copy" to copy an 
image (and all its snapshots) to a different pool.
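
As a concrete example, with hypothetical pool/image names:

rbd export --export-format 2 hdd-pool/myimage - | rbd import --export-format 2 - ssd-pool/myimage

# Mimic and later
rbd deep copy hdd-pool/myimage ssd-pool/myimage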

On Fri, Jun 15, 2018 at 3:26 AM, Marc Roos  wrote:
>
> If I would like to copy/move an rbd image, this is the only option I 
> have? (Want to move an image from a hdd pool to an ssd pool)
>
> rbd clone mypool/parent@snap otherpool/child
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds with different disk sizes may killing performance

2018-04-12 Thread Steve Taylor
I can't comment directly on the relation XFS fragmentation has to Bluestore, 
but I had a similar issue probably 2-3 years ago where XFS fragmentation was 
causing a significant degradation in cluster performance. The use case was RBDs 
with lots of snapshots created and deleted at regular intervals. XFS got pretty 
severely fragmented and the cluster slowed down quickly.

The solution I found was to set the XFS allocsize to match the RBD object size 
via osd_mount_options_xfs. Of course I also had to defragment XFS to clear up 
the existing fragmentation, but that was fairly painless. XFS fragmentation 
hasn't been an issue since. That solution isn't as applicable in an object 
store use case where the object size is more variable, but increasing the XFS 
allocsize could still help.
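
As a rough example of the kind of setting I mean (the 4M value assumes default
4MB RBD objects, and it only affects newly allocated extents, so existing files
still need defragmenting):

[osd]
    osd_mount_options_xfs = rw,noatime,inode64,allocsize=4M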

As far as Bluestore goes, I haven't deployed it in production yet, but I would 
expect that manipulating bluestore_min_alloc_size in a similar fashion would 
yield similar benefits. Of course you are then wasting some disk space for 
every object that ends up being smaller than that allocation size in both 
cases. That's the trade-off.






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Thu, 2018-04-12 at 04:13 +0200, Marc Roos wrote:


Is that not obvious? The 8TB is handling twice as much as the 4TB. Afaik
there is not a linear relationship with the iops of a disk and its size.


But interesting about this xfs defragmentation, how does this
relate/compare to bluestore?





-Original Message-
From: Yao Zongyou [mailto:yaozong...@outlook.com]
Sent: donderdag 12 april 2018 4:36
To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: *SPAM* [ceph-users] osds with different disk sizes may
killing performance
Importance: High

Hi,

For anybody who may be interested, here I share the process of locating
the reason for a Ceph cluster performance slowdown in our environment.

Internally, we have a cluster with 1.1PB capacity, 800TB used, and about
500TB of raw user data. Each day 3TB of data is uploaded and the 3TB of
oldest data is lifecycled (we are using the S3 object store with bucket
lifecycle enabled). As time went by the cluster became somewhat slower,
and we suspected XFS fragmentation was the culprit.

After some testing, we did find that XFS fragmentation slows down
filestore's performance: for example, at 15% fragmentation the
performance is 85% of the original, and at 25% it is 74.73% of the
original.

But the main reason for our cluster's deterioration in performance was
not XFS fragmentation.

Initially our Ceph cluster contained only OSDs with 4TB disks. As time
went by we scaled out the cluster by adding new OSDs with 8TB disks.
Since the new disks have double the capacity of the old ones, each new
OSD gets double the weight, holds double the PGs, and uses double the
disk space of an old OSD. Everything looked good and fine.

But even though a new OSD has double the capacity of an old OSD, its
performance is not double. After digging into our internal system stats,
we found the newly added disks' IO utilization is about twice that of
the old ones, and from time to time the new disks' IO utilization rises
to 100%. The newly added OSDs were the performance killers; they slowed
down the whole cluster.

Once the reason was found, the solution was very simple: after lowering
the newly added OSDs' weight, the annoying slow request warnings died
away.

So the conclusion is: in a cluster with OSDs of different disk sizes, an
OSD's weight should not be determined by its capacity alone; we should
also take its performance into account.

Best wishes,
Yao Zongyou
___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reweight 0 - best way to backfill slowly?

2018-01-29 Thread Steve Taylor
There are two concerns with setting the reweight to 1.0. The first is peering 
and the second is backfilling. Peering is going to block client I/O on the 
affected OSDs, while backfilling will only potentially slow things down.

I don't know what your client I/O looks like, but personally I would probably 
set the norecover and nobackfill flags, slowly increment your reweight value by 
0.01 or whatever you deem to be appropriate for your environment, waiting for 
peering to complete in between each step. Also allow any resulting blocked 
requests to clear up before incrementing your reweight again.

When your reweight is all the way up to 1.0, inject osd_max_backfills to 
whatever you like (or don't if you're happy with it as is) and unset the 
norecover and nobackfill flags to let backfilling begin. If you are unable to 
handle the impact of backfilling with osd_max_backfills set to 1, then you need 
to add some new OSDs to your cluster before doing any of this. They will have 
to backfill too, but at least you'll have more spindles to handle it.
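
Roughly, the sequence I have in mind looks like this, using osd.11 from your
output (the 0.05 steps are just an example; use whatever increment your cluster
tolerates):

ceph osd set norecover
ceph osd set nobackfill
ceph osd reweight 11 0.05    # repeat with 0.10, 0.15, ... waiting for peering
                             # and any blocked requests to clear each time
ceph osd reweight 11 1.0
ceph tell osd.* injectargs '--osd_max_backfills 1'
ceph osd unset nobackfill
ceph osd unset norecover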






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Mon, 2018-01-29 at 22:43 +0100, David Majchrzak wrote:

And so I totally forgot to add df tree to the mail.
Here's the interesting bit from the first two nodes, where osd.11 has weight 
but is reweighted to 0.


root@osd1:~# ceph osd df tree
ID WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE  VAR  TYPE NAME
-1 181.7           -   109T 50848G 60878G     0    0 root default
-2  36.3           - 37242G 16792G 20449G 45.09 0.99     host osd1
 0   3.64000  1.0  3724G  1730G  1993G 46.48 1.02 osd.0
 1   3.64000  1.0  3724G  1666G  2057G 44.75 0.98 osd.1
 2   3.64000  1.0  3724G  1734G  1989G 46.57 1.02 osd.2
 3   3.64000  1.0  3724G  1387G  2336G 37.25 0.82 osd.3
 4   3.64000  1.0  3724G  1722G  2002G 46.24 1.01 osd.4
 6   3.64000  1.0  3724G  1840G  1883G 49.43 1.08 osd.6
 7   3.64000  1.0  3724G  1651G  2072G 44.34 0.97 osd.7
 8   3.64000  1.0  3724G  1747G  1976G 46.93 1.03 osd.8
 9   3.64000  1.0  3724G  1697G  2026G 45.58 1.00 osd.9
 5   3.64000  1.0  3724G  1614G  2109G 43.34 0.95 osd.5
-3  36.3           -      0      0      0     0    0     host osd2
12   3.64000  1.0  3724G  1730G  1993G 46.46 1.02 osd.12
13   3.64000  1.0  3724G  1745G  1978G 46.88 1.03 osd.13
14   3.64000  1.0  3724G  1707G  2016G 45.84 1.01 osd.14
15   3.64000  1.0  3724G  1540G  2184G 41.35 0.91 osd.15
16   3.64000  1.0  3724G  1484G  2239G 39.86 0.87 osd.16
18   3.64000  1.0  3724G  1928G  1796G 51.77 1.14 osd.18
20   3.64000  1.0  3724G  1767G  1956G 47.45 1.04 osd.20
10   3.64000  1.0  3724G  1797G  1926G 48.27 1.06 osd.10
49   3.64000  1.0  3724G  1847G  1877G 49.60 1.09 osd.49
11   3.64000         0      0      0      0     0    0 osd.11



29 jan. 2018 kl. 22:40 skrev David Majchrzak 
<da...@visions.se<mailto:da...@visions.se>>:

Hi!

Cluster: 5 HW nodes, 10 HDDs with SSD journals, filestore, 0.94.9 hammer, 
debian wheezy (scheduled to upgrade once this is fixed).

I have a replaced HDD that another admin set to reweight 0 instead of weight 0 
(I can't remember the reason).
What would be the best way to slowly backfill it? Usually I'm using weight and 
slowly growing it to max size.

I guess if I just set reweight to 1.0, it will backfill as fast as I let it, 
that is max 1 backfill / osd but it will probably disrupt client io (this being 
on hammer).

And if I set the weight on it to 0, the node will get less weight, and will 
start moving data around everywhere right?

Can I use reweight the same way as weight here, slowly increasing it up to 1.0 
by increments of say 0.01?

Kind Regards,
David Majchrzak

___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] upgrade Hammer>Jewel>Luminous OSD fail to start

2017-09-12 Thread Steve Taylor
It seems like I've seen similar behavior in the past with the changing of the 
osd user context between hammer and jewel. Hammer ran osds as root, and they 
switched to running as the ceph user in jewel. That doesn't really seem to 
match your scenario perfectly, but I think the errors you're seeing in the logs 
match what I've seen in that situation before.

If that's the issue, you need to chown everything under /var/lib/ceph/osd to be 
owned by ceph instead of root as documented in the jewel release notes.
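
If that turns out to be the issue, the fix (with the OSD daemons stopped) is 
roughly:

chown -R ceph:ceph /var/lib/ceph/osd

or chown all of /var/lib/ceph as the release notes suggest. The alternative 
they document is to keep running as root with 'setuser match path = 
/var/lib/ceph/$type/$cluster-$id' in ceph.conf, but chowning is the cleaner 
long-term option.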






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Wed, 2017-09-13 at 00:52 +0530, kevin parrikar wrote:
Can someone please help me with this? I have no idea how to bring the cluster 
back to an operational state.

Thanks,
Kev

On Tue, Sep 12, 2017 at 11:12 AM, kevin parrikar 
<kevin.parker...@gmail.com<mailto:kevin.parker...@gmail.com>> wrote:
Hello all,
I am trying to upgrade a small test setup with one monitor and one OSD node, 
which is on the hammer release.


I updated from hammer to jewel using package update commands and things were 
working.
However, after updating from jewel to luminous, I am facing issues with the OSD 
failing to start.

I upgraded packages on both nodes, and "ceph mon versions" shows the upgrade 
was successful:

 ceph mon versions
{
"ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous 
(rc)": 1
}

but 'ceph osd versions' returns an empty string:


ceph osd versions
{}


dpkg --list|grep ceph
ii  ceph 12.2.0-1trusty 
amd64distributed storage and file system
ii  ceph-base12.2.0-1trusty 
amd64common ceph daemon libraries and management tools
ii  ceph-common  12.2.0-1trusty 
amd64common utilities to mount and interact with a ceph storage 
cluster
ii  ceph-deploy  1.5.38 
all  Ceph-deploy is an easy to use configuration tool
ii  ceph-mgr 12.2.0-1trusty 
amd64manager for the ceph distributed storage system
ii  ceph-mon 12.2.0-1trusty 
amd64monitor server for the ceph storage system
ii  ceph-osd 12.2.0-1trusty 
amd64OSD server for the ceph storage system
ii  libcephfs1   10.2.9-1trusty 
amd64Ceph distributed file system client library
ii  libcephfs2   12.2.0-1trusty 
amd64Ceph distributed file system client library
ii  python-cephfs12.2.0-1trusty 
amd64Python 2 libraries for the Ceph libcephfs library

from OSD log:

2017-09-12 05:38:10.618023 7fc307a10d00  0 set uid:gid to 64045:64045 
(ceph:ceph)
2017-09-12 05:38:10.618618 7fc307a10d00  0 ceph version 12.2.0 
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc), process (unknown), 
pid 21513
2017-09-12 05:38:10.624473 7fc307a10d00  0 pidfile_write: ignore empty 
--pid-file
2017-09-12 05:38:10.633099 7fc307a10d00  0 load: jerasure load: lrc load: isa
2017-09-12 05:38:10.633657 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) 
backend xfs (magic 0x58465342)
2017-09-12 05:38:10.635164 7fc307a10d00  0 filestore(/var/lib/ceph/osd/ceph-0) 
backend xfs (magic 0x58465342)
2017-09-12 05:38:10.637503 7fc307a10d00  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl 
is disabled via 'filestore fiemap' config option
2017-09-12 05:38:10.637833 7fc307a10d00  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: 
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-09-12 05:38:10.637923 7fc307a10d00  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is 
disabled via 'filestore splice' config option
2017-09-12 05:38:10.639047 7fc307a10d00  0 
genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) 
syscall fully supported (by glibc and kernel)
2017-09-12 05:38:10.639501 7fc307a10d00  0 
xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is 
disabled by conf
2017-09-12 05:38:10.640417 7fc307a10d00  0 file

Re: [ceph-users] Power outages!!! help!

2017-08-30 Thread Steve Taylor
I'm not familiar with dd_rescue, but I've just been reading about it. I'm not 
seeing any features that would be beneficial in this scenario that aren't also 
available in dd. What specific features give it "really a far better chance of 
restoring a copy of your disk" than dd? I'm always interested in learning about 
new recovery tools.
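
For what it's worth, the invocation I would test with is GNU ddrescue (a 
different tool from Garloff's dd_rescue, so this may not be exactly what you 
meant), where the mapfile is what lets it resume and retry bad regions:

ddrescue /dev/sdb /mnt/recovery/sdb.img /mnt/recovery/sdb.map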






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Tue, 2017-08-29 at 21:49 +0200, Willem Jan Withagen wrote:

On 29-8-2017 19:12, Steve Taylor wrote:


Hong,

Probably your best chance at recovering any data without special,
expensive, forensic procedures is to perform a dd from /dev/sdb to
somewhere else large enough to hold a full disk image and attempt to
repair that. You'll want to use 'conv=noerror' with your dd command
since your disk is failing. Then you could either re-attach the OSD
from the new source or attempt to retrieve objects from the filestore
on it.



Like somebody else already pointed out: in problem cases like this disk, use
dd_rescue.
It has really a far better chance of restoring a copy of your disk

--WjW



I have actually done this before by creating an RBD that matches the
disk size, performing the dd, running xfs_repair, and eventually
adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
temporary arrangement for repair only, but I'm happy to report that it
worked flawlessly in my case. I was able to weight the OSD to 0,
offload all of its data, then remove it for a full recovery, at which
point I just deleted the RBD.

The possibilities afforded by Ceph inception are endless. ☺



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |

If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:


Rule of thumb with batteries is:
- the closer to their “proper temperature” you run them, the more life you get
out of them
- the more the battery is overpowered for your application, the longer it will
survive.

Get yourself an LSI 94** controller and use it as an HBA and you will be
fine. But get MORE DRIVES! …


On 28 Aug 2017, at 23:10, hjcho616 
<hjcho...@yahoo.com<mailto:hjcho...@yahoo.com>> wrote:

Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
try these out.  Car battery idea is nice!  I may try that.. =)  Do
they last longer?  Ones that fit the UPS original battery spec
didn't last very long... part of the reason why I gave up on them..
=P  My wife probably won't like the idea of car battery hanging out
though ha!

The OSD1 (one with mostly ok OSDs, except that smart failure)
motherboard doesn't have any additional SATA connectors available.
 Would it be safe to add another OSD host?

Regards,
Hong



On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz <tom.kusmierz@g
mail.com> wrote:


Sorry for being brutal … anyway
1. get a battery for the UPS (a car battery will do as well — I've modded a UPS
in the past with a truck battery and it was working like a charm :D )
2. get spare drives and put them in, because your cluster CAN NOT get out of
its error state due to lack of space
3. follow the advice of Ronny Aasen on how to recover data from the hard drives
4. get cooling to the drives or you will lose more!




On 28 Aug 2017, at 22:39, hjcho616 
<hjcho...@yahoo.com<mailto:hjcho...@yahoo.com>> wrote:

Tomasz,

Those machines are behind a surge protector.  Doesn't appear to
be a good one!  I do have a UPS... but it is my fault... no
battery.  Power was pretty reliable for a while... and UPS was
just beeping every chance it had, disrupting some sleep.. =P  So
running on surge protector only.  I am running this in home
environment.   So far, HDD failures have been very rare for this
environment. =)  It just doesn't get loaded as much!  I am not
sure what to expect, seeing that "unfound" and just a feeling of
possibility of maybe getting OSD back made me excited about it.
=) Thanks for letting me know what should be the priority.  I
just lack experience and knowledge in this. =) Please do continue
to guide me though this.

Thank you for the decode of that smart messages!  I do agree that
looks like it is on its way out.  I would like to know 

Re: [ceph-users] Power outages!!! help!

2017-08-30 Thread Steve Taylor
Yes, if I had created the RBD in the same cluster I was trying to repair then I 
would have used rbd-fuse to "map" the RBD in order to avoid potential deadlock 
issues with the kernel client. I had another cluster available, so I copied its 
config file to the osd node, created the RBD in the second cluster, and used 
the kernel client for the dd, xfs_repair, and mount. Worked like a charm.






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Tue, 2017-08-29 at 18:04 +, David Turner wrote:

But it was absolutely awesome to run an osd off of an rbd after the disk failed.

On Tue, Aug 29, 2017, 1:42 PM David Turner 
<drakonst...@gmail.com<mailto:drakonst...@gmail.com>> wrote:

To add to Steve's success story, the RBD was created in a second cluster in the 
same datacenter, so it didn't run the risk of deadlocking that mapping RBDs on 
machines running OSDs has. It would theoretically still work on the same 
cluster, but that is inherently more dangerous for a few reasons.

On Tue, Aug 29, 2017, 1:15 PM Steve Taylor 
<steve.tay...@storagecraft.com<mailto:steve.tay...@storagecraft.com>> wrote:
Hong,

Probably your best chance at recovering any data without special,
expensive, forensic procedures is to perform a dd from /dev/sdb to
somewhere else large enough to hold a full disk image and attempt to
repair that. You'll want to use 'conv=noerror' with your dd command
since your disk is failing. Then you could either re-attach the OSD
from the new source or attempt to retrieve objects from the filestore
on it.

I have actually done this before by creating an RBD that matches the
disk size, performing the dd, running xfs_repair, and eventually
adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
temporary arrangement for repair only, but I'm happy to report that it
worked flawlessly in my case. I was able to weight the OSD to 0,
offload all of its data, then remove it for a full recovery, at which
point I just deleted the RBD.

The possibilities afforded by Ceph inception are endless. ☺



Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |

If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
> Rule of thumb with batteries is:
> - more “proper temperature” you run them at the more life you get out
> of them
> - more battery is overpowered for your application the longer it will
> survive.
>
> Get your self a LSI 94** controller and use it as HBA and you will be
> fine. but get MORE DRIVES ! …
> > On 28 Aug 2017, at 23:10, hjcho616 
> > <hjcho...@yahoo.com<mailto:hjcho...@yahoo.com>> wrote:
> >
> > Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
> > try these out.  Car battery idea is nice!  I may try that.. =)  Do
> > they last longer?  Ones that fit the UPS original battery spec
> > didn't last very long... part of the reason why I gave up on them..
> > =P  My wife probably won't like the idea of car battery hanging out
> > though ha!
> >
> > The OSD1 (one with mostly ok OSDs, except that smart failure)
> > motherboard doesn't have any additional SATA connectors available.
> >  Would it be safe to add another OSD host?
> >
> > Regards,
> > Hong
> >
> >
> >
> > On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz <tom.kusmierz@g
> > mail.com<http://mail.com>> wrote:
> >
> >
> > Sorry for being brutal … anyway
> > 1. get the battery for UPS ( a car battery will do as well, I’ve
> > moded on ups in the past with truck battery and it was working like
> > a charm :D )
> > 2. get spare drives and put those in because your cluster CAN NOT
> > get out of error due to lack of space
> > 3. Follow advice of Ronny Aasen on hot to recover data from hard
> > drives
> > 4 get cooling to drives or you will loose more !
> >
> >
> > > On 28 Aug 2017, at 22:39, hjcho616 
> > > <hjcho...@yahoo.com<mailto:hjcho...@yaho

Re: [ceph-users] Power outages!!! help!

2017-08-29 Thread Steve Taylor
Hong,

Probably your best chance at recovering any data without special,
expensive, forensic procedures is to perform a dd from /dev/sdb to
somewhere else large enough to hold a full disk image and attempt to
repair that. You'll want to use 'conv=noerror' with your dd command
since your disk is failing. Then you could either re-attach the OSD
from the new source or attempt to retrieve objects from the filestore
on it.

I have actually done this before by creating an RBD that matches the
disk size, performing the dd, running xfs_repair, and eventually
adding it back to the cluster as an OSD. RBDs as OSDs is certainly a
temporary arrangement for repair only, but I'm happy to report that it
worked flawlessly in my case. I was able to weight the OSD to 0,
offload all of its data, then remove it for a full recovery, at which
point I just deleted the RBD.

The possibilities afforded by Ceph inception are endless. ☺
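
For anyone who wants to try the same thing, a rough sketch of the procedure 
(device, size, and pool names are purely illustrative, the RBD ideally lives in 
a separate cluster, and conv=noerror,sync keeps offsets aligned past read 
errors):

rbd create recovery/sdb-copy --size 2T    # match or exceed the source size
rbd map recovery/sdb-copy                 # e.g. /dev/rbd0
dd if=/dev/sdb1 of=/dev/rbd0 bs=4M conv=noerror,sync
xfs_repair /dev/rbd0
mount /dev/rbd0 /mnt/recovery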


 
Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | 
 
If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

 

On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
> Rule of thumb with batteries is:
> - more “proper temperature” you run them at the more life you get out
> of them
> - more battery is overpowered for your application the longer it will
> survive. 
> 
> Get your self a LSI 94** controller and use it as HBA and you will be
> fine. but get MORE DRIVES ! … 
> > On 28 Aug 2017, at 23:10, hjcho616 <hjcho...@yahoo.com> wrote:
> > 
> > Thank you Tomasz and Ronny.  I'll have to order some hdd soon and
> > try these out.  Car battery idea is nice!  I may try that.. =)  Do
> > they last longer?  Ones that fit the UPS original battery spec
> > didn't last very long... part of the reason why I gave up on them..
> > =P  My wife probably won't like the idea of car battery hanging out
> > though ha!
> > 
> > The OSD1 (one with mostly ok OSDs, except that smart failure)
> > motherboard doesn't have any additional SATA connectors available.
> >  Would it be safe to add another OSD host?
> > 
> > Regards,
> > Hong
> > 
> > 
> > 
> > On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz <tom.kusmierz@g
> > mail.com> wrote:
> > 
> > 
> > Sorry for being brutal … anyway 
> > 1. get the battery for the UPS (a car battery will do as well; I’ve
> > modded a UPS in the past with a truck battery and it was working like
> > a charm :D )
> > 2. get spare drives and put those in, because your cluster CAN NOT
> > get out of error due to lack of space
> > 3. follow the advice of Ronny Aasen on how to recover data from the hard
> > drives 
> > 4. get cooling to the drives or you will lose more! 
> > 
> > 
> > > On 28 Aug 2017, at 22:39, hjcho616 <hjcho...@yahoo.com> wrote:
> > > 
> > > Tomasz,
> > > 
> > > Those machines are behind a surge protector.  Doesn't appear to
> > > be a good one!  I do have a UPS... but it is my fault... no
> > > battery.  Power was pretty reliable for a while... and UPS was
> > > just beeping every chance it had, disrupting some sleep.. =P  So
> > > running on surge protector only.  I am running this in home
> > > environment.   So far, HDD failures have been very rare for this
> > > environment. =)  It just doesn't get loaded as much!  I am not
> > > sure what to expect, seeing that "unfound" and just a feeling of
> > > possibility of maybe getting OSD back made me excited about it.
> > > =) Thanks for letting me know what should be the priority.  I
> > > just lack experience and knowledge in this. =) Please do continue
> > > to guide me though this. 
> > > 
> > > Thank you for the decode of that smart messages!  I do agree that
> > > looks like it is on its way out.  I would like to know how to get
> > > good portion of it back if possible. =)
> > > 
> > > I think I just set the size and min_size to 1.
> > > # ceph osd lspools
> > > 0 data,1 metadata,2 rbd,
> > > # ceph osd pool set rbd size 1
> > > set pool 2 size to 1
> > > # ceph osd pool set rbd min_size 1
> > > set pool 2 min_size to 1
> > > 
> > > Seems to be doing some backfilling work.
> > > 
> > > # ceph health
> > > HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2
> > > pgs backfill_toofull; 7

Re: [ceph-users] Power outages!!! help!

2017-08-28 Thread Steve Taylor
I'm jumping in a little late here, but running xfs_repair on your partition 
can't frag your partition table. The partition table lives outside the 
partition block device and xfs_repair doesn't have access to it when run 
against /dev/sdb1. I haven't actually tested it, but it seems unlikely that 
running xfs_repair on /dev/sdb would do it either. I would assume it would just 
give you an error about /dev/sdb not containing an XFS filesystem. That's a 
guess though. I haven't ever tried anything like that.

Are you sure there isn't physical damage to the disk? I wouldn't say it's 
common, but power outages can do that. You can run 'dmesg | grep sdb' and 
'smartctl -a /dev/sdb' to see if there are kernel errors or SMART errors 
indicative of physical problems. If the disk is physically sound and the 
partition table really has been fragged, you may be able to restore it from the 
backup at the end of the disk, assuming it's GPT. If you can't find a partition 
or a filesystem somehow, then you're probably out of luck as far as retrieving 
any objects from that OSD. If the disk is physically damaged and your partition 
is gone, then it probably isn't worth wasting additional time on it.
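
If it does come to restoring a GPT from its backup copy, gdisk's recovery menu
can do it interactively; something like the following, with the device name as
a placeholder and ideally run against a dd image rather than the disk itself:

  gdisk /dev/sdb
  # r  (recovery and transformation menu)
  # b  (use the backup GPT header to rebuild the main header)
  # c  (load the backup partition table)
  # p  (print and sanity-check the result)
  # w  (write the table and exit)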






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Mon, 2017-08-28 at 19:18 +, hjcho616 wrote:
Tomasz,

Looks like when I did xfs_repair -L /dev/sdb1 it did something to partition 
table and I don't see /dev/sdb1 anymore... or maybe I missed 1 in the 
/dev/sdb1? =(. Yes.. that extra power outage did a pretty good damage... =P  I 
am hoping 0.007% is very small...=P  Any recommendations on fixing xfs 
partition I am missing? =)

Ronny,

Thank you for that link!

No I haven't done anything to osds... not touching them, hoping that I can 
revive some of them.. =)  Only thing done is trying to start and stop them..

Below are the links to newer files with just one start attempt. =)
ceph-osd.3_single.log<https://drive.google.com/open?id=0By7YztAJNGUWRUUtREZhY0NCVzQ>



ceph-osd.4_single.log<https://drive.google.com/open?id=0By7YztAJNGUWVzFxbEZ4UURLQzA>




ceph-osd.5_single.log<https://drive.google.com/open?id=0By7YztAJNGUWQ18wRUVwYkNMRW8>



ceph-osd.8_single.log<https://drive.google.com/open?id=0By7YztAJNGUWSk9XY01SQUo1Vmc>



Regards,
Hong


On Monday, August 28, 2017 12:53 PM, Ronny Aasen <ronny+ceph-us...@aasen.cx> 
wrote:


comments inline

On 28.08.2017 18:31, hjcho616 wrote:


I'll see what I can do on that... Looks like I may have to add another OSD host 
as I utilized all of the SATA ports on those boards. =P

Ronny,

I am running with size=2 min_size=1.  I created everything with ceph-deploy and 
didn't touch much of that pool settings...  I hope not, but sounds like I may 
have lost some files!  I do want some of those OSDs to come back online 
somehow... to get that confidence level up. =P


This is a bad idea, as you have found out. Once your cluster is healthy you 
should look at improving this.

The dead osd.3 message is probably me trying to stop and start the osd.  There 
were some cases where stop didn't kill the ceph-osd process.  I just started or 
restarted osd to try and see if that worked..  After that, there were some 
reboots and I am not seeing those messages after it...


when providing logs. try to move away the old one. do a single startup. and 
post that. it ma

Re: [ceph-users] how to fix X is an unexpected clone

2017-08-08 Thread Steve Taylor
I encountered this same issue on two different clusters running Hammer 0.94.9 
last week. In both cases I was able to resolve it by deleting (moving) all 
replicas of the unexpected clone manually and issuing a pg repair. Which 
version did you see this on? A call stack for the resulting crash would also be 
interesting, although troubleshooting further is probably less valid and less 
valuable now that you've resolved the problem. It's just a matter of curiosity 
at this point.
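
In rough command form, that manual cleanup looks like the following on each OSD
holding the affected PG, with the OSD stopped first; the paths and the escaped
object filename are placeholders, using your PG 3.61a as the example:

  mv /var/lib/ceph/osd/ceph-20/current/3.61a_head/<escaped clone object file> \
     /root/unexpected-clone-backup/
  # restart the OSD, then ask the PG to repair itself:
  ceph pg repair 3.61a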






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



On Tue, 2017-08-08 at 12:02 +0200, Stefan Priebe - Profihost AG wrote:

Hello Greg,

Am 08.08.2017 um 11:56 schrieb Gregory Farnum:


On Mon, Aug 7, 2017 at 11:55 PM Stefan Priebe - Profihost AG
<s.pri...@profihost.ag<mailto:s.pri...@profihost.ag> 
<mailto:s.pri...@profihost.ag>> wrote:

Hello,

how can i fix this one:

2017-08-08 08:42:52.265321 osd.20 [ERR] repair 3.61a
3:58654d3d:::rbd_data.106dd406b8b4567.018c:9d455 is an
unexpected clone
2017-08-08 08:43:04.914640 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
pgs repair; 1 scrub errors
2017-08-08 08:43:33.470246 osd.20 [ERR] 3.61a repair 1 errors, 0 fixed
2017-08-08 08:44:04.915148 mon.0 [INF] HEALTH_ERR; 1 pgs inconsistent; 1
scrub errors

If I just delete the relevant files manually, ceph crashes. rados
does not list them at all?

How can i fix this?


You've sent quite a few emails that have this story spread out, and I
think you've tried several different steps to repair it that have been a
bit difficult to track.

It would be helpful if you could put the whole story in one place and
explain very carefully exactly what you saw and how you responded. Stuff
like manually copying around the wrong files, or files without a
matching object info, could have done some very strange things.
Also, basic debugging stuff like what version you're running will help. :)

Also note that since you've said elsewhere you don't need this image, I
don't think it's going to hurt you to leave it like this for a bit
(though it will definitely mess up your monitoring).
-Greg



I'm sorry about that. You're correct.

I was able to fix this just a few minutes ago by using
ceph-objectstore-tool and its remove operation to remove all leftover files.

I did this on all OSDs with the problematic pg. After that ceph was able
to fix itself.
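
Roughly, the invocation looks like this (run with the OSD stopped; data path,
journal path, PG id, and object name are placeholders):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
      --journal-path /var/lib/ceph/osd/ceph-20/journal \
      --pgid 3.61a --op list
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
      --journal-path /var/lib/ceph/osd/ceph-20/journal \
      --pgid 3.61a '<object json from the list output>' remove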

A better approach might be that ceph can recover itself from an
unexpected clone by just deleting it.

Greets,
Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Read errors on OSD

2017-06-01 Thread Steve Taylor
I've seen similar issues in the past with 4U Supermicro servers populated with 
spinning disks. In my case it turned out to be a specific firmware+BIOS 
combination on the disk controller card that was buggy. I fixed it by updating 
the firmware and BIOS on the card to the latest versions.

I saw this on several servers, and it took a while to track down as you can 
imagine. Same symptoms you're reporting.

There was a data corruption problem a while back with the Linux kernel and 
Samsung 850 Pro drives, but your problem doesn't sound like data corruption. 
Still, I'd check to make sure the kernel version you're running has the fix.






Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




On Thu, 2017-06-01 at 13:40 +0100, Oliver Humpage wrote:


On 1 Jun 2017, at 11:55, Matthew Vernon 
<m...@sanger.ac.uk<mailto:m...@sanger.ac.uk>> wrote:

You don't say what's in kern.log - we've had (rotating) disks that were 
throwing read errors but still saying they were OK on SMART.



Fair point. There was nothing correlating to the time that ceph logged an error 
this morning, which is why I didn’t mention it, but looking harder I see 
yesterday there was a

May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 FAILED Result: 
hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 Sense Key : Hardware Error 
[current]
May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 Add. Sense: Internal 
target failure
May 31 07:20:13 osd1 kernel: sd 0:0:8:0: [sdi] tag#0 CDB: Read(10) 28 00 77 51 
42 d8 00 02 00 00
May 31 07:20:13 osd1 kernel: blk_update_request: critical target error, dev 
sdi, sector 2001814232

sdi was the disk with the OSD affected today. Guess it’s flakey SSDs then.

Weird that just re-reading the file makes everything OK though - wondering how 
much it’s worth worrying about that, or if there’s a way of making ceph retry 
reads automatically?

Oliver.

___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about unfound objects

2017-03-30 Thread Steve Taylor
One other thing to note with this experience is that we do a LOT of RBD snap 
trimming, like hundreds of millions of objects per day added to our snap_trimqs 
globally. All of the unfound objects in these cases were found on other OSDs in 
the cluster with identical contents, but associated with different snapshots. 
In other words, the file contents matched exactly, but the xattrs differed and 
the filenames indicated that the objects belonged to different snapshots.

Some of the unfound objects belonged to head, so I don't necessarily believe 
that they were in the process of being trimmed, but I imagine there is some 
possibility that this issue is related to snap trimming or deleting snapshots. 
Just more information...

On Thu, 2017-03-30 at 17:13 +, Steve Taylor wrote:

Good suggestion, Nick. I actually did that at the time. The "ceph osd map" 
wasn't all that interesting because the OSDs had been outed and their PGs had 
been mapped to new OSDs. Everything appeared to be in order with the PGs being 
mapped to the right number of new OSDs. The PG mappings looked fine, but the 
objects just didn't exist anywhere except on the OSDs that had been marked out.

The PG queries were a little more useful, but still didn't really help in the 
end. In all cases (unfound objects from 2 OSDs in each of 2 occurrences), the 
PGs showed 5 or so OSDs where they thought the unfound objects might be, one of 
which was an OSD that had been marked out. In both cases we even waited until 
backfilling completed to see if perhaps the missing objects would turn up 
somewhere else, but none ever did.

In the first instance we were simply able to reattach the 2 OSDs to the cluster 
with 0 weight and recover the unfound objects. The second instance involved 
drive problems and was a little bit trickier. The drives had experienced errors 
and the XFS filesystems had both become corrupt and wouldn't even mount. We 
didn't have any spare drives large enough, so I ended up using dd, ignoring 
errors, to copy the disks to RBDs in a different Ceph cluster. I then kernel 
mapped the RBDs on the host with the failed drives, ran XFS repairs on them, 
mounted them to the OSD directories, started the OSDs, and put them back in the 
cluster with 0 weight. I was lucky enough that those objects were available and 
they were recovered. Of course I immediately removed those OSDs once the 
unfound objects cleared up.
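
In command form, the tail end of that recovery was roughly (OSD id is a
placeholder):

  ceph osd crush reweight osd.123 0        # re-add the repaired OSD with zero weight
  ceph health detail | grep unfound        # watch the unfound objects get recovered
  # once the cluster is clean again:
  ceph osd out osd.123
  ceph osd crush remove osd.123
  ceph auth del osd.123
  ceph osd rm osd.123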

That's the other interesting aspect of this problem. This cluster had 4TB HGST 
drives for its OSDs, but we had to expand it fairly urgently and didn't have 
enough drives. We added two new servers, each with 16 4TB drives and 16 8TB 
HGST He8 drives. In both instances the problems we encountered were with the 
8TB drives. We have since acquired more 4TB drives and have replaced all of the 
8TB drives in the cluster. We have a total of 8 production clusters globally 
and have been running Ceph in production for 2 years. These two occurences 
recently are the only times we've seen these types of issues, and it was 
exclusive to the 8TB OSDs. I'm not sure how that would cause such a problem, 
but it's an interesting data point.

On Thu, 2017-03-30 at 17:33 +0100, Nick Fisk wrote:
Hi Steve,

If you can recreate or if you can remember the object name, it might be worth 
trying to run “ceph osd map” on the objects and see where it thinks they map 
to. And/or maybe pg query might show something?

Nick




Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.





Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steve 
Taylor
Sent: 30 March 2017 16:24
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Question about unfound objects

We've had a couple of puzzling experiences recently with unfound
objects, and I wonder if anyone can shed some light.

This happened with Ham

Re: [ceph-users] Question about unfound objects

2017-03-30 Thread Steve Taylor
Good suggestion, Nick. I actually did that at the time. The "ceph osd map" 
wasn't all that interesting because the OSDs had been outed and their PGs had 
been mapped to new OSDs. Everything appeared to be in order with the PGs being 
mapped to the right number of new OSDs. The PG mappings looked fine, but the 
objects just didn't exist anywhere except on the OSDs that had been marked out.

The PG queries were a little more useful, but still didn't really help in the 
end. In all cases (unfound objects from 2 OSDs in each of 2 occurrences), the 
PGs showed 5 or so OSDs where they thought the unfound objects might be, one of 
which was an OSD that had been marked out. In both cases we even waited until 
backfilling completed to see if perhaps the missing objects would turn up 
somewhere else, but none ever did.

In the first instance we were simply able to reattach the 2 OSDs to the cluster 
with 0 weight and recover the unfound objects. The second instance involved 
drive problems and was a little bit trickier. The drives had experienced errors 
and the XFS filesystems had both become corrupt and wouldn't even mount. We 
didn't have any spare drives large enough, so I ended up using dd, ignoring 
errors, to copy the disks to RBDs in a different Ceph cluster. I then kernel 
mapped the RBDs on the host with the failed drives, ran XFS repairs on them, 
mounted them to the OSD directories, started the OSDs, and put them back in the 
cluster with 0 weight. I was lucky enough that those objects were available and 
they were recovered. Of course I immediately removed those OSDs once the 
unfound objects cleared up.

That's the other interesting aspect of this problem. This cluster had 4TB HGST 
drives for its OSDs, but we had to expand it fairly urgently and didn't have 
enough drives. We added two new servers, each with 16 4TB drives and 16 8TB 
HGST He8 drives. In both instances the problems we encountered were with the 
8TB drives. We have since acquired more 4TB drives and have replaced all of the 
8TB drives in the cluster. We have a total of 8 production clusters globally 
and have been running Ceph in production for 2 years. These two occurrences 
recently are the only times we've seen these types of issues, and it was 
exclusive to the 8TB OSDs. I'm not sure how that would cause such a problem, 
but it's an interesting data point.

On Thu, 2017-03-30 at 17:33 +0100, Nick Fisk wrote:
Hi Steve,

If you can recreate or if you can remember the object name, it might be worth 
trying to run “ceph osd map” on the objects and see where it thinks they map 
to. And/or maybe pg query might show something?

Nick




Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steve 
Taylor
Sent: 30 March 2017 16:24
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Question about unfound objects

We've had a couple of puzzling experiences recently with unfound
objects, and I wonder if anyone can shed some light.

This happened with Hammer 0.94.7 on a cluster with 1,309 OSDs. Our use
case is exclusively RBD in this cluster, so it's naturally replicated.
The rbd pool size is 3, min_size is 2. The crush map is flat, so each
host is a failure domain. The OSD hosts are 4U Supermicro chassis with
32 OSDs each. Drive failures have caused the OSD count to be 1,309
instead of 1,312.

Twice in the last few weeks we've experienced issues where the cluster
was HEALTH_OK but was frequently getting some blocked requests. In each
of the two occurrences we investigated and discovered that the blocked
requests resulted from two drives in the same host that were
misbehaving (different set of 2 drives in each occurrence). We decided
to remove the misbehaving OSDs and let things backfill to see if that
would address the issue. Removing the drives resulted in a small number
of unfound objects, which was surprising. We were able to add the OSDs
back with 0 weight and recover the unfound objects in both cases, but
removing two OSDs from a single failure domain shouldn't have resulted
in unfound objects in an otherwise healthy cluster, correct?


Steve Taylor | Senior Software Engine

[ceph-users] Question about unfound objects

2017-03-30 Thread Steve Taylor
We've had a couple of puzzling experiences recently with unfound
objects, and I wonder if anyone can shed some light.

This happened with Hammer 0.94.7 on a cluster with 1,309 OSDs. Our use
case is exclusively RBD in this cluster, so it's naturally replicated.
The rbd pool size is 3, min_size is 2. The crush map is flat, so each
host is a failure domain. The OSD hosts are 4U Supermicro chassis with
32 OSDs each. Drive failures have caused the OSD count to be 1,309
instead of 1,312.

Twice in the last few weeks we've experienced issues where the cluster
was HEALTH_OK but was frequently getting some blocked requests. In each
of the two occurrences we investigated and discovered that the blocked
requests resulted from two drives in the same host that were
misbehaving (different set of 2 drives in each occurrence). We decided
to remove the misbehaving OSDs and let things backfill to see if that
would address the issue. Removing the drives resulted in a small number
of unfound objects, which was surprising. We were able to add the OSDs
back with 0 weight and recover the unfound objects in both cases, but
removing two OSDs from a single failure domain shouldn't have resulted
in unfound objects in an otherwise healthy cluster, correct?



Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] add multiple OSDs to cluster

2017-03-21 Thread Steve Taylor
Generally speaking, you are correct. Adding more OSDs at once is more
efficient than adding fewer at a time.

That being said, do so carefully. We typically add OSDs to our clusters
either 32 or 64 at once, and we have had issues on occasion with bad
drives. It's common for us to have a drive or two go bad within 24
hours or so of adding them to Ceph, and if multiple drives fail in
multiple failure domains within a short amount of time, bad things can
happen. The efficient, safe approach is to add as many drives as
possible within a single failure domain, wait for recovery, and repeat.
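
If you deploy with ceph-deploy, that flow looks roughly like this per host
(host and device names are placeholders):

  for dev in sdb sdc sdd sde; do
      ceph-deploy osd create osd-host1:$dev
  done
  ceph -s    # wait for HEALTH_OK before moving on to the next host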

On Tue, 2017-03-21 at 19:56 +0100, mj wrote:
> Hi,
>
> Just a quick question about adding OSDs, since most of the docs I
> can
> find talk about adding ONE OSD, and I'd like to add four per server
> on
> my three-node cluster.
>
> This morning I tried the careful approach, and added one OSD to
> server1.
> It all went fine, everything rebuilt and I have a HEALTH_OK again
> now.
> It took around 7 hours.
>
> But now I started thinking... (and that's when things go wrong,
> therefore hoping for feedback here)
>
> The question: was I being stupid to add only ONE osd to the server1?
> Is
> it not smarter to add all four OSDs at the same time?
>
> I mean: things will rebuild anyway...and I have the feeling that
> rebuilding from 4 -> 8 OSDs is not going to be much heavier than
> rebuilding from 4 -> 5 OSDs. Right?
>
> So better add all new OSDs together on a specific server?
>
> Or not? :-)
>
> MJ
>

________

Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] KVM/QEMU rbd read latency

2017-02-16 Thread Steve Taylor
You might try running fio directly on the host using the rbd ioengine (direct 
librbd) and see how that compares. The major difference between that and the 
krbd test will be the page cache readahead, which will be present in the krbd 
stack but not with the rbd ioengine. I would have expected the guest OS to 
normalize that some due to its own page cache in the librbd test, but that 
might at least give you some more clues about where to look further.
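
A minimal fio job for that, mirroring the job file below (pool, image, and
client names are placeholders):

  [rbd-randread]
  ioengine=rbd
  clientname=admin
  pool=rbd
  rbdname=fio-test
  rw=randread
  bs=4k
  iodepth=128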




Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Phil 
Lacroute
Sent: Thursday, February 16, 2017 11:54 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] KVM/QEMU rbd read latency

Hi,

I am doing some performance characterization experiments for ceph with KVM 
guests, and I’m observing significantly higher read latency when using the QEMU 
rbd client compared to krbd.  Is that expected or have I missed some tuning 
knobs to improve this?

Cluster details:
Note that this cluster was built for evaluation purposes, not production, hence 
the choice of small SSDs with low endurance specs.
Client host OS: Debian, 4.7.0 kernel
QEMU version 2.7.0
Ceph version Jewel 10.2.3
Client and OSD CPU: Xeon D-1541 2.1 GHz
OSDs: 5 nodes, 3 SSDs each, one journal partition and one data partition per 
SSD, XFS data file system (15 OSDs total)
Disks: DC S3510 240GB
Network: 10 GbE, dedicated switch for storage traffic Guest OS: Debian, virtio 
drivers

Performance testing was done with fio on raw disk devices using this config:
ioengine=libaio
iodepth=128
direct=1
size=100%
rw=randread
bs=4k

Case 1: krbd, fio running on the raw rbd device on the client host (no guest)
IOPS: 142k
Average latency: 0.9 msec

Case 2: krbd, fio running in a guest (libvirt <disk> XML not preserved in the archive)
IOPS: 119k
Average Latency: 1.1 msec

Case 3: QEMU RBD client, fio running in a guest (libvirt <disk> XML not preserved in the archive)
IOPS: 25k
Average Latency: 5.2 msec

The question is why the test with the QEMU RBD client (case 3) shows 4 msec of 
additional latency compared the guest using the krbd-mapped image (case 2).

Note that the IOPS bottleneck for all of these cases is the rate at which the 
client issues requests, which is limited by the average latency and the maximum 
number of outstanding requests (128).  Since the latency is the dominant factor 
in average read throughput for these small accesses, we would really like to 
understand the source of the additional latency.

Thanks,
Phil



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-07 Thread Steve Taylor
Thanks, Nick.

One other data point that has come up is that nearly all of the blocked 
requests that are waiting on subops are waiting for OSDs with more PGs than the 
others. My test cluster has 184 OSDs, 177 of which are 3TB, with 7 4TB OSDs. 
The cluster is well balanced based on OSD capacity, so those 7 OSDs 
individually have 33% more PGs than the others and are causing almost all of 
the blocked requests. It appears that maps updates are generally not blocking 
long enough to show up as blocked requests.

I set the reweight on those 7 OSDs to 0.75 and things are backfilling now. I’ll 
test some more when the PG counts per OSD are more balanced and see what I get. 
I’ll also play with the filestore queue. I was telling some of my colleagues 
yesterday that this looked likely to be related to buffer bloat somewhere. I 
appreciate the suggestion.
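
For the filestore queue experiment, the knob Nick mentions below translates to
something like this (value illustrative, in the 5-10 range for spinners):

  [osd]
  filestore queue max ops = 10

or injected live for a quick test:

  ceph tell osd.* injectargs '--filestore_queue_max_ops=10'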




Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Tuesday, February 7, 2017 10:25 AM
To: Steve Taylor <steve.tay...@storagecraft.com>; ceph-users@lists.ceph.com
Subject: RE: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

Hi Steve,

From what I understand, the issue is not with the queueing in Ceph, which is 
correctly moving client IO to the front of the queue. The problem lies below 
what Ceph controls, ie the scheduler and disk layer in Linux. Once the IO’s 
leave Ceph it’s a bit of a free for all and the client IO’s tend to get lost in 
large disk queues surrounded by all the snap trim IO’s.

The workaround Sam is working on will limit the amount of snap trims that are 
allowed to run, which I believe will have a similar effect to the sleep 
parameters in pre-jewel clusters, but without pausing the whole IO thread.

Ultimately the solution requires Ceph to be able to control the queuing of IO’s 
at the lower levels of the kernel. Whether this is via some sort of tagging per 
IO (currently CFQ is only per thread/process) or some other method, I don’t 
know. I was speaking to Sage and he thinks the easiest method might be to 
shrink the filestore queue so that you don’t get buffer bloat at the disk 
level. You should be able to test this out pretty easily now by changing the 
parameter, probably around a queue of 5-10 would be about right for spinning 
disks. It’s a trade off of peak throughput vs queue latency though.

Nick

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steve 
Taylor
Sent: 07 February 2017 17:01
To: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

As I look at more of these stuck ops, it looks like more of them are actually 
waiting on subops than on osdmap updates, so maybe there is still some headway 
to be made with the weighted priority queue settings. I do see OSDs waiting for 
map updates all the time, but they aren’t blocking things as much as the subops 
are. Thoughts?



Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |


If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

________
From: Steve Taylor
Sent: Tuesday, February 7, 2017 9:13 AM
To: 'ceph-users@lists.ceph.com' 
<ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

Sorry, I lost the previous thread on this. I apologize for the resulting 
incomplete reply.

The issue that we’re having with Jewel, as David Turner mentioned, is that we 
can’t seem to throttle snap trimming sufficiently to prevent it from blocking 
I/O requests. On further investigation, I encountered 
osd_op_pq_max_tokens_per_priority,

Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-07 Thread Steve Taylor
As I look at more of these stuck ops, it looks like more of them are actually 
waiting on subops than on osdmap updates, so maybe there is still some headway 
to be made with the weighted priority queue settings. I do see OSDs waiting for 
map updates all the time, but they aren’t blocking things as much as the subops 
are. Thoughts?




Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.

____
From: Steve Taylor
Sent: Tuesday, February 7, 2017 9:13 AM
To: 'ceph-users@lists.ceph.com' <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

Sorry, I lost the previous thread on this. I apologize for the resulting 
incomplete reply.

The issue that we’re having with Jewel, as David Turner mentioned, is that we 
can’t seem to throttle snap trimming sufficiently to prevent it from blocking 
I/O requests. On further investigation, I encountered 
osd_op_pq_max_tokens_per_priority, which should be able to be used in 
conjunction with ‘osd_op_queue = wpq’ to govern the availability of queue 
positions for various operations using costs if I understand correctly. I’m 
testing with RBDs using 4MB objects, so in order to leave plenty of room in the 
weighted priority queue for client I/O, I set osd_op_pq_max_tokens_per_priority 
to 64MB and osd_snap_trim_cost to 32MB+1. I figured this should essentially 
reserve 32MB in the queue for client I/O operations, which are prioritized 
higher and therefore shouldn’t get blocked.
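
In ceph.conf terms, the settings described above amount to something like:

  [osd]
  osd op queue = wpq
  osd op pq max tokens per priority = 67108864    # 64MB
  osd snap trim cost = 33554433                   # 32MB + 1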

I still see blocked I/O requests, and when I dump in-flight ops, they show ‘op 
must wait for map.’ I assume this means that what’s blocking the I/O requests 
at this point is all of the osdmap updates caused by snap trimming, and not the 
actual snap trimming itself starving the ops of op threads. Hammer is able to 
mitigate this with osd_snap_trim_sleep by directly throttling snap trimming and 
therefore causing less frequent osdmap updates, but there doesn’t seem to be a 
good way to accomplish the same thing with Jewel.

First of all, am I understanding these settings correctly? If so, are there 
other settings that could potentially help here, or do we just need something 
like Sam already mentioned that can sort of reserve threads for client I/O 
requests? Even then it seems like we might have issues if we can’t also 
throttle snap trimming. We delete a LOT of RBD snapshots on a daily basis, 
which we recognize is an extreme use case. Just wondering if there’s something 
else to try or if we need to start working toward implementing something new 
ourselves to handle our use case better.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?

2017-02-07 Thread Steve Taylor
Sorry, I lost the previous thread on this. I apologize for the resulting 
incomplete reply.

The issue that we’re having with Jewel, as David Turner mentioned, is that we 
can’t seem to throttle snap trimming sufficiently to prevent it from blocking 
I/O requests. On further investigation, I encountered 
osd_op_pq_max_tokens_per_priority, which should be able to be used in 
conjunction with ‘osd_op_queue = wpq’ to govern the availability of queue 
positions for various operations using costs if I understand correctly. I’m 
testing with RBDs using 4MB objects, so in order to leave plenty of room in the 
weighted priority queue for client I/O, I set osd_op_pq_max_tokens_per_priority 
to 64MB and osd_snap_trim_cost to 32MB+1. I figured this should essentially 
reserve 32MB in the queue for client I/O operations, which are prioritized 
higher and therefore shouldn’t get blocked.

I still see blocked I/O requests, and when I dump in-flight ops, they show ‘op 
must wait for map.’ I assume this means that what’s blocking the I/O requests 
at this point is all of the osdmap updates caused by snap trimming, and not the 
actual snap trimming itself starving the ops of op threads. Hammer is able to 
mitigate this with osd_snap_trim_sleep by directly throttling snap trimming and 
therefore causing less frequent osdmap updates, but there doesn’t seem to be a 
good way to accomplish the same thing with Jewel.

First of all, am I understanding these settings correctly? If so, are there 
other settings that could potentially help here, or do we just need something 
like Sam already mentioned that can sort of reserve threads for client I/O 
requests? Even then it seems like we might have issues if we can’t also 
throttle snap trimming. We delete a LOT of RBD snapshots on a daily basis, 
which we recognize is an extreme use case. Just wondering if there’s something 
else to try or if we need to start working toward implementing something new 
ourselves to handle our use case better.



Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ***Suspected Spam*** dm-crypt journal replacement

2017-01-25 Thread Steve Taylor
No need to re-create the osd.

The easiest way to replace the journal is by creating the new journal partition 
with the same partition guid. You can use 'sgdisk -n <num>:<start>:<end> 
--change-name="<num>:ceph journal" --partition-guid=<num>:<journal guid> 
--typecode=<num>:45b0969e-9b03-4f30-b4c6-5ec00ceff106 <journal device>' to create 
the new journal partition. You can get the partition guid of the failed journal 
via 'cat /var/lib/ceph/osd/<cluster>-<osd id>/journal_uuid' if you don't have it 
already.

Once your partition is created correctly, dmcrypt should be able to map it 
using the existing key from the old journal. Then the journal needs to be 
initialized via 'ceph-osd -i <osd id> --mkjournal' and you should be able 
to start the osd at that point.

If you can't or don't want to reuse the existing partition guid with its 
associated dmcrypt key, you can follow the same procedure to create the journal 
partition using a new partition guid of your choice, but then you have to 
generate a dmcrypt key with something like 'dd bs=<dmcrypt key size> 
count=1 if=/dev/urandom of=/etc/ceph/dmcrypt-keys/<partition guid>' and then 
create the dmcrypt volume with 'cryptsetup --key-file 
/etc/ceph/dmcrypt-keys/<partition guid> --key-size <key size in bits> create 
<partition guid> <journal partition device>' to get the encrypted journal 
device. Then you have to replace the 'journal' and 'journal_dmcrypt' symlinks in 
/var/lib/ceph/osd/<cluster>-<osd id>/ and write the new partition guid to the 
'journal_uuid' file in the same directory. You still have to perform the 
--mkjournal with ceph-osd, and you should be good to go.
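
Putting the reuse-the-same-guid case together with purely illustrative values
(OSD 12, journal device /dev/sdc, new partition number 3, 10G journal):

  JUUID=$(cat /var/lib/ceph/osd/ceph-12/journal_uuid)
  sgdisk --new=3:0:+10G --change-name="3:ceph journal" \
         --partition-guid=3:$JUUID \
         --typecode=3:45b0969e-9b03-4f30-b4c6-5ec00ceff106 /dev/sdc
  partprobe /dev/sdc    # let udev/ceph-disk map the dmcrypt journal again
  ceph-osd -i 12 --mkjournal
  # then start osd.12 again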




Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Nikolay Khramchikhin
Sent: Wednesday, January 25, 2017 6:50 AM
To: ceph-users@lists.ceph.com
Subject: ***Suspected Spam*** [ceph-users] dm-crypt journal replacement

 Hello, folks,


 Can someone share the procedure for replacing a failed journal deployed with 
"ceph-deploy disk prepare --dm-crypt"? I can't find anything about it in the docs. 
Is the only option to recreate the osd?

 Ceph Jewel 10.2.5


--
 Regards,

 Nikolay Khramchikhin,

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 10.2.4 Jewel released

2016-12-07 Thread Steve Taylor
I'm seeing the same behavior with very similar perf top output. One server with 
32 OSDs has a load average approaching 800. No excessive memory usage and no 
iowait at all.




Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ruben 
Kerkhof
Sent: Wednesday, December 7, 2016 3:08 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 10.2.4 Jewel released

On Wed, Dec 7, 2016 at 8:46 PM, Francois Lafont 
<francois.lafont.1...@gmail.com> wrote:
> Hi,
>
> On 12/07/2016 01:21 PM, Abhishek L wrote:
>
>> This point release fixes several important bugs in RBD mirroring, RGW
>> multi-site, CephFS, and RADOS.
>>
>> We recommend that all v10.2.x users upgrade. Also note the following
>> when upgrading from hammer
>
> Well... a little warning: after upgrading from 10.2.3 to 10.2.4, I see a big CPU 
> load on the osds and mds.

Yes, same here. perf top shows:

  8.23%  [kernel]  [k] sock_recvmsg
  8.16%  libpthread-2.17.so[.] __libc_recv
  7.33%  [kernel]  [k] fget_light
  7.24%  [kernel]  [k] tcp_recvmsg
  6.41%  [kernel]  [k] sock_has_perm
  6.19%  [kernel]  [k] _raw_spin_lock_bh
  4.89%  [kernel]  [k] system_call
  4.74%  [kernel]  [k] avc_has_perm_flags
  3.93%  [kernel]  [k] SYSC_recvfrom
  3.18%  [kernel]  [k] fput
  3.15%  [kernel]  [k] system_call_after_swapgs
  3.12%  [kernel]  [k] local_bh_enable_ip
  3.11%  [kernel]  [k] release_sock
  2.90%  libpthread-2.17.so[.] __pthread_enable_asynccancel
  2.71%  libpthread-2.17.so[.] __pthread_disable_asynccancel
  2.57%  [kernel]  [k] inet_recvmsg
  2.43%  [kernel]  [k] local_bh_enable
  2.16%  [kernel]  [k] local_bh_disable
  2.03%  [kernel]  [k] tcp_cleanup_rbuf
  1.44%  [kernel]  [k] sockfd_lookup_light
  1.26%  [kernel]  [k] _raw_spin_unlock
  1.20%  [kernel]  [k] sysret_check
  1.18%  [kernel]  [k] lock_sock_nested
  1.07%  [kernel]  [k] selinux_socket_recvmsg
  0.98%  [kernel]  [k] _raw_spin_unlock_bh
  0.97%  ceph-osd  [.] Pipe::do_recv
  0.87%  [kernel]  [k] _cond_resched
  0.73%  [kernel]  [k] tcp_release_cb
  0.52%  [kernel]  [k] security_socket_recvmsg

Kind regards,

Ruben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-30 Thread Steve Taylor
I also should have mentioned that you’ll naturally have to remount your OSD 
filestores once you’ve made the change to ceph.conf. You can either restart 
each OSD after making the config file change or simply use the mount command 
yourself with the remount option to add the allocsize option live to each OSD’s 
filestore mount point.
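
i.e., something along these lines per OSD (mount point is a placeholder):

  mount -o remount,rw,noatime,inode64,allocsize=1m /var/lib/ceph/osd/ceph-0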




Steve 
Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steve 
Taylor
Sent: Wednesday, November 30, 2016 8:50 AM
To: Thomas Bennett <tho...@ska.ac.za>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the 
minimum read size?

We’re using Ubuntu 14.04 on x86_64. We just added ‘osd mount options xfs = 
rw,noatime,inode64,allocsize=1m’ to the [osd] section of our ceph.conf so XFS 
allocates 1M blocks for new files. That only affected new files, so manual 
defragmentation was still necessary to clean up older data, but once that was 
done everything got better and stayed better.

You can use the xfs_db command to check fragmentation on an XFS volume and 
xfs_fsr to perform a defragmentation. The defragmentation can run on a mounted 
filesystem too, so you don’t even have to rely on Ceph to avoid downtime. I 
probably wouldn’t run it everywhere at once though for performance reasons. A 
single OSD at a time would be ideal, but that’s a matter of preference.
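
Concretely, something like the following, with the device and mount point as
placeholders:

  xfs_db -r -c frag /dev/sdb1              # report the fragmentation factor (read-only)
  xfs_fsr -v /var/lib/ceph/osd/ceph-0      # defragment; safe on a mounted filesystem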

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Bennett
Sent: Wednesday, November 30, 2016 5:58 AM
Cc: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the 
minimum read size?

Hi Kate and Steve,

Thanks for the replies. Always good to hear back from a community :)

I'm using Linux on x86_64 architecture and the block size is limited to the 
page size, which is 4k. So it looks like I'm hitting a hard limit with any change 
to increase the block size.

I found this out by running the following command:

$ mkfs.xfs -f -b size=8192 /dev/sda1

$ mount -v /dev/sda1 /tmp/disk/
mount: Function not implemented #huh???

Checking out the man page:

$ man mkfs.xfs
 -b block_size_options
  ... XFS  on  Linux  currently  only  supports pagesize or smaller blocks.

I'm hesitant to implement btrfs as it's still experimental, and ext4 seems to 
have the same current limitation.

Our current approach is to exclude the hard drive that we're getting the poor 
read rates from, from our procurement process, but it would still be nice to find 
out how much control we have over how ceph-osd daemons read from the drives. I may 
attempt an strace on an osd daemon as we read to see what read request size is 
actually being asked of the kernel.
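
Something like this would show it (the ceph-osd pid is a placeholder):

  strace -f -e trace=read,pread64 -p <ceph-osd pid> 2>&1 | grep pread64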

Cheers,
Tom


On Tue, Nov 29, 2016 at 11:53 PM, Steve Taylor 
<steve.tay...@storagecraft.com<mailto:steve.tay...@storagecraft.com>> wrote:
We configured XFS on our OSDs to use 1M blocks (our use case is RBDs with 1M 
blocks) due to massive fragmentation in our filestores a while back. We were 
having to defrag all the time and cluster performance was noticeably degraded. 
We also create and delete lots of RBD snapshots on a daily basis, so that 
likely contributed to the fragmentation as well. It’s been MUCH better since we 
switched XFS to use 1M allocations. Virtually no fragmentation and performance 
is consistently good.

From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
 On Behalf Of Kate Ward
Sent: Tuesday, November 29, 2016 2:02 PM
To: Thomas Bennett <tho...@ska.ac.za<mailto:tho...@ska.ac.za>>
Cc: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the 
minimum read size?

I have no experience with XFS, but wouldn't expect poor behaviour with it. I 
use ZFS myself and know that it would combine writes, but btrfs might be an 
option.

Do you know what block size was used to create the XFS filesystem? It looks 
like 4k is the default (reasonable) with a max of 64k. Perhaps a larger block 
size will give better performance for your particular use case. (I use a 1M 
block size with ZFS.)
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch04s02.html


On Tue, Nov 29, 2016 at 10:23 AM Thomas Bennett 
<tho...@ska.ac.za<ma

Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-30 Thread Steve Taylor
We’re using Ubuntu 14.04 on x86_64. We just added ‘osd mount options xfs = 
rw,noatime,inode64,allocsize=1m’ to the [osd] section of our ceph.conf so XFS 
allocates 1M blocks for new files. That only affected new files, so manual 
defragmentation was still necessary to clean up older data, but once that was 
done everything got better and stayed better.

You can use the xfs_db command to check fragmentation on an XFS volume and 
xfs_fsr to perform a defragmentation. The defragmentation can run on a mounted 
filesystem too, so you don’t even have to rely on Ceph to avoid downtime. I 
probably wouldn’t run it everywhere at once though for performance reasons. A 
single OSD at a time would be ideal, but that’s a matter of preference.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Thomas 
Bennett
Sent: Wednesday, November 30, 2016 5:58 AM
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the 
minimum read size?

Hi Kate and Steve,

Thanks for the replies. Always good to hear back from a community :)

I'm using Linux on x86_64 architecture and the block size is limited to the 
page size, which is 4k. So it looks like I'm hitting a hard limit with any change 
to increase the block size.

I found this out by running the following command:

$ mkfs.xfs -f -b size=8192 /dev/sda1

$ mount -v /dev/sda1 /tmp/disk/
mount: Function not implemented #huh???

Checking out the man page:

$ man mkfs.xfs
 -b block_size_options
  ... XFS  on  Linux  currently  only  supports pagesize or smaller blocks.

I'm hesitant to implement btrfs as it's still experimental, and ext4 seems to 
have the same current limitation.

Our current approach is to exclude the hard drive that we're getting the poor 
read rates from, from our procurement process, but it would still be nice to find 
out how much control we have over how ceph-osd daemons read from the drives. I may 
attempt an strace on an osd daemon as we read to see what read request size is 
actually being asked of the kernel.

Cheers,
Tom


On Tue, Nov 29, 2016 at 11:53 PM, Steve Taylor 
<steve.tay...@storagecraft.com<mailto:steve.tay...@storagecraft.com>> wrote:
We configured XFS on our OSDs to use 1M blocks (our use case is RBDs with 1M 
blocks) due to massive fragmentation in our filestores a while back. We were 
having to defrag all the time and cluster performance was noticeably degraded. 
We also create and delete lots of RBD snapshots on a daily basis, so that 
likely contributed to the fragmentation as well. It’s been MUCH better since we 
switched XFS to use 1M allocations. Virtually no fragmentation and performance 
is consistently good.

From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
 On Behalf Of Kate Ward
Sent: Tuesday, November 29, 2016 2:02 PM
To: Thomas Bennett <tho...@ska.ac.za<mailto:tho...@ska.ac.za>>
Cc: ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the 
minimum read size?

I have no experience with XFS, but wouldn't expect poor behaviour with it. I 
use ZFS myself and know that it would combine writes, but btrfs might be an 
option.

Do you know what block size was used to create the XFS filesystem? It looks 
like 4k is the default (reasonable) with a max of 64k. Perhaps a larger block 
size will give better performance for your particular use case. (I use a 1M 
block size with ZFS.)
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch04s02.html


On Tue, Nov 29, 2016 at 10:23 AM Thomas Bennett 
<tho...@ska.ac.za<mailto:tho...@ska.ac.za>> wrote:
Hi Kate,

Thanks for your reply. We currently use xfs as created by ceph-deploy.

What would you recommend we try?

Kind regards,
Tom


On Tue, Nov 29, 2016 at 11:14 AM, Kate Ward 
<kate.w...@forestent.com<mailto:kate.w...@forestent.com>> wrote:
What filesystem do you use on the OSD? Have you considered a different 
filesystem that is better at combining requests before they get to the drive?

k8

On Tue, Nov 29, 2016 at 9:52 AM Thomas Bennett 
<tho...@ska.ac.za<mailto:tho...@ska.ac.za>> wrote:
Hi,

We have a use case where we are reading 128MB objects off spinning disks.

We've benchmarked a number of different hard drives and have noticed that for a 
particular hard drive, we're experiencing slow reads by comparison.

This occurs when we have multiple readers (even just 2) reading objects off the 
OSD.

We've recreated the effect using iozone and have noticed that once the record 
size drops to 4k, the hard drive misbehaves.

Is there a setting on Ceph that we can change to fix the minimum read size when 
the ceph-osd daemon reads objects off the hard drives, to see if we can 
overcome the overall slow read rate?

Cheers,
Tom
____

Re: [ceph-users] Is there a setting on Ceph that we can use to fix the minimum read size?

2016-11-29 Thread Steve Taylor
We configured XFS on our OSDs to use 1M blocks (our use case is RBDs with 1M 
blocks) due to massive fragmentation in our filestores a while back. We were 
having to defrag all the time and cluster performance was noticeably degraded. 
We also create and delete lots of RBD snapshots on a daily basis, so that 
likely contributed to the fragmentation as well. It’s been MUCH better since we 
switched XFS to use 1M allocations. Virtually no fragmentation and performance 
is consistently good.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Kate 
Ward
Sent: Tuesday, November 29, 2016 2:02 PM
To: Thomas Bennett <tho...@ska.ac.za>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Is there a setting on Ceph that we can use to fix the 
minimum read size?

I have no experience with XFS, but wouldn't expect poor behaviour with it. I 
use ZFS myself and know that it would combine writes, but btrfs might be an 
option.

Do you know what block size was used to create the XFS filesystem? It looks 
like 4k is the default (reasonable) with a max of 64k. Perhaps a larger block 
size will give better performance for your particular use case. (I use a 1M 
block size with ZFS.)
http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch04s02.html


On Tue, Nov 29, 2016 at 10:23 AM Thomas Bennett 
<tho...@ska.ac.za<mailto:tho...@ska.ac.za>> wrote:
Hi Kate,

Thanks for your reply. We currently use xfs as created by ceph-deploy.

What would you recommend we try?

Kind regards,
Tom


On Tue, Nov 29, 2016 at 11:14 AM, Kate Ward 
<kate.w...@forestent.com<mailto:kate.w...@forestent.com>> wrote:
What filesystem do you use on the OSD? Have you considered a different 
filesystem that is better at combining requests before they get to the drive?

k8

On Tue, Nov 29, 2016 at 9:52 AM Thomas Bennett 
<tho...@ska.ac.za<mailto:tho...@ska.ac.za>> wrote:
Hi,

We have a use case where we are reading 128MB objects off spinning disks.

We've benchmarked a number of different hard drive and have noticed that for a 
particular hard drive, we're experiencing slow reads by comparison.

This occurs when we have multiple readers (even just 2) reading objects off the 
OSD.

We've recreated the effect using iozone and have noticed that once the record 
size drops to 4k, the hard drive miss behaves.

Is there a setting on Ceph that we can change to fix the minimum read size when 
the ceph-osd daemon reads the object of the hard drives, to see if we can 
overcome the overall slow read rate.

Cheers,
Tom



Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Thomas Bennett

SKA South Africa
Science Processing Team

Office: +27 21 5067341
Mobile: +27 79 5237105
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Out-of-date RBD client libraries

2016-10-25 Thread Steve Taylor
CRUSH is what determines where data gets stored, so if you employ newer CRUSH 
tunables prematurely against older clients that don’t support them, then you 
run the risk of your clients not being able to find or place objects 
correctly. I don’t know Ceph’s internals well enough to tell you all of what 
might result at a lower level from such a scenario, but clients not knowing 
where data belongs seems bad enough. I wouldn’t necessarily expect data loss, 
but potentially a lot of client errors.
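
As a hedged illustration, the tunables profile in effect can be inspected and
pinned with the standard commands below; the profile name is an example, and
changing the profile will itself move data:

    ceph osd crush show-tunables
    ceph osd crush tunables hammer    # e.g. hold at hammer until all clients are upgraded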

From: jdavidli...@gmail.com [mailto:jdavidli...@gmail.com] On Behalf Of J David
Sent: Tuesday, October 25, 2016 1:27 PM
To: Steve Taylor <steve.tay...@storagecraft.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Out-of-date RBD client libraries


On Tue, Oct 25, 2016 at 3:10 PM, Steve Taylor 
<steve.tay...@storagecraft.com> wrote:
Recently we tested an upgrade from 0.94.7 to 10.2.3 and found exactly the 
opposite. Upgrading the clients first worked for many operations, but we got 
"function not implemented" errors when we would try to clone RBD snapshots.

Yes, we have seen “function not implemented” in the past as well when 
connecting new clients to old clusters.

you must keep your CRUSH tunables at firefly or hammer until the clients are 
upgraded.

Not that I am proposing to try it, but… or else what?

Whatever the “or else!” is, the same would apply, I assume, to connecting old 
clients to a brand-new jewel cluster which would have been created with jewel 
tunables in the first place?

Thanks!




Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Out-of-date RBD client libraries

2016-10-25 Thread Steve Taylor
We tested an upgrade from 0.94.3 to 0.94.7 and experienced issues when the 
librbd clients were not upgraded first in the process. It was a while back and 
I don't remember the specific issues, but upgrading the clients prior to 
upgrading any services worked in that case.

Recently we tested an upgrade from 0.94.7 to 10.2.3 and found exactly the 
opposite. Upgrading the clients first worked for many operations, but we got 
"function not implemented" errors when we would try to clone RBD snapshots. We 
re-tested that upgrade with the clients being upgraded after all of the 
services and everything worked fine for us in that case. The caveat there is 
that you must keep your CRUSH tunables at firefly or hammer until the clients 
are upgraded.

At any rate, we've had different experiences upgrading the clients at different 
points in the process depending on the releases involved. The key is to test 
first and make sure you have a sane upgrade path before doing anything in 
production.




Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of J David
Sent: Tuesday, October 25, 2016 12:46 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Out-of-date RBD client libraries

What are the potential consequences of using out-of-date client libraries with 
RBD against newer clusters?

Specifically, what are the potential ill-effects of using Firefly client 
libraries (0.80.7 and 0.80.8) to access Hammer or Jewel
(10.2.3) clusters?

The upgrading instructions (
http://docs.ceph.com/docs/jewel/install/upgrading-ceph/ ) don’t actually 
mention clients, just giving the recommended order as:
ceph-deploy, mons, osds, mds, object gateways.

Are long-running RBD clients (like Qemu virtual machines) placed at risk of 
instability or data corruption if they are not updated and restarted before, 
during, or after such an upgrade?

If so, what are the potential consequences, and where in the process should 
they be upgraded to avoid those consequences?

Thanks for any advice!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph consultants?

2016-10-05 Thread Steve Taylor
Try using 'ceph-deploy osd create' instead of 'ceph-deploy osd prepare' and 
'ceph-deploy osd activate' when using an entire disk for an OSD. That will 
create a journal partition and co-locate your journal on the same disk with the 
OSD, but that's fine for an initial dev setup.
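
For example, something like the following (the host:disk pairs mirror the
activate command quoted further down in this thread, so adjust to your hardware):

    ceph-deploy osd create ceph02:/dev/sdc ceph03:/dev/sdc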




Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Tracy 
Reed
Sent: Wednesday, October 5, 2016 3:12 PM
To: Peter Maloney <peter.malo...@brockmann-consult.de>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph consultants?

On Wed, Oct 05, 2016 at 01:17:52PM PDT, Peter Maloney spake thusly:
> What do you need help with specifically? Setting up ceph isn't very
> complicated... just fixing it when things go wrong should be. What
> type of scale are you working with, and do you already have hardware?
> Or is the problem more to do with integrating it with clients?

Hi Peter,

I agree, setting up Ceph isn't very complicated. I posted to the list on
10/03/16 with the initial problem I have run into under the subject "Can't 
activate OSD". Please refer to that thread as it has logs, details of my setup, 
etc.

I started working on this about a month ago then spent several days on it and a 
few hours with a couple different people on IRC. Nobody has been able to figure 
out how to get my OSD activated. I took a couple weeks off and now I'm back at 
it as I really need to get this going soon.

Basically, I'm following the quickstart guide at 
http://docs.ceph.com/docs/jewel/start/quick-ceph-deploy/ and when I run the 
command to activate the OSDs like so:

ceph-deploy osd activate ceph02:/dev/sdc ceph03:/dev/sdc

I get this in the ceph-deploy log:

[2016-10-03 15:16:10,193][ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 
7.2.1511 Core
[2016-10-03 15:16:10,193][ceph_deploy.osd][DEBUG ] activating host ceph03 disk 
/dev/sdc
[2016-10-03 15:16:10,193][ceph_deploy.osd][DEBUG ] will use init type: systemd
[2016-10-03 15:16:10,194][ceph03][DEBUG ] find the location of an executable
[2016-10-03 15:16:10,200][ceph03][INFO  ] Running command: sudo 
/usr/sbin/ceph-disk -v activate --mark-init systemd --mount /dev/sdc
[2016-10-03 15:16:10,377][ceph03][WARNING] main_activate: path = /dev/sdc
[2016-10-03 15:21:10,380][ceph03][WARNING] No data was received after 300 
seconds, disconnecting...
[2016-10-03 15:21:15,387][ceph03][INFO  ] checking OSD status...
[2016-10-03 15:21:15,401][ceph03][DEBUG ] find the location of an executable
[2016-10-03 15:21:15,472][ceph03][INFO  ] Running command: sudo /bin/ceph 
--cluster=ceph osd stat --format=json
[2016-10-03 15:21:15,698][ceph03][INFO  ] Running command: sudo systemctl 
enable ceph.target

More details in other thread.

Where am I going wrong here?

Thanks!

--
Tracy Reed
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cleanup old osdmaps after #13990 fix applied

2016-09-14 Thread Steve Taylor
I think it's a maximum of 30 maps per osdmap update. So if you've got huge 
caches like we had, then you might have to generate a lot of updates to get 
things squared away. That's what I did, and it worked really well.
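
A rough sketch of forcing that churn with a throwaway image (names and counts
are arbitrary), while watching the cached map count on an OSD shrink:

    rbd create churn --size 1
    for i in $(seq 1 200); do rbd snap create rbd/churn@prune$i; rbd snap rm rbd/churn@prune$i; done
    find /var/lib/ceph/osd/ceph-*/current/meta -name 'osdmap*' | wc -l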



Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: Dan Van Der Ster [daniel.vanders...@cern.ch]
Sent: Wednesday, September 14, 2016 7:21 AM
To: Steve Taylor
Cc: ceph-us...@ceph.com
Subject: Re: Cleanup old osdmaps after #13990 fix applied

Hi Steve,

Thanks, that sounds promising.
Are only a limited number of maps trimmed for each new osdmap generated? If so, 
I'll generate a bit of churn to get these cleaned up.

-- Dan


> On 14 Sep 2016, at 15:08, Steve Taylor <steve.tay...@storagecraft.com> wrote:
>
> http://tracker.ceph.com/issues/13990 was created by a colleague of mine from 
> an issue that was affecting us in production. When 0.94.8 was released with 
> the fix, I immediately deployed a test cluster on 0.94.7, reproduced this 
> issue, upgraded to 0.94.8, and tested the fix. It worked beautifully.
>
> I suspect the issue you're seeing is that the clean-up only occurs when new 
> osdmaps are generated, so as long as nothing is changing you'll continue to 
> see lots of stale maps cached. We delete RBD snapshots all the time in our 
> production use case, which updates the osdmap, so I did that in my test 
> cluster and watched the map cache on one of the OSDs. Sure enough, after a 
> while the cache was pruned down to the expected size.
>
> Over time I imagine you'll see things settle, but it may take a while if you 
> don't update the osdmap frequently.
>
>  Steve Taylor | Senior Software Engineer | StorageCraft 
> Technology Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 |
> If you are not the intended recipient of this message or received it 
> erroneously, please notify the sender and delete it, together with any 
> attachments, and be advised that any dissemination or copying of this message 
> is prohibited.
> 
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Dan Van Der 
> Ster [daniel.vanders...@cern.ch]
> Sent: Wednesday, September 14, 2016 3:45 AM
> To: ceph-us...@ceph.com
> Subject: [ceph-users] Cleanup old osdmaps after #13990 fix applied
>
> Hi,
>
> We've just upgraded to 0.94.9, so I believe this issue is fixed:
>
>http://tracker.ceph.com/issues/13990
>
> AFAICT "resolved" means the number of osdmaps saved on each OSD will not grow 
> unboundedly anymore.
>
> However, we have many OSDs with loads of old osdmaps, e.g.:
>
> # pwd
> /var/lib/ceph/osd/ceph-257/current/meta
> # find . -name 'osdmap*' | wc -l
> 112810
>
> (And our maps are ~1MB, so this is >100GB per OSD).
>
> Is there a solution to remove these old maps?
>
> Cheers,
> Dan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cleanup old osdmaps after #13990 fix applied

2016-09-14 Thread Steve Taylor
http://tracker.ceph.com/issues/13990 was created by a colleague of mine from an 
issue that was affecting us in production. When 0.94.8 was released with the 
fix, I immediately deployed a test cluster on 0.94.7, reproduced this issue, 
upgraded to 0.94.8, and tested the fix. It worked beautifully.

I suspect the issue you're seeing is that the clean-up only occurs when new 
osdmaps are generated, so as long as nothing is changing you'll continue to see 
lots of stale maps cached. We delete RBD snapshots all the time in our 
production use case, which updates the osdmap, so I did that in my test cluster 
and watched the map cache on one of the OSDs. Sure enough, after a while the 
cache was pruned down to the expected size.

Over time I imagine you'll see things settle, but it may take a while if you 
don't update the osdmap frequently.



Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.




From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Dan Van Der 
Ster [daniel.vanders...@cern.ch]
Sent: Wednesday, September 14, 2016 3:45 AM
To: ceph-us...@ceph.com
Subject: [ceph-users] Cleanup old osdmaps after #13990 fix applied

Hi,

We've just upgraded to 0.94.9, so I believe this issue is fixed:

   http://tracker.ceph.com/issues/13990

AFAICT "resolved" means the number of osdmaps saved on each OSD will not grow 
unboundedly anymore.

However, we have many OSDs with loads of old osdmaps, e.g.:

# pwd
/var/lib/ceph/osd/ceph-257/current/meta
# find . -name 'osdmap*' | wc -l
112810

(And our maps are ~1MB, so this is >100GB per OSD).

Is there a solution to remove these old maps?

Cheers,
Dan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turn snapshot of a flattened snapshot into regular image

2016-09-02 Thread Steve Taylor
You can use 'rbd -p images --image 417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560 info' 
to see the parentage of your cloned RBD from Ceph's perspective. It seems like 
that could be useful at various times throughout this test to determine what 
glance is doing under the covers.




Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 |



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



-Original Message-
From: Eugen Block [mailto:ebl...@nde.ag]
Sent: Friday, September 2, 2016 7:12 AM
To: Steve Taylor <steve.tay...@storagecraft.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Turn snapshot of a flattened snapshot into regular 
image

> Something isn't right. Ceph won't delete RBDs that have existing
> snapshots

That's what I thought, and I also noticed that in the first test, but not in 
the second.

> The clone becomes a cinder device that is then attached to the nova instance.

This is one option, but I don't use it. nova would create a cinder volume if I 
executed "nova boot --block-device ...", but I don't, so there's no cinder 
involved.
I'll try to provide some details from openstack and ceph, maybe that helps to 
find the cause.

So I created a glance image
control1:~ #  glance image-list | grep Test
| 87862452-5872-40c9-b657-f5fec0d105c5 | Test2-SLE12SP1

which automatically gets one snapshot in rbd and has no children yet, because 
no VM has been launched yet:

ceph@node1:~/ceph-deploy> rbd -p images --image
87862452-5872-40c9-b657-f5fec0d105c5 snap ls
SNAPID NAME SIZE
   429 snap 5120 MB

ceph@node1:~/ceph-deploy> rbd -p images --image
87862452-5872-40c9-b657-f5fec0d105c5 children --snap snap 
ceph@node1:~/ceph-deploy>

Now I boot a VM

nova boot --flavor 2 --image 87862452-5872-40c9-b657-f5fec0d105c5
--nic net-id=4eafc4da-a3cd-4def-b863-5fb8e645e984 vm1

with a resulting instance_uuid=0e44badb-8a76-41d8-be43-b4125ffc6806
and see this in ceph:

ceph@node1:~/ceph-deploy> rbd -p images --image
87862452-5872-40c9-b657-f5fec0d105c5 children --snap snap 
images/0e44badb-8a76-41d8-be43-b4125ffc6806_disk

So I have the base image with a snapshot, and based on this snapshot a child 
which is the disk image for my instance. There is no cinder
volume:

control1:~ #  cinder list
+----+--------+------+------+-------------+----------+-------------+
| ID | Status | Name | Size | Volume Type | Bootable | Attached to |
+----+--------+------+------+-------------+----------+-------------+
+----+--------+------+------+-------------+----------+-------------+

Now I create a snapshot of vm1 (I removed some lines to focus on the IDs):

control1:~ #  nova image-show 417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560
+-------------------------+--------------------------------------+
| Property                | Value                                |
+-------------------------+--------------------------------------+
| id                      | 417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560 |
| metadata base_image_ref | 87862452-5872-40c9-b657-f5fec0d105c5 |
| metadata image_type     | snapshot                             |
| metadata instance_uuid  | 0e44badb-8a76-41d8-be43-b4125ffc6806 |
| name                    | snap-vm1                             |
| server                  | 0e44badb-8a76-41d8-be43-b4125ffc6806 |
| status                  | ACTIVE                               |
| updated                 | 2016-09-02T12:51:28Z                 |
+-------------------------+--------------------------------------+

In rbd there is a new object now, without any children:

ceph@node1:~/ceph-deploy> rbd -p images --image
417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560 snap ls
SNAPID NAME SIZE
   443 snap 20480 MB
ceph@node1:~/ceph-deploy> rbd -p images --image
417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560 children --snap snap 
ceph@node1:~/ceph-deploy>

And there's still no cinder volume ;-)
After removing vm1 I can delete the base image and snap-vm1:

control1:~ #  nova delete vm1
Request to delete server vm1 has been accepted.
control1:~ #  glance image-delete 87862452-5872-40c9-b657-f5fec0d105c5
control1:~ #
control1:~ #  glance image-delete 417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560

I did not flatten any snapshot yet, this is really strange! It seems as if the 
nova snapshot creates a full image (flattened) so it doesn't depend on the base 
image. But I didn't

Re: [ceph-users] Turn snapshot of a flattened snapshot into regular image

2016-09-01 Thread Steve Taylor
Something isn't right. Ceph won't delete RBDs that have existing snapshots, 
even when those snapshots aren't protected. You can't delete a snapshot that's 
protected, and you can't unprotect a snapshot if there is a COW clone that 
depends on it.
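
A quick rbd sequence illustrates those rules (pool and image names are made up
for the example):

    rbd snap create images/A@snap
    rbd snap protect images/A@snap
    rbd clone images/A@snap images/A-clone
    rbd snap unprotect images/A@snap    # refused while A-clone depends on the snapshot
    rbd rm images/A                     # refused while A@snap exists
    rbd flatten images/A-clone          # copy the data so the clone no longer needs A@snap
    rbd snap unprotect images/A@snap    # now allowed
    rbd snap rm images/A@snap
    rbd rm images/A                     # now allowed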

I'm not intimately familiar with OpenStack, but it must be deleting A without 
any snapshots. That would seem to indicate that at the point of deletion there 
are no COW clones of A or that any clone is no longer dependent on A. A COW 
clone requires a protected snapshot, a protected snapshot can't be deleted, and 
existing snapshots prevent RBDs from being deleted.

In my experience with OpenStack, booting a nova instance from a glance image 
causes a snapshot to be created, protected, and cloned on the RBD for the 
glance image. The clone becomes a cinder device that is then attached to the 
nova instance. Thus you're able to modify the contents of the volume within the 
instance. You wouldn't be able to delete the glance image at that point unless 
the cinder device were deleted first or it was flattened and no longer 
dependent on the glance image. I haven't performed this particular test. It's 
possible that OpenStack does the flattening for you in this scenario.

This issue will likely require some investigation at the RBD level throughout 
your testing process to understand exactly what's happening.




Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



-Original Message-
From: Eugen Block [mailto:ebl...@nde.ag]
Sent: Thursday, September 1, 2016 9:06 AM
To: Steve Taylor <steve.tay...@storagecraft.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Turn snapshot of a flattened snapshot into regular 
image

Thanks for the quick response, but I don't believe I'm there yet ;-)

> cloned the glance image to a cinder device

I have configured these three services (nova, glance, cinder) to use ceph as 
storage backend, but cinder is not involved in this process I'm referring to.

Now I wanted to reproduce this scenario to show a colleague, and couldn't 
because now I was able to delete image A even with a non-flattened snapshot! 
How is that even possible?

Eugen



Zitat von Steve Taylor <steve.tay...@storagecraft.com>:

> You're already there. When you booted ONE you cloned the glance image
> to a cinder device (A', separate RBD) that was a COW clone of A.
> That's why you can't delete A until you flatten SNAP1. A' isn't a full
> copy until that flatten is complete, at which point you're able to
> delete A.
>
> SNAP2 is a second snapshot on A', and thus A' already has all of the
> data it needs from the previous flatten of SNAP1 to allow you to
> delete SNAP1. So SNAP2 isn't actually a full extra copy of the data.
>
>
> 
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology
> Corporation<https://storagecraft.com>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799
>
> 
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with
> any attachments, and be advised that any dissemination or copying of
> this message is prohibited.
>
> 
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> Behalf Of Eugen Block
> Sent: Thursday, September 1, 2016 6:51 AM
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Turn snapshot of a flattened snapshot into
> regular image
>
> Hi all,
>
> I'm trying to understand the idea behind rbd images and their
> clones/snapshots. I have tried this scenario:
>
> 1. upload image A to glance
> 2. boot instance ONE from image A
> 3. make changes to instance ONE (install new package) 4. create
> snapshot SNAP1 from ONE 5. delete instance ONE 6. delete image A
>deleting image A fails because of existing snapshot SNAP1 7.
> flatten snapshot SNAP1 8. delete image A
>succeeds
> 9. launch instance TWO from SNAP1
> 10. make changes to TWO (install package) 11. create snapshot SNAP2
> from TWO 12. delete TWO 13. delete SNAP1
> succeeds
>
> This means that th

Re: [ceph-users] Turn snapshot of a flattened snapshot into regular image

2016-09-01 Thread Steve Taylor
You're already there. When you booted ONE you cloned the glance image to a 
cinder device (A', separate RBD) that was a COW clone of A. That's why you 
can't delete A until you flatten SNAP1. A' isn't a full copy until that flatten 
is complete, at which point you're able to delete A.

SNAP2 is a second snapshot on A', and thus A' already has all of the data it 
needs from the previous flatten of SNAP1 to allow you to delete SNAP1. So SNAP2 
isn't actually a full extra copy of the data.




Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799



If you are not the intended recipient of this message or received it 
erroneously, please notify the sender and delete it, together with any 
attachments, and be advised that any dissemination or copying of this message 
is prohibited.



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Eugen 
Block
Sent: Thursday, September 1, 2016 6:51 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Turn snapshot of a flattened snapshot into regular image

Hi all,

I'm trying to understand the idea behind rbd images and their clones/snapshots. 
I have tried this scenario:

1. upload image A to glance
2. boot instance ONE from image A
3. make changes to instance ONE (install new package) 4. create snapshot SNAP1 
from ONE 5. delete instance ONE 6. delete image A
   deleting image A fails because of existing snapshot SNAP1 7. flatten 
snapshot SNAP1 8. delete image A
   succeeds
9. launch instance TWO from SNAP1
10. make changes to TWO (install package) 11. create snapshot SNAP2 from TWO 
12. delete TWO 13. delete SNAP1
succeeds

This means that the second snapshot has the same (full) size as the first. Can 
I manipulate SNAP1 somehow so that snapshots are not flattened anymore and 
SNAP2 becomes a cow clone of SNAP1?

I hope my description is not too confusing. The idea behind this question is, 
if I have one base image and want to adjust that image from time to time, I 
don't want to keep several versions of that image, I just want one. But this 
way i would lose the protection from deleting the base image.

Is there any config option in ceph or Openstack or anything else I can do to 
"un-flatten" an image? I would assume that there is some kind of flag set for 
that image. Maybe someone can point me to the right direction.

Thanks,
Eugen

--
Eugen Block voice   : +49-40-559 51 75
NDE Netzdesign und -entwicklung AG  fax : +49-40-559 51 77
Postfach 61 03 15
D-22423 Hamburg e-mail  : ebl...@nde.ag

Chairwoman of the Supervisory Board: Angelika Mozdzen
  Registered office and court of registration: Hamburg, HRB 90934
  Management Board: Jens-U. Mozdzen
   VAT ID no. DE 814 013 983

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-12 Thread Steve Taylor
Nick is right. Setting noout is the right move in this scenario. Restarting an 
OSD shouldn't block I/O unless nodown is also set, however. The exception to 
this would be a case where min_size can't be achieved because of the down OSD, 
i.e. min_size=3 and 1 of 3 OSDs is restarting. That would certainly block 
writes. Otherwise the cluster will recognize down OSDs as down (without nodown 
set), redirect I/O requests to OSDs that are up, and backfill as necessary when 
things are back to normal.

You can set min_size to something lower if you don't have enough OSDs to allow 
you to restart one without blocking writes. If this isn't the case, something 
deeper is going on with your cluster. You shouldn't get slow requests due to 
restarting a single OSD with only noout set and idle disks on the remaining 
OSDs. I've done this many, many times.
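
For anyone following along, the flags and pool settings involved look roughly
like this (the pool name is a placeholder, and lowering min_size trades safety
for availability):

    ceph osd set noout                  # keep the restarting OSD "in" so nothing is remapped
    # restart the ceph-osd daemon however your platform manages it, then:
    ceph osd unset noout
    ceph osd pool get rbd min_size      # check whether one down OSD is enough to block writes
    ceph osd pool set rbd min_size 1    # example only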

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Nick 
Fisk
Sent: Friday, February 12, 2016 9:07 AM
To: 'Christian Balzer' <ch...@gol.com>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't 
uptosnuff)



> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Christian Balzer
> Sent: 12 February 2016 15:38
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Reducing the impact of OSD restarts (noout 
> ain't
> uptosnuff)
> 
> On Fri, 12 Feb 2016 15:56:31 +0100 Burkhard Linke wrote:
> 
> > Hi,
> >
> > On 02/12/2016 03:47 PM, Christian Balzer wrote:
> > > Hello,
> > >
> > > yesterday I upgraded our most busy (in other words lethally
> > > overloaded) production cluster to the latest Firefly in 
> > > preparation for a Hammer upgrade and then phasing in of a cache tier.
> > >
> > > When restarting the ODSs it took 3 minutes (1 minute in a 
> > > consecutive repeat to test the impact of primed caches) during 
> > > which the cluster crawled to a near stand-still and the dreaded 
> > > slow requests piled up, causing applications in the VMs to fail.
> > >
> > > I had of course set things to "noout" beforehand, in hopes of 
> > > staving off this kind of scenario.
> > >
> > > Note that the other OSDs and their backing storage were NOT 
> > > overloaded during that time, only the backing storage of the OSD 
> > > being restarted was under duress.
> > >
> > > I was under the (wishful thinking?) impression that with noout set 
> > > and a controlled OSD shutdown/restart, operations would be 
> > > redirect to the new primary for the duration.
> > > The strain on the restarted OSDs when recovering those operations 
> > > (which I also saw) I was prepared for, the near screeching halt 
> > > not so much.
> > >
> > > Any thoughts on how to mitigate this further or is this the 
> > > expected behavior?
> >
> > I wouldn't use noout in this scenario. It keeps the cluster from 
> > recognizing that a OSD is not available; other OSD will still try to 
> > write to that OSD. This is probably the cause of the blocked requests.
> > Redirecting only works if the cluster is able to detect a PG as 
> > being degraded.
> >
> Oh well, that makes of course sense, but I found some article stating 
> that
it
> also would redirect things and the recovery activity I saw afterwards
suggests
> it did so at some point.

Doesn't noout just stop the crushmap from being modified and hence data 
shuffling. Nodown controls whether or not the OSD is available for IO? 

Maybe try the reverse. Set noup so that OSD's don't participate in IO and then 
bring them in manually?

> 
> > If the cluster is aware of the OSD being missing, it could handle 
> > the write requests more gracefully. To prevent it from backfilling 
> > etc, I prefer to use nobackfill and norecover. It blocks backfill on 
> > the cluster level, but allows requests to be carried out (at least 
> > in my understanding of these flags).
> >
> Yes, I concur and was thinking of that as well. Will give it a spin 
> with
the
> upgrade to Hammer.
> 
> > 'noout' is fine for large scale cluster maintenance, since it keeps 
> > the cluster from backfilling. I've 

Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-12 Thread Steve Taylor
I could be wrong, but I didn't think a PG would have to peer when an OSD is 
restarted with noout set. If I'm wrong, then this peering would definitely 
block I/O. I just did a quick test on a non-busy cluster and didn't see any 
peering when my OSD went down or up, but I'm not sure how good a test that is. 
The OSD should also stay "in" throughout the restart with noout set, so it 
wouldn't have been "out" before to cause peering when it came "in."

I do know that OSDs don’t mark themselves "up" until they're caught up on OSD 
maps. They won't accept any op requests until they're "up," so they shouldn't 
have any catching up to do by the time they start taking op requests. In theory 
they're ready to handle I/O by the time they start handling I/O. At least 
that's my understanding.

It would be interesting to see what this cluster looks like as far as OSD 
count, journal configuration, network, CPU, RAM, etc. Something is obviously 
amiss. Even in a semi-decent configuration one should be able to restart a 
single OSD with noout under little load without causing blocked op requests.

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.


-Original Message-
From: Robert LeBlanc [mailto:rob...@leblancnet.us] 
Sent: Friday, February 12, 2016 1:30 PM
To: Nick Fisk <n...@fisk.me.uk>
Cc: Steve Taylor <steve.tay...@storagecraft.com>; Christian Balzer 
<ch...@gol.com>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't 
uptosnuff)

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

What I've seen is that when an OSD starts up in a busy cluster, as soon as it 
is "in" (could be "out" before) it starts getting client traffic. However, it 
has be "in" to start catching up and peering to the other OSDs in the cluster. 
The OSD is not ready to service requests for that PG yet, but it has the OP 
queued until it is ready.
On a busy cluster it can take an OSD a long time to become ready especially if 
it is servicing client requests at the same time.

If someone isn't able to look into the code to resolve this by the time I'm 
finished with the queue optimizations I'm doing (hopefully in a week or two), I 
plan on looking into this to see if there is something that can be done to 
prevent the OPs from being accepted until the OSD is ready for them.
- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Fri, Feb 12, 2016 at 9:42 AM, Nick Fisk  wrote:
> I wonder if Christian is hitting some performance issue when the OSD 
> or number of OSD's all start up at once? Or maybe the OSD is still 
> doing some internal startup procedure and when the IO hits it on a 
> very busy cluster, it causes it to become overloaded for a few seconds?
>
> I've seen similar things in the past where if I did not have enough 
> min free KB's configured, PG's would take a long time to peer/activate 
> and cause slow ops.
>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
>> Of Steve Taylor
>> Sent: 12 February 2016 16:32
>> To: Nick Fisk ; 'Christian Balzer' ; ceph- us...@lists.ceph.com
>> Subject: Re: [ceph-users] Reducing the impact of OSD restarts (noout 
>> ain't
>> uptosnuff)
>>
>> Nick is right. Setting noout is the right move in this scenario.
> Restarting an
>> OSD shouldn't block I/O unless nodown is also set, however. The 
>> exception to this would be a case where min_size can't be achieved 
>> because of the down OSD, i.e. min_size=3 and 1 of 3 OSDs is 
>> restarting. That would
> certainly
>> block writes. Otherwise the cluster will recognize down OSDs as down 
>> (without nodown set), redirect I/O requests to OSDs that are up, and
> backfill
>> as necessary when things are back to normal.
>>
>> You can set min_size to something lower if you don't have enough OSDs 
>> to allow you to restart one without blocking writes. If this isn't 
>> the case, something deeper is going on with your cluster. You 
>> shouldn't get slow requests due to restarting a single OSD with only 
>> noout set and idle disks
> on
>> the remaining OSDs. I've done this many, many times.
>>
>> Steve Taylor | Senior Software Engineer | StorageCraft Technology 
>> Corporation
>> 380 Data Drive Suite 300 | Draper | Utah | 84020
>> Office: 80

Re: [ceph-users] OSDs are down, don't know why

2016-01-18 Thread Steve Taylor
With a single osd there shouldn't be much to worry about. It will have to get 
caught up on map epochs before it will report itself as up, but on a new 
cluster that should be pretty immediate.

You'll probably have to look for clues in the osd and mon logs. I would expect 
some sort of error reported in this scenario. It seems likely that it would be 
network-related in this case, but the logs will confirm or debunk that theory.
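
A few places to look, sketched with a placeholder OSD id:

    ceph osd tree                          # does the mon consider osd.0 up/down, in/out?
    ceph daemon osd.0 status               # on the OSD host; shows its state and oldest/newest map epochs
    tail -f /var/log/ceph/ceph-osd.0.log   # watch for auth, heartbeat or bind errors at startup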

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.


-Original Message-
From: Jeff Epstein [mailto:jeff.epst...@commerceguys.com] 
Sent: Monday, January 18, 2016 8:32 AM
To: Steve Taylor <steve.tay...@storagecraft.com>; ceph-users 
<ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] OSDs are down, don't know why

Hi Steve
Thanks for your answer. I don't have a private network defined. 
Furthermore, in my current testing configuration, there is only one OSD, so 
communication between OSDs should be a non-issue.
Do you know how OSD up/down state is determined when there is only one OSD?
Best,
Jeff

On 01/18/2016 03:59 PM, Steve Taylor wrote:
> Do you have a ceph private network defined in your config file? I've seen 
> this before in that situation where the private network isn't functional. The 
> osds can talk to the mon(s) but not to each other, so they report each other 
> as down when they're all running just fine.
>
>
> Steve Taylor | Senior Software Engineer | StorageCraft Technology 
> Corporation
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2799 | Fax: 801.545.4705
>
> If you are not the intended recipient of this message, be advised that any 
> dissemination or copying of this message is prohibited.
> If you received this message erroneously, please notify the sender and delete 
> it, together with any attachments.
>
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Jeff Epstein
> Sent: Friday, January 15, 2016 7:28 PM
> To: ceph-users <ceph-users@lists.ceph.com>
> Subject: [ceph-users] OSDs are down, don't know why
>
> Hello,
>
> I'm setting up a small test instance of ceph and I'm running into a situation 
> where the OSDs are being shown as down, but I don't know why.
>
> Connectivity seems to be working. The OSD hosts are able to communicate with 
> the MON hosts; running "ceph status" and "ceph osd in" from an OSD host works 
> fine, but with a HEALTH_WARN that I have 2 osds: 0 up, 2 in.
> Both the OSD and MON daemons seem to be running fine. Network connectivity 
> seems to be okay: I can nc from the OSD to port 6789 on the MON, and from the 
> MON to port 6800-6803 on the OSD (I have constrained the ms bind port min/max 
> config options so that the OSDs will use only these ports). Neither OSD nor 
> MON logs show anything that seems unusual, nor why the OSD is marked as being 
> down.
>
> Furthermore, using tcpdump i've watched network traffic between the OSD and 
> the MON, and it seems that the OSD is sending heartbeats and getting an ack 
> from the MON. So I'm definitely not sure why the MON thinks the OSD is down.
>
> Some questions:
> - How does the MON determine if the OSD is down?
> - Is there a way to get the MON to report on why an OSD is down, e.g. no 
> heartbeat?
> - Is there any need to open ports other than TCP 6789 and 6800-6803?
> - Any other suggestions?
>
> ceph 0.94 on Debian Jessie
>
> Best,
> Jeff
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Steve Taylor
Rafael,

Yes, the cluster still rebalances twice when removing a failed osd. An osd that 
is marked out for any reason but still exists in the crush map gets its 
placement groups remapped to different osds until it comes back in, at which 
point those pgs are remapped back. When an osd is removed from the crush map, 
its pgs get mapped to new osds permanently. The mappings may be completely 
different for these two cases, which is why you get double rebalancing even 
when those two operations happen without the osd coming back in in between.

In the case of a failed osd, I usually don't worry about it and just follow the 
documented steps because I'm marking an osd out and then removing it from the 
crush map immediately, so the first rebalance does almost nothing by the time 
the second overrides it, which matches what you were told by support. If this 
is a problem for you or if you're removing an osd that's still functional to 
some degree, then reweighting to 0, waiting for the single rebalance, then 
following the removal steps is probably your best bet.
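
The reweight-first variant looks roughly like this (osd.12 is a placeholder id):

    ceph osd crush reweight osd.12 0    # single rebalance; wait for ceph -s to settle
    ceph osd out 12
    # stop the ceph-osd daemon on its host, then:
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12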


Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.

-Original Message-
From: Andy Allan [mailto:gravityst...@gmail.com] 
Sent: Monday, January 11, 2016 4:09 AM
To: Rafael Lopez <rafael.lo...@monash.edu>
Cc: Steve Taylor <steve.tay...@storagecraft.com>; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] double rebalance when removing osd

On 11 January 2016 at 02:10, Rafael Lopez <rafael.lo...@monash.edu> wrote:

> @Steve, even when you remove due to failing, have you noticed that the 
> cluster rebalances twice using the documented steps? You may not if you don't 
> wait for the initial recovery after 'ceph osd out'. If you do 'ceph osd out' 
> and immediately 'ceph osd crush remove', RH support has told me that this 
> effectively 'cancels' the original move triggered from 'ceph osd out' and 
> starts permanently remapping... which still doesn't really explain why we 
> have to do the ceph osd out in the first place..

This topic was last discussed in December - the documentation for removing an 
OSD from the cluster is not helpful. Unfortunately it doesn't look like anyone 
is going to fix the documentation.

http://comments.gmane.org/gmane.comp.file-systems.ceph.user/25627

Basically, when you want to remove an OSD, there's an alternative sequence of 
commands that avoids the double-rebalance.

The better approach is to reweight the OSD to zero first, then wait for the 
(one and only) rebalance, then mark out and remove. Here's more details from 
the previous thread:

http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/25629

Thanks,
Andy
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] double rebalance when removing osd

2016-01-07 Thread Steve Taylor
If I’m not mistaken, marking an osd out will remap its placement groups 
temporarily, while removing it from the crush map will remap the placement 
groups permanently. Additionally, other placement groups from other osds could 
get remapped permanently when an osd is removed from the crush map. I would 
think the only benefit to marking an osd out before stopping it would be a 
cleaner redirection of client I/O before the osd disappears, which may be 
worthwhile if you’re removing a healthy osd.

As for reweighting to 0 prior to removing an osd, it seems like that would give 
the osd the ability to participate in the recovery essentially in read-only 
fashion (plus deletes) until it’s empty, so objects wouldn’t become degraded as 
placement groups are backfilling onto other osds. Again, this would really only 
be useful if you’re removing a healthy osd. If you’re removing an osd where 
other osds in different failure domains are known to be unhealthy, it seems 
like this would be a really good idea.

I usually follow the documented steps you’ve outlined myself, but I’m typically 
removing osds due to failed/failing drives while the rest of the cluster is 
healthy.

Steve Taylor | Senior Software Engineer | StorageCraft Technology 
Corporation<http://www.storagecraft.com/>
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Rafael 
Lopez
Sent: Wednesday, January 06, 2016 4:53 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] double rebalance when removing osd

Hi all,

I am curious what practices other people follow when removing OSDs from a 
cluster. According to the docs, you are supposed to:

1. ceph osd out
2. stop daemon
3. ceph osd crush remove
4. ceph auth del
5. ceph osd rm

What value does ceph osd out (1) add to the removal process and why is it in 
the docs ? We have found (as have others) that by outing(1) and then crush 
removing (3), the cluster has to do two recoveries. Is it necessary? Can you 
just do a crush remove without step 1?

I found this earlier message from GregF which he seems to affirm that just 
doing the crush remove is fine:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007227.html

This recent blog post from Sebastien that suggests reweighting to 0 first, but 
havent tested it:
http://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/

I thought that by marking it out, it sets the reweight to 0 anyway, so not sure 
how this would make a difference in terms of two rebalances but maybe there is 
a subtle difference.. ?

Thanks,
Raf

--
Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recovery question

2015-07-29 Thread Steve Taylor
I recently migrated 240 OSDs to new servers this way in a single cluster, and 
it worked great. There are two additional items I would note based on my 
experience though.

First, if you're using dmcrypt then of course you need to copy the dmcrypt keys 
for the OSDs to the new host(s). I had to do this in my case, but it was very 
straightforward.
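
If the OSDs were prepared with ceph-disk/ceph-deploy defaults, the keys usually
live in a directory such as /etc/ceph/dmcrypt-keys (adjust if a custom
--dmcrypt-key-dir was used), so copying them amounts to something like:

    rsync -a /etc/ceph/dmcrypt-keys/ newhost:/etc/ceph/dmcrypt-keys/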

Second was an issue I didn't expect, probably just because of my ignorance. I 
was not able to migrate existing OSDs from different failure domains into a 
new, single failure domain without waiting for full recovery to HEALTH_OK in 
between. The very first server I put OSD disks from two different failure 
domains into had issues. The OSDs came up and in just fine, but immediately 
started flapping and failed to make progress toward recovery. I removed the 
disks from one failure domain and left the others, and recovery progressed as 
expected. As soon as I saw HEALTH_OK I re-migrated the OSDs from the other 
failure domain and again the cluster recovered as expected. Proceeding via this 
method allowed me to migrate all 240 OSDs without any further problems. I was 
also able to migrate as many OSDs as I wanted to simultaneously as long as I 
didn't mix OSDs from different, old failure domains in a new failure domain 
without recovering in between. I understand mixing failure domains like 
this is risky, but I sort of expected it to work anyway. Maybe it was 
better in the end that Ceph forced me to do it more safely.

Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705

If you are not the intended recipient of this message, be advised that any 
dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender and delete 
it, together with any attachments.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Peter 
Hinman
Sent: Wednesday, July 29, 2015 12:58 PM
To: Robert LeBlanc rob...@leblancnet.us
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Recovery question

Thanks for the guidance.  I'm working on building a valid ceph.conf right now.  
I'm not familiar with the osd-bootstrap key. Is that the standard filename for 
it?  Is it the keyring that is stored on the osd?

I'll see if the logs turn up anything I can decipher after I rebuild the 
ceph.conf file.

--
Peter Hinman

On 7/29/2015 12:49 PM, Robert LeBlanc wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256

 Did you use ceph-depoy or ceph-disk to create the OSDs? If so, it 
 should use udev to start he OSDs. In that case, a new host that has 
 the correct ceph.conf and osd-bootstrap key should be able to bring up 
 the OSDs into the cluster automatically. Just make sure you have the 
 correct journal in the same host with the matching OSD disk, udev 
 should do the magic.

 The OSD logs are your friend if they don't start properly.
 - 
 Robert LeBlanc
 PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


 On Wed, Jul 29, 2015 at 10:48 AM, Peter Hinman  wrote:
 I've got a situation that seems on the surface like it should be 
 recoverable, but I'm struggling to understand how to do it.

 I had a cluster of 3 monitors, 3 osd disks, and 3 journal ssds. After 
 multiple hardware failures, I pulled the 3 osd disks and 3 journal 
 ssds and am attempting to bring them back up again on new hardware in a new 
 cluster.
 I see plenty of documentation on how to zap and initialize and add new
 osds, but I don't see anything on rebuilding with existing osd disks.

 Could somebody provide guidance on how to do this?  I'm running 94.2 
 on all machines.

 Thanks,

 --
 Peter Hinman


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 -BEGIN PGP SIGNATURE-
 Version: Mailvelope v0.13.1
 Comment: https://www.mailvelope.com

 wsFcBAEBCAAQBQJVuSA/CRDmVDuy+mK58QAAfGAQAMq62W7QvCAo2RSDWLli
 13AJTpAWhk+ilBwcmxFr/gP/Aa9hMN5bV8idDqI56YWBjGO2WPQIUT8CXH5v
 ocBUZZJ0X08gOgHqFQ8x3rSSe6QINy1bQONMql3Jgpy8He/ctLnXROhNT9SU
 l30CI4qKwG48AZU5E4PoWgwQmdbFv0WIuFwCzPOVIU6GvO0umirerw3C7tZQ
 I34+OINURzCjKzLY/OEF4hRvRq3PV0KZAoolQTeBJtEdlyNgAQ/bHOgpfJ/h
 diGwQZyhSzqTvFYOEHWUuh5ZnhZAMNtaLBulwreUEKoI0IcXGxpH6KsC7ag4
 KJ1kD8U0I18eP4iyTOIXg+DxafUU4wrITlKdomW12XqmlHadi2vYYBCqataI
 uc4KeXHP4/SrA1qoEDtXroAV2iuV6UUNIwsY4HPBJ/CNKXFU5QSdGOey3Kjs
 Mz2zuCpMkTf6fj8B4XJfenfFulRVJwrKJml7JebPFpLTRPFMbsuZ5htUMASn
 UWyCA9IfxLYsC5tPlii79Kkb93mvN3cCdvchkH2CQ38jxkVRZRUqeJlzvtVp
 2mwinvqPD0irTvr+LvmlKOdtvFSOKJM0XmRSVk1LgLlpoyIZ9BqI02ul01fE
 7nZ892/17zdv0Nguxr8F8bps0jA7NLFpgRhEsakdmTVTJQLMwSv7z6c9fdP0
 7AWQ
 =VJV0
 -END PGP SIGNATURE-


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users