[ceph-users] ceph-volume failed after replacing disk

2019-07-04 Thread ST Wong (ITSC)
Hi all,

We replaced a faulty disk out of N OSDs and tried to follow the steps in 
"Replacing an OSD" at 
http://docs.ceph.com/docs/nautilus/rados/operations/add-or-rm-osds/, but got an 
error:

# ceph osd destroy 71 --yes-i-really-mean-it
# ceph-volume lvm create --bluestore --data /dev/data/lv01 --osd-id 71 
--block.db /dev/db/lv01
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
-->  RuntimeError: The osd ID 71 is already in use or does not exist.

ceph -s still shows N OSDs. I then removed it with "ceph osd rm 71". Now "ceph 
-s" shows N-1 OSDs and id 71 doesn't appear in "ceph osd ls".

However, repeating the ceph-volume command still gives the same error.
We're running Ceph 14.2.1. I must have missed some steps. Would anyone 
please help? Thanks a lot.
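
For reference, the documented sequence we were trying to follow looks roughly 
like this (zap steps added here for completeness; device paths and the OSD id 
are from our setup):

    # mark the old OSD as destroyed, keeping its ID and CRUSH position
    ceph osd destroy 71 --yes-i-really-mean-it
    # wipe the replacement LVs before reuse
    ceph-volume lvm zap /dev/data/lv01
    ceph-volume lvm zap /dev/db/lv01
    # recreate the OSD, reusing the old ID
    ceph-volume lvm create --bluestore --data /dev/data/lv01 --block.db /dev/db/lv01 --osd-id 71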

Rgds,
/stwong



Re: [ceph-users] Faux-Jewel Client Features

2019-07-04 Thread Konstantin Shalygin

Hi all,

Starting to make preparations for Nautilus upgrades from Mimic, and I'm looking 
over my client/session features and trying to fully grasp the situation.

$ ceph versions
{
    "mon": { "ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)": 3 },
    "mgr": { "ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)": 3 },
    "osd": { "ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)": 204 },
    "mds": { "ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)": 2 },
    "overall": { "ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)": 212 }
}


$ ceph features
{
    "mon": [ { "features": "0x3ffddff8ffacfffb", "release": "luminous", "num": 3 } ],
    "mds": [ { "features": "0x3ffddff8ffacfffb", "release": "luminous", "num": 2 } ],
    "osd": [ { "features": "0x3ffddff8ffacfffb", "num": 204 } ],
    "client": [
        { "features": "0x7010fb86aa42ada", "release": "jewel", "num": 4 },
        { "features": "0x7018fb86aa42ada", "release": "jewel", "num": 1 },
        { "features": "0x3ffddff8eea4fffb", "release": "luminous", "num": 344 },
        { "features": "0x3ffddff8eeacfffb", "release": "luminous", "num": 200 },
        { "features": "0x3ffddff8ffa4fffb", "release": "luminous", "num": 49 },
        { "features": "0x3ffddff8ffacfffb", "release": "luminous", "num": 213 }
    ],
    "mgr": [ { "features": "0x3ffddff8ffacfffb", "release": "luminous", "num": 3 } ]
}

$ ceph osd dump | grep compat
require_min_compat_client luminous
min_compat_client luminous

I flattened the output to make it a bit more vertical scrolling friendly.

Diving into the actual clients with those features:
# ceph daemon mon.mon1 sessions | grep jewel
"MonSession(client.1649789192 ip.2:0/3697083337 is open allow *, features 0x7010fb86aa42ada (jewel))",
"MonSession(client.1656508179 ip.202:0/2664244117 is open allow *, features 0x7018fb86aa42ada (jewel))",
"MonSession(client.1637479106 ip.250:0/1882319989 is open allow *, features 0x7010fb86aa42ada (jewel))",
"MonSession(client.1662023903 ip.249:0/3198281565 is open allow *, features 0x7010fb86aa42ada (jewel))",
"MonSession(client.1658312940 ip.251:0/3538168209 is open allow *, features 0x7010fb86aa42ada (jewel))",

ip.2 is a cephfs kernel client with 4.15.0-51-generic
ip.202 is a krbd client with kernel 4.18.0-22-generic
ip.250 is a krbd client with kernel 4.15.0-43-generic
ip.249 is a krbd client with kernel 4.15.0-45-generic
ip.251 is a krbd client with kernel 4.15.0-45-generic

For the krbd clients, the features are "layering, exclusive-lock".

My min_compat_client and require_min_compat_client are already set to luminous; 
however, I would love some reassurance that I'm not going to run into issues 
with the krbd/kcephfs clients when trying to make use of new features like the 
PG autoscaler, for instance.
I should have full upmap compatibility, given that the balancer in upmap mode 
has been functioning and that these are relatively recent kernels.

Just looking for some sanity checks to make sure I don't have any surprises for 
these 'jewel' clients come a nautilus rollout.


Your krbd (0x7010fb86aa42ada) is enough for upmap.
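
A quick way to sanity-check this from the cluster itself, roughly 
(set-require-min-compat-client refuses to proceed if any connected client is 
too old, so it doubles as a safety check):

    ceph osd set-require-min-compat-client luminous
    ceph features        # confirm no client entries older than the jewel-feature krbd ones above
    ceph balancer mode upmap
    ceph balancer status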



k



Re: [ceph-users] Ceph pool EC with overwrite enabled

2019-07-04 Thread huang jun
try: rbd create backup2/teste --size 5T --data-pool ec_pool
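
The point is that the image header/metadata stays in a replicated pool and only 
the data objects go to the EC pool. A rough end-to-end sketch (pool names from 
this thread; PG counts and the EC profile are placeholders):

    ceph osd pool create ec_pool 64 64 erasure
    ceph osd pool set ec_pool allow_ec_overwrites true
    ceph osd pool application enable ec_pool rbd
    # metadata in the replicated pool backup2, data objects in ec_pool
    rbd create backup2/teste --size 5T --data-pool ec_pool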

Fabio Abreu wrote on Friday, July 5, 2019 at 1:49 AM:
>
> Hi Everybody,
>
> I have a question about using RBD with an EC pool. I tried this in my CentOS 
> lab, but I just receive some errors when I try to create an RBD image inside 
> this pool.
>
> Is this feature supported in a Luminous environment?
>
> http://docs.ceph.com/docs/mimic/rados/operations/erasure-code/#erasure-coding-with-overwrites
>
> ceph osd pool set ec_pool allow_ec_overwrites true
>
>
> The error below happened when I tried to create the RBD image:
>
>
> [root@mon1 ceph-key]# rbd create backup2/teste --size 5T --data-pool backup2
>
> ...
>
> warning: line 9: 'osd_pool_default_crush_rule' in section 'global' redefined
>
> 2019-07-03 17:27:33.721593 7f12c3fff700 -1 librbd::image::CreateRequest: 
> 0x560f2f0db0a0 handle_add_image_to_directory: error adding image to 
> directory: (95) Operation not supported
>
> rbd: create error: (95) Operation not supported
>
>
> Regards,
> Fabio Abreu Reis
> http://fajlinux.com.br
> Tel : +55 21 98244-0161
> Skype : fabioabreureis


[ceph-users] Understanding incomplete PGs

2019-07-04 Thread Kyle
Hello,

I'm working with a small ceph cluster (about 10TB, 7-9 OSDs, all Bluestore on 
lvm) and recently ran into a problem with 17 pgs marked as incomplete after 
adding/removing OSDs.

Here's the sequence of events:
1. 7 osds in the cluster, health is OK, all pgs are active+clean
2. 3 new osds on a new host are added, lots of backfilling in progress
3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd utilization"
5. ceph osd out 6
6. systemctl stop ceph-osd@6
7. the drive backing osd 6 is pulled and wiped
8. backfilling has now finished all pgs are active+clean except for 17 
incomplete pgs

From reading the docs, it sounds like there has been unrecoverable data loss 
in those 17 pgs. That raises some questions for me:

Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead of 
the current actual allocation?

Why is there data loss from a single osd being removed? Shouldn't that be 
recoverable?
All pools in the cluster are either replicated with size 3 or erasure-coded k=2,m=1 with 
the default "host" failure domain. They shouldn't suffer data loss from a single 
OSD being removed, even if there had been no reweighting beforehand. Does the 
backfilling temporarily reduce data durability in some way?

Is there a way to see which pgs actually have data on a given osd?
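
(The kind of query I have in mind would be something like the one below, though 
I'm not sure whether it reflects actual stored data or just the current mapping:)

    # list the PGs currently mapped to a given OSD
    ceph pg ls-by-osd osd.6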

I attached an example of one of the incomplete pgs.

Thanks for any help,

Kyle

{
"state": "incomplete",
"snap_trimq": "[]",
"snap_trimq_len": 0,
"epoch": 2087,
"up": [
4,
3,
8
],
"acting": [
4,
3,
8
],
"info": {
"pgid": "15.59s0",
"last_update": "753'7465",
"last_complete": "753'7465",
"log_tail": "663'4401",
"last_user_version": 6947,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [],
"history": {
"epoch_created": 603,
"epoch_pool_created": 603,
"last_epoch_started": 1581,
"last_interval_started": 1580,
"last_epoch_clean": 945,
"last_interval_clean": 944,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 2082,
"same_interval_since": 2082,
"same_primary_since": 2076,
"last_scrub": "753'7465",
"last_scrub_stamp": "2019-07-02 13:40:58.935208",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2019-06-27 17:42:04.685790",
"last_clean_scrub_stamp": "2019-07-02 13:40:58.935208"
},
"stats": {
"version": "753'7465",
"reported_seq": "12691",
"reported_epoch": "2087",
"state": "incomplete",
"last_fresh": "2019-07-04 14:30:47.930190",
"last_change": "2019-07-04 14:30:47.930190",
"last_active": "2019-07-03 13:04:00.967354",
"last_peered": "2019-07-03 13:02:40.242867",
"last_clean": "2019-07-02 23:04:26.601070",
"last_became_active": "2019-07-03 08:35:12.459857",
"last_became_peered": "2019-07-03 08:35:12.459857",
"last_unstale": "2019-07-04 14:30:47.930190",
"last_undegraded": "2019-07-04 14:30:47.930190",
"last_fullsized": "2019-07-04 14:30:47.930190",
"mapping_epoch": 2082,
"log_start": "663'4401",
"ondisk_log_start": "663'4401",
"created": 603,
"last_epoch_clean": 945,
"parent": "0.0",
"parent_split_bits": 0,
"last_scrub": "753'7465",
"last_scrub_stamp": "2019-07-02 13:40:58.935208",
"last_deep_scrub": "0'0",
"last_deep_scrub_stamp": "2019-06-27 17:42:04.685790",
"last_clean_scrub_stamp": "2019-07-02 13:40:58.935208",
"log_size": 3064,
"ondisk_log_size": 3064,
"stats_invalid": false,
"dirty_stats_invalid": false,
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": false,
"manifest_stats_invalid": false,
"snaptrimq_len": 0,
"stat_sum": {
"num_bytes": 12872933376,
"num_objects": 3094,
"num_object_clones": 0,
"num_object_copies": 9282,
"num_objects_missing_on_primary": 0,
"num_objects_missing": 0,
"num_objects_degraded": 0,
"num_objects_misplaced": 0,
"num_objects_unfound": 0,
"num_objects_dirty": 3094,
"num_whiteouts": 0,
"num_read": 896,
"num_read_kb": 3708,
"num_write": 5870,
"num_write_kb": 12567180,

[ceph-users] Ceph pool EC with overwrite enabled

2019-07-04 Thread Fabio Abreu
Hi Everybody,

I have a question about using RBD with an EC pool. I tried this in my CentOS
lab, but I just receive some errors when I try to create an RBD image inside
this pool.

Is this feature supported in a Luminous environment?

http://docs.ceph.com/docs/mimic/rados/operations/erasure-code/#erasure-coding-with-overwrites

ceph osd pool set ec_pool allow_ec_overwrites true


The error below happened when I tried to create the RBD image:


[root@mon1 ceph-key]# rbd create backup2/teste --size 5T --data-pool backup2

...

warning: line 9: 'osd_pool_default_crush_rule' in section 'global' redefined

2019-07-03 17:27:33.721593 7f12c3fff700 -1 librbd::image::CreateRequest:
0x560f2f0db0a0 handle_add_image_to_directory: error adding image to
directory: (95) Operation not supported

rbd: create error: (95) Operation not supported

Regards,
Fabio Abreu Reis
http://fajlinux.com.br
Tel: +55 21 98244-0161
Skype: fabioabreureis


[ceph-users] Random slow requests without any load

2019-07-04 Thread Maximilien Cuony

Hello,

I have a very strange situation: in an almost-new Ceph cluster, I have 
random requests blocked, leading to timeouts.


Example error from logs:

> [WRN] Health check failed: 8 slow requests are blocked > 32 sec. 
Implicated osds 12 (REQUEST_SLOW)


> 7fd8bb0bd700  0 log_channel(cluster) log [WRN] : slow request 30.796124 seconds old, received at 2019-07-04 16:18:54.530388: osd_op(client.2829606.0:103 3.135 3:ac9abb76:::rbd_data.2b00ee6b8b4567.:head [set-alloc-hint object_size 4194304 write_size 4194304,write 0~4096] snapc 0=[] ondisk+write+known_if_redirected e294) currently op_applied


This happens totally randomly. I'm not able to reproduce it: I never had 
the issue with benchmarks, but I do have it occasionally when I start or 
stop a VM (it's a Proxmox deployment and Ceph / RBD is used as storage 
for the VMs) or when I use a VM.


This is an example of a stuck request (from dump_ops_in_flight):

> { "description": "osd_op(client.2829606.0:103 3.135 
3:ac9abb76:::rbd_data.2b00ee6b8b4567.:head 
[set-alloc-hint object_size 4194304 write_size 4194304,write 0~4096] 
snapc 0=[] ondisk+write+known_if_redirected e294)", "initiated_at": 
"2019-07-04 16:18:54.530388", "age": 196.315782, "duration": 196.315797, 
"type_data": { "flag_point": "waiting for sub ops", "client_info": { 
"client": "client.2829606", "client_addr": "10.3.5.40:0/444048627", 
"tid": 103 }, "events": [ { "time": "2019-07-04 16:18:54.530388", 
"event": "initiated" }, { "time": "2019-07-04 16:18:54.530429", "event": 
"queued_for_pg" }, { "time": "2019-07-04 16:18:54.530437", "event": 
"reached_pg" }, { "time": "2019-07-04 16:18:54.530455", "event": 
"started" }, { "time": "2019-07-04 16:18:54.530507", "event": "waiting 
for subops from 8" }, { "time": "2019-07-04 16:18:54.531020", "event": 
"op_commit" }, { "time": "2019-07-04 16:18:54.531024", "event": 
"op_applied" } ] } }


Since it seems to be waiting on osd 8, I tried dump_ops_in_flight and 
dump_historic_ops there, but there was nothing (which is quite strange, no?).
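
Concretely, I mean the admin-socket queries on the node hosting osd.8, roughly 
(dump_blocked_ops added here as one more angle to try):

    ceph daemon osd.8 dump_ops_in_flight
    ceph daemon osd.8 dump_blocked_ops
    ceph daemon osd.8 dump_historic_ops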


The cluster has no load in general: there are no I/O errors, no requests 
on the disks (iostat is at 99+% idle), no CPU usage, no network usage.


The OSD involved, and the OSD being waited on in subops, are random.

If I restart the target OSD, the request gets unstuck. There is nothing 
else in the logs / dmesg except this:


> 7fd8bf8f1700  0 -- 10.3.5.41:6809/16241 >> 10.3.5.42:6813/1015314 
conn(0x555ddb9db800 :6809 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 
cs=0 l=0).handle_connect_msg accept connect_seq 39 vs existing csq=39 
existing_state=STATE_CONNECTING


But it doesn't appear around the errors, so I'm not sure it's anything more than debugging output.

On the network side, I had jumbo frames but disabling them changed 
nothing. Just in case, I do have a LACP bond to two switches (mlag/vtl), 
but I don't see any network issues (heavy pings are totally fine, even 
for a long time).
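
For anyone wanting to double-check the jumbo-frame path, a do-not-fragment ping 
between the storage hosts is a quick test; roughly (interface name below is a 
placeholder, 8972 = 9000-byte MTU minus IP/ICMP headers):

    ping -M do -s 8972 -c 3 10.3.5.42
    ip link show bond0 | grep -o 'mtu [0-9]*'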


I kind of suspect the TCP connection of the stuck OSD is in a bad state 
for some reason, but I'm not sure what exactly, or how I can debug this.


My ceph version is: 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) 
luminous (stable)


Do you have any ideas or pointers on what the issue might be, or what I can 
try to debug / check?


Thanks a lot and have a nice day!

--
Maximilien Cuony



Re: [ceph-users] To backport or not to backport

2019-07-04 Thread Daniel Baumann
Hi,

On 7/4/19 3:00 PM, Stefan Kooman wrote:
> - Only backport fixes that do not introduce new functionality, but addresses
>   (impaired) functionality already present in the release.

Ack, and also my full agreement/support for everything else you wrote,
thanks.

Reading the changelogs about backported features (in particular the
release that BlueStore was backported to) left me quite scared about
upgrading our cluster.

Regards,
Daniel


[ceph-users] To backport or not to backport

2019-07-04 Thread Stefan Kooman
Hi,

Now that the release cadence has been set, it's time for another discussion
:-).

During Ceph Day NL we had a panel Q&A [1]. One of the things that was
discussed was backports. Occasionally users will ask for backports of
functionality from newer releases to older releases (that are still in
support).

Ceph is quite a unique project in the sense that new functionality gets
backported to older releases. Sometimes functionality even gets changed
within the lifetime of a release; I can recall the "ceph-volume" change to LVM
at the beginning of the Luminous release. While backports can enrich the
user experience of a Ceph operator, they are not without risk. There have
been several issues with "incomplete" backports and/or unforeseen
circumstances that had the reverse effect: downtime of (part of) Ceph
services. The ones that come to my mind are:

- MDS (CephFS damaged): Mimic backport (13.2.2)
- RADOS (PG log hard limit): Luminous / Mimic backports (12.2.8 / 13.2.2)

I would like to define a simple rule of when to backport:

- Only backport fixes that do not introduce new functionality, but addresses
  (impaired) functionality already present in the release.

An example of a backport that, IMHO, matches these criteria is the
"bitmap_allocator" fix. It fixed a real problem, not some corner case.
Don't get me wrong here: it is important to catch corner cases, but that
should not put the majority of clusters at risk.

The time and effort that might be saved with this approach can indeed be
spend in one of the new focus areas Sage mentioned during his keynote
talk at Cephalocon Barcelona: quality. Quality of the backports that are
needed, improved testing, especially for upgrades to newer releases. If
upgrades are seamless, people are more willing to upgrade, because hey,
it just works(tm). Upgrades should be boring.

How many clusters (not nautilus ;-)) are running with "bitmap_allocator" or
with the pglog_hardlimit enabled? If a new feature is not enabled by
default and it's unclear how "stable" it is to use, operators tend not to
enable it, defeating the purpose of the backport.
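
Checking is at least straightforward, so the question is answerable per
cluster; roughly, to the best of my knowledge of the current flag/option names:

    # pglog_hardlimit shows up in the osdmap flags once it has been set
    ceph osd dump | grep pglog_hardlimit
    # allocator a given OSD is actually using
    ceph daemon osd.0 config get bluestore_allocator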

Backporting fixes to older releases can be considered a "business
opportunity" for the likes of Red Hat, SUSE, Fujitsu, etc. Especially
for users that want a system that "keeps on running forever" and never
needs "dangerous" updates.

This is my view on the matter, please let me know what you think of
this.

Gr. Stefan

P.s. Just to make things clear: this thread is in _no way_ intended to pick on
anybody. 


[1]: https://pad.ceph.com/p/ceph-day-nl-2019-panel

-- 
| BIT BV  https://www.bit.nl/    Kamer van Koophandel 09090351
| GPG: 0xD14839C6   +31 318 648 688 / i...@bit.nl


[ceph-users] cannot add fuse options to ceph-fuse command

2019-07-04 Thread songz.gucas
Hi,



I am trying to add some FUSE options when mounting CephFS with the ceph-fuse tool, but it 
errors out:

 
ceph-fuse -m 10.128.5.1,10.128.5.2,10.128.5.3 -r /test1 /cephfs/test1 -o entry_timeout=5

ceph-fuse[3857515]: starting ceph client
2019-07-04 21:55:37.767 7fc1d9cbdbc0 -1 init, newargv = 0x555d6f847490 newargc=9
fuse: unknown option `entry_timeout=5'
ceph-fuse[3857515]: fuse failed to start
2019-07-04 21:55:37.796 7fc1d9cbdbc0 -1 fuse_lowlevel_new failed

How can I pass options to fuse?




Thank you for your precious help!


Re: [ceph-users] Cinder pool inaccessible after Nautilus upgrade

2019-07-04 Thread Adrien Georget
It appears that if the client or Openstack cinder service is in the same 
network as Ceph, it works.
In the OpenStack network it fails, but only on this particular pool! It 
was working well before the upgrade and no changes have been made on the 
network side.
Very strange issue. I checked the Ceph release notes in order to find 
network changes but found nothing relevant.
Only the biggest pool is concerned, same pool config, same hosts, ACLs 
all open, no iptables, ...


Anything else to check?
We are thinking about adding a VNIC to all Ceph and Openstack hosts in 
order to be in the same subnet.



Adrien


Le 03/07/2019 à 13:46, Adrien Georget a écrit :

Hi,

With --debug-objecter=20, I found that the rados ls command hangs 
looping on laggy messages :

2019-07-03 13:33:24.913 7efc402f5700 10 client.21363886.objecter _op_submit op 0x7efc3800dc10
2019-07-03 13:33:24.913 7efc402f5700 20 client.21363886.objecter _calc_target epoch 13146 base  @3 precalc_pgid 1 pgid 3.100 is_read
2019-07-03 13:33:24.913 7efc402f5700 20 client.21363886.objecter _calc_target target  @3 -> pgid 3.100
2019-07-03 13:33:24.913 7efc402f5700 10 client.21363886.objecter _calc_target  raw pgid 3.100 -> actual 3.100 acting [29,12,55] primary 29
2019-07-03 13:33:24.913 7efc402f5700 20 client.21363886.objecter _get_session s=0x7efc380024c0 osd=29 3
2019-07-03 13:33:24.913 7efc402f5700 10 client.21363886.objecter _op_submit oid  '@3' '@3' [pgnls start_epoch 13146] tid 11 osd.29
2019-07-03 13:33:24.913 7efc402f5700 20 client.21363886.objecter get_session s=0x7efc380024c0 osd=29 3
2019-07-03 13:33:24.913 7efc402f5700 15 client.21363886.objecter _session_op_assign 29 11
2019-07-03 13:33:24.913 7efc402f5700 15 client.21363886.objecter _send_op 11 to 3.100 on osd.29
2019-07-03 13:33:24.913 7efc402f5700 20 client.21363886.objecter put_session s=0x7efc380024c0 osd=29 4
2019-07-03 13:33:24.913 7efc402f5700  5 client.21363886.objecter 1 in flight

2019-07-03 13:33:29.678 7efc3e2f1700 10 client.21363886.objecter tick
2019-07-03 13:33:34.678 7efc3e2f1700 10 client.21363886.objecter tick
2019-07-03 13:33:39.678 7efc3e2f1700 10 client.21363886.objecter tick
2019-07-03 13:33:39.678 7efc3e2f1700  2 client.21363886.objecter  tid 11 on osd.29 is laggy
2019-07-03 13:33:39.678 7efc3e2f1700 10 client.21363886.objecter _maybe_request_map subscribing (onetime) to next osd map

2019-07-03 13:33:44.678 7efc3e2f1700 10 client.21363886.objecter tick
2019-07-03 13:33:44.678 7efc3e2f1700  2 client.21363886.objecter  tid 11 on osd.29 is laggy
2019-07-03 13:33:44.678 7efc3e2f1700 10 client.21363886.objecter _maybe_request_map subscribing (onetime) to next osd map

2019-07-03 13:33:49.679 7efc3e2f1700 10 client.21363886.objecter tick
...

I tried to disable this OSD but the problem moves to another OSD, and 
so on.
The Ceph client packages are up to date, and all RBD commands still work 
from a monitor but not from the OpenStack controllers.
And the other Ceph pool, on the same OSD hosts but on different disks, 
works perfectly with OpenStack...


The issue looks like these old ones, but they seem to have been fixed a few 
years ago: https://tracker.ceph.com/issues/2454 and 
https://tracker.ceph.com/issues/8515


Is there anything more I can check?

Adrien


Le 02/07/2019 à 14:10, Adrien Georget a écrit :

Hi Eugen,

The cinder keyring used by the 2 pools is the same, the rbd command 
works using this keyring and ceph.conf used by Openstack while the 
rados ls command stays stuck.


I tried with the previous ceph-common version used 10.2.5 and the 
last ceph version 14.2.1.
With the Nautilus ceph-common version, the 2 cinder-volume services 
crashed...


Adrien

Le 02/07/2019 à 13:50, Eugen Block a écrit :

Hi,

did you try to use rbd and rados commands with the cinder keyring, 
not the admin keyring? Did you check if the caps for that client are 
still valid (do the caps differ between the two cinder pools)?


Are the ceph versions on your hypervisors also nautilus?

Regards,
Eugen


Zitat von Adrien Georget :


Hi all,

I'm facing a very strange issue after migrating my Luminous cluster 
to Nautilus.
I have 2 pools configured for Openstack cinder volumes with 
multiple backend setup, One "service" Ceph pool with cache tiering 
and one "R" Ceph pool.
After the upgrade, the R pool became inaccessible for Cinder and 
the cinder-volume service using this pool can't start anymore.
What is strange is that Openstack and Ceph report no error, Ceph 
cluster is healthy, all OSDs are UP & running and the "service" 
pool is still running well with the other cinder service on the 
same openstack host.
I followed exactly the upgrade procedure 
(https://ceph.com/releases/v14-2-0-nautilus-released/#upgrading-from-mimic-or-luminous), 
no problem during the upgrade but I can't understand why Cinder 
still fails with this pool.
I can access, list, create volume on this pool with rbd or rados 
command from the 

Re: [ceph-users] slow requests due to scrubbing of very small pg

2019-07-04 Thread Igor Fedotov

Hi Lukasz,

I've seen something like that - slow requests and relevant OSD reboots 
on suicide timeout at least twice with two different clusters. The root 
cause was slow omap listing for some objects which had started to happen 
after massive removals from RocksDB.


To verify if this is the case you can create a script that uses 
ceph-objectstore-tool to list objects for the specific pg and then 
list-omap for every returned object.


If omap listing for some object(s) takes too long (minutes in my case) - 
you're facing the same issue.
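
A minimal sketch of such a script, run offline with the OSD stopped (data path 
and PG id below are taken from Luk's mail and are just examples):

    #!/bin/bash
    OSD_PATH=/var/lib/ceph/osd/ceph-118
    PGID=20.2
    ceph-objectstore-tool --data-path "$OSD_PATH" --pgid "$PGID" --op list 2>/dev/null |
    while read -r obj; do
        echo "== $obj"
        # objects whose omap listing takes minutes are the problematic ones
        time ceph-objectstore-tool --data-path "$OSD_PATH" "$obj" list-omap >/dev/null
    done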


PR that implements automatic lookup for such "slow" objects in 
ceph-objectstore-tool is under review: 
https://github.com/ceph/ceph/pull/27985



The only known workaround for existing OSDs so far is manual DB 
compaction. And https://github.com/ceph/ceph/pull/27627 hopefully fixes 
the issue for newly deployed OSDs.
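
For reference, a rough sketch of the offline compaction (with the OSD stopped; 
paths and the store type depend on the deployment):

    systemctl stop ceph-osd@118
    # BlueStore OSD:
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-118 compact
    # FileStore OSD keeps its omap under .../current/omap (leveldb or rocksdb):
    ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-118/current/omap compact
    systemctl start ceph-osd@118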




Relevant upstream tickets are:

http://tracker.ceph.com/issues/36482

http://tracker.ceph.com/issues/40557


Hope this helps,

Igor

On 7/3/2019 9:54 AM, Luk wrote:

Hello,

I have strange problem with scrubbing.

When scrubbing starts on a PG which belongs to the default.rgw.buckets.index
pool, I can see that this OSD is very busy (see attachment) and starts showing many
slow requests; after the scrubbing of this PG stops, the slow requests
stop immediately.

[root@stor-b02 /var/lib/ceph/osd/ceph-118/current]# zgrep scrub 
/var/log/ceph/ceph-osd.118.log.1.gz  | grep -w 20.2
2019-07-03 00:14:57.496308 7fd4c7a09700  0 log_channel(cluster) log [DBG] : 
20.2 deep-scrub starts
2019-07-03 05:36:13.274637 7fd4ca20e700  0 log_channel(cluster) log [DBG] : 
20.2 deep-scrub ok
[root@stor-b02 /var/lib/ceph/osd/ceph-118/current]#

[root@stor-b02 /var/lib/ceph/osd/ceph-118/current]# du -sh 20.2_*
636K20.2_head
0   20.2_TEMP
[root@stor-b02 /var/lib/ceph/osd/ceph-118/current]# ls -1 -R 20.2_head | wc -l
4125
[root@stor-b02 /var/lib/ceph/osd/ceph-118/current]#

and on mon:

2019-07-03 00:48:44.793893 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 6231090 : 
cluster [WRN] Health check failed: 105 slow requests are blocked > 32 sec. 
Implicated osds 118 (REQUEST_SLOW)
2019-07-03 00:48:54.086446 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 6231097 : 
cluster [WRN] Health check update: 102 slow requests are blocked > 32 sec. 
Implicated osds 118 (REQUEST_SLOW)
2019-07-03 00:48:59.088240 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 6231099 : 
cluster [WRN] Health check update: 91 slow requests are blocked > 32 sec. 
Implicated osds 118 (REQUEST_SLOW)

[...]

2019-07-03 05:36:19.695586 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 6243211 : 
cluster [INF] Health check cleared: REQUEST_SLOW (was: 23 slow requests are 
blocked > 32 sec. Implicated osds 118)
2019-07-03 05:36:19.695700 mon.ceph-mon-01 mon.0 10.10.8.221:6789/0 6243212 : 
cluster [INF] Cluster is now healthy

ceph version 12.2.9

It might be related to this (taken from
https://ceph.com/releases/v12-2-11-luminous-released/):

"
There have been fixes to RGW dynamic and manual resharding, which no longer
leaves behind stale bucket instances to be removed manually. For finding and
cleaning up older instances from a reshard a radosgw-admin command reshard
stale-instances list and reshard stale-instances rm should do the necessary
cleanup.
"




Re: [ceph-users] troubleshooting space usage

2019-07-04 Thread Andrei Mikhailovsky
Thanks for trying to help, Igor. 

> From: "Igor Fedotov" 
> To: "Andrei Mikhailovsky" 
> Cc: "ceph-users" 
> Sent: Thursday, 4 July, 2019 12:52:16
> Subject: Re: [ceph-users] troubleshooting space usage

> Yep, this looks fine..

> hmm... sorry, but I'm out of ideas what's happening..

> Anyway I think ceph reports are more trustworthy than rgw ones. Looks like 
> some
> issue with rgw reporting or may be some object leakage.

> Regards,

> Igor

> On 7/3/2019 6:34 PM, Andrei Mikhailovsky wrote:

>> Hi Igor.

>> The numbers are identical it seems:

>> .rgw.buckets 19 15 TiB 78.22 4.3 TiB 8786934

>> # cat /root/ceph-rgw.buckets-rados-ls-all |wc -l
>> 8786934

>> Cheers

>>> From: "Igor Fedotov" [ mailto:ifedo...@suse.de |  ]
>>> To: "andrei" [ mailto:and...@arhont.com |  ]
>>> Cc: "ceph-users" [ mailto:ceph-users@lists.ceph.com |
>>>  ]
>>> Sent: Wednesday, 3 July, 2019 13:49:02
>>> Subject: Re: [ceph-users] troubleshooting space usage

>>> Looks fine - comparing bluestore_allocated vs. bluestore_stored shows a 
>>> little
>>> difference. So that's not the allocation overhead.

>>> What's about comparing object counts reported by ceph and radosgw tools?

>>> Igor.

>>> On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:

 Thanks Igor, Here is a link to the ceph perf data on several osds.

 https://paste.ee/p/IzDMy

 In terms of the object sizes. We use rgw to backup the data from various
 workstations and servers. So, the sizes would be from a few kb to a few 
 gig per
 individual file.

 Cheers

> From: "Igor Fedotov" [ mailto:ifedo...@suse.de |  ]
> To: "andrei" [ mailto:and...@arhont.com |  ]
> Cc: "ceph-users" [ mailto:ceph-users@lists.ceph.com |
>  ]
> Sent: Wednesday, 3 July, 2019 12:29:33
> Subject: Re: [ceph-users] troubleshooting space usage

> Hi Andrei,

> Additionally I'd like to see performance counters dump for a couple of 
> HDD OSDs
> (obtained through 'ceph daemon osd.N perf dump' command).

> W.r.t average object size - I was thinking that you might know what 
> objects had
> been uploaded... If not then you might want to estimate it by using 
> "rados get"
> command on the pool: retrieve some random object set and check their 
> sizes. But
> let's check performance counters first - most probably they will show 
> loses
> caused by allocation.

> Also I've just found similar issue (still unresolved) in our internal 
> tracker -
> but its root cause is definitely different from allocation overhead. 
> Looks like
> some orphaned objects in the pool. Could you please compare and share the
> amounts of objects in the pool reported by "ceph (or rados) df detail" and
> radosgw tools?

> Thanks,

> Igor

> On 7/3/2019 12:56 PM, Andrei Mikhailovsky wrote:

>> Hi Igor,

>> Many thanks for your reply. Here are the details about the cluster:

>> 1. Ceph version - 13.2.5-1xenial (installed from Ceph repository for 
>> ubuntu
>> 16.04)

>> 2. main devices for radosgw pool - hdd. we do use a few ssds for the 
>> other pool,
>> but it is not used by radosgw

>> 3. we use BlueStore

>> 4. Average rgw object size - I have no idea how to check that. Couldn't 
>> find a
>> simple answer from google either. Could you please let me know how to 
>> check
>> that?

>> 5. Ceph osd df tree:

>> 6. Other useful info on the cluster:

>> # ceph osd df tree
>> ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS TYPE NAME

>> -1 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - root uk
>> -5 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - datacenter ldex
>> -11 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - room ldex-dc3
>> -13 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - row row-a
>> -4 112.17979 - 113 TiB 90 TiB 23 TiB 79.25 1.00 - rack ldex-rack-a5
>> -2 28.04495 - 28 TiB 22 TiB 6.2 TiB 77.96 0.98 - host arh-ibstorage1-ib

>> 0 hdd 2.73000 0.7 2.8 TiB 2.3 TiB 519 GiB 81.61 1.03 145 osd.0
>> 1 hdd 2.73000 1.0 2.8 TiB 1.9 TiB 847 GiB 70.00 0.88 130 osd.1
>> 2 hdd 2.73000 1.0 2.8 TiB 2.2 TiB 561 GiB 80.12 1.01 152 osd.2
>> 3 hdd 2.73000 1.0 2.8 TiB 2.3 TiB 469 GiB 83.41 1.05 160 osd.3
>> 4 hdd 2.73000 1.0 2.8 TiB 1.8 TiB 983 GiB 65.18 0.82 141 osd.4
>> 32 hdd 5.45999 1.0 5.5 TiB 4.4 TiB 1.1 TiB 80.68 1.02 306 osd.32
>> 35 hdd 2.73000 1.0 2.8 TiB 1.7 TiB 1.0 TiB 62.89 0.79 126 osd.35
>> 36 hdd 2.73000 1.0 2.8 TiB 2.3 TiB 464 GiB 83.58 1.05 175 osd.36
>> 37 hdd 2.73000 0.8 2.8 TiB 2.5 TiB 301 GiB 89.34 1.13 160 osd.37
>> 5 ssd 0.74500 1.0 745 GiB 642 GiB 103 GiB 86.15 1.09 65 osd.5

>> -3 28.04495 - 28 TiB 24 TiB 4.5 TiB 84.03 1.06 - host arh-ibstorage2-ib
>> 9 hdd 2.73000 0.95000 2.8 TiB 2.4 TiB 405 GiB 85.65 1.08 158 osd.9
>> 10 hdd 2.73000 0.8 2.8 TiB 2.4 TiB 

Re: [ceph-users] troubleshooting space usage

2019-07-04 Thread Igor Fedotov

Yep, this looks fine..

hmm... sorry, but I'm out of ideas what's happening..

Anyway, I think the ceph reports are more trustworthy than the rgw ones. Looks 
like some issue with rgw reporting, or maybe some object leakage.
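
If it is leakage, the radosgw-admin orphan scan may help narrow it down; treat 
this as a sketch, as I haven't verified how it behaves on a pool of this size 
(the job id is arbitrary):

    radosgw-admin orphans find --pool=.rgw.buckets --job-id=orphan-scan-1
    radosgw-admin orphans list-jobs
    radosgw-admin orphans finish --job-id=orphan-scan-1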



Regards,

Igor


On 7/3/2019 6:34 PM, Andrei Mikhailovsky wrote:

Hi Igor.

The numbers are identical it seems:

    .rgw.buckets   19      15 TiB     78.22       4.3 TiB 8786934

# cat /root/ceph-rgw.buckets-rados-ls-all | wc -l
8786934

Cheers


*From: *"Igor Fedotov" 
*To: *"andrei" 
*Cc: *"ceph-users" 
*Sent: *Wednesday, 3 July, 2019 13:49:02
*Subject: *Re: [ceph-users] troubleshooting space usage

Looks fine - comparing bluestore_allocated vs. bluestore_stored
shows a little difference. So that's not the allocation overhead.

What's about comparing object counts reported by ceph and radosgw
tools?


Igor.


On 7/3/2019 3:25 PM, Andrei Mikhailovsky wrote:

Thanks Igor, Here is a link to the ceph perf data on several osds.

https://paste.ee/p/IzDMy

In terms of the object sizes. We use rgw to backup the data
from various workstations and servers. So, the sizes would be
from a few kb to a few gig per individual file.

Cheers





*From: *"Igor Fedotov" 
*To: *"andrei" 
*Cc: *"ceph-users" 
*Sent: *Wednesday, 3 July, 2019 12:29:33
*Subject: *Re: [ceph-users] troubleshooting space usage

Hi Andrei,

Additionally I'd like to see performance counters dump for
a couple of HDD OSDs (obtained through 'ceph daemon osd.N
perf dump' command).

W.r.t average object size - I was thinking that you might
know what objects had been uploaded... If not then you
might want to estimate it by using "rados get" command on
the pool: retrieve some random object set and check their
sizes. But let's check performance counters first - most
probably they will show loses caused by allocation.


Also I've just found similar issue (still unresolved) in
our internal tracker - but its root cause is definitely
different from allocation overhead. Looks like some
orphaned objects in the pool. Could you please compare and
share the amounts of objects in the pool reported by "ceph
(or rados) df detail" and radosgw tools?


Thanks,

Igor


On 7/3/2019 12:56 PM, Andrei Mikhailovsky wrote:

Hi Igor,

Many thanks for your reply. Here are the details about
the cluster:

1. Ceph version - 13.2.5-1xenial (installed from Ceph
repository for ubuntu 16.04)

2. main devices for radosgw pool - hdd. we do use a
few ssds for the other pool, but it is not used by radosgw

3. we use BlueStore

4. Average rgw object size - I have no idea how to
check that. Couldn't find a simple answer from google
either. Could you please let me know how to check that?

5. Ceph osd df tree:

6. Other useful info on the cluster:

# ceph osd df tree
ID  CLASS WEIGHT    REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
 -1       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   - root uk
 -5       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -     datacenter ldex
-11       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -         room ldex-dc3
-13       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -             row row-a
 -4       112.17979        - 113 TiB  90 TiB  23 TiB 79.25 1.00   -                 rack ldex-rack-a5
 -2        28.04495        -  28 TiB  22 TiB 6.2 TiB 77.96 0.98   -                     host arh-ibstorage1-ib
  0   hdd   2.73000  0.7 2.8 TiB 2.3 TiB 519 GiB 81.61 1.03 145                         osd.0
  1   hdd   2.73000  1.0 2.8 TiB 1.9 TiB 847 GiB 70.00 0.88 130                         osd.1
  2   hdd   2.73000  1.0 2.8 TiB 2.2 TiB 561 GiB 80.12 1.01 152                         osd.2
  3   hdd   2.73000  1.0 2.8 TiB 2.3 TiB 469 GiB 83.41 1.05 160                         osd.3
  4   hdd   2.73000  1.0 2.8 TiB 1.8 TiB 983 GiB 65.18 0.82 141                         osd.4
 32   hdd   5.45999  1.0 5.5 TiB 4.4 TiB 1.1 TiB 80.68

Re: [ceph-users] Two clusters in one network

2019-07-04 Thread Paul Emmerich
There's nothing special about layer 2 networks in Ceph, so yeah that's as
valid as any other network setup.
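
The rbd-mirror side is equally network-agnostic; a rough sketch of a one-way
setup between two clusters named "primary" and "backup" (cluster names and the
admin user are placeholders, and pool mode assumes the journaling image feature
is enabled):

    # enable mirroring on the pool on both sides
    rbd --cluster primary mirror pool enable rbd pool
    rbd --cluster backup mirror pool enable rbd pool
    # tell the backup cluster where to pull from
    rbd --cluster backup mirror pool peer add rbd client.admin@primary
    # and run an rbd-mirror daemon against the backup cluster
    systemctl enable --now ceph-rbd-mirror@admin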

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Thu, Jul 4, 2019 at 10:14 AM Jarek  wrote:

>
> Are two clusters in one layer2 network safe in production use?
> The goal is a rbd-mirror between them.
>
> --
> Pozdrawiam
> Jarosław Mociak - Nettelekom GK Sp. z o.o.


[ceph-users] Two clusters in one network

2019-07-04 Thread Jarek

Are two clusters in one layer2 network safe in production use?
The goal is a rbd-mirror between them.

-- 
Pozdrawiam
Jarosław Mociak - Nettelekom GK Sp. z o.o.




Re: [ceph-users] pgs not deep-scrubbed in time

2019-07-04 Thread Alexander Walker

Hi,
Thanks for your quick answer. This option is set to false.
root@heku1 ~# ceph daemon osd.1 config get osd_scrub_auto_repair
{
    "osd_scrub_auto_repair": "false"
}
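
A deep scrub can also be kicked off manually per PG while investigating, and
the schedule itself is governed by osd_deep_scrub_interval; the PG id below is
just a placeholder:

    ceph pg deep-scrub 2.1f
    ceph daemon osd.1 config get osd_deep_scrub_interval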

Best regards
Alex

On 03.07.2019 at 15:42, Paul Emmerich wrote:

auto repair enabled