Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage
Hello,

>> I have a Ceph Nautilus cluster 14.2.1 for cephfs only, on 40x 1.8T SAS disks
>> (no SSD) in 20 servers.
>>
>> I often get "MDSs report slow requests" and plenty of "[WRN] 3 slow
>> requests, 0 included below; oldest blocked for > 60281.199503 secs".
>>
>> After a few investigations, I saw that ALL ceph-osd processes eat a lot of
>> memory, up to 130GB RSS each. Is this value normal? Could this be related to
>> the slow requests? Is running on HDDs only just increasing the probability
>> of slow requests?
>
> If you haven't set:
>
> osd op queue cut off = high
>
> in /etc/ceph/ceph.conf on your OSDs, I'd give that a try. It should
> help quite a bit with pure HDD clusters.

OK, I'll try this, thanks. If I want to add this to my ceph-ansible playbook parameters, which file should I add it to, and what is the best way to do it? Should I add these 3 lines in all.yml or osds.yml?

ceph_conf_overrides:
  global:
    osd_op_queue_cut_off: high

Is there another (better?) way to do that?

Thanks for your help.

Best regards,

--
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
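For what it's worth, a minimal sketch of what that override could look like (my own illustration; the exact file depends on your ceph-ansible layout, and since the option only affects OSDs, scoping it to an `osd` section instead of `global` should also work):

```yaml
# Hypothetical ceph-ansible fragment, e.g. in group_vars/all.yml.
# Equivalent to "osd op queue cut off = high" in ceph.conf.
ceph_conf_overrides:
  osd:
    osd_op_queue_cut_off: high
```

As far as I know this option is only read at OSD startup, so a rolling restart of the OSDs is still needed after the playbook run.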
[ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage
cluded below; oldest blocked for > 62456.242289 secs
> 2019-09-19 08:52:58.960777 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62452.674936 secs
> 2019-09-19 08:53:03.960853 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62457.675011 secs
> 2019-09-19 08:53:07.528033 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62461.242354 secs
> 2019-09-19 08:53:12.528177 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62466.242487 secs
> 2019-09-19 08:53:08.960965 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62462.675123 secs
> 2019-09-19 08:53:13.961034 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62467.675195 secs
> 2019-09-19 08:53:17.528276 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62471.242592 secs
> 2019-09-19 08:53:22.528407 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62476.242729 secs
> 2019-09-19 08:53:18.961149 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62472.675310 secs
> 2019-09-19 08:53:23.961234 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62477.675392 secs
> 2019-09-19 08:53:27.528509 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62481.242832 secs
> 2019-09-19 08:53:32.528651 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62486.242961 secs
> 2019-09-19 08:53:28.961314 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62482.675471 secs
> 2019-09-19 08:53:33.961393 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62487.675549 secs
> 2019-09-19 08:53:37.528706 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62491.243031 secs
> 2019-09-19 08:53:42.528790 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62496.243105 secs
> 2019-09-19 08:53:38.961476 mds.icadmin006 [WRN] 10 slow requests, 1 included below; oldest blocked for > 62492.675617 secs
> 2019-09-19 08:53:38.961485 mds.icadmin006 [WRN] slow request 61441.151061 seconds old, received at 2019-09-18 17:49:37.810351: client_request(client.21441:176429 getattr pAsLsXsFs #0x1f2b1b3 2019-09-18 17:49:37.806002 caller_uid=204878, caller_gid=11233{}) currently failed to rdlock, waiting
> 2019-09-19 08:53:43.961569 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62497.675728 secs
> 2019-09-19 08:53:47.528891 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62501.243214 secs
> 2019-09-19 08:53:52.529021 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62506.243337 secs
> 2019-09-19 08:53:48.961685 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62502.675839 secs
> 2019-09-19 08:53:53.961792 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62507.675948 secs
> 2019-09-19 08:53:57.529113 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62511.243437 secs
> 2019-09-19 08:54:02.529224 mds.icadmin007 [WRN] 3 slow requests, 0 included below; oldest blocked for > 62516.243546 secs
> 2019-09-19 08:53:58.961866 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62512.676025 secs
> 2019-09-19 08:54:03.961939 mds.icadmin006 [WRN] 10 slow requests, 0 included below; oldest blocked for > 62517.676099 secs

Thanks for your help.
Best regards,

--
Yoann Moulin
EPFL IC-IT

[
  {
    "id": 651292,
    "num_leases": 0,
    "num_caps": 0,
    "state": "open",
    "request_load_avg": 0,
    "uptime": 65094.458896163,
    "replay_requests": 0,
    "completed_requests": 0,
    "reconnecting": false,
    "inst": "client.651292 v1:10.90.47.29:0/2037483206",
    "client_metadata": {
      "features": "00ff",
      "entity_id": "labo04",
      "hostname": "iccluster177.",
      "kernel_version": "4.15.0-43-generic",
      "root": "/labo04-scratch"
    }
  },
  {
    "id": 89226,
    "num_leases": 0,
    "num_caps": 0,
    "state": "open",
    "request_load_avg":
Re: [ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size
>> I am doing some tests with Nautilus and cephfs on an erasure coding pool.
>>
>> I noticed something strange between k+m in my erasure profile and
>> size+min_size in the pool created:
>>
>>> test@icadmin004:~$ ceph osd erasure-code-profile get ecpool-4-2
>>> crush-device-class=
>>> crush-failure-domain=osd
>>> crush-root=default
>>> jerasure-per-chunk-alignment=false
>>> k=4
>>> m=2
>>> plugin=jerasure
>>> technique=reed_sol_van
>>> w=8
>>
>>> test@icadmin004:~$ ceph --cluster test osd pool create cephfs_data 8 8 erasure ecpool-4-2
>>> pool 'cephfs_data' created
>>
>>> test@icadmin004:~$ ceph osd pool ls detail | grep cephfs_data
>>> pool 14 'cephfs_data' erasure size 6 min_size 5 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 2646 flags hashpspool stripe_width 16384
>>
>> Why min_size = 5 and not 4?
>
> this question comes up regularly and has been discussed just now:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034867.html

Oh thanks, I missed that thread, that makes sense. I agree with the comments that it is a little bit confusing.

Best,

--
Yoann Moulin
EPFL IC-IT
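To spell out the conclusion of that thread for future readers: on an erasure-coded pool, the pool's size is k+m, and Nautilus defaults min_size to k+1 rather than k, so I/O stops while there is still one chunk of redundancy left. A tiny sketch of the arithmetic (my own illustration, not Ceph source code):

```python
def ec_pool_sizes(k: int, m: int) -> dict:
    """Map an EC profile's k+m to the pool's size/min_size.

    size     = k + m   (total chunks stored per object)
    min_size = k + 1   (Nautilus default: keep one chunk of safety margin,
                        so I/O stops *before* another failure could lose data)
    """
    return {"size": k + m, "min_size": k + 1}

# The ecpool-4-2 profile from this thread:
print(ec_pool_sizes(4, 2))  # {'size': 6, 'min_size': 5}
```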
[ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size
Dear all,

I am doing some tests with Nautilus and cephfs on an erasure coding pool.

I noticed something strange between k+m in my erasure profile and size+min_size in the pool created:

> test@icadmin004:~$ ceph osd erasure-code-profile get ecpool-4-2
> crush-device-class=
> crush-failure-domain=osd
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8

> test@icadmin004:~$ ceph --cluster test osd pool create cephfs_data 8 8 erasure ecpool-4-2
> pool 'cephfs_data' created

> test@icadmin004:~$ ceph osd pool ls detail | grep cephfs_data
> pool 14 'cephfs_data' erasure size 6 min_size 5 crush_rule 1 object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 2646 flags hashpspool stripe_width 16384

Why min_size = 5 and not 4?

Best,

--
Yoann Moulin
EPFL IC-IT
[ceph-users] cephfs free space issue
iB 1.26TiB 391GiB 76.70 1.01 198
> 181 hdd 1.63739 1.0 1.64TiB 1.27TiB 380GiB 77.33 1.02 186
> 186 hdd 1.63739 1.0 1.64TiB 1.20TiB 451GiB 73.10 0.96 190
> 182 hdd 1.63739 1.0 1.64TiB 1.31TiB 332GiB 80.20 1.06 204
> 187 hdd 1.63739 1.0 1.64TiB 1.22TiB 424GiB 74.72 0.98 189
> 183 hdd 1.63739 1.0 1.64TiB 1.33TiB 318GiB 81.05 1.07 206
> 189 hdd 1.63739 1.0 1.64TiB 1.08TiB 576GiB 65.66 0.86 169
> 184 hdd 1.63739 1.0 1.64TiB 1.21TiB 441GiB 73.70 0.97 183
> 188 hdd 1.63739 1.0 1.64TiB 1.17TiB 474GiB 71.70 0.94 182
> 190 hdd 1.63739 1.0 1.64TiB 1.27TiB 373GiB 77.75 1.02 195
> 195 hdd 1.63739 1.0 1.64TiB 1.32TiB 327GiB 80.47 1.06 198
> 191 hdd 1.63739 1.0 1.64TiB 1.16TiB 484GiB 71.15 0.94 183
> 197 hdd 1.63739 1.0 1.64TiB 1.28TiB 370GiB 77.94 1.03 197
> 192 hdd 1.63739 1.0 1.64TiB 1.26TiB 382GiB 77.24 1.02 200
> 196 hdd 1.63739 1.0 1.64TiB 1.24TiB 402GiB 76.02 1.00 201
> 193 hdd 1.63739 1.0 1.64TiB 1.24TiB 409GiB 75.59 1.00 186
> 198 hdd 1.63739 1.0 1.64TiB 1.15TiB 501GiB 70.13 0.92 175
> 194 hdd 1.63739 1.0 1.64TiB 1.29TiB 353GiB 78.98 1.04 202
> 199 hdd 1.63739 1.0 1.64TiB 1.34TiB 309GiB 81.58 1.07 221
> TOTAL 65.5TiB 49.7TiB 15.8TiB 75.94
> MIN/MAX VAR: 0.86/1.09 STDDEV: 3.92

--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] Problem with CephFS - No space left on device
> root@pf-us1-dfs3:/home/rodrigo# ceph osd crush rule dump
> [
>     {
>         "rule_id": 0,
>         "rule_name": "replicated_rule",
>         "ruleset": 0,
>         "type": 1,
>         "min_size": 1,
>         "max_size": 10,
>         "steps": [
>             {
>                 "op": "take",
>                 "item": -1,
>                 "item_name": "default"
>             },
>             {
>                 "op": "chooseleaf_firstn",
>                 "num": 0,
>                 "type": "host"
>             }

This means the failure domain is set to "host": the cluster will try to spread the replicas of each object across hosts, so it can lose one host and keep the data online. You could change the failure domain to "osd" (per disk), but in that case your cluster will only tolerate the failure of one disk, not of one server, and you lose the guarantee that all replicas of an object are on different hosts.

The best thing you can do here is to add two disks to pf-us1-dfs3. The second option, if you can't quickly get new disks, would be to move one disk from one of the 2 other servers to pf-us1-dfs3, but I don't know the best way to do that, I never had this case on my cluster.

Best regards,

Yoann

> On Tue, Jan 8, 2019 at 11:35 AM Yoann Moulin <mailto:yoann.mou...@epfl.ch>> wrote:
>
> Hello,
>
> > Hi Yoann, thanks for your response.
> > Here are the results of the commands.
> >
> > root@pf-us1-dfs2:/var/log/ceph# ceph osd df
> > ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
> >  0   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 571 GiB 92.33 1.74 310
> >  5   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.18 1.45 271
> >  6   hdd 7.27739  1.0 7.3 TiB 609 GiB 6.7 TiB  8.17 0.15  49
> >  8   hdd 7.27739  1.0 7.3 TiB 2.5 GiB 7.3 TiB  0.03    0  42
> >  1   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.28 1.45 285
> >  3   hdd 7.27739  1.0 7.3 TiB 6.9 TiB 371 GiB 95.02 1.79 296
> >  7   hdd 7.27739  1.0 7.3 TiB 360 GiB 6.9 TiB  4.84 0.09  53
> >  9   hdd 7.27739  1.0 7.3 TiB 4.1 GiB 7.3 TiB  0.06 0.00  38
> >  2   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 576 GiB 92.27 1.74 321
> >  4   hdd 7.27739  1.0 7.3 TiB 6.1 TiB 1.2 TiB 84.10 1.58 351
> >     TOTAL 73 TiB 39 TiB 34 TiB 53.13
> > MIN/MAX VAR: 0/1.79 STDDEV: 41.15
>
> It looks like you don't have a good balance between your OSDs, what is your failure domain?
>
> Could you provide your crush map?
> http://docs.ceph.com/docs/luminous/rados/operations/crush-map/
>
> ceph osd crush tree
> ceph osd crush rule ls
> ceph osd crush rule dump
>
> > root@pf-us1-dfs2:/var/log/ceph# ceph osd pool ls detail
> > pool 1 'poolcephfs' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 471 flags hashpspool,full stripe_width 0
> > pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 471 lfor 0/439 flags hashpspool,full stripe_width 0 application cephfs
> > pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 471 lfor 0/448 flags hashpspool,full stripe_width 0 application cephfs
> > pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> > pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> > pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> > pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
>
> You may need to increase the pg num for the cephfs_data pool. But before, you must understand the impact: https://ceph.com/pgcalc/
> You can't decrease pg_num; if it is set too high you may have trouble in your cluster.
>
> > root@pf-us1-dfs2:/var/log/ceph
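The pgcalc heuristic referenced above can be sketched roughly like this (my own simplification of the formula on that page; the real calculator also splits the PG budget across pools by their expected share of the data):

```python
def suggested_pg_num(num_osds: int, replicas: int, target_per_osd: int = 100) -> int:
    """Rough pgcalc-style suggestion: aim for about target_per_osd PGs per
    OSD after replication, rounded up to the next power of two."""
    raw = num_osds * target_per_osd / replicas
    pg = 1
    while pg < raw:
        pg *= 2
    return pg

# The 10-OSD, size-3 cluster discussed in this thread:
print(suggested_pg_num(10, 3))  # 512
```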
Re: [ceph-users] Problem with CephFS - No space left on device
Hello,

> Hi Yoann, thanks for your response.
> Here are the results of the commands.
>
> root@pf-us1-dfs2:/var/log/ceph# ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
>  0   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 571 GiB 92.33 1.74 310
>  5   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.18 1.45 271
>  6   hdd 7.27739  1.0 7.3 TiB 609 GiB 6.7 TiB  8.17 0.15  49
>  8   hdd 7.27739  1.0 7.3 TiB 2.5 GiB 7.3 TiB  0.03    0  42
>  1   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.28 1.45 285
>  3   hdd 7.27739  1.0 7.3 TiB 6.9 TiB 371 GiB 95.02 1.79 296
>  7   hdd 7.27739  1.0 7.3 TiB 360 GiB 6.9 TiB  4.84 0.09  53
>  9   hdd 7.27739  1.0 7.3 TiB 4.1 GiB 7.3 TiB  0.06 0.00  38
>  2   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 576 GiB 92.27 1.74 321
>  4   hdd 7.27739  1.0 7.3 TiB 6.1 TiB 1.2 TiB 84.10 1.58 351
>     TOTAL 73 TiB 39 TiB 34 TiB 53.13
> MIN/MAX VAR: 0/1.79 STDDEV: 41.15

It looks like you don't have a good balance between your OSDs, what is your failure domain?

Could you provide your crush map?
http://docs.ceph.com/docs/luminous/rados/operations/crush-map/

ceph osd crush tree
ceph osd crush rule ls
ceph osd crush rule dump

> root@pf-us1-dfs2:/var/log/ceph# ceph osd pool ls detail
> pool 1 'poolcephfs' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 471 flags hashpspool,full stripe_width 0
> pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 471 lfor 0/439 flags hashpspool,full stripe_width 0 application cephfs
> pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 471 lfor 0/448 flags hashpspool,full stripe_width 0 application cephfs
> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw
> pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags hashpspool,full stripe_width 0 application rgw

You may need to increase the pg num for the cephfs_data pool. But before, you must understand the impact: https://ceph.com/pgcalc/
You can't decrease pg_num; if it is set too high you may have trouble in your cluster.

> root@pf-us1-dfs2:/var/log/ceph# ceph osd tree
> ID CLASS WEIGHT   TYPE NAME            STATUS REWEIGHT PRI-AFF
> -1       72.77390 root default
> -3       29.10956     host pf-us1-dfs1
>  0   hdd  7.27739         osd.0            up  1.0 1.0
>  5   hdd  7.27739         osd.5            up  1.0 1.0
>  6   hdd  7.27739         osd.6            up  1.0 1.0
>  8   hdd  7.27739         osd.8            up  1.0 1.0
> -5       29.10956     host pf-us1-dfs2
>  1   hdd  7.27739         osd.1            up  1.0 1.0
>  3   hdd  7.27739         osd.3            up  1.0 1.0
>  7   hdd  7.27739         osd.7            up  1.0 1.0
>  9   hdd  7.27739         osd.9            up  1.0 1.0
> -7       14.55478     host pf-us1-dfs3
>  2   hdd  7.27739         osd.2            up  1.0 1.0
>  4   hdd  7.27739         osd.4            up  1.0 1.0

You really should add 2 disks to pf-us1-dfs3. Currently the cluster tries to balance data between the 3 hosts (replica 3, failure domain set to 'host' I guess), so each host will store 1/3 of the data (one replica). pf-us1-dfs3 only has half the capacity of the 2 others, so you won't be able to store more than 3x (osd.2 + osd.4), even though there is free space on the other OSDs.

Best regards,

Yoann

> On Tue, Jan 8, 2019 at 10:36 AM Yoann Moulin <mailto:yoann.mou...@epfl.ch>> wrote:
>
> Hello,
>
> > Hi guys, I need your help.
> > I'm new with Cephfs and we started using it as file storage.
> > Today we are getting no space left on device but I'm seeing that we have plenty space on the filesystem.
> >
> > Filesystem                                                         Size Used Avail Use% Mounted on
> > 192.168.51.8,192.168.51.6,192.168.51.118:6789:/pagefreezer/smhosts  73T  39T   35T  54% /mnt/cephfs
>
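The capacity argument in the reply above can be put in numbers (my own back-of-the-envelope sketch; real usable space is further reduced by the full/nearfull ratios):

```python
def usable_capacity_tib(host_capacities_tib, replicas=3):
    """With size=3 and failure domain 'host' on exactly 3 hosts, every host
    must store one full replica, so the smallest host caps the pool."""
    assert len(host_capacities_tib) == replicas
    return min(host_capacities_tib)

# Cluster from this thread: 4 + 4 + 2 disks of ~7.27739 TiB each.
hosts = [4 * 7.27739, 4 * 7.27739, 2 * 7.27739]
print(round(usable_capacity_tib(hosts), 2))  # 14.55 TiB of data, despite ~73 TiB raw
```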
Re: [ceph-users] Problem with CephFS - No space left on device
Hello,

> Hi guys, I need your help.
> I'm new with Cephfs and we started using it as file storage.
> Today we are getting no space left on device but I'm seeing that we have plenty space on the filesystem.
>
> Filesystem                                                         Size Used Avail Use% Mounted on
> 192.168.51.8,192.168.51.6,192.168.51.118:6789:/pagefreezer/smhosts  73T  39T   35T  54% /mnt/cephfs
>
> We have 35TB of disk space. I've added 2 additional OSD disks with 7TB each but I'm getting the error "No space left on device" every time that I want to add a new file.
> After adding the 2 additional OSD disks I'm seeing that the load is being distributed among the cluster.
> Please I need your help.

Could you give us the output of:

ceph osd df
ceph osd pool ls detail
ceph osd tree

Best regards,

--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap
Hello John,

> Hello Yoann. I am working with similar issues at the moment in a biotech company in Denmark.
>
> First of all, what authentication setup are you using?

LDAP with sssd.

> If you are using sssd there is a very simple and useful utility called sss_override.
> You can 'override' the uid which you get from LDAP with the genuine one.

That's one of the options, I'm just asking if there are other or simpler solutions.

> Oops. On reading your email more closely.
> Why not just add ceph to your /etc/group file?

I tried, but there are some side effects.

I took a look at the postinst script in ceph-common and may have found a way to fix this issue:

> # Let the admin override these distro-specified defaults. This is NOT
> # recommended!
> [ -f "/etc/default/ceph" ] && . /etc/default/ceph
>
> [ -z "$SERVER_HOME" ] && SERVER_HOME=/var/lib/ceph
> [ -z "$SERVER_USER" ] && SERVER_USER=ceph
> [ -z "$SERVER_NAME" ] && SERVER_NAME="Ceph storage service"
> [ -z "$SERVER_GROUP" ] && SERVER_GROUP=ceph
> [ -z "$SERVER_UID" ] && SERVER_UID=64045 # alloc by Debian base-passwd maintainer
> [ -z "$SERVER_GID" ] && SERVER_GID=$SERVER_UID

I can change SERVER_UID / SERVER_GID and/or SERVER_USER. I'm going to try to create a specific ceph user in the LDAP and use it for the ceph install.

Yoann

> On 15 May 2018 at 08:58, Yoann Moulin <yoann.mou...@epfl.ch <mailto:yoann.mou...@epfl.ch>> wrote:
>
> Hello,
>
> I'm facing an issue with ceph's UID/GID 65045 on an LDAPized server. I have to install ceph-common to mount a cephfs filesystem, but ceph-common fails because a user with uid 65045 already exists with a group also set at 65045.
>
> Server under Ubuntu 16.04.4 LTS
>
> > Setting up ceph-common (12.2.5-1xenial) ...
> > Adding system user cephdone
> > Setting system user ceph properties..usermod: group 'ceph' does not exist
> > dpkg: error processing package ceph-common (--configure):
> > subprocess installed post-installation script returned error exit status 6
>
> The user is correctly created, but not the group.
>
> > # grep ceph /etc/passwd
> > ceph:x:64045:64045::/home/ceph:/bin/false
> > # grep ceph /etc/group
> > #
>
> Is there a workaround for that?
>
> --
> Yoann Moulin
> EPFL IC-IT

--
Yoann Moulin
EPFL IC-IT
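For the record, a sketch of what the override file could look like (hedged: the variable names come from the postinst snippet quoted above, but the values here are hypothetical, and overriding them is exactly the "NOT recommended" case the script warns about):

```shell
# Hypothetical /etc/default/ceph -- sourced by ceph-common's postinst.
# Values here shadow the Debian defaults quoted above; pick a UID/GID
# that cannot collide with your LDAP range.
SERVER_USER=ceph-local   # assumption: a name free in both LDAP and /etc/passwd
SERVER_GROUP=ceph-local
SERVER_UID=499           # assumption: below your LDAP UID range
SERVER_GID=499
```

This has to be in place before ceph-common is installed, since the postinst only creates the user/group on first configure.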
[ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap
Hello,

I'm facing an issue with ceph's UID/GID 65045 on an LDAPized server. I have to install ceph-common to mount a cephfs filesystem, but ceph-common fails because a user with uid 65045 already exists with a group also set at 65045.

Server under Ubuntu 16.04.4 LTS

> Setting up ceph-common (12.2.5-1xenial) ...
> Adding system user cephdone
> Setting system user ceph properties..usermod: group 'ceph' does not exist
> dpkg: error processing package ceph-common (--configure):
> subprocess installed post-installation script returned error exit status 6

The user is correctly created, but not the group.

> # grep ceph /etc/passwd
> ceph:x:64045:64045::/home/ceph:/bin/false
> # grep ceph /etc/group
> #

Is there a workaround for that?

--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] PG_DAMAGED Possible data damage: 1 pg inconsistent
On 22/02/2018 05:23, Brad Hubbard wrote:
> On Wed, Feb 21, 2018 at 6:40 PM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>> Hello,
>>
>> I migrated my cluster from jewel to luminous 3 weeks ago (using the ceph-ansible playbook). A few days after, ceph status told me "PG_DAMAGED Possible data damage: 1 pg inconsistent". I tried to repair the PG without success; I tried to stop the OSD, flush the journal and restart the OSD, but the OSD refused to start due to a bad journal. I decided to destroy the OSD and recreate it from scratch. After that, everything seemed to be all right, but I just saw now that I have exactly the same error again on the same PG on the same OSD (78).
>>
>>> $ ceph health detail
>>> HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
>>> OSD_SCRUB_ERRORS 3 scrub errors
>>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>>>     pg 11.5f is active+clean+inconsistent, acting [78,154,170]
>>
>>> $ ceph -s
>>>   cluster:
>>>     id:     f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
>>>     health: HEALTH_ERR
>>>             3 scrub errors
>>>             Possible data damage: 1 pg inconsistent
>>>
>>>   services:
>>>     mon: 3 daemons, quorum iccluster002.iccluster.epfl.ch,iccluster010.iccluster.epfl.ch,iccluster018.iccluster.epfl.ch
>>>     mgr: iccluster001(active), standbys: iccluster009, iccluster017
>>>     mds: cephfs-3/3/3 up {0=iccluster022.iccluster.epfl.ch=up:active,1=iccluster006.iccluster.epfl.ch=up:active,2=iccluster014.iccluster.epfl.ch=up:active}
>>>     osd: 180 osds: 180 up, 180 in
>>>     rgw: 6 daemons active
>>>
>>>   data:
>>>     pools:   29 pools, 10432 pgs
>>>     objects: 82862k objects, 171 TB
>>>     usage:   515 TB used, 465 TB / 980 TB avail
>>>     pgs:     10425 active+clean
>>>              6     active+clean+scrubbing+deep
>>>              1     active+clean+inconsistent
>>>
>>>   io:
>>>     client: 21538 B/s wr, 0 op/s rd, 33 op/s wr
>>
>>> ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
>>
>> Short log:
>>
>>> 2018-02-21 09:08:33.408396 7fb7b8222700  0 log_channel(cluster) log [DBG] : 11.5f repair starts
>>> 2018-02-21 09:08:33.727277 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f shard 78: soid 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544 osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd od d46bb5a1 alloc_hint [0 0 0])
>>> 2018-02-21 09:08:33.727290 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f shard 154: soid 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544 osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd od d46bb5a1 alloc_hint [0 0 0])
>>> 2018-02-21 09:08:33.727293 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f shard 170: soid 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544 osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd od d46bb5a1 alloc_hint [0 0 0])
>>> 2018-02-21 09:08:33.727295 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f soid 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head: failed to pick suitable auth object
>>> 2018-02-21 09:08:33.727333 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f repair 3 errors, 0 fixed
>>
>> I set "debug_osd 20/20" on osd.78 and started the repair again; the log file is here:
>>
>> ceph-post-file: 1ccac8ea-0947-4fe4-90b1-32d1048548f1
>>
>> What can I do in that situation?
>
> Take a look and see if http://tracker.ceph.com/issues/21388 is relevant, as well as the debugging and advice therein.

Indeed, it looks similar to my issue. I sent a comment directly on the tracker. Thanks.

Best regards,

--
Yoann Moulin
EPFL IC-IT
[ceph-users] PG_DAMAGED Possible data damage: 1 pg inconsistent
Hello,

I migrated my cluster from jewel to luminous 3 weeks ago (using the ceph-ansible playbook). A few days after, ceph status told me "PG_DAMAGED Possible data damage: 1 pg inconsistent". I tried to repair the PG without success; I tried to stop the OSD, flush the journal and restart the OSD, but the OSD refused to start due to a bad journal. I decided to destroy the OSD and recreate it from scratch. After that, everything seemed to be all right, but I just saw now that I have exactly the same error again on the same PG on the same OSD (78).

> $ ceph health detail
> HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 3 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
>     pg 11.5f is active+clean+inconsistent, acting [78,154,170]

> $ ceph -s
>   cluster:
>     id:     f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
>     health: HEALTH_ERR
>             3 scrub errors
>             Possible data damage: 1 pg inconsistent
>
>   services:
>     mon: 3 daemons, quorum iccluster002.iccluster.epfl.ch,iccluster010.iccluster.epfl.ch,iccluster018.iccluster.epfl.ch
>     mgr: iccluster001(active), standbys: iccluster009, iccluster017
>     mds: cephfs-3/3/3 up {0=iccluster022.iccluster.epfl.ch=up:active,1=iccluster006.iccluster.epfl.ch=up:active,2=iccluster014.iccluster.epfl.ch=up:active}
>     osd: 180 osds: 180 up, 180 in
>     rgw: 6 daemons active
>
>   data:
>     pools:   29 pools, 10432 pgs
>     objects: 82862k objects, 171 TB
>     usage:   515 TB used, 465 TB / 980 TB avail
>     pgs:     10425 active+clean
>              6     active+clean+scrubbing+deep
>              1     active+clean+inconsistent
>
>   io:
>     client: 21538 B/s wr, 0 op/s rd, 33 op/s wr

> ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)

Short log:

> 2018-02-21 09:08:33.408396 7fb7b8222700  0 log_channel(cluster) log [DBG] : 11.5f repair starts
> 2018-02-21 09:08:33.727277 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f shard 78: soid 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544 osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd od d46bb5a1 alloc_hint [0 0 0])
> 2018-02-21 09:08:33.727290 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f shard 154: soid 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544 osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd od d46bb5a1 alloc_hint [0 0 0])
> 2018-02-21 09:08:33.727293 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f shard 170: soid 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544 osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd od d46bb5a1 alloc_hint [0 0 0])
> 2018-02-21 09:08:33.727295 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f soid 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head: failed to pick suitable auth object
> 2018-02-21 09:08:33.727333 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 11.5f repair 3 errors, 0 fixed

I set "debug_osd 20/20" on osd.78 and started the repair again; the log file is here:

ceph-post-file: 1ccac8ea-0947-4fe4-90b1-32d1048548f1

What can I do in that situation?

Thanks for your help.

--
Yoann Moulin
EPFL IC-IT
[ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?
Hello,

What is the best kernel for Luminous on Ubuntu 16.04? Is linux-image-virtual-lts-xenial still the best one, or will linux-virtual-hwe-16.04 offer some improvement?

Thanks,

--
Yoann Moulin
EPFL IC-IT
[ceph-users] [Docs] s/ceph-disk/ceph-volume/g ?
Hello,

Since ceph-disk is now deprecated, it would be great to update the documentation so the processes are also described with ceph-volume. For example:

add-or-rm-osds => http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
bluestore-migration => http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/

In my opinion, the documentation for the luminous branch should keep both options (ceph-disk and ceph-volume), but with a warning message to encourage people to use ceph-volume instead of ceph-disk. I guess there are plenty of references to ceph-disk that need to be updated.

--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] ceph-disk is now deprecated
Le 27/11/2017 à 14:36, Alfredo Deza a écrit : > For the upcoming Luminous release (12.2.2), ceph-disk will be > officially in 'deprecated' mode (bug fixes only). A large banner with > deprecation information has been added, which will try to raise > awareness. > > We are strongly suggesting using ceph-volume for new (and old) OSD > deployments. The only current exceptions to this are encrypted OSDs > and FreeBSD systems > > Encryption support is planned and will be coming soon to ceph-volume. > > A few items to consider: > > * ceph-disk is expected to be fully removed by the Mimic release > * Existing OSDs are supported by ceph-volume. They can be "taken over" [0] > * ceph-ansible already fully supports ceph-volume and will soon default to it > * ceph-deploy support is planned and should be fully implemented soon > > > [0] http://docs.ceph.com/docs/master/ceph-volume/simple/ > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Is that possible to update the "add-or-rm-osds" documentation to have also the process with ceph-volume. That would help to the adoption. http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/ This page should be updated as well with ceph-volume command. http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/ Documentation (at least for master, maybe for luminous) should keep both options (ceph-disk and ceph-volume) but with a warning message to encourage people to use ceph-volume instead of ceph-disk. I agree with comments here that say changing the status of ceph-disk as deprecated in a minor release is not what I expect for a stable storage systems but I also understand the necessity to move forward with ceph-volume (and bluestore). I think keeping ceph-disk in mimic is necessary, even though there is no update, just for compatibility with old scripts. 
--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] Ceph S3 nginx Proxy
Hello,

>> I am trying to set up a ceph cluster with an s3 bucket setup behind an nginx proxy.
>>
>> I have the ceph and s3 parts working. :D
>>
>> When I run my php script through the nginx proxy I get an XML error response containing "SignatureDoesNotMatch",
>> but direct access works fine.
>>
>> Has anyone come across this before and can help out?
>
> My conf (may not be optimal):
>
> server {
>     listen 443 ssl http2;
>     listen [::]:443 ssl http2;
>     server_name FQDN;
>
>     ssl_certificate /etc/ssl/certs/FQDN.crt;
>     ssl_certificate_key /etc/ssl/private/FQDN.key;
>     add_header Strict-Transport-Security 'max-age=31536000; preload';
>
>     location / {
>         include proxy_params;
>         proxy_redirect off;
>         proxy_pass http://127.0.0.1:1234;
>         client_max_body_size 0;
>         proxy_buffering off;
>     }
> }

By default in proxy_params, I don't see this line:

proxy_set_header Host $host;

Here is the default proxy_params on Ubuntu 16.04:

$ cat proxy_params
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

I don't know if "Host $http_host;" is equivalent to "Host $host;".

> And ceph's:
> [client.radosgw.gateway]
> host = rgw
> rgw_frontends = civetweb port=127.0.0.1:1234
> keyring = /etc/ceph/keyring.radosgw.gateway

In my rgw section I also have this:

rgw dns name =

which allows s3cmd to access buckets with a %(bucket)s.test.iccluster.epfl.ch URL.

Best regards,

--
Yoann Moulin
EPFL IC-IT
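On the $host vs $http_host question: the Host header is the usual suspect for SignatureDoesNotMatch behind a proxy, because S3 request signatures cover the Host header, so the value radosgw sees must match the one the client signed. In nginx, $http_host is the raw Host header exactly as the client sent it (including any port), while $host is a normalized value with the port stripped, so $http_host is generally the safer choice here. A fragment that makes the headers explicit instead of relying on proxy_params could look like this (a sketch; FQDN and the upstream port 1234 are taken from the conf above):

```nginx
location / {
    # Pass the Host header exactly as the client sent it, so the
    # signature the client computed still matches on the radosgw side
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    proxy_redirect off;
    proxy_pass http://127.0.0.1:1234;
    client_max_body_size 0;   # don't cap S3 uploads
    proxy_buffering off;
}
```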
Re: [ceph-users] Luminous : 3 clients failing to respond to cache pressure
>> I have a luminous (12.2.1) cluster with 3 nodes for cephfs (no rbd or rgw) and we hit the "X clients failing to respond to cache pressure" message. I have 3 mds servers active.
>
> What type of client? Kernel? FUSE?
>
> If it's a kernel client, what kernel are you running?

It's a kernel client, version 4.10.0-35-generic; it's for a kubernetes environment.

https://kubernetes.io/docs/concepts/storage/volumes/#cephfs
https://github.com/kubernetes/examples/tree/master/staging/volumes/cephfs/

The containers use this yaml template:

https://github.com/kubernetes/examples/blob/master/staging/volumes/cephfs/cephfs.yaml

--
Yoann Moulin
EPFL IC-IT
[ceph-users] Luminous : 3 clients failing to respond to cache pressure
Hello,

I have a luminous (12.2.1) cluster with 3 nodes for cephfs (no rbd or rgw) and we hit the "X clients failing to respond to cache pressure" message. I have 3 mds servers active.

Is this something I should worry about?

Here is some information about the cluster:

> root@iccluster054:~# ceph --cluster container -s
>   cluster:
>     id: a294a95a-0baa-4641-81c1-7cd70fd93216
>     health: HEALTH_WARN
>             3 clients failing to respond to cache pressure
>
>   services:
>     mon: 3 daemons, quorum iccluster041.iccluster.epfl.ch,iccluster042.iccluster.epfl.ch,iccluster054.iccluster.epfl.ch
>     mgr: iccluster042(active), standbys: iccluster054
>     mds: cephfs-3/3/3 up {0=iccluster054.iccluster.epfl.ch=up:active,1=iccluster041.iccluster.epfl.ch=up:active,2=iccluster042.iccluster.epfl.ch=up:active}
>     osd: 18 osds: 18 up, 18 in
>
>   data:
>     pools: 3 pools, 544 pgs
>     objects: 2357k objects, 564 GB
>     usage: 2011 GB used, 65055 GB / 67066 GB avail
>     pgs: 544 active+clean

> root@iccluster041:~# ceph --cluster container daemon mds.iccluster041.iccluster.epfl.ch perf dump mds
> {
>     "mds": {
>         "request": 193508283,
>         "reply": 192815355,
>         "reply_latency": {
>             "avgcount": 192815355,
>             "sum": 457371.475011160,
>             "avgtime": 0.002372069
>         },
>         "forward": 692928,
>         "dir_fetch": 1717132,
>         "dir_commit": 43521,
>         "dir_split": 4197,
>         "dir_merge": 4244,
>         "inode_max": 2147483647,
>         "inodes": 11098,
>         "inodes_top": 7668,
>         "inodes_bottom": 3404,
>         "inodes_pin_tail": 26,
>         "inodes_pinned": 143,
>         "inodes_expired": 138623,
>         "inodes_with_caps": 87,
>         "caps": 239,
>         "subtrees": 15,
>         "traverse": 195425369,
>         "traverse_hit": 192867085,
>         "traverse_forward": 692723,
>         "traverse_discover": 476,
>         "traverse_dir_fetch": 1714684,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 6,
>         "load_cent": 19465322425,
>         "q": 0,
>         "exported": 1211,
>         "exported_inodes": 845556,
>         "imported": 1082,
>         "imported_inodes": 1209280
>     }
> }

> root@iccluster054:~# ceph --cluster container daemon mds.iccluster054.iccluster.epfl.ch perf dump mds
> {
>     "mds": {
>         "request": 267620366,
>         "reply": 255792944,
>         "reply_latency": {
>             "avgcount": 255792944,
>             "sum": 42256.407340600,
>             "avgtime": 0.000165197
>         },
>         "forward": 11827411,
>         "dir_fetch": 183,
>         "dir_commit": 2607,
>         "dir_split": 27,
>         "dir_merge": 19,
>         "inode_max": 2147483647,
>         "inodes": 3740,
>         "inodes_top": 2517,
>         "inodes_bottom": 1149,
>         "inodes_pin_tail": 74,
>         "inodes_pinned": 143,
>         "inodes_expired": 2103018,
>         "inodes_with_caps": 57,
>         "caps": 272,
>         "subtrees": 8,
>         "traverse": 267626346,
>         "traverse_hit": 255796915,
>         "traverse_forward": 11826902,
>         "traverse_discover": 77,
>         "traverse_dir_fetch": 30,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 0,
>         "load_cent": 26824996745,
>         "q": 3,
>         "exported": 1319,
>         "exported_inodes": 2037400,
>         "imported": 418,
>         "imported_inodes": 7347
>     }
> }

--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] Unable to restrict a CephFS client to a subdirectory
>> I am trying to follow the instructions at: >> http://docs.ceph.com/docs/master/cephfs/client-auth/ >> to restrict a client to a subdirectory of Ceph filesystem, but always get >> an error. >> >> We are running the latest stable release of Ceph (v12.2.1) on CentOS 7 >> servers. The user 'hydra' has the following capabilities: >> # ceph auth get client.hydra >> exported keyring for client.hydra >> [client.hydra] >> key = AQ== >> caps mds = "allow rw" >> caps mgr = "allow r" >> caps mon = "allow r" >> caps osd = "allow rw" >> >> When I tried to restrict the client to only mount and work within the >> directory /hydra of the Ceph filesystem 'pulpos', I got an error: >> # ceph fs authorize pulpos client.hydra /hydra rw >> Error EINVAL: key for client.dong exists but cap mds does not match >> >> I've tried a few combinations of user caps and CephFS client caps; but >> always got the same error! > > The "fs authorize" command isn't smart enough to edit existing > capabilities safely, so it is cautious and refuses to overwrite what > is already there. If you remove your client.hydra user and try again, > it should create it for you with the correct capabilities. I confirm it works perfectly ! it should be added to the documentation. :) # ceph fs authorize cephfs client.foo1 /foo1 rw [client.foo1] key = XXX1 # ceph fs authorize cephfs client.foo2 / r /foo2 rw [client.foo2] key = XXX2 # ceph auth get client.foo1 exported keyring for client.foo1 [client.foo1] key = XXX1 caps mds = "allow rw path=/foo1" caps mon = "allow r" caps osd = "allow rw pool=cephfs_data" # ceph auth get client.foo2 exported keyring for client.foo2 [client.foo2] key = XXX2 caps mds = "allow r, allow rw path=/foo2" caps mon = "allow r" caps osd = "allow rw pool=cephfs_data" Best regards, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Unable to restrict a CephFS client to a subdirectory
Hello,

> I am trying to follow the instructions at:
> http://docs.ceph.com/docs/master/cephfs/client-auth/
> to restrict a client to a subdirectory of the Ceph filesystem, but always get an error.
>
> We are running the latest stable release of Ceph (v12.2.1) on CentOS 7 servers. The user 'hydra' has the following capabilities:
> # ceph auth get client.hydra
> exported keyring for client.hydra
> [client.hydra]
> key = AQ==
> caps mds = "allow rw"
> caps mgr = "allow r"
> caps mon = "allow r"
> caps osd = "allow rw"
>
> When I tried to restrict the client to only mount and work within the directory /hydra of the Ceph filesystem 'pulpos', I got an error:
> # ceph fs authorize pulpos client.hydra /hydra rw
> Error EINVAL: key for client.dong exists but cap mds does not match
>
> I've tried a few combinations of user caps and CephFS client caps; but always got the same error!
>
> Has anyone been able to get this to work? What is your recipe?

If the client runs an old kernel (4.4 is old, 4.10 is not), you need to give read access to the entire cephfs filesystem; otherwise, you won't be able to mount the subdirectory.

1/ Give read access to the mds and rw to the subdirectory:

# ceph auth get-or-create client.foo mon "allow r" osd "allow rw pool=cephfs_data" mds "allow r, allow rw path=/foo"

or, if client.foo already exists:

# ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data" mds "allow r, allow rw path=/foo"

[client.foo]
key = XXX
caps mds = "allow r, allow rw path=/foo"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_data"

2/ Give read access to / and rw access to the subdirectory:

# ceph fs authorize cephfs client.foo / r /foo rw

Then get the secret key and mount:

# ceph --cluster container auth get-key client.foo > foo.secret
# mount.ceph mds1,mds2,mds3:/foo /foo -v -o name=foo,secretfile=/path/to/foo.secret

With an old kernel, you will always be able to mount the root of the cephfs filesystem as well.
# mount.ceph mds1,mds2,mds3:/ /foo -v -o name=foo,secretfile=/path/to/foo.secret

If your client runs a more recent kernel, you can do this:

1/ Give access to the specific path only:

# ceph auth get-or-create client.bar mon "allow r" osd "allow rw pool=cephfs_data" mds "allow rw path=/bar"

or, if client.bar already exists:

# ceph auth caps client.bar mon "allow r" osd "allow rw pool=cephfs_data" mds "allow rw path=/bar"

[client.bar]
key = XXX
caps mds = "allow rw path=/bar"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_data"

2/ Give rw access only on the subdirectory:

# ceph fs authorize cephfs client.bar /bar rw

Then get the secret key and mount:

# ceph --cluster container auth get-key client.bar > bar.secret
# mount.ceph mds1,mds2,mds3:/bar /bar -v -o name=bar,secretfile=/path/to/bar.secret

If you try to mount the cephfs root, you should get an access denied:

# mount.ceph mds1,mds2,mds3:/ /bar -v -o name=bar,secretfile=/path/to/bar.secret

If you want to increase security, you might have a look at namespaces and file layouts:

http://docs.ceph.com/docs/master/cephfs/file-layouts/

I haven't looked at them yet, but they look really interesting!

> Thanks,
> Shaw

Best regards,

--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] zone, zonegroup and resharding bucket on luminous
Hello, >> I'm doing some tests on the radosgw on luminous (12.2.1), I have a few >> questions. >> >> In the documentation[1], there is a reference to "radosgw-admin region get" >> but it seems not to be available anymore. >> It should be "radosgw-admin zonegroup get" I guess. >> >> 1. http://docs.ceph.com/docs/luminous/install/install-ceph-gateway/ >> >> I have installed my luminous cluster with ceph-ansible playbook. >> >> but when I try to manipulate zonegroup or zone, I have this >> >>> # radosgw-admin zonegroup get >>> failed to init zonegroup: (2) No such file or directory > > try with --rgw-zonegroup=default > >>> # radosgw-admin zone get >>> unable to initialize zone: (2) No such file or directory > > try with --rgw-zone=default > >> I guessed it's because I don't have a realm set and not default zone and >> zonegroup ? > > The default zone and zonegroup are part of the realm so without a > realm you cannot set them as defaults. > This means you have to specifiy --rgw-zonegroup=default and --rgw-zone=default > I am guessing our documentation needs updating :( > I think we can improve our behavior and make those command works > without a realm , i.e return the default zonegroup and zone. I will > open a tracker issue for that. a bug seems to be already open : http://tracker.ceph.com/issues/21583 >> Is that the default behaviour not to create default realm on a fresh radosgw >> ? Or is it a side effect of ceph-ansible installation ? >> > It is the default behavior, there is no default realm. > >> I have a bucket that referred to a zonegroup but without realm. Can I create >> a default realm ? Is that safe for the bucket that has already been >> uploaded ? >> > Yes You can create a realm and add the zonegroup to it. > Don't forgot to do "radosgw-admin period update --commit" to commit the > changes. 
I did that:

# radosgw-admin realm create --rgw-realm=default --default
{
    "id": "b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a",
    "name": "default",
    "current_period": "e7bfcb5a-829b-418f-ae26-d6573a5cc8b9",
    "epoch": 2
}
# radosgw-admin zonegroup modify --realm-id=b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a --rgw-zonegroup=default --default
# radosgw-admin zone modify --realm-id=b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a --rgw-zone=default --default
# radosgw-admin period update --commit

and it works now, I can edit the zone and zonegroup :)

>> On the "default" zonegroup (which is not set as default), the "bucket_index_max_shards" is set to "0", can I modify it without a realm?
>
> I just updated this section in this pr:
> https://github.com/ceph/ceph/pull/18063

As discussed on IRC, I did that but ran into a bug:

# radosgw-admin bucket reshard process --bucket image-net --num-shards=150
=> http://tracker.ceph.com/issues/21619

Thanks,

Best regards,

--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] Cephfs : security questions?
>> In cases like this you also want to set RADOS namespaces for each tenant’s >> directory in the CephFS layout and give them OSD access to only that >> namespace. That will prevent malicious users from tampering with the raw >> RADOS objects of other users. > > You mean by doing something like : > > ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data > namespace=foo" mds "allow rw path=/foo" ? > > [client.foo] > key = [snip] > caps mds = "allow rw path=/foo" > caps mon = "allow r" > caps osd = "allow rw pool=cephfs_data namespace=foo" > > or you are referring also to : > > http://docs.ceph.com/docs/master/cephfs/file-layouts/ > > Yes, both of those. The "auth caps" portion gives the client permission on > the OSD to access the namespace "foo". The file layouts place the > CephFS file data into that namespace. OK, I will give a look next week. Thank you. -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
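Putting the two pieces of this thread together, the per-tenant setup would look something like the following. This is an untested sketch: the pool, directory, and namespace names are examples, and the ceph.dir.layout.pool_namespace virtual xattr needs a reasonably recent client:

```shell
# Restrict the client to the /foo subtree and to the "foo" RADOS namespace
ceph auth caps client.foo \
    mon "allow r" \
    osd "allow rw pool=cephfs_data namespace=foo" \
    mds "allow rw path=/foo"

# Tag the directory so that new files under /foo are written into the
# "foo" namespace of cephfs_data (existing objects are not moved)
setfattr -n ceph.dir.layout.pool_namespace -v foo /cephfs/foo

# Verify the layout
getfattr -n ceph.dir.layout.pool_namespace /cephfs/foo
```

With both in place, a client holding the foo key cannot read or write the raw RADOS objects belonging to other tenants' namespaces even if it bypasses the MDS path restriction.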
[ceph-users] zone, zonegroup and resharding bucket on luminous
; } > } > ], > "metadata_heap": "", > "tier_config": [], > "realm_id": "" > } > # radosgw-admin metadata get bucket:image-net > { > "key": "bucket:image-net", > "ver": { > "tag": "_2_RFnI5pKQV7XEc5s2euJJW", > "ver": 1 > }, > "mtime": "2017-08-28 12:27:35.629882Z", > "data": { > "bucket": { > "name": "image-net", > "marker": "69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1", > "bucket_id": "69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1", > "tenant": "", > "explicit_placement": { > "data_pool": "", > "data_extra_pool": "", > "index_pool": "" > } > }, > "owner": "rgwadmin", > "creation_time": "2017-08-28 12:27:33.492997Z", > "linked": "true", > "has_bucket_info": "false" > } > } > # radosgw-admin metadata get > bucket.instance:image-net:69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1 > { > "key": > "bucket.instance:image-net:69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1", > "ver": { > "tag": "_HJUIdLuc8HJdxWhortpLiE7", > "ver": 3 > }, > "mtime": "2017-09-26 14:14:47.749267Z", > "data": { > "bucket_info": { > "bucket": { > "name": "image-net", > "marker": "69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1", > "bucket_id": "69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1", > "tenant": "", > "explicit_placement": { > "data_pool": "", > "data_extra_pool": "", > "index_pool": "" > } > }, > "creation_time": "2017-08-28 12:27:33.492997Z", > "owner": "rgwadmin", > "flags": 0, > "zonegroup": "43d23097-56b9-48a6-ad52-de42341be4bd", > "placement_rule": "default-placement", > "has_instance_obj": "true", > "quota": { > "enabled": false, > "check_on_raw": false, > "max_size": -1, > "max_size_kb": 0, > "max_objects": -1 > }, > "num_shards": 0, > "bi_shard_hash_type": 0, > "requester_pays": "false", > "has_website": "false", > "swift_versioning": "false", > "swift_ver_location": "", > "index_type": 0, > "mdsearch_config": [], > "reshard_status": 0, > "new_bucket_instance_id": "" > }, > "attrs": [ > { > "key": "user.rgw.acl", > "val": > 
"AgKdAwIdCHJnd2FkbWluDQAAAFJhZG9zZ3cgQWRtaW4EA3QBAQgAAAByZ3dhZG1pbg8BCHJnd2FkbWluBQNBAgIEAAgAAAByZ3dhZG1pbgAAAgIEDw0AAABSYWRvc2d3IEFkbWluAA==" > }, > { > "key": "user.rgw.idtag", > "val": "" > } > ] > } > } -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs : security questions?
Hi, >>>>> Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96 >>>>> >>>>> What is exactly an older kernel client ? 4.4 is old ? >>>>> >>>>> See >>>>> http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version >>>>> >>>>> If you're on Ubuntu Xenial I would advise to use >>>>> "linux-generic-hwe-16.04". Currently gives you 4.10.0-* kernel. >>>> >>>> OK, but I still cannot set caps without read access to "/" on cephfs >>>> volume, is there something else I must do ? >>>> >>>> # ceph auth get-or-create client.foo mon "allow r" osd "allow rw >>>> pool=cephfs_data" mds "allow rw path=/foo" >>>> Error EINVAL: key for client.foo exists but cap mds does not match >>>> >>>> # ceph fs authorize cephfs client.foo /foo rw >>>> Error EINVAL: key for client.foo exists but cap mds does not match >>> >>> Use "ceph auth list" to check the current caps for the client. With ceph >>> auth caps (note, _not_ get-or-create) you can update the caps: >>> >>> ceph auth caps client.foo mon "allow r" osd "allow rw >>> pool=cephfs_data" mds "allow rw path=/foo" >>> >>> The command should return "updated caps for client.foo" >> >> oops, you're right I must use "ceph auth caps" and not "ceph auth >> get-or-create" >> >> # ceph auth caps client.foo mon "allow r" osd "allow rw >> pool=cephfs_data" mds "allow rw path=/foo" >> updated caps for client.foo > > In cases like this you also want to set RADOS namespaces for each tenant’s > directory in the CephFS layout and give them OSD access to only that > namespace. That will prevent malicious users from tampering with the raw > RADOS objects of other users. You mean by doing something like : ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data namespace=foo" mds "allow rw path=/foo" ? 
[client.foo] key = [snip] caps mds = "allow rw path=/foo" caps mon = "allow r" caps osd = "allow rw pool=cephfs_data namespace=foo" or you are referring also to : http://docs.ceph.com/docs/master/cephfs/file-layouts/ -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs : security questions?
>>>> Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96 >>>> >>>> What is exactly an older kernel client ? 4.4 is old ? >>> >>> See >>> http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version >>> >>> If you're on Ubuntu Xenial I would advise to use >>> "linux-generic-hwe-16.04". Currently gives you 4.10.0-* kernel. >> >> OK, but I still cannot set caps without read access to "/" on cephfs volume, >> is there something else I must do ? >> >> # ceph auth get-or-create client.foo mon "allow r" osd "allow rw >> pool=cephfs_data" mds "allow rw path=/foo" >> Error EINVAL: key for client.foo exists but cap mds does not match >> >> # ceph fs authorize cephfs client.foo /foo rw >> Error EINVAL: key for client.foo exists but cap mds does not match > > Use "ceph auth list" to check the current caps for the client. With ceph > auth caps (note, _not_ get-or-create) you can update the caps: > > ceph auth caps client.foo mon "allow r" osd "allow rw > pool=cephfs_data" mds "allow rw path=/foo" > > The command should return "updated caps for client.foo" oops, you're right I must use "ceph auth caps" and not "ceph auth get-or-create" so finally I did that : # ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data" mds "allow rw path=/foo" updated caps for client.foo # ceph fs authorize cephfs client.foo /foo rw [client.foo] key = [snip] On the client : # uname -a Linux ntxvm006 4.10.0-33-generic #37~16.04.1-Ubuntu SMP Fri Aug 11 14:07:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux # mount.ceph iccluster041,iccluster042,iccluster054:/ /mnt -v -o name=foo,secret=[snip] parsing options: name=foo,secret=[snip] mount error 13 = Permission denied # mount.ceph iccluster041,iccluster042,iccluster054:/foo /mnt -v -o name=foo,secret=[snip] parsing options: name=foo,secret=[snip] # df /mnt Filesystem1K-blocks Used Available Use% Mounted on 10.90.38.17,10.90.38.18,10.90.39.5:/foo 70324469760 26267648 70298202112 1% /mnt It seems to work as I want. 
Thanks a lot ! Cheers, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs : security questions?
>> Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96 >> >> What is exactly an older kernel client ? 4.4 is old ? > > See > http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version > > If you're on Ubuntu Xenial I would advise to use > "linux-generic-hwe-16.04". Currently gives you 4.10.0-* kernel. OK, but I still cannot set caps without read access to "/" on cephfs volume, is there something else I must do ? # ceph auth get-or-create client.foo mon "allow r" osd "allow rw pool=cephfs_data" mds "allow rw path=/foo" Error EINVAL: key for client.foo exists but cap mds does not match # ceph fs authorize cephfs client.foo /foo rw Error EINVAL: key for client.foo exists but cap mds does not match Thanks, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs : security questions?
>>>> We are working on a POC with containers (kubernetes) and cephfs (for >>>> permanent storage). >>>> >>>> The main idea is to give to a user access to a subdirectory of the >>>> cephfs but be sure he won't be able to access to the rest of the >>>> storage. As k8s works, the user will have access to the yml file >>>> where the cephfs mount point is defined. He will be able to change >>>> the subdirectory mounted inside the container (and set it to /). And >>>> inside the container, the user is root… >>>> >>>> So if even the user doesn't have access to the secret, he will be >>>> able to mount the whole cephfs volume with read access. >>>> >>>> Is there a possibility to have "root_squash" option on cephfs volume >>>> for a specific client.user + secret? >>>> >>>> Is it possible to allow a specific user to mount only /bla and >>>> disallow to mount the cephfs root "/"? >>>> >>>> Or is there another way to do that? >>> >>> Maybe this will get you started with the permissions for only this fs >>> path /smb >>> >>> sudo ceph auth get-or-create client.cephfs.smb mon 'allow r' mds >>> 'allow r, allow rw path=/smb' osd 'allow rwx pool=fs_meta,allow rwx >>> pool=fs_data' >> >> What I currently do is : >> >> mkdir /cephfs/foo >> chown nobody:foogrp /cephfs/foo >> chmod 770 /cephfs/foo >> ceph auth get-or-create client.foo mon "allow r" osd "allow rw >> pool=cephfs_data" mds "allow r, allow rw path=/foo" >> ceph fs authorize cephfs client.foo / r /foo rw >> >> so I have this for client.foo >> >> [client.foo] >> key = [secret] >> caps mds = "allow r, allow rw path=/foo" >> caps mon = "allow r" >> caps osd = "allow rw pool=cephfs_data" >> >> With this, the user foo is able to mount the root of the cephfs and read >> everything, of course, he cannot modify but my problem here is he is >> still able to have read access to everything with uid=0. 
> > I think that is because of the older kernel client, like mentioned here?> > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39734.html Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96 What is exactly an older kernel client ? 4.4 is old ? if I remove "/ r" in the "auth caps" or "fs authorize" : # ceph auth get-or-create client.foo mon "allow r" osd "allow rw pool=cephfs_data" mds "allow rw path=/foo" Error EINVAL: key for client.foo exists but cap mds does not match # ceph fs authorize cephfs client.foo /foo rw Error EINVAL: key for client.foo exists but cap mds does not match # ceph fs authorize cephfs client.foo / r /foo rw [client.foo] key = [secret] -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Cephfs : security questions?
>> We are working on a POC with containers (kubernetes) and cephfs (for >> permanent storage). >> >> The main idea is to give to a user access to a subdirectory of the >> cephfs but be sure he won't be able to access to the rest of the >> storage. As k8s works, the user will have access to the yml file where >> the cephfs mount point is defined. He will be able to change the >> subdirectory mounted inside the container (and set it to /). And inside >> the container, the user is root… >> >> So if even the user doesn't have access to the secret, he will be able >> to mount the whole cephfs volume with read access. >> >> Is there a possibility to have "root_squash" option on cephfs volume for >> a specific client.user + secret? >> >> Is it possible to allow a specific user to mount only /bla and disallow >> to mount the cephfs root "/"? >> >> Or is there another way to do that? > > Maybe this will get you started with the permissions for only this fs > path /smb > > sudo ceph auth get-or-create client.cephfs.smb mon 'allow r' mds 'allow > r, allow rw path=/smb' osd 'allow rwx pool=fs_meta,allow rwx > pool=fs_data' What I currently do is : mkdir /cephfs/foo chown nobody:foogrp /cephfs/foo chmod 770 /cephfs/foo ceph auth get-or-create client.foo mon "allow r" osd "allow rw pool=cephfs_data" mds "allow r, allow rw path=/foo" ceph fs authorize cephfs client.foo / r /foo rw so I have this for client.foo [client.foo] key = [secret] caps mds = "allow r, allow rw path=/foo" caps mon = "allow r" caps osd = "allow rw pool=cephfs_data" With this, the user foo is able to mount the root of the cephfs and read everything, of course, he cannot modify but my problem here is he is still able to have read access to everything with uid=0. -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Cephfs : security questions?
Hello,

We are working on a POC with containers (kubernetes) and cephfs (for permanent storage).

The main idea is to give a user access to a subdirectory of the cephfs while making sure he won't be able to access the rest of the storage. Because of the way k8s works, the user will have access to the yml file where the cephfs mount point is defined. He will be able to change the subdirectory mounted inside the container (and set it to /). And inside the container, the user is root…

So even if the user doesn't have access to the secret, he will be able to mount the whole cephfs volume with read access.

Is there a "root_squash"-like option on a cephfs volume for a specific client.user + secret?

Is it possible to allow a specific user to mount only /bla and disallow mounting the cephfs root "/"?

Or is there another way to do that?

Thanks,

--
Yoann Moulin
EPFL IC-IT
Re: [ceph-users] Minimum requirements to mount luminous cephfs ?
On 27/09/2017 at 15:15, David Turner wrote:
> You can also use ceph-fuse instead of the kernel driver to mount cephfs. It supports all of the luminous features.

OK thanks, I will try that next. I need to be able to mount cephfs directly into containers; I don't know yet what the best way to do that will be, so having multiple options is great.

Thanks,

--
Yoann Moulin
EPFL IC-IT
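A restricted ceph-fuse mount equivalent to the kernel ones discussed in this thread would look roughly like this (a sketch; client name, keyring path, and directories are examples):

```shell
# Mount only the /foo subtree as client.foo using ceph-fuse;
# -r (--client_mountpoint) selects the subdirectory used as the root
# of the mount, and -k points at the client's keyring
ceph-fuse -n client.foo -k /etc/ceph/ceph.client.foo.keyring -r /foo /mnt
```

Since ceph-fuse runs in userspace, it tracks the cluster's feature set regardless of the host kernel version, which is the point David makes above.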
Re: [ceph-users] Minimum requirements to mount luminous cephfs ?
Hello, > Try to work with the tunables: > > $ *ceph osd crush show-tunables* > { > "choose_local_tries": 0, > "choose_local_fallback_tries": 0, > "choose_total_tries": 50, > "chooseleaf_descend_once": 1, > "chooseleaf_vary_r": 1, > "chooseleaf_stable": 0, > "straw_calc_version": 1, > "allowed_bucket_algs": 54, > "profile": "hammer", > "optimal_tunables": 0, > "legacy_tunables": 0, > "minimum_required_version": "firefly", > "require_feature_tunables": 1, > "require_feature_tunables2": 1, > "has_v2_rules": 0, > "require_feature_tunables3": 1, > "has_v3_rules": 0, > "has_v4_buckets": 0, > "require_feature_tunables5": 0, > "has_v5_rules": 0 > } > > try to 'disable' the '*require_feature_tunables5*', with that I think you > should be ok, maybe there's another way, but that works for me. One > way to change it, is to comment out in the crushmap the option "*tunable > chooseleaf_stable 1*" and inject the crushmap again in the cluster (of > course that would produce on a lot of data moving on the pgs) Thanks a lot, I removed the line "tunable chooseleaf_stable 1" from the crushmap and it works now ! root@iccluster013:~# df -h /mnt/ FilesystemSize Used Avail Use% Mounted on 10.90.38.17,10.90.38.18,10.90.39.5:/ 66T 19G 66T 1% /mnt Best regards, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
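The decompile/edit/recompile round trip described above can be done like this (a sketch; file names are arbitrary, and as noted, removing the tunable will trigger data movement across the PGs):

```shell
# Export and decompile the current crushmap
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Comment out the tunable that requires the feature bit
# the old kernel client is missing
sed -i 's/^tunable chooseleaf_stable 1/# tunable chooseleaf_stable 1/' crushmap.txt

# Recompile and inject the modified map back into the cluster
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
```

Afterwards, `ceph osd crush show-tunables` should report `"require_feature_tunables5": 0`.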
[ceph-users] Minimum requirements to mount luminous cephfs ?
Hello, I try to mount a cephfs filesystem from fresh luminous cluster. With the latest kernel 4.13.3, it works > $ sudo mount.ceph > iccluster041.iccluster,iccluster042.iccluster,iccluster054.iccluster:/ /mnt > -v -o name=container001,secretfile=/tmp/secret > parsing options: name=container001,secretfile=/tmp/secret > $ df -h /mnt > FilesystemSize Used Avail Use% Mounted on > 10.90.38.17,10.90.38.18,10.90.39.5:/ 66T 19G 66T 1% /mnt > root@iccluster054:~# ceph auth get client.container001 > exported keyring for client.container001 > [client.container001] > key = > caps mds = "allow rw" > caps mon = "allow r" > caps osd = "allow rw pool=cephfs_data" > root@iccluster05:~#:/var/log# ceph --cluster container fs authorize cephfs > client.container001 / rw > [client.container001] > key = With the latest Ubuntu 16.04 LTS Kernel and ceph-common 12.2.0, I'm not able to mount it > Linux iccluster013 4.4.0-96-generic #119~14.04.1-Ubuntu SMP Wed Sep 13 > 08:40:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux > ii ceph-common 12.2.0-1trusty > amd64common utilities to mount and interact with a ceph storage > cluster > root@iccluster013:~# mount.ceph iccluster041,iccluster042,iccluster054:/ > /mnt -v -o name=container001,secretfile=/tmp/secret > parsing options: name=container001,secretfile=/tmp/secret > mount error 110 = Connection timed out here the dmesg : > [ 417.528621] Key type ceph registered > [ 417.528996] libceph: loaded (mon/osd proto 15/24) > [ 417.540534] FS-Cache: Netfs 'ceph' registered for caching > [ 417.540546] ceph: loaded (mds proto 32) > [...] 
> [ 2596.609885] libceph: mon1 10.90.38.18:6789 feature set mismatch, my > 107b84a842aca < server's 40107b84a842aca, missing 400 > [ 2596.626797] libceph: mon1 10.90.38.18:6789 missing required protocol > features > [ 2606.960704] libceph: mon0 10.90.38.17:6789 feature set mismatch, my > 107b84a842aca < server's 40107b84a842aca, missing 400 > [ 2606.977621] libceph: mon0 10.90.38.17:6789 missing required protocol > features > [ 2616.944998] libceph: mon0 10.90.38.17:6789 feature set mismatch, my > 107b84a842aca < server's 40107b84a842aca, missing 400 > [ 2616.961917] libceph: mon0 10.90.38.17:6789 missing required protocol > features > [ 2626.961329] libceph: mon0 10.90.38.17:6789 feature set mismatch, my > 107b84a842aca < server's 40107b84a842aca, missing 400 > [ 2626.978290] libceph: mon0 10.90.38.17:6789 missing required protocol > features > [ 2636.945765] libceph: mon0 10.90.38.17:6789 feature set mismatch, my > 107b84a842aca < server's 40107b84a842aca, missing 400 > [ 2636.962677] libceph: mon0 10.90.38.17:6789 missing required protocol > features > [ 2646.962255] libceph: mon1 10.90.38.18:6789 feature set mismatch, my > 107b84a842aca < server's 40107b84a842aca, missing 4000000 > [ 2646.979228] libceph: mon1 10.90.38.18:6789 missing required protocol > features Is there specific option to set on the cephfs to be able to mount it on a kernel 4.4 ? Best regards, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
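The missing feature can be read straight off the dmesg masks: XOR the client and server values and locate the set bit. The values below are taken from the log above; bit 58 is the position Ceph uses for CRUSH_TUNABLES5 (it is a shared feature bit), which matches the `require_feature_tunables5` / `chooseleaf_stable` fix discussed elsewhere in this thread.

```shell
# Feature masks copied from the "feature set mismatch" dmesg line
client=0x107b84a842aca
server=0x40107b84a842aca
missing=$(( client ^ server ))
printf 'missing feature mask: 0x%x\n' "$missing"
# Find the bit position of the (single) missing feature
bit=0; m=$missing
while [ "$m" -gt 1 ]; do m=$(( m >> 1 )); bit=$(( bit + 1 )); done
echo "missing feature bit: $bit"   # bit 58 -> CRUSH_TUNABLES5 (chooseleaf_stable)
```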
Re: [ceph-users] Access to rbd with a user key
>> OK, I don't know where I read that the -o option writes the key, but the file >> was empty; using a ">" redirection instead, listing and creating RBDs works now. >> >> And from what I have tested, the correct syntax is « mon 'profile rbd' osd >> 'profile rbd pool=rbd' » >> >>> If we give access to those RBDs inside containers, how can I be >>> sure that users in one container do not have access to another's RBDs ? Is >>> a namespace the right way to isolate each user ? >> >> The question about namespaces is still open: if I put a namespace in the osd >> caps, I can't create an RBD volume. How can I restrict each client to >> only his own volumes ? > > Unfortunately, RBD doesn't currently support namespaces, but it's on > our backlog. So if I want to separate data between containers, I need to create one pool per user (a user can have multiple containers). I'm going to take a look at CephFS; it seems possible to restrict each user's access to a subdirectory, can you confirm that ? Thanks, Best regards, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
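For the CephFS side of the question: since Luminous there is a one-step command to confine a client to a subdirectory. This is a sketch only; the filesystem, client, and path names are examples (elsewhere in this archive the same command is used with `/` to grant the whole tree):

```shell
# Grant client.container001 rw access to /container001 only;
# the directory must already exist in the filesystem.
ceph fs authorize cephfs client.container001 /container001 rw
# The generated caps should look roughly like:
#   caps mds = "allow rw path=/container001"
#   caps mon = "allow r"
#   caps osd = "allow rw pool=cephfs_data"
```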
Re: [ceph-users] Access to rbd with a user key
Hello, > I try to give access to a rbd to a client on a fresh Luminous cluster > > http://docs.ceph.com/docs/luminous/rados/operations/user-management/ > > first of all, I'd like to know the exact syntax for auth caps > > the result of "ceph auth ls" give this : > >> osd.9 >> key: AQDjAsVZ+nI7NBAA14X9U5Xjunlk/9ovTht3Og== >> caps: [mgr] allow profile osd >> caps: [mon] allow profile osd >> caps: [osd] allow * > > but in the documentation, it writes : > >> osd 'profile {name} [pool={pool-name} [namespace={namespace-name}]]' > > Does the "allow" needed before "profile" ? it's not clear > > If I create a user like this : > >> # ceph --cluster container auth get-or-create client.container001 \ >> mon 'allow profile rbd' \ >> osd 'allow profile rbd \ >> pool=rbd namespace=container001' \ >> -o /etc/ceph/container.client.container001.keyring ok, I don't know where I read the -o option to write the key but the file was empty I do a ">" and seems to work to list or create rbd now. and for what I have tested then, the good syntax is « mon 'profile rbd' osd 'profile rbd pool=rbd' » > In the case we give access to those rbd inside the container, how I can be > sure users in each container do not have access to others rbd ? Is > the namespace good to isolate each user ? The question about namespace is still open, if I have a namespace in the osd caps, I can't create rbd volume. How I can isolate each client to only his own volumes ? Thanks for your help Best regards, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
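Putting the thread's conclusion together, a working sequence looks like the following sketch (names taken from the thread). Two points it encodes: the caps use `profile rbd` without `allow`, and `--id` takes the name without the `client.` prefix.

```shell
# Create the user and write the keyring via shell redirection
# (the -o option produced an empty file in the report above)
ceph --cluster container auth get-or-create client.container001 \
     mon 'profile rbd' \
     osd 'profile rbd pool=rbd' \
     > /etc/ceph/container.client.container001.keyring
test -s /etc/ceph/container.client.container001.keyring  # fail early if empty

# Note: --id is the name WITHOUT the "client." prefix
rbd --cluster container create --size 1024 rbd/container003 \
    --id container001 \
    --keyring /etc/ceph/container.client.container001.keyring
```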
[ceph-users] Access to rbd with a user key
Hello, I try to give access to a rbd to a client on a fresh Luminous cluster http://docs.ceph.com/docs/luminous/rados/operations/user-management/ first of all, I'd like to know the exact syntax for auth caps the result of "ceph auth ls" give this : > osd.9 > key: AQDjAsVZ+nI7NBAA14X9U5Xjunlk/9ovTht3Og== > caps: [mgr] allow profile osd > caps: [mon] allow profile osd > caps: [osd] allow * but in the documentation, it writes : > osd 'profile {name} [pool={pool-name} [namespace={namespace-name}]]' Does the "allow" needed before "profile" ? it's not clear If I create a user like this : > # ceph --cluster container auth get-or-create client.container001 \ > mon 'allow profile rbd' \ > osd 'allow profile rbd \ > pool=rbd namespace=container001' \ > -o /etc/ceph/container.client.container001.keyring Is this user able to create an rbd volume ? > # rbd --cluster container create --size 1024 rbd/container003 --id > client.container001 --keyring /etc/ceph/container.client.container001.keyring > 2017-09-26 09:54:10.158234 7fbda23270c0 0 librados: > client.client.container001 authentication error (22) Invalid argument > rbd: couldn't connect to the cluster! In that case client.client.container001 does not exist, I tried without "client." but failed as well with another error. > # rbd --cluster container create --size 1024 rbd/container003 --id > container001 --keyring /etc/ceph/container.client.container001.keyring > 2017-09-26 09:55:11.869745 7f10de6d30c0 0 librados: client.container001 > authentication error (22) Invalid argument > rbd: couldn't connect to the cluster! it works if I create the rbd volume like : > # rbd --cluster container create --size 1024 rbd/container003 Then I can get rbd volume information with the admin key but not with the user key. 
> # rbd --cluster container info rbd/container003 > rbd image 'container003': > size 1024 MB in 256 objects > order 22 (4096 kB objects) > block_name_prefix: rbd_data.5f7c74b0dc51 > format: 2 > features: layering, exclusive-lock, object-map, fast-diff, deep-flatten > flags: > create_timestamp: Tue Sep 26 09:54:50 2017 > # rbd --cluster container info rbd/container003 --keyring > /etc/ceph/container.client.container001.keyring > 2017-09-26 09:58:29.864348 7f2fe60780c0 0 librados: client.admin > authentication error (22) Invalid argument > rbd: couldn't connect to the cluster! > # rbd --cluster container info rbd/container003 --keyring > /etc/ceph/container.client.container001.keyring --id client.container001 > 2017-09-26 09:58:38.971827 7fcafa7aa0c0 0 librados: > client.client.container001 authentication error (22) Invalid argument > rbd: couldn't connect to the cluster! > # rbd --cluster container info rbd/container003 --keyring > /etc/ceph/container.client.container001.keyring --id container001 > 2017-09-26 09:58:45.515253 7fbb0208c0c0 0 librados: client.container001 > authentication error (22) Invalid argument > rbd: couldn't connect to the cluster! I might have missed something somewhere, but I don't know where. Does the "rbd profile" give the capability to create rbd volumes to the user ? or it just gives the access to rbd volume previously create by the admin ? In the case we give access to those rbd inside the container, how I can be sure users in each container do not have access to others rbd ? Is the namespace good to isolate each user ? I haven't used a lot rbd before and never use client keys capabilities, it might a bit confuse for me. Thanks for your help Best regards, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] s3cmd not working with luminous radosgw
Hi Matt, >>>> Does anyone have tested s3cmd or other tools to manage ACL on luminous >>>> radosGW ? >>> >>> Don't know about ACL, but s3cmd for other things works for me. Version >>> 1.6.1 >> >> Finally, I found out what happened, I had 2 issues. One, on s3cmd config >> file, radosgw with luminous does not support signature v2 anymore, only >> v4 is supported, I had to add this to my .s3cfg file : > > V4 is supported, but to the best of my knowledge, you can use sigv2 if > desired. Indeed, it seems to work in sigv2 :) >> The second was in the rgw section into ceph.conf file. The line "rgw dns >> name" was missing. > > Depending on your setup, "rgw dns name" may be required, yes. in my case, it seems to be mandatory Best regards, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] s3cmd not working with luminous radosgw
Hello, >> Has anyone tested s3cmd or other tools to manage ACLs on a Luminous >> radosGW ? > > Don't know about ACL, but s3cmd for other things works for me. Version 1.6.1 Finally, I found out what happened; I had two issues. First, in the s3cmd config file: radosgw with Luminous did not accept signature v2 in my setup, only v4, so I had to add this to my .s3cfg file : The second was in the rgw section of the ceph.conf file: the line "rgw dns name" was missing. I have deployed my cluster with ceph-ansible and it seems that I need a new option in the all.yml file : I have added it manually and now it works (the ansible-playbook didn't add it, I must figure out why). Thanks for your help Best regards, Yoann Moulin >>> I have a fresh luminous cluster in test and I made a copy of a bucket (4TB >>> 1.5M files) with rclone; I'm able to list/copy files with rclone but >>> s3cmd does not work at all: it is just able to give the bucket list, but I >>> can't list files nor update ACLs. >>> >>> has anyone already tested this ?
>>> >>> root@iccluster012:~# rclone --version >>> rclone v1.37 >>> >>> root@iccluster012:~# s3cmd --version >>> s3cmd version 2.0.0 >>> >>> >>> ### rclone ls files ### >>> >>> root@iccluster012:~# rclone ls testadmin:image-net/LICENSE >>> 1589 LICENSE >>> root@iccluster012:~# >>> >>> nginx (as revers proxy) log : >>> >>>> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE >>>> HTTP/1.1" 200 0 "-" "rclone/v1.37" >>>> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "GET >>>> /image-net?delimiter=%2F=1024= HTTP/1.1" 200 779 "-" >>>> "rclone/v1.37" >>> >>> rgw logs : >>> >>>> 2017-09-15 10:30:02.620266 7ff1f58f7700 1 == starting new request >>>> req=0x7ff1f58f11f0 = >>>> 2017-09-15 10:30:02.622245 7ff1f58f7700 1 == req done >>>> req=0x7ff1f58f11f0 op status=0 http_status=200 == >>>> 2017-09-15 10:30:02.622324 7ff1f58f7700 1 civetweb: 0x56061584b000: >>>> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE >>>> HTTP/1.0" 1 0 - rclone/v1.37 >>>> 2017-09-15 10:30:02.623361 7ff1f50f6700 1 == starting new request >>>> req=0x7ff1f50f01f0 = >>>> 2017-09-15 10:30:02.689632 7ff1f50f6700 1 == req done >>>> req=0x7ff1f50f01f0 op status=0 http_status=200 == >>>> 2017-09-15 10:30:02.689719 7ff1f50f6700 1 civetweb: 0x56061585: >>>> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "GET >>>> /image-net?delimiter=%2F=1024= HTTP/1.0" 1 0 - rclone/v1.37 >>> >>> >>> >>> ### s3cmds ls files ### >>> >>> root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls >>> s3://image-net/LICENSE >>> root@iccluster012:~# >>> >>> nginx (as revers proxy) log : >>> >>>> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET >>>> http://test.iccluster.epfl.ch/image-net/?location HTTP/1.1" 200 127 "-" "-" >>>> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET >>>> http://image-net.test.iccluster.epfl.ch/?delimiter=%2F=LICENSE >>>> HTTP/1.1" 200 318 "-" "-" >>> >>> rgw logs : >>> >>>> 2017-09-15 10:30:04.295355 7ff1f48f5700 1 == starting new request >>>> req=0x7ff1f48ef1f0 = >>>> 
2017-09-15 10:30:04.295913 7ff1f48f5700 1 == req done >>>> req=0x7ff1f48ef1f0 op status=0 http_status=200 == >>>> 2017-09-15 10:30:04.295977 7ff1f48f5700 1 civetweb: 0x560615855000: >>>> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET /image-net/?location >>>> HTTP/1.0" 1 0 - - >>>> 2017-09-15 10:30:04.299303 7ff1f40f4700 1 == starting new request >>>> req=0x7ff1f40ee1f0 = >>>> 2017-09-15 10:30:04.300993 7ff1f40f4700 1 == req done >>>> req=0x7ff1f40ee1f0 op status=0 http_status=200 == >>>> 2017-09-15 10:30:04.301070 7ff1f40f4700 1 civetweb: 0x56061585a000: >>>> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET >>>> /?delimiter=%2F=LICENSE HTTP/1.0" 1 0 - >>> >>> >>> >>> ### s3cmd : list bucket ### >>> >>> root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3:// >>> 2017-08-
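The two fixes described at the top of this message lost their snippets in archiving. For reference, they would look roughly like this; the option names come from s3cmd and Ceph documentation, and the hostname is the one visible in the nginx logs above (adjust both to your setup):

```ini
# ~/.s3cfg - make s3cmd sign requests with AWS signature v4
signature_v2 = False

# ceph.conf, in the rgw section (e.g. [client.rgw.<instance>]) -
# hostname under which the gateway is reached, so that
# virtual-host-style bucket requests resolve correctly
rgw dns name = test.iccluster.epfl.ch
```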
Re: [ceph-users] s3cmd not working with luminous radosgw
Hello, Does anyone have tested s3cmd or other tools to manage ACL on luminous radosGW ? I have opened an issue on s3cmd too https://github.com/s3tools/s3cmd/issues/919 Thanks for your help Yoann > I have a fresh luminous cluster in test and I made a copy of a bucket (4TB > 1.5M files) with rclone, I'm able to list/copy files with rclone but > s3cmd does not work at all, it is just able to give the bucket list but I > can't list files neither update ACL. > > does anyone already test this ? > > root@iccluster012:~# rclone --version > rclone v1.37 > > root@iccluster012:~# s3cmd --version > s3cmd version 2.0.0 > > > ### rclone ls files ### > > root@iccluster012:~# rclone ls testadmin:image-net/LICENSE > 1589 LICENSE > root@iccluster012:~# > > nginx (as revers proxy) log : > >> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE >> HTTP/1.1" 200 0 "-" "rclone/v1.37" >> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "GET >> /image-net?delimiter=%2F=1024= HTTP/1.1" 200 779 "-" >> "rclone/v1.37" > > rgw logs : > >> 2017-09-15 10:30:02.620266 7ff1f58f7700 1 == starting new request >> req=0x7ff1f58f11f0 = >> 2017-09-15 10:30:02.622245 7ff1f58f7700 1 == req done >> req=0x7ff1f58f11f0 op status=0 http_status=200 == >> 2017-09-15 10:30:02.622324 7ff1f58f7700 1 civetweb: 0x56061584b000: >> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE >> HTTP/1.0" 1 0 - rclone/v1.37 >> 2017-09-15 10:30:02.623361 7ff1f50f6700 1 == starting new request >> req=0x7ff1f50f01f0 = >> 2017-09-15 10:30:02.689632 7ff1f50f6700 1 == req done >> req=0x7ff1f50f01f0 op status=0 http_status=200 == >> 2017-09-15 10:30:02.689719 7ff1f50f6700 1 civetweb: 0x56061585: >> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "GET >> /image-net?delimiter=%2F=1024= HTTP/1.0" 1 0 - rclone/v1.37 > > > > ### s3cmds ls files ### > > root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls > s3://image-net/LICENSE > root@iccluster012:~# > > nginx (as revers proxy) log : > >> 10.90.37.13 - - 
[15/Sep/2017:10:30:04 +0200] "GET >> http://test.iccluster.epfl.ch/image-net/?location HTTP/1.1" 200 127 "-" "-" >> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET >> http://image-net.test.iccluster.epfl.ch/?delimiter=%2F=LICENSE >> HTTP/1.1" 200 318 "-" "-" > > rgw logs : > >> 2017-09-15 10:30:04.295355 7ff1f48f5700 1 == starting new request >> req=0x7ff1f48ef1f0 = >> 2017-09-15 10:30:04.295913 7ff1f48f5700 1 == req done >> req=0x7ff1f48ef1f0 op status=0 http_status=200 == >> 2017-09-15 10:30:04.295977 7ff1f48f5700 1 civetweb: 0x560615855000: >> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET /image-net/?location >> HTTP/1.0" 1 0 - - >> 2017-09-15 10:30:04.299303 7ff1f40f4700 1 == starting new request >> req=0x7ff1f40ee1f0 = >> 2017-09-15 10:30:04.300993 7ff1f40f4700 1 == req done >> req=0x7ff1f40ee1f0 op status=0 http_status=200 == >> 2017-09-15 10:30:04.301070 7ff1f40f4700 1 civetweb: 0x56061585a000: >> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET >> /?delimiter=%2F=LICENSE HTTP/1.0" 1 0 - > > > > ### s3cmd : list bucket ### > > root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3:// > 2017-08-28 12:27 s3://image-net > root@iccluster012:~# > > nginx (as revers proxy) log : > >> ==> nginx/access.log <== >> 10.90.37.13 - - [15/Sep/2017:10:36:10 +0200] "GET >> http://test.iccluster.epfl.ch/ HTTP/1.1" 200 318 "-" "-" > > rgw logs : > >> 2017-09-15 10:36:10.645354 7ff1f38f3700 1 == starting new request >> req=0x7ff1f38ed1f0 = >> 2017-09-15 10:36:10.647419 7ff1f38f3700 1 == req done >> req=0x7ff1f38ed1f0 op status=0 http_status=200 == >> 2017-09-15 10:36:10.647488 7ff1f38f3700 1 civetweb: 0x56061585f000: >> 127.0.0.1 - - [15/Sep/2017:10:36:10 +0200] "GET / HTTP/1.0" 1 0 - - > > > > ### rclone : list bucket ### > > > root@iccluster012:~# rclone lsd testadmin: > -1 2017-08-28 12:27:33-1 image-net > root@iccluster012:~# > > nginx (as revers proxy) log : > >> ==> nginx/access.log <== >> 10.90.37.13 - - [15/Sep/2017:10:37:53 +0200] "GET / HTTP/1.1" 200 318 
"-" >> "rclone/v1.37" > > rgw logs : > >> ==> ceph/luminous-rgw-iccluster015.log <== >> 2017-09-15 10:37:53.005424 7ff1f28f1700 1 == starting new request >> req=0x7ff1f28eb1f0 = >> 2017-09-15 10:37:53.007192 7ff1f28f1700 1 == req done >> req=0x7ff1f28eb1f0 op status=0 http_status=200 == >> 2017-09-15 10:37:53.007282 7ff1f28f1700 1 civetweb: 0x56061586e000: >> 127.0.0.1 - - [15/Sep/2017:10:37:53 +0200] "GET / HTTP/1.0" 1 0 - >> rclone/v1.37 -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] s3cmd not working with luminous radosgw
Hello, I have a fresh luminous cluster in test and I made a copy of a bucket (4TB 1.5M files) with rclone, I'm able to list/copy files with rclone but s3cmd does not work at all, it is just able to give the bucket list but I can't list files neither update ACL. does anyone already test this ? root@iccluster012:~# rclone --version rclone v1.37 root@iccluster012:~# s3cmd --version s3cmd version 2.0.0 ### rclone ls files ### root@iccluster012:~# rclone ls testadmin:image-net/LICENSE 1589 LICENSE root@iccluster012:~# nginx (as revers proxy) log : > 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE > HTTP/1.1" 200 0 "-" "rclone/v1.37" > 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "GET > /image-net?delimiter=%2F=1024= HTTP/1.1" 200 779 "-" > "rclone/v1.37" rgw logs : > 2017-09-15 10:30:02.620266 7ff1f58f7700 1 == starting new request > req=0x7ff1f58f11f0 = > 2017-09-15 10:30:02.622245 7ff1f58f7700 1 == req done req=0x7ff1f58f11f0 > op status=0 http_status=200 == > 2017-09-15 10:30:02.622324 7ff1f58f7700 1 civetweb: 0x56061584b000: > 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE HTTP/1.0" > 1 0 - rclone/v1.37 > 2017-09-15 10:30:02.623361 7ff1f50f6700 1 == starting new request > req=0x7ff1f50f01f0 = > 2017-09-15 10:30:02.689632 7ff1f50f6700 1 == req done req=0x7ff1f50f01f0 > op status=0 http_status=200 == > 2017-09-15 10:30:02.689719 7ff1f50f6700 1 civetweb: 0x56061585: > 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "GET > /image-net?delimiter=%2F=1024= HTTP/1.0" 1 0 - rclone/v1.37 ### s3cmds ls files ### root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3://image-net/LICENSE root@iccluster012:~# nginx (as revers proxy) log : > 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET > http://test.iccluster.epfl.ch/image-net/?location HTTP/1.1" 200 127 "-" "-" > 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET > http://image-net.test.iccluster.epfl.ch/?delimiter=%2F=LICENSE > HTTP/1.1" 200 318 "-" "-" rgw logs : > 2017-09-15 
10:30:04.295355 7ff1f48f5700 1 == starting new request > req=0x7ff1f48ef1f0 = > 2017-09-15 10:30:04.295913 7ff1f48f5700 1 == req done req=0x7ff1f48ef1f0 > op status=0 http_status=200 == > 2017-09-15 10:30:04.295977 7ff1f48f5700 1 civetweb: 0x560615855000: > 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET /image-net/?location > HTTP/1.0" 1 0 - - > 2017-09-15 10:30:04.299303 7ff1f40f4700 1 == starting new request > req=0x7ff1f40ee1f0 = > 2017-09-15 10:30:04.300993 7ff1f40f4700 1 == req done req=0x7ff1f40ee1f0 > op status=0 http_status=200 == > 2017-09-15 10:30:04.301070 7ff1f40f4700 1 civetweb: 0x56061585a000: > 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET > /?delimiter=%2F=LICENSE HTTP/1.0" 1 0 - ### s3cmd : list bucket ### root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3:// 2017-08-28 12:27 s3://image-net root@iccluster012:~# nginx (as revers proxy) log : > ==> nginx/access.log <== > 10.90.37.13 - - [15/Sep/2017:10:36:10 +0200] "GET > http://test.iccluster.epfl.ch/ HTTP/1.1" 200 318 "-" "-" rgw logs : > 2017-09-15 10:36:10.645354 7ff1f38f3700 1 == starting new request > req=0x7ff1f38ed1f0 = > 2017-09-15 10:36:10.647419 7ff1f38f3700 1 == req done req=0x7ff1f38ed1f0 > op status=0 http_status=200 == > 2017-09-15 10:36:10.647488 7ff1f38f3700 1 civetweb: 0x56061585f000: > 127.0.0.1 - - [15/Sep/2017:10:36:10 +0200] "GET / HTTP/1.0" 1 0 - - ### rclone : list bucket ### root@iccluster012:~# rclone lsd testadmin: -1 2017-08-28 12:27:33-1 image-net root@iccluster012:~# nginx (as revers proxy) log : > ==> nginx/access.log <== > 10.90.37.13 - - [15/Sep/2017:10:37:53 +0200] "GET / HTTP/1.1" 200 318 "-" > "rclone/v1.37" rgw logs : > ==> ceph/luminous-rgw-iccluster015.log <== > 2017-09-15 10:37:53.005424 7ff1f28f1700 1 == starting new request > req=0x7ff1f28eb1f0 = > 2017-09-15 10:37:53.007192 7ff1f28f1700 1 == req done req=0x7ff1f28eb1f0 > op status=0 http_status=200 == > 2017-09-15 10:37:53.007282 7ff1f28f1700 1 civetweb: 0x56061586e000: > 127.0.0.1 - - 
[15/Sep/2017:10:37:53 +0200] "GET / HTTP/1.0" 1 0 - rclone/v1.37 Thanks for you help -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] How to change the owner of a bucket
Dear list, I was looking for how to change the owner of a bucket. There is a lack of documentation on that point (even the man page is not clear); I found out how with the help of Orit: > radosgw-admin metadata get bucket:<bucket-name> > radosgw-admin bucket link --uid=<user-id> --bucket=<bucket-name> > --bucket-id=<bucket-id> this issue helped me : http://tracker.ceph.com/issues/14949 Also, in the radosgw-admin man page, unlink is described as "Remove a bucket"; what does "remove" mean in that case ? Delete ? > Remove a bucket: > $ radosgw-admin bucket unlink --bucket=foo http://docs.ceph.com/docs/master/man/8/radosgw-admin/ -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
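The two commands above can be combined into a small sketch (user and bucket names are examples; the `sed` extraction of `bucket_id` assumes the JSON layout returned by `radosgw-admin metadata get`, so check your output first):

```shell
bucket=mybucket
newuid=bob
# 1. Find the bucket instance id in the bucket metadata
bucket_id=$(radosgw-admin metadata get bucket:"$bucket" \
            | sed -n 's/.*"bucket_id": "\([^"]*\)".*/\1/p' | head -1)
# 2. Link the bucket to the new owner
radosgw-admin bucket link --uid="$newuid" --bucket="$bucket" --bucket-id="$bucket_id"
```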
Re: [ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)
Hello, Finally, I found time to do some new benchmarks with the latest jewel release (10.2.5) on 4 nodes. Each node has 10 OSDs. I ran 2 times "ceph tell osd.* bench" over 40 OSDs, here the average speed : 4.2.0-42-generic 97.45 MB/s 4.4.0-53-generic 55.73 MB/s 4.8.15-040815-generic 62.41 MB/s 4.9.0-040900-generic 60.88 MB/s I have the same behaviour with at least 35 to 40% performance drop between kernel 4.2 and kernel > 4.4 I can do further benches if needed. Yoann Le 26/07/2016 à 09:09, Lomayani S. Laizer a écrit : > Hello, > do you have journal on disk too ? > > Yes am having journal on same hard disk. > > ok and could you do bench with kernel 4.2 ? just to see if you have better > throughput. Thanks > > In ubuntu 14 I was running 4.2 kernel. the throughput was the same around > 80-90MB/s per osd. I cant tell the difference because each test gives > the speeds on same range. I did not test kernel 4.4 in ubuntu 14 > > > -- > Lomayani > > On Tue, Jul 26, 2016 at 9:39 AM, Yoann Moulin <yoann.mou...@epfl.ch > <mailto:yoann.mou...@epfl.ch>> wrote: > > Hello, > > > Am running ubuntu 16 with kernel 4.4-0.31-generic and my speed are > similar. > > do you have journal on disk too ? > > > I did tests on ubuntu 14 and Ubuntu 16 and the speed is similar. I have > around > > 80-90MB/s of OSD speeds in both operating systems > > ok and could you do bench with kernel 4.2 ? just to see if you have better > throughput. Thanks > > > Only issue am observing now with ubuntu 16 is sometime osd fails on > rebooting > > until i start them manually or adding starting commands in rc.local. 
> > in my case, it's a test environment, so I don't have notice those > behaviours > > -- > Yoann > > > On Mon, Jul 25, 2016 at 6:45 PM, Yoann Moulin <yoann.mou...@epfl.ch > <mailto:yoann.mou...@epfl.ch> > > <mailto:yoann.mou...@epfl.ch <mailto:yoann.mou...@epfl.ch>>> wrote: > > > > Hello, > > > > (this is a repost, my previous message seems to be slipping under > the radar) > > > > Does anyone get a similar behaviour to the one described below ? > > > > I found a big performance drop between kernel 3.13.0-88 (default > kernel on > > Ubuntu Trusty 14.04) or kernel 4.2.0 and kernel 4.4.0.24.14 > (default kernel on > > Ubuntu Xenial 16.04) > > > > - ceph version is Jewel (10.2.2). > > - All tests have been done under Ubuntu 14.04 on > > - Each cluster has 5 nodes strictly identical. > > - Each node has 10 OSDs. > > - Journals are on the disk. > > > > Kernel 4.4 has a drop of more than 50% compared to 4.2 > > Kernel 4.4 has a drop of 40% compared to 3.13 > > > > details below : > > > > With the 3 kernel I have the same performance on disks : > > > > Raw benchmark: > > dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct=> > average ~230MB/s > > dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct => > average ~220MB/s > > > > Filesystem mounted benchmark: > > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 => > average ~205MB/s > > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => > average ~214MB/s > > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync => > average ~190MB/s > > > > Ceph osd Benchmark: > > Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average > ~81MB/s > > Kernel 4.2.0-38-generic : ceph tell osd.ID bench => average > ~109MB/s > > Kernel 4.4.0-24-generic : ceph tell osd.ID bench => average > ~50MB/s > > > > I did new benchmarks then on 3 new fresh clusters. > > > > - Each cluster has 3 nodes strictly identical. > > - Each node has 10 OSDs. > > - Journals are on the disk. 
> > > > bench5 : Ubuntu 14.04 / Ceph Infernalis > > bench6 : Ubuntu 14.04 / Ceph Jewel > > bench7 : Ubuntu 16.04 / Ceph jewel > > > > this is the average of 2 runs of "ceph tell osd.* bench" on each > cluster (2 x 30 > > OSDs) > > > > bench5 / 14.04 / Infernalis / kernel 3.13 : 54.35 MB/s > > bench6 / 14.04 / Jewel /
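The per-kernel averages quoted at the top of this message can be reproduced with a one-liner like the following (a sketch; the `bytes_per_sec` field name is assumed from the output format of `ceph tell osd.N bench` in Jewel):

```shell
# Average the throughput reported by all OSDs
ceph tell 'osd.*' bench 2>/dev/null \
  | awk -F'[:,]' '/bytes_per_sec/ { sum += $2; n++ }
                  END { if (n) printf "average: %.2f MB/s over %d OSDs\n", sum / n / 1048576, n }'
```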
Re: [ceph-users] stalls caused by scrub on jewel
Hello, > We're impacted by this bug (case 01725311). Our cluster is running RHCS 2.0 > and is no longer able to scrub or deep-scrub. > > [1] http://tracker.ceph.com/issues/17859 > [2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007 > [3] https://github.com/ceph/ceph/pull/11898 > > I'm worried we'll have to live with a cluster that can't scrub/deep-scrub > until March 2017 (ETA for RHCS 2.2 running Jewel 10.2.4). > > Can we have this fix any sooner ? As far as I know, that bug shows up when you have very large PGs; a workaround could be to increase the pg_num of the pool that has the biggest PGs. -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
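To check whether this workaround applies, PG sizes can be inspected directly (a sketch; the column position of BYTES is assumed from the plain-text `ceph pg dump` output, so verify it against the header line first):

```shell
# Ten largest PGs by bytes; the pool id is the part of the PG id before the dot
ceph pg dump 2>/dev/null \
  | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $7, $1 }' \
  | sort -rn | head -10
# If one pool dominates, split its PGs in steps (pgp_num must follow pg_num):
# ceph osd pool set <pool> pg_num <higher_value>
# ceph osd pool set <pool> pgp_num <higher_value>
```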
[ceph-users] index-sharding on existing bucket ?
Hello, Is it possible to shard the index of existing buckets ? I have more than 100TB of data in a couple of buckets, and I'd like to avoid re-uploading everything. Thanks for your help, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] How files are split into PGs ?
Hello, I have a 1GB file and 2 pools, one replicated and one EC 8+2, and I want to make a copy of this file through the radosgw with S3. I'd like to know how this file will be split into PGs in both pools. Some details for my use case : 12 hosts 10 OSDs per host failure domain set to host PG=1024 If I push this file through my radosgw, how can I find all the replicas on the OSDs ? And another question: on an EC pool, will really small files still be stored as k+m chunks ? Thanks -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
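For the "where are the replicas" part of the question, the mapping can be inspected object by object. This is a sketch: the pool name is the Jewel-era default RGW data pool and `myfile` is a placeholder; radosgw stripes a large S3 object into multiple RADOS objects (typically 4 MB each, per `rgw_obj_stripe_size`), and each RADOS object maps to one PG.

```shell
# List the RADOS objects backing an S3 object and show where each lands
pool=default.rgw.buckets.data
rados -p "$pool" ls | grep 'myfile' | while read -r obj; do
    ceph osd map "$pool" "$obj"   # prints the PG id and the acting OSD set
done
```

On a replicated pool the acting set has `size` OSDs; on the EC 8+2 pool each PG has 10 shards, and even an object smaller than one stripe still gets k data chunks plus m coding chunks (the data chunks are just mostly padding).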
Re: [ceph-users] radosgw - http status 400 while creating a bucket
Hello, > many thanks for your help. I've tried setting the zone to master, followed by > the period update --commit command. This is what i've had: maybe it's related to this issue : http://tracker.ceph.com/issues/16839 (fixe in Jewel 10.2.3) or this one : http://tracker.ceph.com/issues/17239 the "id" of the zonegroup shouldn't be "default" but an uuid afaik Best regards Yoann Moulin > root@arh-ibstorage1-ib:~# radosgw-admin zonegroup get --rgw-zonegroup=default > { > "id": "default", > "name": "default", > "api_name": "", > "is_master": "true", > "endpoints": [], > "hostnames": [], > "hostnames_s3website": [], > "master_zone": "default", > "zones": [ > { > "id": "default", > "name": "default", > "endpoints": [], > "log_meta": "false", > "log_data": "false", > "bucket_index_max_shards": 0, > "read_only": "false" > } > ], > "placement_targets": [ > { > "name": "default-placement", > "tags": [] > } > ], > "default_placement": "default-placement", > "realm_id": "5b41b1b2-0f92-463d-b582-07552f83e66c" > } > > > root@arh-ibstorage1-ib:~# radosgw-admin period update --commit > cannot commit period: period does not have a master zone of a master zonegroup > failed to commit period: (22) Invalid argument > > > root@arh-ibstorage1-ib:~# radosgw-admin zonegroup get --rgw-zonegroup=default > { > "id": "default", > "name": "default", > "api_name": "", > "is_master": "true", > "endpoints": [], > "hostnames": [], > "hostnames_s3website": [], > "master_zone": "", > "zones": [ > { > "id": "default", > "name": "default", > "endpoints": [], > "log_meta": "false", > "log_data": "false", > "bucket_index_max_shards": 0, > "read_only": "false" > } > ], > "placement_targets": [ > { > "name": "default-placement", > "tags": [] > } > ], > "default_placement": "default-placement", > "realm_id": "" > } > > > > > > The strange thing as you can see, following the "radosgw-admin period update > --commit" command, the master_zone and the realm_id values reset to blank. 
> What could be causing this? > > Here is my ceph infrastructure setup, perhaps it will help with finding the > issue?: > > ceph osd and mon servers: > arh-ibstorage1-ib (also radosgw server) > arh-ibstorage2-ib (also radosgw server) > arh-ibstorage3-ib > > ceph mon server: > arh-cloud13-ib > > > > Thus, overall, i have 4 mon servers, 3 osd servers and 2 radosgw servers > > Thanks > > > > - Original Message - >> From: "Yehuda Sadeh-Weinraub" <yeh...@redhat.com> >> To: "Andrei Mikhailovsky" <and...@arhont.com> >> Cc: "ceph-users" <ceph-users@lists.ceph.com> >> Sent: Wednesday, 9 November, 2016 17:12:30 >> Subject: Re: [ceph-users] radosgw - http status 400 while creating a bucket > >> On Wed, Nov 9, 2016 at 1:30 AM, Andrei Mikhailovsky <and...@arhont.com> >> wrote: >>> Hi Yehuda, >>> >>> just tried to run the command to set the master_zone to default followed by >>> the >>> bucket create without doing the restart and I still have the same error on >>> the >>> client: >>> >>> >> encoding="UTF-8"?>InvalidArgumentmy-new-bucket-31337tx00010-005822ebbd-9951ad8-default9951ad8-default-default >>> >> >> After setting the master zone, try running: >> >> $ radosgw-admin period update --commit >> >> Yehuda >> >>> >>> Andrei >>> >>> - Original Mess
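For what it's worth, one sequence that has been reported to repair an empty realm/master zone on a single-site Jewel gateway is the following. This is a sketch only: the realm/zonegroup/zone names are assumed to all be "default", and committing the period rewrites cluster metadata, so dump the current zone and zonegroup JSON first.

```shell
radosgw-admin realm create --rgw-realm=default --default
radosgw-admin zonegroup modify --rgw-zonegroup=default --rgw-realm=default --master --default
radosgw-admin zone modify --rgw-zone=default --rgw-zonegroup=default --master --default
radosgw-admin period update --commit
```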
Re: [ceph-users] rgw / s3website, MethodNotAllowed on Jewel 10.2.3
Hello, > I'm trying to get s3website working on one of our Rados Gateway > installations, and I'm having some problems finding out what needs to be > done for this to work. It looks like this is a halfway secret feature, as I > can only find it briefly mentioned in the release notes for v10.0.4 - and > nowhere in the documentation - so I've tried to wrap my head around this by > looking through the source code without much luck. > > My cluster is running Jewel 10.2.3, and I've tried to enable the s3website > API specifically on the RGW-server. (But looking at the source > code, it should be enabled by default) > > Using s3cmd --debug ws-create s3://acme.example.org, I get served with 405 > Method Not Allowed > > DEBUG: Sending request method_string='PUT', uri='/?website', > headers={'x-amz-content-sha256': > '3fcf37205b114f03a910d11d74206358f1681381f0f9498b25aa1cc65e168937', > 'Authorization': 'AWS4-HMAC-SHA256 > Credential=V4NZ37SLP3VOPR2BI5UW/20161026/US/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=4cbd6a7c26dc149fc8fb352dae2d42c27e9bdc254cecc467802941cfc0e200a2', > 'x-amz-date': '20161026T094022Z'}, body=(159 bytes) > DEBUG: Response: {'status': 405, 'headers': {'content-length': '195', > 'accept-ranges': 'bytes', 'server': 'Apache/2.4.6 (CentOS) > OpenSSL/1.0.1e-fips', 'connection': 'close', 'x-amz-request-id': > 'tx3-0058107a06-20d3274-default', 'date': 'Wed, 26 Oct > 2016 09:40:22 GMT', 'content-type': 'application/xml'}, 'reason': 'Method Not > Allowed', 'data': ' encoding="UTF-8"?>MethodNotAllowedtx3-0058107a06-20d3274-default20d3274-default-default'} > > Has anyone have had any luck with this? does apache send $host variable to the backend ? something like "ProxyPreserveHost On" Best regards, -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
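The Apache-side setting suggested above would look roughly like this in the reverse-proxy vhost (a sketch; 127.0.0.1:7480 is civetweb's default listen address, adjust to your setup):

```apache
# Pass the client's Host header through to radosgw; without this,
# virtual-host-style bucket and s3website requests hit the wrong API.
ProxyPreserveHost On
ProxyPass        "/" "http://127.0.0.1:7480/"
ProxyPassReverse "/" "http://127.0.0.1:7480/"
```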
Re: [ceph-users] HELP ! Cluster unusable with lots of "hitsuicidetimeout"
Hello,

>>> We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is
>>> composed of 12 nodes; each node has 10 OSDs with journal on disk.
>>>
>>> We have one rbd partition and a radosGW with 2 data pools, one replicated,
>>> one EC (8+2).
>>>
>>> In attachment, a few details on our cluster.
>>>
>>> Currently, our cluster is not usable at all due to too much OSD
>>> instability. OSD daemons die randomly with "hit suicide timeout". Yesterday, all
>>> of the 120 OSDs died at least 12 times (max 74 times), with an average around 40 times.
>>>
>>> Here are logs from the ceph mon and from one OSD:
>>>
>>> http://icwww.epfl.ch/~ymoulin/ceph/cephprod.log.bz2 (6MB)
>>> http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.10.log.bz2 (6MB)
>>>
>>> We have stopped all client I/O to see if the cluster gets stable, without
>>> success; to avoid endless rebalancing with OSDs flapping, we had to
>>> "set noout" the cluster. For now we have no idea what's going on.
>>>
>>> Can anyone help us understand what's happening?
>>>
>>> Thanks for your help
>>
>> No specific ideas, but this somewhat sounds familiar.
>>
>> One thing first: you already stopped client traffic, but to make sure your
>> cluster really becomes quiescent, stop all scrubs as well.
>> That's always a good idea in any recovery or overload situation.

This is what we did.

>> Have you verified CPU load (are those OSD processes busy), memory status, etc?
>> How busy are the actual disks?

The CPU and memory seem not to be overloaded; with journal on disk, the disks are maybe a little bit busy.

>> Sudden deaths like this often are the result of network changes, like a
>> switch rebooting and losing jumbo frame configuration or whatnot.

We manage all the equipment of the cluster; none of it has rebooted. We decided to reboot node by node yesterday, but the switch is healthy. In the logs I found that the problem started after I began to copy data to the RadosGW EC pool (8+2).
At the same time, we had 6 processes reading on the rbd partition; three of those processes were writing to a replicated pool through the RadosGW S3 of the cluster itself, one was writing to an EC pool through the RadosGW S3 as well, and the 2 others were not writing to the cluster. Maybe that pressure slowed the disks down enough to trigger the suicide timeout of the OSDs? But now we have no more I/O on the cluster, and as soon as I re-enable scrub and rebalancing, OSDs start to fail again...

> just an additional comment:
>
> you can disable backfilling and recovery temporarily by setting the
> 'nobackfill' and 'norecover' flags. It will reduce the backfilling traffic
> and may help the cluster and its OSDs to recover. Afterwards you should set
> the backfill traffic settings to the minimum (e.g. max_backfills = 1)
> and unset the flags to allow the cluster to perform the outstanding recovery
> operations.
>
> As the others already pointed out, these actions might help to get the
> cluster up and running again, but you need to find the actual reason for
> the problems.

This is exactly what I want. Thanks for the help!

--
Yoann Moulin
EPFL IC-IT
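The flag workflow discussed in this thread can be sketched as the following command sequence. This is a sketch of the standard ceph CLI flags, not a prescription for this particular cluster:

```shell
# Quiesce the cluster: stop scrubbing, backfill and recovery
ceph osd set noout
ceph osd set noscrub
ceph osd set nodeep-scrub
ceph osd set nobackfill
ceph osd set norecover

# Once the OSDs are stable, throttle backfill/recovery to the minimum...
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# ...then unset the flags to let the outstanding recovery proceed
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset nodeep-scrub
ceph osd unset noscrub
ceph osd unset noout
```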
Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"
Hello,

>> We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed
>> of 12 nodes; each node has 10 OSDs with journal on disk.
>>
>> We have one rbd partition and a radosGW with 2 data pools, one replicated,
>> one EC (8+2).
>>
>> In attachment, a few details on our cluster.
>>
>> Currently, our cluster is not usable at all due to too much OSD instability.
>> OSD daemons die randomly with "hit suicide timeout". Yesterday, all
>> of the 120 OSDs died at least 12 times (max 74 times), with an average around 40 times.
>>
>> Here are logs from the ceph mon and from one OSD:
>>
>> http://icwww.epfl.ch/~ymoulin/ceph/cephprod.log.bz2 (6MB)
>
> Do you have an older log showing the start of the incident? The
> cluster was already down when this log started.

Here are the logs from Saturday; OSD 134 is the first one that hit the error:

http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.134.log.4.bz2
http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.10.log.4.bz2
http://icwww.epfl.ch/~ymoulin/ceph/cephprod.log.4.bz2

>> http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.10.log.bz2 (6MB)
>
> In this log the thread which is hanging is doing deep-scrub:
>
> 2016-10-18 22:16:23.985462 7f12da4af700 0 log_channel(cluster) log [INF] : 39.54 deep-scrub starts
> 2016-10-18 22:16:39.008961 7f12e4cc4700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f12da4af700' had timed out after 15
> 2016-10-18 22:18:54.175912 7f12e34c1700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f12da4af700' had suicide timed out after 150
>
> So you can disable scrubbing completely with
>
> ceph osd set noscrub
> ceph osd set nodeep-scrub
>
> in case you are hitting some corner case with the scrubbing code.

Now the cluster seems to be healthy.
But as soon as I re-enable scrubbing and rebalancing, OSDs start to flap and the cluster switches to HEALTH_ERR:

 cluster f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
  health HEALTH_WARN
         noout,noscrub,nodeep-scrub,sortbitwise flag(s) set
  monmap e1: 3 mons at {iccluster002.iccluster.epfl.ch=10.90.37.3:6789/0,iccluster010.iccluster.epfl.ch=10.90.37.11:6789/0,iccluster018.iccluster.epfl.ch=10.90.37.19:6789/0}
         election epoch 64, quorum 0,1,2 iccluster002.iccluster.epfl.ch,iccluster010.iccluster.epfl.ch,iccluster018.iccluster.epfl.ch
  fsmap e131: 1/1/1 up {0=iccluster022.iccluster.epfl.ch=up:active}, 2 up:standby
  osdmap e72932: 144 osds: 144 up, 120 in
         flags noout,noscrub,nodeep-scrub,sortbitwise
  pgmap v4834810: 9408 pgs, 28 pools, 153 TB data, 75849 kobjects
        449 TB used, 203 TB / 653 TB avail
        9408 active+clean

>> We have stopped all client I/O to see if the cluster gets stable, without
>> success; to avoid endless rebalancing with OSDs flapping, we had to
>> "set noout" the cluster. For now we have no idea what's going on.
>>
>> Can anyone help us understand what's happening?
>
> Is your network OK?

We have one 10G NIC for the private network and one 10G NIC for the public network. The network is far from loaded right now and there are no errors. We don't use jumbo frames.

> It will be useful to see the start of the incident to better
> understand what caused this situation.
>
> Also, maybe useful for you... you can increase the suicide timeout, e.g.:
>
> osd op thread suicide timeout:
>
> If the cluster is just *slow* somehow, then increasing that might
> help. If there is something systematically broken, increasing would
> just postpone the inevitable.

OK, I'm going to study this option with my colleagues. Thanks

--
Yoann Moulin
EPFL IC-IT
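As a sketch, raising the suicide timeout is a ceph.conf override on the OSD hosts. The value 300 below is an assumption for illustration only; the Jewel default is 150 seconds, as the "had suicide timed out after 150" log excerpt above shows:

```ini
; /etc/ceph/ceph.conf on the OSD hosts (restart OSDs to apply)
[osd]
; threshold after which a hung op thread makes the OSD abort
; ("hit suicide timeout") -- default 150 s in Jewel
osd op thread suicide timeout = 300
```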
[ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"
Dear List,

We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed of 12 nodes; each node has 10 OSDs with journal on disk.

We have one rbd partition and a radosGW with 2 data pools, one replicated, one EC (8+2).

In attachment, a few details on our cluster.

Currently, our cluster is not usable at all due to too much OSD instability. OSD daemons die randomly with "hit suicide timeout". Yesterday, all of the 120 OSDs died at least 12 times (max 74 times), with an average around 40 times.

Here are logs from the ceph mon and from one OSD:

http://icwww.epfl.ch/~ymoulin/ceph/cephprod.log.bz2 (6MB)
http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.10.log.bz2 (6MB)

We have stopped all client I/O to see if the cluster gets stable, without success; to avoid endless rebalancing with OSDs flapping, we had to "set noout" the cluster. For now we have no idea what's going on.

Can anyone help us understand what's happening?

Thanks for your help

--
Yoann Moulin
EPFL IC-IT

$ ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

$ uname -a
Linux icadmin004 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 4927 flags hashpspool stripe_width 0 removed_snaps [1~3]
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 258 flags hashpspool stripe_width 0
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 259 flags hashpspool stripe_width 0
pool 5 'default.rgw.data.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 260 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 6 'default.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32
last_change 261 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 262 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 8 'erasure.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 271 flags hashpspool stripe_width 0 pool 9 'erasure.rgw.buckets.extra' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 272 flags hashpspool stripe_width 0 pool 11 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 276 flags hashpspool stripe_width 0 pool 12 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 277 flags hashpspool stripe_width 0 pool 14 'default.rgw.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 311 flags hashpspool stripe_width 0 pool 15 'default.rgw.users.keys' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 313 flags hashpspool stripe_width 0 pool 16 'default.rgw.meta' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 315 flags hashpspool stripe_width 0 pool 17 'default.rgw.users.swift' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 320 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 18 'default.rgw.users.email' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 322 owner 18446744073709551615 flags hashpspool stripe_width 0 pool 19 'default.rgw.usage' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 353 flags hashpspool stripe_width 0 pool 20 
'default.rgw.buckets.data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 4918 flags hashpspool stripe_width 0 pool 26 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3549 flags hashpspool stripe_width 0 pool 27 '.rgw' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3551 flags hashpspool stripe_width 0 pool 28 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3552 flags hashpspool stripe_width 0 pool 29 '.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3553 flags hashpspool stripe_width 0 pool 30 'test' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 4910 flags hashpspool stripe_width 0 pool 31 'data' replicated size 3 min_si
[ceph-users] Loop in radosgw-admin orphan find
1 entries at orphan.scan.erasure.linked.19 > storing 1 entries at orphan.scan.erasure.linked.25 > storing 1 entries at orphan.scan.erasure.linked.40 > storing 1 entries at orphan.scan.erasure.linked.43 > storing 1 entries at orphan.scan.erasure.linked.47 > storing 1 entries at orphan.scan.erasure.linked.56 > storing 1 entries at orphan.scan.erasure.linked.63 > storing 1 entries at orphan.scan.erasure.linked.9 > storing 1 entries at orphan.scan.erasure.linked.25 > storing 1 entries at orphan.scan.erasure.linked.40 > storing 1 entries at orphan.scan.erasure.linked.56 > storing 1 entries at orphan.scan.erasure.linked.2 > storing 1 entries at orphan.scan.erasure.linked.5 > storing 1 entries at orphan.scan.erasure.linked.9 > storing 1 entries at orphan.scan.erasure.linked.19 > storing 1 entries at orphan.scan.erasure.linked.25 > storing 1 entries at orphan.scan.erasure.linked.40 > storing 1 entries at orphan.scan.erasure.linked.43 > storing 1 entries at orphan.scan.erasure.linked.47 > storing 1 entries at orphan.scan.erasure.linked.56 > storing 1 entries at orphan.scan.erasure.linked.63 > storing 1 entries at orphan.scan.erasure.linked.2 > storing 1 entries at orphan.scan.erasure.linked.5 > storing 1 entries at orphan.scan.erasure.linked.9 > storing 1 entries at orphan.scan.erasure.linked.19 > storing 1 entries at orphan.s can.erasure.linked.25 > storing 1 entries at orphan.scan.erasure.linked.40 > storing 1 entries at orphan.scan.erasure.linked.43 > storing 1 entries at orphan.scan.erasure.linked.47 > storing 1 entries at orphan.scan.erasure.linked.56 > storing 1 entries at orphan.scan.erasure.linked.63 > storing 1 entries at orphan.scan.erasure.linked.9 > storing 1 entries at orphan.scan.erasure.linked.25 > storing 1 entries at orphan.scan.erasure.linked.40 > storing 1 entries at orphan.scan.erasure.linked.56 > storing 1 entries at orphan.scan.erasure.linked.2 > storing 1 entries at orphan.scan.erasure.linked.5 > storing 1 entries at orphan.scan.erasure.linked.9 > 
storing 1 entries at orphan.scan.erasure.linked.19 > storing 1 entries at orphan.scan.erasure.linked.25 > storing 1 entries at orphan.scan.erasure.linked.40 > storing 1 entries at orphan.scan.erasure.linked.43 > storing 1 entries at orphan.scan.erasure.linked.47 > storing 1 entries at orphan.scan.erasure.linked.56 > storing 1 entries at orphan.scan.erasure.linked.63 > storing 1 entries at orphan.scan.erasure.linked.2 > storing 1 entries at orphan.scan.erasure.linked.5 > storing 1 entries at orphan.scan.erasure.linked.9 > storing 1 entries at orphan.scan.erasure.linked.19 > storing 1 entries at orphan.scan.erasure.linked.25 > storing 1 entries at orphan.scan.erasure.linked.40 > storing 1 entries at orphan.scan.erasure.linked.43 > storing 1 entries at orphan.scan.erasure.linked.47 > storing 1 entries at orphan.scan.erasure.linked.56 > storing 1 entries at orphan.scan.erasure.linked.63 > storing 1 entries at orphan.scan.erasure.linked.9 > storing 1 entries at orphan.scan.erasure.linked.25 > storing 1 entries at orphan.scan.erasure.linked.40 > storing 1 entries at orphan.scan.erasure.linked.56 Thanks for your help -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph full cluster
Hello, > Yes, you are right! > I've changed this for all pools, but not for last two! > > pool 1 '.rgw.root' replicated size 2 min_size 2 crush_ruleset 0 object_hash > rjenkins pg_num 8 pgp_num 8 last_change 27 owner > 18446744073709551615 flags hashpspool strip > e_width 0 > pool 2 'default.rgw.control' replicated size 2 min_size 2 crush_ruleset 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 29 owner > 18446744073709551615 flags hashps > pool stripe_width 0 > pool 3 'default.rgw.data.root' replicated size 2 min_size 2 crush_ruleset 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 31 owner > 18446744073709551615 flags hash > pspool stripe_width 0 > pool 4 'default.rgw.gc' replicated size 2 min_size 2 crush_ruleset 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 33 owner > 18446744073709551615 flags hashpspool > stripe_width 0 > pool 5 'default.rgw.log' replicated size 2 min_size 2 crush_ruleset 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 35 owner > 18446744073709551615 flags hashpspool > stripe_width 0 > pool 6 'default.rgw.users.uid' replicated size 2 min_size 2 crush_ruleset 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 37 owner > 18446744073709551615 flags hash > pspool stripe_width 0 > pool 7 'default.rgw.users.keys' replicated size 2 min_size 2 crush_ruleset 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 39 owner > 18446744073709551615 flags has > hpspool stripe_width 0 > pool 8 'default.rgw.meta' replicated size 2 min_size 2 crush_ruleset 0 > object_hash rjenkins pg_num 8 pgp_num 8 last_change 41 owner > 18446744073709551615 flags hashpspoo > l stripe_width 0 > pool 9 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset > 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 43 flags > hashpspool stripe_width 0 > pool 10 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_ruleset > 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 45 flags > hashpspool stripe_width 0 
Be careful: if you set size 2 and min_size 2, your cluster will be in HEALTH_ERR state if you lose a single OSD. If you want to set "size 2" (which is not recommended), you should set min_size to 1.

Best Regards.

Yoann Moulin

> On Mon, Sep 26, 2016 at 2:05 PM, Burkhard Linke
> <burkhard.li...@computational.bio.uni-giessen.de
> <mailto:burkhard.li...@computational.bio.uni-giessen.de>> wrote:
>
> Hi,
>
> On 09/26/2016 12:58 PM, Dmitriy Lock wrote:
>> Hello all!
>> I need some help with my Ceph cluster.
>> I've installed a ceph cluster with two physical servers with osd /data 40G on each.
>> Here is ceph.conf:
>> [global]
>> fsid = 377174ff-f11f-48ec-ad8b-ff450d43391c
>> mon_initial_members = vm35, vm36
>> mon_host = 192.168.1.35,192.168.1.36
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>>
>> osd pool default size = 2 # Write an object 2 times.
>> osd pool default min size = 1 # Allow writing one copy in a degraded state.
>>
>> osd pool default pg num = 200
>> osd pool default pgp num = 200
>>
>> Right after creation it was HEALTH_OK, and I started filling it.
>> I wrote 40G of data to the cluster using the Rados gateway, but the cluster
>> used all available space and kept growing after I added two other
>> osds - 10G /data1 on each server.
>> Here is the tree output:
>> # ceph osd tree
>> ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 0.09756 root default
>> -2 0.04878 host vm35
>> 0 0.03899 osd.0 up 1.0 1.0
>> 2 0.00980 osd.2 up 1.0 1.0
>> -3 0.04878 host vm36
>> 1 0.03899 osd.1 up 1.0 1.0
>> 3 0.00980 osd.3 up 1.0 1.0
>>
>> and health:
>> root@vm35:/etc# ceph health
>> HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck unclean; 15 pgs undersized; recovery 87176/300483 objects degraded (29.012%); recovery 62272/300483 objects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool default.rgw.buckets.data has many more objects per pg than average (too few pgs?)
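As a sketch, changing replication per pool is done with `ceph osd pool set`; the pool name below is just an example taken from the listing in this thread:

```shell
# Keep 3 copies of each object, and keep serving I/O as long as 2 remain
ceph osd pool set default.rgw.buckets.data size 3
ceph osd pool set default.rgw.buckets.data min_size 2

# Verify the resulting pool settings
ceph osd pool ls detail
```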
>> root@vm35:/etc# ceph health detail >> HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck >> unclean; 15 pgs undersiz
Re: [ceph-users] RadosGW index-sharding on Jewel
Hello,

> I currently set up my new test cluster (Jewel) and found out the index
> sharding configuration has changed?
>
> I did so far:
> 1. radosgw-admin realm create --rgw-realm=default --default
> 2. radosgw-admin zonegroup get --rgw-zonegroup=default > zonegroup.json
> 3. changed value "bucket_index_max_shards": 64
> 4. radosgw-admin zonegroup set --rgw-zonegroup=default < zonegroup.json
> 5. radosgw-admin region get --rgw-zonegroup=default > region.json
> 6. changed value "bucket_index_max_shards": 64
> 7. radosgw-admin region set --rgw-region=default --rgw-zone=default --rgw-zonegroup=default < region.json

As far as I know, region and zonegroup are the same in Jewel:

http://docs.ceph.com/docs/jewel/radosgw/multisite/

« Zonegroup: A zonegroup consists of multiple zones, this approximately corresponds to what used to be called as a region in pre Jewel releases for federated deployments. There should be a master zonegroup that will handle changes to the system configuration. »

> but buckets are created without sharding:
> rados -p default.rgw.buckets.index ls | grep $(radosgw-admin metadata get bucket:images-eu-v1 | jq .data.bucket.bucket_id | tr -d '"')

On that point I don't know, I never configured index sharding.

Best Regards,

--
Yoann Moulin
EPFL IC-IT
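For new buckets, Jewel's RGW also honors a ceph.conf override. A minimal sketch, where the value 64 mirrors the thread and the client section name is an assumption that must match your actual rgw instance name:

```ini
; /etc/ceph/ceph.conf on the RGW hosts (restart radosgw to apply)
[client.rgw.gateway1]
; shard count applied to the index of newly created buckets
rgw override bucket index max shards = 64
```

Note this only affects buckets created after the change; existing bucket indexes are not resharded.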
[ceph-users] RadosGW : troubleshoooting zone / zonegroup / period
max_objects": -1
> },
> "num_shards": 0,
> "bi_shard_hash_type": 0,
> "requester_pays": "false",
> "has_website": "false",
> "swift_versioning": "false",
> "swift_ver_location": ""
> },
> "attrs": [
> {
> "key": "user.rgw.acl",
> "val": "AgKRAwIaCQAAAHJlcGxpY2F0ZQkAAAByZXBsaWNhdGUDA2sBAQkAAAByZXBsaWNhdGUPAQkAAAByZXBsaWNhdGUEAzoCAgQACQAAAHJlcGxpY2F0ZQAAAgIEDwkAAAByZXBsaWNhdGUAAA=="
> },
> {
> "key": "user.rgw.idtag",
> "val": ""
> },
> {
> "key": "user.rgw.manifest",
> "val": ""
> }
> ]
> }
> }

We tried to fix this by creating a new zonegroup and zone with the good IDs, setting them as default and deleting the other ones, but we fell back into the bug on period update.

3. Troubleshooting #2

Restart the process from scratch: we stopped all the radosgw daemons, deleted the .rgw.root pool, started the radosgw, and created the realm again.

Then we decided to try to create the zonegroup and the zone from the JSON we had saved, with the good IDs set. We have to be careful to change the realm id in the 2 JSON files to the new one; if not, it won't work. After editing the 2 files again (default_zonegroup.json and default_zone.json), we can create the zonegroup and zone like that:

> radosgw-admin zonegroup set --rgw-zonegroup default < default_zonegroup.json
> radosgw-admin zone set --rgw-zonegroup default --rgw-zone default < default_zone.json

At this point, the new zonegroup and zone were successfully created, but their IDs were not those in the JSON: during the set, radosgw-admin created new IDs for both the zonegroup and the zone. In this situation we are still not able to access the data. We have to start again from scratch...

4. Troubleshooting #3

We decided to restart the process but leave the radosgw stopped; we had the intuition that it may affect the behaviour by creating a default zone and zonegroup itself. Finally we did that:

Stop all RadosGW!
Purge the .rgw.root pool:

> rados purge .rgw.root --yes-i-really-really-mean-it

Create a new realm and set it as default:

> radosgw-admin realm create --rgw-realm=default --default

Edit the 2 JSON files to change the realm id to the new one:

> vim default_zone.json # change realm_id to the new one
> vim default_zonegroup.json # change realm_id to the new one

Create the zonegroup and the zone like that (the order is really important here!):

> radosgw-admin zonegroup set --rgw-zonegroup default < default_zonegroup.json
> radosgw-admin zone set --rgw-zonegroup default --rgw-zone default < default_zone.json

Set the zonegroup and zone as default:

> radosgw-admin zonegroup default --rgw-zonegroup default
> radosgw-admin zone default --rgw-zone default

We can check that the zone and the zonegroup are good by doing this:

> radosgw-admin zonegroup list
> radosgw-admin zonegroup get
> radosgw-admin zone list
> radosgw-admin zone get

We have to update the period (do not commit at first; check that the data in the update looks good):

> radosgw-admin period update

Then we can commit the period update to apply the configuration:

> radosgw-admin period update --commit

We can now safely restart the radosgw!

--
Yoann Moulin
EPFL IC-IT

default_zone.json Description: application/json
default_zonegroup.json Description: application/json
Re: [ceph-users] RadosGW Error : Error updating periodmap, multiple master zonegroups configured
On 06/09/2016 11:13, Orit Wasserman wrote:
> you can try:
> radosgw-admin zonegroup modify --zonegroup-id --master=false

I tried, but I don't have any zonegroup with this ID listed; the zonegroup with this ID appears only in the zonegroup-map. Anyway, I can do a zonegroup get --zonegroup-id 4d982760-7853-4174-8c05-cec2ef148cf0.

Should I try to change the name of this zonegroup? Because I have 2 zonegroups with the same name but with 2 different IDs.

$ radosgw-admin zonegroup get --zonegroup-id 4d982760-7853-4174-8c05-cec2ef148cf0
{
    "id": "4d982760-7853-4174-8c05-cec2ef148cf0",
    "name": "default",
    "api_name": "",
    "is_master": "false",
    "endpoints": [],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e",
    "zones": [
        {
            "id": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e",
            "name": "default",
            "endpoints": [],
            "log_meta": "false",
            "log_data": "false",
            "bucket_index_max_shards": 0,
            "read_only": "false"
        }
    ],
    "placement_targets": [
        {
            "name": "custom-placement",
            "tags": []
        },
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}

and the default:

$ radosgw-admin zonegroup get --zonegroup-id default
{
    "id": "default",
    "name": "default",
    "api_name": "",
    "is_master": "true",
    "endpoints": [],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "",
    "zones": [
        {
            "id": "default",
            "name": "default",
            "endpoints": [],
            "log_meta": "false",
            "log_data": "false",
            "bucket_index_max_shards": 0,
            "read_only": "false"
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}

$ radosgw-admin bucket list
2016-09-06 11:21:04.787391 7fb8a1f0b900 0 Error updating periodmap, multiple master zonegroups configured
2016-09-06 11:21:04.787407 7fb8a1f0b900 0 master zonegroup: 4d982760-7853-4174-8c05-cec2ef148cf0 and default
2016-09-06 11:21:04.787409 7fb8a1f0b900 0 ERROR:
updating period map: (22) Invalid argument 2016-09-06 11:21:04.787424 7fb8a1f0b900 0 failed to add zonegroup to current_period: (22) Invalid argument 2016-09-06 11:21:04.787432 7fb8a1f0b900 -1 failed converting region to zonegroup : ret -22 (22) Invalid argument couldn't init storage provider > On Tue, Sep 6, 2016 at 11:08 AM, Yoann Moulin <yoann.mou...@epfl.ch> wrote: >> Hello Orit, >> >>> you have two (or more) zonegroups that are set as master. >> >> Yes I know, but I don't know how to fix this >> >>> First detect which zonegroup are the problematic >>> get zonegroup list by running: radosgw-admin zonegroup list >> >> I only see one zonegroup : >> >> $ radosgw-admin zonegroup list >> read_default_id : 0 >> { >> "default_info": "default", >> "zonegroups": [ >> "default" >> ] >> } >> >>> than on each zonegroup run: >>> radosgw-admin zonegroup get --rgw-zonegroup >>> see in which is_master is true. >> >> $ radosgw-admin zonegroup get --rgw-zonegroup default >> { >> "id": "default", >> "name": "default", >> "api_name": "", >> "is_master": "true", >> "endpoints": [], >> "hostnames": [], >> "hostnames_s3website": [], >> "master_zone": "", >> "zones": [ >> { >> "id": "default", >> "name": "default", >> "endpoints": [],
Re: [ceph-users] RadosGW Error : Error updating periodmap, multiple master zonegroups configured
Hello Orit,

> you have two (or more) zonegroups that are set as master.

Yes, I know, but I don't know how to fix this.

> First detect which zonegroups are the problematic ones;
> get the zonegroup list by running: radosgw-admin zonegroup list

I only see one zonegroup:

$ radosgw-admin zonegroup list
read_default_id : 0
{
    "default_info": "default",
    "zonegroups": [
        "default"
    ]
}

> then on each zonegroup run:
> radosgw-admin zonegroup get --rgw-zonegroup
> see in which is_master is true.

$ radosgw-admin zonegroup get --rgw-zonegroup default
{
    "id": "default",
    "name": "default",
    "api_name": "",
    "is_master": "true",
    "endpoints": [],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "",
    "zones": [
        {
            "id": "default",
            "name": "default",
            "endpoints": [],
            "log_meta": "false",
            "log_data": "false",
            "bucket_index_max_shards": 0,
            "read_only": "false"
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}

> Now you need to clear the master flag for all zonegroups except one;
> this can be done by running:
> radosgw-admin zonegroup modify --rgw-zonegroup --master=false

If you check the files in my previous mail, in metadata_zonegroup-map.json and metadata_zonegroup.json there is only one zonegroup with the name "default", but in metadata_zonegroup.json the id is "default" and in metadata_zonegroup-map.json it is "4d982760-7853-4174-8c05-cec2ef148cf0". So for the zonegroup with the name "default", I have 2 different IDs; I guess the problem is there.

Thanks for your help

Best regards

Yoann Moulin

> On Tue, Sep 6, 2016 at 9:22 AM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>> Dear List,
>>
>> I have an issue with my radosGW.
>> >> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374) >> Linux cluster002 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 >> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux >> Ubuntu 16.04 LTS >> >>> $ ceph -s >>> cluster f9dfd27f-c704-4d53-9aa0-4a23d655c7c4 >>> health HEALTH_OK >>> monmap e1: 3 mons at >>> {cluster002.localdomain=10.90.37.3:6789/0,cluster010.localdomain=10.90.37.11:6789/0,cluster018.localdomain=10.90.37.19:6789/0} >>> election epoch 40, quorum 0,1,2 >>> cluster002.localdomain,cluster010.localdomain,cluster018.localdomain >>> fsmap e47: 1/1/1 up {0=cluster006.localdomain=up:active}, 2 up:standby >>> osdmap e3784: 144 osds: 144 up, 120 in >>> flags sortbitwise >>> pgmap v1146863: 7024 pgs, 26 pools, 71470 GB data, 41466 kobjects >>> 209 TB used, 443 TB / 653 TB avail >>> 7013 active+clean >>>7 active+clean+scrubbing+deep >>>4 active+clean+scrubbing >> >> Example of the error message I have : >> >>> $ radosgw-admin bucket list >>> 2016-09-06 09:04:14.810198 7fcbb01d5900 0 Error updating periodmap, >>> multiple master zonegroups configured >>> 2016-09-06 09:04:14.810213 7fcbb01d5900 0 master zonegroup: >>> 4d982760-7853-4174-8c05-cec2ef148cf0 and default >>> 2016-09-06 09:04:14.810215 7fcbb01d5900 0 ERROR: updating period map: (22) >>> Invalid argument >>> 2016-09-06 09:04:14.810230 7fcbb01d5900 0 failed to add zonegroup to >>> current_period: (22) Invalid argument >>> 2016-09-06 09:04:14.810238 7fcbb01d5900 -1 failed converting region to >>> zonegroup : ret -22 (22) Invalid argument >> >> in attachment, you have the result of those commands : >> >>> $ radosgw-admin metadata zonegroup-map get > metadata_zonegroup-map.json >>> $ radosgw-admin metadata zonegroup get > metadata_zonegroup.json >>> $ radosgw-admin metadata zone get > metadata_zone.json >>> $ radosgw-admin metadata region-map get > metadata_region-map.json >>> $ radosgw-admin metadata region get > metadata_region.json >>> $ radosgw-admin zonegroup-map get > 
zonegroup-map.json
>>> $ radosgw-admin zonegroup get > zonegroup.json
>>> $ radosgw-admin zone get > zone.json
>>> $ radosgw-admin region-map get > region-map.json
>>> $ radosgw-admin region get > region.json
>>> $ radosgw-admin period get > period.json
>>> $ radosgw-admin period list > period_list.json
>>
>> I have 60TB of data in this RadosGW; can I fix this issue without having to
>> re-upload all that data?
>>
>> Thanks for your help!
>>
>> Best regards
>>
>> --
>> Yoann Moulin
>> EPFL IC-IT

--
Yoann Moulin
EPFL IC-IT
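The fix Orit outlines in this thread boils down to demoting the stray master zonegroup and committing a new period. A sketch of the sequence; the zonegroup ID is the one reported in this thread's error messages, so verify it against your own `radosgw-admin zonegroup list` output before running anything:

```shell
# Demote the duplicate zonegroup so only one master remains
radosgw-admin zonegroup modify --rgw-zonegroup=default \
    --zonegroup-id=4d982760-7853-4174-8c05-cec2ef148cf0 --master=false

# Regenerate the period map, inspect it, then commit
radosgw-admin period update
radosgw-admin period update --commit
```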
[ceph-users] RadosGW Error : Error updating periodmap, multiple master zonegroups configured
Dear List,

I have an issue with my RadosGW.

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
Linux cluster002 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 16.04 LTS

> $ ceph -s
>     cluster f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
>      health HEALTH_OK
>      monmap e1: 3 mons at {cluster002.localdomain=10.90.37.3:6789/0,cluster010.localdomain=10.90.37.11:6789/0,cluster018.localdomain=10.90.37.19:6789/0}
>             election epoch 40, quorum 0,1,2 cluster002.localdomain,cluster010.localdomain,cluster018.localdomain
>       fsmap e47: 1/1/1 up {0=cluster006.localdomain=up:active}, 2 up:standby
>      osdmap e3784: 144 osds: 144 up, 120 in
>             flags sortbitwise
>       pgmap v1146863: 7024 pgs, 26 pools, 71470 GB data, 41466 kobjects
>             209 TB used, 443 TB / 653 TB avail
>                 7013 active+clean
>                    7 active+clean+scrubbing+deep
>                    4 active+clean+scrubbing

Example of the error message I get:

> $ radosgw-admin bucket list
> 2016-09-06 09:04:14.810198 7fcbb01d5900  0 Error updating periodmap, multiple master zonegroups configured
> 2016-09-06 09:04:14.810213 7fcbb01d5900  0 master zonegroup: 4d982760-7853-4174-8c05-cec2ef148cf0 and default
> 2016-09-06 09:04:14.810215 7fcbb01d5900  0 ERROR: updating period map: (22) Invalid argument
> 2016-09-06 09:04:14.810230 7fcbb01d5900  0 failed to add zonegroup to current_period: (22) Invalid argument
> 2016-09-06 09:04:14.810238 7fcbb01d5900 -1 failed converting region to zonegroup : ret -22 (22) Invalid argument

In attachment, you have the results of these commands:

> $ radosgw-admin metadata zonegroup-map get > metadata_zonegroup-map.json
> $ radosgw-admin metadata zonegroup get > metadata_zonegroup.json
> $ radosgw-admin metadata zone get > metadata_zone.json
> $ radosgw-admin metadata region-map get > metadata_region-map.json
> $ radosgw-admin metadata region get > metadata_region.json
> $ radosgw-admin zonegroup-map get > zonegroup-map.json
> $ radosgw-admin zonegroup get > zonegroup.json
> $ radosgw-admin zone get > zone.json
> $ radosgw-admin region-map get > region-map.json
> $ radosgw-admin region get > region.json
> $ radosgw-admin period get > period.json
> $ radosgw-admin period list > period_list.json

I have 60TB of data in this RadosGW; can I fix this issue without having to re-upload all that data?

Thanks for your help!

Best regards

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW zonegroup id error
Hello,

>>> I have an issue with the default zonegroup on my cluster (Jewel 10.2.2), I don't
>>> know when this occurred, but I think I ran a wrong command during the
>>> manipulation of zones and regions. Now the ID of my zonegroup is "default"
>>> instead of "4d982760-7853-4174-8c05-cec2ef148cf0", I cannot update zones or
>>> regions anymore.
>>>
>>> Is it possible to change the ID of the zonegroup? I tried to update the JSON
>>> then set the zonegroup, but it doesn't work (certainly because it's not the
>>> same ID...)
>>
>> If I create a new zonegroup, set it as the default zonegroup, update the
>> zonegroup-map, zone, etc., and then delete the zonegroup with the ID
>> "default", should that work?
>
> It should work. Do you have any existing data on the zone group?

There is only one zonegroup, so I guess yes.

Best regards,

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW zonegroup id error
Hello,

> I have an issue with the default zonegroup on my cluster (Jewel 10.2.2), I don't
> know when this occurred, but I think I ran a wrong command during the
> manipulation of zones and regions. Now the ID of my zonegroup is "default"
> instead of "4d982760-7853-4174-8c05-cec2ef148cf0", I cannot update zones or
> regions anymore.
>
> Is it possible to change the ID of the zonegroup? I tried to update the JSON
> then set the zonegroup, but it doesn't work (certainly because it's not the
> same ID...)

If I create a new zonegroup, set it as the default zonegroup, update the
zonegroup-map, zone, etc., and then delete the zonegroup with the ID "default",
should that work?

Best regards,

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RadosGW zonegroup id error
Hello,

I have an issue with the default zonegroup on my cluster (Jewel 10.2.2). I don't
know when this occurred, but I think I ran a wrong command during the
manipulation of zones and regions. Now the ID of my zonegroup is "default"
instead of "4d982760-7853-4174-8c05-cec2ef148cf0", and I cannot update zones or
regions anymore.

Is it possible to change the ID of the zonegroup? I tried to update the JSON
and then set the zonegroup, but it doesn't work (certainly because it's not the
same ID...).

See below the zonegroup and zonegroup-map metadata:

$ radosgw-admin zonegroup get
{
    "id": "default",
    "name": "default",
    "api_name": "",
    "is_master": "true",
    "endpoints": [],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "",
    "zones": [
        {
            "id": "default",
            "name": "default",
            "endpoints": [],
            "log_meta": "false",
            "log_data": "false",
            "bucket_index_max_shards": 0,
            "read_only": "false"
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}

$ radosgw-admin zonegroup-map get
{
    "zonegroups": [
        {
            "key": "4d982760-7853-4174-8c05-cec2ef148cf0",
            "val": {
                "id": "4d982760-7853-4174-8c05-cec2ef148cf0",
                "name": "default",
                "api_name": "",
                "is_master": "true",
                "endpoints": [],
                "hostnames": [],
                "hostnames_s3website": [],
                "master_zone": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e",
                "zones": [
                    {
                        "id": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e",
                        "name": "default",
                        "endpoints": [],
                        "log_meta": "false",
                        "log_data": "false",
                        "bucket_index_max_shards": 0,
                        "read_only": "false"
                    }
                ],
                "placement_targets": [
                    {
                        "name": "default-placement",
                        "tags": []
                    }
                ],
                "default_placement": "default-placement",
                "realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
            }
        }
    ],
    "master_zonegroup": "4d982760-7853-4174-8c05-cec2ef148cf0",
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    }
}

Thanks for your help,

Best regards

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
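The fix discussed in this thread ("update the json then set the zonegroup") can be sketched in a few lines. This is a hypothetical illustration, not a tested procedure: it only shows the JSON edit one would make before feeding the document back to `radosgw-admin zonegroup set`. The function name and the idea of also rewriting the zone ids are my assumptions; the UUIDs are the real ones from the `zonegroup-map` output above.

```python
import json

def retag_zonegroup(zg, new_zg_id, new_zone_id):
    """Return a copy of a zonegroup JSON document with its placeholder
    "default" ids replaced by real UUIDs, intended as input for a
    hypothetical `radosgw-admin zonegroup set < fixed.json` step."""
    old_id = zg["id"]
    fixed = json.loads(json.dumps(zg))  # deep copy via JSON round-trip
    fixed["id"] = new_zg_id
    fixed["master_zone"] = new_zone_id
    for zone in fixed.get("zones", []):
        if zone["id"] == old_id:  # the placeholder "default" zone id
            zone["id"] = new_zone_id
    return fixed

# Minimal excerpt of the broken `radosgw-admin zonegroup get` output
broken = {"id": "default", "master_zone": "",
          "zones": [{"id": "default", "name": "default"}]}
fixed = retag_zonegroup(broken,
                        "4d982760-7853-4174-8c05-cec2ef148cf0",
                        "c9724aff-5fa0-4dd9-b494-57bdb48fab4e")
```

Whether radosgw accepts a document whose id was edited this way is exactly the open question in this thread, so treat this purely as an illustration of the JSON edit.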
Re: [ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)
Hello Mark,

> FWIW, on CentOS7 I actually saw a performance increase when upgrading from the
> stock 3.10 kernel to 4.4.5 with Intel P3700 NVMe devices. I was encountering
> some kind of strange concurrency/locking issues at the driver level that 4.4.5
> resolved. I think your best bet is to try different intermediate kernels, track
> it down as much as you can and then look through the kernel changelog.

The point here is that I have only installed kernels from the
linux-image-virtual-lts package; for my future environment, I expect to stay on
the LTS kernel packages maintained by the security team.

Anyway, I'm still testing; I can try other kernels to find the one where the
regression starts.

> Sorry I can't be of more help!

No problem :)

--
Yoann

> On 07/25/2016 10:45 AM, Yoann Moulin wrote:
>> Hello,
>>
>> (this is a repost, my previous message seems to be slipping under the radar)
>>
>> Does anyone get a similar behaviour to the one described below ?
>>
>> I found a big performance drop between kernel 3.13.0-88 (default kernel on
>> Ubuntu Trusty 14.04) or kernel 4.2.0 and kernel 4.4.0.24.14 (default kernel on
>> Ubuntu Xenial 16.04)
>>
>> - ceph version is Jewel (10.2.2).
>> - All tests have been done under Ubuntu 14.04
>> - Each cluster has 5 nodes strictly identical.
>> - Each node has 10 OSDs.
>> - Journals are on the disk.
>> >> Kernel 4.4 has a drop of more than 50% compared to 4.2 >> Kernel 4.4 has a drop of 40% compared to 3.13 >> >> details below : >> >> With the 3 kernel I have the same performance on disks : >> >> Raw benchmark: >> dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct=> average >> ~230MB/s >> dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct => average >> ~220MB/s >> >> Filesystem mounted benchmark: >> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 => average >> ~205MB/s >> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average >> ~214MB/s >> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync => average >> ~190MB/s >> >> Ceph osd Benchmark: >> Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average ~81MB/s >> Kernel 4.2.0-38-generic : ceph tell osd.ID bench => average ~109MB/s >> Kernel 4.4.0-24-generic : ceph tell osd.ID bench => average ~50MB/s >> >> I did new benchmarks then on 3 new fresh clusters. >> >> - Each cluster has 3 nodes strictly identical. >> - Each node has 10 OSDs. >> - Journals are on the disk. >> >> bench5 : Ubuntu 14.04 / Ceph Infernalis >> bench6 : Ubuntu 14.04 / Ceph Jewel >> bench7 : Ubuntu 16.04 / Ceph jewel >> >> this is the average of 2 runs of "ceph tell osd.* bench" on each cluster (2 >> x 30 >> OSDs) >> >> bench5 / 14.04 / Infernalis / kernel 3.13 : 54.35 MB/s >> bench6 / 14.04 / Jewel / kernel 3.13 : 86.47 MB/s >> >> bench5 / 14.04 / Infernalis / kernel 4.2 : 63.38 MB/s >> bench6 / 14.04 / Jewel / kernel 4.2 : 107.75 MB/s >> bench7 / 16.04 / Jewel / kernel 4.2 : 101.54 MB/s >> >> bench5 / 14.04 / Infernalis / kernel 4.4 : 53.61 MB/s >> bench6 / 14.04 / Jewel / kernel 4.4 : 65.82 MB/s >> bench7 / 16.04 / Jewel / kernel 4.4 : 61.57 MB/s >> >> If needed, I have the raw output of "ceph tell osd.* bench" >> >> Best regards >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)
Hello,

> Am running ubuntu 16 with kernel 4.4-0.31-generic and my speed are similar.

Do you have the journal on disk too?

> I did tests on ubuntu 14 and Ubuntu 16 and the speed is similar. I have around
> 80-90MB/s of OSD speeds in both operating systems

OK, and could you run the bench with kernel 4.2, just to see if you get better
throughput? Thanks.

> Only issue am observing now with ubuntu 16 is sometime osd fails on rebooting
> until i start them manually or adding starting commands in rc.local.

In my case it's a test environment, so I haven't noticed those behaviours.

--
Yoann

> On Mon, Jul 25, 2016 at 6:45 PM, Yoann Moulin <yoann.mou...@epfl.ch
> <mailto:yoann.mou...@epfl.ch>> wrote:
>
>     Hello,
>
>     (this is a repost, my previous message seems to be slipping under the radar)
>
>     Does anyone get a similar behaviour to the one described below ?
>
>     I found a big performance drop between kernel 3.13.0-88 (default kernel on
>     Ubuntu Trusty 14.04) or kernel 4.2.0 and kernel 4.4.0.24.14 (default kernel on
>     Ubuntu Xenial 16.04)
>
>     - ceph version is Jewel (10.2.2).
>     - All tests have been done under Ubuntu 14.04
>     - Each cluster has 5 nodes strictly identical.
>     - Each node has 10 OSDs.
>     - Journals are on the disk.
> > Kernel 4.4 has a drop of more than 50% compared to 4.2 > Kernel 4.4 has a drop of 40% compared to 3.13 > > details below : > > With the 3 kernel I have the same performance on disks : > > Raw benchmark: > dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct=> average > ~230MB/s > dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct => average > ~220MB/s > > Filesystem mounted benchmark: > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 => average > ~205MB/s > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average > ~214MB/s > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync => average > ~190MB/s > > Ceph osd Benchmark: > Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average ~81MB/s > Kernel 4.2.0-38-generic : ceph tell osd.ID bench => average ~109MB/s > Kernel 4.4.0-24-generic : ceph tell osd.ID bench => average ~50MB/s > > I did new benchmarks then on 3 new fresh clusters. > > - Each cluster has 3 nodes strictly identical. > - Each node has 10 OSDs. > - Journals are on the disk. > > bench5 : Ubuntu 14.04 / Ceph Infernalis > bench6 : Ubuntu 14.04 / Ceph Jewel > bench7 : Ubuntu 16.04 / Ceph jewel > > this is the average of 2 runs of "ceph tell osd.* bench" on each cluster > (2 x 30 > OSDs) > > bench5 / 14.04 / Infernalis / kernel 3.13 : 54.35 MB/s > bench6 / 14.04 / Jewel / kernel 3.13 : 86.47 MB/s > > bench5 / 14.04 / Infernalis / kernel 4.2 : 63.38 MB/s > bench6 / 14.04 / Jewel / kernel 4.2 : 107.75 MB/s > bench7 / 16.04 / Jewel / kernel 4.2 : 101.54 MB/s > > bench5 / 14.04 / Infernalis / kernel 4.4 : 53.61 MB/s > bench6 / 14.04 / Jewel / kernel 4.4 : 65.82 MB/s > bench7 / 16.04 / Jewel / kernel 4.4 : 61.57 MB/s > > If needed, I have the raw output of "ceph tell osd.* bench" > > Best regards ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)
Hello,

(this is a repost, my previous message seems to be slipping under the radar)

Does anyone get a similar behaviour to the one described below?

I found a big performance drop between kernel 3.13.0-88 (default kernel on
Ubuntu Trusty 14.04) or kernel 4.2.0, and kernel 4.4.0.24.14 (default kernel on
Ubuntu Xenial 16.04)

- ceph version is Jewel (10.2.2).
- All tests have been done under Ubuntu 14.04
- Each cluster has 5 nodes strictly identical.
- Each node has 10 OSDs.
- Journals are on the disk.

Kernel 4.4 has a drop of more than 50% compared to 4.2
Kernel 4.4 has a drop of 40% compared to 3.13

Details below.

With all 3 kernels I have the same performance on the disks:

Raw benchmark:
dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct => average ~230MB/s
dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct    => average ~220MB/s

Filesystem mounted benchmark:
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1              => average ~205MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average ~214MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average ~190MB/s

Ceph OSD benchmark:
Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average ~81MB/s
Kernel 4.2.0-38-generic  : ceph tell osd.ID bench => average ~109MB/s
Kernel 4.4.0-24-generic  : ceph tell osd.ID bench => average ~50MB/s

I then ran new benchmarks on 3 fresh clusters.

- Each cluster has 3 nodes strictly identical.
- Each node has 10 OSDs.
- Journals are on the disk.
bench5 : Ubuntu 14.04 / Ceph Infernalis
bench6 : Ubuntu 14.04 / Ceph Jewel
bench7 : Ubuntu 16.04 / Ceph Jewel

This is the average of 2 runs of "ceph tell osd.* bench" on each cluster (2 x 30 OSDs):

bench5 / 14.04 / Infernalis / kernel 3.13 : 54.35 MB/s
bench6 / 14.04 / Jewel      / kernel 3.13 : 86.47 MB/s

bench5 / 14.04 / Infernalis / kernel 4.2  : 63.38 MB/s
bench6 / 14.04 / Jewel      / kernel 4.2  : 107.75 MB/s
bench7 / 16.04 / Jewel      / kernel 4.2  : 101.54 MB/s

bench5 / 14.04 / Infernalis / kernel 4.4  : 53.61 MB/s
bench6 / 14.04 / Jewel      / kernel 4.4  : 65.82 MB/s
bench7 / 16.04 / Jewel      / kernel 4.4  : 61.57 MB/s

If needed, I have the raw output of "ceph tell osd.* bench".

Best regards

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
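Averaging two runs of "ceph tell osd.* bench" across 30 OSDs by hand is tedious, so a small helper can do it. This sketch is an assumption-laden illustration: it presumes the bench command is run with `--format json` and that the output carries a `bytes_per_sec` field; if your Ceph version formats the result differently, the key must be adjusted.

```python
import json

def mean_bench_mb_per_s(bench_outputs):
    """Average per-OSD throughput, in MB/s, from a list of
    `ceph tell osd.N bench --format json` output strings
    (assumes each document has a bytes_per_sec field)."""
    rates = [json.loads(out)["bytes_per_sec"] for out in bench_outputs]
    return sum(rates) / len(rates) / (1024 * 1024)

# Two hypothetical OSD results: 50 MB/s and 70 MB/s
sample = [
    '{"bytes_written": 1073741824, "blocksize": 4194304, "bytes_per_sec": 52428800.0}',
    '{"bytes_written": 1073741824, "blocksize": 4194304, "bytes_per_sec": 73400320.0}',
]
# mean_bench_mb_per_s(sample) -> 60.0
```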
[ceph-users] Re: Infernalis -> Jewel, 10x+ RBD latency increase
Hi,

>>> I just upgraded from Infernalis to Jewel and see an approximate 10x
>>> latency increase.
>>>
>>> Quick facts:
>>> - 3x replicated pool
>>> - 4x 2x-"E5-2690 v3 @ 2.60GHz", 128GB RAM, 6x 1.6 TB Intel S3610 SSDs,
>>> - LSI3008 controller with up-to-date firmware and upstream driver,
>>>   and up-to-date firmware on SSDs.
>>> - 40GbE (Mellanox, with up-to-date drivers & firmware)
>>> - CentOS 7.2

Which kernel do you run? I found a throughput drop (~40%) with kernel 4.4
compared to kernel 4.2. I didn't benchmark latency, but maybe the issue
impacts latency too.

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] performance issue with jewel on ubuntu xenial (kernel)
Hello, >>>>>>> I found a performance drop between kernel 3.13.0-88 (default kernel on >>>>>>> Ubuntu >>>>>>> Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial >>>>>>> 16.04) >>>>>>> >>>>>>> ceph version is Jewel (10.2.2). >>>>>>> All tests have been done under Ubuntu 14.04 >>>>>> >>>>>> Knowing that you also have an internalis cluster on almost identical >>>>>> hardware, can you please let the list know whether you see the same >>>>>> behavior (severely reduced throughput on a 4.4 kernel, vs. 3.13) on >>>>>> that cluster as well? >>>>> >>>>> ceph version is infernalis (9.2.0) >>>>> >>>>> Ceph osd Benchmark: >>>>> >>>>> Kernel 3.13.0-88-generic : ceph tell osd.ID => average ~84MB/s >>>>> Kernel 4.2.0-38-generic : ceph tell osd.ID => average ~90MB/s >>>>> Kernel 4.4.0-24-generic : ceph tell osd.ID => average ~75MB/s >>>>> >>>>> The slow down is not as much as I have with Jewel but it is still present. >>>> >>>> But this is not on precisely identical hardware, is it? >>> >>> All the benchmarks were run on strictly identical hardware setups per node. >>> Clusters differ slightly in sizes (infernalis vs jewel) but nodes and OSDs >>> are identical. >> >> One thing differ in the osd configuration, on the Jewel cluster, we have >> journal >> on disk, on the Infernalis cluster, we have journal on SSD (S3500) >> >> I can restart my test on a Jewel cluster with journal on SSD if needed. >> I can do as well a test on an Infernalis cluster with journal on disk. > > I'd suggest that the second option is probably more meaningful to test. I did new benchmarks on 3 clusters. Each cluster has 3 nodes strictly identical. Each node has 10 OSDs. Journals are on the disk. 
bench5 : Ubuntu 14.04 / Ceph Infernalis
bench6 : Ubuntu 14.04 / Ceph Jewel
bench7 : Ubuntu 16.04 / Ceph Jewel

This is the average of 2 runs of "ceph tell osd.* bench" on each cluster (2 x 30 OSDs):

bench5 / 14.04 / Infernalis / kernel 3.13 : 54.35 MB/s
bench6 / 14.04 / Jewel      / kernel 3.13 : 86.47 MB/s

bench5 / 14.04 / Infernalis / kernel 4.2  : 63.38 MB/s
bench6 / 14.04 / Jewel      / kernel 4.2  : 107.75 MB/s
bench7 / 16.04 / Jewel      / kernel 4.2  : 101.54 MB/s

bench5 / 14.04 / Infernalis / kernel 4.4  : 53.61 MB/s
bench6 / 14.04 / Jewel      / kernel 4.4  : 65.82 MB/s
bench7 / 16.04 / Jewel      / kernel 4.4  : 61.57 MB/s

If needed, I have the raw output of "ceph tell osd.* bench".

> What I find curious is that no-one else on the list has apparently run
> into this. Any Ubuntu xenial users out there, or perhaps folks on
> trusty who choose to install linux-image-generic-lts-xenial?

Could anyone try this on their side and report whether they see the same
behaviour?

Cheers,

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] performance issue with jewel on ubuntu xenial (kernel)
On 23/06/2016 08:25, Sarni Sofiane wrote:
> Hi Florian,
>
> On 23.06.16 06:25, "ceph-users on behalf of Florian Haas"
> <ceph-users-boun...@lists.ceph.com on behalf of flor...@hastexo.com> wrote:
>
>> On Wed, Jun 22, 2016 at 10:56 AM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>>> Hello Florian,
>>>
>>>> On Tue, Jun 21, 2016 at 3:11 PM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>>>>> Hello,
>>>>>
>>>>> I found a performance drop between kernel 3.13.0-88 (default kernel on Ubuntu
>>>>> Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 16.04)
>>>>>
>>>>> ceph version is Jewel (10.2.2).
>>>>> All tests have been done under Ubuntu 14.04
>>>>
>>>> Knowing that you also have an infernalis cluster on almost identical
>>>> hardware, can you please let the list know whether you see the same
>>>> behavior (severely reduced throughput on a 4.4 kernel, vs. 3.13) on
>>>> that cluster as well?
>>>
>>> ceph version is infernalis (9.2.0)
>>>
>>> Ceph osd Benchmark:
>>>
>>> Kernel 3.13.0-88-generic : ceph tell osd.ID => average ~84MB/s
>>> Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~90MB/s
>>> Kernel 4.4.0-24-generic  : ceph tell osd.ID => average ~75MB/s
>>>
>>> The slowdown is not as severe as with Jewel, but it is still present.
>>
>> But this is not on precisely identical hardware, is it?
>
> All the benchmarks were run on strictly identical hardware setups per node.
> Clusters differ slightly in sizes (infernalis vs jewel) but nodes and OSDs
> are identical.

One thing differs in the OSD configuration: on the Jewel cluster we have the
journal on disk, while on the Infernalis cluster we have the journal on SSD
(S3500).

I can restart my test on a Jewel cluster with the journal on SSD if needed.
I can also run a test on an Infernalis cluster with the journal on disk.

Best regards,

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] performance issue with jewel on ubuntu xenial (kernel)
Hello Florian,

> On Tue, Jun 21, 2016 at 3:11 PM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>> Hello,
>>
>> I found a performance drop between kernel 3.13.0-88 (default kernel on Ubuntu
>> Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 16.04)
>>
>> ceph version is Jewel (10.2.2).
>> All tests have been done under Ubuntu 14.04
>
> Knowing that you also have an infernalis cluster on almost identical
> hardware, can you please let the list know whether you see the same
> behavior (severely reduced throughput on a 4.4 kernel, vs. 3.13) on
> that cluster as well?

ceph version is infernalis (9.2.0)

Ceph osd Benchmark:

Kernel 3.13.0-88-generic : ceph tell osd.ID => average ~84MB/s
Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~90MB/s
Kernel 4.4.0-24-generic  : ceph tell osd.ID => average ~75MB/s

The slowdown is not as severe as with Jewel, but it is still present.

Best Regards,

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] performance issue with jewel on ubuntu xenial (kernel)
Hello,

I found a performance drop between kernel 3.13.0-88 (default kernel on Ubuntu
Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 16.04)

ceph version is Jewel (10.2.2).
All tests have been done under Ubuntu 14.04

Kernel 4.4 has a drop of 50% compared to 4.2
Kernel 4.4 has a drop of 40% compared to 3.13

Details below.

With all 3 kernels I have the same performance on the disks:

Raw benchmark:
dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct => average ~230MB/s
dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct    => average ~220MB/s

Filesystem mounted benchmark:
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1              => average ~205MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average ~214MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average ~190MB/s

Ceph OSD benchmark:
Kernel 3.13.0-88-generic : ceph tell osd.ID => average ~81MB/s
Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~109MB/s
Kernel 4.4.0-24-generic  : ceph tell osd.ID => average ~50MB/s

Does anyone get a similar behaviour on their cluster?

Best regards

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] journal or cache tier on SSDs ?
Hello,

I'd like some advice about the setup of a new Ceph cluster. Here is the use case:

RadosGW (S3, and maybe Swift for Hadoop/Spark) will be the main usage. Most of
the access will be read-only. Writes will only be done by the admin to update
the datasets. We might use RBD from time to time to sync data as temporary
storage (when POSIX is needed), but performance will not be an issue there. We
might use CephFS in the future if it can replace a filesystem on RBD.

We are going to start with 16 nodes (up to 24). The configuration of each node is:

CPU                      : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (12c/48t)
Memory                   : 128GB
OS Storage               : 2 x SSD 240GB Intel S3500 DC (raid 1)
Journal or cache Storage : 2 x SSD 400GB Intel S3300 DC (no raid)
OSD Disk                 : 10 x HGST ultrastar-7k6000 6TB
Public Network           : 1 x 10Gb/s
Private Network          : 1 x 10Gb/s
OS                       : Ubuntu 16.04
Ceph version             : Jewel

The question is: journal or cache tier (read-only) on the SSD 400GB Intel S3300 DC?

Each disk can write sequentially at 220MB/s, and the SSDs can write at ~500MB/s.
If we put 5 journals on each SSD, the SSDs will still be the bottleneck (1GB/s
vs 2GB/s). If we put the journal on the OSDs, we can expect good read throughput
from the disks (when data is not in the cache), and writes shouldn't be so bad
either, even with random reads hitting the OSD during writes.

SSDs as a cache tier seem to be a better use than only 5 journals on each one.
Is that correct?

We are going to use an EC pool for big files (jerasure 8+2, I think) and a
replicated pool for small files. If I check on http://ceph.com/pgcalc/, for
this use case:

replicated pool : pg_num = 8192 for 160 OSDs, but 16384 for 240 OSDs
EC pool         : pg_num = 4096, and pgp_num = pg_num

Should I set pg_num to 8192 or 16384? What is the impact on the cluster if we
set pg_num to 16384 at the beginning? 16384 is high, isn't it?
Thanks for your help

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
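For the pg_num question above, the usual rule of thumb behind http://ceph.com/pgcalc/ is (number of OSDs × target PGs per OSD) / replica count, rounded up to a power of two. The sketch below is a simplification: the real calculator also weighs each pool's share of the data, so its numbers can differ from this helper's.

```python
def suggested_pg_num(num_osds, pool_size, target_pgs_per_osd=100):
    """Rule-of-thumb pg_num: aim for ~target_pgs_per_osd PG copies per
    OSD, divide by the pool's replication size, and round up to the
    next power of two (simplified pgcalc heuristic)."""
    raw = num_osds * target_pgs_per_osd / pool_size
    pg_num = 1
    while pg_num < raw:
        pg_num *= 2
    return pg_num

# 160 OSDs at 3x replication -> 8192, matching the figure quoted above
```

As for setting 16384 up front: more PGs mean more peering state and memory per OSD, so the common advice in this era of Ceph was to start lower and raise pg_num later, since it can be increased but not decreased.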
Re: [ceph-users] how to choose EC plugins and rulesets
On 10/03/2016 09:26, Nick Fisk wrote:
> What is your intended use case RBD/FS/RGW? There are no major improvements
> in Jewel that I am aware of. The big one will be when EC pools allow direct
> partial overwrites without the use of a cache tier.

The main goal is RadosGW. Most of the access will be read-only. We are also
interested in using block devices and later CephFS, but that is not our
priority, and in those cases we haven't yet decided between replication and
erasure coding. If you have some insight about these cases, we are also
interested.

Thanks,

Yoann

>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Yoann Moulin
>> Sent: 09 March 2016 16:01
>> To: ceph-us...@ceph.com
>> Subject: [ceph-users] how to choose EC plugins and rulesets
>>
>> Hello,
>>
>> We are looking for recommendations and guidelines for using erasure codes
>> (EC) with Ceph.
>>
>> Our setup consists of 25 identical nodes which we dedicate to Ceph. Each
>> node contains 10 HDDs (full specs below)
>>
>> We started with 10 nodes (comprising 100 OSDs) and created a pool with
>> 3-times replication.
>>
>> In order to increase the usable capacity, we would like to go for EC
>> instead of replication.
>>
>> - Can anybody share with us recommendations regarding the choice of
>>   plugins and rulesets?
>> - In particular, how do we relate to the number of nodes and OSDs? Any
>>   formulas or rules of thumb?
>> - Is it possible to change rulesets live on a pool?
>>
>> We currently use Infernalis but plan to move to Jewel.
>>
>> - Are there any improvements in Jewel with regard to erasure codes?
>>
>> Looking forward for your answers.
>>
>> =
>>
>> Full specs of nodes
>>
>> CPU: 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>> Memory: 128GB of Memory
>> OS Storage: 2 x SSD 240GB Intel S3500 DC (raid 1)
>> Journal Storage: 2 x SSD 400GB Intel S3300 DC (no Raid)
>> OSD Disk: 10 x HGST ultrastar-7k6000 6TB
>> Network: 1 x 10Gb/s
>> OS: Ubuntu 14.04
>>
>> --
>> Yoann Moulin
>> EPFL IC-IT

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >> = >> >> Full specs of nodes >> >> CPU: 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz >> Memory: 128GB of Memory >> OS Storage: 2 x SSD 240GB Intel S3500 DC (raid 1) Journal Storage: 2 x SSD >> 400GB Intel S3300 DC (no Raid) OSD Disk: 10 x HGST ultrastar-7k6000 6TB >> Network: 1 x 10Gb/s >> OS: Ubuntu 14.04 >> >> -- >> Yoann Moulin >> EPFL IC-IT >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Yoann Moulin EPFL IC-IT ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] how to choose EC plugins and rulesets
Hello,

We are looking for recommendations and guidelines for using erasure codes (EC)
with Ceph.

Our setup consists of 25 identical nodes which we dedicate to Ceph. Each node
contains 10 HDDs (full specs below).

We started with 10 nodes (comprising 100 OSDs) and created a pool with 3-times
replication.

In order to increase the usable capacity, we would like to go for EC instead of
replication.

- Can anybody share with us recommendations regarding the choice of plugins
  and rulesets?
- In particular, how do we relate these to the number of nodes and OSDs? Any
  formulas or rules of thumb?
- Is it possible to change rulesets live on a pool?

We currently use Infernalis but plan to move to Jewel.

- Are there any improvements in Jewel with regard to erasure codes?

Looking forward to your answers.

=

Full specs of nodes:

CPU: 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Memory: 128GB of Memory
OS Storage: 2 x SSD 240GB Intel S3500 DC (raid 1)
Journal Storage: 2 x SSD 400GB Intel S3300 DC (no Raid)
OSD Disk: 10 x HGST ultrastar-7k6000 6TB
Network: 1 x 10Gb/s
OS: Ubuntu 14.04

--
Yoann Moulin
EPFL IC-IT

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
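The capacity motivation for moving from 3x replication to EC can be made concrete with a little arithmetic. This sketch uses the cluster's eventual raw size from the specs above (25 nodes × 10 HDDs × 6 TB); the 8+2 profile is only an example figure mentioned elsewhere on the list, not a recommendation:

```python
def ec_usable_fraction(k, m):
    """Fraction of raw capacity usable with a k+m erasure code."""
    return k / (k + m)

def replicated_usable_fraction(size):
    """Fraction of raw capacity usable with size-way replication."""
    return 1.0 / size

raw_tb = 25 * 10 * 6                                 # 1500 TB raw
ec_tb = raw_tb * ec_usable_fraction(8, 2)            # 1200 TB usable
rep_tb = raw_tb * replicated_usable_fraction(3)      # 500 TB usable
```

So an 8+2 profile consumes 1.25 bytes of raw space per usable byte versus 3 bytes for triple replication, at the cost of higher CPU usage and heavier reconstruction traffic on failure.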
Re: [ceph-users] OSDs go down with infernalis
Hello,

> If you manually create your journal partition, you need to specify the correct
> Ceph partition GUID in order for the system and Ceph to identify the partition
> as a Ceph journal and apply the correct ownership and permissions at boot via udev.

In my latest run I let ceph-ansible create the partitions, and everything seems
to be fine.

> I used something like this to create the partition :
> sudo sgdisk --new=1:0G:15G --typecode=1:45B0969E-9B03-4F30-B4C6-B4B80CEFF106
> --partition-guid=$(uuidgen -r) --mbrtogpt -- /dev/sda
>
> 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 being the GUID. More info on GPT GUIDs is
> available on wikipedia [1].
>
> I think the issue with the label you had was linked to some bugs in the disk
> initialization process. This was discussed a few weeks back on this mailing
> list.
>
> [1] https://en.wikipedia.org/wiki/GUID_Partition_Table

That's what I read on the IRC channel; it seems to be a common mistake. Might
it be good to mention this in the docs or FAQ?

Yoann

> On Tue, Mar 8, 2016 at 5:21 PM, Yoann Moulin <yoann.mou...@epfl.ch
> <mailto:yoann.mou...@epfl.ch>> wrote:
>
>     Hello Adrien,
>
>     > I think I faced the same issue setting up my own cluster. If it is the same,
>     > it's one of the many people encounter(ed) during disk initialization.
>     > Could you please give the output of :
>     > - ll /dev/disk/by-partuuid/
>     > - ll /var/lib/ceph/osd/ceph-*
>
>     Unfortunately, I already reinstalled my test cluster, but I got some
>     information that might explain this issue.
>
>     I was creating the journal partitions before running the ansible playbook.
>     Firstly, owner and rights were not persistent across boots (I had to add
>     udev rules). And I strongly suspect a side effect of not letting ceph-disk
>     create the journal partitions.
> > Yoann > > > On Thu, Mar 3, 2016 at 3:42 PM, Yoann Moulin <yoann.mou...@epfl.ch > <mailto:yoann.mou...@epfl.ch> > > <mailto:yoann.mou...@epfl.ch <mailto:yoann.mou...@epfl.ch>>> wrote: > > > > Hello, > > > > I'm (almost) a new user of ceph (couple of month). In my university, > we start to > > do some test with ceph a couple of months ago. > > > > We have 2 clusters. Each cluster have 100 OSDs on 10 servers : > > > > Each server as this setup : > > > > CPU : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz > > Memory : 128GB of Memory > > OS Storage : 2 x SSD 240GB Intel S3500 DC (raid 1) > > Journal Storage : 2 x SSD 400GB Intel S3300 DC (no Raid) > > OSD Disk : 10 x HGST ultrastar-7k6000 6TB > > Network : 1 x 10Gb/s > > OS : Ubuntu 14.04 > > Ceph version : infernalis 9.2.0 > > > > One cluster give access to some user through a S3 gateway (service > is > still in > > beta). We call this cluster "ceph-beta". > > > > One cluster is for our internal need to learn more about ceph. We > call > this > > cluster "ceph-test". (those servers will be integrated into the > ceph-beta > > cluster when we will need more space) > > > > We have deploy both clusters with the ceph-ansible playbook[1] > > > > Journal are raw partitions on SSDs (400GB Intel S3300 DC) with no > raid. 5 > > journals partitions on each SSDs. > > > > OSDs disk are format in XFS. > > > > 1. https://github.com/ceph/ceph-ansible > > > > We have an issue. Some OSDs go down and don't start. It seem to be > related to > > the fsid of the journal partition : > > > > > -1> 2016-03-03 14:09:05.422515 7f31118d0940 -1 journal > FileJournal::open: > > ondisk fsid ---- doesn't match > expected > > eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) > journal > > > > in attachment, the full logs of one of the dead OSDs > > > > We had this issue with 2 OSDs on ceph-beta cluster fixed by > removing, > zapping > > and readding it. > > > > Now, we have the same issue on ceph-test cluster but on 18 OSDs. 
> > > > Now the stats of this cluster > > > > > root@icadmin004:~# ceph -s > > > cluster 4fb4773c-0873-44ad-a65f-269f01bfcff8 > > > health HEA
Re: [ceph-users] OSDs go down with infernalis
Hello Adrien,

> I think I faced the same issue setting up my own cluster. If it is the same,
> it's one of the many issues people encounter(ed) during disk initialization.
> Could you please give the output of:
> - ll /dev/disk/by-partuuid/
> - ll /var/lib/ceph/osd/ceph-*

Unfortunately, I have already reinstalled my test cluster, but I got some information that might explain this issue.

I was creating the journal partitions before running the ansible playbook. Firstly, ownership and permissions were not persistent at boot (I had to add udev rules). And I strongly suspect a side effect of not letting ceph-disk create the journal partitions.

Yoann

> On Thu, Mar 3, 2016 at 3:42 PM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>
> Hello,
>
> I'm (almost) a new user of Ceph (a couple of months). At my university, we
> started doing some tests with Ceph a couple of months ago.
>
> We have 2 clusters. Each cluster has 100 OSDs on 10 servers.
>
> Each server has this setup:
>
> CPU: 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
> Memory: 128GB
> OS Storage: 2 x SSD 240GB Intel S3500 DC (RAID 1)
> Journal Storage: 2 x SSD 400GB Intel S3300 DC (no RAID)
> OSD Disk: 10 x HGST Ultrastar 7K6000 6TB
> Network: 1 x 10Gb/s
> OS: Ubuntu 14.04
> Ceph version: infernalis 9.2.0
>
> One cluster gives access to some users through an S3 gateway (the service is
> still in beta). We call this cluster "ceph-beta".
>
> One cluster is for our internal needs, to learn more about Ceph. We call this
> cluster "ceph-test". (Those servers will be integrated into the ceph-beta
> cluster when we need more space.)
>
> We have deployed both clusters with the ceph-ansible playbook [1].
>
> Journals are raw partitions on SSDs (400GB Intel S3300 DC) with no RAID, 5
> journal partitions on each SSD.
>
> OSD disks are formatted with XFS.
>
> 1. https://github.com/ceph/ceph-ansible
>
> We have an issue: some OSDs go down and don't start. It seems to be related
> to the fsid of the journal partition:
>
> > -1> 2016-03-03 14:09:05.422515 7f31118d0940 -1 journal FileJournal::open:
> > ondisk fsid ---- doesn't match expected
> > eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) journal
>
> In attachment, the full logs of one of the dead OSDs.
>
> We had this issue with 2 OSDs on the ceph-beta cluster, fixed by removing,
> zapping and re-adding them.
>
> Now we have the same issue on the ceph-test cluster, but on 18 OSDs.
>
> Now the stats of this cluster:
>
> > root@icadmin004:~# ceph -s
> >     cluster 4fb4773c-0873-44ad-a65f-269f01bfcff8
> >      health HEALTH_WARN
> >             1024 pgs incomplete
> >             1024 pgs stuck inactive
> >             1024 pgs stuck unclean
> >      monmap e1: 3 mons at {iccluster003=10.90.37.4:6789/0,iccluster014=10.90.37.15:6789/0,iccluster022=10.90.37.23:6789/0}
> >             election epoch 62, quorum 0,1,2 iccluster003,iccluster014,iccluster022
> >      osdmap e242: 100 osds: 82 up, 82 in
> >             flags sortbitwise
> >       pgmap v469212: 2304 pgs, 10 pools, 2206 bytes data, 181 objects
> >             4812 MB used, 447 TB / 447 TB avail
> >                 1280 active+clean
> >                 1024 creating+incomplete
>
> We installed this cluster at the beginning of February. We have not used that
> cluster at all, except at the beginning to troubleshoot an issue with
> ceph-ansible. We did not push any data nor create any pools. What could
> explain this behaviour?
>
> Thanks for your help
>
> Best regards,
>
> --
> Yoann Moulin
> EPFL IC-IT
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Yoann Moulin
EPFL IC-IT
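The udev rule mentioned above can be sketched as follows. This is a hypothetical example (the filename, path and mode are assumptions, not from the thread): it matches partitions whose GPT type is the Ceph journal GUID and hands them to ceph:ceph. It writes to a temp file here for illustration; on a real node the rule would live under /etc/udev/rules.d/ and be picked up after "udevadm control --reload".

```shell
# Hypothetical udev rule giving Ceph journal partitions persistent
# ownership at boot (filename, path and mode are assumptions).
# Written to a temp file for illustration only.
rules_file=$(mktemp)
cat > "$rules_file" <<'EOF'
# GPT partition type 45b0969e-9b03-4f30-b4c6-b4b80ceff106 = Ceph journal
ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", OWNER="ceph", GROUP="ceph", MODE="0660"
EOF
grep -c 'OWNER="ceph"' "$rules_file"
```

Letting ceph-disk create the partitions avoids the need for such a rule, since it sets the journal type GUID itself and the stock Ceph udev rules then apply.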
[ceph-users] OSDs go down with infernalis
Hello, I'm (almost) a new user of ceph (couple of month). In my university, we start to do some test with ceph a couple of months ago. We have 2 clusters. Each cluster have 100 OSDs on 10 servers : Each server as this setup : CPU : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Memory : 128GB of Memory OS Storage : 2 x SSD 240GB Intel S3500 DC (raid 1) Journal Storage : 2 x SSD 400GB Intel S3300 DC (no Raid) OSD Disk : 10 x HGST ultrastar-7k6000 6TB Network : 1 x 10Gb/s OS : Ubuntu 14.04 Ceph version : infernalis 9.2.0 One cluster give access to some user through a S3 gateway (service is still in beta). We call this cluster "ceph-beta". One cluster is for our internal need to learn more about ceph. We call this cluster "ceph-test". (those servers will be integrated into the ceph-beta cluster when we will need more space) We have deploy both clusters with the ceph-ansible playbook[1] Journal are raw partitions on SSDs (400GB Intel S3300 DC) with no raid. 5 journals partitions on each SSDs. OSDs disk are format in XFS. 1. https://github.com/ceph/ceph-ansible We have an issue. Some OSDs go down and don't start. It seem to be related to the fsid of the journal partition : > -1> 2016-03-03 14:09:05.422515 7f31118d0940 -1 journal FileJournal::open: > ondisk fsid ---- doesn't match expected > eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) journal in attachment, the full logs of one of the dead OSDs We had this issue with 2 OSDs on ceph-beta cluster fixed by removing, zapping and readding it. Now, we have the same issue on ceph-test cluster but on 18 OSDs. 
Now the stats of this cluster:

> root@icadmin004:~# ceph -s
>     cluster 4fb4773c-0873-44ad-a65f-269f01bfcff8
>      health HEALTH_WARN
>             1024 pgs incomplete
>             1024 pgs stuck inactive
>             1024 pgs stuck unclean
>      monmap e1: 3 mons at {iccluster003=10.90.37.4:6789/0,iccluster014=10.90.37.15:6789/0,iccluster022=10.90.37.23:6789/0}
>             election epoch 62, quorum 0,1,2 iccluster003,iccluster014,iccluster022
>      osdmap e242: 100 osds: 82 up, 82 in
>             flags sortbitwise
>       pgmap v469212: 2304 pgs, 10 pools, 2206 bytes data, 181 objects
>             4812 MB used, 447 TB / 447 TB avail
>                 1280 active+clean
>                 1024 creating+incomplete

We installed this cluster at the beginning of February. We have not used that cluster at all, except at the beginning to troubleshoot an issue with ceph-ansible. We did not push any data nor create any pools. What could explain this behaviour?

Thanks for your help

Best regards,

--
Yoann Moulin
EPFL IC-IT

2016-03-03 14:09:00.433074 7efd1a9d5940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 4446
2016-03-03 14:09:01.315583 7efd1a9d5940  0 filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
2016-03-03 14:09:01.338328 7efd1a9d5940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-03-03 14:09:01.338335 7efd1a9d5940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2016-03-03 14:09:01.338362 7efd1a9d5940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: splice is supported
2016-03-03 14:09:01.341468 7efd1a9d5940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-03-03 14:09:01.341517 7efd1a9d5940  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: extsize is supported and your kernel >= 3.5
2016-03-03 14:09:01.411145 7efd1a9d5940  0 filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2016-03-03 14:09:01.692400 7efd1a9d5940 -1 journal FileJournal::open: ondisk fsid ---- doesn't match expected eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) journal
2016-03-03 14:09:01.694251 7efd1a9d5940 -1 os/FileJournal.h: In function 'virtual FileJournal::~FileJournal()' thread 7efd1a9d5940 time 2016-03-03 14:09:01.692413
os/FileJournal.h: 406: FAILED assert(fd == -1)
 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7efd1a4cbf2b]
 2: (()+0x2c2f80) [0x7efd19ed1f80]
 3: (FileJournal::~FileJournal()+0x67e) [0x7efd1a1b476e]
 4: (JournalingObjectStore::journal_replay(unsigned long)+0xbfa) [0x7efd1a1c353a]
 5: (FileStore::mount()+0x3b42) [0x7efd1a198a62]
 6: (OSD::init()+0x26d) [0x7efd19f51a5d]
 7: (main()+0x2954) [0x7efd19ed7474]
 8: (__libc_start_main()+0xf5) [0x7efd16d59ec5]
 9: (()+0x2f82b7) [0x7efd19f072b7]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
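The "removing, zapping and re-adding" fix mentioned in the thread can be sketched as below. This is a dry-run sketch of the infernalis-era ceph-disk workflow, not the poster's exact procedure; the OSD id and device are placeholders, and the commands are only printed, never executed.

```shell
# Dry-run sketch of the remove/zap/re-add fix (infernalis-era ceph-disk
# workflow; osd id and device are placeholders, commands are only printed).
osd_id=2
dev=/dev/sdc
run() { echo "WOULD RUN: $*"; }           # print instead of executing
run stop ceph-osd id="$osd_id"            # stop the daemon (upstart)
run ceph osd out "$osd_id"                # let data rebalance away
run ceph osd crush remove "osd.$osd_id"   # drop it from the CRUSH map
run ceph auth del "osd.$osd_id"           # remove its key
run ceph osd rm "$osd_id"                 # remove it from the osdmap
run ceph-disk zap "$dev"                  # wipe partition table and data
run ceph-disk prepare "$dev"              # recreate; udev then activates it
```

Letting ceph-disk prepare create the journal partition itself avoids the fsid mismatch that caused the failures above.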
Re: [ceph-users] can not umount ceph osd partition
Hello,

>>> I am using 0.94.5. When I try to umount the partition and fsck it, I have an issue:
>>> root@storage003:~# stop ceph-osd id=13
>>> ceph-osd stop/waiting
>>> root@storage003:~# umount /var/lib/ceph/osd/ceph-13
>>> root@storage003:~# fsck -yf /dev/sdf
>>> fsck from util-linux 2.20.1
>>> e2fsck 1.42.9 (4-Feb-2014)
>>> /dev/sdf is in use.
>>> e2fsck: Cannot continue, aborting.
>>>
>>> There is no /var/lib/ceph/osd/ceph-13 in /proc/mounts, but no ability to
>>> check the fs. I can mount -o remount,rw, but I would like to umount the
>>> device for maintenance and, maybe, replace it.
>>>
>>> Why can't I umount?

>> Does "lsof -n | grep /dev/sdf" give anything?

> Nothing.

>> And are you sure /dev/sdf is the disk for osd 13?

> Absolutely. I have even tried fsck -yf /dev/disk/by-label/osd-13. No luck.
>
> The disk is mounted using LABEL in fstab; the journal is a symlink to
> /dev/disk/by-partlabel/j-13. I think it's more Linux-related.

Could you try to look with lsof whether something holds the device by its label or UUID instead of /dev/sdf?

You can try to delete the device from the SCSI bus with something like:

echo 1 > /sys/block/<device>/device/delete

Be careful: this is like removing the disk physically. If a process holds the device, you can expect that process to switch into kernel state "D" (uninterruptible sleep). You won't be able to kill that process, even with kill -9; to stop it, you will have to reboot the server.

You can have a look here at how to manipulate the SCSI bus:

http://fibrevillage.com/storage/279-hot-add-remove-rescan-of-scsi-devices-on-linux

You can install the "scsitools" package, which provides rescan-scsi-bus.sh to rescan your SCSI bus and get the removed disk back:

http://manpages.ubuntu.com/manpages/precise/man8/rescan-scsi-bus.8.html

Hope that helps.

--
Yoann Moulin
EPFL IC-IT
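To check whether something holds the device through a by-label or by-uuid symlink, which a plain "lsof | grep /dev/sdf" can miss, a small read-only helper like this sketch resolves every open fd in /proc and compares it to the canonical device node (demonstrated on /dev/null, since /dev/sdf may not exist):

```shell
# List the /proc/<pid> entries holding a given device open, resolving
# symlinks so opens via /dev/disk/by-label/... or by-uuid/... are caught.
holders() {
    target=$(readlink -f "$1")             # canonical device node
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink -f "$fd" 2>/dev/null)" = "$target" ]; then
            echo "${fd%/fd/*}"             # the holder's /proc/<pid> dir
        fi
    done | sort -u
}
holders /dev/null    # demo on a node that always exists
```

Run as root it sees all processes; as an ordinary user it only sees your own fds, which is often enough to find a stray shell or fsck holding the disk.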
Re: [ceph-users] can not umount ceph osd partition
Hello,

> I am using 0.94.5. When I try to umount the partition and fsck it, I have an issue:
> root@storage003:~# stop ceph-osd id=13
> ceph-osd stop/waiting
> root@storage003:~# umount /var/lib/ceph/osd/ceph-13
> root@storage003:~# fsck -yf /dev/sdf
> fsck from util-linux 2.20.1
> e2fsck 1.42.9 (4-Feb-2014)
> /dev/sdf is in use.
> e2fsck: Cannot continue, aborting.
>
> There is no /var/lib/ceph/osd/ceph-13 in /proc/mounts, but no ability to
> check the fs. I can mount -o remount,rw, but I would like to umount the
> device for maintenance and, maybe, replace it.
>
> Why can't I umount?

Does "lsof -n | grep /dev/sdf" give anything?

And are you sure /dev/sdf is the disk for osd 13?

--
Yoann Moulin
EPFL IC-IT