Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage

2019-09-24 Thread Yoann Moulin
Hello,

>> I have a Ceph Nautilus Cluster 14.2.1 for cephfs only on 40x 1.8T SAS disk 
>> (no SSD) in 20 servers.
>>
>> I often get "MDSs report slow requests" and plenty of "[WRN] 3 slow 
>> requests, 0 included below; oldest blocked for > 60281.199503 secs"
>>
>> After a few investigations, I saw that ALL ceph-osd processes eat a lot of 
>> memory, up to 130GB RSS each. Is this value normal? Could this be related to the
>> slow requests? Does being HDD-only just increase the probability of slow requests?
>
> If you haven't set:
> 
> osd op queue cut off = high
> 
> in /etc/ceph/ceph.conf on your OSDs, I'd give that a try. It should
> help quite a bit with pure HDD clusters.

OK I'll try this, thanks.

If I want to add this to my ceph-ansible playbook parameters, in which file 
should I add it, and what is the best way to do it?

Should I add these 3 lines to all.yml or osds.yml?

ceph_conf_overrides:
  global:
    osd_op_queue_cut_off: high

Is there another (better?) way to do that?
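
For reference, here is a minimal sketch of how this could end up looking in 
group_vars/all.yml (ceph_conf_overrides is typically set in all.yml; scoping the 
option to the [osd] section instead of [global] is my assumption, either should 
end up in ceph.conf on the OSD nodes):

ceph_conf_overrides:
  osd:
    osd_op_queue_cut_off: high

After the config is redeployed, the running value can be checked on an OSD node 
with something like "ceph daemon osd.0 config get osd_op_queue_cut_off".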

Thanks for your help.

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage

2019-09-19 Thread Yoann Moulin
> [start of message truncated] ... included below; oldest blocked for > 62456.242289 secs
> 2019-09-19 08:52:58.960777 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62452.674936 secs
> 2019-09-19 08:53:03.960853 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62457.675011 secs
> 2019-09-19 08:53:07.528033 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62461.242354 secs
> 2019-09-19 08:53:12.528177 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62466.242487 secs
> 2019-09-19 08:53:08.960965 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62462.675123 secs
> 2019-09-19 08:53:13.961034 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62467.675195 secs
> 2019-09-19 08:53:17.528276 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62471.242592 secs
> 2019-09-19 08:53:22.528407 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62476.242729 secs
> 2019-09-19 08:53:18.961149 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62472.675310 secs
> 2019-09-19 08:53:23.961234 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62477.675392 secs
> 2019-09-19 08:53:27.528509 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62481.242832 secs
> 2019-09-19 08:53:32.528651 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62486.242961 secs
> 2019-09-19 08:53:28.961314 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62482.675471 secs
> 2019-09-19 08:53:33.961393 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62487.675549 secs
> 2019-09-19 08:53:37.528706 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62491.243031 secs
> 2019-09-19 08:53:42.528790 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62496.243105 secs
> 2019-09-19 08:53:38.961476 mds.icadmin006 [WRN] 10 slow requests, 1 included 
> below; oldest blocked for > 62492.675617 secs
> 2019-09-19 08:53:38.961485 mds.icadmin006 [WRN] slow request 61441.151061 
> seconds old, received at 2019-09-18 17:49:37.810351: 
> client_request(client.21441:176429 getattr pAsLsXsFs #0x1f2b1b3 
> 2019-09-18 17:49:37.806002 caller_uid=204878, caller_gid=11233{}) currently 
> failed to rdlock, waiting
> 2019-09-19 08:53:43.961569 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62497.675728 secs
> 2019-09-19 08:53:47.528891 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62501.243214 secs
> 2019-09-19 08:53:52.529021 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62506.243337 secs
> 2019-09-19 08:53:48.961685 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62502.675839 secs
> 2019-09-19 08:53:53.961792 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62507.675948 secs
> 2019-09-19 08:53:57.529113 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62511.243437 secs
> 2019-09-19 08:54:02.529224 mds.icadmin007 [WRN] 3 slow requests, 0 included 
> below; oldest blocked for > 62516.243546 secs
> 2019-09-19 08:53:58.961866 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62512.676025 secs
> 2019-09-19 08:54:03.961939 mds.icadmin006 [WRN] 10 slow requests, 0 included 
> below; oldest blocked for > 62517.676099 secs

Thanks for your help.

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
[
{
"id": 651292,
"num_leases": 0,
"num_caps": 0,
"state": "open",
"request_load_avg": 0,
"uptime": 65094.458896163,
"replay_requests": 0,
"completed_requests": 0,
"reconnecting": false,
"inst": "client.651292 v1:10.90.47.29:0/2037483206",
"client_metadata": {
"features": "00ff",
"entity_id": "labo04",
"hostname": "iccluster177.",
"kernel_version": "4.15.0-43-generic",
"root": "/labo04-scratch"
}
},
{
"id": 89226,
"num_leases": 0,
"num_caps": 0,
"state": "open",
"request_load_avg":

Re: [ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Yoann Moulin
>> I am doing some tests with Nautilus and cephfs on erasure coding pool.
>>
>> I noticed something strange between k+m in my erasure profile and 
>> size+min_size in the pool created:
>>
>>> test@icadmin004:~$ ceph osd erasure-code-profile get ecpool-4-2
>>> crush-device-class=
>>> crush-failure-domain=osd
>>> crush-root=default
>>> jerasure-per-chunk-alignment=false
>>> k=4
>>> m=2
>>> plugin=jerasure
>>> technique=reed_sol_van
>>> w=8
>>
>>> test@icadmin004:~$ ceph --cluster test osd pool create cephfs_data 8 8 
>>> erasure ecpool-4-2
>>> pool 'cephfs_data' created
>>
>>> test@icadmin004:~$ ceph osd pool ls detail | grep cephfs_data
>>> pool 14 'cephfs_data' erasure size 6 min_size 5 crush_rule 1 object_hash 
>>> rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 2646
>>> flags hashpspool stripe_width 16384
>>
>> Why min_size = 5 and not 4 ?
>>
> this question comes up regularly and is been discussed just now:
> 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034867.html

Oh thanks, I missed that thread; that makes sense. I agree with some of the 
comments that it is a little bit confusing.
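
For the record, a quick sketch of the relationship and how to inspect it (pool 
name taken from the quoted output above; lowering min_size below k+1 is possible 
but generally discouraged):

With k=4 and m=2: size = k + m = 6, and min_size defaults to k + 1 = 5.

# ceph osd pool get cephfs_data min_size
# ceph osd pool set cephfs_data min_size 4   (only if you really accept the risk)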

Best,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Nautilus, k+m erasure coding a profile vs size+min_size

2019-05-21 Thread Yoann Moulin
Dear all,

I am doing some tests with Nautilus and cephfs on erasure coding pool.

I noticed something strange between k+m in my erasure profile and size+min_size 
in the pool created:

> test@icadmin004:~$ ceph osd erasure-code-profile get ecpool-4-2
> crush-device-class=
> crush-failure-domain=osd
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=4
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8

> test@icadmin004:~$ ceph --cluster test osd pool create cephfs_data 8 8 
> erasure ecpool-4-2
> pool 'cephfs_data' created

> test@icadmin004:~$ ceph osd pool ls detail | grep cephfs_data
> pool 14 'cephfs_data' erasure size 6 min_size 5 crush_rule 1 object_hash 
> rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 2646 flags 
> hashpspool stripe_width 16384

Why min_size = 5 and not 4 ?

Best,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs free space issue

2019-01-09 Thread Yoann Moulin
iB 1.26TiB  391GiB 76.70 1.01 198 
> 181   hdd 1.63739  1.0 1.64TiB 1.27TiB  380GiB 77.33 1.02 186 
> 186   hdd 1.63739  1.0 1.64TiB 1.20TiB  451GiB 73.10 0.96 190 
> 182   hdd 1.63739  1.0 1.64TiB 1.31TiB  332GiB 80.20 1.06 204 
> 187   hdd 1.63739  1.0 1.64TiB 1.22TiB  424GiB 74.72 0.98 189 
> 183   hdd 1.63739  1.0 1.64TiB 1.33TiB  318GiB 81.05 1.07 206 
> 189   hdd 1.63739  1.0 1.64TiB 1.08TiB  576GiB 65.66 0.86 169 
> 184   hdd 1.63739  1.0 1.64TiB 1.21TiB  441GiB 73.70 0.97 183 
> 188   hdd 1.63739  1.0 1.64TiB 1.17TiB  474GiB 71.70 0.94 182 
> 190   hdd 1.63739  1.0 1.64TiB 1.27TiB  373GiB 77.75 1.02 195 
> 195   hdd 1.63739  1.0 1.64TiB 1.32TiB  327GiB 80.47 1.06 198 
> 191   hdd 1.63739  1.0 1.64TiB 1.16TiB  484GiB 71.15 0.94 183 
> 197   hdd 1.63739  1.0 1.64TiB 1.28TiB  370GiB 77.94 1.03 197 
> 192   hdd 1.63739  1.0 1.64TiB 1.26TiB  382GiB 77.24 1.02 200 
> 196   hdd 1.63739  1.0 1.64TiB 1.24TiB  402GiB 76.02 1.00 201 
> 193   hdd 1.63739  1.0 1.64TiB 1.24TiB  409GiB 75.59 1.00 186 
> 198   hdd 1.63739  1.0 1.64TiB 1.15TiB  501GiB 70.13 0.92 175 
> 194   hdd 1.63739  1.0 1.64TiB 1.29TiB  353GiB 78.98 1.04 202 
> 199   hdd 1.63739  1.0 1.64TiB 1.34TiB  309GiB 81.58 1.07 221 
>  TOTAL 65.5TiB 49.7TiB 15.8TiB 75.94  
> MIN/MAX VAR: 0.86/1.09  STDDEV: 3.92

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Yoann Moulin

> root@pf-us1-dfs3:/home/rodrigo# ceph osd crush rule dump
> [
>    {
>    "rule_id": 0,
>    "rule_name": "replicated_rule",
>    "ruleset": 0,
>    "type": 1,
>    "min_size": 1,
>    "max_size": 10,
>    "steps": [
>    {
>    "op": "take",
>    "item": -1,
>    "item_name": "default"
>    },
>    {
>    "op": "chooseleaf_firstn",
>    "num": 0,
>    "type": "host"
>    }

This means the failure domain is set to "host": the cluster will spread the 
replicas of each object across hosts, so that it can lose one host and still 
keep the data online.

You can change this to "osd" (per-disk), but in that case your cluster will only 
tolerate the failure of one disk: you won't be able to lose a whole server, 
because there would be no guarantee that all replicas of an object sit on 
different hosts.
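
For illustration only (a sketch, not a recommendation; the rule name below is 
made up), switching a pool to an OSD-level failure domain would look something 
like this:

# ceph osd crush rule create-replicated replicated_osd default osd
# ceph osd pool set cephfs_data crush_rule replicated_osd

The first command creates a replicated rule whose failure domain is "osd" 
instead of "host", the second points the pool at it (and triggers data movement).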

The best thing you can do here is to add two disks to pf-us1-dfs3.

The second option would be to move one disk from one of the two other servers to 
pf-us1-dfs3 if you can't get new disks quickly. I don't know the best way to do 
that; I have never had this case on my cluster.

Best regards,

Yoann

> On Tue, Jan 8, 2019 at 11:35 AM Yoann Moulin  <mailto:yoann.mou...@epfl.ch>> wrote:
> 
> Hello,
> 
> > Hi Yoann, thanks for your response.
> > Here are the results of the commands.
> >
> > root@pf-us1-dfs2:/var/log/ceph# ceph osd df
> > ID CLASS WEIGHT  REWEIGHT SIZE    USE AVAIL   %USE  VAR  PGS  
> > 0   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 571 GiB 92.33 1.74 310  
> > 5   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.18 1.45 271  
> > 6   hdd 7.27739  1.0 7.3 TiB 609 GiB 6.7 TiB  8.17 0.15  49  
> > 8   hdd 7.27739  1.0 7.3 TiB 2.5 GiB 7.3 TiB  0.03    0  42  
> > 1   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.28 1.45 285  
> > 3   hdd 7.27739  1.0 7.3 TiB 6.9 TiB 371 GiB 95.02 1.79 296  
> > 7   hdd 7.27739  1.0 7.3 TiB 360 GiB 6.9 TiB  4.84 0.09  53  
> > 9   hdd 7.27739  1.0 7.3 TiB 4.1 GiB 7.3 TiB  0.06 0.00  38  
> > 2   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 576 GiB 92.27 1.74 321  
> > 4   hdd 7.27739  1.0 7.3 TiB 6.1 TiB 1.2 TiB 84.10 1.58 351  
> >    TOTAL  73 TiB  39 TiB  34 TiB 53.13   
> > MIN/MAX VAR: 0/1.79  STDDEV: 41.15
> 
> It looks like you don't have a good balance between your OSD, what is 
> your failure domain ?
> 
> could you provide your crush map 
> http://docs.ceph.com/docs/luminous/rados/operations/crush-map/
> 
> ceph osd crush tree
> ceph osd crush rule ls
> ceph osd crush rule dump
> 
> 
> > root@pf-us1-dfs2:/var/log/ceph# ceph osd pool ls detail
> > pool 1 'poolcephfs' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 128 pgp_num 128 last_change 471 fla
> > gs hashpspool,full stripe_width 0
> > pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 256 pgp_num 256 last_change 471 lf
> > or 0/439 flags hashpspool,full stripe_width 0 application cephfs
> > pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 256 pgp_num 256 last_change 47
> > 1 lfor 0/448 flags hashpspool,full stripe_width 0 application cephfs
> > pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 flags ha
> > shpspool,full stripe_width 0 application rgw
> > pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 47
> > 1 flags hashpspool,full stripe_width 0 application rgw
> > pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 f
> > lags hashpspool,full stripe_width 0 application rgw
> > pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 fl
> > ags hashpspool,full stripe_width 0 application rgw
> 
> You may need to increase the pg num for cephfs_data pool. But before, you 
> must understand what is the impact https://ceph.com/pgcalc/
> you can't decrease pg_num, if it set too high you may have trouble in 
> your cluster.
> 
> > root@pf-us1-dfs2:/var/log/ceph

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Yoann Moulin
Hello,

> Hi Yoann, thanks for your response.
> Here are the results of the commands.
> 
> root@pf-us1-dfs2:/var/log/ceph# ceph osd df
> ID CLASS WEIGHT  REWEIGHT SIZE    USE AVAIL   %USE  VAR  PGS  
> 0   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 571 GiB 92.33 1.74 310  
> 5   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.18 1.45 271  
> 6   hdd 7.27739  1.0 7.3 TiB 609 GiB 6.7 TiB  8.17 0.15  49  
> 8   hdd 7.27739  1.0 7.3 TiB 2.5 GiB 7.3 TiB  0.03    0  42  
> 1   hdd 7.27739  1.0 7.3 TiB 5.6 TiB 1.7 TiB 77.28 1.45 285  
> 3   hdd 7.27739  1.0 7.3 TiB 6.9 TiB 371 GiB 95.02 1.79 296  
> 7   hdd 7.27739  1.0 7.3 TiB 360 GiB 6.9 TiB  4.84 0.09  53  
> 9   hdd 7.27739  1.0 7.3 TiB 4.1 GiB 7.3 TiB  0.06 0.00  38  
> 2   hdd 7.27739  1.0 7.3 TiB 6.7 TiB 576 GiB 92.27 1.74 321  
> 4   hdd 7.27739  1.0 7.3 TiB 6.1 TiB 1.2 TiB 84.10 1.58 351  
>    TOTAL  73 TiB  39 TiB  34 TiB 53.13   
> MIN/MAX VAR: 0/1.79  STDDEV: 41.15

It looks like you don't have a good balance between your OSDs; what is your 
failure domain?

Could you provide your crush map?
http://docs.ceph.com/docs/luminous/rados/operations/crush-map/

ceph osd crush tree
ceph osd crush rule ls
ceph osd crush rule dump


> root@pf-us1-dfs2:/var/log/ceph# ceph osd pool ls detail
> pool 1 'poolcephfs' replicated size 3 min_size 2 crush_rule 0 object_hash 
> rjenkins pg_num 128 pgp_num 128 last_change 471 fla
> gs hashpspool,full stripe_width 0
> pool 2 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash 
> rjenkins pg_num 256 pgp_num 256 last_change 471 lf
> or 0/439 flags hashpspool,full stripe_width 0 application cephfs
> pool 3 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 256 pgp_num 256 last_change 47
> 1 lfor 0/448 flags hashpspool,full stripe_width 0 application cephfs
> pool 4 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash 
> rjenkins pg_num 8 pgp_num 8 last_change 471 flags ha
> shpspool,full stripe_width 0 application rgw
> pool 5 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 47
> 1 flags hashpspool,full stripe_width 0 application rgw
> pool 6 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 f
> lags hashpspool,full stripe_width 0 application rgw
> pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 471 fl
> ags hashpspool,full stripe_width 0 application rgw

You may need to increase the pg_num of the cephfs_data pool. But before doing 
so, you must understand the impact (https://ceph.com/pgcalc/): you can't 
decrease pg_num, and if it is set too high you may have trouble in your cluster.
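
If you do decide to increase it, and only after checking pgcalc, it is done in 
two steps, for example (512 is purely illustrative, not a recommendation):

# ceph osd pool set cephfs_data pg_num 512
# ceph osd pool set cephfs_data pgp_num 512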

> root@pf-us1-dfs2:/var/log/ceph# ceph osd tree
> ID CLASS WEIGHT   TYPE NAME    STATUS REWEIGHT PRI-AFF  
> -1   72.77390 root default  
> -3   29.10956 host pf-us1-dfs1  
> 0   hdd  7.27739 osd.0    up  1.0 1.0  
> 5   hdd  7.27739 osd.5    up  1.0 1.0  
> 6   hdd  7.27739 osd.6    up  1.0 1.0  
> 8   hdd  7.27739 osd.8    up  1.0 1.0  
> -5   29.10956 host pf-us1-dfs2  
> 1   hdd  7.27739 osd.1    up  1.0 1.0  
> 3   hdd  7.27739 osd.3    up  1.0 1.0  
> 7   hdd  7.27739 osd.7    up  1.0 1.0  
> 9   hdd  7.27739 osd.9    up  1.0 1.0  
> -7   14.55478 host pf-us1-dfs3  
> 2   hdd  7.27739 osd.2    up  1.0 1.0  
> 4   hdd  7.27739 osd.4    up  1.0 1.0

You really should add 2 disks to pf-us1-dfs3. Currently, the cluster tries to 
balance data between the 3 hosts (replica 3, failure domain set to 'host' I 
guess), so each host stores one replica, i.e. 1/3 of the raw data. pf-us1-dfs3 
only has half the capacity of the two others, so you won't be able to store more 
data than osd.2 + osd.4 can hold, even though there is free space on the other OSDs.
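
As a rough back-of-the-envelope calculation (assuming replica 3 and a host 
failure domain, with the weights shown above):

  pf-us1-dfs3 raw capacity ~ 2 x 7.27 TiB ~ 14.6 TiB
  usable data              ~ 14.6 TiB (one replica must land on each host, so the smallest host is the limit)
  raw used at that point   ~ 3 x 14.6 TiB ~ 43.7 TiB, even though the cluster reports 73 TiB raw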

Best regards,

Yoann

> On Tue, Jan 8, 2019 at 10:36 AM Yoann Moulin  <mailto:yoann.mou...@epfl.ch>> wrote:
> 
> Hello,
> 
> > Hi guys, I need your help.
> > I'm new with Cephfs and we started using it as file storage.
> > Today we are getting no space left on device but I'm seeing that we 
> have plenty space on the filesystem.
> > Filesystem              Size  Used Avail Use% Mounted on
> > 192.168.51.8,192.168.51.6,192.168.51.118:6789:/pagefreezer/smhosts   
> 73T   39T   35T  54% /mnt/cephfs
>   

Re: [ceph-users] Problem with CephFS - No space left on device

2019-01-08 Thread Yoann Moulin
Hello,

> Hi guys, I need your help.
> I'm new with Cephfs and we started using it as file storage.
> Today we are getting no space left on device but I'm seeing that we have 
> plenty space on the filesystem.
> Filesystem              Size  Used Avail Use% Mounted on
> 192.168.51.8,192.168.51.6,192.168.51.118:6789:/pagefreezer/smhosts   73T   
> 39T   35T  54% /mnt/cephfs
> 
> We have 35TB of disk space. I've added 2 additional OSD disks with 7TB each 
> but I'm getting the error "No space left on device" every time that
> I want to add a new file.
> After adding the 2 additional OSD disks I'm seeing that the load is beign 
> distributed among the cluster.
> Please I need your help.

Could you give us the output of

ceph osd df
ceph osd pool ls detail
ceph osd tree

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap

2018-05-15 Thread Yoann Moulin
Hello John,

> Hello Yoann. I am working with similar issues at the moment in a biotech 
> company in Denmark.
> 
> First of all what authentication setup are you using?

ldap with sssd

> If you are using sssd there is a very simple and useful utility called 
> sss_override
> You can óverride' the uid which you get from LDAP with the genuine one.

That's one of the options; I'm just asking if there are other or simpler 
solutions.

> Oops. On reading your email more closely.
> Why not just add ceph to your /etc/group  file?

I tried, but there are some side effects.

I had a look at the postinst script in ceph-common, and I may have found a way 
to fix this issue:

> # Let the admin override these distro-specified defaults.  This is NOT
> # recommended!
> [ -f "/etc/default/ceph" ] && . /etc/default/ceph
> 
> [ -z "$SERVER_HOME" ] && SERVER_HOME=/var/lib/ceph
> [ -z "$SERVER_USER" ] && SERVER_USER=ceph
> [ -z "$SERVER_NAME" ] && SERVER_NAME="Ceph storage service"
> [ -z "$SERVER_GROUP" ] && SERVER_GROUP=ceph
> [ -z "$SERVER_UID" ] && SERVER_UID=64045  # alloc by Debian base-passwd 
> maintainer
> [ -z "$SERVER_GID" ] && SERVER_GID=$SERVER_UID

I can change SERVER_UID / SERVER_GID and/or SERVER_USER.
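
For example, a sketch of what /etc/default/ceph could contain (the UID/GID value 
is purely hypothetical, any free ID would do):

# /etc/default/ceph -- override the distro defaults read by the ceph-common postinst
SERVER_USER=ceph
SERVER_GROUP=ceph
SERVER_UID=64046
SERVER_GID=64046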

I'm going to try to create a specific ceph user in the LDAP and use it for the 
ceph install.

Yoann


> On 15 May 2018 at 08:58, Yoann Moulin <yoann.mou...@epfl.ch 
> <mailto:yoann.mou...@epfl.ch>> wrote:
> 
> Hello,
> 
> I'm facing an issue with ceph's UID/GID 65045 on an LDAPized server, I 
> have to install ceph-common to mount a cephfs filesystem but ceph-common
> fails because a user with uid 65045 already exist with a group also set 
> at 65045.
> 
> Server under Ubuntu 16.04.4 LTS
> 
> > Setting up ceph-common (12.2.5-1xenial) ...
> > Adding system user cephdone
> > Setting system user ceph properties..usermod: group 'ceph' does not 
> exist
> > dpkg: error processing package ceph-common (--configure):
> >  subprocess installed post-installation script returned error exit 
> status 6
> 
> The user is correctly created but the group not.
> 
> > # grep ceph /etc/passwd           
> > ceph:x:64045:64045::/home/ceph:/bin/false
> > # grep ceph /etc/group
> > #
> Is there a workaround for that?
> 
> -- 
> Yoann Moulin
> EPFL IC-IT
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> <http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com>
> 
> 


-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap

2018-05-15 Thread Yoann Moulin
Hello,

I'm facing an issue with ceph's UID/GID 65045 on an LDAPized server: I have to 
install ceph-common to mount a cephfs filesystem, but ceph-common fails because 
a user with UID 65045 already exists, with a group also set to 65045.

Server under Ubuntu 16.04.4 LTS

> Setting up ceph-common (12.2.5-1xenial) ...
> Adding system user cephdone
> Setting system user ceph properties..usermod: group 'ceph' does not exist
> dpkg: error processing package ceph-common (--configure):
>  subprocess installed post-installation script returned error exit status 6

The user is created correctly, but the group is not.

> # grep ceph /etc/passwd   
> ceph:x:64045:64045::/home/ceph:/bin/false
> # grep ceph /etc/group
> # 
Is there a workaround for that?

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG_DAMAGED Possible data damage: 1 pg inconsistent

2018-02-22 Thread Yoann Moulin
On 22/02/2018 at 05:23, Brad Hubbard wrote:
> On Wed, Feb 21, 2018 at 6:40 PM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>> Hello,
>>
>> I migrated my cluster from jewel to luminous 3 weeks ago (using ceph-ansible 
>> playbook), a few days after, ceph status told me "PG_DAMAGED
>> Possible data damage: 1 pg inconsistent", I tried to repair the PG without 
>> success, I tried to stop the OSD, flush the journal and restart the
>> OSDs but the OSD refuse to start due to a bad journal. I decided to destroy 
>> the OSD and recreated it from scratch. After that, everything seemed
>> to be all right, but, I just saw now I have exactly the same error again on 
>> the same PG on the same OSD (78).
>>
>>> $ ceph health detail
>>> HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
>>> OSD_SCRUB_ERRORS 3 scrub errors
>>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>>> pg 11.5f is active+clean+inconsistent, acting [78,154,170]
>>
>>> $ ceph -s
>>>   cluster:
>>> id: f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
>>> health: HEALTH_ERR
>>> 3 scrub errors
>>> Possible data damage: 1 pg inconsistent
>>>
>>>   services:
>>> mon: 3 daemons, quorum 
>>> iccluster002.iccluster.epfl.ch,iccluster010.iccluster.epfl.ch,iccluster018.iccluster.epfl.ch
>>> mgr: iccluster001(active), standbys: iccluster009, iccluster017
>>> mds: cephfs-3/3/3 up  
>>> {0=iccluster022.iccluster.epfl.ch=up:active,1=iccluster006.iccluster.epfl.ch=up:active,2=iccluster014.iccluster.epfl.ch=up:active}
>>> osd: 180 osds: 180 up, 180 in
>>> rgw: 6 daemons active
>>>
>>>   data:
>>> pools:   29 pools, 10432 pgs
>>> objects: 82862k objects, 171 TB
>>> usage:   515 TB used, 465 TB / 980 TB avail
>>> pgs: 10425 active+clean
>>>  6 active+clean+scrubbing+deep
>>>  1 active+clean+inconsistent
>>>
>>>   io:
>>> client:   21538 B/s wr, 0 op/s rd, 33 op/s wr
>>
>>> ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous 
>>> (stable)
>>
>> Short log :
>>
>>> 2018-02-21 09:08:33.408396 7fb7b8222700  0 log_channel(cluster) log [DBG] : 
>>> 11.5f repair starts
>>> 2018-02-21 09:08:33.727277 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
>>> 11.5f shard 78: soid 
>>> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head 
>>> omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 
>>> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9- 
>>> b494-57bdb48fab4e.314528.19:head(98394'20014544 osd.78.0:1623704 
>>> dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd  od d46bb5a1 
>>> alloc_hint [0 0 0])
>>> 2018-02-21 09:08:33.727290 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
>>> 11.5f shard 154: soid 
>>> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head 
>>> omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 
>>> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544
>>>  osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd 
>>>  od d46bb5a1 alloc_hint [0 0 0])
>>> 2018-02-21 09:08:33.727293 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
>>> 11.5f shard 170: soid 
>>> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head 
>>> omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 
>>> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544
>>>  osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd 
>>>  od d46bb5a1 alloc_hint [0 0 0])
>>> 2018-02-21 09:08:33.727295 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
>>> 11.5f soid 
>>> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head: 
>>> failed to pick suitable auth object
>>> 2018-02-21 09:08:33.727333 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
>>> 11.5f repair 3 errors, 0 fixed
>>
>> I set "debug_osd 20/20" on osd.78 and start the repair again, the log file 
>> is here :
>>
>> ceph-post-file: 1ccac8ea-0947-4fe4-90b1-32d1048548f1
>>
>> What can I do in that situation ?
> 
> Take a look and see if http://tracker.ceph.com/issues/21388 is
> relevant as well as the debugging and advice therein.

Indeed, it looks similar to my issue.

I sent a comment directly on the tracker, thanks.

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG_DAMAGED Possible data damage: 1 pg inconsistent

2018-02-21 Thread Yoann Moulin
Hello,

I migrated my cluster from Jewel to Luminous 3 weeks ago (using the ceph-ansible 
playbook). A few days later, ceph status told me "PG_DAMAGED Possible data 
damage: 1 pg inconsistent". I tried to repair the PG without success, then tried 
to stop the OSD, flush the journal and restart it, but the OSD refused to start 
due to a bad journal. I decided to destroy the OSD and recreate it from scratch. 
After that, everything seemed to be all right, but I have just noticed that I 
have exactly the same error again, on the same PG on the same OSD (78).

> $ ceph health detail
> HEALTH_ERR 3 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 3 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 11.5f is active+clean+inconsistent, acting [78,154,170]

> $ ceph -s
>   cluster:
> id: f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
> health: HEALTH_ERR
> 3 scrub errors
> Possible data damage: 1 pg inconsistent
>  
>   services:
> mon: 3 daemons, quorum 
> iccluster002.iccluster.epfl.ch,iccluster010.iccluster.epfl.ch,iccluster018.iccluster.epfl.ch
> mgr: iccluster001(active), standbys: iccluster009, iccluster017
> mds: cephfs-3/3/3 up  
> {0=iccluster022.iccluster.epfl.ch=up:active,1=iccluster006.iccluster.epfl.ch=up:active,2=iccluster014.iccluster.epfl.ch=up:active}
> osd: 180 osds: 180 up, 180 in
> rgw: 6 daemons active
>  
>   data:
> pools:   29 pools, 10432 pgs
> objects: 82862k objects, 171 TB
> usage:   515 TB used, 465 TB / 980 TB avail
> pgs: 10425 active+clean
>  6 active+clean+scrubbing+deep
>  1 active+clean+inconsistent
>  
>   io:
> client:   21538 B/s wr, 0 op/s rd, 33 op/s wr

> ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous 
> (stable)

Short log :

> 2018-02-21 09:08:33.408396 7fb7b8222700  0 log_channel(cluster) log [DBG] : 
> 11.5f repair starts
> 2018-02-21 09:08:33.727277 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
> 11.5f shard 78: soid 
> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head 
> omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 
> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9- 
> b494-57bdb48fab4e.314528.19:head(98394'20014544 osd.78.0:1623704 
> dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd  od d46bb5a1 
> alloc_hint [0 0 0])
> 2018-02-21 09:08:33.727290 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
> 11.5f shard 154: soid 
> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head 
> omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 
> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544
>  osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd 
>  od d46bb5a1 alloc_hint [0 0 0])
> 2018-02-21 09:08:33.727293 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
> 11.5f shard 170: soid 
> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head 
> omap_digest 0x29fdd712 != omap_digest 0xd46bb5a1 from auth oi 
> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head(98394'20014544
>  osd.78.0:1623704 dirty|omap|data_digest|omap_digest s 0 uv 20014543 dd 
>  od d46bb5a1 alloc_hint [0 0 0])
> 2018-02-21 09:08:33.727295 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
> 11.5f soid 
> 11:fb71fe10:::.dir.c9724aff-5fa0-4dd9-b494-57bdb48fab4e.314528.19:head: 
> failed to pick suitable auth object
> 2018-02-21 09:08:33.727333 7fb7b8222700 -1 log_channel(cluster) log [ERR] : 
> 11.5f repair 3 errors, 0 fixed

I set "debug_osd 20/20" on osd.78 and start the repair again, the log file is 
here :

ceph-post-file: 1ccac8ea-0947-4fe4-90b1-32d1048548f1

What can I do in that situation ?
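
For reference, the per-shard details of the inconsistency can be listed with (a 
sketch; it needs a recent deep-scrub of that PG):

# rados list-inconsistent-obj 11.5f --format=json-pretty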

Thanks for your help.

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous/Ubuntu 16.04 kernel recommendation ?

2018-02-04 Thread Yoann Moulin
Hello,

What is the best kernel for Luminous on Ubuntu 16.04 ?

Is linux-image-virtual-lts-xenial still the best one, or will 
linux-virtual-hwe-16.04 offer some improvement?

Thanks,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] [Docs] s/ceph-disk/ceph-volume/g ?

2017-12-04 Thread Yoann Moulin
Hello,

Since ceph-disk is now deprecated, it would be great to update the documentation 
to also cover the ceph-volume based processes.

for example :

add-or-rm-osds => 
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/

bluestore-migration => 
http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/

In my opinion, the documentation for the luminous branch should keep both 
options (ceph-disk and ceph-volume), but with a warning message encouraging 
people to use ceph-volume instead of ceph-disk.
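
For example (a sketch of the kind of mapping those pages could show, not the 
exact doc wording), where the current pages use ceph-disk prepare/activate, the 
ceph-volume equivalent is roughly:

# ceph-disk prepare /dev/sdX && ceph-disk activate /dev/sdX1    (old, deprecated)
# ceph-volume lvm create --data /dev/sdX                        (new, prepare + activate in one step)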

I guess there are plenty of references to ceph-disk that need to be updated.

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-disk is now deprecated

2017-11-29 Thread Yoann Moulin
On 27/11/2017 at 14:36, Alfredo Deza wrote:
> For the upcoming Luminous release (12.2.2), ceph-disk will be
> officially in 'deprecated' mode (bug fixes only). A large banner with
> deprecation information has been added, which will try to raise
> awareness.
> 
> We are strongly suggesting using ceph-volume for new (and old) OSD
> deployments. The only current exceptions to this are encrypted OSDs
> and FreeBSD systems
> 
> Encryption support is planned and will be coming soon to ceph-volume.
> 
> A few items to consider:
> 
> * ceph-disk is expected to be fully removed by the Mimic release
> * Existing OSDs are supported by ceph-volume. They can be "taken over" [0]
> * ceph-ansible already fully supports ceph-volume and will soon default to it
> * ceph-deploy support is planned and should be fully implemented soon
> 
> 
> [0] http://docs.ceph.com/docs/master/ceph-volume/simple/
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Would it be possible to update the "add-or-rm-osds" documentation to also 
include the process with ceph-volume? That would help adoption.

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/

This page should be updated as well with the ceph-volume commands.

http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/

Documentation (at least for master, maybe for luminous) should keep both options 
(ceph-disk and ceph-volume), but with a warning message encouraging people to 
use ceph-volume instead of ceph-disk.

I agree with the comments here saying that deprecating ceph-disk in a minor 
release is not what I would expect from a stable storage system, but I also 
understand the need to move forward with ceph-volume (and bluestore). I think 
keeping ceph-disk in Mimic is necessary, even without updates, just for 
compatibility with old scripts.

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph S3 nginx Proxy

2017-11-03 Thread Yoann Moulin
Hello,

>> I am trying to set up an ceph cluster with an s3 buckets setup with an
>> nignx proxy.
>>
>> I have the ceph and s3 parts working. :D
>>
>> when i run my php script through the nginx proxy i get an error
>> "> encoding="UTF-8"?>SignatureDoesNotMatch"
>>
>>
>> but direct it works fine.
>>
>> Has any one come across this before and can help out?
>
> My conf (may not be optimal):
> 
> server {
>   listen 443 ssl http2;
>   listen [::]:443 ssl http2;
>   server_name FQDN;
> 
>   ssl_certificate /etc/ssl/certs/FQDN.crt;
>   ssl_certificate_key /etc/ssl/private/FQDN.key;
>   add_header Strict-Transport-Security 'max-age=31536000; preload';
> 
>   location / {
>   include proxy_params;
>   proxy_redirect off;
>   proxy_pass http://127.0.0.1:1234;
>   client_max_body_size 0;
>   proxy_buffering off;
>   }
> }

By default in proxy_params, I don't see this line :

  proxy_set_header Host $host;

Here is the default proxy_params on Ubuntu 16.04:

$ cat proxy_params
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;

I don't know if "Host $http_host;" is equivalent to "Host $host;"

> And ceph's:
> [client.radosgw.gateway]
> host = rgw
> rgw_frontends = civetweb port=127.0.0.1:1234
> keyring = /etc/ceph/keyring.radosgw.gateway

In my rgw section I also have this :

  rgw dns name = 

That allows s3cmd to access buckets with a %(bucket)s.test.iccluster.epfl.ch 
style URL.
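
For reference, the matching client-side settings in ~/.s3cfg would be something 
like (hostname taken from the URL above; a sketch):

  host_base = test.iccluster.epfl.ch
  host_bucket = %(bucket)s.test.iccluster.epfl.ch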

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous : 3 clients failing to respond to cache pressure

2017-10-17 Thread Yoann Moulin

>> I have a luminous (12.2.1) cluster with 3 nodes for cephfs (no rbd or rgw) 
>> and we hit the "X clients failing to respond to cache pressure" message.
>> I have 3 mds servers active.
> 
> What type of client? Kernel? FUSE?
> 
> If it's a kernel client, what kernel are you running?

Kernel client, version 4.10.0-35-generic; it's for a Kubernetes environment.

https://kubernetes.io/docs/concepts/storage/volumes/#cephfs
https://github.com/kubernetes/examples/tree/master/staging/volumes/cephfs/

containers use this yaml template :

https://github.com/kubernetes/examples/blob/master/staging/volumes/cephfs/cephfs.yaml

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous : 3 clients failing to respond to cache pressure

2017-10-17 Thread Yoann Moulin
Hello,

I have a luminous (12.2.1) cluster with 3 nodes for cephfs (no rbd or rgw) and 
we hit the "X clients failing to respond to cache pressure" message.
I have 3 mds servers active.

Is this something I have to worry about ?

here some information about the cluster :

> root@iccluster054:~# ceph --cluster container -s
>   cluster:
> id: a294a95a-0baa-4641-81c1-7cd70fd93216
> health: HEALTH_WARN
> 3 clients failing to respond to cache pressure
>  
>   services:
> mon: 3 daemons, quorum 
> iccluster041.iccluster.epfl.ch,iccluster042.iccluster.epfl.ch,iccluster054.iccluster.epfl.ch
> mgr: iccluster042(active), standbys: iccluster054
> mds: cephfs-3/3/3 up  
> {0=iccluster054.iccluster.epfl.ch=up:active,1=iccluster041.iccluster.epfl.ch=up:active,2=iccluster042.iccluster.epfl.ch=up:active}
> osd: 18 osds: 18 up, 18 in
>  
>   data:
> pools:   3 pools, 544 pgs
> objects: 2357k objects, 564 GB
> usage:   2011 GB used, 65055 GB / 67066 GB avail
> pgs: 544 active+clean
>  



> root@iccluster041:~# ceph --cluster container daemon 
> mds.iccluster041.iccluster.epfl.ch perf dump mds
> {
> "mds": {
> "request": 193508283,
> "reply": 192815355,
> "reply_latency": {
> "avgcount": 192815355,
> "sum": 457371.475011160,
> "avgtime": 0.002372069
> },
> "forward": 692928,
> "dir_fetch": 1717132,
> "dir_commit": 43521,
> "dir_split": 4197,
> "dir_merge": 4244,
> "inode_max": 2147483647,
> "inodes": 11098,
> "inodes_top": 7668,
> "inodes_bottom": 3404,
> "inodes_pin_tail": 26,
> "inodes_pinned": 143,
> "inodes_expired": 138623,
> "inodes_with_caps": 87,
> "caps": 239,
> "subtrees": 15,
> "traverse": 195425369,
> "traverse_hit": 192867085,
> "traverse_forward": 692723,
> "traverse_discover": 476,
> "traverse_dir_fetch": 1714684,
> "traverse_remote_ino": 0,
> "traverse_lock": 6,
> "load_cent": 19465322425,
> "q": 0,
> "exported": 1211,
> "exported_inodes": 845556,
> "imported": 1082,
> "imported_inodes": 1209280
> }
> }


> root@iccluster054:~# ceph --cluster container daemon 
> mds.iccluster054.iccluster.epfl.ch perf dump mds
> {
> "mds": {
> "request": 267620366,
> "reply": 255792944,
> "reply_latency": {
> "avgcount": 255792944,
> "sum": 42256.407340600,
> "avgtime": 0.000165197
> },
> "forward": 11827411,
> "dir_fetch": 183,
> "dir_commit": 2607,
> "dir_split": 27,
> "dir_merge": 19,
> "inode_max": 2147483647,
> "inodes": 3740,
> "inodes_top": 2517,
> "inodes_bottom": 1149,
> "inodes_pin_tail": 74,
> "inodes_pinned": 143,
> "inodes_expired": 2103018,
> "inodes_with_caps": 57,
> "caps": 272,
> "subtrees": 8,
> "traverse": 267626346,
> "traverse_hit": 255796915,
> "traverse_forward": 11826902,
> "traverse_discover": 77,
> "traverse_dir_fetch": 30,
> "traverse_remote_ino": 0,
> "traverse_lock": 0,
> "load_cent": 26824996745,
> "q": 3,
> "exported": 1319,
> "exported_inodes": 2037400,
> "imported": 418,
> "imported_inodes": 7347
> }
> }

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to restrict a CephFS client to a subdirectory

2017-10-10 Thread Yoann Moulin

>> I am trying to follow the instructions at:
>> http://docs.ceph.com/docs/master/cephfs/client-auth/
>> to restrict a client to a subdirectory of  Ceph filesystem, but always get
>> an error.
>>
>> We are running the latest stable release of Ceph (v12.2.1) on CentOS 7
>> servers. The user 'hydra' has the following capabilities:
>> # ceph auth get client.hydra
>> exported keyring for client.hydra
>> [client.hydra]
>> key = AQ==
>> caps mds = "allow rw"
>> caps mgr = "allow r"
>> caps mon = "allow r"
>> caps osd = "allow rw"
>>
>> When I tried to restrict the client to only mount and work within the
>> directory /hydra of the Ceph filesystem 'pulpos', I got an error:
>> # ceph fs authorize pulpos client.hydra /hydra rw
>> Error EINVAL: key for client.dong exists but cap mds does not match
>>
>> I've tried a few combinations of user caps and CephFS client caps; but
>> always got the same error!
> 
> The "fs authorize" command isn't smart enough to edit existing
> capabilities safely, so it is cautious and refuses to overwrite what
> is already there.  If you remove your client.hydra user and try again,
> it should create it for you with the correct capabilities.

I confirm it works perfectly! It should be added to the documentation. :)

# ceph fs authorize cephfs client.foo1 /foo1 rw
[client.foo1]
key = XXX1
# ceph fs authorize cephfs client.foo2 / r /foo2 rw
[client.foo2]
key = XXX2

# ceph auth get client.foo1
exported keyring for client.foo1
[client.foo1]
key = XXX1
caps mds = "allow rw path=/foo1"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_data"

# ceph auth get client.foo2
exported keyring for client.foo2
[client.foo2]
key = XXX2
caps mds = "allow r, allow rw path=/foo2"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_data"

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unable to restrict a CephFS client to a subdirectory

2017-10-10 Thread Yoann Moulin
Hello,

> I am trying to follow the instructions at:
> http://docs.ceph.com/docs/master/cephfs/client-auth/
> to restrict a client to a subdirectory of  Ceph filesystem, but always get an 
> error.
> 
> We are running the latest stable release of Ceph (v12.2.1) on CentOS 7 
> servers. The user 'hydra' has the following capabilities:
> # ceph auth get client.hydra
> exported keyring for client.hydra
> [client.hydra]
>         key = AQ==
>         caps mds = "allow rw"
>         caps mgr = "allow r"
>         caps mon = "allow r"
>         caps osd = "allow rw"
> 
> When I tried to restrict the client to only mount and work within the 
> directory /hydra of the Ceph filesystem 'pulpos', I got an error:
> # ceph fs authorize pulpos client.hydra /hydra rw
> Error EINVAL: key for client.dong exists but cap mds does not match
> 
> I've tried a few combinations of user caps and CephFS client caps; but always 
> got the same error!
> 
> Has anyone able to get this to work? What is your recipe?

If the client runs an old kernel (4.4 is old, 4.10 is not), you need to give 
read access to the entire cephfs filesystem; otherwise, you won't be able to 
mount the subdirectory.

1/ give read access to the mds and rw to the subdirectory :

  # ceph auth get-or-create client.foo mon "allow r" osd "allow rw 
pool=cephfs_data" mds "allow r, allow rw path=/foo"

or, if client.foo already exists:

  # ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data" mds 
"allow r, allow rw path=/foo"

[client.foo]
key = XXX
caps mds = "allow r, allow rw path=/foo"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_data"

2/ you give read access to / and rw access to the subdirectory :

  # ceph fs authorize cephfs client.foo / r /foo rw

Then you get the secret key and mount :

  # ceph --cluster container auth get-key client.foo > foo.secret
  # mount.ceph mds1,mds2,mds3:/foo /foo -v -o 
name=foo,secretfile=/path/to/foo.secret

With an old kernel, you will always be able to mount the root of the cephfs fs.

  # mount.ceph mds1,mds2,mds3:/ /foo -v -o 
name=foo,secretfile=/path/to/foo.secret

If your client runs a more recent kernel, you can do this:

1/ you need to give access to the specific path only, like:

  # ceph auth get-or-create client.bar mon "allow r" osd "allow rw 
pool=cephfs_data" mds "allow rw path=/bar"

or, if client.bar already exists:

  # ceph auth caps client.bar mon "allow r" osd "allow rw pool=cephfs_data" mds 
"allow rw path=/bar"

[client.bar]
key = XXX
caps mds = "allow rw path=/bar"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_data"

2/ you give rw access only on the subdirectory :

  # ceph fs authorize cephfs client.bar /bar rw

Then you get the secret key and mount :

  # ceph --cluster container auth get-key client.bar > bar.secret
  # mount.ceph mds1,mds2,mds3:/bar /bar -v -o 
name=bar,secretfile=/path/to/bar.secret

if you try to mount the cephfs root, you should get an access denied

  # mount.ceph mds1,mds2,mds3:/ /bar -v -o 
name=bar,secretfile=/path/to/bar.secret


If you want to increase security, you might have a look at namespaces and file 
layouts:

http://docs.ceph.com/docs/master/cephfs/file-layouts/

I haven't looked into it yet, but it looks really interesting!
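
A minimal sketch of combining the two (client, pool and directory names are just 
the examples used above; the path is wherever the filesystem is mounted on an 
admin node):

# setfattr -n ceph.dir.layout.pool_namespace -v foo /mnt/cephfs/foo
# ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data namespace=foo" mds "allow rw path=/foo"

The first command pins the directory's data to a dedicated RADOS namespace, the 
second restricts the client's OSD cap to that namespace.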


> 
> Thanks,
> Shaw
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] zone, zonegroup and resharding bucket on luminous

2017-10-03 Thread Yoann Moulin
Hello,

>> I'm doing some tests on the radosgw on luminous (12.2.1), I have a few 
>> questions.
>>
>> In the documentation[1], there is a reference to "radosgw-admin region get" 
>> but it seems not to be available anymore.
>> It should be "radosgw-admin zonegroup get" I guess.
>>
>> 1. http://docs.ceph.com/docs/luminous/install/install-ceph-gateway/
>>
>> I have installed my luminous cluster with ceph-ansible playbook.
>>
>> but when I try to manipulate zonegroup or zone, I have this
>>
>>> # radosgw-admin zonegroup get
>>> failed to init zonegroup: (2) No such file or directory
> 
> try with --rgw-zonegroup=default
> 
>>> # radosgw-admin  zone get
>>> unable to initialize zone: (2) No such file or directory
>
> try with --rgw-zone=default
> 
>> I guessed it's because I don't have a realm set and not default zone and 
>> zonegroup ?
> 
> The default zone and zonegroup are  part of the realm so without a
> realm you cannot set them as defaults.
> This means you have to specifiy --rgw-zonegroup=default and --rgw-zone=default
>  I am guessing our documentation needs updating :(
> I think we can improve our behavior and make those command works
> without a realm , i.e return the default zonegroup and zone. I will
> open a tracker issue for that.

a bug seems to be already open :

http://tracker.ceph.com/issues/21583

>> Is that the default behaviour not to create default realm on a fresh radosgw 
>> ? Or is it a side effect of ceph-ansible installation ?
>>
> It is the default behavior, there is no default realm.
> 
>> I have a bucket that referred to a zonegroup but without realm. Can I create 
>> a default realm ? Is that safe for the bucket that has already been
>> uploaded ?
>>
> Yes You can create a realm and add the zonegroup to it.
> Don't forgot to do "radosgw-admin period update --commit" to commit the 
> changes.

I did that :

# radosgw-admin realm create --rgw-realm=default --default
{
"id": "b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a",
"name": "default",
"current_period": "e7bfcb5a-829b-418f-ae26-d6573a5cc8b9",
"epoch": 2
}

# radosgw-admin zonegroup modify 
--realm-id=b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a --rgw-zonegroup=default 
--default

# radosgw-admin zone modify --realm-id=b5cc8a8e-bd96-4b19-8cdd-e87a58ed518a 
--rgw-zone=default --default

# radosgw-admin period update --commit

and it works now, I can edit zone and zonegroup :)

>> On the "default" zonegroup (which is not set as default), the  
>> "bucket_index_max_shards" is set to "0", can I modify it without reaml ?
>>
> I just updated this section in this pr: 
> https://github.com/ceph/ceph/pull/18063

As discussed on IRC, I did that but ran into a bug:

# radosgw-admin bucket reshard process --bucket image-net --num-shards=150

=> http://tracker.ceph.com/issues/21619

Thanks,

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs : security questions?

2017-09-29 Thread Yoann Moulin

>> In cases like this you also want to set RADOS namespaces for each tenant’s 
>> directory in the CephFS layout and give them OSD access to only that
>> namespace. That will prevent malicious users from tampering with the raw 
>> RADOS objects of other users.
> 
> You mean by doing something like :
> 
> ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data 
> namespace=foo" mds "allow rw path=/foo" ?
> 
> [client.foo]
>         key = [snip]
>         caps mds = "allow rw path=/foo"
>         caps mon = "allow r"
>         caps osd = "allow rw pool=cephfs_data namespace=foo"
> 
> or you are referring also to :
> 
> http://docs.ceph.com/docs/master/cephfs/file-layouts/
> 
> Yes, both of those. The "auth caps" portion gives the client permission on 
> the OSD to access the namespace "foo". The file layouts place the
> CephFS file data into that namespace.

OK, I will have a look next week.

Thank you.

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] zone, zonegroup and resharding bucket on luminous

2017-09-29 Thread Yoann Moulin
; }
> }
> ],
> "metadata_heap": "",
> "tier_config": [],
> "realm_id": ""
> }

> # radosgw-admin metadata get bucket:image-net
> {
> "key": "bucket:image-net",
> "ver": {
> "tag": "_2_RFnI5pKQV7XEc5s2euJJW",
> "ver": 1
> },
> "mtime": "2017-08-28 12:27:35.629882Z",
> "data": {
> "bucket": {
> "name": "image-net",
> "marker": "69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1",
> "bucket_id": "69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1",
> "tenant": "",
> "explicit_placement": {
> "data_pool": "",
> "data_extra_pool": "",
> "index_pool": ""
> }
> },
> "owner": "rgwadmin",
> "creation_time": "2017-08-28 12:27:33.492997Z",
> "linked": "true",
> "has_bucket_info": "false"
> }
> }

> # radosgw-admin metadata get 
> bucket.instance:image-net:69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1
> {
> "key": 
> "bucket.instance:image-net:69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1",
> "ver": {
> "tag": "_HJUIdLuc8HJdxWhortpLiE7",
> "ver": 3
> },
> "mtime": "2017-09-26 14:14:47.749267Z",
> "data": {
> "bucket_info": {
> "bucket": {
> "name": "image-net",
> "marker": "69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1",
> "bucket_id": "69d2fd65-fcf9-461b-865f-3dbb053803c4.44353.1",
> "tenant": "",
> "explicit_placement": {
> "data_pool": "",
> "data_extra_pool": "",
> "index_pool": ""
> }
> },
> "creation_time": "2017-08-28 12:27:33.492997Z",
> "owner": "rgwadmin",
> "flags": 0,
> "zonegroup": "43d23097-56b9-48a6-ad52-de42341be4bd",
> "placement_rule": "default-placement",
> "has_instance_obj": "true",
> "quota": {
> "enabled": false,
> "check_on_raw": false,
> "max_size": -1,
> "max_size_kb": 0,
> "max_objects": -1
> },
> "num_shards": 0,
> "bi_shard_hash_type": 0,
> "requester_pays": "false",
> "has_website": "false",
> "swift_versioning": "false",
> "swift_ver_location": "",
> "index_type": 0,
> "mdsearch_config": [],
> "reshard_status": 0,
> "new_bucket_instance_id": ""
> },
> "attrs": [
> {
> "key": "user.rgw.acl",
> "val": 
> "AgKdAwIdCHJnd2FkbWluDQAAAFJhZG9zZ3cgQWRtaW4EA3QBAQgAAAByZ3dhZG1pbg8BCHJnd2FkbWluBQNBAgIEAAgAAAByZ3dhZG1pbgAAAgIEDw0AAABSYWRvc2d3IEFkbWluAA=="
> },
> {
> "key": "user.rgw.idtag",
> "val": ""
> }
> ]
> }
> }




-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs : security questions?

2017-09-29 Thread Yoann Moulin
Hi,

>>>>> Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96
>>>>>
>>>>> What is exactly an older kernel client ? 4.4 is old ?
>>>>>
>>>>> See
>>>>> http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version
>>>>>
>>>>> If you're on Ubuntu Xenial I would advise to use
>>>>> "linux-generic-hwe-16.04". Currently gives you 4.10.0-* kernel.
>>>>
>>>> OK, but I still cannot set caps without read access to "/" on cephfs 
>>>> volume, is there something else I must do ?
>>>>
>>>> # ceph auth get-or-create client.foo mon "allow r" osd "allow rw 
>>>> pool=cephfs_data" mds "allow rw path=/foo"
>>>> Error EINVAL: key for client.foo exists but cap mds does not match
>>>>
>>>> # ceph fs authorize cephfs client.foo /foo rw
>>>> Error EINVAL: key for client.foo exists but cap mds does not match
>>>
>>> Use "ceph auth list" to check the current caps for the client. With ceph
>>> auth caps (note, _not_ get-or-create) you can update the caps:
>>>
>>> ceph auth caps client.foo mon "allow r" osd "allow rw
>>> pool=cephfs_data" mds "allow rw path=/foo"
>>>
>>> The command should return "updated caps for client.foo"
>> 
>> oops, you're right I must use "ceph auth caps" and not "ceph auth 
>> get-or-create" 
>>
>> # ceph auth caps client.foo mon "allow r" osd "allow rw 
>> pool=cephfs_data" mds "allow rw path=/foo"
>> updated caps for client.foo
>
> In cases like this you also want to set RADOS namespaces for each tenant’s 
> directory in the CephFS layout and give them OSD access to only that
> namespace. That will prevent malicious users from tampering with the raw 
> RADOS objects of other users.

You mean by doing something like :

ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data 
namespace=foo" mds "allow rw path=/foo" ?

[client.foo]
key = [snip]
caps mds = "allow rw path=/foo"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_data namespace=foo"

or you are referring also to :

http://docs.ceph.com/docs/master/cephfs/file-layouts/

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs : security questions?

2017-09-29 Thread Yoann Moulin

>>>> Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96
>>>>
>>>> What is exactly an older kernel client ? 4.4 is old ?
>>>
>>> See
>>> http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version
>>>
>>> If you're on Ubuntu Xenial I would advise to use
>>> "linux-generic-hwe-16.04". Currently gives you 4.10.0-* kernel.
>>
>> OK, but I still cannot set caps without read access to "/" on cephfs volume, 
>> is there something else I must do ?
>>
>> # ceph auth get-or-create client.foo mon "allow r" osd "allow rw 
>> pool=cephfs_data" mds "allow rw path=/foo"
>> Error EINVAL: key for client.foo exists but cap mds does not match
>>
>> # ceph fs authorize cephfs client.foo /foo rw
>> Error EINVAL: key for client.foo exists but cap mds does not match
> 
> Use "ceph auth list" to check the current caps for the client. With ceph
> auth caps (note, _not_ get-or-create) you can update the caps:
> 
> ceph auth caps client.foo mon "allow r" osd "allow rw
> pool=cephfs_data" mds "allow rw path=/foo"
> 
> The command should return "updated caps for client.foo"

oops, you're right I must use "ceph auth caps" and not "ceph auth get-or-create"

so finally I did that :

# ceph auth caps client.foo mon "allow r" osd "allow rw pool=cephfs_data" mds 
"allow rw path=/foo"
updated caps for client.foo

# ceph fs authorize cephfs client.foo /foo rw
[client.foo]
key = [snip]

On the client :

# uname -a
Linux ntxvm006 4.10.0-33-generic #37~16.04.1-Ubuntu SMP Fri Aug 11 14:07:24 UTC 
2017 x86_64 x86_64 x86_64 GNU/Linux

# mount.ceph iccluster041,iccluster042,iccluster054:/ /mnt -v -o 
name=foo,secret=[snip]
parsing options: name=foo,secret=[snip]
mount error 13 = Permission denied

# mount.ceph iccluster041,iccluster042,iccluster054:/foo /mnt -v -o 
name=foo,secret=[snip]
parsing options: name=foo,secret=[snip]

# df /mnt
Filesystem                               1K-blocks     Used   Available Use% Mounted on
10.90.38.17,10.90.38.18,10.90.39.5:/foo 70324469760 26267648 70298202112   1% /mnt

It seems to work as I want.

Thanks a lot !

Cheers,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs : security questions?

2017-09-29 Thread Yoann Moulin

>> Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96
>>
>> What is exactly an older kernel client ? 4.4 is old ?
> 
> See
> http://docs.ceph.com/docs/master/cephfs/best-practices/#which-kernel-version
> 
> If you're on Ubuntu Xenial I would advise to use
> "linux-generic-hwe-16.04". Currently gives you 4.10.0-* kernel.

OK, but I still cannot set caps without read access to "/" on cephfs volume, is 
there something else I must do ?

# ceph auth get-or-create client.foo mon "allow r" osd "allow rw 
pool=cephfs_data" mds "allow rw path=/foo"
Error EINVAL: key for client.foo exists but cap mds does not match

# ceph fs authorize cephfs client.foo /foo rw
Error EINVAL: key for client.foo exists but cap mds does not match

Thanks,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs : security questions?

2017-09-29 Thread Yoann Moulin
>>>> We are working on a POC with containers (kubernetes) and cephfs (for 
>>>> permanent storage).
>>>>
>>>> The main idea is to give to a user access to a subdirectory of the 
>>>> cephfs but be sure he won't be able to access to the rest of the 
>>>> storage. As k8s works, the user will have access to the yml file 
>>>> where the cephfs mount point is defined. He will be able to change 
>>>> the subdirectory mounted inside the container (and set it to /). And 
>>>> inside the container, the user is root…
>>>>
>>>> So if even the user doesn't have access to the secret, he will be 
>>>> able to mount the whole cephfs volume with read access.
>>>>
>>>> Is there a possibility to have "root_squash" option on cephfs volume 
>>>> for a specific client.user + secret?
>>>>
>>>> Is it possible to allow a specific user to mount only /bla and 
>>>> disallow to mount the cephfs root "/"?
>>>>
>>>> Or is there another way to do that?
>>>
>>> Maybe this will get you started with the permissions for only this fs 
>>> path /smb
>>>
>>> sudo ceph auth get-or-create client.cephfs.smb mon 'allow r' mds 
>>> 'allow r, allow rw path=/smb' osd 'allow rwx pool=fs_meta,allow rwx 
>>> pool=fs_data'
>> 
>> What I currently do is :
>> 
>> mkdir /cephfs/foo
>> chown nobody:foogrp /cephfs/foo
>> chmod 770 /cephfs/foo
>> ceph auth get-or-create client.foo mon "allow r" osd "allow rw 
>> pool=cephfs_data" mds "allow r, allow rw path=/foo"
>> ceph fs authorize cephfs client.foo / r /foo rw
>> 
>> so I have this for client.foo
>> 
>> [client.foo]
>>  key = [secret]
>>  caps mds = "allow r, allow rw path=/foo"
>>  caps mon = "allow r"
>>  caps osd = "allow rw pool=cephfs_data"
>> 
>> With this, the user foo is able to mount the root of the cephfs and read 
>> everything, of course, he cannot modify but my problem here is he is 
>> still able to have read access to everything with uid=0.
> 
> I think that is because of the older kernel client, like mentioned here?
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39734.html

Kernels on client is 4.4.0-93 and on ceph node are 4.4.0-96

What is exactly an older kernel client ? 4.4 is old ?

if I remove "/ r" in the "auth caps" or "fs authorize" :

# ceph auth get-or-create client.foo mon "allow r" osd "allow rw 
pool=cephfs_data" mds "allow rw path=/foo"
Error EINVAL: key for client.foo exists but cap mds does not match

# ceph fs authorize cephfs client.foo /foo rw
Error EINVAL: key for client.foo exists but cap mds does not match

# ceph fs authorize cephfs client.foo / r /foo rw
[client.foo]
key = [secret]

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs : security questions?

2017-09-29 Thread Yoann Moulin

>> We are working on a POC with containers (kubernetes) and cephfs (for 
>> permanent storage).
>> 
>> The main idea is to give to a user access to a subdirectory of the 
>> cephfs but be sure he won't be able to access to the rest of the 
>> storage. As k8s works, the user will have access to the yml file where 
>> the cephfs mount point is defined. He will be able to change the 
>> subdirectory mounted inside the container (and set it to /). And inside 
>> the container, the user is root…
>> 
>> So if even the user doesn't have access to the secret, he will be able 
>> to mount the whole cephfs volume with read access.
>> 
>> Is there a possibility to have "root_squash" option on cephfs volume for 
>> a specific client.user + secret?
>> 
>> Is it possible to allow a specific user to mount only /bla and disallow 
>> to mount the cephfs root "/"?
>> 
>> Or is there another way to do that?
>
> Maybe this will get you started with the permissions for only this fs
> path /smb
>
> sudo ceph auth get-or-create client.cephfs.smb mon 'allow r' mds 'allow
> r, allow rw path=/smb' osd 'allow rwx pool=fs_meta,allow rwx
> pool=fs_data'

What I currently do is :

mkdir /cephfs/foo
chown nobody:foogrp /cephfs/foo
chmod 770 /cephfs/foo
ceph auth get-or-create client.foo mon "allow r" osd "allow rw 
pool=cephfs_data" mds "allow r, allow rw path=/foo"
ceph fs authorize cephfs client.foo / r /foo rw

so I have this for client.foo

[client.foo]
key = [secret]
caps mds = "allow r, allow rw path=/foo"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_data"

With this, the user foo is able to mount the root of the cephfs and read everything. Of course, he cannot modify anything, but my problem here is that he still has read access to everything with uid=0.

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephfs : security questions?

2017-09-29 Thread Yoann Moulin
Hello,

We are working on a POC with containers (kubernetes) and cephfs (for permanent 
storage).

The main idea is to give a user access to a subdirectory of the cephfs while making sure he won't be able to access the rest of the storage. Because of the way k8s works, the user will have access to the yml file where the cephfs mount point is defined. He will be able to change the subdirectory mounted inside the container (and set it to /). And inside the container, the user is root…

So even though the user doesn't have access to the secret, he will be able to mount the whole cephfs volume with read access.

Is there a possibility to have a "root_squash" option on a cephfs volume for a specific client.user + secret?

Is it possible to allow a specific user to mount only /bla and disallow mounting the cephfs root "/"?

Or is there another way to do that?

Thanks,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Minimum requirements to mount luminous cephfs ?

2017-09-27 Thread Yoann Moulin
Le 27/09/2017 à 15:15, David Turner a écrit :
> You can also use ceph-fuse instead of the kernel driver to mount cephfs. It 
> supports all of the luminous features.

OK thanks, I will try this later. I need to be able to mount the cephfs directly into containers, and I don't know yet what the best way to do it will be, so having multiple solutions will be great.

Thanks,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Minimum requirements to mount luminous cephfs ?

2017-09-27 Thread Yoann Moulin
Hello,

> Try to work with the tunables:
> 
> $ *ceph osd crush show-tunables*
> {
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 1,
>     "chooseleaf_stable": 0,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 54,
>     "profile": "hammer",
>     "optimal_tunables": 0,
>     "legacy_tunables": 0,
>     "minimum_required_version": "firefly",
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "has_v2_rules": 0,
>     "require_feature_tunables3": 1,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 0,
>     "require_feature_tunables5": 0,
>     "has_v5_rules": 0
> }
> 
> try to 'disable' the '*require_feature_tunables5*', with that I think you 
> should be ok, maybe there's another way, but that works for me. One
> way to change it, is to comment out in the crushmap the option "*tunable 
> chooseleaf_stable 1*" and inject the crushmap again in the cluster (of
> course that would produce on a lot of data moving on the pgs)

Thanks a lot, I removed the line "tunable chooseleaf_stable 1" from the 
crushmap and it works now !

root@iccluster013:~# df -h /mnt/
Filesystem                            Size  Used Avail Use% Mounted on
10.90.38.17,10.90.38.18,10.90.39.5:/   66T   19G   66T   1% /mnt
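
For reference, the workflow to edit and re-inject the crushmap is roughly the following (file names are just examples):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt and remove the line "tunable chooseleaf_stable 1"
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin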

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Minimum requirements to mount luminous cephfs ?

2017-09-27 Thread Yoann Moulin
Hello,

I try to mount a cephfs filesystem from fresh luminous cluster.

With the latest kernel 4.13.3, it works

> $ sudo mount.ceph 
> iccluster041.iccluster,iccluster042.iccluster,iccluster054.iccluster:/ /mnt 
> -v -o name=container001,secretfile=/tmp/secret
> parsing options: name=container001,secretfile=/tmp/secret

> $ df -h /mnt
> Filesystem                            Size  Used Avail Use% Mounted on
> 10.90.38.17,10.90.38.18,10.90.39.5:/   66T   19G   66T   1% /mnt


> root@iccluster054:~# ceph auth get client.container001
> exported keyring for client.container001
> [client.container001]
>   key = 
>   caps mds = "allow rw"
>   caps mon = "allow r"
>   caps osd = "allow rw pool=cephfs_data"

> root@iccluster05:~#:/var/log# ceph --cluster container fs authorize cephfs 
> client.container001 / rw
> [client.container001]
>   key = 

With the latest Ubuntu 16.04 LTS Kernel and ceph-common 12.2.0, I'm not able to 
mount it

> Linux iccluster013 4.4.0-96-generic #119~14.04.1-Ubuntu SMP Wed Sep 13 
> 08:40:48 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> ii  ceph-common  12.2.0-1trusty   
>   amd64common utilities to mount and interact with a ceph storage 
> cluster

> root@iccluster013:~# mount.ceph  iccluster041,iccluster042,iccluster054:/ 
> /mnt -v -o name=container001,secretfile=/tmp/secret
> parsing options: name=container001,secretfile=/tmp/secret
> mount error 110 = Connection timed out

here the dmesg :

> [  417.528621] Key type ceph registered
> [  417.528996] libceph: loaded (mon/osd proto 15/24)
> [  417.540534] FS-Cache: Netfs 'ceph' registered for caching
> [  417.540546] ceph: loaded (mds proto 32)
> [...]
> [ 2596.609885] libceph: mon1 10.90.38.18:6789 feature set mismatch, my 
> 107b84a842aca < server's 40107b84a842aca, missing 400
> [ 2596.626797] libceph: mon1 10.90.38.18:6789 missing required protocol 
> features
> [ 2606.960704] libceph: mon0 10.90.38.17:6789 feature set mismatch, my 
> 107b84a842aca < server's 40107b84a842aca, missing 400
> [ 2606.977621] libceph: mon0 10.90.38.17:6789 missing required protocol 
> features
> [ 2616.944998] libceph: mon0 10.90.38.17:6789 feature set mismatch, my 
> 107b84a842aca < server's 40107b84a842aca, missing 400
> [ 2616.961917] libceph: mon0 10.90.38.17:6789 missing required protocol 
> features
> [ 2626.961329] libceph: mon0 10.90.38.17:6789 feature set mismatch, my 
> 107b84a842aca < server's 40107b84a842aca, missing 400
> [ 2626.978290] libceph: mon0 10.90.38.17:6789 missing required protocol 
> features
> [ 2636.945765] libceph: mon0 10.90.38.17:6789 feature set mismatch, my 
> 107b84a842aca < server's 40107b84a842aca, missing 400
> [ 2636.962677] libceph: mon0 10.90.38.17:6789 missing required protocol 
> features
> [ 2646.962255] libceph: mon1 10.90.38.18:6789 feature set mismatch, my 
> 107b84a842aca < server's 40107b84a842aca, missing 4000000
> [ 2646.979228] libceph: mon1 10.90.38.18:6789 missing required protocol 
> features

Is there a specific option to set on the cephfs to be able to mount it with a kernel 4.4 ?

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Access to rbd with a user key

2017-09-26 Thread Yoann Moulin

>> ok, I don't know where I read the -o option to write the key but the file 
>> was empty I do a ">" and seems to work to list or create rbd now.
>>
>> and for what I have tested then, the good syntax is « mon 'profile rbd' osd 
>> 'profile rbd pool=rbd' »
>>
>>> In the case we give access to those rbd inside the container, how I can be 
>>> sure users in each container do not have access to others rbd ? Is
>>> the namespace good to isolate each user ?
>>
>> The question about namespace is still open, if I have a namespace in the osd 
>> caps, I can't create rbd volume. How I can isolate each client to
>> only his own volumes ?
> 
> Unfortunately, RBD doesn't currently support namespaces, but it's on
> our backlog.

So if I want to separate data between each container, I need to create a pool 
per user (one user can have multiple containers).

I'm going to take a look at cephfs; it seems possible to allow access only to a subdirectory per user, could you confirm that ?

Thanks,

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Access to rbd with a user key

2017-09-26 Thread Yoann Moulin
Hello,

> I try to give access to a rbd to a client on a fresh Luminous cluster
> 
> http://docs.ceph.com/docs/luminous/rados/operations/user-management/
> 
> first of all, I'd like to know the exact syntax for auth caps
> 
> the result of "ceph auth ls" give this :
> 
>> osd.9
>>  key: AQDjAsVZ+nI7NBAA14X9U5Xjunlk/9ovTht3Og==
>>  caps: [mgr] allow profile osd
>>  caps: [mon] allow profile osd
>>  caps: [osd] allow *
> 
> but in the documentation, it writes :
> 
>> osd 'profile {name} [pool={pool-name} [namespace={namespace-name}]]'
> 
> Does the "allow" needed before "profile" ? it's not clear
> 
> If I create a user like this :
> 
>> # ceph --cluster container auth get-or-create client.container001 \
>>  mon 'allow profile rbd' \
>>  osd 'allow profile rbd \
>>  pool=rbd namespace=container001' \
>>  -o /etc/ceph/container.client.container001.keyring

OK, I don't know where I read about the -o option to write out the key, but the file was empty; I used a ">" redirection instead, and listing or creating rbd images seems to work now.

And from what I have tested, the correct syntax is « mon 'profile rbd' osd 'profile rbd pool=rbd' »
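
For the record, the working combination looks roughly like this (the image name is just an example, and the keyring is written with a shell redirection as said above):

ceph --cluster container auth get-or-create client.container001 mon 'profile rbd' osd 'profile rbd pool=rbd' > /etc/ceph/container.client.container001.keyring
rbd --cluster container create --size 1024 rbd/container004 --id container001 --keyring /etc/ceph/container.client.container001.keyring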

> In the case we give access to those rbd inside the container, how I can be 
> sure users in each container do not have access to others rbd ? Is
> the namespace good to isolate each user ?

The question about namespaces is still open: if I have a namespace in the osd caps, I can't create an rbd volume. How can I isolate each client to only his own volumes ?

Thanks for your help

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Access to rbd with a user key

2017-09-26 Thread Yoann Moulin
Hello,

I try to give access to a rbd to a client on a fresh Luminous cluster

http://docs.ceph.com/docs/luminous/rados/operations/user-management/

first of all, I'd like to know the exact syntax for auth caps

the result of "ceph auth ls" give this :

> osd.9
>   key: AQDjAsVZ+nI7NBAA14X9U5Xjunlk/9ovTht3Og==
>   caps: [mgr] allow profile osd
>   caps: [mon] allow profile osd
>   caps: [osd] allow *

but in the documentation, it writes :

> osd 'profile {name} [pool={pool-name} [namespace={namespace-name}]]'

Does the "allow" needed before "profile" ? it's not clear

If I create a user like this :

> # ceph --cluster container auth get-or-create client.container001 \
>   mon 'allow profile rbd' \
>   osd 'allow profile rbd \
>   pool=rbd namespace=container001' \
>   -o /etc/ceph/container.client.container001.keyring

Is this user able to create an rbd volume ?

> # rbd --cluster container  create --size 1024 rbd/container003 --id 
> client.container001 --keyring /etc/ceph/container.client.container001.keyring 
> 2017-09-26 09:54:10.158234 7fbda23270c0  0 librados: 
> client.client.container001 authentication error (22) Invalid argument
> rbd: couldn't connect to the cluster!

In that case, client.client.container001 does not exist; I tried without the "client." prefix but it failed as well, with another error.

> # rbd --cluster container  create --size 1024 rbd/container003 --id 
> container001 --keyring /etc/ceph/container.client.container001.keyring 
> 2017-09-26 09:55:11.869745 7f10de6d30c0  0 librados: client.container001 
> authentication error (22) Invalid argument
> rbd: couldn't connect to the cluster!

it works if I create the rbd volume like :

> # rbd --cluster container  create --size 1024 rbd/container003

Then I can get rbd volume information with the admin key but not with the user 
key.

> # rbd --cluster container info rbd/container003  
> rbd image 'container003':
>   size 1024 MB in 256 objects
>   order 22 (4096 kB objects)
>   block_name_prefix: rbd_data.5f7c74b0dc51
>   format: 2
>   features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
>   flags: 
>   create_timestamp: Tue Sep 26 09:54:50 2017

> # rbd --cluster container info rbd/container003   --keyring 
> /etc/ceph/container.client.container001.keyring 
> 2017-09-26 09:58:29.864348 7f2fe60780c0  0 librados: client.admin 
> authentication error (22) Invalid argument
> rbd: couldn't connect to the cluster!

> # rbd --cluster container info rbd/container003   --keyring 
> /etc/ceph/container.client.container001.keyring  --id client.container001
> 2017-09-26 09:58:38.971827 7fcafa7aa0c0  0 librados: 
> client.client.container001 authentication error (22) Invalid argument
> rbd: couldn't connect to the cluster!

> # rbd --cluster container info rbd/container003   --keyring 
> /etc/ceph/container.client.container001.keyring  --id container001
> 2017-09-26 09:58:45.515253 7fbb0208c0c0  0 librados: client.container001 
> authentication error (22) Invalid argument
> rbd: couldn't connect to the cluster!

I might have missed something somewhere, but I don't know where.

Does the "rbd profile" give the capability to create rbd volumes to the user ? 
or it just gives the access to rbd volume previously create by
the admin ?

In the case where we give access to those rbd images inside the container, how can I be sure users in each container do not have access to others' rbd images ? Is the namespace a good way to isolate each user ?

I haven't used rbd a lot before and have never used client key capabilities, so it might be a bit confusing for me.

Thanks for your help

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] s3cmd not working with luminous radosgw

2017-09-21 Thread Yoann Moulin
Hi Matt,

>>>> Does anyone have tested s3cmd or other tools to manage ACL on luminous 
>>>> radosGW ?
>>>
>>> Don't know about ACL, but s3cmd for other things works for me.  Version 
>>> 1.6.1
>>
>> Finally, I found out what happened, I had 2 issues. One, on s3cmd config 
>> file, radosgw with luminous does not support signature v2 anymore, only
>> v4 is supported, I had to add this to my .s3cfg file :
> 
> V4 is supported, but to the best of my knowledge, you can use sigv2 if 
> desired.

Indeed, it seems to work in sigv2 :)

>> The second was in the rgw section into ceph.conf file. The line "rgw dns 
>> name" was missing.
> 
> Depending on your setup, "rgw dns name" may be required, yes.

in my case, it seems to be mandatory

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] s3cmd not working with luminous radosgw

2017-09-21 Thread Yoann Moulin
Hello,

>> Does anyone have tested s3cmd or other tools to manage ACL on luminous 
>> radosGW ?
> 
> Don't know about ACL, but s3cmd for other things works for me.  Version 1.6.1

Finally, I found out what happened; I had 2 issues. First, in the s3cmd config file: radosgw with Luminous does not support signature v2 anymore, only v4 is supported, so I had to add this to my .s3cfg file :

The second was in the rgw section of the ceph.conf file: the line "rgw dns name" was missing. I have deployed my cluster with ceph-ansible and it seems that I need a new option in the all.yml file.

I have added it manually and now it works (the ansible-playbook didn't add it, I must figure out why).
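
For the record, the missing line looks like this in the rgw section of ceph.conf (the section name and the FQDN below are only examples taken from my test setup):

[client.rgw.iccluster015]
rgw dns name = test.iccluster.epfl.ch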

Thanks for your help

Best regards,

Yoann Moulin

>>> I have a fresh luminous cluster in test and I made a copy of a bucket (4TB 
>>> 1.5M files) with rclone, I'm able to list/copy files with rclone but
>>> s3cmd does not work at all, it is just able to give the bucket list but I 
>>> can't list files neither update ACL.
>>>
>>> does anyone already test this ?
>>>
>>> root@iccluster012:~# rclone --version
>>> rclone v1.37
>>>
>>> root@iccluster012:~# s3cmd --version
>>> s3cmd version 2.0.0
>>>
>>>
>>> ### rclone ls files ###
>>>
>>> root@iccluster012:~# rclone ls testadmin:image-net/LICENSE
>>>  1589 LICENSE
>>> root@iccluster012:~#
>>>
>>> nginx (as revers proxy) log :
>>>
>>>> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
>>>> HTTP/1.1" 200 0 "-" "rclone/v1.37"
>>>> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "GET 
>>>> /image-net?delimiter=%2F=1024= HTTP/1.1" 200 779 "-" 
>>>> "rclone/v1.37"
>>>
>>> rgw logs :
>>>
>>>> 2017-09-15 10:30:02.620266 7ff1f58f7700  1 == starting new request 
>>>> req=0x7ff1f58f11f0 =
>>>> 2017-09-15 10:30:02.622245 7ff1f58f7700  1 == req done 
>>>> req=0x7ff1f58f11f0 op status=0 http_status=200 ==
>>>> 2017-09-15 10:30:02.622324 7ff1f58f7700  1 civetweb: 0x56061584b000: 
>>>> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
>>>> HTTP/1.0" 1 0 - rclone/v1.37
>>>> 2017-09-15 10:30:02.623361 7ff1f50f6700  1 == starting new request 
>>>> req=0x7ff1f50f01f0 =
>>>> 2017-09-15 10:30:02.689632 7ff1f50f6700  1 == req done 
>>>> req=0x7ff1f50f01f0 op status=0 http_status=200 ==
>>>> 2017-09-15 10:30:02.689719 7ff1f50f6700  1 civetweb: 0x56061585: 
>>>> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "GET 
>>>> /image-net?delimiter=%2F=1024= HTTP/1.0" 1 0 - rclone/v1.37
>>>
>>>
>>>
>>> ### s3cmds ls files ###
>>>
>>> root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls 
>>> s3://image-net/LICENSE
>>> root@iccluster012:~#
>>>
>>> nginx (as revers proxy) log :
>>>
>>>> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
>>>> http://test.iccluster.epfl.ch/image-net/?location HTTP/1.1" 200 127 "-" "-"
>>>> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
>>>> http://image-net.test.iccluster.epfl.ch/?delimiter=%2F=LICENSE 
>>>> HTTP/1.1" 200 318 "-" "-"
>>>
>>> rgw logs :
>>>
>>>> 2017-09-15 10:30:04.295355 7ff1f48f5700  1 == starting new request 
>>>> req=0x7ff1f48ef1f0 =
>>>> 2017-09-15 10:30:04.295913 7ff1f48f5700  1 == req done 
>>>> req=0x7ff1f48ef1f0 op status=0 http_status=200 ==
>>>> 2017-09-15 10:30:04.295977 7ff1f48f5700  1 civetweb: 0x560615855000: 
>>>> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET /image-net/?location 
>>>> HTTP/1.0" 1 0 - -
>>>> 2017-09-15 10:30:04.299303 7ff1f40f4700  1 == starting new request 
>>>> req=0x7ff1f40ee1f0 =
>>>> 2017-09-15 10:30:04.300993 7ff1f40f4700  1 == req done 
>>>> req=0x7ff1f40ee1f0 op status=0 http_status=200 ==
>>>> 2017-09-15 10:30:04.301070 7ff1f40f4700  1 civetweb: 0x56061585a000: 
>>>> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET 
>>>> /?delimiter=%2F=LICENSE HTTP/1.0" 1 0 - 
>>>
>>>
>>>
>>> ### s3cmd : list bucket ###
>>>
>>> root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3://
>>> 2017-08-

Re: [ceph-users] s3cmd not working with luminous radosgw

2017-09-19 Thread Yoann Moulin
Hello,

Has anyone tested s3cmd or other tools to manage ACLs on a Luminous radosgw ?

I have opened an issue on s3cmd too

https://github.com/s3tools/s3cmd/issues/919

Thanks for your help

Yoann

> I have a fresh luminous cluster in test and I made a copy of a bucket (4TB 
> 1.5M files) with rclone, I'm able to list/copy files with rclone but
> s3cmd does not work at all, it is just able to give the bucket list but I 
> can't list files neither update ACL.
> 
> does anyone already test this ?
> 
> root@iccluster012:~# rclone --version
> rclone v1.37
> 
> root@iccluster012:~# s3cmd --version
> s3cmd version 2.0.0
> 
> 
> ### rclone ls files ###
> 
> root@iccluster012:~# rclone ls testadmin:image-net/LICENSE
>  1589 LICENSE
> root@iccluster012:~#
> 
> nginx (as revers proxy) log :
> 
>> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
>> HTTP/1.1" 200 0 "-" "rclone/v1.37"
>> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "GET 
>> /image-net?delimiter=%2F=1024= HTTP/1.1" 200 779 "-" 
>> "rclone/v1.37"
> 
> rgw logs :
> 
>> 2017-09-15 10:30:02.620266 7ff1f58f7700  1 == starting new request 
>> req=0x7ff1f58f11f0 =
>> 2017-09-15 10:30:02.622245 7ff1f58f7700  1 == req done 
>> req=0x7ff1f58f11f0 op status=0 http_status=200 ==
>> 2017-09-15 10:30:02.622324 7ff1f58f7700  1 civetweb: 0x56061584b000: 
>> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
>> HTTP/1.0" 1 0 - rclone/v1.37
>> 2017-09-15 10:30:02.623361 7ff1f50f6700  1 == starting new request 
>> req=0x7ff1f50f01f0 =
>> 2017-09-15 10:30:02.689632 7ff1f50f6700  1 == req done 
>> req=0x7ff1f50f01f0 op status=0 http_status=200 ==
>> 2017-09-15 10:30:02.689719 7ff1f50f6700  1 civetweb: 0x56061585: 
>> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "GET 
>> /image-net?delimiter=%2F=1024= HTTP/1.0" 1 0 - rclone/v1.37
> 
> 
> 
> ### s3cmds ls files ###
> 
> root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls 
> s3://image-net/LICENSE
> root@iccluster012:~#
> 
> nginx (as revers proxy) log :
> 
>> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
>> http://test.iccluster.epfl.ch/image-net/?location HTTP/1.1" 200 127 "-" "-"
>> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
>> http://image-net.test.iccluster.epfl.ch/?delimiter=%2F=LICENSE 
>> HTTP/1.1" 200 318 "-" "-"
> 
> rgw logs :
> 
>> 2017-09-15 10:30:04.295355 7ff1f48f5700  1 == starting new request 
>> req=0x7ff1f48ef1f0 =
>> 2017-09-15 10:30:04.295913 7ff1f48f5700  1 == req done 
>> req=0x7ff1f48ef1f0 op status=0 http_status=200 ==
>> 2017-09-15 10:30:04.295977 7ff1f48f5700  1 civetweb: 0x560615855000: 
>> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET /image-net/?location 
>> HTTP/1.0" 1 0 - -
>> 2017-09-15 10:30:04.299303 7ff1f40f4700  1 == starting new request 
>> req=0x7ff1f40ee1f0 =
>> 2017-09-15 10:30:04.300993 7ff1f40f4700  1 == req done 
>> req=0x7ff1f40ee1f0 op status=0 http_status=200 ==
>> 2017-09-15 10:30:04.301070 7ff1f40f4700  1 civetweb: 0x56061585a000: 
>> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET 
>> /?delimiter=%2F=LICENSE HTTP/1.0" 1 0 - 
> 
> 
> 
> ### s3cmd : list bucket ###
> 
> root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3://
> 2017-08-28 12:27  s3://image-net
> root@iccluster012:~#
> 
> nginx (as revers proxy) log :
> 
>> ==> nginx/access.log <==
>> 10.90.37.13 - - [15/Sep/2017:10:36:10 +0200] "GET 
>> http://test.iccluster.epfl.ch/ HTTP/1.1" 200 318 "-" "-"
> 
> rgw logs :
> 
>> 2017-09-15 10:36:10.645354 7ff1f38f3700  1 == starting new request 
>> req=0x7ff1f38ed1f0 =
>> 2017-09-15 10:36:10.647419 7ff1f38f3700  1 == req done 
>> req=0x7ff1f38ed1f0 op status=0 http_status=200 ==
>> 2017-09-15 10:36:10.647488 7ff1f38f3700  1 civetweb: 0x56061585f000: 
>> 127.0.0.1 - - [15/Sep/2017:10:36:10 +0200] "GET / HTTP/1.0" 1 0 - -
> 
> 
> 
> ### rclone : list bucket ###
> 
> 
> root@iccluster012:~# rclone lsd testadmin:
>   -1 2017-08-28 12:27:33-1 image-net
> root@iccluster012:~#
> 
> nginx (as revers proxy) log :
> 
>> ==> nginx/access.log <==
>> 10.90.37.13 - - [15/Sep/2017:10:37:53 +0200] "GET / HTTP/1.1" 200 318 "-" 
>> "rclone/v1.37"
> 
> rgw logs :
> 
>> ==> ceph/luminous-rgw-iccluster015.log <==
>> 2017-09-15 10:37:53.005424 7ff1f28f1700  1 == starting new request 
>> req=0x7ff1f28eb1f0 =
>> 2017-09-15 10:37:53.007192 7ff1f28f1700  1 == req done 
>> req=0x7ff1f28eb1f0 op status=0 http_status=200 ==
>> 2017-09-15 10:37:53.007282 7ff1f28f1700  1 civetweb: 0x56061586e000: 
>> 127.0.0.1 - - [15/Sep/2017:10:37:53 +0200] "GET / HTTP/1.0" 1 0 - 
>> rclone/v1.37


-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] s3cmd not working with luminous radosgw

2017-09-15 Thread Yoann Moulin
Hello,

I have a fresh Luminous cluster in test and I made a copy of a bucket (4TB, 1.5M files) with rclone. I'm able to list/copy files with rclone, but s3cmd does not work at all: it is just able to give the bucket list, but I can't list files nor update ACLs.

Has anyone already tested this ?

root@iccluster012:~# rclone --version
rclone v1.37

root@iccluster012:~# s3cmd --version
s3cmd version 2.0.0


### rclone ls files ###

root@iccluster012:~# rclone ls testadmin:image-net/LICENSE
 1589 LICENSE
root@iccluster012:~#

nginx (as reverse proxy) log :

> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE 
> HTTP/1.1" 200 0 "-" "rclone/v1.37"
> 10.90.37.13 - - [15/Sep/2017:10:30:02 +0200] "GET 
> /image-net?delimiter=%2F=1024= HTTP/1.1" 200 779 "-" 
> "rclone/v1.37"

rgw logs :

> 2017-09-15 10:30:02.620266 7ff1f58f7700  1 == starting new request 
> req=0x7ff1f58f11f0 =
> 2017-09-15 10:30:02.622245 7ff1f58f7700  1 == req done req=0x7ff1f58f11f0 
> op status=0 http_status=200 ==
> 2017-09-15 10:30:02.622324 7ff1f58f7700  1 civetweb: 0x56061584b000: 
> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "HEAD /image-net/LICENSE HTTP/1.0" 
> 1 0 - rclone/v1.37
> 2017-09-15 10:30:02.623361 7ff1f50f6700  1 == starting new request 
> req=0x7ff1f50f01f0 =
> 2017-09-15 10:30:02.689632 7ff1f50f6700  1 == req done req=0x7ff1f50f01f0 
> op status=0 http_status=200 ==
> 2017-09-15 10:30:02.689719 7ff1f50f6700  1 civetweb: 0x56061585: 
> 127.0.0.1 - - [15/Sep/2017:10:30:02 +0200] "GET 
> /image-net?delimiter=%2F=1024= HTTP/1.0" 1 0 - rclone/v1.37



### s3cmds ls files ###

root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls 
s3://image-net/LICENSE
root@iccluster012:~#

nginx (as reverse proxy) log :

> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
> http://test.iccluster.epfl.ch/image-net/?location HTTP/1.1" 200 127 "-" "-"
> 10.90.37.13 - - [15/Sep/2017:10:30:04 +0200] "GET 
> http://image-net.test.iccluster.epfl.ch/?delimiter=%2F=LICENSE 
> HTTP/1.1" 200 318 "-" "-"

rgw logs :

> 2017-09-15 10:30:04.295355 7ff1f48f5700  1 == starting new request 
> req=0x7ff1f48ef1f0 =
> 2017-09-15 10:30:04.295913 7ff1f48f5700  1 == req done req=0x7ff1f48ef1f0 
> op status=0 http_status=200 ==
> 2017-09-15 10:30:04.295977 7ff1f48f5700  1 civetweb: 0x560615855000: 
> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET /image-net/?location 
> HTTP/1.0" 1 0 - -
> 2017-09-15 10:30:04.299303 7ff1f40f4700  1 == starting new request 
> req=0x7ff1f40ee1f0 =
> 2017-09-15 10:30:04.300993 7ff1f40f4700  1 == req done req=0x7ff1f40ee1f0 
> op status=0 http_status=200 ==
> 2017-09-15 10:30:04.301070 7ff1f40f4700  1 civetweb: 0x56061585a000: 
> 127.0.0.1 - - [15/Sep/2017:10:30:04 +0200] "GET 
> /?delimiter=%2F=LICENSE HTTP/1.0" 1 0 - 



### s3cmd : list bucket ###

root@iccluster012:~# s3cmd -v -c ~/.s3cfg-test-rgwadmin ls s3://
2017-08-28 12:27  s3://image-net
root@iccluster012:~#

nginx (as reverse proxy) log :

> ==> nginx/access.log <==
> 10.90.37.13 - - [15/Sep/2017:10:36:10 +0200] "GET 
> http://test.iccluster.epfl.ch/ HTTP/1.1" 200 318 "-" "-"

rgw logs :

> 2017-09-15 10:36:10.645354 7ff1f38f3700  1 == starting new request 
> req=0x7ff1f38ed1f0 =
> 2017-09-15 10:36:10.647419 7ff1f38f3700  1 == req done req=0x7ff1f38ed1f0 
> op status=0 http_status=200 ==
> 2017-09-15 10:36:10.647488 7ff1f38f3700  1 civetweb: 0x56061585f000: 
> 127.0.0.1 - - [15/Sep/2017:10:36:10 +0200] "GET / HTTP/1.0" 1 0 - -



### rclone : list bucket ###


root@iccluster012:~# rclone lsd testadmin:
  -1 2017-08-28 12:27:33-1 image-net
root@iccluster012:~#

nginx (as reverse proxy) log :

> ==> nginx/access.log <==
> 10.90.37.13 - - [15/Sep/2017:10:37:53 +0200] "GET / HTTP/1.1" 200 318 "-" 
> "rclone/v1.37"

rgw logs :

> ==> ceph/luminous-rgw-iccluster015.log <==
> 2017-09-15 10:37:53.005424 7ff1f28f1700  1 == starting new request 
> req=0x7ff1f28eb1f0 =
> 2017-09-15 10:37:53.007192 7ff1f28f1700  1 == req done req=0x7ff1f28eb1f0 
> op status=0 http_status=200 ==
> 2017-09-15 10:37:53.007282 7ff1f28f1700  1 civetweb: 0x56061586e000: 
> 127.0.0.1 - - [15/Sep/2017:10:37:53 +0200] "GET / HTTP/1.0" 1 0 - rclone/v1.37


Thanks for your help

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to change the owner of a bucket

2017-02-14 Thread Yoann Moulin
Dear list,

I was looking into how to change the owner of a bucket. There is a lack of documentation on that point (even the man page is not clear), but I found out how with the help of Orit.

> radosgw-admin metadata get bucket:
> radosgw-admin bucket link --uid= --bucket= 
> --bucket-id=
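
Spelled out with placeholders (the placeholder names are mine), the commands are of this form, the bucket id being the one reported by the metadata get command:

radosgw-admin metadata get bucket:<bucket-name>
radosgw-admin bucket link --uid=<new-owner-uid> --bucket=<bucket-name> --bucket-id=<bucket-id>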

this issue helped me : http://tracker.ceph.com/issues/14949

Also, in the radosgw-admin man page, unlink is described as "Remove a bucket"; what does "remove" mean in that case ? Delete ?

> Remove a bucket:
> $ radosgw-admin bucket unlink --bucket=foo

http://docs.ceph.com/docs/master/man/8/radosgw-admin/

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)

2016-12-19 Thread Yoann Moulin
Hello,

Finally, I found time to do some new benchmarks with the latest jewel release 
(10.2.5) on 4 nodes. Each node has 10 OSDs.

I ran "ceph tell osd.* bench" twice over the 40 OSDs; here are the average speeds :

4.2.0-42-generic  97.45 MB/s
4.4.0-53-generic  55.73 MB/s
4.8.15-040815-generic 62.41 MB/s
4.9.0-040900-generic  60.88 MB/s

I see the same behaviour, with at least a 35 to 40% performance drop between kernel 4.2 and kernels >= 4.4.

I can do further benches if needed.

Yoann

Le 26/07/2016 à 09:09, Lomayani S. Laizer a écrit :
> Hello,
> do you have journal on disk too ?
> 
> Yes am having journal on same hard disk.
> 
> ok and could you do bench with kernel 4.2 ? just to see if you have better
> throughput. Thanks
> 
> In ubuntu 14 I was running 4.2 kernel. the throughput was the same around 
> 80-90MB/s per osd. I cant tell the difference because each test gives
> the speeds on same range. I did not test kernel 4.4 in ubuntu 14
> 
> 
> --
> Lomayani
> 
> On Tue, Jul 26, 2016 at 9:39 AM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
> 
> Hello,
> 
> > Am running ubuntu 16 with kernel 4.4-0.31-generic and my speed are 
> similar.
> 
> do you have journal on disk too ?
> 
> > I did tests on ubuntu 14 and Ubuntu 16 and the speed is similar. I have 
> around
> > 80-90MB/s of OSD speeds in both operating systems
> 
> ok and could you do bench with kernel 4.2 ? just to see if you have better
> throughput. Thanks
> 
> > Only issue am observing now with ubuntu 16 is sometime osd fails on 
> rebooting
> > until i start them manually or adding starting commands in rc.local.
> 
> in my case, it's a test environment, so I don't have notice those 
> behaviours
> 
> --
> Yoann
> 
> > On Mon, Jul 25, 2016 at 6:45 PM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
> >
> > Hello,
> >
> > (this is a repost, my previous message seems to be slipping under 
> the radar)
> >
> > Does anyone get a similar behaviour to the one described below ?
> >
> > I found a big performance drop between kernel 3.13.0-88 (default 
> kernel on
> > Ubuntu Trusty 14.04) or kernel 4.2.0 and kernel 4.4.0.24.14 
> (default kernel on
> > Ubuntu Xenial 16.04)
> >
> > - ceph version is Jewel (10.2.2).
> > - All tests have been done under Ubuntu 14.04 on
> > - Each cluster has 5 nodes strictly identical.
> > - Each node has 10 OSDs.
> > - Journals are on the disk.
> >
> > Kernel 4.4 has a drop of more than 50% compared to 4.2
> > Kernel 4.4 has a drop of 40% compared to 3.13
> >
> > details below :
> >
> > With the 3 kernel I have the same performance on disks :
> >
> > Raw benchmark:
> > dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct=> 
> average ~230MB/s
> > dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct   => 
> average ~220MB/s
> >
> > Filesystem mounted benchmark:
> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1  => 
> average ~205MB/s
> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => 
> average ~214MB/s
> > dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => 
> average ~190MB/s
> >
> > Ceph osd Benchmark:
> > Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average  
> ~81MB/s
> > Kernel 4.2.0-38-generic  : ceph tell osd.ID bench => average 
> ~109MB/s
> > Kernel 4.4.0-24-generic  : ceph tell osd.ID bench => average  
> ~50MB/s
> >
> > I did new benchmarks then on 3 new fresh clusters.
> >
> > - Each cluster has 3 nodes strictly identical.
> > - Each node has 10 OSDs.
> > - Journals are on the disk.
> >
> > bench5 : Ubuntu 14.04 / Ceph Infernalis
> > bench6 : Ubuntu 14.04 / Ceph Jewel
> > bench7 : Ubuntu 16.04 / Ceph jewel
> >
> > this is the average of 2 runs of "ceph tell osd.* bench" on each 
> cluster (2 x 30
> > OSDs)
> >
> > bench5 / 14.04 / Infernalis / kernel 3.13 :  54.35 MB/s
> > bench6 / 14.04 / Jewel  /

Re: [ceph-users] stalls caused by scrub on jewel

2016-12-01 Thread Yoann Moulin
Hello,

> We're impacted by this bug (case 01725311). Our cluster is running RHCS 2.0 
> and is no more capable to scrub neither deep-scrub.
> 
> [1] http://tracker.ceph.com/issues/17859
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007
> [3] https://github.com/ceph/ceph/pull/11898
> 
> I'm worried we'll have to live with a cluster that can't scrub/deep-scrub 
> until March 2017 (ETA for RHCS 2.2 running Jewel 10.2.4).
> 
> Can we have this fix any sooner ?

As far as I know, that bug appears if you have big PGs; a workaround could be to increase the pg_num of the pool that has the biggest PGs.
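
A sketch of what that could look like (the pool name and target value below are only examples; pgp_num has to be raised along with pg_num):

ceph osd pool get default.rgw.buckets.data pg_num
ceph osd pool set default.rgw.buckets.data pg_num 8192
ceph osd pool set default.rgw.buckets.data pgp_num 8192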

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] index-sharding on existing bucket ?

2016-11-17 Thread Yoann Moulin
Hello,

Is it possible to shard the index of existing buckets ?

I have more than 100TB of data in a couple of buckets, and I'd like to avoid re-uploading everything.
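
(From what I have read, newer releases are getting an offline reshard command, something like the line below, but I haven't been able to test it on Jewel, so take it as a guess:)

radosgw-admin bucket reshard --bucket=mybucket --num-shards=64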

Thanks for your help,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How files are split into PGs ?

2016-11-11 Thread Yoann Moulin
Hello,

I have a 1GB file and 2 pools, one replicated and one EC 8+2, and I want to 
make a copy of this file through the radosgw with s3.
I'd like to know how this file will be split into PGs in both pools.

Some details for my use case :

12 hosts
10 OSDs per Host
failure domain set to Host
PG=1024

If I push this file through my radosgw, how can I find all the replicas on the OSDs ?
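
A sketch of how I imagine checking that (the pool name is just an example; the rados object names behind an S3 object start with the bucket marker, so it can be used as a grep prefix):

# list the rados objects that belong to the bucket
rados -p default.rgw.buckets.data ls | grep <bucket-marker>
# map one of those objects to its PG and the OSDs holding it
ceph osd map default.rgw.buckets.data <object-name>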

And another question: for really small files on an EC pool, the files will be replicated with k+m replicas, won't they ?

Thanks

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw - http status 400 while creating a bucket

2016-11-09 Thread Yoann Moulin
Hello,

> many thanks for your help. I've tried setting the zone to master, followed by 
> the period update --commit command. This is what i've had:

maybe it's related to this issue :

http://tracker.ceph.com/issues/16839 (fixed in Jewel 10.2.3)

or this one :

http://tracker.ceph.com/issues/17239

the "id" of the zonegroup shouldn't be "default" but an uuid afaik

Best regards

Yoann Moulin

> root@arh-ibstorage1-ib:~# radosgw-admin zonegroup get --rgw-zonegroup=default
> {
> "id": "default",
> "name": "default",
> "api_name": "",
> "is_master": "true",  
> "endpoints": [],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "default",
> "zones": [
> {
> "id": "default",
> "name": "default",
> "endpoints": [],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 0,
> "read_only": "false"
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": []
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "5b41b1b2-0f92-463d-b582-07552f83e66c"
> }
> 
> 
> root@arh-ibstorage1-ib:~# radosgw-admin period update --commit
> cannot commit period: period does not have a master zone of a master zonegroup
> failed to commit period: (22) Invalid argument
> 
> 
> root@arh-ibstorage1-ib:~# radosgw-admin zonegroup get --rgw-zonegroup=default
> {
> "id": "default",
> "name": "default",
> "api_name": "",
> "is_master": "true",  
> "endpoints": [],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "",
> "zones": [
> {
> "id": "default",
> "name": "default",
> "endpoints": [],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 0,
> "read_only": "false"
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": []
> }
> ],
> "default_placement": "default-placement",
> "realm_id": ""
> }
> 
> 
> 
> 
> 
> The strange thing as you can see, following the "radosgw-admin period update 
> --commit" command, the master_zone and the realm_id values reset to blank. 
> What could be causing this?
> 
> Here is my ceph infrastructure setup, perhaps it will help with finding the 
> issue?:
> 
> ceph osd and mon servers:
> arh-ibstorage1-ib (also radosgw server)
> arh-ibstorage2-ib (also radosgw server)
> arh-ibstorage3-ib
> 
> ceph mon server:
> arh-cloud13-ib
> 
> 
> 
> Thus, overall, i have 4 mon servers, 3 osd servers and 2 radosgw servers
> 
> Thanks
> 
> 
> 
> - Original Message -
>> From: "Yehuda Sadeh-Weinraub" <yeh...@redhat.com>
>> To: "Andrei Mikhailovsky" <and...@arhont.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
>> Sent: Wednesday, 9 November, 2016 17:12:30
>> Subject: Re: [ceph-users] radosgw - http status 400 while creating a bucket
> 
>> On Wed, Nov 9, 2016 at 1:30 AM, Andrei Mikhailovsky <and...@arhont.com> 
>> wrote:
>>> Hi Yehuda,
>>>
>>> just tried to run the command to set the master_zone to default followed by 
>>> the
>>> bucket create without doing the restart and I still have the same error on 
>>> the
>>> client:
>>>
>>> >> encoding="UTF-8"?>InvalidArgumentmy-new-bucket-31337tx00010-005822ebbd-9951ad8-default9951ad8-default-default
>>>
>>
>> After setting the master zone, try running:
>>
>> $ radosgw-admin period update --commit
>>
>> Yehuda
>>
>>>
>>> Andrei
>>>
>>> - Original Mess

Re: [ceph-users] rgw / s3website, MethodNotAllowed on Jewel 10.2.3

2016-10-26 Thread Yoann Moulin
Hello,

> I'm trying to get s3website working on one of our Rados Gateway 
> installations, and I'm having some problems finding out what needs to be
> done for this to work. It looks like this is a halfway secret feature, as I 
> can only find it briefly mentioned in the release notes for v10.0.4 - and
> nowhere in the documentation - so I've tried to wrap my head around this by 
> looking through the source code without much luck.
> 
> My cluster is running Jewel 10.2.3, and I've tried to enable the s3website 
> API specifically on the RGW-server. (But looking at the source
> code, it should be enabled by default)
>
> Using s3cmd --debug ws-create s3://acme.example.org, I get served with 405 
> Method Not Allowed
>
> DEBUG: Sending request method_string='PUT', uri='/?website', 
> headers={'x-amz-content-sha256': 
> '3fcf37205b114f03a910d11d74206358f1681381f0f9498b25aa1cc65e168937', 
> 'Authorization': 'AWS4-HMAC-SHA256 
> Credential=V4NZ37SLP3VOPR2BI5UW/20161026/US/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=4cbd6a7c26dc149fc8fb352dae2d42c27e9bdc254cecc467802941cfc0e200a2',
>  'x-amz-date': '20161026T094022Z'}, body=(159 bytes)
> DEBUG: Response: {'status': 405, 'headers': {'content-length': '195', 
> 'accept-ranges': 'bytes', 'server': 'Apache/2.4.6 (CentOS) 
> OpenSSL/1.0.1e-fips', 'connection': 'close', 'x-amz-request-id': 
> 'tx3-0058107a06-20d3274-default', 'date': 'Wed, 26 Oct 
> 2016 09:40:22 GMT', 'content-type': 'application/xml'}, 'reason': 'Method Not 
> Allowed', 'data': ' encoding="UTF-8"?>MethodNotAllowedtx3-0058107a06-20d3274-default20d3274-default-default'}
>  
> Has anyone have had any luck with this?

Does Apache send the $host variable to the backend ?

Something like "ProxyPreserveHost On" ?
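
For example, a minimal vhost of this kind (the server names and the civetweb backend port are assumptions):

<VirtualHost *:80>
    ServerName rgw.example.org
    ServerAlias *.rgw.example.org
    # pass the original Host header to radosgw so it can resolve bucket/website hostnames
    ProxyPreserveHost On
    ProxyPass        / http://127.0.0.1:7480/
    ProxyPassReverse / http://127.0.0.1:7480/
</VirtualHost>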

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HELP ! Cluster unusable with lots of "hitsuicidetimeout"

2016-10-19 Thread Yoann Moulin
Hello,

>>> We have a cluster in Jewel 10.2.2 under ubuntu 16.04. The cluster is 
>>> compose by 12 nodes, each nodes have 10 OSD with journal on disk.
>>>
>>> We have one rbd partition and a radosGW with 2 data pool, one replicated, 
>>> one EC (8+2)
>>>
>>> in attachment few details on our cluster.
>>>
>>> Currently, our cluster is not usable at all due to too much OSD 
>>> instability. OSDs daemon die randomly with "hit suicide timeout". 
>>> Yesterday, all
>>> of 120 OSDs died at least 12 time (max 74 time) with an average around 40 
>>> time
>>>
>>> here logs from ceph mon and from one OSD :
>>>
>>> http://icwww.epfl.ch/~ymoulin/ceph/cephprod.log.bz2 (6MB)
>>> http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.10.log.bz2 (6MB)
>>>
>>> We have stopped all clients i/o to see if the cluster get stable without 
>>> success, to avoid  endless rebalancing with OSD flapping, we had to
>>> "set noout" the cluster. For now we have no idea what's going on.
>>>
>>> Anyone can help us to understand what's happening ?
>>>
>>> thanks for your help
>>>
>> no specific ideas, but this somewhat sounds familiar.
>>
>> One thing first, you already stopped client traffic but to make sure your
>> cluster really becomes quiescent, stop all scrubs as well.
>> That's always a good idea in any recovery, overload situation.

this is what we did.

>> Have you verified CPU load (are those OSD processes busy), memory status,
>> etc?
>> How busy are the actual disks?

The CPU and memory do not seem to be overloaded; with the journal on the same disks, the disks may be a little bit busy.

>> Sudden deaths like this often are the results of network changes,  like a
>> switch rebooting and loosing jumbo frame configuration or whatnot.

We manage all the equipment of the cluster, and none of it has rebooted by itself. We decided to reboot node by node yesterday, but the switch is healthy.

In the logs I found that the problem started after I began to copy data to the RadosGW EC pool (8+2).

At the same time, we had 6 processes reading from the rbd partition; three of those processes were writing to a replicated pool through the RadosGW S3 of the cluster itself, one was also writing to an EC pool through the RadosGW S3, and the 2 others were not writing to the cluster.
Maybe that pressure slowed the disks down enough to trigger the suicide timeout of the OSDs ?

But now we have no more I/O on the cluster, and as soon as I re-enable scrubbing and rebalancing, OSDs start to fail again...

> just an additional comment:
> 
> you can disable backfilling and recovery temporarily by setting the 
> 'nobackfill' and 'norecover' flags. It will reduce the backfilling traffic
> and may help the cluster and its OSD to recover. Afterwards you should set 
> the backfill traffic settings to the minimum (e.g. max_backfills = 1)
> and unset the flags to allow the cluster to perform the outstanding recovery 
> operation.
>
> As the others already pointed out, these actions might help to get the 
> cluster up and running again, but you need to find the actual reason for
> the problems.

This is exactly what I want
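
For the record, a sketch of the corresponding commands (the injected values are just examples):

ceph osd set nobackfill
ceph osd set norecover
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# once things calm down
ceph osd unset nobackfill
ceph osd unset norecover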

Thanks for the help !

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Yoann Moulin
Hello,

>> We have a cluster in Jewel 10.2.2 under ubuntu 16.04. The cluster is compose 
>> by 12 nodes, each nodes have 10 OSD with journal on disk.
>>
>> We have one rbd partition and a radosGW with 2 data pool, one replicated, 
>> one EC (8+2)
>>
>> in attachment few details on our cluster.
>>
>> Currently, our cluster is not usable at all due to too much OSD instability. 
>> OSDs daemon die randomly with "hit suicide timeout". Yesterday, all
>> of 120 OSDs died at least 12 time (max 74 time) with an average around 40 
>> time
>>
>> here logs from ceph mon and from one OSD :
>>
>> http://icwww.epfl.ch/~ymoulin/ceph/cephprod.log.bz2 (6MB)
> 
> Do you have an older log showing the start of the incident? The
> cluster was already down when this log started.

Here the log from Saturday, OSD 134 is the first which had error :

http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.134.log.4.bz2
http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.10.log.4.bz2
http://icwww.epfl.ch/~ymoulin/ceph/cephprod.log.4.bz2

>> http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.10.log.bz2 (6MB)
> 
> In this log the thread which is hanging is doing deep-scrub:
> 
> 2016-10-18 22:16:23.985462 7f12da4af700  0 log_channel(cluster) log
> [INF] : 39.54 deep-scrub starts
> 2016-10-18 22:16:39.008961 7f12e4cc4700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f12da4af700' had timed out after 15
> 2016-10-18 22:18:54.175912 7f12e34c1700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7f12da4af700' had suicide timed out after 150
> 
> So you can disable scrubbing completely with
> 
>   ceph osd set noscrub
>   ceph osd set nodeep-scrub
> 
> in case you are hitting some corner case with the scrubbing code.

Now the cluster seems to be healthy, but as soon as I re-enable scrubbing and rebalancing, OSDs start to flap and the cluster switches to HEALTH_ERR.

cluster f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
  health HEALTH_WARN
 noout,noscrub,nodeep-scrub,sortbitwise flag(s) set
  monmap e1: 3 mons at
{iccluster002.iccluster.epfl.ch=10.90.37.3:6789/0,iccluster010.iccluster.epfl.ch=10.90.37.11:6789/0,iccluster018.iccluster.epfl.ch=10.90.37.19:6789/0}
 election epoch 64, quorum 0,1,2 
iccluster002.iccluster.epfl.ch,iccluster010.iccluster.epfl.ch,iccluster018.iccluster.epfl.ch
   fsmap e131: 1/1/1 up {0=iccluster022.iccluster.epfl.ch=up:active}, 2 
up:standby
  osdmap e72932: 144 osds: 144 up, 120 in
 flags noout,noscrub,nodeep-scrub,sortbitwise
   pgmap v4834810: 9408 pgs, 28 pools, 153 TB data, 75849 kobjects
 449 TB used, 203 TB / 653 TB avail
 9408 active+clean


>> We have stopped all clients i/o to see if the cluster get stable without 
>> success, to avoid  endless rebalancing with OSD flapping, we had to
>> "set noout" the cluster. For now we have no idea what's going on.
>>
>> Anyone can help us to understand what's happening ?
> 
> Is your network OK?

We have one 10G NIC for the private network and one 10G NIC for the public network. The network is far from being loaded right now and there are no errors. We don't use jumbo frames.

> It will be useful to see the start of the incident to better
> understand what caused this situation.
>
> Also, maybe useful for you... you can increase the suicide timeout, e.g.:
> 
>osd op thread suicide timeout: 
> 
> If the cluster is just *slow* somehow, then increasing that might
> help. If there is something systematically broken, increasing would
> just postpone the inevitable.

Ok, I'm going to study this option with my colleagues
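
A sketch of what that would look like (the value is only an example, and as said above it may just postpone the problem):

# in ceph.conf, [osd] section
osd op thread suicide timeout = 300

# or at runtime
ceph tell osd.* injectargs '--osd-op-thread-suicide-timeout 300'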

thanks

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] HELP ! Cluster unusable with lots of "hit suicide timeout"

2016-10-19 Thread Yoann Moulin
Dear List,

We have a cluster in Jewel 10.2.2 under Ubuntu 16.04. The cluster is composed of 12 nodes; each node has 10 OSDs with the journal on disk.

We have one rbd partition and a radosGW with 2 data pools, one replicated, one EC (8+2).

In attachment, a few details on our cluster.

Currently, our cluster is not usable at all due to too much OSD instability. OSD daemons die randomly with "hit suicide timeout". Yesterday, all of the 120 OSDs died at least 12 times (max 74 times), with an average around 40 times.

here logs from ceph mon and from one OSD :

http://icwww.epfl.ch/~ymoulin/ceph/cephprod.log.bz2 (6MB)
http://icwww.epfl.ch/~ymoulin/ceph/cephprod-osd.10.log.bz2 (6MB)

We have stopped all client I/O to see if the cluster gets stable, without success; to avoid endless rebalancing with OSD flapping, we had to "set noout" on the cluster. For now we have no idea what's going on.

Can anyone help us understand what's happening ?

thanks for your help

-- 
Yoann Moulin
EPFL IC-IT
$ ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

$ uname -a
Linux icadmin004 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 4927 flags hashpspool stripe_width 0
	removed_snaps [1~3]
pool 3 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 258 flags hashpspool stripe_width 0
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 259 flags hashpspool stripe_width 0
pool 5 'default.rgw.data.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 260 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 6 'default.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 261 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 7 'default.rgw.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 262 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 8 'erasure.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 271 flags hashpspool stripe_width 0
pool 9 'erasure.rgw.buckets.extra' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 272 flags hashpspool stripe_width 0
pool 11 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 276 flags hashpspool stripe_width 0
pool 12 'default.rgw.buckets.extra' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 277 flags hashpspool stripe_width 0
pool 14 'default.rgw.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 311 flags hashpspool stripe_width 0
pool 15 'default.rgw.users.keys' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 313 flags hashpspool stripe_width 0
pool 16 'default.rgw.meta' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 315 flags hashpspool stripe_width 0
pool 17 'default.rgw.users.swift' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 320 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 18 'default.rgw.users.email' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 322 owner 18446744073709551615 flags hashpspool stripe_width 0
pool 19 'default.rgw.usage' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 353 flags hashpspool stripe_width 0
pool 20 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 4918 flags hashpspool stripe_width 0
pool 26 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3549 flags hashpspool stripe_width 0
pool 27 '.rgw' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3551 flags hashpspool stripe_width 0
pool 28 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3552 flags hashpspool stripe_width 0
pool 29 '.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 3553 flags hashpspool stripe_width 0
pool 30 'test' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 4910 flags hashpspool stripe_width 0
pool 31 'data' replicated size 3 min_si

[ceph-users] Loop in radosgw-admin orphan find

2016-10-13 Thread Yoann Moulin
1 entries at orphan.scan.erasure.linked.19
> storing 1 entries at orphan.scan.erasure.linked.25
> storing 1 entries at orphan.scan.erasure.linked.40
> storing 1 entries at orphan.scan.erasure.linked.43
> storing 1 entries at orphan.scan.erasure.linked.47
> storing 1 entries at orphan.scan.erasure.linked.56
> storing 1 entries at orphan.scan.erasure.linked.63
> storing 1 entries at orphan.scan.erasure.linked.9
> storing 1 entries at orphan.scan.erasure.linked.25
> storing 1 entries at orphan.scan.erasure.linked.40
> storing 1 entries at orphan.scan.erasure.linked.56
> storing 1 entries at orphan.scan.erasure.linked.2
> storing 1 entries at orphan.scan.erasure.linked.5
> storing 1 entries at orphan.scan.erasure.linked.9
> storing 1 entries at orphan.scan.erasure.linked.19
> storing 1 entries at orphan.scan.erasure.linked.25
> storing 1 entries at orphan.scan.erasure.linked.40
> storing 1 entries at orphan.scan.erasure.linked.43
> storing 1 entries at orphan.scan.erasure.linked.47
> storing 1 entries at orphan.scan.erasure.linked.56
> storing 1 entries at orphan.scan.erasure.linked.63
> storing 1 entries at orphan.scan.erasure.linked.2
> storing 1 entries at orphan.scan.erasure.linked.5
> storing 1 entries at orphan.scan.erasure.linked.9
> storing 1 entries at orphan.scan.erasure.linked.19
> storing 1 entries at orphan.scan.erasure.linked.25
> storing 1 entries at orphan.scan.erasure.linked.40
> storing 1 entries at orphan.scan.erasure.linked.43
> storing 1 entries at orphan.scan.erasure.linked.47
> storing 1 entries at orphan.scan.erasure.linked.56
> storing 1 entries at orphan.scan.erasure.linked.63
> storing 1 entries at orphan.scan.erasure.linked.9
> storing 1 entries at orphan.scan.erasure.linked.25
> storing 1 entries at orphan.scan.erasure.linked.40
> storing 1 entries at orphan.scan.erasure.linked.56
> storing 1 entries at orphan.scan.erasure.linked.2
> storing 1 entries at orphan.scan.erasure.linked.5
> storing 1 entries at orphan.scan.erasure.linked.9
> storing 1 entries at orphan.scan.erasure.linked.19
> storing 1 entries at orphan.scan.erasure.linked.25
> storing 1 entries at orphan.scan.erasure.linked.40
> storing 1 entries at orphan.scan.erasure.linked.43
> storing 1 entries at orphan.scan.erasure.linked.47
> storing 1 entries at orphan.scan.erasure.linked.56
> storing 1 entries at orphan.scan.erasure.linked.63
> storing 1 entries at orphan.scan.erasure.linked.2
> storing 1 entries at orphan.scan.erasure.linked.5
> storing 1 entries at orphan.scan.erasure.linked.9
> storing 1 entries at orphan.scan.erasure.linked.19
> storing 1 entries at orphan.scan.erasure.linked.25
> storing 1 entries at orphan.scan.erasure.linked.40
> storing 1 entries at orphan.scan.erasure.linked.43
> storing 1 entries at orphan.scan.erasure.linked.47
> storing 1 entries at orphan.scan.erasure.linked.56
> storing 1 entries at orphan.scan.erasure.linked.63
> storing 1 entries at orphan.scan.erasure.linked.9
> storing 1 entries at orphan.scan.erasure.linked.25
> storing 1 entries at orphan.scan.erasure.linked.40
> storing 1 entries at orphan.scan.erasure.linked.56
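
For context, the command producing this output is the orphan scan; we run it roughly like
this (the pool name below is a guess at our EC data pool, the job-id is the one visible in
the object names above):

> radosgw-admin orphans find --pool=erasure.rgw.buckets.data --job-id=erasure

It keeps cycling over the same "storing 1 entries at orphan.scan.erasure.linked.*" lines
and never finishes.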

Thanks for your help

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph full cluster

2016-09-26 Thread Yoann Moulin
Hello,

> Yes, you are right!
> I've changed this for all pools, but not for last two!
> 
> pool 1 '.rgw.root' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
> rjenkins pg_num 8 pgp_num 8 last_change 27 owner
> 18446744073709551615 flags hashpspool strip
> e_width 0
> pool 2 'default.rgw.control' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 29 owner
> 18446744073709551615 flags hashps
> pool stripe_width 0
> pool 3 'default.rgw.data.root' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 31 owner
> 18446744073709551615 flags hash
> pspool stripe_width 0
> pool 4 'default.rgw.gc' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 33 owner
> 18446744073709551615 flags hashpspool
> stripe_width 0
> pool 5 'default.rgw.log' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 35 owner
> 18446744073709551615 flags hashpspool
> stripe_width 0
> pool 6 'default.rgw.users.uid' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 37 owner
> 18446744073709551615 flags hash
> pspool stripe_width 0
> pool 7 'default.rgw.users.keys' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 39 owner
> 18446744073709551615 flags has
> hpspool stripe_width 0
> pool 8 'default.rgw.meta' replicated size 2 min_size 2 crush_ruleset 0 
> object_hash rjenkins pg_num 8 pgp_num 8 last_change 41 owner
> 18446744073709551615 flags hashpspoo
> l stripe_width 0
> pool 9 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 
> 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 43 flags
> hashpspool stripe_width 0
> pool 10 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_ruleset 
> 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 45 flags
> hashpspool stripe_width 0

Be careful: if you set size 2 and min_size 2, your cluster will go into
HEALTH_ERR state if you lose even a single OSD. If you want to set "size 2" (which
is not recommended), you should set min_size to 1.
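
As a quick sketch of the command (pool names taken from your listing; repeat for every
pool that has size 2):

> ceph osd pool set .rgw.root min_size 1
> ceph osd pool set default.rgw.meta min_size 1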

Best Regards.

Yoann Moulin

> On Mon, Sep 26, 2016 at 2:05 PM, Burkhard Linke 
> <burkhard.li...@computational.bio.uni-giessen.de
> <mailto:burkhard.li...@computational.bio.uni-giessen.de>> wrote:
> 
> Hi,
> 
> 
> On 09/26/2016 12:58 PM, Dmitriy Lock wrote:
>> Hello all!
>> I need some help with my Ceph cluster.
>> I've installed ceph cluster with two physical servers with osd /data 40G 
>> on each.
>> Here is ceph.conf:
>> [global]
>> fsid = 377174ff-f11f-48ec-ad8b-ff450d43391c
>> mon_initial_members = vm35, vm36
>> mon_host = 192.168.1.35,192.168.1.36
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>>
>> osd pool default size = 2  # Write an object 2 times.
>> osd pool default min size = 1 # Allow writing one copy in a degraded 
>> state.
>>
>> osd pool default pg num = 200
>> osd pool default pgp num = 200
>>
>> Right after creation it was HEALTH_OK, and i've started with filling it. 
>> I've wrote 40G data to cluster using Rados gateway, but cluster
>> uses all avaiable space and keep growing after i've added two another 
>> osd - 10G /data1 on each server.
>> Here is tree output:
>> # ceph osd tree
>> ID WEIGHT  TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY  
>> -1 0.09756 root default 
>> -2 0.04878 host vm35
>> 0 0.03899 osd.0  up  1.0  1.0  
>> 2 0.00980 osd.2  up  1.0  1.0  
>> -3 0.04878 host vm36
>> 1 0.03899 osd.1  up  1.0  1.0  
>> 3 0.00980 osd.3  up  1.0  1.0 
>>
>> and health:
>> root@vm35:/etc# ceph health
>> HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck 
>> unclean; 15 pgs undersized; recovery 87176/300483 objects degraded
>> (29.012%); recovery 62272/300483 obj
>> ects misplaced (20.724%); 1 full osd(s); 2 near full osd(s); pool 
>> default.rgw.buckets.data has many more objects per pg than average (too
>> few pgs?)
>> root@vm35:/etc# ceph health detail
>> HEALTH_ERR 5 pgs backfill_toofull; 15 pgs degraded; 16 pgs stuck 
>> unclean; 15 pgs undersiz

Re: [ceph-users] RadosGW index-sharding on Jewel

2016-09-14 Thread Yoann Moulin
Hello,

> i currently set up my new test cluster (Jewel) and found out that the index
> sharding configuration had changed?
> 
> i did so far:
> 1. radosgw-admin realm create --rgw-realm=default --default
> 2. radosgw-admin zonegroup get --rgw-zonegroup=default > zonegroup.json
> 3. changed value "bucket_index_max_shards": 64
> 4. radosgw-admin zonegroup set --rgw-zonegroup=default < zonegroup.json
> 5. radosgw-admin region get --rgw-zonegroup=default > region.json
> 6. changed value "bucket_index_max_shards": 64
> 7. radosgw-admin region set --rgw-region=default --rgw-zone=default
> --rgw-zonegroup=default < region.json

As far as I know, region and zonegroup are the same in jewel :

http://docs.ceph.com/docs/jewel/radosgw/multisite/

« Zonegroup: A zonegroup consists of multiple zones, this approximately 
corresponds to what used to be called as a region in pre Jewel releases
for federated deployments. There should be a master zonegroup that will handle 
changes to the system configuration. »

> but buckets are created without sharding:
>  rados -p default.rgw.buckets.index ls | grep $(radosgw-admin metadata
> get bucket:images-eu-v1 | jq .data.bucket.bucket_id| tr -d '"')

On that point, I don't know; I have never configured index sharding.
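
That said, if I remember correctly there is also a plain config option that applies to
newly created buckets and avoids touching the zonegroup at all; something like this in
ceph.conf on the rgw hosts (to be verified, and the section name depends on how your
rgw instance is deployed):

[client.rgw.<instance>]
rgw_override_bucket_index_max_shards = 64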

Best Regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW : troubleshooting zone / zonegroup / period

2016-09-12 Thread Yoann Moulin
max_objects": -1
> },
> "num_shards": 0,
> "bi_shard_hash_type": 0,
> "requester_pays": "false",
> "has_website": "false",
> "swift_versioning": "false",
> "swift_ver_location": ""
> },
> "attrs": [
> {
> "key": "user.rgw.acl",
> "val":
> "AgKRAwIaCQAAAHJlcGxpY2F0ZQkAAAByZXBsaWNhdGUDA2sBAQkAAAByZXBsaWNhdGUPAQkAAAByZXBsaWNhdGUEAzoCAgQACQAAAHJlcGxpY2F0ZQAAAgIEDwkAAAByZXBsaWNhdGUAAA=="
> },
> {
> "key": "user.rgw.idtag",
> "val": ""
> },
> {
> "key": "user.rgw.manifest",
> "val": ""
> }
> ]
> }
> }

We try to fix this by creating a new zonegroup and zone with the right IDs, setting
them as default and deleting the others, but we fall back on the bug on
period update.

3. Troubleshooting #2

Restart the process from scratch:

We stop all the radosgw daemons, delete the .rgw.root pool, start the radosgw, and
create the realm again.

Then we decide to try creating the zonegroup and the zone from the json files we saved,
with the right IDs set.

We have to be careful to change the realm id in the 2 json files to the new one,
otherwise it won't work.

After editing the 2 files again

default_zonegroup.json
default_zone.json

we can create the zonegroup and zone like this:

> radosgw-admin zonegroup set --rgw-zonegroup default < default_zonegroup.json
> radosgw-admin zone set --rgw-zonegroup default --rgw-zone default < 
> default_zone.json

At this point, the new zonegroup and zone were successfully created, but their
IDs were not the ones in the json: during the set, radosgw-admin
created new IDs for both the zonegroup and the zone.

In this situation we are still not able to access the data. We have to start
again from scratch...

4. Troubleshooting #3

We decide to restart the process but leave the radosgw stopped; our intuition is
that the radosgw may affect the behaviour by creating the default zone and
zonegroup itself.

Finally, we did this:

Stop all RadosGW !

Purge the .rgw.root pool

> rados purge .rgw.root --yes-i-really-really-mean-it

Create a new realm and set it as default

> radosgw-admin realm create --rgw-realm=default --default

Edit the 2 json files to change the realm id to the new one

> vim default_zone.json #change realm with the new one
> vim default_zonegroup.json #change realm with the new one

Create the zonegroup and the zone like this (the order is really important
here!)

> radosgw-admin zonegroup set --rgw-zonegroup default < default_zonegroup.json
> radosgw-admin zone set --rgw-zonegroup default --rgw-zone default < 
> default_zone.json

Set zonegroup and zone as default

> radosgw-admin zonegroup default --rgw-zonegroup default
> radosgw-admin zone default --rgw-zone default

We can check that the zone and the zonegroup are correct by running:

> radosgw-admin zonegroup list
> radosgw-admin zonegroup get
> radosgw-admin zone list
> radosgw-admin zone get

We have to update the period (do not commit right away; first check that the data in
the update is correct)

> radosgw-admin period update

Then we can commit the period update to apply the configuration

> radosgw-admin period update --commit

We can now safely restart the radosgw !
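
To double check that everything is back, re-running the commands that were failing
before should be enough, for example:

> radosgw-admin period get
> radosgw-admin bucket list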

-- 
Yoann Moulin
EPFL IC-IT




default_zone.json
Description: application/json


default_zonegroup.json
Description: application/json
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Error : Error updating periodmap, multiple master zonegroups configured

2016-09-06 Thread Yoann Moulin
On 06/09/2016 at 11:13, Orit Wasserman wrote:
> you can try:
> radosgw-admin zonegroup modify --zonegroup-id  --master=false

I tried, but I don't have any zonegroup with this ID listed; the zonegroup with
this ID appears only in the zonegroup-map.

Anyway, I can do a zonegroup get --zonegroup-id
4d982760-7853-4174-8c05-cec2ef148cf0.

Maybe I should try to change the name of this zonegroup? Because I have 2 of them
with the same name but with 2 different IDs.

$ radosgw-admin zonegroup get --zonegroup-id 
4d982760-7853-4174-8c05-cec2ef148cf0
{
"id": "4d982760-7853-4174-8c05-cec2ef148cf0",
"name": "default",
"api_name": "",
"is_master": "false",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e",
"zones": [
{
"id": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "custom-placement",
"tags": []
},
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}

and the default :

$ radosgw-admin zonegroup get --zonegroup-id default
{
"id": "default",
"name": "default",
"api_name": "",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "",
"zones": [
{
"id": "default",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
    "realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}

$ radosgw-admin bucket list
2016-09-06 11:21:04.787391 7fb8a1f0b900  0 Error updating periodmap, multiple 
master zonegroups configured
2016-09-06 11:21:04.787407 7fb8a1f0b900  0 master zonegroup: 
4d982760-7853-4174-8c05-cec2ef148cf0 and  default
2016-09-06 11:21:04.787409 7fb8a1f0b900  0 ERROR: updating period map: (22) 
Invalid argument
2016-09-06 11:21:04.787424 7fb8a1f0b900  0 failed to add zonegroup to 
current_period: (22) Invalid argument
2016-09-06 11:21:04.787432 7fb8a1f0b900 -1 failed converting region to 
zonegroup : ret -22 (22) Invalid argument
couldn't init storage provider


> On Tue, Sep 6, 2016 at 11:08 AM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>> Hello Orit,
>>
>>> you have two (or more) zonegroups that are set as master.
>>
>> Yes I know, but I don't know how to fix this
>>
>>> First detect which zonegroup are the problematic
>>> get zonegroup list by running: radosgw-admin zonegroup list
>>
>> I only see one zonegroup :
>>
>> $ radosgw-admin zonegroup list
>> read_default_id : 0
>> {
>> "default_info": "default",
>> "zonegroups": [
>> "default"
>> ]
>> }
>>
>>> than on each zonegroup run:
>>> radosgw-admin zonegroup get --rgw-zonegroup 
>>> see in which is_master is true.
>>
>> $ radosgw-admin zonegroup get --rgw-zonegroup default
>> {
>> "id": "default",
>> "name": "default",
>> "api_name": "",
>> "is_master": "true",
>> "endpoints": [],
>> "hostnames": [],
>> "hostnames_s3website": [],
>> "master_zone": "",
>> "zones": [
>> {
>> "id": "default",
>> "name": "default",
>> "endpoints": [],

Re: [ceph-users] RadosGW Error : Error updating periodmap, multiple master zonegroups configured

2016-09-06 Thread Yoann Moulin
Hello Orit,

> you have two (or more) zonegroups that are set as master.

Yes I know, but I don't know how to fix this

> First detect which zonegroup are the problematic
> get zonegroup list by running: radosgw-admin zonegroup list

I only see one zonegroup :

$ radosgw-admin zonegroup list
read_default_id : 0
{
"default_info": "default",
"zonegroups": [
"default"
]
}

> than on each zonegroup run:
> radosgw-admin zonegroup get --rgw-zonegroup 
> see in which is_master is true.

$ radosgw-admin zonegroup get --rgw-zonegroup default
{
"id": "default",
"name": "default",
"api_name": "",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "",
"zones": [
{
"id": "default",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}


> Now you need to clear the master flag for all zonegroups except one,
> this can be done by running:
> radsogw-admin zonegroup modify --rgw-zonegroup  --master=false

If you check the files from my previous mail, metadata_zonegroup-map.json and
metadata_zonegroup.json, there is only one zonegroup with the name
"default", but in metadata_zonegroup.json the id is "default" while in
metadata_zonegroup-map.json it is "4d982760-7853-4174-8c05-cec2ef148cf0".

So for the zonegroup with the name "default", I have 2 different IDs; I guess
the problem is there.

Thanks for your help

Best regards

Yoann Moulin

> On Tue, Sep 6, 2016 at 9:22 AM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>> Dear List,
>>
>> I have an issue with my radosGW.
>>
>> ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
>> Linux cluster002 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 
>> UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>> Ubuntu 16.04 LTS
>>
>>> $ ceph -s
>>> cluster f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
>>>  health HEALTH_OK
>>>  monmap e1: 3 mons at 
>>> {cluster002.localdomain=10.90.37.3:6789/0,cluster010.localdomain=10.90.37.11:6789/0,cluster018.localdomain=10.90.37.19:6789/0}
>>> election epoch 40, quorum 0,1,2 
>>> cluster002.localdomain,cluster010.localdomain,cluster018.localdomain
>>>   fsmap e47: 1/1/1 up {0=cluster006.localdomain=up:active}, 2 up:standby
>>>  osdmap e3784: 144 osds: 144 up, 120 in
>>> flags sortbitwise
>>>   pgmap v1146863: 7024 pgs, 26 pools, 71470 GB data, 41466 kobjects
>>> 209 TB used, 443 TB / 653 TB avail
>>> 7013 active+clean
>>>7 active+clean+scrubbing+deep
>>>4 active+clean+scrubbing
>>
>> Example of the error message I have :
>>
>>> $ radosgw-admin bucket list
>>> 2016-09-06 09:04:14.810198 7fcbb01d5900  0 Error updating periodmap, 
>>> multiple master zonegroups configured
>>> 2016-09-06 09:04:14.810213 7fcbb01d5900  0 master zonegroup: 
>>> 4d982760-7853-4174-8c05-cec2ef148cf0 and  default
>>> 2016-09-06 09:04:14.810215 7fcbb01d5900  0 ERROR: updating period map: (22) 
>>> Invalid argument
>>> 2016-09-06 09:04:14.810230 7fcbb01d5900  0 failed to add zonegroup to 
>>> current_period: (22) Invalid argument
>>> 2016-09-06 09:04:14.810238 7fcbb01d5900 -1 failed converting region to 
>>> zonegroup : ret -22 (22) Invalid argument
>>
>> in attachment, you have the result of those commands :
>>
>>> $ radosgw-admin metadata zonegroup-map get > metadata_zonegroup-map.json
>>> $ radosgw-admin metadata zonegroup get > metadata_zonegroup.json
>>> $ radosgw-admin metadata zone get > metadata_zone.json
>>> $ radosgw-admin metadata region-map get > metadata_region-map.json
>>> $ radosgw-admin metadata region get >  metadata_region.json
>>> $ radosgw-admin zonegroup-map get > zonegroup-map.json
>>> $ radosgw-admin zonegroup get > zonegroup.json
>>> $ radosgw-admin zone get > zone.json
>>> $ radosgw-admin region-map get > region-map.json
>>> $ radosgw-admin region get > region.json
>>> $ radosgw-admin period get > period.json
>>> $ radosgw-admin period list > period_list.json
>>
>> I have 60TB of data in this RadosGW, can I fix this issue without having to 
>> repupload all those data ?
>>
>> Thanks for you help !
>>
>> Best regards
>>
>> --
>> Yoann Moulin
>> EPFL IC-IT
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>


-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW Error : Error updating periodmap, multiple master zonegroups configured

2016-09-06 Thread Yoann Moulin
Dear List,

I have an issue with my radosGW.

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
Linux cluster002 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 
UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 16.04 LTS

> $ ceph -s
> cluster f9dfd27f-c704-4d53-9aa0-4a23d655c7c4
>  health HEALTH_OK
>  monmap e1: 3 mons at 
> {cluster002.localdomain=10.90.37.3:6789/0,cluster010.localdomain=10.90.37.11:6789/0,cluster018.localdomain=10.90.37.19:6789/0}
> election epoch 40, quorum 0,1,2 
> cluster002.localdomain,cluster010.localdomain,cluster018.localdomain
>   fsmap e47: 1/1/1 up {0=cluster006.localdomain=up:active}, 2 up:standby
>  osdmap e3784: 144 osds: 144 up, 120 in
> flags sortbitwise
>   pgmap v1146863: 7024 pgs, 26 pools, 71470 GB data, 41466 kobjects
> 209 TB used, 443 TB / 653 TB avail
> 7013 active+clean
>7 active+clean+scrubbing+deep
>4 active+clean+scrubbing

An example of the error messages I get:

> $ radosgw-admin bucket list
> 2016-09-06 09:04:14.810198 7fcbb01d5900  0 Error updating periodmap, multiple 
> master zonegroups configured 
> 2016-09-06 09:04:14.810213 7fcbb01d5900  0 master zonegroup: 
> 4d982760-7853-4174-8c05-cec2ef148cf0 and  default
> 2016-09-06 09:04:14.810215 7fcbb01d5900  0 ERROR: updating period map: (22) 
> Invalid argument
> 2016-09-06 09:04:14.810230 7fcbb01d5900  0 failed to add zonegroup to 
> current_period: (22) Invalid argument
> 2016-09-06 09:04:14.810238 7fcbb01d5900 -1 failed converting region to 
> zonegroup : ret -22 (22) Invalid argument

In attachment, you have the results of these commands:

> $ radosgw-admin metadata zonegroup-map get > metadata_zonegroup-map.json
> $ radosgw-admin metadata zonegroup get > metadata_zonegroup.json
> $ radosgw-admin metadata zone get > metadata_zone.json
> $ radosgw-admin metadata region-map get > metadata_region-map.json
> $ radosgw-admin metadata region get >  metadata_region.json 
> $ radosgw-admin zonegroup-map get > zonegroup-map.json
> $ radosgw-admin zonegroup get > zonegroup.json
> $ radosgw-admin zone get > zone.json
> $ radosgw-admin region-map get > region-map.json
> $ radosgw-admin region get > region.json
> $ radosgw-admin period get > period.json
> $ radosgw-admin period list > period_list.json

I have 60TB of data in this RadosGW; can I fix this issue without having to
re-upload all that data?

Thanks for your help!

Best regards

-- 
Yoann Moulin
EPFL IC-IT


metadata_region-map.json
Description: application/json


metadata_zonegroup-map.json
Description: application/json


region.json
Description: application/json


region-map.json
Description: application/json


metadata_region.json
Description: application/json


zonegroup.json
Description: application/json


zone.json
Description: application/json


zonegroup-map.json
Description: application/json


metadata_zone.json
Description: application/json


metadata_zonegroup.json
Description: application/json


period_list.json
Description: application/json


period.json
Description: application/json
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW zonegroup id error

2016-09-05 Thread Yoann Moulin
Hello,

>>> I have an issue with the default zonegroup on my cluster (Jewel 10.2.2), I 
>>> don't
>>> know when this occured, but I think I did a wrong command during the
>>> manipulation of zones and regions. Now the ID of my zonegroup is "default"
>>> instead of "4d982760-7853-4174-8c05-cec2ef148cf0", I cannot update zones or
>>> regions anymore.
>>>
>>> Is that possible to change the ID of the zonegroup, I try to update the json
>>> then set the zonegroup but it doesn't work (certainly because it's not the 
>>> same
>>> ID...)
>>
>> if I create a new zonegroup then set as the default zonegroup, update the 
>> zonegroup-map, zone etc, then delete the zonegroup with the ID
>> "default" it should work ?
>>
> It should work. Do you have any existing data on the zone group?

There is only one zonegroup, so I guess yes.

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW zonegroup id error

2016-09-02 Thread Yoann Moulin
Hello,

> I have an issue with the default zonegroup on my cluster (Jewel 10.2.2), I 
> don't
> know when this occured, but I think I did a wrong command during the
> manipulation of zones and regions. Now the ID of my zonegroup is "default"
> instead of "4d982760-7853-4174-8c05-cec2ef148cf0", I cannot update zones or
> regions anymore.
> 
> Is that possible to change the ID of the zonegroup, I try to update the json
> then set the zonegroup but it doesn't work (certainly because it's not the 
> same
> ID...)

If I create a new zonegroup, set it as the default zonegroup, update the
zonegroup-map, zone, etc., and then delete the zonegroup with the ID
"default", should it work?

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW zonegroup id error

2016-09-01 Thread Yoann Moulin
Hello,

I have an issue with the default zonegroup on my cluster (Jewel 10.2.2). I don't
know when this occurred, but I think I ran a wrong command while
manipulating zones and regions. Now the ID of my zonegroup is "default"
instead of "4d982760-7853-4174-8c05-cec2ef148cf0", and I cannot update zones or
regions anymore.

Is it possible to change the ID of the zonegroup? I tried to update the json and
then set the zonegroup, but it doesn't work (certainly because it's not the same
ID...).

see below the zonegroup and zonegroup-map metadata

$ radosgw-admin zonegroup get
{
"id": "default",
"name": "default",
"api_name": "",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "",
"zones": [
{
"id": "default",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}

$ radosgw-admin zonegroup-map get
{
"zonegroups": [
{
"key": "4d982760-7853-4174-8c05-cec2ef148cf0",
"val": {
"id": "4d982760-7853-4174-8c05-cec2ef148cf0",
"name": "default",
"api_name": "",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e",
"zones": [
{
"id": "c9724aff-5fa0-4dd9-b494-57bdb48fab4e",
"name": "default",
"endpoints": [],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
        "placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "ccc2e663-66d3-49a6-9e3a-f257785f2d9a"
}
}
],
"master_zonegroup": "4d982760-7853-4174-8c05-cec2ef148cf0",
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
}

Thanks for your help,

Best regards

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)

2016-07-26 Thread Yoann Moulin
Hello Mark,

> FWIW, on CentOS7 I actually saw a performance increase when upgrading from the
> stock 3.10 kernel to 4.4.5 with Intel P3700 NVMe devices.  I was encountering
> some kind of strange concurrency/locking issues at the driver level that 4.4.5
> resolved.  I think your best bet is to try different intermediate kernels, 
> track
> it down as much as you can and then look through the kernel changelog.

The point here is that I have only installed kernels from the linux-image-virtual-lts
package; for my future environment I expect to stay on the LTS kernel packages
maintained by the security team.

Anyway, I'm still testing, so I can try intermediate kernels to find the one where the
regression starts.

> Sorry I can't be of more help!

no problems :)

--
Yoann

> On 07/25/2016 10:45 AM, Yoann Moulin wrote:
>> Hello,
>>
>> (this is a repost, my previous message seems to be slipping under the radar)
>>
>> Does anyone get a similar behaviour to the one described below ?
>>
>> I found a big performance drop between kernel 3.13.0-88 (default kernel on
>> Ubuntu Trusty 14.04) or kernel 4.2.0 and kernel 4.4.0.24.14 (default kernel 
>> on
>> Ubuntu Xenial 16.04)
>>
>> - ceph version is Jewel (10.2.2).
>> - All tests have been done under Ubuntu 14.04 on
>> - Each cluster has 5 nodes strictly identical.
>> - Each node has 10 OSDs.
>> - Journals are on the disk.
>>
>> Kernel 4.4 has a drop of more than 50% compared to 4.2
>> Kernel 4.4 has a drop of 40% compared to 3.13
>>
>> details below :
>>
>> With the 3 kernel I have the same performance on disks :
>>
>> Raw benchmark:
>> dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct=> average 
>> ~230MB/s
>> dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct   => average 
>> ~220MB/s
>>
>> Filesystem mounted benchmark:
>> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1  => average 
>> ~205MB/s
>> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average 
>> ~214MB/s
>> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average 
>> ~190MB/s
>>
>> Ceph osd Benchmark:
>> Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average  ~81MB/s
>> Kernel 4.2.0-38-generic  : ceph tell osd.ID bench => average ~109MB/s
>> Kernel 4.4.0-24-generic  : ceph tell osd.ID bench => average  ~50MB/s
>>
>> I did new benchmarks then on 3 new fresh clusters.
>>
>> - Each cluster has 3 nodes strictly identical.
>> - Each node has 10 OSDs.
>> - Journals are on the disk.
>>
>> bench5 : Ubuntu 14.04 / Ceph Infernalis
>> bench6 : Ubuntu 14.04 / Ceph Jewel
>> bench7 : Ubuntu 16.04 / Ceph jewel
>>
>> this is the average of 2 runs of "ceph tell osd.* bench" on each cluster (2 
>> x 30
>> OSDs)
>>
>> bench5 / 14.04 / Infernalis / kernel 3.13 :  54.35 MB/s
>> bench6 / 14.04 / Jewel  / kernel 3.13 :  86.47 MB/s
>>
>> bench5 / 14.04 / Infernalis / kernel 4.2  :  63.38 MB/s
>> bench6 / 14.04 / Jewel  / kernel 4.2  : 107.75 MB/s
>> bench7 / 16.04 / Jewel  / kernel 4.2  : 101.54 MB/s
>>
>> bench5 / 14.04 / Infernalis / kernel 4.4  :  53.61 MB/s
>> bench6 / 14.04 / Jewel  / kernel 4.4  :  65.82 MB/s
>> bench7 / 16.04 / Jewel  / kernel 4.4  :  61.57 MB/s
>>
>> If needed, I have the raw output of "ceph tell osd.* bench"
>>
>> Best regards
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)

2016-07-26 Thread Yoann Moulin
Hello,

> Am running ubuntu 16 with kernel 4.4-0.31-generic and my speed are similar.

Do you have journals on disk too?

> I did tests on ubuntu 14 and Ubuntu 16 and the speed is similar. I have around
> 80-90MB/s of OSD speeds in both operating systems

OK, and could you run the bench with kernel 4.2? Just to see if you get better
throughput. Thanks.

> Only issue am observing now with ubuntu 16 is sometime osd fails on rebooting
> until i start them manually or adding starting commands in rc.local.

In my case, it's a test environment, so I haven't noticed those behaviours.

--
Yoann

> On Mon, Jul 25, 2016 at 6:45 PM, Yoann Moulin <yoann.mou...@epfl.ch
> <mailto:yoann.mou...@epfl.ch>> wrote:
> 
> Hello,
> 
> (this is a repost, my previous message seems to be slipping under the 
> radar)
> 
> Does anyone get a similar behaviour to the one described below ?
> 
> I found a big performance drop between kernel 3.13.0-88 (default kernel on
> Ubuntu Trusty 14.04) or kernel 4.2.0 and kernel 4.4.0.24.14 (default 
> kernel on
> Ubuntu Xenial 16.04)
> 
> - ceph version is Jewel (10.2.2).
> - All tests have been done under Ubuntu 14.04 on
> - Each cluster has 5 nodes strictly identical.
> - Each node has 10 OSDs.
> - Journals are on the disk.
> 
> Kernel 4.4 has a drop of more than 50% compared to 4.2
> Kernel 4.4 has a drop of 40% compared to 3.13
> 
> details below :
> 
> With the 3 kernel I have the same performance on disks :
> 
> Raw benchmark:
> dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct=> average 
> ~230MB/s
> dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct   => average 
> ~220MB/s
> 
> Filesystem mounted benchmark:
> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1  => average 
> ~205MB/s
> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average 
> ~214MB/s
> dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average 
> ~190MB/s
> 
> Ceph osd Benchmark:
> Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average  ~81MB/s
> Kernel 4.2.0-38-generic  : ceph tell osd.ID bench => average ~109MB/s
> Kernel 4.4.0-24-generic  : ceph tell osd.ID bench => average  ~50MB/s
> 
> I did new benchmarks then on 3 new fresh clusters.
> 
> - Each cluster has 3 nodes strictly identical.
> - Each node has 10 OSDs.
> - Journals are on the disk.
> 
> bench5 : Ubuntu 14.04 / Ceph Infernalis
> bench6 : Ubuntu 14.04 / Ceph Jewel
> bench7 : Ubuntu 16.04 / Ceph jewel
> 
> this is the average of 2 runs of "ceph tell osd.* bench" on each cluster 
> (2 x 30
> OSDs)
> 
> bench5 / 14.04 / Infernalis / kernel 3.13 :  54.35 MB/s
> bench6 / 14.04 / Jewel  / kernel 3.13 :  86.47 MB/s
> 
> bench5 / 14.04 / Infernalis / kernel 4.2  :  63.38 MB/s
> bench6 / 14.04 / Jewel  / kernel 4.2  : 107.75 MB/s
> bench7 / 16.04 / Jewel  / kernel 4.2  : 101.54 MB/s
> 
> bench5 / 14.04 / Infernalis / kernel 4.4  :  53.61 MB/s
> bench6 / 14.04 / Jewel  / kernel 4.4  :  65.82 MB/s
> bench7 / 16.04 / Jewel  / kernel 4.4  :  61.57 MB/s
> 
> If needed, I have the raw output of "ceph tell osd.* bench"
> 
> Best regards
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Jewel + kernel 4.4 Massive performance regression (-50%)

2016-07-25 Thread Yoann Moulin
Hello,

(this is a repost, my previous message seems to be slipping under the radar)

Does anyone get a similar behaviour to the one described below ?

I found a big performance drop between kernel 3.13.0-88 (default kernel on
Ubuntu Trusty 14.04) or kernel 4.2.0, and kernel 4.4.0.24.14 (default kernel on
Ubuntu Xenial 16.04).

- ceph version is Jewel (10.2.2).
- All tests have been done under Ubuntu 14.04.
- Each cluster has 5 strictly identical nodes.
- Each node has 10 OSDs.
- Journals are on the disks.

Kernel 4.4 has a drop of more than 50% compared to 4.2
Kernel 4.4 has a drop of 40% compared to 3.13

details below :

With the 3 kernels I have the same performance on the disks:

Raw benchmark:
dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct=> average ~230MB/s
dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct   => average ~220MB/s

Filesystem mounted benchmark:
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1  => average ~205MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average ~214MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average ~190MB/s

Ceph osd Benchmark:
Kernel 3.13.0-88-generic : ceph tell osd.ID bench => average  ~81MB/s
Kernel 4.2.0-38-generic  : ceph tell osd.ID bench => average ~109MB/s
Kernel 4.4.0-24-generic  : ceph tell osd.ID bench => average  ~50MB/s

I then did new benchmarks on 3 new, fresh clusters.

- Each cluster has 3 strictly identical nodes.
- Each node has 10 OSDs.
- Journals are on the disks.

bench5 : Ubuntu 14.04 / Ceph Infernalis
bench6 : Ubuntu 14.04 / Ceph Jewel
bench7 : Ubuntu 16.04 / Ceph jewel

this is the average of 2 runs of "ceph tell osd.* bench" on each cluster (2 x 30
OSDs)

bench5 / 14.04 / Infernalis / kernel 3.13 :  54.35 MB/s
bench6 / 14.04 / Jewel  / kernel 3.13 :  86.47 MB/s

bench5 / 14.04 / Infernalis / kernel 4.2  :  63.38 MB/s
bench6 / 14.04 / Jewel  / kernel 4.2  : 107.75 MB/s
bench7 / 16.04 / Jewel  / kernel 4.2  : 101.54 MB/s

bench5 / 14.04 / Infernalis / kernel 4.4  :  53.61 MB/s
bench6 / 14.04 / Jewel  / kernel 4.4  :  65.82 MB/s
bench7 / 16.04 / Jewel  / kernel 4.4  :  61.57 MB/s

If needed, I have the raw output of "ceph tell osd.* bench"
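
For what it's worth, this is roughly how I compute the average (assuming the JSON output
of "ceph tell osd.N bench" on Jewel, which reports a bytes_per_sec field):

> ceph tell osd.* bench | grep bytes_per_sec | awk '{ sum += $2; n++ } END { printf "%.2f MB/s\n", sum / n / 1048576 }'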

Best regards

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: Infernalis -> Jewel, 10x+ RBD latency increase

2016-07-22 Thread Yoann Moulin
Hi,

>>> I just upgraded from Infernalis to Jewel and see an approximate 10x
>>> latency increase.
>>>
>>> Quick facts:
>>>  - 3x replicated pool
>>>  - 4x 2x-"E5-2690 v3 @ 2.60GHz", 128GB RAM, 6x 1.6 TB Intel S3610
>>> SSDs,
>>>  - LSI3008 controller with up-to-date firmware and upstream driver,
>>> and up-to-date firmware on SSDs.
>>>  - 40GbE (Mellanox, with up-to-date drivers & firmware)
>>>  - CentOS 7.2

Which kernel do you run? I found a performance drop in throughput (~40%) with
kernel 4.4 compared to kernel 4.2. I didn't benchmark latency, but maybe the
issue impacts latency too.


-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance issue with jewel on ubuntu xenial (kernel)

2016-07-01 Thread Yoann Moulin
Hello,

>>>>>>> I found a performance drop between kernel 3.13.0-88 (default kernel on 
>>>>>>> Ubuntu
>>>>>>> Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 
>>>>>>> 16.04)
>>>>>>>
>>>>>>> ceph version is Jewel (10.2.2).
>>>>>>> All tests have been done under Ubuntu 14.04
>>>>>>
>>>>>> Knowing that you also have an internalis cluster on almost identical
>>>>>> hardware, can you please let the list know whether you see the same
>>>>>> behavior (severely reduced throughput on a 4.4 kernel, vs. 3.13) on
>>>>>> that cluster as well?
>>>>>
>>>>> ceph version is infernalis (9.2.0)
>>>>>
>>>>> Ceph osd Benchmark:
>>>>>
>>>>> Kernel 3.13.0-88-generic : ceph tell osd.ID => average ~84MB/s
>>>>> Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~90MB/s
>>>>> Kernel 4.4.0-24-generic  : ceph tell osd.ID => average ~75MB/s
>>>>>
>>>>> The slow down is not as much as I have with Jewel but it is still present.
>>>>
>>>> But this is not on precisely identical hardware, is it?
>>>
>>> All the benchmarks were run on strictly identical hardware setups per node.
>>> Clusters differ slightly in sizes (infernalis vs jewel) but nodes and OSDs 
>>> are identical.
>>
>> One thing differ in the osd configuration, on the Jewel cluster, we have 
>> journal
>> on disk, on the Infernalis cluster, we have journal on SSD (S3500)
>>
>> I can restart my test on a Jewel cluster with journal on SSD if needed.
>> I can do as well a test on an Infernalis cluster with journal on disk.
> 
> I'd suggest that the second option is probably more meaningful to test.

I did new benchmarks on 3 clusters. Each cluster has 3 strictly identical nodes.
Each node has 10 OSDs. Journals are on the disks.

bench5 : Ubuntu 14.04 / Ceph Infernalis
bench6 : Ubuntu 14.04 / Ceph Jewel
bench7 : Ubuntu 16.04 / Ceph jewel

this is the average of 2 runs of "ceph tell osd.* bench" on each cluster (2 x 30
OSDs)

bench5 / 14.04 / Infernalis / kernel 3.13 :  54.35 MB/s
bench6 / 14.04 / Jewel  / kernel 3.13 :  86.47 MB/s

bench5 / 14.04 / Infernalis / kernel 4.2  :  63.38 MB/s
bench6 / 14.04 / Jewel  / kernel 4.2  : 107.75 MB/s
bench7 / 16.04 / Jewel  / kernel 4.2  : 101.54 MB/s

bench5 / 14.04 / Infernalis / kernel 4.4  :  53.61 MB/s
bench6 / 14.04 / Jewel  / kernel 4.4  :  65.82 MB/s
bench7 / 16.04 / Jewel  / kernel 4.4  :  61.57 MB/s

If needed, I have the raw output of "ceph tell osd.* bench"

> What I find curious is that no-one else on the list has apparently run
> into this. Any Ubuntu xenial users out there, or perhaps folks on
> trusty who choose to install linux-image-generic-lts-xenial?

Could anyone try this on their side to see if they get the same behaviour?

Cheers,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance issue with jewel on ubuntu xenial (kernel)

2016-06-23 Thread Yoann Moulin
On 23/06/2016 08:25, Sarni Sofiane wrote:
> Hi Florian,
> 


> On 23.06.16 06:25, "ceph-users on behalf of Florian Haas" 
> <ceph-users-boun...@lists.ceph.com on behalf of flor...@hastexo.com> wrote:
> 
>> On Wed, Jun 22, 2016 at 10:56 AM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>>> Hello Florian,
>>>
>>>> On Tue, Jun 21, 2016 at 3:11 PM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>>>>> Hello,
>>>>>
>>>>> I found a performance drop between kernel 3.13.0-88 (default kernel on 
>>>>> Ubuntu
>>>>> Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 
>>>>> 16.04)
>>>>>
>>>>> ceph version is Jewel (10.2.2).
>>>>> All tests have been done under Ubuntu 14.04
>>>>
>>>> Knowing that you also have an internalis cluster on almost identical
>>>> hardware, can you please let the list know whether you see the same
>>>> behavior (severely reduced throughput on a 4.4 kernel, vs. 3.13) on
>>>> that cluster as well?
>>>
>>> ceph version is infernalis (9.2.0)
>>>
>>> Ceph osd Benchmark:
>>>
>>> Kernel 3.13.0-88-generic : ceph tell osd.ID => average ~84MB/s
>>> Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~90MB/s
>>> Kernel 4.4.0-24-generic  : ceph tell osd.ID => average ~75MB/s
>>>
>>> The slow down is not as much as I have with Jewel but it is still present.
>>
>> But this is not on precisely identical hardware, is it?
>
> All the benchmarks were run on strictly identical hardware setups per node.
> Clusters differ slightly in sizes (infernalis vs jewel) but nodes and OSDs 
> are identical.

One thing differs in the OSD configuration: on the Jewel cluster, we have journals
on disk, while on the Infernalis cluster, we have journals on SSD (S3500).

I can rerun my test on a Jewel cluster with journals on SSD if needed.
I can also run a test on an Infernalis cluster with journals on disk.

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] performance issue with jewel on ubuntu xenial (kernel)

2016-06-22 Thread Yoann Moulin
Hello Florian,

> On Tue, Jun 21, 2016 at 3:11 PM, Yoann Moulin <yoann.mou...@epfl.ch> wrote:
>> Hello,
>>
>> I found a performance drop between kernel 3.13.0-88 (default kernel on Ubuntu
>> Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 16.04)
>>
>> ceph version is Jewel (10.2.2).
>> All tests have been done under Ubuntu 14.04
> 
> Knowing that you also have an internalis cluster on almost identical
> hardware, can you please let the list know whether you see the same
> behavior (severely reduced throughput on a 4.4 kernel, vs. 3.13) on
> that cluster as well?

ceph version is infernalis (9.2.0)

Ceph osd Benchmark:

Kernel 3.13.0-88-generic : ceph tell osd.ID => average ~84MB/s
Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~90MB/s
Kernel 4.4.0-24-generic  : ceph tell osd.ID => average ~75MB/s

The slowdown is not as large as with Jewel, but it is still present.

Best Regards,

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] performance issue with jewel on ubuntu xenial (kernel)

2016-06-21 Thread Yoann Moulin
Hello,

I found a performance drop between kernel 3.13.0-88 (default kernel on Ubuntu
Trusty 14.04) and kernel 4.4.0.24.14 (default kernel on Ubuntu Xenial 16.04)

ceph version is Jewel (10.2.2).
All tests have been done under Ubuntu 14.04

Kernel 4.4 has a drop of 50% compared to 4.2
Kernel 4.4 has a drop of 40% compared to 3.13

details below:

With the 3 kernels I have the same performance on the disks:

Raw benchmark:
dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct=> average ~230MB/s
dd if=/dev/zero of=/dev/sdX bs=1G count=1 oflag=direct   => average ~220MB/s

Filesystem mounted benchmark:
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1  => average ~205MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=direct => average ~214MB/s
dd if=/dev/zero of=/sdX1/test.img bs=1G count=1 oflag=sync   => average ~190MB/s

Ceph osd Benchmark:
Kernel 3.13.0-88-generic : ceph tell osd.ID => average  ~81MB/s
Kernel 4.2.0-38-generic  : ceph tell osd.ID => average ~109MB/s
Kernel 4.4.0-24-generic  : ceph tell osd.ID => average  ~50MB/s

Does anyone get a similar behaviour on their cluster ?

Best regards

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] journal or cache tier on SSDs ?

2016-05-10 Thread Yoann Moulin
Hello,

I'd like some advice about the setup of a new Ceph cluster. Here is the use case:

RadosGW (S3 and maybe Swift for Hadoop/Spark) will be the main usage. Most of
the access will be read only. Write access will only be done by the
admin to update the datasets.

We might use rbd from time to time to sync data as temporary storage (when POSIX is
needed), but performance will not be an issue there. We might use CephFS in the future
if it can replace a filesystem on rbd.

We are going to start with 16 nodes (up to 24). The configuration of each node is:

CPU : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (12c/48t)
Memory : 128GB
OS Storage : 2 x SSD 240GB Intel S3500 DC (raid 1)
Journal or cache Storage : 2 x SSD 400GB Intel S3300 DC (no Raid)
OSD Disk : 10 x HGST ultrastar-7k6000 6TB
Public Network : 1 x 10Gb/s
Private Network : 1 x 10Gb/s
OS : Ubuntu 16.04
Ceph version : Jewel

The question is: journals or a cache tier (read only) on the 400GB Intel S3300 DC
SSDs?

Each disk is able to write sequentially at 220MB/s. SSDs can write at ~500MB/s.
If we put 5 journals on each SSD, the SSDs will still be the bottleneck (1GB/s vs
2GB/s). If we put the journals on the OSD disks, we can expect good read throughput
from the disks (when the data is not in the cache), and writes shouldn't be too bad
either, even with random reads hitting the OSD during the write?

So SSDs as a cache tier seem to be a better use than just 5 journals on each? Is
that correct?
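
For the cache tier option, what we have in mind is roughly this (pool names and pg counts
are placeholders; it assumes a CRUSH rule that maps the cache pool to the SSDs, and recent
releases may require an extra --yes-i-really-mean-it for the readonly mode):

> ceph osd pool create rgw-cache 512 512 replicated
> ceph osd tier add default.rgw.buckets.data rgw-cache
> ceph osd tier cache-mode rgw-cache readonly
> ceph osd tier set-overlay default.rgw.buckets.data rgw-cache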

We are going to use an EC pool for big files (jerasure 8+2 I think) and a replicated
pool for small files.

If I check on http://ceph.com/pgcalc/ for this use case:

replicated pool: pg_num = 8192 for 160 OSDs but 16384 for 240 OSDs
Ec pool : pg_num = 4096
and pgp_num = pg_num

Should I set the pg_num to 8192 or 16384? What is the impact on the cluster if
we set pg_num to 16384 from the beginning? 16384 is high, isn't it?
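
For reference, the rule of thumb behind pgcalc, as I understand it (so take it with a
grain of salt): pg_num ~ (number of OSDs x target PGs per OSD) / replica size, rounded up
to the next power of two. With 240 OSDs, a target of 100 PGs per OSD and size 3, that
gives 240 x 100 / 3 = 8000, hence 8192; a 200 PG-per-OSD target gives 16384.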

Thanks for your help

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to choose EC plugins and rulesets

2016-03-10 Thread Yoann Moulin
On 10/03/2016 09:26, Nick Fisk wrote:
> What is your intended use case RBD/FS/RGW? There are no major improvements
> in Jewel that I am aware of. The big one will be when EC pools allow direct
> partial overwrites without the use of a cache tier.

The main goal is RadosGW. Most of the access will be read only.

We are also interested in using block devices and later CephFS, but they are not our
priority. And in those cases, we have not discussed replication vs erasure yet.

If you have some insight about these cases, we are also interested.
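
For context, the kind of profile we are considering for the RadosGW data pool (a sketch
only; the profile name, pool name and pg count are placeholders, and k/m are not final):

> ceph osd erasure-code-profile set rgw-ec-profile k=8 m=2 plugin=jerasure technique=reed_sol_van ruleset-failure-domain=host
> ceph osd pool create erasure.rgw.buckets.data 4096 4096 erasure rgw-ec-profile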

Thanks,

Yoann

>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Yoann Moulin
>> Sent: 09 March 2016 16:01
>> To: ceph-us...@ceph.com
>> Subject: [ceph-users] how to choose EC plugins and rulesets
>>
>> Hello,
>>
>> We are looking for recommendations and guidelines for using erasure codes
>> (EC) with Ceph.
>>
>> Our setup consists of 25 identical nodes which we dedicate to Ceph. Each
>> node contains 10 HDDs (full specs below)
>>
>> We started with 10 nodes (comprising 100 OSDs) and created a pool with 3-
>> times replication.
>>
>> In order to increase the usable capacity, we would like to go for EC
> instead of
>> replication.
>>
>> - Can anybody share with us recommendations regarding the choice of
>> plugins and rulesets?
>> - In particular, how do we relate to the number of nodes and OSDs? Any
>> formulas or rules of thumb?
>> - Is it possible to change rulesets live on a pool?
>>
>> We currently use Infernalis but plan to move to Jewel.
>>
>> - Are there any improvement in Jewel with regard to erasure codes?
>>
>> Looking forward for your answers.
>>
>>
>> =
>>
>> Full specs of nodes
>>
>> CPU: 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>> Memory: 128GB of Memory
>> OS Storage: 2 x SSD 240GB Intel S3500 DC (raid 1) Journal Storage: 2 x SSD
>> 400GB Intel S3300 DC (no Raid) OSD Disk: 10 x HGST ultrastar-7k6000 6TB
>> Network: 1 x 10Gb/s
>> OS: Ubuntu 14.04
>>
>> --
>> Yoann Moulin
>> EPFL IC-IT
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 


-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to choose EC plugins and rulesets

2016-03-09 Thread Yoann Moulin
Hello,

We are looking for recommendations and guidelines for using erasure codes (EC)
with Ceph.

Our setup consists of 25 identical nodes which we dedicate to Ceph. Each node
contains 10 HDDs (full specs below)

We started with 10 nodes (comprising 100 OSDs) and created a pool with 3-times
replication.

In order to increase the usable capacity, we would like to go for EC instead of
replication.

- Can anybody share with us recommendations regarding the choice of plugins and
rulesets?
- In particular, how do we relate to the number of nodes and OSDs? Any formulas
or rules of thumb?
- Is it possible to change rulesets live on a pool?

We currently use Infernalis but plan to move to Jewel.

- Are there any improvements in Jewel with regard to erasure codes?

Looking forward for your answers.


=

Full specs of nodes

CPU: 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Memory: 128GB of Memory
OS Storage: 2 x SSD 240GB Intel S3500 DC (raid 1)
Journal Storage: 2 x SSD 400GB Intel S3300 DC (no Raid)
OSD Disk: 10 x HGST ultrastar-7k6000 6TB
Network: 1 x 10Gb/s
OS: Ubuntu 14.04

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs go down with infernalis

2016-03-09 Thread Yoann Moulin
Hello,

> If you manually create your journal partition, you need to specify the correct
> Ceph partition GUID in order for the system and Ceph to identify the partition
> as Ceph journal and affect correct ownership and permissions at boot via udev.

In my latest run, I let ceph-ansible create the partitions, and everything seems to be
fine.

> I used something like this to create the partition :
> sudo sgdisk --new=1:0G:15G --typecode=1:45B0969E-9B03-4F30-B4C6-B4B80CEFF106
>  --partition-guid=$(uuidgen -r) --mbrtogpt -- /dev/sda
> 
> 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 being the GUID. More info on GTP GUID is
> available on wikipedia [1].
> 
> I think the issue with the label you had was linked to some bugs in the disk
> initialization process. This was discussed a few weeks back on this mailing 
> list.
> 
> [1] https://en.wikipedia.org/wiki/GUID_Partition_Table

That's what I read on the IRC channel; it seems to be a common mistake. It might be
good to mention it in the docs or FAQ?

Yoann

> On Tue, Mar 8, 2016 at 5:21 PM, Yoann Moulin <yoann.mou...@epfl.ch
> <mailto:yoann.mou...@epfl.ch>> wrote:
> 
> Hello Adrien,
> 
> > I think I faced the same issue setting up my own cluster. If it is the 
> same,
> > it's one of the many people encounter(ed) during disk initialization.
> > Could you please give the output of :
> >  - ll /dev/disk/by-partuuid/
> >  - ll /var/lib/ceph/osd/ceph-*
> 
> unfortunately, I already reinstall my test cluster, but I got some 
> information
> that might explain this issue.
> 
> I was creating the journal partition before running the ansible playbook.
> firstly, owner and right was not persistent at boot (had to add udev's 
> rules).
> And I strongly suspect a side effect of not let ceph-disk create journal
> partition.
> 
> Yoann
> 
> > On Thu, Mar 3, 2016 at 3:42 PM, Yoann Moulin <yoann.mou...@epfl.ch 
> <mailto:yoann.mou...@epfl.ch>
> > <mailto:yoann.mou...@epfl.ch <mailto:yoann.mou...@epfl.ch>>> wrote:
> >
> > Hello,
> >
> > I'm (almost) a new user of ceph (couple of month). In my university,
> we start to
> > do some test with ceph a couple of months ago.
> >
> > We have 2 clusters. Each cluster have 100 OSDs on 10 servers :
> >
> > Each server as this setup :
> >
> > CPU : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
> > Memory : 128GB of Memory
> > OS Storage : 2 x SSD 240GB Intel S3500 DC (raid 1)
> > Journal Storage : 2 x SSD 400GB Intel S3300 DC (no Raid)
> > OSD Disk : 10 x HGST ultrastar-7k6000 6TB
> > Network : 1 x 10Gb/s
> > OS : Ubuntu 14.04
> > Ceph version : infernalis 9.2.0
> >
> > One cluster give access to some user through a S3 gateway (service 
> is
> still in
> > beta). We call this cluster "ceph-beta".
> >
> > One cluster is for our internal need to learn more about ceph. We 
> call
> this
> > cluster "ceph-test". (those servers will be integrated into the 
> ceph-beta
> > cluster when we will need more space)
> >
> > We have deploy both clusters with the ceph-ansible playbook[1]
> >
> > Journal are raw partitions on SSDs (400GB Intel S3300 DC) with no 
> raid. 5
> > journals partitions on each SSDs.
> >
> > OSDs disk are format in XFS.
> >
> > 1. https://github.com/ceph/ceph-ansible
> >
> > We have an issue. Some OSDs go down and don't start. It seem to be
> related to
> > the fsid of the journal partition :
> >
> > > -1> 2016-03-03 14:09:05.422515 7f31118d0940 -1 journal
> FileJournal::open:
> > ondisk fsid ---- doesn't match 
> expected
> > eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) 
> journal
> >
> > in attachment, the full logs of one of the dead OSDs
> >
> > We had this issue with 2 OSDs on ceph-beta cluster fixed by 
> removing,
> zapping
> > and readding it.
> >
> > Now, we have the same issue on ceph-test cluster but on 18 OSDs.
> >
> > Now the stats of this cluster
> >
> > > root@icadmin004:~# ceph -s
> > > cluster 4fb4773c-0873-44ad-a65f-269f01bfcff8
> > >  health HEA

Re: [ceph-users] OSDs go down with infernalis

2016-03-08 Thread Yoann Moulin
Hello Adrien,

> I think I faced the same issue setting up my own cluster. If it is the same,
> it's one of the many issues people encounter(ed) during disk initialization.
> Could you please give the output of:
>  - ll /dev/disk/by-partuuid/
>  - ll /var/lib/ceph/osd/ceph-*

Unfortunately, I have already reinstalled my test cluster, but I got some
information that might explain this issue.

I was creating the journal partitions before running the ansible playbook.
Firstly, the owner and permissions were not persistent across reboots (I had to
add udev rules; see the example below). And I strongly suspect a side effect of
not letting ceph-disk create the journal partitions.
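
For the record, the udev rule I ended up adding looked roughly like this (written
from memory and close to what the stock 95-ceph-osd.rules ships; the file name is
just an example, and the rule must stay on a single line):

# /etc/udev/rules.d/95-ceph-journal.rules
ACTION=="add", SUBSYSTEM=="block", ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", OWNER="ceph", GROUP="ceph", MODE="660"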

Yoann

> On Thu, Mar 3, 2016 at 3:42 PM, Yoann Moulin <yoann.mou...@epfl.ch
> <mailto:yoann.mou...@epfl.ch>> wrote:
> 
> Hello,
> 
> I'm (almost) a new user of ceph (couple of month). In my university, we 
> start to
> do some test with ceph a couple of months ago.
> 
> We have 2 clusters. Each cluster have 100 OSDs on 10 servers :
> 
> Each server as this setup :
> 
> CPU : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
> Memory : 128GB of Memory
> OS Storage : 2 x SSD 240GB Intel S3500 DC (raid 1)
> Journal Storage : 2 x SSD 400GB Intel S3300 DC (no Raid)
> OSD Disk : 10 x HGST ultrastar-7k6000 6TB
> Network : 1 x 10Gb/s
> OS : Ubuntu 14.04
> Ceph version : infernalis 9.2.0
> 
> One cluster give access to some user through a S3 gateway (service is 
> still in
> beta). We call this cluster "ceph-beta".
> 
> One cluster is for our internal need to learn more about ceph. We call 
> this
> cluster "ceph-test". (those servers will be integrated into the ceph-beta
> cluster when we will need more space)
> 
> We have deploy both clusters with the ceph-ansible playbook[1]
> 
> Journal are raw partitions on SSDs (400GB Intel S3300 DC) with no raid. 5
> journals partitions on each SSDs.
> 
> OSDs disk are format in XFS.
> 
> 1. https://github.com/ceph/ceph-ansible
> 
> We have an issue. Some OSDs go down and don't start. It seem to be 
> related to
> the fsid of the journal partition :
> 
> > -1> 2016-03-03 14:09:05.422515 7f31118d0940 -1 journal 
> FileJournal::open:
> ondisk fsid ---- doesn't match expected
> eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) journal
> 
> in attachment, the full logs of one of the dead OSDs
> 
> We had this issue with 2 OSDs on ceph-beta cluster fixed by removing, 
> zapping
> and readding it.
> 
> Now, we have the same issue on ceph-test cluster but on 18 OSDs.
> 
> Now the stats of this cluster
> 
> > root@icadmin004:~# ceph -s
> > cluster 4fb4773c-0873-44ad-a65f-269f01bfcff8
> >  health HEALTH_WARN
> > 1024 pgs incomplete
> > 1024 pgs stuck inactive
> > 1024 pgs stuck unclean
> >  monmap e1: 3 mons at
> 
> {iccluster003=10.90.37.4:6789/0,iccluster014=10.90.37.15:6789/0,iccluster022=10.90.37.23:6789/0
> 
> <http://10.90.37.4:6789/0,iccluster014=10.90.37.15:6789/0,iccluster022=10.90.37.23:6789/0>}
> > election epoch 62, quorum 0,1,2
> iccluster003,iccluster014,iccluster022
> >  osdmap e242: 100 osds: 82 up, 82 in
> > flags sortbitwise
> >   pgmap v469212: 2304 pgs, 10 pools, 2206 bytes data, 181 objects
> > 4812 MB used, 447 TB / 447 TB avail
> > 1280 active+clean
> > 1024 creating+incomplete
> 
> We have install this cluster at the begin of February. We did not use that
> cluster at all even at the begin to troubleshoot an issue with 
> ceph-ansible. We
> did not push any data neither create pool. What could explain this 
> behaviour ?
> 
> Thanks for your help
> 
> Best regards,
> 
> --
> Yoann Moulin
> EPFL IC-IT
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 


-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSDs go down with infernalis

2016-03-03 Thread Yoann Moulin
Hello,

I'm (almost) a new Ceph user; we started doing some tests with Ceph at my
university a couple of months ago.

We have 2 clusters. Each cluster has 100 OSDs on 10 servers:

Each server has this setup:

CPU : 2 x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Memory : 128GB of Memory
OS Storage : 2 x SSD 240GB Intel S3500 DC (raid 1)
Journal Storage : 2 x SSD 400GB Intel S3300 DC (no Raid)
OSD Disk : 10 x HGST ultrastar-7k6000 6TB
Network : 1 x 10Gb/s
OS : Ubuntu 14.04
Ceph version : infernalis 9.2.0

One cluster gives access to some users through an S3 gateway (the service is
still in beta). We call this cluster "ceph-beta".

The other cluster is for our internal needs, to learn more about Ceph. We call
it "ceph-test". (Those servers will be integrated into the ceph-beta cluster
when we need more space.)

We deployed both clusters with the ceph-ansible playbook [1].

Journals are raw partitions on the SSDs (400GB Intel S3300 DC), with no RAID and
5 journal partitions per SSD.

OSD disks are formatted with XFS.

1. https://github.com/ceph/ceph-ansible

We have an issue: some OSDs go down and don't start. It seems to be related to
the fsid of the journal partition:

> -1> 2016-03-03 14:09:05.422515 7f31118d0940 -1 journal FileJournal::open: 
> ondisk fsid ---- doesn't match expected 
> eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) journal

You will find the full log of one of the dead OSDs in the attachment below.

We had this issue with 2 OSDs on the ceph-beta cluster and fixed it by removing,
zapping and re-adding them (see the sketch below).
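
For reference, the remove/zap/re-add sequence we used was roughly the following
(a sketch from memory; the OSD id and device names are examples, and "stop" is
the upstart command on Ubuntu 14.04):

ceph osd out 2
stop ceph-osd id=2
ceph osd crush remove osd.2
ceph auth del osd.2
ceph osd rm 2
ceph-disk zap /dev/sdc
ceph-disk prepare /dev/sdc /dev/sdn1    # OSD data disk, journal partition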

Now we have the same issue on the ceph-test cluster, but on 18 OSDs.

Here is the current status of this cluster:

> root@icadmin004:~# ceph -s
> cluster 4fb4773c-0873-44ad-a65f-269f01bfcff8
>  health HEALTH_WARN
> 1024 pgs incomplete
> 1024 pgs stuck inactive
> 1024 pgs stuck unclean
>  monmap e1: 3 mons at 
> {iccluster003=10.90.37.4:6789/0,iccluster014=10.90.37.15:6789/0,iccluster022=10.90.37.23:6789/0}
> election epoch 62, quorum 0,1,2 
> iccluster003,iccluster014,iccluster022
>  osdmap e242: 100 osds: 82 up, 82 in
> flags sortbitwise
>   pgmap v469212: 2304 pgs, 10 pools, 2206 bytes data, 181 objects
> 4812 MB used, 447 TB / 447 TB avail
> 1280 active+clean
> 1024 creating+incomplete

We installed this cluster at the beginning of February and have hardly used it,
apart from some troubleshooting of a ceph-ansible issue at the beginning. We did
not push any data nor create any pools. What could explain this behaviour?

Thanks for your help

Best regards,

-- 
Yoann Moulin
EPFL IC-IT
2016-03-03 14:09:00.433074 7efd1a9d5940  0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 4446
2016-03-03 14:09:01.315583 7efd1a9d5940  0 filestore(/var/lib/ceph/osd/ceph-2) backend xfs (magic 0x58465342)
2016-03-03 14:09:01.338328 7efd1a9d5940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2016-03-03 14:09:01.338335 7efd1a9d5940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2016-03-03 14:09:01.338362 7efd1a9d5940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: splice is supported
2016-03-03 14:09:01.341468 7efd1a9d5940  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2016-03-03 14:09:01.341517 7efd1a9d5940  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-2) detect_features: extsize is supported and your kernel >= 3.5
2016-03-03 14:09:01.411145 7efd1a9d5940  0 filestore(/var/lib/ceph/osd/ceph-2) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2016-03-03 14:09:01.692400 7efd1a9d5940 -1 journal FileJournal::open: ondisk fsid ---- doesn't match expected eeadbce2-f096-4156-ba56-dfc634e59106, invalid (someone else's?) journal
2016-03-03 14:09:01.694251 7efd1a9d5940 -1 os/FileJournal.h: In function 'virtual FileJournal::~FileJournal()' thread 7efd1a9d5940 time 2016-03-03 14:09:01.692413
os/FileJournal.h: 406: FAILED assert(fd == -1)

 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x7efd1a4cbf2b]
 2: (()+0x2c2f80) [0x7efd19ed1f80]
 3: (FileJournal::~FileJournal()+0x67e) [0x7efd1a1b476e]
 4: (JournalingObjectStore::journal_replay(unsigned long)+0xbfa) [0x7efd1a1c353a]
 5: (FileStore::mount()+0x3b42) [0x7efd1a198a62]
 6: (OSD::init()+0x26d) [0x7efd19f51a5d]
 7: (main()+0x2954) [0x7efd19ed7474]
 8: (__libc_start_main()+0xf5) [0x7efd16d59ec5]
 9: (()+0x2f82b7) [0x7efd19f072b7]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to inte

Re: [ceph-users] can not umount ceph osd partition

2016-02-04 Thread Yoann Moulin
Hello,

>>> I am using 0.94.5. When I try to umount partition and fsck it I have issue:
>>> root@storage003:~# stop ceph-osd id=13
>>> ceph-osd stop/waiting
>>> root@storage003:~# umount /var/lib/ceph/osd/ceph-13
>>> root@storage003:~# fsck -yf /dev/sdf
>>> fsck from util-linux 2.20.1
>>> e2fsck 1.42.9 (4-Feb-2014)
>>> /dev/sdf is in use.
>>> e2fsck: Cannot continue, aborting.
>>>
>>> There is no /var/lib/ceph/osd/ceph-13 in /proc mounts. But no ability to 
>>> check
>>> fs.
>>> I can mount -o remount,rw, but I would like to umount device for maintenance
>>> and, maybe, replace it.
>>>
>>> Why I can't umount?
> 
>> is "lsof -n | grep /dev/sdf" give something ?
> 
> Nothing.
> 
>> and are you sure /dev/sdf is the disk for osd 13 ?
> 
> Absolutelly. I have even tried fsck -yf /dev/disk/by-label/osd-13. No luck.
> 
> Disk is mounted using LABEL in fstab, journal is symlink to
> /dev/disk/by-partlabel/j-13.

I think this is more Linux-related than Ceph-related.

Could you try looking with lsof to see whether something holds the device by its
label or UUID instead of /dev/sdf?
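
Something along these lines, for example (a sketch; the device and OSD id are
taken from your mail, adjust as needed):

lsof -n | grep -E 'sdf|osd-13|ceph-13'
ls -l /sys/block/sdf/holders/     # shows md/device-mapper holders, if any
dmsetup ls --tree                 # in case device-mapper still references it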

You can try to delete the device from the SCSI bus with something like:

echo 1 > /sys/block/<device>/device/delete

Be careful: this is like removing the disk physically. If a process still holds
the device, expect that process to switch to the kernel state "D+"
(uninterruptible sleep). You won't be able to kill it, even with kill -9; to get
rid of it, you will have to reboot the server.

You can have a look here at how to manipulate the SCSI bus:

http://fibrevillage.com/storage/279-hot-add-remove-rescan-of-scsi-devices-on-linux

You can install the "scsitools" package, which provides rescan-scsi-bus.sh, to
rescan your SCSI bus and get the removed disk back.

http://manpages.ubuntu.com/manpages/precise/man8/rescan-scsi-bus.8.html
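
A rescan can also be triggered by hand, something like this (only a sketch; the
host number depends on your controller):

echo "- - -" > /sys/class/scsi_host/host0/scan
# or, once scsitools is installed:
rescan-scsi-bus.sh -a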

Hope that helps.

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can not umount ceph osd partition

2016-02-03 Thread Yoann Moulin
Hello,

> I am using 0.94.5. When I try to umount partition and fsck it I have issue:
> root@storage003:~# stop ceph-osd id=13
> ceph-osd stop/waiting
> root@storage003:~# umount /var/lib/ceph/osd/ceph-13
> root@storage003:~# fsck -yf /dev/sdf
> fsck from util-linux 2.20.1
> e2fsck 1.42.9 (4-Feb-2014)
> /dev/sdf is in use.
> e2fsck: Cannot continue, aborting.
> 
> There is no /var/lib/ceph/osd/ceph-13 in /proc mounts. But no ability to check
> fs.
> I can mount -o remount,rw, but I would like to umount device for maintenance
> and, maybe, replace it.
> 
> Why I can't umount?

is "lsof -n | grep /dev/sdf" give something ?

And are you sure /dev/sdf is the disk for OSD 13?
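
A quick way to double-check, for example (a sketch; the exact ceph-disk output
format is from memory):

ceph-disk list | grep 'osd\.13'   # shows the data partition and journal of osd.13
grep ceph-13 /proc/mounts         # (before unmounting) which device is mounted there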

-- 
Yoann Moulin
EPFL IC-IT
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com