Re: [ceph-users] How to attach permission policy to user?

2019-03-11 Thread myxingkong

Hi Pritha:


I added administrator caps to the user, but they didn't seem to work.

radosgw-admin user create --uid=ADMIN --display-name=ADMIN --admin --system

radosgw-admin caps add --uid="ADMIN" 
--caps="user-policy=*;roles=*;users=*;buckets=*;metadata=*;usage=*;zone=*"
{
    "user_id": "ADMIN",
    "display_name": "ADMIN",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "ADMIN",
            "access_key": "HTRJ1HIKR4FB9A24ZG9C",
            "secret_key": "Dfk7t5u4jvdyFMlEf8t4MTdBLEqVlru7tag1g8PE"
        }
    ],
    "swift_keys": [],
    "caps": [
        {
            "type": "buckets",
            "perm": "*"
        },
        {
            "type": "metadata",
            "perm": "*"
        },
        {
            "type": "roles",
            "perm": "*"
        },
        {
            "type": "usage",
            "perm": "*"
        },
        {
            "type": "user-policy",
            "perm": "*"
        },
        {
            "type": "users",
            "perm": "*"
        },
        {
            "type": "zone",
            "perm": "*"
        }
    ],
    "op_mask": "read, write, delete",
    "system": "true",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}


Thanks,
myxingkong

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to attach permission policy to user?

2019-03-11 Thread Pritha Srivastava
Hi Myxingkong,

Did you add admin caps to the user (with access key id
'HTRJ1HIKR4FB9A24ZG9C') which is trying to attach a user policy, using the
command below:

radosgw-admin caps add --uid= --caps="user-policy=*"
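
For example, assuming the uid ADMIN from your earlier mail, the caps can be
verified afterwards with:

radosgw-admin user info --uid=ADMIN

and checking that the "caps" array in the output contains a "user-policy"
entry with perm "*".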

Thanks,
Pritha

On Tue, Mar 12, 2019 at 7:19 AM myxingkong  wrote:

> Hi Pritha:
> I was unable to attach the permission policy through S3curl, which
> returned an HTTP 403 error.
>
> ./s3curl.pl --id admin -- -s -v -X POST "
> http://192.168.199.81:7480/?Action=PutUserPolicy&PolicyName=Policy1&UserName=TESTER&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Deny\",\"Action\":\"s3:*\",\"Resource\":\[\"*\"\],\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\},\{\"Effect\":\"Allow\",\"Action\":\"sts:GetSessionToken\",\"Resource\":\"*\",\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\}\]\}&Version=2010-05-08"
>
> Request:
> > POST
> /?Action=PutUserPolicy&PolicyName=Policy1&UserName=TESTER&PolicyDocument={"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"s3:*","Resource":["*"],"Condition":{"BoolIfExists":{"sts:authentication":"false"}}},{"Effect":"Allow","Action":"sts:GetSessionToken","Resource":"*","Condition":{"BoolIfExists":{"sts:authentication":"false"}}}]}&Version=2010-05-08
> HTTP/1.1
> > User-Agent: curl/7.29.0
> > Host: 192.168.199.81:7480
> > Accept: */*
> > Date: Tue, 12 Mar 2019 01:39:55 GMT
> > Authorization: AWS HTRJ1HIKR4FB9A24ZG9C:FTMBoc7+sJf0K+cx+nYD7Sdj2Xg=
> Response:
> < HTTP/1.1 403 Forbidden
> < Content-Length: 187
> < x-amz-request-id: tx00144-005c870deb-4a92d-default
> < Accept-Ranges: bytes
> < Content-Type: application/xml
> < Date: Tue, 12 Mar 2019 01:39:55 GMT
> <
> * Connection #0 to host 192.168.199.81 left intact
> <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><RequestId>tx00144-005c870deb-4a92d-default</RequestId><HostId>4a92d-default-default</HostId></Error>
>
>
> .s3curl
> %awsSecretAccessKeys = (
> admin => {
> id => 'HTRJ1HIKR4FB9A24ZG9C',
> key => 'Dfk7t5u4jvdyFMlEf8t4MTdBLEqVlru7tag1g8PE',
> },
> );
> Can you tell me what went wrong?
> Thanks,
> myxingkong
>
>
> *From:* myxingkong 
> *Sent:* 2019-03-11 18:13:33
> *To:*  prsri...@redhat.com
> *Cc:*  ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] How to attach permission policy to user?
>
> Hi Pritha:
>
> This is the documentation for configuring restful modules:
> http://docs.ceph.com/docs/nautilus/mgr/restful/
>
> The command given according to the official documentation is to attach the
> permission policy through the REST API.
>
> This is the documentation for STS lite:
> http://docs.ceph.com/docs/nautilus/radosgw/STSLite/
>
> My version of ceph is: ceph version 14.1.0
> (adfd524c32325562f61c055a81dba4cb1b117e84) nautilus (dev)
>
> Thanks,
> myxingkong
> On 3/11/2019 18:06,Pritha Srivastava
>  wrote:
>
> Hi Myxingkong,
>
> Can you explain what you mean by 'enabling restful modules', particularly
> which document are you referring to?
>
> Right now there is no other way to attach a permission policy to a user.
>
> There is work in progress for adding functionality to RGW using which such
> calls can be scripted using boto.
>
> Thanks,
> Pritha
>
> On Mon, Mar 11, 2019 at 3:21 PM myxingkong  wrote:
>
>> Hello:
>>
>> I want to use the GetSessionToken method to get the temporary
>> credentials, but according to the answer given in the official
>> documentation, I need to attach a permission policy to the user before I
>> can use the GetSessionToken method.
>>
>> This is the command for the additional permission policy provided by the
>> official documentation:
>>
>> s3curl.pl --debug --id admin -- -s -v -X POST "
>> http://localhost:8000/?Action=PutUserPolicy&PolicyName=Policy1&UserName=TESTER1&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Deny\",\"Action\":\"s3:*\",\"Resource\":\[\"*\"\],\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\},\{\"Effect\":\"Allow\",\"Action\":\"sts:GetSessionToken\",\"Resource\":\"*\",\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\}\]\}&Version=2010-05-08"
>>
>>
>> This requires enabling restful modules to execute this command.
>>
>> I configured the restful module according to the documentation, but
>> without success, I was unable to configure the SSL certificate.
>>
>> ceph config-key set mgr/restful/crt -i restful.crt
>>
>> WARNING: it looks like you might be trying to set a ceph-mgr module
>> configuration key. Since Ceph 13.0.0 (Mimic), mgr module configuration is
>> done with `config set`, and new values set using `config-key set` will be
>> ignored.
>> set mgr/restful/crt
>>
>> Can someone tell me if there is a way to configure a restful module's
>> certificate, or if there is another way to attach permission policies to
>> users?
>>
>> Thanks,
>> myxingkong
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] How to attach permission policy to user?

2019-03-11 Thread myxingkong

Hi Pritha:

I was unable to attach the permission policy through S3curl, which returned an 
HTTP 403 error.

./s3curl.pl --id admin -- -s -v -X POST 
"http://192.168.199.81:7480/?Action=PutUserPolicy=Policy1=TESTER=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Deny\",\"Action\":\"s3:*\",\"Resource\":\[\"*\"\],\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\},\{\"Effect\":\"Allow\",\"Action\":\"sts:GetSessionToken\",\"Resource\":\"*\",\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\}\]\}=2010-05-08;


Request:

> POST 
> /?Action=PutUserPolicy&PolicyName=Policy1&UserName=TESTER&PolicyDocument={"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"s3:*","Resource":["*"],"Condition":{"BoolIfExists":{"sts:authentication":"false"}}},{"Effect":"Allow","Action":"sts:GetSessionToken","Resource":"*","Condition":{"BoolIfExists":{"sts:authentication":"false"}}}]}&Version=2010-05-08
>  HTTP/1.1 
> User-Agent: curl/7.29.0
> Host: 192.168.199.81:7480
> Accept: */*
> Date: Tue, 12 Mar 2019 01:39:55 GMT
> Authorization: AWS HTRJ1HIKR4FB9A24ZG9C:FTMBoc7+sJf0K+cx+nYD7Sdj2Xg=

Response:

< HTTP/1.1 403 Forbidden
< Content-Length: 187
< x-amz-request-id: tx00144-005c870deb-4a92d-default
< Accept-Ranges: bytes
< Content-Type: application/xml
< Date: Tue, 12 Mar 2019 01:39:55 GMT
< 
* Connection #0 to host 192.168.199.81 left intact
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><RequestId>tx00144-005c870deb-4a92d-default</RequestId><HostId>4a92d-default-default</HostId></Error>


.s3curl

%awsSecretAccessKeys = (
    admin => {
        id => 'HTRJ1HIKR4FB9A24ZG9C',
        key => 'Dfk7t5u4jvdyFMlEf8t4MTdBLEqVlru7tag1g8PE',
    },
);

Can you tell me what went wrong?


Thanks,
myxingkong


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 3-node cluster with 3 x Intel Optane 900P - very low benchmarked performance (200 IOPS)?

2019-03-11 Thread David Clarke
On 9/03/19 10:07 PM, Victor Hooi wrote:
> Hi,
> 
> I'm setting up a 3-node Proxmox cluster with Ceph as the shared storage,
> based around Intel Optane 900P drives (which are meant to be the bee's
> knees), and I'm seeing pretty low IOPS/bandwidth.

We found that CPU performance, specifically power state (C-state) settings,
played a large part in latency, and therefore IOPS.  This wasn't too evident
with spinning disks, but it makes a large percentage difference in our NVMe
based clusters.

You may want to investigate setting processor.max_cstate=1 or
intel_idle.max_cstate=1, whichever is appropriate for your CPUs and
kernel, in the boot cmdline.
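
For example, on a GRUB based system that usually means appending the
parameter to GRUB_CMDLINE_LINUX and regenerating the config (adjust for
your distribution and boot loader):

# /etc/default/grub
GRUB_CMDLINE_LINUX="... intel_idle.max_cstate=1"

update-grub                               # Debian/Proxmox
grub2-mkconfig -o /boot/grub2/grub.cfg    # EL7

# after a reboot, the active limit can be checked with:
cat /sys/module/intel_idle/parameters/max_cstate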



-- 
David Clarke
Systems Architect
Catalyst IT



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Is repairing an RGW bucket index broken?

2019-03-11 Thread Bryan Stillwell
I'm wondering if the 'radosgw-admin bucket check --fix' command is broken in 
Luminous (12.2.8)?

I'm asking because I'm trying to reproduce a situation we have on one of our 
production clusters and it doesn't seem to do anything.  Here are the steps of my 
test:

1. Create a bucket with 1 million objects
2. Verify the bucket got sharded into 10 shards (100,000 objects each)
3. Remove one of the shards using the rados command
4. Verify the bucket is broken
5. Attempt to fix the bucket

I got as far as step 4:

# rados -p .rgw.buckets.index ls | grep "default.1434737011.12485" | sort
.dir.default.1434737011.12485.0
.dir.default.1434737011.12485.1
.dir.default.1434737011.12485.2
.dir.default.1434737011.12485.3
.dir.default.1434737011.12485.4
.dir.default.1434737011.12485.5
.dir.default.1434737011.12485.6
.dir.default.1434737011.12485.8
.dir.default.1434737011.12485.9
# radosgw-admin bucket list --bucket=bstillwell-1mil
ERROR: store->list_objects(): (2) No such file or directory
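
Removing a shard for step 3 amounts to something along these lines, shard .7
being the one now missing from the listing above:

rados -p .rgw.buckets.index rm .dir.default.1434737011.12485.7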

But step 5 is proving problematic:

# time radosgw-admin bucket check --fix --bucket=bstillwell-1mil

real    0m0.201s
user    0m0.105s
sys     0m0.033s

# time radosgw-admin bucket check --fix --check-objects --bucket=bstillwell-1mil

real    0m0.188s
user    0m0.102s
sys     0m0.025s


Could someone help me figure out what I'm missing?

Thanks,
Bryan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Chasing slow ops in mimic

2019-03-11 Thread Alex Litvak

Hello Cephers,

I am trying to find the cause of multiple slow ops that happened on my 
small cluster.  I have a 3 node cluster with 9 OSDs per node:


Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
128 GB RAM
Each OSD is SSD Intel DC-S3710 800GB
It runs mimic 13.2.2 in containers.

The cluster was operating normally for 4 months and then recently I had an 
outage with multiple VMs (RBD) showing


Mar  8 07:59:42 sbc12n2-chi.siptalk.com kernel: [140206.243812] INFO: 
task xfsaild/vda1:404 blocked for more than 120 seconds.
Mar  8 07:59:42 sbc12n2-chi.siptalk.com kernel: [140206.243957] 
Not tainted 4.19.5-1.el7.elrepo.x86_64 #1
Mar  8 07:59:42 sbc12n2-chi.siptalk.com kernel: [140206.244063] "echo 0 
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  8 07:59:42 sbc12n2-chi.siptalk.com kernel: [140206.244181] 
xfsaild/vda1D0   404  2 0x8000


After examining the ceph logs, I found the following entries in multiple OSD logs:
Mar  8 07:38:52 storage1n2-chi ceph-osd-run.sh[20939]: 2019-03-08 
07:38:52.299 7fe0bdb8f700 -1 osd.13 502 get_health_metrics reporting 1 
slow ops, oldest is osd_op(client.148553.0:5996289 7.fe 
7:7f0ebfe2:::rbd_data.17bab2eb141f2.023d:head [stat,write 
2588672~16384] snapc 0=[] ondisk+write+known_if_redirected e502)
Mar  8 07:38:53 storage1n2-chi ceph-osd-run.sh[20939]: 2019-03-08 
07:38:53.347 7fe0bdb8f700 -1 osd.13 502 get_health_metrics reporting 1 
slow ops, oldest is osd_op(client.148553.0:5996289 7.fe 
7:7f0ebfe2:::rbd_data.17bab2eb141f2.


Mar  8 07:43:05 storage1n2-chi ceph-osd-run.sh[28089]: 2019-03-08 
07:43:05.360 7f32536bd700 -1 osd.7 502 get_health_metrics reporting 1 
slow ops, oldest is osd_op(client.152215.0:7037343 7.1e 
7:78d776e4:::rbd_data.27e662eb141f2.0436:head [stat,write 
393216~16384] snapc 0=[] ondisk+write+known_if_redirected e502)
Mar  8 07:43:06 storage1n2-chi ceph-osd-run.sh[28089]: 2019-03-08 
07:43:06.332 7f32536bd700 -1 osd.7 502 get_health_metrics reporting 2 
slow ops, oldest is osd_op(client.152215.0:7037343 7.1e 
7:78d776e4:::rbd_data.27e662eb141f2.0436:head [stat,write 
393216~16384] snapc 0=[] ondisk+write+known_if_redirected e502)


The messages were showing on all nodes and affecting several osds on 
each node.


The trouble started at approximately 07:30 am and ended 30 minutes later. 
I have not seen any slow ops since then, nor have the VMs shown kernel 
hangups since then.  Here is my ceph status.  I also want to note that the 
load on the cluster was minimal at the time.  Please let me know where I 
could start looking, as the cluster cannot go into production with these 
failures.


 cluster:
id: 054890af-aef7-46cf-a179-adc9170e3958
health: HEALTH_OK

  services:
mon: 3 daemons, quorum storage1n1-chi,storage1n2-chi,storage1n3-chi
mgr: storage1n3-chi(active), standbys: storage1n1-chi, storage1n2-chi
mds: cephfs-1/1/1 up  {0=storage1n2-chi=up:active}, 2 up:standby
osd: 27 osds: 27 up, 27 in
rgw: 3 daemons active

  data:
pools:   7 pools, 608 pgs
objects: 1.46 M objects, 697 GiB
usage:   3.0 TiB used, 17 TiB / 20 TiB avail
pgs: 608 active+clean

  io:
client:   0 B/s rd, 91 KiB/s wr, 6 op/s rd, 10 op/s wr
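
For reference, the per-op detail behind those get_health_metrics lines can be
dumped from an implicated OSD's admin socket while the slow ops are active
(inside the OSD container in this setup), e.g.:

ceph daemon osd.13 dump_ops_in_flight
ceph daemon osd.13 dump_historic_ops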

Thank you in advance,

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 3-node cluster with 3 x Intel Optane 900P - very low benchmarked performance (200 IOPS)?

2019-03-11 Thread Vitaliy Filippov
These options aren't needed, numjobs is 1 by default and RBD has no "sync"  
concept at all. Operations are always "sync" by default.


In fact even --direct=1 may be redundant because there's no page cache  
involved. However I keep it just in case - there is the RBD cache, what if  
one day fio gets it enabled? :)



how about adding:  --sync=1 --numjobs=1  to the command as well?


--
With best regards,
  Vitaliy Filippov
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 3-node cluster with 3 x Intel Optane 900P - very low benchmarked performance (200 IOPS)?

2019-03-11 Thread solarflow99
how about adding:  --sync=1 --numjobs=1  to the command as well?



On Sat, Mar 9, 2019 at 12:09 PM Vitaliy Filippov  wrote:

> There are 2:
>
> fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite
> -pool=bench -rbdname=testimg
>
> fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=128 -rw=randwrite
> -pool=bench -rbdname=testimg
>
> The first measures your min possible latency - it does not scale with the
> number of OSDs at all, but it's usually what real applications like
> DBMSes
> need.
>
> The second measures your max possible random write throughput which you
> probably won't be able to utilize if you don't have enough VMs all
> writing
> in parallel.
>
> --
> With best regards,
>Vitaliy Filippov
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw sync falling behind regularly

2019-03-11 Thread Trey Palmer
HI Casey,

We're still trying to figure this sync problem out; if you could possibly
tell us anything further we would be deeply grateful!

Our errors are coming from 'data sync'.   In `sync status` we pretty
constantly show one shard behind, but a different one each time we run it.

Here's a paste -- these commands were run in rapid succession.

root@sv3-ceph-rgw1:~# radosgw-admin sync status
  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
   zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
root@sv3-ceph-rgw1:~# radosgw-admin sync status
  realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
  zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
   zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 1 shards
behind shards: [30]
oldest incremental change not applied: 2019-01-19
22:53:23.0.16109s
source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
root@sv3-ceph-rgw1:~#
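
For what it's worth, the data sync state against a particular source zone and
the persistent error log can also be dumped directly, e.g.:

radosgw-admin data sync status --source-zone=dc11-prod
radosgw-admin sync error list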


Below I'm pasting a small section of log.  Thanks so much for looking!

Trey Palmer


root@sv3-ceph-rgw1:/var/log/ceph# tail -f ceph-rgw-sv3-ceph-rgw1.log | grep
-i error
2019-03-08 11:43:07.208572 7fa080cc7700  0 data sync: ERROR: failed to read
remote data log info: ret=-2
2019-03-08 11:43:07.211348 7fa080cc7700  0 meta sync: ERROR:
RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.267117 7fa080cc7700  0 data sync: ERROR: failed to read
remote data log info: ret=-2
2019-03-08 11:43:07.269631 7fa080cc7700  0 meta sync: ERROR:
RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.895192 7fa080cc7700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.046685 7fa080cc7700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.171277 7fa0870eb700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.171748 7fa0850e7700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.175867 7fa08a0f1700  0 meta sync: ERROR: can't remove
key:
bucket.instance:phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
ret=-2
2019-03-08 11:43:08.176755 7fa0820e1700  0 data sync: ERROR: init sync on
whoiswho/whoiswho:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.293 failed,
retcode=-2
2019-03-08 11:43:08.176872 7fa0820e1700  0 data sync: ERROR: init sync on
dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.176885 7fa093103700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.176925 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.177916 7fa0910ff700  0 meta sync: ERROR: can't remove
key:
bucket.instance:gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
ret=-2
2019-03-08 11:43:08.178815 7fa08b0f3700  0 ERROR: failed to get bucket
instance info for
.bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.178847 7fa0820e1700  0 data sync: ERROR: failed to
retrieve bucket info for
bucket=gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.179492 7fa0820e1700  0 data sync: 

Re: [ceph-users] priorize degraged objects than misplaced

2019-03-11 Thread Fabio Abreu
I am looking at one problematic pg in my disaster scenario; see
below:

root@monitor~# ceph pg ls-by-pool cinder_sata | grep 5.5b7
5.5b7   26911   29  53851   107644  29  112248188928  53258
53258   active+recovering+undersized+degraded+remapped  2019-03-11
14:05:29.857657  95096'33589806  95169:37258027  [96,47,38]
96  [154]   154  65986'27640790  2019-01-21 19:36:06.645070
65986'27640790  2019-01-21 19:36:06.645070

My problematic pg has 3 osds in its up set but a different, single osd acting as primary:

up          up_primary  acting  acting_primary
[96,47,38]  96          [154]   154

If I compare with a good one, it looks like this:

up_primary  acting        acting_primary
85          [85,102,143]  85

Is this problematic pg scenario a normal thing?
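
(The full peering detail behind this, including why only osd.154 is currently
acting, shows up in the "recovery_state" section of: ceph pg 5.5b7 query)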



Regards,
Fabio Abreu








On Mon, Mar 11, 2019 at 9:01 AM David Turner  wrote:

> Ceph has been getting better and better about prioritizing this sort of
> recovery, but few of those optimizations are in Jewel, which has been out
> of the support cycle for about a year. You should look into upgrading to
> Mimic, where you should see a pretty good improvement on this sort of
> prioritization.
>
> On Sat, Mar 9, 2019, 3:10 PM Fabio Abreu  wrote:
>
>> HI Everybody,
>>
>> I have a doubt about degraded objects in the Jewel 10.2.7 version: can I
>> prioritize the degraded objects over the misplaced ones?
>>
>> I am asking this because I am trying to simulate a disaster recovery scenario.
>>
>>
>> Thanks and best regards,
>> Fabio Abreu Reis
>> http://fajlinux.com.br
>> *Tel : *+55 21 98244-0161
>> *Skype : *fabioabreureis
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>

-- 
Atenciosamente,
Fabio Abreu Reis
http://fajlinux.com.br
*Tel : *+55 21 98244-0161
*Skype : *fabioabreureis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Host-local sub-pool?

2019-03-11 Thread Harry G. Coin

Hello all

I have a 'would be nice' use case that I'm wondering if ceph can 
handle.  The goal is to allow an otherwise ordinary ceph server, one with 
a little 'one-off' special-purpose extra hardware, to keep providing at 
least some value when off-host networking is down, while still taking 
advantage of the locally ample ceph replication capability, and then to 
participate as a 'normal' host in the larger setup once connectivity is 
restored.  Is this doable?


Here's a little more setup detail but tl;dr you get the idea.

Suppose a typical ceph multi-server, multi-osd-per-server setup. One 
particular, otherwise typical, server has osds running on the bare 
metal os and is also one of a few with a ceph mon and mds running in a 
VM.  But that one server is 'special' in that it has some task-specific 
hardware that a dedicated-purpose VM running on it uses, for a task that 
has no need to reference off-host data for any reason.


Like the other servers, there are several osds (1/drive) running on that 
special server.  So within that host there's plenty of room for 
host-local replication, though the server be part of a multi-host setup.


Suppose owing to some mishap all the networking cables were disconnected 
from that special server.    Is there a ceph configuration allowing the 
task-specific VM to operate based on a ceph block device relying only on 
host-local osds -- without regard to off-host connectivity?  Yet 
providing normal host ceph participation when networking is normal?


I'd like to avoid adding some added non-ceph mirroring scheme for the 
special-purpose VM's block device and I'd like to avoid having to 
maintain yet another box for a small task.


Ideas appreciated!

Thanks

Harry Coin



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to identify pre-luminous rdb images

2019-03-11 Thread Denny Kreische


> Am 11.03.2019 um 12:21 schrieb Konstantin Shalygin :
> 
> 
>> Hello list,
>> 
>> I upgraded to mimic some time ago and want to make use of the upmap feature 
>> now.
>> But I can't do "ceph osd set-require-min-compat-client luminous" as there 
>> are still pre-luminous clients connected.
>> 
>> The cluster was originally created from jewel release.
>> 
>> When I run "ceph features", I see many connections from jewel clients though 
>> all systems have mimic installed and are rebooted since then.
>> 
>> 08:44 dk at mon03  
>> [fra]:~$ ceph features
>> [...]
>> "client": [
>> {
>> "features": "0x7010fb86aa42ada",
>> "release": "jewel",
>> "num": 70
>> },
>> {
>> "features": "0x3ffddff8eea4fffb",
>> "release": "luminous",
>> "num": 185
>> },
>> {
>> "features": "0x3ffddff8ffa4fffb",
>> "release": "luminous",
>> "num": 403
>> }
>> [...]
>> 
>> These client connections belong to mapped rbd images.
>> When I inspect the rbd images with "rbd info" I don't see any difference in 
>> format and features.
>> 
>> How can I determine which rbd images are affected and how can I transform 
>> them to luminous types if possible.
> 
> Do you have krbd clients? Because kernel clients still have 'jewel' release, 
> but upmap is supported. 
> 
That was the important hint.
We changed our kvm environment (ganeti) from kernelspace rbd to userspace rbd 
some time ago, but there were still VMs running from before. Now I just need to 
restart these VMs and all should be fine.

Thanks,
Denny
> 
> If so, the kernel should be 4.13+ or EL 7.5. If that is the case, you should append 
> --yes-i-really-mean-it as a safe workaround.
> 
> 
> 
> 
> 
> k
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD OSD crashing after upgrade to 12.2.10

2019-03-11 Thread Eugen Block

Hi all,

we had some assistance with our SSD crash issue outside of this  
mailing list - which is not resolved yet  
(http://tracker.ceph.com/issues/38395) - but there's one thing I'd  
like to ask the list.


I noticed that a lot of the OSD crashes show a correlation to MON  
elections. For the last 18 OSD failures I count 7 MON elections  
happening right before the OSD failures are reported. But if I take  
into account that there's a grace period of 20 seconds, it seems as if  
some OSD failures could trigger a MON election. Is that even possible?


The logs look like this:

---cut here---
2019-03-02 21:43:17.599452 mon.monitor02 mon.1 :6789/0 977222  
: cluster [INF] mon.monitor02 calling monitor election
2019-03-02 21:43:17.758506 mon.monitor01 mon.0 :6789/0  
1079594 : cluster [INF] mon.monitor01 calling monitor election
2019-03-02 21:43:22.938084 mon.monitor01 mon.0 :6789/0  
1079595 : cluster [INF] mon.monitor01 is new leader, mons  
monitor01,monitor02 in quorum (ranks 0,1)
2019-03-02 21:43:23.106667 mon.monitor01 mon.0 :6789/0  
1079600 : cluster [WRN] Health check failed: 1/3 mons down, quorum  
monitor01,monitor02 (MON_DOWN)
2019-03-02 21:43:23.180382 mon.monitor01 mon.0 :6789/0  
1079601 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum  
monitor01,monitor02
2019-03-02 21:43:27.454252 mon.monitor01 mon.0 :6789/0  
1079610 : cluster [INF] osd.20 failed (root=default,host=monitor03) (2  
reporters from different host after 20.000136 >= grace 20.00)

[...]
2019-03-04 10:06:35.743561 mon.monitor01 mon.0 :6789/0  
1164043 : cluster [INF] mon.monitor01 calling monitor election
2019-03-04 10:06:35.752565 mon.monitor02 mon.1 :6789/0  
1054674 : cluster [INF] mon.monitor02 calling monitor election
2019-03-04 10:06:35.835435 mon.monitor01 mon.0 :6789/0  
1164044 : cluster [INF] mon.monitor01 is new leader, mons  
monitor01,monitor02,monitor03 in quorum (ranks 0,1,2)
2019-03-04 10:06:35.701759 mon.monitor03 mon.2 :6789/0 287652  
: cluster [INF] mon.monitor03 calling monitor election
2019-03-04 10:06:35.954407 mon.monitor01 mon.0 :6789/0  
1164049 : cluster [INF] overall HEALTH_OK
2019-03-04 10:06:45.299686 mon.monitor01 mon.0 :6789/0  
1164057 : cluster [INF] osd.20 failed (root=default,host=monitor03) (2  
reporters from different host after 20.068848 >= grace 20.00)

[...]
---cut here---

These MON elections only happened when an OSD failure occurred; there were  
no elections without OSD failures. Does this make sense to anybody? Any  
insights would be greatly appreciated.


Regards,
Eugen


Zitat von Igor Fedotov :


Hi Eugen,

looks like this isn't [1] but rather

https://tracker.ceph.com/issues/38049

and

https://tracker.ceph.com/issues/36541 (=  
https://tracker.ceph.com/issues/36638 for luminous)


Hence it's not fixed in 12.2.10, target release is 12.2.11


Also please note the patch only avoids new occurrences of the  
issue. But there is some chance that inconsistencies caused by it  
earlier are still present in the DB, and the assertion might still happen  
(hopefully with less frequency).


So could you please run fsck for OSDs that were broken once and  
share the results?


Then we can decide if it makes sense to proceed with the repair.


Thanks,

Igor

On 2/7/2019 3:37 PM, Eugen Block wrote:

Hi list,

I found this thread [1] about crashing SSD OSDs, although that was  
about an upgrade to 12.2.7, we just hit (probably) the same issue  
after our update to 12.2.10 two days ago in a production cluster.

Just half an hour ago I saw one OSD (SSD) crashing (for the first time):

2019-02-07 13:02:07.682178 mon.host1 mon.0 :6789/0 109754 :  
cluster [INF] osd.10 failed (root=default,host=host1) (connection  
refused reported by osd.20)
2019-02-07 13:02:08.623828 mon.host1 mon.0 :6789/0 109771 :  
cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)


One minute later, the OSD was back online.
This is the stack trace reported in syslog:

---cut here---
2019-02-07T13:01:51.181027+01:00 host1 ceph-osd[1136505]: ***  
Caught signal (Aborted) **
2019-02-07T13:01:51.181232+01:00 host1 ceph-osd[1136505]:  in  
thread 7f75ce646700 thread_name:bstore_kv_final
2019-02-07T13:01:51.185873+01:00 host1 ceph-osd[1136505]:  ceph  
version 12.2.10-544-gb10c702661  
(b10c702661a31c8563b3421d6d664de93a0cb0e2) luminous (stable)
2019-02-07T13:01:51.186077+01:00 host1 ceph-osd[1136505]:  1:  
(()+0xa587d9) [0x560b921cc7d9]
2019-02-07T13:01:51.186226+01:00 host1 ceph-osd[1136505]:  2:  
(()+0x10b10) [0x7f75d8386b10]
2019-02-07T13:01:51.186368+01:00 host1 ceph-osd[1136505]:  3:  
(gsignal()+0x37) [0x7f75d73508d7]
2019-02-07T13:01:51.186773+01:00 host1 ceph-osd[1136505]:  4:  
(abort()+0x13a) [0x7f75d7351caa]
2019-02-07T13:01:51.186906+01:00 host1 ceph-osd[1136505]:  5:  
(ceph::__ceph_assert_fail(char const*, char const*, int, char  
const*)+0x280) [0x560b922096d0]
2019-02-07T13:01:51.187027+01:00 host1 ceph-osd[1136505]:  6:  
(interval_setunsigned long, std::less,  

Re: [ceph-users] How to just delete PGs stuck incomplete on EC pool

2019-03-11 Thread Maks Kowalik
Hello Daniel,

I think you will not avoid a tedious job of manual cleanup...
Or the other way is to delete the whole pool (ID 18).

The manual cleanup means taking all the OSDs from "probing_osds", stopping
them one by one and removing the shards of placement groups 18.1e and 18.c
(using ceph-objectstore-tool).
Afterwards you need to restart these OSDs with
osd_find_best_info_ignore_history_les
set to true.
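
A rough sketch of the per-OSD part (OSD ids and paths are placeholders, and
exporting the shard first keeps a way back):

systemctl stop ceph-osd@NN
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
    --pgid 18.1e --op export --file /root/pg18.1e.osd-NN.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
    --pgid 18.1e --op remove --force
(repeat for pg 18.c, then for each of the other OSDs)

and for the restart step, set the option in ceph.conf on those hosts before
starting the OSDs again:

[osd]
osd find best info ignore history les = true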

Kind regards,
Maks Kowalik





pon., 4 mar 2019 o 17:05 Daniel K  napisał(a):

> Thanks for the suggestions.
>
> I've tried both -- setting osd_find_best_info_ignore_history_les = true and
> restarting all OSDs,  as well as 'ceph osd-force-create-pg' -- but both
> still show incomplete
>
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs incomplete
> pg 18.c is incomplete, acting [32,48,58,40,13,44,61,59,30,27,43,37]
> (reducing pool ec84-hdd-zm min_size from 8 may help; search ceph.com/docs
> for 'incomplete')
> pg 18.1e is incomplete, acting [50,49,41,58,60,46,52,37,34,63,57,16]
> (reducing pool ec84-hdd-zm min_size from 8 may help; search ceph.com/docs
> for 'incomplete')
>
>
> The OSDs in down_osds_we_would_probe have already been marked lost
>
> When I ran  the force-create-pg command, they went to peering for a few
> seconds, but then went back incomplete.
>
> Updated ceph pg 18.1e query https://pastebin.com/XgZHvJXu
> Updated ceph pg 18.c query https://pastebin.com/N7xdQnhX
>
> Any other suggestions?
>
>
>
> Thanks again,
>
> Daniel
>
>
>
> On Sat, Mar 2, 2019 at 3:44 PM Paul Emmerich 
> wrote:
>
>> On Sat, Mar 2, 2019 at 5:49 PM Alexandre Marangone
>>  wrote:
>> >
>> > If you have no way to recover the drives, you can try to reboot the
>> OSDs with `osd_find_best_info_ignore_history_les = true` (revert it
>> afterwards), you'll lose data. If after this, the PGs are down, you can
>> mark the OSDs blocking the PGs from become active lost.
>>
>> this should work for PG 18.1e, but not for 18.c. Try running "ceph osd
>> force-create-pg " to reset the PGs instead.
>> Data will obviously be lost afterwards.
>>
>> Paul
>>
>> >
>> > On Sat, Mar 2, 2019 at 6:08 AM Daniel K  wrote:
>> >>
>> >> They all just started having read errors. Bus resets. Slow reads.
>> Which is one of the reasons the cluster didn't recover fast enough to
>> compensate.
>> >>
>> >> I tried to be mindful of the drive type and specifically avoided the
>> larger capacity Seagates that are SMR. Used 1 SM863 for every 6 drives for
>> the WAL.
>> >>
>> >> Not sure why they failed. The data isn't critical at this point, just
>> need to get the cluster back to normal.
>> >>
>> >> On Sat, Mar 2, 2019, 9:00 AM  wrote:
>> >>>
>> >>> Did they break, or did something went wronng trying to replace them?
>> >>>
>> >>> Jespe
>> >>>
>> >>>
>> >>>
>> >>> Sent from myMail for iOS
>> >>>
>> >>>
>> >>> Saturday, 2 March 2019, 14.34 +0100 from Daniel K > >:
>> >>>
>> >>> I bought the wrong drives trying to be cheap. They were 2TB WD Blue
>> 5400rpm 2.5 inch laptop drives.
>> >>>
>> >>> They've been replace now with HGST 10K 1.8TB SAS drives.
>> >>>
>> >>>
>> >>>
>> >>> On Sat, Mar 2, 2019, 12:04 AM  wrote:
>> >>>
>> >>>
>> >>>
>> >>> Saturday, 2 March 2019, 04.20 +0100 from satha...@gmail.com <
>> satha...@gmail.com>:
>> >>>
>> >>> 56 OSD, 6-node 12.2.5 cluster on Proxmox
>> >>>
>> >>> We had multiple drives fail(about 30%) within a few days of each
>> other, likely faster than the cluster could recover.
>> >>>
>> >>>
>> >>> Hov did so many drives break?
>> >>>
>> >>> Jesper
>> >>
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OpenStack with Ceph RDMA

2019-03-11 Thread Lazuardi Nasution
Hi David,

I know the difference between the cluster network and the public network. I
usually split them into vlans for statistics, isolation and priority. What I
need to know is what kind of RDMA messaging Ceph does. Is it only between
OSDs, or does it involve the other daemons and clients too?

Best regards,


On Mon, Mar 11, 2019, 19:15 David Turner  wrote:

> I can't speak to the rdma portion. But to clear up what each of these
> does... the cluster network is only traffic between the osds for
> replicating writes, reading EC data, as well as backfilling and recovery
> io. Mons, mds, rgw, and osds talking with clients all happen on the public
> network. The general consensus has been to not split the two networks,
> except for maybe by vlans for potential statistics and graphing. Even if
> you were running out of bandwidth, just upgrade the dual interface instead
> of segregating them physically.
>
> On Sat, Mar 9, 2019, 11:10 AM Lazuardi Nasution 
> wrote:
>
>> Hi,
>>
>> I'm looking for information about where the RDMA messaging of Ceph
>> happens: on the cluster network, the public network or both (it seems both, CMIIW)?
>> I'm talking about configuration of ms_type, ms_cluster_type and
>> ms_public_type.
>>
>> In case of OpenStack integration with RBD, which of above three is
>> possible? In this case, should I still separate cluster network and public
>> network?
>>
>> Best regards,
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OpenStack with Ceph RDMA

2019-03-11 Thread David Turner
I can't speak to the rdma portion. But to clear up what each of these
does... the cluster network is only traffic between the osds for
replicating writes, reading EC data, as well as backfilling and recovery
io. Mons, mds, rgw, and osds talking with clients all happen on the public
network. The general consensus has been to not split the two networks,
except for maybe by vlans for potential statistics and graphing. Even if
you were running out of bandwidth, just upgrade the dual interface instead
of segregating them physically.
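
For completeness, the split itself is just the two subnet options in
ceph.conf, e.g. (example subnets):

[global]
public network  = 192.168.10.0/24
cluster network = 192.168.20.0/24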

On Sat, Mar 9, 2019, 11:10 AM Lazuardi Nasution 
wrote:

> Hi,
>
> I'm looking for information about where the RDMA messaging of Ceph
> happens: on the cluster network, the public network or both (it seems both, CMIIW)?
> I'm talking about configuration of ms_type, ms_cluster_type and
> ms_public_type.
>
> In case of OpenStack integration with RBD, which of above three is
> possible? In this case, should I still separate cluster network and public
> network?
>
> Best regards,
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] priorize degraged objects than misplaced

2019-03-11 Thread David Turner
Ceph has been getting better and better about prioritizing this sort of
recovery, but few of those optimizations are in Jewel, which has been out
of the support cycle for about a year. You should look into upgrading to
Mimic, where you should see a pretty good improvement on this sort of
prioritization.

On Sat, Mar 9, 2019, 3:10 PM Fabio Abreu  wrote:

> HI Everybody,
>
> I have a doubt about degraded objects in the Jewel 10.2.7 version: can I
> prioritize the degraded objects over the misplaced ones?
>
> I am asking this because I am trying to simulate a disaster recovery scenario.
>
>
> Thanks and best regards,
> Fabio Abreu Reis
> http://fajlinux.com.br
> *Tel : *+55 21 98244-0161
> *Skype : *fabioabreureis
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH ISCSI Gateway

2019-03-11 Thread David Turner
The problem with clients on osd nodes is for kernel clients only. That's
true of krbd and the kernel client for cephfs. The only other reason not to
run any other Ceph daemon in the same node as osds is resource contention
if you're running at higher CPU and memory utilizations.

On Sat, Mar 9, 2019, 10:15 PM Mike Christie  wrote:

> On 03/07/2019 09:22 AM, Ashley Merrick wrote:
> > Been reading into the gateway, and noticed it’s been mentioned a few
> > times it can be installed on OSD servers.
> >
> > I am guessing therefore there be no issues like is sometimes mentioned
> > when using kRBD on a OSD node apart from the extra resources required
> > from the hardware.
> >
>
> That is correct. You might have a similar issue if you were to run the
> iscsi gw/target, OSD and then also run the iscsi initiator that logs
> into the iscsi gw/target all on the same node. I don't think any use
> case like that has ever come up though.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to identify pre-luminous rdb images

2019-03-11 Thread Konstantin Shalygin

Hello list,

I upgraded to mimic some time ago and want to make use of the upmap feature now.
But I can't do "ceph osd set-require-min-compat-client luminous" as there are 
still pre-luminous clients connected.

The cluster was originally created from jewel release.

When I run "ceph features", I see many connections from jewel clients though 
all systems have mimic installed and are rebooted since then.

08:44dk at mon03    
[fra]:~$ ceph features
[...]
 "client": [
 {
 "features": "0x7010fb86aa42ada",
 "release": "jewel",
 "num": 70
 },
 {
 "features": "0x3ffddff8eea4fffb",
 "release": "luminous",
 "num": 185
 },
 {
 "features": "0x3ffddff8ffa4fffb",
 "release": "luminous",
 "num": 403
 }
[...]

These client connections belong to mapped rbd images.
When I inspect the rbd images with "rbd info" I don't see any difference in 
format and features.

How can I determine which rbd images are affected and how can I transform them 
to luminous types if possible.


Do you have krbd clients? Because kernel clients still have 'jewel' 
release, but upmap is supported.


If so, the kernel should be 4.13+ or EL 7.5. If that is the case, you should append 
--yes-i-really-mean-it as a safe workaround.
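
i.e., in this case:

ceph osd set-require-min-compat-client luminous --yes-i-really-mean-it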




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to attach permission policy to user?

2019-03-11 Thread Pritha Srivastava
Hi Myxingkong,

http://docs.ceph.com/docs/nautilus/mgr/restful/ is for the Manager module
of ceph. This is not related to rgw.

Please try attaching a policy by configuring s3curl tool.

Thanks,
Pritha

On Mon, Mar 11, 2019 at 3:43 PM myxingkong  wrote:

> Hi Pritha:
>
> This is the documentation for configuring restful modules:
> http://docs.ceph.com/docs/nautilus/mgr/restful/
>
> The command given according to the official documentation is to attach the
> permission policy through the REST API.
>
> This is the documentation for STS lite:
> http://docs.ceph.com/docs/nautilus/radosgw/STSLite/
>
> My version of ceph is: ceph version 14.1.0
> (adfd524c32325562f61c055a81dba4cb1b117e84) nautilus (dev)
>
> Thanks,
> myxingkong
> On 3/11/2019 18:06,Pritha Srivastava
>  wrote:
>
> Hi Myxingkong,
>
> Can you explain what you mean by 'enabling restful modules', particularly
> which document are you referring to?
>
> Right now there is no other way to attach a permission policy to a user.
>
> There is work in progress for adding functionality to RGW using which such
> calls can be scripted using boto.
>
> Thanks,
> Pritha
>
> On Mon, Mar 11, 2019 at 3:21 PM myxingkong  wrote:
>
>> Hello:
>>
>> I want to use the GetSessionToken method to get the temporary
>> credentials, but according to the answer given in the official
>> documentation, I need to attach a permission policy to the user before I
>> can use the GetSessionToken method.
>>
>> This is the command for the additional permission policy provided by the
>> official documentation:
>>
>> s3curl.pl --debug --id admin -- -s -v -X POST "
>> http://localhost:8000/?Action=PutUserPolicy&PolicyName=Policy1&UserName=TESTER1&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Deny\",\"Action\":\"s3:*\",\"Resource\":\[\"*\"\],\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\},\{\"Effect\":\"Allow\",\"Action\":\"sts:GetSessionToken\",\"Resource\":\"*\",\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\}\]\}&Version=2010-05-08"
>>
>>
>> This requires enabling restful modules to execute this command.
>>
>> I configured the restful module according to the documentation, but
>> without success, I was unable to configure the SSL certificate.
>>
>> ceph config-key set mgr/restful/crt -i restful.crt
>>
>> WARNING: it looks like you might be trying to set a ceph-mgr module
>> configuration key. Since Ceph 13.0.0 (Mimic), mgr module configuration is
>> done with `config set`, and new values set using `config-key set` will be
>> ignored.
>> set mgr/restful/crt
>>
>> Can someone tell me if there is a way to configure a restful module's
>> certificate, or if there is another way to attach permission policies to
>> users?
>>
>> Thanks,
>> myxingkong
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to attach permission policy to user?

2019-03-11 Thread myxingkong







Hi Pritha:

This is the documentation for configuring restful modules:
http://docs.ceph.com/docs/nautilus/mgr/restful/

The command given according to the official documentation is to attach the
permission policy through the REST API.

This is the documentation for STS lite:
http://docs.ceph.com/docs/nautilus/radosgw/STSLite/

My version of ceph is: ceph version 14.1.0
(adfd524c32325562f61c055a81dba4cb1b117e84) nautilus (dev)




Thanks,
myxingkong







On 3/11/2019 18:06, Pritha Srivastava wrote:


Hi Myxingkong,

Can you explain what you mean by 'enabling restful modules', particularly
which document are you referring to?

Right now there is no other way to attach a permission policy to a user.

There is work in progress for adding functionality to RGW using which such
calls can be scripted using boto.

Thanks,
Pritha

On Mon, Mar 11, 2019 at 3:21 PM myxingkong  wrote:






Hello:

I want to use the GetSessionToken method to get the temporary credentials,
but according to the answer given in the official documentation, I need to
attach a permission policy to the user before I can use the GetSessionToken
method.

This is the command for the additional permission policy provided by the
official documentation:

s3curl.pl --debug --id admin -- -s -v -X POST "http://localhost:8000/?Action=PutUserPolicy&PolicyName=Policy1&UserName=TESTER1&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Deny\",\"Action\":\"s3:*\",\"Resource\":\[\"*\"\],\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\},\{\"Effect\":\"Allow\",\"Action\":\"sts:GetSessionToken\",\"Resource\":\"*\",\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\}\]\}&Version=2010-05-08"

This requires enabling restful modules to execute this command.

I configured the restful module according to the documentation, but without
success, I was unable to configure the SSL certificate.

ceph config-key set mgr/restful/crt -i restful.crt

WARNING: it looks like you might be trying to set a ceph-mgr module
configuration key. Since Ceph 13.0.0 (Mimic), mgr module configuration is
done with `config set`, and new values set using `config-key set` will be
ignored.
set mgr/restful/crt

Can someone tell me if there is a way to configure a restful module's
certificate, or if there is another way to attach permission policies to
users?

Thanks,
myxingkong


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to attach permission policy to user?

2019-03-11 Thread Pritha Srivastava
Hi Myxingkong,

Can you explain what you mean by 'enabling restful modules', particularly
which document are you referring to?

Right now there is no other way to attach a permission policy to a user.

There is work in progress for adding functionality to RGW using which such
calls can be scripted using boto.

Thanks,
Pritha

On Mon, Mar 11, 2019 at 3:21 PM myxingkong  wrote:

> Hello:
>
> I want to use the GetSessionToken method to get the temporary credentials,
> but according to the answer given in the official documentation, I need to
> attach a permission policy to the user before I can use the GetSessionToken
> method.
>
> This is the command for the additional permission policy provided by the
> official documentation:
>
> s3curl.pl --debug --id admin -- -s -v -X POST "
> http://localhost:8000/?Action=PutUserPolicy&PolicyName=Policy1&UserName=TESTER1&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Deny\",\"Action\":\"s3:*\",\"Resource\":\[\"*\"\],\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\},\{\"Effect\":\"Allow\",\"Action\":\"sts:GetSessionToken\",\"Resource\":\"*\",\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\}\]\}&Version=2010-05-08"
>
>
> This requires enabling restful modules to execute this command.
>
> I configured the restful module according to the documentation, but
> without success, I was unable to configure the SSL certificate.
>
> ceph config-key set mgr/restful/crt -i restful.crt
>
> WARNING: it looks like you might be trying to set a ceph-mgr module
> configuration key. Since Ceph 13.0.0 (Mimic), mgr module configuration is
> done with `config set`, and new values set using `config-key set` will be
> ignored.
> set mgr/restful/crt
>
> Can someone tell me if there is a way to configure a restful module's
> certificate, or if there is another way to attach permission policies to
> users?
>
> Thanks,
> myxingkong
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to identify pre-luminous rdb images

2019-03-11 Thread Denny Kreische
Hello list,

I upgraded to mimic some time ago and want to make use of the upmap feature now.
But I can't do "ceph osd set-require-min-compat-client luminous" as there are 
still pre-luminous clients connected.

The cluster was originally created from jewel release.

When I run "ceph features", I see many connections from jewel clients though 
all systems have mimic installed and are rebooted since then.

08:44 dk@mon03 [fra]:~$ ceph features
[...]
"client": [
{
"features": "0x7010fb86aa42ada",
"release": "jewel",
"num": 70
},
{
"features": "0x3ffddff8eea4fffb",
"release": "luminous",
"num": 185
},
{
"features": "0x3ffddff8ffa4fffb",
"release": "luminous",
"num": 403
}
[...]

These client connections belong to mapped rbd images.
When I inspect the rbd images with "rbd info" I don't see any difference in 
format and features.

How can I determine which rbd images are affected and how can I transform them 
to luminous types if possible.

Thanks,
Denny

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to attach permission policy to user?

2019-03-11 Thread myxingkong





Hello:

I want to use the GetSessionToken method to get the temporary credentials,
but according to the answer given in the official documentation, I need to
attach a permission policy to the user before I can use the GetSessionToken
method.

This is the command for the additional permission policy provided by the
official documentation:

s3curl.pl --debug --id admin -- -s -v -X POST "http://localhost:8000/?Action=PutUserPolicy&PolicyName=Policy1&UserName=TESTER1&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Deny\",\"Action\":\"s3:*\",\"Resource\":\[\"*\"\],\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\},\{\"Effect\":\"Allow\",\"Action\":\"sts:GetSessionToken\",\"Resource\":\"*\",\"Condition\":\{\"BoolIfExists\":\{\"sts:authentication\":\"false\"\}\}\}\]\}&Version=2010-05-08"

This requires enabling restful modules to execute this command.

I configured the restful module according to the documentation, but without
success, I was unable to configure the SSL certificate.

ceph config-key set mgr/restful/crt -i restful.crt

WARNING: it looks like you might be trying to set a ceph-mgr module
configuration key. Since Ceph 13.0.0 (Mimic), mgr module configuration is
done with `config set`, and new values set using `config-key set` will be
ignored.
set mgr/restful/crt

Can someone tell me if there is a way to configure a restful module's
certificate, or if there is another way to attach permission policies to
users?

Thanks,
myxingkong


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time

2019-03-11 Thread mart.v

Well, the drive supports trim:

# hdparm -I /dev/sdd|grep TRIM

           *    Data Set Management TRIM supported (limit 8 blocks)

           *    Deterministic read ZEROs after TRIM




But fstrim or discard is not enabled (I have checked both mount options and
services/cron). I'm using defaults from Proxmox, OSDs are created like this:
 ceph-volume lvm create --bluestore --data /dev/sdX
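
For anyone checking the same thing, the quick tests boil down to something
like (assuming systemd and the device from the hdparm output above):

lsblk --discard /dev/sdd                # non-zero DISC-GRAN/DISC-MAX = discard capable
systemctl list-timers | grep -i fstrim  # any scheduled fstrim runs?
grep -i discard /proc/mounts /etc/fstab # discard mount option anywhere?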




Best,
Martin
-- Original e-mail --
From: Matthew H 
To: Paul Emmerich , Massimo Sgaravatto 
Date: 28. 2. 2019 10:51:36
Subject: Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time
"

Is fstrim or discard enabled for these SSD's? If so, how did you enable it?







I've seen similar issues with poor controllers on SSDs. They tend to block
I/O when trim kicks off.




Thanks,





From: ceph-users  on behalf of Paul
Emmerich 
Sent: Friday, February 22, 2019 9:04 AM
To: Massimo Sgaravatto
Cc: Ceph Users
Subject: Re: [ceph-users] REQUEST_SLOW across many OSDs at the same time 
 



Bad SSDs can also cause this. Which SSD are you using?

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Feb 22, 2019 at 2:53 PM Massimo Sgaravatto
 wrote:
>
> A couple of hints to debug the issue (since I had to recently debug a 
problem with the same symptoms):
>
> - As far as I understand the reported 'implicated osds' are only the
primary ones. In the log of the osds you should find also the relevant pg 
number, and with this information you can get all the involved OSDs. This 
might be useful e.g. to see if a specific OSD node is always involved. This
was my case (a the problem was with the patch cable connecting the node)
>
> - You can use the "ceph daemon osd.x dump_historic_ops" command to debug
some of these slow requests (to see which events take much time)
>
> Cheers, Massimo
>
> On Fri, Feb 22, 2019 at 10:28 AM mart.v  wrote:
>>
>> Hello everyone,
>>
>> I'm experiencing a strange behaviour. My cluster is relatively small (43
OSDs, 11 nodes), running Ceph 12.2.10 (and Proxmox 5). Nodes are connected
via 10 Gbit network (Nexus 6000). Cluster is mixed (SSD and HDD), but with
different pools. Descibed error is only on the SSD part of the cluster.
>>
>> I noticed that few times a day the cluster slows down a bit and I have 
discovered this in logs:
>>
>> 2019-02-22 08:21:20.064396 mon.node1 mon.0 172.16.254.101:6789/0 1794159
: cluster [WRN] Health check failed: 27 slow requests are blocked > 32 sec.
Implicated osds 10,22,33 (REQUEST_SLOW)
>> 2019-02-22 08:21:26.589202 mon.node1 mon.0 172.16.254.101:6789/0 1794169
: cluster [WRN] Health check update: 199 slow requests are blocked > 32 sec.
Implicated osds 0,4,5,6,7,8,9,10,12,16,17,19,20,21,22,25,26,33,41 (REQUEST_
SLOW)
>> 2019-02-22 08:21:32.655671 mon.node1 mon.0 172.16.254.101:6789/0 1794183
: cluster [WRN] Health check update: 448 slow requests are blocked > 32 sec.
Implicated osds 0,3,4,5,6,7,8,9,10,12,15,16,17,19,20,21,22,24,25,26,33,41 
(REQUEST_SLOW)
>> 2019-02-22 08:21:38.744210 mon.node1 mon.0 172.16.254.101:6789/0 1794210
: cluster [WRN] Health check update: 388 slow requests are blocked > 32 sec.
Implicated osds 4,8,10,16,24,33 (REQUEST_SLOW)
>> 2019-02-22 08:21:42.790346 mon.node1 mon.0 172.16.254.101:6789/0 1794214
: cluster [INF] Health check cleared: REQUEST_SLOW (was: 18 slow requests 
are blocked > 32 sec. Implicated osds 8,16)
>>
>> "ceph health detail" shows nothing more
>>
>> It is happening through the whole day and the times can't be linked to 
any read or write intensive task (e.g. backup). I also tried to disable 
scrubbing, but it kept on going. These errors were not there since
beginning, but unfortunately I cannot track the day they started (it is 
beyond my logs).
>>
>> Any ideas?
>>
>> Thank you!
>> Martin
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
"___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph crushmap re-arrange with minimum rebalancing?

2019-03-11 Thread Wido den Hollander


On 3/8/19 4:17 AM, Pardhiv Karri wrote:
> Hi,
> 
> We have a ceph cluster with rack as the failure domain, but the racks are so
> imbalanced that we are not able to utilize the maximum of the storage
> allocated: some OSDs in the small racks are filling up too fast, causing
> ceph to go into a warning state and near_full_ratio to be triggered.
> 
> We are planning to restructure the entire crushmap with Rows being the
> failure domain instead of Racks so that each row will have the same
> number of hosts irrespective of how many Racks we have in each Row. We
> are using 3X replica in our ceph cluster
> 
> Current:
> Rack1 has 4 hosts
> Rack 2 has 2 hosts
> Rack 3 has 3 hosts
> Rack 4 has 6 hosts
> Rack 5 has 7 hosts
> Rack 6 has 2 hosts
> Rack 7 has 3 hosts
> 
> Future: With each  Row having 9 hosts,
> 
> Row_A with Rack 1 + Rack 2 + Rack 3 = 9 Hosts
> Row_B with Rack 4 + Rack 7 = 9 Hosts
> Row_C with Rack 5 + Rack 6 = 9 Hosts
> 
> The question is how can we safely do that without triggering too much
> rebalance?
> I can add empty rows to the crushmap and change the failure domain to row
> without any rebalancing, but when I move a rack under a row it triggers
> 50-60% rebalance and the cluster even becomes completely unreachable
> (error: connecting to cluster). How can we avoid it?
> 

The cluster going down is not correct, even if you make such a big
change you shouldn't see the cluster go down.

It's normal that such operations trigger a large data migration. You
want to change the topology of the cluster and thus it moves data.
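
If you do go ahead, a common way to keep the move manageable (a rough sketch,
with the bucket names from your mail) is to pause data movement while the
buckets are rearranged and then let backfill run throttled:

ceph osd set norebalance
ceph osd set nobackfill
ceph osd crush add-bucket Row_A row
ceph osd crush move Row_A root=default
ceph osd crush move Rack1 row=Row_A
# ... remaining racks and rows ...
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
ceph osd unset nobackfill
ceph osd unset norebalance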

Wido

> Thanks, 
> *Pardhiv Karri*
> "Rise and Rise again untilLAMBSbecome LIONS" 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com