Re: [ceph-users] Unable to start rgw after upgrade from hammer to jewel

2017-03-03 Thread K K

Hi,

I ran into the same error. After some research I commented out my custom settings:
#rgw zonegroup root pool = se.root
#rgw zone root pool = se.root
and after that rgw started successfully. The zone/zonegroup settings are now kept in the default
pool: .rgw.root
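
If you want to double-check where the zone/zonegroup metadata ended up after
dropping the custom root-pool options, something like this should show it
(the pool name below is the default; adjust if yours differs):

rados -p .rgw.root ls
radosgw-admin zonegroup get
radosgw-admin zone get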

>Saturday, 4 March 2017, 6:40 +05:00 from Gagandeep Arora:
>
>Permissions are correct see below:
>
>[root@radosgw1 radosgw]# ls -l /etc/ceph
>total 16
>-rw-r--r--. 1 ceph ceph   63 Nov 19  2015 cephprod.client.admin.keyring
>-rw-r--r--. 1 ceph ceph  122 Nov 24  2015 cephprod.client.radosgw.keyring
>-rw-r--r--. 1 ceph ceph 1049 Nov 20  2015 cephprod.conf
>lrwxrwxrwx. 1 ceph ceph   29 Nov 19  2015 keyring -> 
>cephprod.client.admin.keyring
>drwxr-xr-x. 2 ceph ceph   41 Jun 14  2016 radosgw-agent
>-rwxr-xr-x. 1 ceph ceph   92 Dec 10 06:36 rbdmap
>
>Regards,
>Gagan
>
>On Sat, Mar 4, 2017 at 11:02 AM, Roger Brown  < rogerpbr...@gmail.com > wrote:
>>My first thought is ceph doesn't have permissions to the rados keyring file.
>>eg. 
>>[root@nuc1 ~]# ls -l /etc/ceph/ceph.client.radosgw.keyring
>>-rw-rw+ 1 root root 73 Feb  8 20:40 /etc/ceph/ceph.client.radosgw.keyring
>>
>>You could give it read permission or be clever with setfacl, eg.
>>setfacl -m u:ceph:r /etc/ceph/ceph.client.radosgw.keyring
>>
>>
>>On Fri, Mar 3, 2017 at 5:57 PM Gagandeep Arora < aroragaga...@gmail.com > 
>>wrote:
>>>Hi all,
>>>
>>>Unable to start radosgw after upgrading hammer(0.94.10) to jewel(10.2.5). 
>>>Please see the following log. Can someone help please?
>>>
>>># cat cephprod-client.radosgw.gps-prod-1.log
>>>2017-03-04 10:35:10.459830 7f24316189c0  0 set uid:gid to 167:167 (ceph:ceph)
>>>2017-03-04 10:35:10.459883 7f24316189c0  0 ceph version 10.2.5 
>>>(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10561
>>>2017-03-04 10:35:10.487863 7f24316189c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:10.488220 7f24316189c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:10.490991 7f24316189c0  0 Error could not update zonegroup 
>>>gps: (1) Operation not permitted
>>>2017-03-04 10:35:10.491008 7f24316189c0 -1 failed converting regionmap: (1) 
>>>Operation not permitted
>>>2017-03-04 10:35:10.492683 7f24316189c0 -1 Couldn't init storage provider 
>>>(RADOS)
>>>2017-03-04 10:35:10.768690 7f710b4a59c0  0 set uid:gid to 167:167 (ceph:ceph)
>>>2017-03-04 10:35:10.768712 7f710b4a59c0  0 ceph version 10.2.5 
>>>(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10586
>>>2017-03-04 10:35:10.796166 7f710b4a59c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:10.796512 7f710b4a59c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:10.799167 7f710b4a59c0  0 Error could not update zonegroup 
>>>gps: (1) Operation not permitted
>>>2017-03-04 10:35:10.799184 7f710b4a59c0 -1 failed converting regionmap: (1) 
>>>Operation not permitted
>>>2017-03-04 10:35:10.800461 7f710b4a59c0 -1 Couldn't init storage provider 
>>>(RADOS)
>>>2017-03-04 10:35:11.017386 7f484298e9c0  0 set uid:gid to 167:167 (ceph:ceph)
>>>2017-03-04 10:35:11.017416 7f484298e9c0  0 ceph version 10.2.5 
>>>(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10610
>>>2017-03-04 10:35:11.045061 7f484298e9c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:11.045421 7f484298e9c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:11.048360 7f484298e9c0  0 Error could not update zonegroup 
>>>gps: (1) Operation not permitted
>>>2017-03-04 10:35:11.048390 7f484298e9c0 -1 failed converting regionmap: (1) 
>>>Operation not permitted
>>>2017-03-04 10:35:11.049755 7f484298e9c0 -1 Couldn't init storage provider 
>>>(RADOS)
>>>2017-03-04 10:35:11.268341 7fad08c2e9c0  0 set uid:gid to 167:167 (ceph:ceph)
>>>2017-03-04 10:35:11.268374 7fad08c2e9c0  0 ceph version 10.2.5 
>>>(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10634
>>>2017-03-04 10:35:11.298192 7fad08c2e9c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:11.298520 7fad08c2e9c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:11.301236 7fad08c2e9c0  0 Error could not update zonegroup 
>>>gps: (1) Operation not permitted
>>>2017-03-04 10:35:11.301259 7fad08c2e9c0 -1 failed converting regionmap: (1) 
>>>Operation not permitted
>>>2017-03-04 10:35:11.302522 7fad08c2e9c0 -1 Couldn't init storage provider 
>>>(RADOS)
>>>2017-03-04 10:35:11.518008 7f85204f19c0  0 set uid:gid to 167:167 (ceph:ceph)
>>>2017-03-04 10:35:11.518036 7f85204f19c0  0 ceph version 10.2.5 
>>>(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10658
>>>2017-03-04 10:35:11.545524 7f85204f19c0  0 error in read_id for id  : (2) No 
>>>such file or directory
>>>2017-03-04 10:35:11.545847 7f85204f19c0  0 error in read_id for id  : (2) No 
>>>such file or directory

[ceph-users] ceph activation error

2017-03-03 Thread gjprabu
Hi Team,

  I am installing a new ceph setup (jewel), and while activating the OSDs it
is throwing the error below.

  I am using a partition-based OSD (a directory like /home/osd1), not an
entire disk. An earlier installation a month back worked fine, but this time
I am getting the error below.

[root@cphadmin mycluster]# ceph-deploy osd activate cphosd1:/home/osd1 
cphosd2:/home/osd2 cphosd3:/home/osd3

[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf

[ceph_deploy.cli][INFO  ] Invoked (1.5.37): /usr/bin/ceph-deploy osd activate 
cphosd1:/home/osd1 cphosd2:/home/osd2 cphosd3:/home/osd3

[ceph_deploy.cli][INFO  ] ceph-deploy options:

[ceph_deploy.cli][INFO  ]  username  : None

[ceph_deploy.cli][INFO  ]  verbose   : False

[ceph_deploy.cli][INFO  ]  overwrite_conf: False

[ceph_deploy.cli][INFO  ]  subcommand: activate

[ceph_deploy.cli][INFO  ]  quiet : False

[ceph_deploy.cli][INFO  ]  cd_conf   : 
ceph_deploy.conf.cephdeploy.Conf instance at 0x7ff270353fc8

[ceph_deploy.cli][INFO  ]  cluster   : ceph

[ceph_deploy.cli][INFO  ]  func  : function osd at 
0x7ff2703492a8

[ceph_deploy.cli][INFO  ]  ceph_conf : None

[ceph_deploy.cli][INFO  ]  default_release   : False

[ceph_deploy.cli][INFO  ]  disk  : [('cphosd1', 
'/home/osd1', None), ('cphosd2', '/home/osd2', None), ('cphosd3', '/home/osd3', 
None)]

[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks cphosd1:/home/osd1: 
cphosd2:/home/osd2: cphosd3:/home/osd3:

[cphosd1][DEBUG ] connected to host: cphosd1

[cphosd1][DEBUG ] detect platform information from remote host

[cphosd1][DEBUG ] detect machine type

[cphosd1][DEBUG ] find the location of an executable

[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.1.1503 Core

[ceph_deploy.osd][DEBUG ] activating host cphosd1 disk /home/osd1

[ceph_deploy.osd][DEBUG ] will use init type: systemd

[cphosd1][DEBUG ] find the location of an executable

[cphosd1][INFO  ] Running command: /usr/sbin/ceph-disk -v activate --mark-init 
systemd --mount /home/osd1

[cphosd1][WARNIN] main_activate: path = /home/osd1

[cphosd1][WARNIN] activate: Cluster uuid is 62b4f8c7-c00c-48d0-8262-549c9ef6074c

[cphosd1][WARNIN] command: Running command: /usr/bin/ceph-osd --cluster=ceph 
--show-config-value=fsid

[cphosd1][WARNIN] activate: Cluster name is ceph

[cphosd1][WARNIN] activate: OSD uuid is 241b30d8-b2ba-4380-81f8-2e30e6913bb2

[cphosd1][WARNIN] allocate_osd_id: Allocating OSD id...

[cphosd1][WARNIN] command: Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd 
create --concise 241b30d8-b2ba-4380-81f8-2e30e6913bb2

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/whoami.22462.tmp

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/whoami.22462.tmp

[cphosd1][WARNIN] activate: OSD id is 0

[cphosd1][WARNIN] activate: Initializing OSD...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/ceph --cluster 
ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o 
/home/osd1/activate.monmap

[cphosd1][WARNIN] got monmap epoch 2

[cphosd1][WARNIN] command: Running command: /usr/bin/timeout 300 ceph-osd 
--cluster ceph --mkfs --mkkey -i 0 --monmap /home/osd1/activate.monmap 
--osd-data /home/osd1 --osd-journal /home/osd1/journal --osd-uuid 
241b30d8-b2ba-4380-81f8-2e30e6913bb2 --keyring /home/osd1/keyring --setuser 
ceph --setgroup ceph

[cphosd1][WARNIN] activate: Marking with init system systemd

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/systemd

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/systemd

[cphosd1][WARNIN] activate: Authorizing OSD key...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/ceph --cluster 
ceph --name client.bootstrap-osd --keyring 
/var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /home/osd1/keyring 
osd allow * mon allow profile osd

[cphosd1][WARNIN] added key for osd.0

[cphosd1][WARNIN] command: Running command: /usr/sbin/restorecon -R 
/home/osd1/active.22462.tmp

[cphosd1][WARNIN] command: Running command: /usr/bin/chown -R ceph:ceph 
/home/osd1/active.22462.tmp

[cphosd1][WARNIN] activate: ceph osd.0 data dir is ready at /home/osd1

[cphosd1][WARNIN] activate_dir: Creating symlink /var/lib/ceph/osd/ceph-0 -> 
/home/osd1

[cphosd1][WARNIN] start_daemon: Starting ceph osd.0...

[cphosd1][WARNIN] command_check_call: Running command: /usr/bin/systemctl 
enable ceph-osd@0

[cphosd1][WARNIN] Created symlink from 
/etc/systemd/system/ceph-osd.target.wants/ceph-osd@0.service to 
/usr/lib/systemd/system/ceph-osd@.service.

[cphosd1][WARNIN] 
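
The log above stops before the actual failure, but once it reaches
"start_daemon: Starting ceph osd.0..." the usual next checks are along these
lines (a generic sketch, not a known fix; the host and OSD id are taken from
the log above):

systemctl status ceph-osd@0
ls -ld /var/lib/ceph/osd/ceph-0 /home/osd1     # both should be owned by ceph:ceph
ceph osd tree                                  # osd.0 should appear and come up/in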

Re: [ceph-users] Unable to start rgw after upgrade from hammer to jewel

2017-03-03 Thread Gagandeep Arora
Permissions are correct, see below:

[root@radosgw1 radosgw]# ls -l /etc/ceph
total 16
-rw-r--r--. 1 ceph ceph   63 Nov 19  2015 cephprod.client.admin.keyring
-rw-r--r--. 1 ceph ceph  122 Nov 24  2015 cephprod.client.radosgw.keyring
-rw-r--r--. 1 ceph ceph 1049 Nov 20  2015 cephprod.conf
lrwxrwxrwx. 1 ceph ceph   29 Nov 19  2015 keyring ->
cephprod.client.admin.keyring
drwxr-xr-x. 2 ceph ceph   41 Jun 14  2016 radosgw-agent
-rwxr-xr-x. 1 ceph ceph   92 Dec 10 06:36 rbdmap

Regards,
Gagan

On Sat, Mar 4, 2017 at 11:02 AM, Roger Brown  wrote:

> My first thought is ceph doesn't have permissions to the rados keyring
> file.
> eg.
> [root@nuc1 ~]# ls -l /etc/ceph/ceph.client.radosgw.keyring
> -rw-rw+ 1 root root 73 Feb  8 20:40 /etc/ceph/ceph.client.radosgw.
> keyring
>
> You could give it read permission or be clever with setfacl, eg.
> setfacl -m u:ceph:r /etc/ceph/ceph.client.radosgw.keyring
>
>
> On Fri, Mar 3, 2017 at 5:57 PM Gagandeep Arora 
> wrote:
>
>> Hi all,
>>
>> Unable to start radosgw after upgrading hammer(0.94.10) to jewel(10.2.5).
>> Please see the following log. Can someone help please?
>>
>> # cat cephprod-client.radosgw.gps-prod-1.log
>> 2017-03-04 10:35:10.459830 7f24316189c0  0 set uid:gid to 167:167
>> (ceph:ceph)
>> 2017-03-04 10:35:10.459883 7f24316189c0  0 ceph version 10.2.5 (
>> c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10561
>> 2017-03-04 10:35:10.487863 7f24316189c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:10.488220 7f24316189c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:10.490991 7f24316189c0  0 Error could not update
>> zonegroup gps: (1) Operation not permitted
>> 2017-03-04 10:35:10.491008 7f24316189c0 -1 failed converting regionmap:
>> (1) Operation not permitted
>> 2017-03-04 10:35:10.492683 7f24316189c0 -1 Couldn't init storage provider
>> (RADOS)
>> 2017-03-04 10:35:10.768690 7f710b4a59c0  0 set uid:gid to 167:167
>> (ceph:ceph)
>> 2017-03-04 10:35:10.768712 7f710b4a59c0  0 ceph version 10.2.5 (
>> c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10586
>> 2017-03-04 10:35:10.796166 7f710b4a59c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:10.796512 7f710b4a59c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:10.799167 7f710b4a59c0  0 Error could not update
>> zonegroup gps: (1) Operation not permitted
>> 2017-03-04 10:35:10.799184 7f710b4a59c0 -1 failed converting regionmap:
>> (1) Operation not permitted
>> 2017-03-04 10:35:10.800461 7f710b4a59c0 -1 Couldn't init storage provider
>> (RADOS)
>> 2017-03-04 10:35:11.017386 7f484298e9c0  0 set uid:gid to 167:167
>> (ceph:ceph)
>> 2017-03-04 10:35:11.017416 7f484298e9c0  0 ceph version 10.2.5 (
>> c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10610
>> 2017-03-04 10:35:11.045061 7f484298e9c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:11.045421 7f484298e9c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:11.048360 7f484298e9c0  0 Error could not update
>> zonegroup gps: (1) Operation not permitted
>> 2017-03-04 10:35:11.048390 7f484298e9c0 -1 failed converting regionmap:
>> (1) Operation not permitted
>> 2017-03-04 10:35:11.049755 7f484298e9c0 -1 Couldn't init storage provider
>> (RADOS)
>> 2017-03-04 10:35:11.268341 7fad08c2e9c0  0 set uid:gid to 167:167
>> (ceph:ceph)
>> 2017-03-04 10:35:11.268374 7fad08c2e9c0  0 ceph version 10.2.5 (
>> c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10634
>> 2017-03-04 10:35:11.298192 7fad08c2e9c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:11.298520 7fad08c2e9c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:11.301236 7fad08c2e9c0  0 Error could not update
>> zonegroup gps: (1) Operation not permitted
>> 2017-03-04 10:35:11.301259 7fad08c2e9c0 -1 failed converting regionmap:
>> (1) Operation not permitted
>> 2017-03-04 10:35:11.302522 7fad08c2e9c0 -1 Couldn't init storage provider
>> (RADOS)
>> 2017-03-04 10:35:11.518008 7f85204f19c0  0 set uid:gid to 167:167
>> (ceph:ceph)
>> 2017-03-04 10:35:11.518036 7f85204f19c0  0 ceph version 10.2.5 (
>> c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10658
>> 2017-03-04 10:35:11.545524 7f85204f19c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:11.545847 7f85204f19c0  0 error in read_id for id  : (2)
>> No such file or directory
>> 2017-03-04 10:35:11.548713 7f85204f19c0  0 Error could not update
>> zonegroup gps: (1) Operation not permitted
>> 2017-03-04 10:35:11.548741 7f85204f19c0 -1 failed converting regionmap:
>> (1) Operation not permitted
>> 2017-03-04 10:35:11.549971 7f85204f19c0 -1 Couldn't init storage provider
>> (RADOS)
>>
>>
>> Rgards,
>> Gagan
>>
>> 

Re: [ceph-users] Unable to start rgw after upgrade from hammer to jewel

2017-03-03 Thread Roger Brown
My first thought is ceph doesn't have permissions to the rados keyring file.
eg.
[root@nuc1 ~]# ls -l /etc/ceph/ceph.client.radosgw.keyring
-rw-rw+ 1 root root 73 Feb  8 20:40
/etc/ceph/ceph.client.radosgw.keyring

You could give it read permission or be clever with setfacl, eg.
setfacl -m u:ceph:r /etc/ceph/ceph.client.radosgw.keyring
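
If you would rather use plain ownership/permissions instead of an ACL, a
minimal sketch (assuming the packaged ceph user/group) would be:

chown root:ceph /etc/ceph/ceph.client.radosgw.keyring
chmod 640 /etc/ceph/ceph.client.radosgw.keyring
getfacl /etc/ceph/ceph.client.radosgw.keyring   # or ls -l, to verify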


On Fri, Mar 3, 2017 at 5:57 PM Gagandeep Arora 
wrote:

> Hi all,
>
> Unable to start radosgw after upgrading hammer(0.94.10) to jewel(10.2.5).
> Please see the following log. Can someone help please?
>
> # cat cephprod-client.radosgw.gps-prod-1.log
> 2017-03-04 10:35:10.459830 7f24316189c0  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-03-04 10:35:10.459883 7f24316189c0  0 ceph version 10.2.5
> (c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10561
> 2017-03-04 10:35:10.487863 7f24316189c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:10.488220 7f24316189c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:10.490991 7f24316189c0  0 Error could not update
> zonegroup gps: (1) Operation not permitted
> 2017-03-04 10:35:10.491008 7f24316189c0 -1 failed converting regionmap:
> (1) Operation not permitted
> 2017-03-04 10:35:10.492683 7f24316189c0 -1 Couldn't init storage provider
> (RADOS)
> 2017-03-04 10:35:10.768690 7f710b4a59c0  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-03-04 10:35:10.768712 7f710b4a59c0  0 ceph version 10.2.5
> (c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10586
> 2017-03-04 10:35:10.796166 7f710b4a59c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:10.796512 7f710b4a59c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:10.799167 7f710b4a59c0  0 Error could not update
> zonegroup gps: (1) Operation not permitted
> 2017-03-04 10:35:10.799184 7f710b4a59c0 -1 failed converting regionmap:
> (1) Operation not permitted
> 2017-03-04 10:35:10.800461 7f710b4a59c0 -1 Couldn't init storage provider
> (RADOS)
> 2017-03-04 10:35:11.017386 7f484298e9c0  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-03-04 10:35:11.017416 7f484298e9c0  0 ceph version 10.2.5
> (c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10610
> 2017-03-04 10:35:11.045061 7f484298e9c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:11.045421 7f484298e9c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:11.048360 7f484298e9c0  0 Error could not update
> zonegroup gps: (1) Operation not permitted
> 2017-03-04 10:35:11.048390 7f484298e9c0 -1 failed converting regionmap:
> (1) Operation not permitted
> 2017-03-04 10:35:11.049755 7f484298e9c0 -1 Couldn't init storage provider
> (RADOS)
> 2017-03-04 10:35:11.268341 7fad08c2e9c0  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-03-04 10:35:11.268374 7fad08c2e9c0  0 ceph version 10.2.5
> (c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10634
> 2017-03-04 10:35:11.298192 7fad08c2e9c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:11.298520 7fad08c2e9c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:11.301236 7fad08c2e9c0  0 Error could not update
> zonegroup gps: (1) Operation not permitted
> 2017-03-04 10:35:11.301259 7fad08c2e9c0 -1 failed converting regionmap:
> (1) Operation not permitted
> 2017-03-04 10:35:11.302522 7fad08c2e9c0 -1 Couldn't init storage provider
> (RADOS)
> 2017-03-04 10:35:11.518008 7f85204f19c0  0 set uid:gid to 167:167
> (ceph:ceph)
> 2017-03-04 10:35:11.518036 7f85204f19c0  0 ceph version 10.2.5
> (c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10658
> 2017-03-04 10:35:11.545524 7f85204f19c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:11.545847 7f85204f19c0  0 error in read_id for id  : (2)
> No such file or directory
> 2017-03-04 10:35:11.548713 7f85204f19c0  0 Error could not update
> zonegroup gps: (1) Operation not permitted
> 2017-03-04 10:35:11.548741 7f85204f19c0 -1 failed converting regionmap:
> (1) Operation not permitted
> 2017-03-04 10:35:11.549971 7f85204f19c0 -1 Couldn't init storage provider
> (RADOS)
>
>
> Rgards,
> Gagan
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unable to start rgw after upgrade from hammer to jewel

2017-03-03 Thread Gagandeep Arora
Hi all,

Unable to start radosgw after upgrading hammer(0.94.10) to jewel(10.2.5).
Please see the following log. Can someone help please?

# cat cephprod-client.radosgw.gps-prod-1.log
2017-03-04 10:35:10.459830 7f24316189c0  0 set uid:gid to 167:167
(ceph:ceph)
2017-03-04 10:35:10.459883 7f24316189c0  0 ceph version 10.2.5
(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10561
2017-03-04 10:35:10.487863 7f24316189c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:10.488220 7f24316189c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:10.490991 7f24316189c0  0 Error could not update zonegroup
gps: (1) Operation not permitted
2017-03-04 10:35:10.491008 7f24316189c0 -1 failed converting regionmap: (1)
Operation not permitted
2017-03-04 10:35:10.492683 7f24316189c0 -1 Couldn't init storage provider
(RADOS)
2017-03-04 10:35:10.768690 7f710b4a59c0  0 set uid:gid to 167:167
(ceph:ceph)
2017-03-04 10:35:10.768712 7f710b4a59c0  0 ceph version 10.2.5
(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10586
2017-03-04 10:35:10.796166 7f710b4a59c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:10.796512 7f710b4a59c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:10.799167 7f710b4a59c0  0 Error could not update zonegroup
gps: (1) Operation not permitted
2017-03-04 10:35:10.799184 7f710b4a59c0 -1 failed converting regionmap: (1)
Operation not permitted
2017-03-04 10:35:10.800461 7f710b4a59c0 -1 Couldn't init storage provider
(RADOS)
2017-03-04 10:35:11.017386 7f484298e9c0  0 set uid:gid to 167:167
(ceph:ceph)
2017-03-04 10:35:11.017416 7f484298e9c0  0 ceph version 10.2.5
(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10610
2017-03-04 10:35:11.045061 7f484298e9c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:11.045421 7f484298e9c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:11.048360 7f484298e9c0  0 Error could not update zonegroup
gps: (1) Operation not permitted
2017-03-04 10:35:11.048390 7f484298e9c0 -1 failed converting regionmap: (1)
Operation not permitted
2017-03-04 10:35:11.049755 7f484298e9c0 -1 Couldn't init storage provider
(RADOS)
2017-03-04 10:35:11.268341 7fad08c2e9c0  0 set uid:gid to 167:167
(ceph:ceph)
2017-03-04 10:35:11.268374 7fad08c2e9c0  0 ceph version 10.2.5
(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10634
2017-03-04 10:35:11.298192 7fad08c2e9c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:11.298520 7fad08c2e9c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:11.301236 7fad08c2e9c0  0 Error could not update zonegroup
gps: (1) Operation not permitted
2017-03-04 10:35:11.301259 7fad08c2e9c0 -1 failed converting regionmap: (1)
Operation not permitted
2017-03-04 10:35:11.302522 7fad08c2e9c0 -1 Couldn't init storage provider
(RADOS)
2017-03-04 10:35:11.518008 7f85204f19c0  0 set uid:gid to 167:167
(ceph:ceph)
2017-03-04 10:35:11.518036 7f85204f19c0  0 ceph version 10.2.5
(c461ee19ecbc0c5c330aca20f7392c9a00730367), process radosgw, pid 10658
2017-03-04 10:35:11.545524 7f85204f19c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:11.545847 7f85204f19c0  0 error in read_id for id  : (2)
No such file or directory
2017-03-04 10:35:11.548713 7f85204f19c0  0 Error could not update zonegroup
gps: (1) Operation not permitted
2017-03-04 10:35:11.548741 7f85204f19c0 -1 failed converting regionmap: (1)
Operation not permitted
2017-03-04 10:35:11.549971 7f85204f19c0 -1 Couldn't init storage provider
(RADOS)


Regards,
Gagan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] purging strays faster

2017-03-03 Thread Daniel Davidson

ceph daemonperf mds.ceph-0
-mds-- --mds_server-- ---objecter--- -mds_cache- ---mds_log
rlat inos caps|hsr  hcs  hcr |writ read actv|recd recy stry purg|segs evts subm|
  0  336k  97k|  0    0    0 |   0    0   20|   0    0 246k    0| 31   27k    0
  0  336k  97k|  0    0    0 | 112    0   20|   0    0 246k   55| 31   26k   55
  0  336k  97k|  0    1    0 |  90    0   20|   0    0 246k   45| 31   26k   45
  0  336k  97k|  0    0    0 |   2    0   20|   0    0 246k    1| 31   26k    1
  0  336k  97k|  0    0    0 | 166    0   21|   0    0 246k   83| 31   26k   83


I have too many strays, which seem to be causing disk-full errors when 
deleting many files (hundreds of thousands); the number here is down from 
over 400k.  I have been trying to raise the number of purge operations to do 
this, but it is not happening:


ceph tell mds.ceph-0 injectargs --mds-max-purge-ops-per-pg 2
2017-03-03 15:44:00.606548 7fd96400a700  0 client.225772 ms_handle_reset 
on 172.16.31.1:6800/55710
2017-03-03 15:44:00.618556 7fd96400a700  0 client.225776 ms_handle_reset 
on 172.16.31.1:6800/55710

mds_max_purge_ops_per_pg = '2'

ceph tell mds.ceph-0 injectargs --mds-max-purge-ops 16384
2017-03-03 15:45:27.256132 7ff6d900c700  0 client.225808 ms_handle_reset 
on 172.16.31.1:6800/55710
2017-03-03 15:45:27.268302 7ff6d900c700  0 client.225812 ms_handle_reset 
on 172.16.31.1:6800/55710

mds_max_purge_ops = '16384'
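
To confirm the injected values actually landed on the running MDS, and to
make them survive a restart, something like this should work (a sketch; the
admin socket name assumes the daemon is called mds.ceph-0 as above):

ceph daemon mds.ceph-0 config show | grep mds_max_purge

# and in ceph.conf, to make the change persistent:
[mds]
mds max purge ops = 16384
mds max purge ops per pg = 2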

I do have a backfill running as I also have a new node that is almost 
done.  Any ideas as to what is going on here?


Dan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upgrade osd ceph version

2017-03-03 Thread Curt Beason
Hello,

So this is probably going to be a noob question.  I read the documentation,
but it didn't really cover upgrading to a specific version.

We have a cluster with mixed versions.  While I don't want to upgrade to the
latest version of ceph, I would like to upgrade the OSDs so they are all
on the same version.  Most of them are on 0.87.1 or 0.87.2.  There are 2
servers with OSDs on 0.80.10.  What is the best way to go through and
upgrade them all to 0.87.2?

They are all running Ubuntu 14 with kernel 3.13 or newer.
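
A rough sketch of how that usually goes (host names are placeholders, and the
exact repo/release handling should be checked against the giant release
notes; the usual order is mons first, then OSD nodes one at a time):

ceph-deploy install --release giant osdnode1
# then, on each node once the packages are updated (Ubuntu 14.04 / upstart):
sudo restart ceph-osd-all
# finally, confirm every OSD reports the same version:
ceph tell osd.* version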

Cheers,
Curt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] object store backup tool recommendations

2017-03-03 Thread Robin H. Johnson
On Fri, Mar 03, 2017 at 10:55:06AM +1100, Blair Bethwaite wrote:
> Does anyone have any recommendations for good tools to perform
> file-system/tree backups and restores to/from a RGW object store (Swift or
> S3 APIs)? Happy to hear about both FOSS and commercial options please.
This isn't Ceph specific, but it is something that has come up for me, and
I did a lot of research into it for the Gentoo distribution to use on
its infrastructure.

The wiki page with all of our needs & contenders is here:
https://wiki.gentoo.org/wiki/Project:Infrastructure/Backups_v3

TL;DR: restic is probably the closest fit to your needs, but do evaluate
it carefully.
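
For what it's worth, a minimal sketch of pointing restic at an RGW S3
endpoint looks roughly like this (the endpoint, bucket and paths are made up,
and the exact flags should be checked against the restic version you deploy):

export AWS_ACCESS_KEY_ID=<rgw-access-key>
export AWS_SECRET_ACCESS_KEY=<rgw-secret-key>
export RESTIC_PASSWORD=<repo-passphrase>
restic -r s3:http://rgw.example.com/backup-bucket init
restic -r s3:http://rgw.example.com/backup-bucket backup /srv/data
restic -r s3:http://rgw.example.com/backup-bucket snapshots
restic -r s3:http://rgw.example.com/backup-bucket restore latest --target /srv/restore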

> I'm interested in:
> 1) tools known to work or not work at all for a basic file-based data backup
> 
> Plus these extras:
> 2) preserves/restores correct file metadata (e.g. owner, group, acls etc)
> 3) preserves/restores xattrs
Restic has acl/xattr in master, but not yet in version 0.4.0.

> 4) backs up empty directories and files
Yes.
> 5) supports some sort of snapshot/versioning/differential functionality,
> i.e., will keep a copy or diff or last N versions of a file or whole backup
> set, e.g., so that one can restore yesterday's file/s or last week's but
> not have to keep two full copies to achieve it
Yes.
> 6) is readily able to restore individual files
Yes.
> 7) can encrypt/decrypt client side
Yes, but beware the key model: it's fully symmetric, so any client with the
key can touch the entire repo.

> 8) anything else I should be considering
restic does not do any compression yet; it is still planned.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136


signature.asc
Description: Digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds crashing during hit_set_trim and hit_set_remove_all

2017-03-03 Thread Sage Weil
On Fri, 3 Mar 2017, Mike Lovell wrote:
> i started an upgrade process to go from 0.94.7 to 10.2.5 on a production
> cluster that is using cache tiering. this cluster has 3 monitors, 28 storage
> nodes, around 370 osds. the upgrade of the monitors completed without issue.
> i then upgraded 2 of the storage nodes, and after the restarts, the osds
> started crashing during hit_set_trim. here is some of the output from the
> log.
> 2017-03-02 22:41:32.338290 7f8bfd6d7700 -1 osd/ReplicatedPG.cc: In function
> 'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)'
> thread 7f8bfd6d7700 time 2017-03-02 22:41:32.335020
> osd/ReplicatedPG.cc: 10514: FAILED assert(obc)
> 
>  ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0xbddac5]
>  2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned
> int)+0x75f) [0x87e48f]
>  3: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab]
>  4: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) [0x8a0d1a]
>  5: (ReplicatedPG::do_request(std::tr1::shared_ptr&,
> ThreadPool::TPHandle&)+0x68a) [0x83be4a]
>  6: (OSD::dequeue_op(boost::intrusive_ptr,
> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) [0x69a5c5]
>  7: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x333) [0x69ab33]
>  8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f)
> [0xbcd1cf]
>  9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300]
>  10: (()+0x7dc5) [0x7f8c1c209dc5]
>  11: (clone()+0x6d) [0x7f8c1aceaced]
> 
> it started on just one osd and then spread to others until most of the osds
> that are part of the cache tier were crashing. that was happening on both
> the osds that were running jewel and on the ones running hammer. in the
> process of trying to sort this out, the use_gmt_hitset option was set to
> true and all of the osds were upgraded to hammer. we still have not been
> able to determine a cause or a fix.
> 
> it looks like when hit_set_trim and hit_set_remove_all are being called,
> they are calling hit_set_archive_object() to generate a name based on a
> timestamp and then calling get_object_context() which then returns nothing
> and triggers an assert.
> 
> i raised the debug_osd to 10/10 and then analyzed the logs after the crash.
> i found the following in the ceph osd log afterwards.
> 
> 2017-03-03 03:10:31.918470 7f218c842700 10 osd.146 pg_epoch: 266043
> pg[19.5d4( v 264786'61233923 (262173'61230715,264786'61233923]
> local-les=266043 n=393 ec=83762 les/c/f 266043/264767/0
> 266042/266042/266042) [146,116,179] r=0 lpr=266042
>  pi=264766-266041/431 crt=262323'61233250 lcod 0'0 mlcod 0'0 active+degraded
> NIBBLEWISE] get_object_context: no obc for soid
> 19:2ba0:.ceph-internal::hit_set_19.5d4_archive_2017-03-03
> 05%3a55%3a58.459084Z_2017-03-03 05%3a56%3a58.98101
> 6Z:head and !can_create
> 2017-03-03 03:10:31.921064 7f2194051700 10 osd.146 266043 do_waiters --
> start
> 2017-03-03 03:10:31.921072 7f2194051700 10 osd.146 266043 do_waiters --
> finish
> 2017-03-03 03:10:31.921076 7f2194051700  7 osd.146 266043 handle_pg_notify
> from osd.255
> 2017-03-03 03:10:31.921096 7f2194051700 10 osd.146 266043 do_waiters --
> start
> 2017-03-03 03:10:31.921099 7f2194051700 10 osd.146 266043 do_waiters --
> finish
> 2017-03-03 03:10:31.925858 7f218c041700 -1 osd/ReplicatedPG.cc: In function
> 'void ReplicatedPG::hit_set_remove_all()' thread 7f218c041700 time
> 2017-03-03 03:10:31.918201
> osd/ReplicatedPG.cc: 11494: FAILED assert(obc)
> 
>  ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x85) [0x7f21acee9425]
>  2: (ReplicatedPG::hit_set_remove_all()+0x412) [0x7f21ac9cba92]
>  3: (ReplicatedPG::on_activate()+0x6dd) [0x7f21ac9f73fd]
>  4: (PG::RecoveryState::Active::react(PG::AllReplicasActivated const&)+0xac)
> [0x7f21ac916adc]
>  5: (boost::statechart::simple_state PG::RecoveryState::Primary, 
> PG::RecoveryState::Activating,(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_ba
> se const&, void const*)+0x179) [0x7f21a
> c974909]
>  6: (boost::statechart::simple_state PG::RecoveryState::Active, boost::mpl::list mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
> mpl_::na, mpl_::na, mpl_:
> :na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, 
> mpl_::na>,(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_ba
> se const&, void const*)+0xcd) [0x7f21ac977ccd]
>  7: (boost::statechart::state_machine PG::RecoveryState::Initial, 
> std::allocator,boost::statechart::null_exception_translator>::send_event(boost::statechart
> ::event_base const&)+0x6b) [0x7f21ac95
> d9cb]
>  8: (PG::handle_peering_event(std::shared_ptr,
> 

Re: [ceph-users] replica questions

2017-03-03 Thread Vy Nguyen Tan
Hi,

You should read this email from Wido den Hollander:
"Hi,

As a Ceph consultant I get numerous calls throughout the year to help people
 with getting their broken Ceph clusters back online.

The causes of downtime vary vastly, but one of the biggest causes is that
people use replication 2x. size = 2, min_size = 1.

In 2016 the amount of cases I have where data was lost due to these
settings grew exponentially.

Usually a disk failed, recovery kicks in and while recovery is happening a
second disk fails. Causing PGs to become incomplete.

There have been too many times where I had to use xfs_repair on broken disks
and use ceph-objectstore-tool to export/import PGs.

I really don't like these cases, mainly because they can be prevented
easily by using size = 3 and min_size = 2 for all pools.

With size = 2 you go into the danger zone as soon as a single disk/daemon
fails. With size = 3 you always have two additional copies left thus
keeping your data safe(r).

If you are running CephFS, at least consider running the 'metadata' pool
with size = 3 to keep the MDS happy.

Please, let this be a big warning to everybody who is running with size =
2. The downtime and problems caused by missing objects/replicas are usually
big and it takes days to recover from those. But very often data is lost
and/or corrupted which causes even more problems.

I can't stress this enough. Running with size = 2 in production is a
SERIOUS hazard and should not be done imho.

To anyone out there running with size = 2, please reconsider this!

Thanks,

Wido"

Btw, could you please share your experience with an HA network for Ceph?
What type of bonding do you have? Are you using stackable switches?



On Fri, Mar 3, 2017 at 6:24 PM, Maxime Guyot  wrote:

> Hi Henrik and Matteo,
>
>
>
> While I agree with Henrik: increasing your replication factor won’t
> improve recovery or read performance on its own. If you are changing from
> replica 2 to replica 3, you might need to scale-out your cluster to have
> enough space for the additional replica, and that would improve the
> recovery and read performance.
>
>
>
> Cheers,
>
> Maxime
>
>
>
> *From: *ceph-users  on behalf of
> Henrik Korkuc 
> *Date: *Friday 3 March 2017 11:35
> *To: *"ceph-users@lists.ceph.com" 
> *Subject: *Re: [ceph-users] replica questions
>
>
>
> On 17-03-03 12:30, Matteo Dacrema wrote:
>
> Hi All,
>
>
>
> I’ve a production cluster made of 8 nodes, 166 OSDs and 4 Journal SSD
> every 5 OSDs with replica 2 for a total RAW space of 150 TB.
>
> I’ve few question about it:
>
>
>
>- It’s critical to have replica 2? Why?
>
> Replica size 3 is highly recommended. I do not know exact numbers but it
> decreases chance of data loss as 2 disk failures appear to be quite
> frequent thing, especially in larger clusters.
>
>
>- Does replica 3 makes recovery faster?
>
> no
>
>
>- Does replica 3 makes rebalancing and recovery less heavy for
>customers? If I lose 1 node does replica 3 reduce the IO impact respect a
>replica 2?
>
> no
>
>
>- Does read performance increase with replica 3?
>
> no
>
>
>
> Thank you
>
> Regards
>
> Matteo
>
>
>
> 
>
>
>
>
>
>
>
> ___
>
> ceph-users mailing list
>
> ceph-users@lists.ceph.com
>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw. Strange behavior in 2 zone configuration

2017-03-03 Thread K K

Hello, all!
I have successfully created a 2-zone cluster (se and se2). But my radosgw machines 
are sending many GET /admin/log requests to each other after putting 10k items into 
the cluster via radosgw. It looks like:
2017-03-03 17:31:17.897872 7f21b9083700 1 civetweb: 0x7f222001f660: 10.30.18.24 
- - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ HTTP/1.1" 200 0 - -
2017-03-03 17:31:17.944212 7f21ca0a5700 1 civetweb: 0x7f2200015510: 10.30.18.24 
- - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ HTTP/1.1" 200 0 - -
2017-03-03 17:31:17.945363 7f21b9083700 1 civetweb: 0x7f222001f660: 10.30.18.24 
- - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ HTTP/1.1" 200 0 - -
2017-03-03 17:31:17.988330 7f21ca0a5700 1 civetweb: 0x7f2200015510: 10.30.18.24 
- - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ HTTP/1.1" 200 0 - -
2017-03-03 17:31:18.005993 7f21b9083700 1 civetweb: 0x7f222001f660: 10.30.18.24 
- - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ HTTP/1.1" 200 0 - -
2017-03-03 17:31:18.006234 7f21c689e700 1 civetweb: 0x7f221c011260: 10.30.18.24 
- - [03/Mar/2017:17:31:17 +0500] "GET /admin/log/ HTTP/1.1" 200 0 - -
up to 2k rps!!! Does anybody know what this is?
Tcpdump shows the request is:
GET 
/admin/log/?type=data=100=bfe2e3bb-2040-4b1a-9ccb-ab5347ce3017
 HTTP/1.1
Host: se2.local
Accept: */*
Transfer-Encoding: chunked
AUTHORIZATION: AWS hEY2W7nW3tdodGrsnrdv:v6+m2FGGhqCSDQteGJ4w039X1uw=
DATE: Fri Mar 3 12:32:20 2017
Expect: 100-continue
and answer:

...2...m{"marker":"1_1488542463.536646_1448.1","last_update":"2017-03-03 
12:01:03.536646Z"}



The whole system is installed on:
OS: Ubuntu 16.04
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
rga sync status
2017-03-03 17:36:20.146017 7f7a72b5ea00 0 error in read_id for id : (2) No such 
file or directory
2017-03-03 17:36:20.147015 7f7a72b5ea00 0 error in read_id for id : (2) No such 
file or directory
realm d9ed5678-5734-4609-bf7a-fe3d5f700b23 (s)
zonegroup bfe2e3bb-2040-4b1a-9ccb-ab5347ce3017 (se)
zone 9b212551-a7cf-4aaa-9ef6-b18a31a6e032 (se-k8)
metadata sync no sync (zone is master)
data sync source: 029e0f49-f4dc-4f29-8855-bcc23a8bbcd9 (se2-k12)
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is caught up with source
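
A few read-only commands that can help show what the gateways are polling for
(a generic sketch; 'rga' above looks like an alias for radosgw-admin, and the
zone name is taken from the output above):

radosgw-admin sync status
radosgw-admin datalog list --rgw-zone=se-k8 | head
radosgw-admin period get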


My config files are:
[client.radosgw.se2-k12-2]
rgw data = /var/lib/ceph/radosgw/ceph-radosgw.se2-k12-2
rgw zonegroup = se
rgw zone = se2-k12
#rgw zonegroup root pool = se.root
#rgw zone root pool = se.root
keyring = /etc/ceph/bak.client.radosgw.se2-k12-2.keyring
rgw host = cbrgw04
rgw dns name = se2.local
log file = /var/log/radosgw/client.radosgw.se2-k12-2.log
rgw_frontends = "civetweb num_threads=50 port=80"
rgw cache lru size = 10
rgw cache enabled = false
#debug rgw = 20
rgw enable ops log = false
#log to stderr = false
rgw enable usage log = false
rgw swift versioning enabled = true
rgw swift url = http://se2.local/
rgw override bucket index max shards = 20
rgw print continue = false

[client.radosgw.se-k8-2]
rgw data = /var/lib/ceph/radosgw/ceph-radosgw.se-k8-2
rgw zonegroup = se
rgw zone = se-k8
#rgw zonegroup root pool = .se.root
#rgw zone root pool = .se.root
keyring = /etc/ceph/ceph.client.radosgw.se-k8-2.keyring
rgw host = cnrgw02
rgw dns name = se.local
log file = /var/log/radosgw/client.radosgw.se-k8-2.log
rgw_frontends = "civetweb num_threads=100 port=80"
rgw cache enabled = false
rgw cache lru size = 10
#debug rgw = 20
rgw enable ops log = false
#log to stderr = false
rgw enable usage log = false
rgw swift versioning enabled = true
rgw swift url = http://se.local
rgw override bucket index max shards = 20
rgw print continue = false
rga zonegroup get
{
    "id": "bfe2e3bb-2040-4b1a-9ccb-ab5347ce3017",
    "name": "se",
    "api_name": "se",
    "is_master": "true",
    "endpoints": [
        "http:\/\/se.local:80"
    ],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "9b212551-a7cf-4aaa-9ef6-b18a31a6e032",
    "zones": [
        {
            "id": "029e0f49-f4dc-4f29-8855-bcc23a8bbcd9",
            "name": "se2-k12",
            "endpoints": [
                "http:\/\/se2.local:80"
            ],
            "log_meta": "false",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false"
        },
        {
            "id": "9b212551-a7cf-4aaa-9ef6-b18a31a6e032",
            "name": "se-k8",
            "endpoints": [
                "http:\/\/se.local:80"
            ],
            "log_meta": "true",
            "log_data": "true",
            "bucket_index_max_shards": 0,
            "read_only": "false"
        }
    ],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "d9ed5678-5734-4609-bf7a-fe3d5f700b23"
}


-- 
K K
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Mix HDDs and SSDs togheter

2017-03-03 Thread Дробышевский , Владимир
Hi, Matteo!

  Yes, I'm using a mixed cluster in production, but it's pretty small at the
moment. I made a small step-by-step manual for myself when I did this for
the first time and have now put it up as a gist:
https://gist.github.com/vheathen/cf2203aeb53e33e3f80c8c64a02263bc#file-manual-txt.
It could be a little bit outdated since it was written some time ago.

  Crush map modifications will persist across reboots and maintenance if you
put 'osd crush update on start = false' in the [osd] section of ceph.conf.
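
  A minimal sketch of what that setup looks like (bucket, rule and pool names
are only examples; the gist above has the full procedure):

# ceph.conf, so OSDs keep their manually assigned crush location:
[osd]
osd crush update on start = false

# build a separate SSD root and point a pool at it:
ceph osd crush add-bucket ssd root
ceph osd crush add-bucket node1-ssd host
ceph osd crush move node1-ssd root=ssd
ceph osd crush set osd.10 1.0 host=node1-ssd
ceph osd crush rule create-simple ssd-rule ssd host
ceph osd pool set ssd-pool crush_ruleset 1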

  But I would also recommend to start from this article:
https://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/

  P.S. While I was writing this letter I saw the message from Maxime Guyot.
His method seems much easier if it leads to the same results.

Best regards,
Vladimir

With best regards,
Дробышевский Владимир
"АйТи Город" company
+7 343 192

Hardware and software: IBM, Microsoft, Eset
Turnkey project delivery
IT services outsourcing

2017-03-03 16:30 GMT+05:00 Matteo Dacrema :

> Hi all,
>
> Does anyone run a production cluster with a modified crush map for create
> two pools belonging one to HDDs and one to SSDs.
> What’s the best method? Modify the crush map via ceph CLI or via text
> editor?
> Will the modification to the crush map be persistent across reboots and
> maintenance operations?
> There’s something to consider when doing upgrades or other operations or
> different by having “original” crush map?
>
> Thank you
> Matteo
> 
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mix HDDs and SSDs togheter

2017-03-03 Thread Matteo Dacrema
Hi all,

Does anyone run a production cluster with a crush map modified to create two 
pools, one belonging to HDDs and one to SSDs?
What’s the best method? Modify the crush map via the ceph CLI or via a text editor? 
Will the modifications to the crush map be persistent across reboots and 
maintenance operations?
Is there anything to consider when doing upgrades or other operations, or 
anything different about having a non-“original” crush map?

Thank you
Matteo


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replica questions

2017-03-03 Thread Maxime Guyot
Hi Henrik and Matteo,

While I agree with Henrik: increasing your replication factor won’t improve 
recovery or read performance on its own. If you are changing from replica 2 to 
replica 3, you might need to scale-out your cluster to have enough space for 
the additional replica, and that would improve the recovery and read 
performance.

Cheers,
Maxime

From: ceph-users  on behalf of Henrik Korkuc 

Date: Friday 3 March 2017 11:35
To: "ceph-users@lists.ceph.com" 
Subject: Re: [ceph-users] replica questions

On 17-03-03 12:30, Matteo Dacrema wrote:
Hi All,

I’ve a production cluster made of 8 nodes, 166 OSDs and 4 Journal SSD every 5 
OSDs with replica 2 for a total RAW space of 150 TB.
I’ve few question about it:


  *   It’s critical to have replica 2? Why?
Replica size 3 is highly recommended. I do not know exact numbers but it 
decreases chance of data loss as 2 disk failures appear to be quite frequent 
thing, especially in larger clusters.


  *   Does replica 3 makes recovery faster?
no


  *   Does replica 3 makes rebalancing and recovery less heavy for customers? 
If I lose 1 node does replica 3 reduce the IO impact respect a replica 2?
no


  *   Does read performance increase with replica 3?
no


Thank you
Regards
Matteo







___

ceph-users mailing list

ceph-users@lists.ceph.com

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] object store backup tool recommendations

2017-03-03 Thread Blair Bethwaite
Hi Marc,

Whilst I agree CephFS would probably help compared to your present
solution, what I'm looking for something that can talk to a the RadosGW
restful object storage APIs, so that the backing storage can be durable and
low-cost, i.e., on an erasure coded pool. In this case we're looking to
backup a Lustre filesystem.

Cheers,

On 3 March 2017 at 21:29, Marc Roos  wrote:

>
> Hi Blair,
>
> We are also thinking of using ceph for 'backup'. At the moment we are
> using rsync and hardlinks on a drbd setup. But I think when using cephfs
> things could speed up, because file information is gotten from the mds
> daemon, so this should save on one rsync file lookup, and we expect that
> we can run more tasks in parallel.
>
>
>
>
>
> -Original Message-
> From: Blair Bethwaite [mailto:blair.bethwa...@gmail.com]
> Sent: vrijdag 3 maart 2017 0:55
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] object store backup tool recommendations
>
> Hi all,
>
> Does anyone have any recommendations for good tools to perform
> file-system/tree backups and restores to/from a RGW object store (Swift
> or S3 APIs)? Happy to hear about both FOSS and commercial options
> please.
>
> I'm interested in:
> 1) tools known to work or not work at all for a basic file-based data
> backup
>
> Plus these extras:
> 2) preserves/restores correct file metadata (e.g. owner, group, acls
> etc)
> 3) preserves/restores xattrs
> 4) backs up empty directories and files
> 5) supports some sort of snapshot/versioning/differential functionality,
> i.e., will keep a copy or diff or last N versions of a file or whole
> backup set, e.g., so that one can restore yesterday's file/s or last
> week's but not have to keep two full copies to achieve it
> 6) is readily able to restore individual files
> 7) can encrypt/decrypt client side
>
> 8) anything else I should be considering
>
> --
>
> Cheers,
> ~Blairo
>
>
>


-- 
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replica questions

2017-03-03 Thread Henrik Korkuc

On 17-03-03 12:30, Matteo Dacrema wrote:

Hi All,

I’ve a production cluster made of 8 nodes, 166 OSDs and 4 Journal SSD 
every 5 OSDs with replica 2 for a total RAW space of 150 TB.

I’ve few question about it:

  * It’s critical to have replica 2? Why?

Replica size 3 is highly recommended. I do not know exact numbers but it 
decreases chance of data loss as 2 disk failures appear to be quite 
frequent thing, especially in larger clusters.


  * Does replica 3 makes recovery faster?


no


  * Does replica 3 makes rebalancing and recovery less heavy for
customers? If I lose 1 node does replica 3 reduce the IO impact
respect a replica 2?


no


  * Does read performance increase with replica 3?


no


Thank you
Regards
Matteo






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osds crashing during hit_set_trim and hit_set_remove_all

2017-03-03 Thread Mike Lovell
i started an upgrade process to go from 0.94.7 to 10.2.5 on a production
cluster that is using cache tiering. this cluster has 3 monitors, 28
storage nodes, around 370 osds. the upgrade of the monitors completed
without issue. i then upgraded 2 of the storage nodes, and after the
restarts, the osds started crashing during hit_set_trim. here is some of
the output from the log.

2017-03-02 22:41:32.338290 7f8bfd6d7700 -1 osd/ReplicatedPG.cc: In function
'void ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned int)'
thread 7f8bfd6d7700 time 2017-03-02 22:41:32.335020
osd/ReplicatedPG.cc: 10514: FAILED assert(obc)

 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0xbddac5]
 2: (ReplicatedPG::hit_set_trim(ReplicatedPG::RepGather*, unsigned
int)+0x75f) [0x87e48f]
 3: (ReplicatedPG::hit_set_persist()+0xedb) [0x87f4ab]
 4: (ReplicatedPG::do_op(std::tr1::shared_ptr&)+0xe3a) [0x8a0d1a]
 5: (ReplicatedPG::do_request(std::tr1::shared_ptr&,
ThreadPool::TPHandle&)+0x68a) [0x83be4a]
 6: (OSD::dequeue_op(boost::intrusive_ptr,
std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x405) [0x69a5c5]
 7: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x333) [0x69ab33]
 8: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x86f)
[0xbcd1cf]
 9: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xbcf300]
 10: (()+0x7dc5) [0x7f8c1c209dc5]
 11: (clone()+0x6d) [0x7f8c1aceaced]

it started on just one osd and then spread to others until most of the osds
that are part of the cache tier were crashing. that was happening on both
the osds that were running jewel and on the ones running hammer. in the
process of trying to sort this out, the use_gmt_hitset option was set to
true and all of the osds were upgraded to jewel. we still have not been
able to determine a cause or a fix.

it looks like when hit_set_trim and hit_set_remove_all are being called,
they are calling hit_set_archive_object() to generate a name based on a
timestamp and then calling get_object_context() which then returns nothing
and triggers an assert.
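
for anyone poking at the same assert: one way to see which hit_set archive
objects actually exist (they live in the internal namespace, so --all is
needed) is something like the following. the pool name and PG id are only
examples taken from the log further down, not a known fix:

rados -p <cache-pool> ls --all | grep hit_set_19.5d4_archive
ceph osd pool get <cache-pool> hit_set_period
ceph osd pool get <cache-pool> hit_set_count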

i raised the debug_osd to 10/10 and then analyzed the logs after the crash.
i found the following in the ceph osd log afterwards.

2017-03-03 03:10:31.918470 7f218c842700 10 osd.146 pg_epoch: 266043
pg[19.5d4( v 264786'61233923 (262173'61230715,264786'61233923]
local-les=266043 n=393 ec=83762 les/c/f 266043/264767/0
266042/266042/266042) [146,116,179] r=0 lpr=266042
 pi=264766-266041/431 crt=262323'61233250 lcod 0'0 mlcod 0'0
active+degraded NIBBLEWISE] get_object_context: no obc for soid
19:2ba0:.ceph-internal::hit_set_19.5d4_archive_2017-03-03
05%3a55%3a58.459084Z_2017-03-03 05%3a56%3a58.98101
6Z:head and !can_create
2017-03-03 03:10:31.921064 7f2194051700 10 osd.146 266043 do_waiters --
start
2017-03-03 03:10:31.921072 7f2194051700 10 osd.146 266043 do_waiters --
finish
2017-03-03 03:10:31.921076 7f2194051700  7 osd.146 266043 handle_pg_notify
from osd.255
2017-03-03 03:10:31.921096 7f2194051700 10 osd.146 266043 do_waiters --
start
2017-03-03 03:10:31.921099 7f2194051700 10 osd.146 266043 do_waiters --
finish
2017-03-03 03:10:31.925858 7f218c041700 -1 osd/ReplicatedPG.cc: In function
'void ReplicatedPG::hit_set_remove_all()' thread 7f218c041700 time
2017-03-03 03:10:31.918201
osd/ReplicatedPG.cc: 11494: FAILED assert(obc)

 ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x85) [0x7f21acee9425]
 2: (ReplicatedPG::hit_set_remove_all()+0x412) [0x7f21ac9cba92]
 3: (ReplicatedPG::on_activate()+0x6dd) [0x7f21ac9f73fd]
 4: (PG::RecoveryState::Active::react(PG::AllReplicasActivated
const&)+0xac) [0x7f21ac916adc]
 5: (boost::statechart::simple_state::react_impl(boost::statechart::event_base
const&, void const*)+0x179) [0x7f21a
c974909]
 6: (boost::statechart::simple_state,
(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
const&, void const*)+0xcd) [0x7f21ac977ccd]
 7: (boost::statechart::state_machine::send_event(boost::statechart::event_base
const&)+0x6b) [0x7f21ac95
d9cb]
 8: (PG::handle_peering_event(std::shared_ptr,
PG::RecoveryCtx*)+0x1f4) [0x7f21ac924d24]
 9: (OSD::process_peering_events(std::list >
const&, ThreadPool::TPHandle&)+0x259) [0x7f21ac87de99]
 10: (OSD::PeeringWQ::_process(std::list > const&,

[ceph-users] replica questions

2017-03-03 Thread Matteo Dacrema
Hi All,

I’ve a production cluster made of 8 nodes, 166 OSDs and 4 Journal SSD every 5 
OSDs with replica 2 for a total RAW space of 150 TB.
I’ve few question about it:

Is it critical to have replica 2? Why?
Does replica 3 make recovery faster?
Does replica 3 make rebalancing and recovery less heavy for customers? If I 
lose 1 node, does replica 3 reduce the IO impact compared to replica 2?
Does read performance increase with replica 3?

Thank you
Regards
Matteo



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] object store backup tool recommendations

2017-03-03 Thread Marc Roos
 
Hi Blair, 

We are also thinking of using ceph for 'backup'. At the moment we are 
using rsync and hardlinks on a drbd setup. But I think when using cephfs 
things could speed up, because file information is gotten from the mds 
daemon, so this should save on one rsync file lookup, and we expect that 
we can run more tasks in parallel.





-Original Message-
From: Blair Bethwaite [mailto:blair.bethwa...@gmail.com] 
Sent: vrijdag 3 maart 2017 0:55
To: ceph-users@lists.ceph.com
Subject: [ceph-users] object store backup tool recommendations

Hi all,

Does anyone have any recommendations for good tools to perform 
file-system/tree backups and restores to/from a RGW object store (Swift 
or S3 APIs)? Happy to hear about both FOSS and commercial options 
please.

I'm interested in:
1) tools known to work or not work at all for a basic file-based data 
backup

Plus these extras:
2) preserves/restores correct file metadata (e.g. owner, group, acls 
etc)
3) preserves/restores xattrs
4) backs up empty directories and files
5) supports some sort of snapshot/versioning/differential functionality, 
i.e., will keep a copy or diff or last N versions of a file or whole 
backup set, e.g., so that one can restore yesterday's file/s or last 
week's but not have to keep two full copies to achieve it
6) is readily able to restore individual files
7) can encrypt/decrypt client side

8) anything else I should be considering

-- 

Cheers,
~Blairo


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com