[ceph-users] set pg_num on pools with different size

2018-03-08 Thread Nagy Ákos
Hi,

we have a Ceph cluster with 3 nodes and 20 OSDs (6, 7, and 7 per node),
each OSD on a 2 TB HDD.

In the long term we want to use 7-9 pools, and for 20 OSDs and 8 pools I
calculated that the ideal pg_num per pool would be about 250 (20 * 100 / 8).

That way each OSD would store roughly 100 PGs, which is the recommended value.
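
(For reference, my calculation, and the variant that also divides by the
replica size, which I assume is what the usual guideline actually intends:)

    # 100 PGs per OSD spread over 8 pools, ignoring replication
    echo $(( 20 * 100 / 8 ))        # 250
    # the same, but also divided by the replica count (size 2)
    echo $(( 20 * 100 / 2 / 8 ))    # 125, i.e. 128 rounded to a power of two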

I have a few problems:

1. I already have 1736 PGs, and when I try to create a new pool with 270
PGs, I get this error:

Error ERANGE:  pg_num 270 size 2 would mean 4012 total pgs, which
exceeds max 4000 (mon_max_pg_per_osd 200 * num_in_osds 20)
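
If I read the message correctly, the limit counts PG replicas: my 1736
existing PGs * size 2 = 3472, plus 270 * 2 = 540 for the new pool, gives
4012, against the cap of 200 * 20 = 4000. I assume the cap could be raised
with something like the following (untested, and the exact mechanism may
depend on the release):

    # untested sketch: raise the per-OSD PG limit checked by the monitors
    ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'
    # or set it persistently in ceph.conf under [global] and restart the mons:
    #   mon_max_pg_per_osd = 300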


2. We now have 8 pools, but only one of them stores a huge amount of data,
and because of that I get this warning:

health: HEALTH_WARN
    1 pools have many more objects per pg than average

But I remember that in the past the warning was instead that the pg_num of
a pool was lower/higher than the average pg_num in the cluster.
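
As far as I can tell, the current warning is about object-count skew rather
than about pg_num itself; I assume the threshold is the
mon_pg_warn_max_object_skew option (default 10, if I remember correctly),
which could presumably be relaxed with something like:

    # untested sketch: relax the objects-per-PG skew warning threshold
    ceph tell mon.* injectargs '--mon_pg_warn_max_object_skew 20'
    # (on Luminous this health check may come from the mgr instead,
    #  in which case the option would have to be injected there)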


In this case, how can I set the optimal pg_num for my pools?
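
What I think I want to end up with is something like the following for the
data-heavy pool (as far as I know pg_num can only be increased, pgp_num has
to be raised to match, and 256 here is just an example value):

    # untested sketch: grow the PG count of the pool that holds most of the data
    ceph osd pool set cephfs-data pg_num 256
    ceph osd pool set cephfs-data pgp_num 256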

Some debug data:

OSD number: 20

  data:
    pools:   8 pools, 1736 pgs
    objects: 560k objects, 1141 GB
    usage:   2331 GB used, 30053 GB / 32384 GB avail
    pgs: 1736 active+clean
           
           
POOLS:
    NAME            ID USED   %USED MAX AVAIL OBJECTS
    kvmpool          5 34094M  0.24    13833G    8573
    rbd              6   155G  1.11    13833G   94056
    lxdhv04         15 29589M  0.21    13833G   12805
    lxdhv01         16 14480M  0.10    13833G    9732
    lxdhv02         17 14840M  0.10    13833G    7931
    lxdhv03         18 18735M  0.13    13833G    7567
    cephfs-metadata 22 40433k  0       13833G   11336
    cephfs-data     23   876G  5.96    13833G  422108

   
pool 5 'kvmpool' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 1909 lfor 0/1906 owner
18446744073709551615 flags hashpspool stripe_width 0 application rbd
pool 6 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 8422 lfor 0/2375 owner
18446744073709551615 flags hashpspool stripe_width 0 application rbd
pool 15 'lxdhv04' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 3053 flags hashpspool
stripe_width 0 application rbd
pool 16 'lxdhv01' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 3054 flags hashpspool
stripe_width 0 application rbd
pool 17 'lxdhv02' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 8409 flags hashpspool
stripe_width 0 application rbd
pool 18 'lxdhv03' replicated size 2 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 256 pgp_num 256 last_change 3066 flags hashpspool
stripe_width 0 application rbd
pool 22 'cephfs-metadata' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 100 pgp_num 100 last_change 8405 flags
hashpspool stripe_width 0 application cephfs
pool 23 'cephfs-data' replicated size 2 min_size 1 crush_rule 0
object_hash rjenkins pg_num 100 pgp_num 100 last_change 8405 flags
hashpspool stripe_width 0 application cephfs


-- 
Ákos




Re: [ceph-users] rados export/import fail

2017-10-16 Thread Nagy Ákos
Thanks,

but I have already erased all the data; this backup is all I have.
If the restore worked for 3 pools, can I make it work for the remaining 2?

What can I try to set so that the import succeeds, or how can I find these IDs?
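
For what it's worth, I assume the pool IDs on a cluster can be listed with
something like:

    # untested sketch: show pool names together with their numeric IDs
    ceph osd lspools
    # more detail (size, pg_num, application, ...):
    ceph osd pool ls detail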

On 2017-10-16 at 13:39, John Spray wrote:
> On Mon, Oct 16, 2017 at 11:35 AM, Nagy Ákos <nagy.a...@libreoffice.ro> wrote:
>> Hi,
>>
>> I want to upgrade my ceph from jewel to luminous, and switch to bluestore.
>>
>> For that I export the pools from old cluster:
> This is not the way to do it.  You should convert your OSDs from
> filestore to bluestore one by one, and let the data re-replicate to
> the new OSDs.
>
> Dumping data out of one Ceph cluster and into another will not work,
> because things like RBD images record things like the ID of the pool
> where their parent image is, and pool IDs are usually different
> between clusters.
>
> John
>
>> rados export -p pool1 pool1.ceph
>>
>> and after upgrade and osd recreation:
>>
>> rados --create -p pool1 import pool1.ceph
>>
>> I can import the backup without error, but when I want  to map an image, I
>> got error:
>>
>> rbd --image container1 --pool pool1 map
>>
>> rbd: sysfs write failed
>> In some cases useful info is found in syslog - try "dmesg | tail".
>> rbd: map failed: (2) No such file or directory
>>
>> dmesg | tail
>>
>> [160606.729840] rbd: image container1 : WARNING: kernel layering is
>> EXPERIMENTAL!
>> [160606.730675] libceph: tid 86731 pool does not exist
>>
>>
>> When I try to get info about the image:
>>
>> rbd info pool1/container1
>>
>> 2017-10-16 13:18:17.404858 7f35a37fe700 -1
>> librbd::image::RefreshParentRequest: failed to open parent image: (2) No
>> such file or directory
>> 2017-10-16 13:18:17.404903 7f35a37fe700 -1 librbd::image::RefreshRequest:
>> failed to refresh parent image: (2) No such file or directory
>> 2017-10-16 13:18:17.404930 7f35a37fe700 -1 librbd::image::OpenRequest:
>> failed to refresh image: (2) No such file or directory
>> rbd: error opening image container1: (2) No such file or directory
>>
>>
>> I check to exported image checksum after export and before import, and it's
>> match, and I can restore three pools with one with 60 MB one with 1.2 GB and
>> one with 25 GB of data.
>>
>> The problematic has 60 GB data.
>>
>> The pool store LXD container images.
>>
>> Any help is highly appreciated.
>>
>> --
>> Ákos
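
For the record, I understand the one-by-one conversion John describes to be
roughly the following per OSD (a sketch assuming Luminous-era tooling; OSD
id 0 and /dev/sdb are just placeholders):

    ceph osd out 0                           # let the data drain off this OSD
    # wait until all PGs are active+clean again
    systemctl stop ceph-osd@0
    ceph osd purge 0 --yes-i-really-mean-it  # drop it from the CRUSH and OSD maps
    ceph-volume lvm create --bluestore --data /dev/sdb   # re-create it as a BlueStore OSD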

-- 
Ákos




[ceph-users] rados export/import fail

2017-10-16 Thread Nagy Ákos
Hi,

I want to upgrade my Ceph cluster from Jewel to Luminous and switch to BlueStore.

To do that, I exported the pools from the old cluster:

rados export -p pool1 pool1.ceph

and after the upgrade and OSD re-creation:

rados --create -p pool1 import pool1.ceph

I can import the backup without error, but when I try to map an image,
I get this error:

rbd --image container1 --pool pool1 map

rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (2) No such file or directory

dmesg | tail

[160606.729840] rbd: image container1 : WARNING: kernel layering is
EXPERIMENTAL!
[160606.730675] libceph: tid 86731 pool does not exist


When I try to get info about the image:

rbd info pool1/container1

2017-10-16 13:18:17.404858 7f35a37fe700 -1
librbd::image::RefreshParentRequest: failed to open parent image: (2) No
such file or directory
2017-10-16 13:18:17.404903 7f35a37fe700 -1
librbd::image::RefreshRequest: failed to refresh parent image: (2) No
such file or directory
2017-10-16 13:18:17.404930 7f35a37fe700 -1 librbd::image::OpenRequest:
failed to refresh image: (2) No such file or directory
rbd: error opening image container1: (2) No such file or directory


I checked the exported file's checksum after the export and before the
import, and it matches. I was also able to restore three other pools: one
with 60 MB, one with 1.2 GB, and one with 25 GB of data.

The problematic one holds 60 GB of data.

The pool stores LXD container images.

Any help is highly appreciated.

-- 
Ákos
