Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Brian Topping
Thanks Hector. So many things going through my head and I totally forgot to 
explore if just turning off the warnings (if only until I get more disks) was 
an option. 

This is 1000% more sensible for sure.
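For the archives, the knob in question seems to be something along these lines. The exact option name and value are my guesses to check against your release; older clusters would set it in ceph.conf or via injectargs instead:

ceph config set global mon_max_pg_per_osd 400
# or, pre-Mimic, something like:
ceph tell mon.* injectargs '--mon_max_pg_per_osd=400'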

> On Feb 8, 2019, at 7:19 PM, Hector Martin  wrote:
> 
> My practical suggestion would be to do nothing for now (perhaps tweaking
> the config settings to shut up the warnings about PGs per OSD). Ceph
> will gain the ability to downsize pools soon, and in the meantime,
> anecdotally, I have a production cluster where we overshot the current
> recommendation by 10x due to confusing documentation at the time, and
> it's doing fine :-)
> 
> Stable multi-FS support is also coming, so really, multiple ways to fix
> your problem will probably materialize Real Soon Now, and in the
> meantime having more PGs than recommended isn't the end of the world.
> 
> (resending because the previous reply wound up off-list)
> 
> On 09/02/2019 10.39, Brian Topping wrote:
>> Thanks again to Jan, Burkhard, Marc and Hector for responses on this. To
>> review, I am removing OSDs from a small cluster and running up against
>> the “too many PGs per OSD” problem due to lack of clarity. Here’s a
>> summary of what I have collected on it:
>> 
>> 1. The CephFS data pool can’t be changed, only added to. 
>> 2. CephFS metadata pool might be rebuildable
>>via https://www.spinics.net/lists/ceph-users/msg29536.html, but the
>>post is a couple of years old, and even then, the author stated that
>>he wouldn’t do this unless it was an emergency.
>> 3. Running multiple clusters on the same hardware is deprecated, so
>>there’s no way to make a new cluster with properly-sized pools and
>>cpio across.
>> 4. Running multiple filesystems on the same hardware is considered
>>experimental: 
>> http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster.
>>It’s unclear what permanent changes this will effect on the cluster
>>that I’d like to use moving forward. This would be a second option
>>to mount and cpio across.
>> 5. Importing pools (ie `zpool export …`, `zpool import …`) from other
>>clusters is likely not supported, so even if I created a new cluster
>>on a different machine, getting the pools back in the original
>>cluster is fraught.
>> 6. There’s really no way to tell Ceph where to put pools, so when the
>>new drives are added to CRUSH, everything starts rebalancing unless
>>`max pg per osd` is set to some small number that is already
>>exceeded. But if I start copying data to the new pool, doesn’t it fail?
>> 7. Maybe the former problem can be avoided by changing the weights of
>>the OSDs...
>> 
>> 
>> All these options so far seem either a) dangerous or b) like I’m going
>> to have a less-than-pristine cluster to kick off the next ten years
>> with. Unless I am mistaken in that, the only options are to copy
>> everything at least once or twice more:
>> 
>> 1. Copy everything back off CephFS to a `mdadm` RAID 1 with two of the
>>6TB drives. Blow away the cluster and start over with the other two
>>drives, copy everything back to CephFS, then re-add the freed drive
>>used as a store. Might be done by the end of next week.
>> 2. Create a new, properly sized cluster on a second machine, copy
>>everything over ethernet, then move the drives and the
>>`/var/lib/ceph` and `/etc/ceph` back to the cluster seed.
>> 
>> 
>> I appreciate small clusters are not the target use case of Ceph, but
>> everyone has to start somewhere!
>> 
>> 
> 
> 
> -- 
> Hector Martin (hec...@marcansoft.com)
> Public Key: https://mrcn.st/pub



Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Hector Martin
My practical suggestion would be to do nothing for now (perhaps tweaking
the config settings to shut up the warnings about PGs per OSD). Ceph
will gain the ability to downsize pools soon, and in the meantime,
anecdotally, I have a production cluster where we overshot the current
recommendation by 10x due to confusing documentation at the time, and
it's doing fine :-)

Stable multi-FS support is also coming, so really, multiple ways to fix
your problem will probably materialize Real Soon Now, and in the
meantime having more PGs than recommended isn't the end of the world.

(resending because the previous reply wound up off-list)

On 09/02/2019 10.39, Brian Topping wrote:
> Thanks again to Jan, Burkhard, Marc and Hector for responses on this. To
> review, I am removing OSDs from a small cluster and running up against
> the “too many PGs per OSD” problem due to lack of clarity. Here’s a
> summary of what I have collected on it:
> 
>  1. The CephFS data pool can’t be changed, only added to. 
>  2. CephFS metadata pool might be rebuildable
> via https://www.spinics.net/lists/ceph-users/msg29536.html, but the
> post is a couple of years old, and even then, the author stated that
> he wouldn’t do this unless it was an emergency.
>  3. Running multiple clusters on the same hardware is deprecated, so
> there’s no way to make a new cluster with properly-sized pools and
> cpio across.
>  4. Running multiple filesystems on the same hardware is considered
> experimental: 
> http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster.
> It’s unclear what permanent changes this will effect on the cluster
> that I’d like to use moving forward. This would be a second option
> to mount and cpio across.
>  5. Importing pools (ie `zpool export …`, `zpool import …`) from other
> clusters is likely not supported, so even if I created a new cluster
> on a different machine, getting the pools back in the original
> cluster is fraught.
>  6. There’s really no way to tell Ceph where to put pools, so when the
> new drives are added to CRUSH, everything starts rebalancing unless
> `max pg per osd` is set to some small number that is already
> exceeded. But if I start copying data to the new pool, doesn’t it fail?
>  7. Maybe the former problem can be avoided by changing the weights of
> the OSDs...
> 
> 
> All these options so far seem either a) dangerous or b) like I’m going
> to have a less-than-pristine cluster to kick off the next ten years
> with. Unless I am mistaken in that, the only options are to copy
> everything at least once or twice more:
> 
>  1. Copy everything back off CephFS to a `mdadm` RAID 1 with two of the
> 6TB drives. Blow away the cluster and start over with the other two
> drives, copy everything back to CephFS, then re-add the freed drive
> used as a store. Might be done by the end of next week.
>  2. Create a new, properly sized cluster on a second machine, copy
> everything over ethernet, then move the drives and the
> `/var/lib/ceph` and `/etc/ceph` back to the cluster seed.
> 
> 
> I appreciate small clusters are not the target use case of Ceph, but
> everyone has to start somewhere!
> 
> 


-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub


Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Brian Topping
Thanks again to Jan, Burkhard, Marc and Hector for responses on this. To 
review, I am removing OSDs from a small cluster and running up against the “too 
many PGs per OSD” problem due to lack of clarity. Here’s a summary of what I 
have collected on it:

1. The CephFS data pool can’t be changed, only added to.
2. CephFS metadata pool might be rebuildable via
   https://www.spinics.net/lists/ceph-users/msg29536.html, but the post is a
   couple of years old, and even then, the author stated that he wouldn’t do this
   unless it was an emergency.
3. Running multiple clusters on the same hardware is deprecated, so there’s no way
   to make a new cluster with properly-sized pools and cpio across.
4. Running multiple filesystems on the same hardware is considered experimental:
   http://docs.ceph.com/docs/master/cephfs/experimental-features/#multiple-filesystems-within-a-ceph-cluster.
   It’s unclear what permanent changes this will effect on the cluster that I’d
   like to use moving forward. This would be a second option to mount and cpio
   across.
5. Importing pools (ie `zpool export …`, `zpool import …`) from other clusters is
   likely not supported, so even if I created a new cluster on a different
   machine, getting the pools back in the original cluster is fraught.
6. There’s really no way to tell Ceph where to put pools, so when the new drives
   are added to CRUSH, everything starts rebalancing unless `max pg per osd` is
   set to some small number that is already exceeded. But if I start copying data
   to the new pool, doesn’t it fail?
7. Maybe the former problem can be avoided by changing the weights of the
   OSDs... (rough sketch below)
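A rough sketch of that last idea, untested on my side, with the option name, osd ids and weights being placeholders I would verify first:

# add the new OSDs with zero CRUSH weight so nothing rebalances onto them yet
ceph config set osd osd_crush_initial_weight 0
# ... create the OSDs, then bring them in deliberately when ready:
ceph osd crush reweight osd.7 5.458   # weight roughly equals size in TiB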

All these options so far seem either a) dangerous or b) like I’m going to have 
a less-than-pristine cluster to kick off the next ten years with. Unless I am 
mistaken in that, the only options are to copy everything at least once or 
twice more:

1. Copy everything back off CephFS to a `mdadm` RAID 1 with two of the 6TB drives.
   Blow away the cluster and start over with the other two drives, copy everything
   back to CephFS, then re-add the freed drive used as a store. Might be done by
   the end of next week.
2. Create a new, properly sized cluster on a second machine, copy everything over
   ethernet, then move the drives and the `/var/lib/ceph` and `/etc/ceph` back to
   the cluster seed.

I appreciate small clusters are not the target use case of Ceph, but everyone 
has to start somewhere!


Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Hector Martin
On 08/02/2019 19.29, Marc Roos wrote:
>  
> 
> Yes, that is thus a partial move, not the behaviour you would expect from a mv 
> command. (I think this should be changed.)

CephFS lets you put *data* in separate pools, but not *metadata*. Also,
I think you can't remove the original/default data pool.

The FSMap seems to store pools by ID, not by name, so renaming the pools
won't work.
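
(Easy enough to check; if I'm reading the dump right, the data pools are
listed there by numeric ID:

ceph fs get <your-fs-name>
)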

This past thread has an untested procedure for migrating CephFS pools:

https://www.spinics.net/lists/ceph-users/msg29536.html

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub


Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Brian Topping
Thanks Marc and Burkhard. I think what I am learning is it’s best to copy 
between filesystems with cpio, if not impossible to do it any other way due to 
the “fs metadata in first pool” problem.

FWIW, the mimic docs still describe how to create a differently named cluster 
on the same hardware. But then I see 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021560.html 
saying that behavior is deprecated and problematic. 

A hard lesson, but no data was lost. I will set up two machines and a new 
cluster with the larger drives tomorrow.


Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Marc Roos
 

Yes, that is thus a partial move, not the behaviour you would expect from a mv 
command. (I think this should be changed.)



-Original Message-
From: Burkhard Linke 
[mailto:burkhard.li...@computational.bio.uni-giessen.de] 
Sent: 08 February 2019 11:27
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Downsizing a cephfs pool

Hi,


you can move the data off to another pool, but you need to keep your 
_first_ data pool, since part of the filesystem metadata is stored in 
that pool. You cannot remove the first pool.


Regards,

Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810





Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Burkhard Linke

Hi,


you can move the data off to another pool, but you need to keep your 
_first_ data pool, since part of the filesystem metadata is stored in 
that pool. You cannot remove the first pool.
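
A quick way to see which pool that is (if I recall the output correctly, the
first data pool listed for the filesystem is the original one):

ceph fs ls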



Regards,

Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810



Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Marc Roos
 

I think I would COPY and DELETE the data in chunks, not via the 'backend' 
but just via cephfs. That way you are 100% sure nothing weird can happen. 
(MOVE does not work the way you might think on a cephfs between different pools.)
You can create and mount an extra data pool in cephfs. I have done this 
also, so you can mix rep3 and erasure and a fast ssd pool on your cephfs. 

Adding a pool, something like this:
ceph osd pool set fs_data.ec21 allow_ec_overwrites true
ceph osd pool application enable fs_data.ec21 cephfs
ceph fs add_data_pool cephfs fs_data.ec21

Change a directory to use a different pool:
setfattr -n ceph.dir.layout.pool -v fs_data.ec21 folder
getfattr -n ceph.dir.layout.pool folder
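
For completeness, the pool itself has to exist before the add_data_pool step;
a rough sketch, where the ec21 profile and the PG count are just placeholders
to adapt:

ceph osd erasure-code-profile set ec21 k=2 m=1
ceph osd pool create fs_data.ec21 8 8 erasure ec21

And the COPY/DELETE itself is then plain filesystem tooling inside the mounted
cephfs, e.g. (untested, paths are placeholders):

mkdir /mnt/cephfs/new
setfattr -n ceph.dir.layout.pool -v fs_data.ec21 /mnt/cephfs/new
cd /mnt/cephfs/old && find . -depth -print0 | cpio -0pdmv /mnt/cephfs/new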


-Original Message-
From: Brian Topping [mailto:brian.topp...@gmail.com] 
Sent: 08 February 2019 10:02
To: Marc Roos
Cc: ceph-users
Subject: Re: [ceph-users] Downsizing a cephfs pool

Hi Mark, that’s great advice, thanks! I’m always grateful for the 
knowledge. 

What about the issue with the pools containing a CephFS though? Is it 
something where I can just turn off the MDS, copy the pools and rename 
them back to the original name, then restart the MDS? 

Agreed about using smaller numbers. When I went to using seven disks, I 
was getting warnings about too few PGs per OSD. I’m sure this is 
something one learns to cope with via experience and I’m still picking 
that up. Had hoped not to get in a bind like this so quickly, but hey, 
here I am again :)

> On Feb 8, 2019, at 01:53, Marc Roos  wrote:
> 
> 
> There is a setting to set the max pg per osd. I would set that 
> temporarily so you can work, create a new pool with 8 pg's and move 
> data over to the new pool, remove the old pool, then unset this max pg 
> per osd.
> 
> PS. I am always creating pools starting with 8 pg's and when I know I am at 
> what I want in production I can always increase the pg count.
> 
> 
> 
> -Original Message-
> From: Brian Topping [mailto:brian.topp...@gmail.com]
> Sent: 08 February 2019 05:30
> To: Ceph Users
> Subject: [ceph-users] Downsizing a cephfs pool
> 
> Hi all, I created a problem when moving data to Ceph and I would be 
> grateful for some guidance before I do something dumb.
> 
> 
> 1. I started with the 4x 6TB source disks that came together as a 
> single XFS filesystem via software RAID. The goal is to have the same 
> data on a cephfs volume, but with these four disks formatted for 
> bluestore under Ceph.
> 2. The only spare disks I had were 2TB, so put 7x together. I sized 
> data and metadata for cephfs at 256 PG, but it was wrong.
> 3. The copy went smoothly, so I zapped and added the original 4x 6TB 
> disks to the cluster.
> 4. I realized what I did, that when the 7x2TB disks were removed, 
> there were going to be far too many PGs per OSD.
> 
> 
> I just read over https://stackoverflow.com/a/39637015/478209, but that 
> addresses how to do this with a generic pool, not pools used by 
> CephFS.
> It looks easy to copy the pools, but once copied and renamed, CephFS 
> may not recognize them as the target and the data may be lost.
> 
> Do I need to create new pools and copy again using cpio? Is there a 
> better way?
> 
> Thanks! Brian
> 
> 




Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Brian Topping
Hi Mark, that’s great advice, thanks! I’m always grateful for the knowledge. 

What about the issue with the pools containing a CephFS though? Is it something 
where I can just turn off the MDS, copy the pools and rename them back to the 
original name, then restart the MDS? 

Agreed about using smaller numbers. When I went to using seven disks, I was 
getting warnings about too few PGs per OSD. I’m sure this is something one 
learns to cope with via experience and I’m still picking that up. Had hoped not 
to get in a bind like this so quickly, but hey, here I am again :)

> On Feb 8, 2019, at 01:53, Marc Roos  wrote:
> 
> 
> There is a setting to set the max pg per osd. I would set that 
> temporarily so you can work, create a new pool with 8 pg's and move data 
> over to the new pool, remove the old pool, then unset this max pg per 
> osd.
> 
> PS. I am always creating pools starting with 8 pg's and when I know I am at 
> what I want in production I can always increase the pg count.
> 
> 
> 
> -Original Message-
> From: Brian Topping [mailto:brian.topp...@gmail.com] 
> Sent: 08 February 2019 05:30
> To: Ceph Users
> Subject: [ceph-users] Downsizing a cephfs pool
> 
> Hi all, I created a problem when moving data to Ceph and I would be 
> grateful for some guidance before I do something dumb.
> 
> 
> 1. I started with the 4x 6TB source disks that came together as a 
> single XFS filesystem via software RAID. The goal is to have the same 
> data on a cephfs volume, but with these four disks formatted for 
> bluestore under Ceph.
> 2. The only spare disks I had were 2TB, so put 7x together. I sized 
> data and metadata for cephfs at 256 PG, but it was wrong.
> 3. The copy went smoothly, so I zapped and added the original 4x 6TB 
> disks to the cluster.
> 4. I realized what I did, that when the 7x2TB disks were removed, 
> there were going to be far too many PGs per OSD.
> 
> 
> I just read over https://stackoverflow.com/a/39637015/478209, but that 
> addresses how to do this with a generic pool, not pools used by CephFS. 
> It looks easy to copy the pools, but once copied and renamed, CephFS may 
> not recognize them as the target and the data may be lost.
> 
> Do I need to create new pools and copy again using cpio? Is there a 
> better way?
> 
> Thanks! Brian
> 
> 


Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Marc Roos
 
There is a setting to set the max pg per osd. I would set that 
temporarily so you can work, create a new pool with 8 pg's and move data 
over to the new pool, remove the old pool, then unset this max pg per 
osd.

PS. I am always creating pools starting with 8 pg's and when I know I am at 
what I want in production I can always increase the pg count.



-Original Message-
From: Brian Topping [mailto:brian.topp...@gmail.com] 
Sent: 08 February 2019 05:30
To: Ceph Users
Subject: [ceph-users] Downsizing a cephfs pool

Hi all, I created a problem when moving data to Ceph and I would be 
grateful for some guidance before I do something dumb.


1.  I started with the 4x 6TB source disks that came together as a 
single XFS filesystem via software RAID. The goal is to have the same 
data on a cephfs volume, but with these four disks formatted for 
bluestore under Ceph.
2.  The only spare disks I had were 2TB, so put 7x together. I sized 
data and metadata for cephfs at 256 PG, but it was wrong.
3.  The copy went smoothly, so I zapped and added the original 4x 6TB 
disks to the cluster.
4.  I realized what I did, that when the 7x2TB disks were removed, 
there were going to be far too many PGs per OSD.


I just read over https://stackoverflow.com/a/39637015/478209, but that 
addresses how to do this with a generic pool, not pools used by CephFS. 
It looks easy to copy the pools, but once copied and renamed, CephFS may 
not recognize them as the target and the data may be lost.

Do I need to create new pools and copy again using cpio? Is there a 
better way?

Thanks! Brian




Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Jan Kasprzak
Hello,

Brian Topping wrote:
: Hi all, I created a problem when moving data to Ceph and I would be grateful 
: for some guidance before I do something dumb.
[...]
: Do I need to create new pools and copy again using cpio? Is there a better 
: way?

I think I will be facing the same problem soon (moving my cluster
from ~64 1-2TB OSDs to about 16 12TB OSDs). Maybe this is the way to go:

https://ceph.com/geen-categorie/ceph-pool-migration/

(I have not tested that, though.)
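
For a plain RADOS pool the recipe that usually comes up boils down to
something like this (untested here, and the pool names are placeholders; note
the caveat elsewhere in this thread that CephFS tracks its pools by ID, so a
rename alone does not carry the filesystem over):

rados cppool cephfs_data cephfs_data.new
ceph osd pool rename cephfs_data cephfs_data.old
ceph osd pool rename cephfs_data.new cephfs_data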

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
 This is the world we live in: the way to deal with computers is to google
 the symptoms, and hope that you don't have to watch a video. --P. Zaitcev


[ceph-users] Downsizing a cephfs pool

2019-02-07 Thread Brian Topping
Hi all, I created a problem when moving data to Ceph and I would be grateful 
for some guidance before I do something dumb.

1. I started with the 4x 6TB source disks that came together as a single XFS 
   filesystem via software RAID. The goal is to have the same data on a cephfs 
   volume, but with these four disks formatted for bluestore under Ceph.
2. The only spare disks I had were 2TB, so put 7x together. I sized data and 
   metadata for cephfs at 256 PG, but it was wrong.
3. The copy went smoothly, so I zapped and added the original 4x 6TB disks to the 
   cluster.
4. I realized what I did, that when the 7x2TB disks were removed, there were going 
   to be far too many PGs per OSD.

I just read over https://stackoverflow.com/a/39637015/478209, but that addresses how to do 
this with a generic pool, not pools used by CephFS. It looks easy to copy the 
pools, but once copied and renamed, CephFS may not recognize them as the target 
and the data may be lost.

Do I need to create new pools and copy again using cpio? Is there a better way?

Thanks! Brian