[ceph-users] pool/volume live migration

2019-02-08 Thread Luis Periquito
Hi, a recurring topic is live migration and pool type change (moving from EC to replicated or vice versa). When I went to the OpenStack Open Infrastructure event (aka the Summit), Sage mentioned support for live migration of volumes (and, as a result, of pools) in Nautilus. Is this still the case and is

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Marc Roos
There is a setting for the max pg per osd. I would set that temporarily so you can work, create a new pool with 8 pg's and move data over to the new pool, remove the old pool, then unset this max pg per osd. PS. I am always creating pools starting with 8 pg's and when I know I am at what I
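A rough sketch of that workflow, assuming a Luminous-or-later cluster; the pool names (olddata, newdata) and the limit of 400 are made up for illustration:

  # temporarily raise the per-OSD PG limit so the cluster stays workable
  ceph tell mon.* injectargs '--mon_max_pg_per_osd 400'
  # create a small replacement pool with only 8 PGs
  ceph osd pool create newdata 8 8
  # ...move the data over (for CephFS, by copying through the filesystem
  # as discussed later in this thread), then remove the old pool
  # (requires mon_allow_pool_delete to be enabled)...
  ceph osd pool delete olddata olddata --yes-i-really-really-mean-it
  # and drop the temporary override again (200 was the Luminous default)
  ceph tell mon.* injectargs '--mon_max_pg_per_osd 200'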

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Burkhard Linke
Hi, you can move the data off to another pool, but you need to keep your _first_ data pool, since part of the filesystem metadata is stored in that pool. You cannot remove the first pool. Regards, Burkhard -- Dr. rer. nat. Burkhard Linke Bioinformatics and Systems Biology
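For reference, a sketch of how moving data to an additional data pool works; the filesystem name (cephfs), pool name and mount point are placeholders, and the first data pool still has to stay:

  # create the new pool and attach it to the filesystem as an extra data pool
  ceph osd pool create cephfs_data_new 8 8
  ceph fs add_data_pool cephfs cephfs_data_new
  # new files written under this directory will land in the new pool;
  # existing files keep their old layout until they are rewritten/copied
  setfattr -n ceph.dir.layout.pool -v cephfs_data_new /mnt/cephfs/somedir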

[ceph-users] Debugging 'slow requests' ...

2019-02-08 Thread Massimo Sgaravatto
Our Luminous ceph cluster had been working without problems for a while, but in the last few days we have been suffering from continuous slow requests. We have indeed made some changes to the infrastructure recently:
- Moved OSD nodes to a new switch
- Increased pg nums for a pool, to have about ~

[ceph-users] Bluestore increased disk usage

2019-02-08 Thread Jan Kasprzak
Hello, ceph users, I moved my cluster to bluestore (Ceph Mimic), and now I see increased disk usage. From ceph -s:
  pools:   8 pools, 3328 pgs
  objects: 1.23 M objects, 4.6 TiB
  usage:   23 TiB used, 444 TiB / 467 TiB avail
I use 3-way replication of my data, so I would
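As a rough sanity check of those numbers: 4.6 TiB of logical data with 3-way replication should account for roughly 4.6 TiB x 3 = 13.8 TiB of raw space, so the reported 23 TiB used leaves around 9 TiB unexplained by replication alone. BlueStore per-object overhead, allocation granularity and DB/WAL space are the usual suspects, but the exact breakdown depends on the cluster.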

Re: [ceph-users] pool/volume live migration

2019-02-08 Thread Caspar Smit
Hi Luis, According to slide 21 of Sage's presentation at FOSDEM it is coming in Nautilus: https://fosdem.org/2019/schedule/event/ceph_project_status_update/attachments/slides/3251/export/events/attachments/ceph_project_status_update/slides/3251/ceph_new_in_nautilus.pdf Kind regards, Caspar Op

Re: [ceph-users] change OSD IP it uses

2019-02-08 Thread Ashley Merrick
I just tried that; it had already been restarted, as I fully deleted the old OSD and re-created it using the correct hostname after zapping the disk and restarting the server itself. Somewhere it still seems to have stored the external IPs of the other hosts for just this OSD; after restarting

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Brian Topping
Hi Mark, that’s great advice, thanks! I’m always grateful for the knowledge. What about the issue with the pools containing a CephFS though? Is it something where I can just turn off the MDS, copy the pools and rename them back to the original name, then restart the MDS? Agreed about using

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Marc Roos
Yes, that is thus a partial move, not the behaviour you would expect from a mv command. (I think this should be changed.)

Re: [ceph-users] best practices for EC pools

2019-02-08 Thread Caspar Smit
On Fri, 8 Feb 2019 at 11:31, Scheurer François < francois.scheu...@everyware.ch> wrote: > Dear Eugen Block > Dear Alan Johnson > > > Thank you for your answers. > > So we will use EC 3+2 on 6 nodes. > Currently with only 4 osd's per node, then 8 and later 20. > > > >Just to add, that a more

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Jan Kasprzak
Hello, Brian Topping wrote: : Hi all, I created a problem when moving data to Ceph and I would be grateful for some guidance before I do something dumb. [...] : Do I need to create new pools and copy again using cpio? Is there a better way? I think I will be facing the same

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Marc Roos
I think I would COPY and DELETE the data in chunks, not via the 'backend' but just via cephfs, so you are 100% sure nothing weird can happen. (MOVE does not work the way you think on a cephfs between different pools.) You can create and mount an extra data pool in cephfs. I have done this also so

Re: [ceph-users] best practices for EC pools

2019-02-08 Thread Scheurer François
Dear Eugen Block Dear Alan Johnson Thank you for your answers. So we will use EC 3+2 on 6 nodes. Currently with only 4 osd's per node, then 8 and later 20. >Just to add, that a more general formula is that the number of nodes should be >greater than or equal to k+m+m so N>=k+m+m for full
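Applying that rule of thumb to the numbers above: with k=3 and m=2, N >= k+m+m means N >= 7, so the planned 6-node cluster is one host short of the recommendation; the extra m hosts are what allow the cluster to re-create all k+m shards on distinct hosts after losing up to m nodes.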

Re: [ceph-users] change OSD IP it uses

2019-02-08 Thread Hector Martin
On 08/02/2019 17.05, Ashley Merrick wrote: > Just somewhere it still seems to have stored the external IP's of the > other hosts for just this OSD, after restarting it's still full of log > lines like :  no reply from externalip:6801 osd.21, which is a OSD on > another node and trying to connect

Re: [ceph-users] change OSD IP it uses

2019-02-08 Thread Hector Martin
On 08/02/2019 20.54, Ashley Merrick wrote: > Yes that is all fine, the other 3 OSD's on the node work fine as expected, > > When I did the original OSD via ceph-deploy I used the external hostname > at the end of the command instead of the internal hostname, I then > deleted the OSD and zap'd the

Re: [ceph-users] change OSD IP it uses

2019-02-08 Thread Ashley Merrick
I just tried that; nothing is showing in ceph osd ls or ceph osd tree. I ran the purge command and wiped the disk. However, after re-creating the OSD it's still trying to connect via the external IP. I've looked to see if there is an option to specify the OSD ID in ceph-deploy to try and use another ID

Re: [ceph-users] change OSD IP it uses

2019-02-08 Thread Sage Weil
The IP that an OSD (or other non-monitor daemon) uses normally depends on what IP is used by the local host to reach the monitor(s). If you want your OSDs to be on a different network, generally the way to do that is to move the monitors to that network too. You can also try the
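For anyone hitting the same thing, the relevant knobs are the network settings in ceph.conf; a sketch with made-up subnets, since the OSD otherwise picks whichever address it uses to reach the monitors:

  [global]
  # network the monitors and clients live on
  public network = 10.0.1.0/24
  # optional separate network for OSD replication/heartbeat traffic
  cluster network = 10.0.2.0/24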

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Brian Topping
Thanks Marc and Burkhard. I think what I am learning is it’s best to copy between filesystems with cpio, if not impossible to do it any other way due to the “fs metadata in first pool” problem. FWIW, the mimic docs still describe how to create a differently named cluster on the same hardware.

Re: [ceph-users] pool/volume live migration

2019-02-08 Thread Jason Dillaman
Indeed, it is forthcoming in the Nautilus release. You would initiate a "rbd migration prepare <src-image-spec> <dst-image-spec>" to transparently link the dst-image-spec to the src-image-spec. Any active Nautilus clients against the image will then re-open the dst-image-spec for all IO operations. Read requests that cannot be
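Sketching the command flow described here as it appears in Nautilus; image and pool names are placeholders, and per the correction later in the thread the clients need to be stopped across the prepare step:

  # link the destination image to the source; clients then reopen the dst
  rbd migration prepare mypool/myimage newpool/myimage
  # copy the remaining blocks over in the background
  rbd migration execute newpool/myimage
  # once complete, remove the link to the original source image
  rbd migration commit newpool/myimage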

Re: [ceph-users] best practices for EC pools

2019-02-08 Thread Scheurer François
Thank you Caspar for your corrections! > EC requires K+1 nodes to allow writes, so every IO freezes (until all > affected PG's are recovered to at least K+1) I was not aware of this. This is quite important to know, many thanks. -survive the loss of max 3 nodes, if the recovery has enough

Re: [ceph-users] change OSD IP it uses

2019-02-08 Thread Ashley Merrick
Yes that is all fine, the other 3 OSD's on the node work fine as expected. When I did the original OSD via ceph-deploy I used the external hostname at the end of the command instead of the internal hostname; I then deleted the OSD and zap'd the disk and re-added using the internal hostname + the

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Hector Martin
On 08/02/2019 19.29, Marc Roos wrote: > > > Yes that is thus a partial move, not the behaviour you expect from a mv > command. (I think this should be changed) CephFS lets you put *data* in separate pools, but not *metadata*. Also, I think you can't remove the original/default data pool. The

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-02-08 Thread Alexandre DERUMIER
I'm just seeing StupidAllocator::_aligned_len and btree::btree_iterator, mempoo on 1 osd, both 10%. Here is the dump_mempools output:
{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },

[ceph-users] MDS crash (Mimic 13.2.2 / 13.2.4 ) elist.h: 39: FAILED assert(!is_on_list())

2019-02-08 Thread Jake Grimmett
Dear All, Unfortunately the MDS has crashed on our Mimic cluster... First symptoms were rsync giving: "No space left on device (28)" when trying to rename or delete. This prompted me to try restarting the MDS, as it was reported as laggy. Restarting the MDS shows this error in the log before the

Re: [ceph-users] pool/volume live migration

2019-02-08 Thread Jason Dillaman
On Fri, Feb 8, 2019 at 11:43 AM Luis Periquito wrote: > > This is indeed for an OpenStack cloud - it didn't require any level of > performance (so was created on an EC pool) and now it does :( > > So the idea would be: 0 - upgrade OSDs and librbd clients to Nautilus > 1- create a new pool Are

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-02-08 Thread Alexandre DERUMIER
another mempool dump after 1h run. (latency ok) Biggest difference: before restart -
    "bluestore_cache_other": {
        "items": 48661920,
        "bytes": 1539544228
    },
    "bluestore_cache_data": {
        "items": 54,
        "bytes": 643072
    },
(other caches seem to be quite low too, like

Re: [ceph-users] pool/volume live migration

2019-02-08 Thread Luis Periquito
This is indeed for an OpenStack cloud - it didn't require any level of performance (so was created on an EC pool) and now it does :( So the idea would be:
1 - create a new pool
2 - change cinder to use the new pool
for each volume
3 - stop the usage of the volume (stop the instance?)
4 - "live

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-02-08 Thread Alexandre DERUMIER
>> hmm, so fragmentation grows eventually and drops on OSD restarts, isn't it?
yes
>> The same for other OSDs?
yes
>> Wondering if you have OSD mempool monitoring (dump_mempools command output on admin socket) reports? Do you have any historic data?
not currently (I only have perf dump),
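For the record, the mempool report in question can be pulled from the admin socket like this; a sketch, with osd.0 standing in for whichever OSD is being watched:

  # one-off dump of the OSD's memory pools as JSON
  ceph daemon osd.0 dump_mempools
  # the same via the socket path directly
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_mempools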

Re: [ceph-users] pool/volume live migration

2019-02-08 Thread Jason Dillaman
Correction: at least for the initial version of live-migration, you need to temporarily stop clients that are using the image, execute "rbd migration prepare", and then restart the clients against the new destination image. The "prepare" step will fail if it detects that the source image is

Re: [ceph-users] change OSD IP it uses

2019-02-08 Thread Ashley Merrick
All fixed. It was partly the above, and partly me just missing something. Thanks all for your help! ,Ash On Fri, Feb 8, 2019 at 10:46 PM Sage Weil wrote: > The IP that an OSD (or other non-monitor daemon) uses normally depends on > what IP is used by the local host to reach the monitor(s).

[ceph-users] Controlling CephFS hard link "primary name" for recursive stat

2019-02-08 Thread Hector Martin
Hi list, As I understand it, CephFS implements hard links as effectively "smart soft links", where one link is the primary for the inode and the others effectively reference it. When it comes to directories, the size for a hardlinked file is only accounted for in recursive stats for the "primary"
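The recursive stats being discussed are exposed as virtual extended attributes on directories, e.g. (the path is a placeholder):

  # recursive size and file count as tracked by the MDS
  getfattr -n ceph.dir.rbytes /mnt/cephfs/some/dir
  getfattr -n ceph.dir.rfiles /mnt/cephfs/some/dir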

Re: [ceph-users] Multicast communication compuverde

2019-02-08 Thread Robin H. Johnson
On Wed, Feb 06, 2019 at 11:49:28AM +0200, Maged Mokhtar wrote: > It could be used for sending cluster maps or other configuration in a > push model, i believe corosync uses this by default. For use in sending > actual data during write ops, a primary osd can send to its replicas, > they do not

Re: [ceph-users] Debugging 'slow requests' ...

2019-02-08 Thread Brad Hubbard
Try capturing another log with debug_ms turned up. 1 or 5 should be OK to start with. On Fri, Feb 8, 2019 at 8:37 PM Massimo Sgaravatto wrote: > > Our Luminous ceph cluster had been working without problems for a while, but > in the last few days we have been suffering from continuous slow
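A sketch of how to bump that setting on a running daemon without a restart; osd.* can be narrowed to just the OSDs reporting slow requests:

  # raise messenger debugging on the OSDs
  ceph tell osd.* injectargs '--debug_ms 1'
  # ...reproduce the slow requests and capture the log, then turn it back down
  ceph tell osd.* injectargs '--debug_ms 0'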

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Brian Topping
Thanks Hector. So many things going through my head and I totally forgot to explore if just turning off the warnings (if only until I get more disks) was an option. This is 1000% more sensible for sure. > On Feb 8, 2019, at 7:19 PM, Hector Martin wrote: > > My practical suggestion would be

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Brian Topping
Thanks again to Jan, Burkhard, Marc and Hector for responses on this. To review, I am removing OSDs from a small cluster and running up against the "too many PGs per OSD" problem due to lack of clarity. Here's a summary of what I have collected on it: The CephFS data pool can't be changed, only

Re: [ceph-users] Downsizing a cephfs pool

2019-02-08 Thread Hector Martin
My practical suggestion would be to do nothing for now (perhaps tweaking the config settings to shut up the warnings about PGs per OSD). Ceph will gain the ability to downsize pools soon, and in the meantime, anecdotally, I have a production cluster where we overshot the current recommendation by