Re: [ceph-users] Best version and OS for CephFS
Ok, thanks. Then I'll use standby-replay mode (typo in my other mail). Greetings!!

On Wed, Oct 10, 2018 at 13:06, Sergey Malinin () wrote:

> A standby MDS is required for HA. It can be configured in standby-replay
> mode for faster failover. Otherwise a journal replay is incurred, which
> can take somewhat longer.
>
> On 10.10.2018, at 13:57, Daniel Carrasco wrote:
>
> Thanks for your response. I'll point in that direction.
> I also need fast recovery in case the MDS dies, so: is a standby MDS
> recommended, or is recovery fast enough on its own to be useful?
>
> Greetings!
>
> On Wed, Oct 10, 2018 at 12:26, Sergey Malinin () wrote:
>
>> On 10.10.2018, at 10:49, Daniel Carrasco wrote:
>>
>> - Which is the best configuration to avoid those MDS problems?
>>
>> Single active MDS with lots of RAM.

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
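For reference, enabling standby-replay is a small configuration change. A minimal sketch, assuming Luminous-era (12.x) option names and a placeholder daemon name (mds-b); newer releases expose it as a per-filesystem flag instead:

    # Luminous: in ceph.conf on the standby daemon's host
    [mds.mds-b]
    mds_standby_replay = true
    mds_standby_for_rank = 0

    # Newer releases: a per-filesystem setting
    # ceph fs set <fs_name> allow_standby_replay true

A standby-replay daemon follows the active MDS's journal continuously, so on failover it skips most of the journal replay a cold standby would have to do.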
Re: [ceph-users] Best version and OS for CephFS
Thanks for your response. I'll point in that direction.

I also need fast recovery in case the MDS dies, so: is a standby MDS recommended, or is recovery fast enough on its own to be useful?

Greetings!

On Wed, Oct 10, 2018 at 12:26, Sergey Malinin () wrote:

> On 10.10.2018, at 10:49, Daniel Carrasco wrote:
>
> - Which is the best configuration to avoid those MDS problems?
>
> Single active MDS with lots of RAM.

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
[ceph-users] Best version and OS for CephFS
Hello,

I'm trying to create a simple cluster to achieve HA for a webpage:

- Three nodes, each with MDS, OSD, MON and MGR
- Replication factor of three (one copy on every node)
- Two active MDS and one standby, to tolerate the failure of one server
- CephFS mounted using the kernel driver
- One disk per node, smaller than 500GB

I've already tested other solutions like EFS, GlusterFS and NFS master-slave, and all of them are either slower than CephFS or lack HA (NFS). So far I've had a lot of trouble with CephFS (all of it MDS related), like high memory consumption caused by memory leaks (12.2.4), and slow MDS requests appearing after an update and after a few hours (12.2.8, resolved by restarting the MDS)... so for now I've had to remove the entire cluster and set up a DRBD cluster instead (which is not as good as Ceph and doesn't have HA).

With all these problems, my questions are:

- Which is the best free OS for creating a small cluster? So far I've only used Debian-based systems (Debian 9 and Ubuntu 16.04).
- Which is the best Ceph version? Maybe 10.2.10 is more stable?
- Which is the best configuration to avoid those MDS problems? Our servers have 8GB of RAM, but it's shared with other daemons that use about 4GB.

Thanks!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
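On the configuration question, the memory knobs that matter most on nodes this small are the MDS cache limit and the BlueStore cache sizes. A minimal sketch with illustrative values (assuming Luminous-era option names; values are in bytes, and real MDS memory use tends to run well above the cache limit, so leave headroom):

    [global]
    # cap the MDS cache at ~1 GiB
    mds_cache_memory_limit = 1073741824
    # cap the BlueStore cache at ~512 MiB per OSD
    bluestore_cache_size_hdd = 536870912
    bluestore_cache_size_ssd = 536870912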
Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs
On Mon, Oct 8, 2018 at 5:44, Yan, Zheng wrote:

> On Mon, Oct 8, 2018 at 11:34 AM Daniel Carrasco wrote:
> >
> > I've got several problems on 12.2.8 too. All my standby MDSes use a lot
> > of memory (while the active one uses a normal amount), and I'm receiving
> > a lot of slow MDS messages (causing the webpage to freeze and fail until
> > the MDSes are restarted)... Finally I had to copy the entire site to DRBD
> > and use NFS to solve all the problems...
>
> was standby-replay enabled?

I've tried both, and I've seen more or less the same behavior, maybe a bit less when it's not in replay mode. Anyway, we've deactivated CephFS there for now. I'll try older versions in a test environment.

> > On Mon, Oct 8, 2018 at 5:21, Alex Litvak (alexander.v.lit...@gmail.com) wrote:
> >>
> >> How is this not an emergency announcement? Also, I wonder if I can
> >> downgrade at all? I am using Ceph with Docker, deployed with
> >> ceph-ansible. I wonder if I should push a downgrade or basically wait for
> >> the fix. I believe a fix needs to be provided.
> >>
> >> Thank you,
> >>
> >> On 10/7/2018 9:30 PM, Yan, Zheng wrote:
> >> > There is a bug in the v13.2.2 MDS which causes decoding of the purge
> >> > queue to fail. If an MDS is already in the damaged state, please
> >> > downgrade it to 13.2.1, then run 'ceph mds repaired fs_name:damaged_rank'.
> >> >
> >> > Sorry for all the trouble I caused.
> >> > Yan, Zheng

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
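For anyone hit by the 13.2.2 issue, the recovery Yan describes amounts to something like the following sketch (fs_name and damaged_rank are placeholders, and the downgrade step depends entirely on how you deploy Ceph, so the package command is only an example, not a recipe):

    # 1. Downgrade the MDS daemons to 13.2.1 (package name/version string varies by distro and repo)
    apt install ceph-mds=13.2.1-1bionic

    # 2. Find the damaged rank
    ceph health detail
    ceph fs status

    # 3. Mark the rank repaired so an MDS can take it over again
    ceph mds repaired fs_name:damaged_rank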
Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs
I've got several problems on 12.2.8 too. All my standby MDSes use a lot of memory (while the active one uses a normal amount), and I'm receiving a lot of slow MDS messages (causing the webpage to freeze and fail until the MDSes are restarted)... Finally I had to copy the entire site to DRBD and use NFS to solve all the problems...

On Mon, Oct 8, 2018 at 5:21, Alex Litvak () wrote:

> How is this not an emergency announcement? Also, I wonder if I can
> downgrade at all? I am using Ceph with Docker, deployed with
> ceph-ansible. I wonder if I should push a downgrade or basically wait for
> the fix. I believe a fix needs to be provided.
>
> Thank you,
>
> On 10/7/2018 9:30 PM, Yan, Zheng wrote:
> > There is a bug in the v13.2.2 MDS which causes decoding of the purge
> > queue to fail. If an MDS is already in the damaged state, please
> > downgrade it to 13.2.1, then run 'ceph mds repaired fs_name:damaged_rank'.
> >
> > Sorry for all the trouble I caused.
> > Yan, Zheng

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
[ceph-users] Connect client to cluster on other subnet
Hello,

I have a Ceph cluster working on a subnet, where clients on the same subnet can connect without problems, but now I need to connect some clients that are on another subnet, and I'm getting a connection timeout error. Both subnets are routed to each other, and I've disabled the firewall to test whether it was the blocker, but it still fails. I'm able to connect to the Ceph ports using telnet, and I even see the client connection logged on the server:

2018-08-23 14:41:09.937766 7ff80700 0 -- 10.22.0.168:6789/0 >> 10.20.0.185:0/1905845915 pipe(0x55a332b9d400 sd=10 :6789 s=0 pgs=0 cs=0 l=0 c=0x55a332b0bc20).accept peer addr is really 10.20.0.185:0/1905845915 (socket is -)

But I still get the timeout. The Ceph cluster is on the 10.20.0.0/24 network and the client is on the 10.22.0.0/24 network. My public network configuration is "public network = 10.20.0.0/24". Maybe it's just a matter of adding the other subnet to the public network ("public network = 10.20.0.0/24,10.22.0.0/24") and adding "cluster network = 10.20.0.0/24" to the config file, but this is a production cluster and I need to be sure. Has someone tried it?

Thanks!!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
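In ceph.conf terms, the change being proposed above is this (a sketch of the poster's own suggestion, not a verified fix; note that clients on the new subnet must still be able to route to the monitor addresses stored in the monmap, and daemons need a restart to pick the setting up):

    [global]
    # accept clients from both subnets
    public network = 10.20.0.0/24,10.22.0.0/24
    # keep OSD replication/heartbeat traffic on the original subnet
    cluster network = 10.20.0.0/24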
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Hello, just to report: it looks like changing the messenger type to simple helped avoid the memory leak. About a day later the memory is still OK:

1264 ceph 20 0 12,547g 1,247g 16652 S 3,3 8,2 110:16.93 ceph-mds

The memory usage is more than 2x the MDS limit (512MB), but maybe that's daemon overhead plus memory fragmentation. At least it's not 13-15GB like before.

Greetings!!

2018-07-25 23:16 GMT+02:00 Daniel Carrasco:

> I've changed the configuration, adding your line and changing the MDS
> memory limit to 512MB, and for now it looks stable (it's at about 3-6% and
> sometimes even below 3%). I got very high usage on boot:
> 1264 ceph 20 0 12,543g 6,251g 16184 S 2,0 41,1% 0:19.34 ceph-mds
>
> but now it looks acceptable:
> 1264 ceph 20 0 12,543g 737952 16188 S 1,0 4,6% 0:41.05 ceph-mds
>
> Anyway, I need time to test it, because 15 minutes is too little.
>
> Greetings!!
>
> 2018-07-25 17:16 GMT+02:00 Daniel Carrasco:
>
>> Hello,
>>
>> Thanks for all your help.
>>
>> Is the dd an option of some command? Because at least on Debian/Ubuntu it
>> is an application to copy blocks, and then it fails.
>> For now I cannot change the configuration, but I'll try later.
>> About the logs, I've seen nothing about "warning", "error", "failed",
>> "message" or anything similar, so it looks like there are no messages of
>> that kind.
>>
>> Greetings!!
>>
>> 2018-07-25 14:48 GMT+02:00 Yan, Zheng:
>>
>>> On Wed, Jul 25, 2018 at 8:12 PM Yan, Zheng wrote:
>>> >
>>> > On Wed, Jul 25, 2018 at 5:04 PM Daniel Carrasco wrote:
>>> > >
>>> > > Hello,
>>> > >
>>> > > I've attached the PDF.
>>> > >
>>> > > I don't know if it's important, but I made changes to the
>>> > > configuration and restarted the servers after dumping that heap file.
>>> > > I've changed the memory_limit to 25MB to test whether it still has
>>> > > acceptable RAM values.
>>> >
>>> > Looks like there is a memory leak in the async messenger. What's the
>>> > output of "dd /usr/bin/ceph-mds"? Could you try the simple messenger
>>> > (add "ms type = simple" to the 'global' section of ceph.conf)?
>>>
>>> Besides, are there any suspicious messages in the mds log? Such as
>>> "failed to decode message of type"
>>>
>>> > Regards
>>> > Yan, Zheng
>>>
>>> > > Greetings!
>>> > >
>>> > > 2018-07-25 2:53 GMT+02:00 Yan, Zheng:
>>> > >>
>>> > >> On Wed, Jul 25, 2018 at 4:52 AM Daniel Carrasco (d.carra...@i2tic.com) wrote:
>>> > >> >
>>> > >> > Hello,
>>> > >> >
>>> > >> > I've run the profiler for about 5-6 minutes and this is what I've got:
>>> > >>
>>> > >> please run pprof --pdf /usr/bin/ceph-mds
>>> > >> /var/log/ceph/ceph-mds.x.profile..heap > /tmp/profile.pdf
>>> > >> and send me the pdf
>>> > >>
>>> > >> > Using local file /usr/bin/ceph-mds.
>>> > >> > Using local file /var/log/ceph/mds.kavehome-mgto-pro-fs01.profile.0009.heap.
>>> > >> > Total: 400.0 MB
>>> > >> > 362.5 90.6% 90.6% 362.5 90.6% ceph::buffer::create_aligned_in_mempool
>>> > >> > 20.4 5.1% 95.7% 29.8 7.5% CDir::_load_dentry
>>> > >> > 5.9 1.5% 97.2% 6.9 1.7% CDir::add_primary_dentry
>>> > >> > 4.7 1.2% 98.4% 4.7 1.2% ceph::logging::Log::create_entry
>>> > >> > 1.8 0.5% 98.8% 1.8 0.5% std::_Rb_tree::_M_emplace_hint_unique
>>> > >> > 1.8 0.5% 99.3% 2.2 0.5% compact_map_base::decode
>>> > >> > 0.6 0.1% 99.4% 0
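The messenger change being tested here is a one-line ceph.conf edit. A sketch (the simple messenger is the older, slower implementation, so this is a diagnostic workaround for the suspected async-messenger leak rather than a recommended permanent setting):

    [global]
    ms type = simple
    # async is the default in Luminous; revert once the leak is fixed:
    # ms type = async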
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Hello,

How much time is necessary? Because it's a production environment, and the memory profiler plus the low cache size (set because of the problem) cause so much CPU usage on the OSDs and MDS that it fails while the profiler is running.

Is there any problem if it's done at a low-traffic time? (Less usage, so maybe it won't fail, but maybe also less info about the usage.)

Greetings!

2018-07-24 10:21 GMT+02:00 Yan, Zheng:

> I mean:
>
> ceph tell mds.x heap start_profiler
>
> ... wait for some time
>
> ceph tell mds.x heap stop_profiler
>
> pprof --text /usr/bin/ceph-mds /var/log/ceph/ceph-mds.x.profile..heap
>
> On Tue, Jul 24, 2018 at 3:18 PM Daniel Carrasco wrote:
> >
> > This is what I get:
> >
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap dump
> > 2018-07-24 09:05:19.350720 7fc562ffd700 0 client.1452545 ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:05:29.103903 7fc563fff700 0 client.1452548 ms_handle_reset on 10.22.0.168:6800/1685786126
> > mds.kavehome-mgto-pro-fs01 dumping heap profile now.
> >
> > MALLOC: 760199640 ( 725.0 MiB) Bytes in use by application
> > MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> > MALLOC: + 246962320 ( 235.5 MiB) Bytes in central cache freelist
> > MALLOC: + 43933664 ( 41.9 MiB) Bytes in transfer cache freelist
> > MALLOC: + 41012664 ( 39.1 MiB) Bytes in thread cache freelists
> > MALLOC: + 10186912 ( 9.7 MiB) Bytes in malloc metadata
> > MALLOC:
> > MALLOC: = 1102295200 ( 1051.2 MiB) Actual memory used (physical + swap)
> > MALLOC: + 4268335104 ( 4070.6 MiB) Bytes released to OS (aka unmapped)
> > MALLOC:
> > MALLOC: = 5370630304 ( 5121.8 MiB) Virtual address space used
> > MALLOC:
> > MALLOC: 33027 Spans in use
> > MALLOC: 19 Thread heaps in use
> > MALLOC: 8192 Tcmalloc page size
> >
> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> > Bytes released to the OS take up virtual address space but no physical memory.
> >
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> > 2018-07-24 09:14:25.747706 7f94f700 0 client.1452578 ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:14:25.754034 7f95057fa700 0 client.1452581 ms_handle_reset on 10.22.0.168:6800/1685786126
> > mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
> >
> > MALLOC: 960649328 ( 916.1 MiB) Bytes in use by application
> > MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> > MALLOC: + 108867288 ( 103.8 MiB) Bytes in central cache freelist
> > MALLOC: + 37179424 ( 35.5 MiB) Bytes in transfer cache freelist
> > MALLOC: + 40143000 ( 38.3 MiB) Bytes in thread cache freelists
> > MALLOC: + 10186912 ( 9.7 MiB) Bytes in malloc metadata
> > MALLOC:
> > MALLOC: = 1157025952 ( 1103.4 MiB) Actual memory used (physical + swap)
> > MALLOC: + 4213604352 ( 4018.4 MiB) Bytes released to OS (aka unmapped)
> > MALLOC:
> > MALLOC: = 5370630304 ( 5121.8 MiB) Virtual address space used
> > MALLOC:
> > MALLOC: 33028 Spans in use
> > MALLOC: 19 Thread heaps in use
> > MALLOC: 8192 Tcmalloc page size
> >
> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> > Bytes released to the OS take up virtual address space but no physical memory.
> >
> > After heap release:
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> > 2018-07-24 09:15:28.540203 7f2f7affd700 0 client.1443339 ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:15:28.547153 7f2f7bfff700 0 client.1443342 ms_handle_reset on 10.22.0.168:6
Re: [ceph-users] Insane CPU utilization in ceph.fuse
> http://docs.ceph.com/docs/mimic/rados/troubleshooting/memory-profiling/
>
> On Tue, Jul 24, 2018 at 7:54 AM Daniel Carrasco wrote:
> >
> > Yeah, it's also my thread. That thread was created before lowering the
> > cache size from 512MB to 8MB. I thought that maybe it was my fault and I
> > had made a misconfiguration, so I ignored the problem until now.
> >
> > Greetings!
> >
> > On Tue, Jul 24, 2018, 1:00, Gregory Farnum wrote:
> >>
> >> On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly wrote:
> >>>
> >>> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco wrote:
> >>> > Hi, thanks for your response.
> >>> >
> >>> > Clients are about 6, and 4 of them are on standby most of the time.
> >>> > Only two are active servers that are serving the webpage. Also we've
> >>> > got a Varnish in front, so they are not getting all the load (below
> >>> > 30% in PHP is not much). About the MDS cache, now I've got the
> >>> > mds_cache_memory_limit at 8MB.
> >>>
> >>> What! Please post `ceph daemon mds. config diff`, `... perf
> >>> dump`, and `... dump_mempools` from the server the active MDS is on.
> >>>
> >>> > I've also tested 512MB, but the CPU usage is the same and the MDS RAM
> >>> > usage grows up to 15GB (on a 16GB server it starts to swap and
> >>> > everything fails). With 8MB, at least the memory usage is stable at
> >>> > less than 6GB (now it's using about 1GB of RAM).
> >>>
> >>> We've seen reports of possible memory leaks before and the potential
> >>> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
> >>> Your MDS cache size should be configured to 1-8GB (depending on your
> >>> preference) so it's disturbing to see you set it so low.
> >>
> >> See also the thread "[ceph-users] Fwd: MDS memory usage is very high",
> >> which had more discussion of that. The MDS daemon seemingly had 9.5GB of
> >> allocated RSS but only believed 489MB was in use for the cache...
> >> -Greg

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Yeah, it's also my thread. That thread was created before lowering the cache size from 512MB to 8MB. I thought that maybe it was my fault and I had made a misconfiguration, so I ignored the problem until now.

Greetings!

On Tue, Jul 24, 2018, 1:00, Gregory Farnum wrote:

> On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly wrote:
>
>> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco wrote:
>> > Hi, thanks for your response.
>> >
>> > Clients are about 6, and 4 of them are on standby most of the time.
>> > Only two are active servers that are serving the webpage. Also we've
>> > got a Varnish in front, so they are not getting all the load (below
>> > 30% in PHP is not much). About the MDS cache, now I've got the
>> > mds_cache_memory_limit at 8MB.
>>
>> What! Please post `ceph daemon mds. config diff`, `... perf
>> dump`, and `... dump_mempools` from the server the active MDS is on.
>>
>> > I've also tested 512MB, but the CPU usage is the same and the MDS RAM
>> > usage grows up to 15GB (on a 16GB server it starts to swap and
>> > everything fails). With 8MB, at least the memory usage is stable at
>> > less than 6GB (now it's using about 1GB of RAM).
>>
>> We've seen reports of possible memory leaks before and the potential
>> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
>> Your MDS cache size should be configured to 1-8GB (depending on your
>> preference) so it's disturbing to see you set it so low.
>
> See also the thread "[ceph-users] Fwd: MDS memory usage is very high",
> which had more discussion of that. The MDS daemon seemingly had 9.5GB of
> allocated RSS but only believed 489MB was in use for the cache...
> -Greg
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Hi,

I forgot to say that maybe the diff is lower than the real value (8MB), because the memory usage was still high and I had prepared a new configuration with a lower limit (5MB). I haven't reloaded the daemons for now, but maybe the configuration was loaded again today, and that's the reason why it's using less than 1GB of RAM just now. Of course I haven't rebooted the machine, but maybe the daemon was killed for high memory usage and the new configuration was loaded then.

Greetings!

2018-07-23 21:07 GMT+02:00 Daniel Carrasco:

> Thanks!
>
> It's true that I've seen continuous memory growth, but I hadn't thought of
> a memory leak. I don't remember exactly how many hours were necessary to
> fill the memory, but I calculate it was about 14h.
>
> With the new configuration it looks like memory grows slowly and stops
> when it reaches 5-6GB. Sometimes the daemon seems to flush the memory,
> dropping again to less than 1GB and growing slowly back to 5-6GB.
>
> Just today, I don't know why or how (I've not changed anything on the Ceph
> cluster), the memory dropped to less than 1GB and is still there 8 hours
> later. I've only deployed a git repository with some changes.
>
> I have some nodes on version 12.2.5, because I detected this problem and
> didn't know whether it was caused by the latest version, so I stopped the
> update. The one that is the active MDS is on the latest version (12.2.7),
> and I've scheduled an update of the rest of the nodes for Thursday.
>
> A graph of the memory usage over the latest days with that configuration:
> https://imgur.com/a/uSsvBi4
>
> I have no info about when the problem was at its worst (512MB of MDS
> memory limit and 15-16GB of usage), because memory usage was not logged.
> I only have heap stats that were dumped while the daemon was in the
> process of filling the memory:
>
> # ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> 2018-07-19 00:43:46.142560 7f5a7a7fc700 0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
> 2018-07-19 00:43:46.181133 7f5a7b7fe700 0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>
> MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> MALLOC: + 172148208 ( 164.2 MiB) Bytes in central cache freelist
> MALLOC: + 19031168 ( 18.1 MiB) Bytes in transfer cache freelist
> MALLOC: + 23987552 ( 22.9 MiB) Bytes in thread cache freelists
> MALLOC: + 20869280 ( 19.9 MiB) Bytes in malloc metadata
> MALLOC:
> MALLOC: = 10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
> MALLOC: + 3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC:
> MALLOC: = 14132703392 (13478.0 MiB) Virtual address space used
> MALLOC:
> MALLOC: 63875 Spans in use
> MALLOC: 16 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
>
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
> Here's the diff:
>
> {
>     "diff": {
>         "current": {
>             "admin_socket": "/var/run/ceph/ceph-mds.kavehome-mgto-pro-fs01.asok",
>             "auth_client_required": "cephx",
>             "bluestore_cache_size_hdd": "80530636",
>             "bluestore_cache_size_ssd": "80530636",
>             "err_to_stderr": "true",
>             "fsid": "f015f888-6e0c-4203-aea8-ef0f69ef7bd8",
>             "internal_safe_to_start_threads": "true",
>             "keyring": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01/keyring",
>             "log_file": "/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.log",
>             "log_max_recent": "1",
>             "log_to_stderr": "false",
>             "mds_cache_memory_limit": "53687091",
>             "mds_data": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01",
>             "mgr_data": "/var/lib/ceph/mgr/ceph-kavehome-mgto-pro-fs01",
>             "mon_cluster_log_file": "default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log"
Re: [ceph-users] Fwd: MDS memory usage is very high
Hi,

I forgot to say that maybe the diff is lower than the real value (8MB), because the memory usage was still high and I had prepared a new configuration with a lower limit (5MB). I haven't reloaded the daemons for now, but maybe the configuration was loaded again today, and that's the reason why it's using less than 1GB of RAM just now. Of course I haven't rebooted the machine, but maybe the daemon was killed for high memory usage and the new configuration was loaded then.

Greetings!

2018-07-19 11:35 GMT+02:00 Daniel Carrasco:

> Hello again,
>
> It is still early to say that it's working fine now, but it looks like the
> MDS memory is now under 20% of RAM, and most of the time between 6-9%.
> Maybe it was a mistake in the configuration.
>
> As a note, I've changed this client config:
>
> [global]
> ...
> bluestore_cache_size_ssd = 805306360
> bluestore_cache_size_hdd = 805306360
> mds_cache_memory_limit = 536870910
>
> [client]
> client_reconnect_stale = true
> client_cache_size = 32768
> client_mount_timeout = 30
> client_oc_max_objects = 2000
> client_oc_size = 629145600
> rbd_cache = true
> rbd_cache_size = 671088640
>
> to this (just the cache sizes divided by 10):
>
> [global]
> ...
> bluestore_cache_size_ssd = 80530636
> bluestore_cache_size_hdd = 80530636
> mds_cache_memory_limit = 53687091
>
> [client]
> client_cache_size = 32768
> client_mount_timeout = 30
> client_oc_max_objects = 2000
> client_oc_size = 62914560
> rbd_cache = true
> rbd_cache_size = 67108864
>
> Now the heap stats are:
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>
> MALLOC: 714063568 ( 681.0 MiB) Bytes in use by application
> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> MALLOC: + 132992224 ( 126.8 MiB) Bytes in central cache freelist
> MALLOC: + 21929920 ( 20.9 MiB) Bytes in transfer cache freelist
> MALLOC: + 31806608 ( 30.3 MiB) Bytes in thread cache freelists
> MALLOC: + 30666912 ( 29.2 MiB) Bytes in malloc metadata
> MALLOC:
> MALLOC: = 931459232 ( 888.3 MiB) Actual memory used (physical + swap)
> MALLOC: + 21886803968 (20872.9 MiB) Bytes released to OS (aka unmapped)
> MALLOC:
> MALLOC: = 22818263200 (21761.2 MiB) Virtual address space used
> MALLOC:
> MALLOC: 21311 Spans in use
> MALLOC: 18 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
>
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
> And sometimes even better (taken later than the above):
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>
> MALLOC: 516434072 ( 492.5 MiB) Bytes in use by application
> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> MALLOC: + 7564936 ( 7.2 MiB) Bytes in central cache freelist
> MALLOC: + 2751072 ( 2.6 MiB) Bytes in transfer cache freelist
> MALLOC: + 2707072 ( 2.6 MiB) Bytes in thread cache freelists
> MALLOC: + 2715808 ( 2.6 MiB) Bytes in malloc metadata
> MALLOC:
> MALLOC: = 532172960 ( 507.5 MiB) Actual memory used (physical + swap)
> MALLOC: + 573440 ( 0.5 MiB) Bytes released to OS (aka unmapped)
> MALLOC:
> MALLOC: = 532746400 ( 508.1 MiB) Virtual address space used
> MALLOC:
> MALLOC: 21990 Spans in use
> MALLOC: 16 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
>
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
> Greetings!!
>
> 2018-07-19 10:24 GMT+02:00 Daniel Carrasco:
>
>> Hello,
>>
>> Finally I've had to remove CephFS and use a simple NFS, because the MDS
>> daemon starts to use a lot of memory and is unstable. After rebooting one
>> node because it started to swap (the cluster should be able to survive
>> without one node), the cluster went down because one of the other MDSes
>> started to use about 15GB of RAM and crashed all the time, so the cluster
>> was unable to come back. The only solution was to reboot all nodes, and
>> that's not good for HA.
>>
>> If somebody knows something about this, I'll be pleased to test it on a
>> test environment to see if we can find a solution.
>>
>> Greetings!
>>
>> 2018-
Re: [ceph-users] Insane CPU utilization in ceph.fuse
, "osdop_sparse_read": 0, "osdop_clonerange": 0, "osdop_getxattr": 100784, "osdop_setxattr": 0, "osdop_cmpxattr": 0, "osdop_rmxattr": 0, "osdop_resetxattrs": 0, "osdop_tmap_up": 0, "osdop_tmap_put": 0, "osdop_tmap_get": 0, "osdop_call": 0, "osdop_watch": 0, "osdop_notify": 0, "osdop_src_cmpxattr": 0, "osdop_pgls": 0, "osdop_pgls_filter": 0, "osdop_other": 3, "linger_active": 0, "linger_send": 0, "linger_resend": 0, "linger_ping": 0, "poolop_active": 0, "poolop_send": 0, "poolop_resend": 0, "poolstat_active": 0, "poolstat_send": 0, "poolstat_resend": 0, "statfs_active": 0, "statfs_send": 0, "statfs_resend": 0, "command_active": 0, "command_send": 0, "command_resend": 0, "map_epoch": 468, "map_full": 0, "map_inc": 39, "osd_sessions": 3, "osd_session_open": 479, "osd_session_close": 476, "osd_laggy": 0, "omap_wr": 7, "omap_rd": 202074, "omap_del": 1 }, "purge_queue": { "pq_executing_ops": 0, "pq_executing": 0, "pq_executed": 124 }, "throttle-msgr_dispatch_throttler-mds": { "val": 0, "max": 104857600, "get_started": 0, "get": 6140428, "get_sum": 2077944682, "get_or_fail_fail": 0, "get_or_fail_success": 6140428, "take": 0, "take_sum": 0, "put": 6140428, "put_sum": 2077944682, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } }, "throttle-objecter_bytes": { "val": 0, "max": 104857600, "get_started": 0, "get": 0, "get_sum": 0, "get_or_fail_fail": 0, "get_or_fail_success": 0, "take": 136767, "take_sum": 339484250, "put": 136523, "put_sum": 339484250, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } }, "throttle-objecter_ops": { "val": 0, "max": 1024, "get_started": 0, "get": 0, "get_sum": 0, "get_or_fail_fail": 0, "get_or_fail_success": 0, "take": 136767, "take_sum": 136767, "put": 136767, "put_sum": 136767, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } }, "throttle-write_buf_throttle": { "val": 0, "max": 3758096384, "get_started": 0, "get": 124, "get_sum": 11532, "get_or_fail_fail": 0, "get_or_fail_success": 124, "take": 0, "take_sum": 0, "put": 109, "put_sum": 11532, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } }, "throttle-write_buf_throttle-0x55faf5ba4220": { "val": 0, "max": 3758096384, "get_started": 0, "get": 125666, "get_sum": 198900816, "get_or_fail_fail": 0, "get_or_fail_success": 125666, "take": 0, "take_sum": 0, "put": 23473, "put_sum": 198900816, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } } } -- dump_mempools -- { "bloom_filter": { "items": 120, "bytes": 120 }, "bluestore_alloc": { "items": 0, "bytes": 0 }, "bluestore_cache_data": { "items": 0, "bytes": 0 }, "bluestore_cache_onode": { "items": 0, "bytes": 0 }, "bluestore_cache_other": { "items": 0, "bytes": 0 }, "bluestore_fsck": { "items": 0, "bytes": 0 }, "bluestore_txc": { "items": 0, "bytes": 0 }, "bluestore_writing_deferred": { "items": 0, "bytes": 0 }, "bluestore_writing": { "items": 0, "bytes": 0 }, "bluefs": { "items": 0, "bytes": 0 }, "buffer_anon": { "items": 96401, "bytes": 16010198 }, "buffer_meta": { "items": 1, "bytes": 88 }, "osd": { "items": 0, "bytes": 0 }, "osd_mapbl": { "items": 0, "bytes": 0 }, "osd_pglog": { "items": 0, "bytes": 0 }, "osdmap": { "items": 80, "bytes": 3296 }, "osdmap_mapping": { "items": 0, "bytes": 0 }, "pgmap": { "items": 0, "bytes": 0 }, "mds_co": { "items": 17604, "bytes": 2330840 }, "unittest_1": { "items": 0, "bytes": 0 }, "unittest_2": { "items": 0, "bytes": 0 }, "total": { "items": 114206, "bytes": 18344542 } } --- Sorry for my english!. Greetings!! El 23 jul. 
2018 20:08, "Patrick Donnelly" escribió: On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco wrote: > Hi, thanks for your response. > > Clients are about 6, and 4 of them are the most of time on standby. Only two > are active servers that are serving the webpage. Also we've a varnish on > front, so are not getting all the load (below 30% in PHP is not much). > About the MDS cache, now I've the mds_cache_memory_limit at 8Mb. What! Please post `ceph daemon mds. config diff`, `... perf dump`, and `... dump_mempools ` from the server the active MDS is on. > I've tested > also 512Mb, but the CPU usage is the same and the MDS RAM usage grows up to > 15GB (on a 16Gb server it starts to swap and all fails). With 8Mb, at least > the memory usage is stable on less than 6Gb (now is using about 1GB of RAM). We've seen reports of possible memory leaks before and the potential fixes for those were in 12.2.6. How fast does your MDS reach 15GB? Your MDS cache size should be configured to 1-8GB (depending on your preference) so it's disturbing to see you set it so low. -- Patrick Donnelly ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
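Written out with a placeholder daemon name, the diagnostics requested here are admin-socket commands, run on the host of the active MDS:

    ceph daemon mds.<name> config diff
    ceph daemon mds.<name> perf dump
    ceph daemon mds.<name> dump_mempools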
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Hi, thanks for your response.

Clients are about 6, and 4 of them are on standby most of the time. Only two are active servers that are serving the webpage. Also we've got a Varnish in front, so they are not getting all the load (below 30% in PHP is not much). About the MDS cache, now I've got the mds_cache_memory_limit at 8MB. I've also tested 512MB, but the CPU usage is the same and the MDS RAM usage grows up to 15GB (on a 16GB server it starts to swap and everything fails). With 8MB, at least the memory usage is stable at less than 6GB (now it's using about 1GB of RAM).

What catches my attention is the huge difference between kernel and FUSE. Why is the kernel client barely noticeable while the FUSE client uses most of the CPU power...

Greetings.

2018-07-23 14:01 GMT+02:00 Paul Emmerich:

> Hi,
>
> do you happen to have a relatively large number of clients and a
> relatively small cache size on the MDS?
>
> Paul
>
> 2018-07-23 13:16 GMT+02:00 Daniel Carrasco:
>
>> Hello,
>>
>> I've created a Ceph cluster of 3 nodes (3 MONs, 3 OSDs, 3 MGRs and 3
>> MDSes, with two active). This cluster is mainly for serving a webpage
>> (small files) and is configured to keep three copies of every file (a
>> copy on every OSD).
>> My question is about ceph-fuse clients: I've noticed insane CPU usage
>> when the FUSE client is used, while the kernel client's usage is
>> unnoticeable.
>>
>> For example, right now those machines are working with the kernel client
>> and the CPU usage is less than 30% (all used by PHP processes). When I
>> change to ceph-fuse, the CPU usage rises to more than 130% and sometimes
>> even up to 190-200% (on a two-core machine that means burning the CPU).
>>
>> Now I've seen two warnings on the cluster:
>> 1 MDSs report oversized cache
>> 4 clients failing to respond to cache pressure
>>
>> and I think that maybe it's a lack of capabilities in the Ceph kernel
>> module, so I want to give the FUSE module a try, but I have the above
>> problem.
>>
>> My OS is Ubuntu 16.04 x64 with kernel version 4.13.0-45-generic, and the
>> Ceph server/client version is 12.2.7.
>>
>> How can I debug that CPU usage?
>>
>> Thanks!
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
[ceph-users] Insane CPU utilization in ceph.fuse
Hello,

I've created a Ceph cluster of 3 nodes (3 MONs, 3 OSDs, 3 MGRs and 3 MDSes, with two active). This cluster is mainly for serving a webpage (small files) and is configured to keep three copies of every file (a copy on every OSD).

My question is about ceph-fuse clients: I've noticed insane CPU usage when the FUSE client is used, while the kernel client's usage is unnoticeable.

For example, right now those machines are working with the kernel client and the CPU usage is less than 30% (all used by PHP processes). When I change to ceph-fuse, the CPU usage rises to more than 130% and sometimes even up to 190-200% (on a two-core machine that means burning the CPU).

Now I've seen two warnings on the cluster:
1 MDSs report oversized cache
4 clients failing to respond to cache pressure

and I think that maybe it's a lack of capabilities in the Ceph kernel module, so I want to give the FUSE module a try, but I have the above problem.

My OS is Ubuntu 16.04 x64 with kernel version 4.13.0-45-generic, and the Ceph server/client version is 12.2.7.

How can I debug that CPU usage?

Thanks!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
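A way to start narrowing down this kind of client CPU burn (a sketch, not from the thread: standard Linux profiling tools plus Ceph's client debug options, which are very verbose and should only be enabled briefly):

    # see which threads inside ceph-fuse are burning CPU
    top -H -p $(pidof ceph-fuse)
    perf top -p $(pidof ceph-fuse)

    # temporarily raise client logging in ceph.conf on the client
    [client]
    debug client = 20
    debug ms = 1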
Re: [ceph-users] Fwd: MDS memory usage is very high
Hello again,

It is still early to say that it's working fine now, but it looks like the MDS memory is now under 20% of RAM, and most of the time between 6-9%. Maybe it was a mistake in the configuration.

As a note, I've changed this client config:

[global]
...
bluestore_cache_size_ssd = 805306360
bluestore_cache_size_hdd = 805306360
mds_cache_memory_limit = 536870910

[client]
client_reconnect_stale = true
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2000
client_oc_size = 629145600
rbd_cache = true
rbd_cache_size = 671088640

to this (just the cache sizes divided by 10):

[global]
...
bluestore_cache_size_ssd = 80530636
bluestore_cache_size_hdd = 80530636
mds_cache_memory_limit = 53687091

[client]
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2000
client_oc_size = 62914560
rbd_cache = true
rbd_cache_size = 67108864

Now the heap stats are:
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:

MALLOC: 714063568 ( 681.0 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 132992224 ( 126.8 MiB) Bytes in central cache freelist
MALLOC: + 21929920 ( 20.9 MiB) Bytes in transfer cache freelist
MALLOC: + 31806608 ( 30.3 MiB) Bytes in thread cache freelists
MALLOC: + 30666912 ( 29.2 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: = 931459232 ( 888.3 MiB) Actual memory used (physical + swap)
MALLOC: + 21886803968 (20872.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: = 22818263200 (21761.2 MiB) Virtual address space used
MALLOC:
MALLOC: 21311 Spans in use
MALLOC: 18 Thread heaps in use
MALLOC: 8192 Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

And sometimes even better (taken later than the above):
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:

MALLOC: 516434072 ( 492.5 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 7564936 ( 7.2 MiB) Bytes in central cache freelist
MALLOC: + 2751072 ( 2.6 MiB) Bytes in transfer cache freelist
MALLOC: + 2707072 ( 2.6 MiB) Bytes in thread cache freelists
MALLOC: + 2715808 ( 2.6 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: = 532172960 ( 507.5 MiB) Actual memory used (physical + swap)
MALLOC: + 573440 ( 0.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: = 532746400 ( 508.1 MiB) Virtual address space used
MALLOC:
MALLOC: 21990 Spans in use
MALLOC: 16 Thread heaps in use
MALLOC: 8192 Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

Greetings!!

2018-07-19 10:24 GMT+02:00 Daniel Carrasco:

> Hello,
>
> Finally I've had to remove CephFS and use a simple NFS, because the MDS
> daemon starts to use a lot of memory and is unstable. After rebooting one
> node because it started to swap (the cluster should be able to survive
> without one node), the cluster went down because one of the other MDSes
> started to use about 15GB of RAM and crashed all the time, so the cluster
> was unable to come back. The only solution was to reboot all nodes, and
> that's not good for HA.
>
> If somebody knows something about this, I'll be pleased to test it on a
> test environment to see if we can find a solution.
>
> Greetings!
>
> 2018-
Re: [ceph-users] Fwd: MDS memory usage is very high
Hello,

Finally I've had to remove CephFS and use a simple NFS, because the MDS daemon starts to use a lot of memory and is unstable. After rebooting one node because it started to swap (the cluster should be able to survive without one node), the cluster went down because one of the other MDSes started to use about 15GB of RAM and crashed all the time, so the cluster was unable to come back. The only solution was to reboot all nodes, and that's not good for HA.

If somebody knows something about this, I'll be pleased to test it on a test environment to see if we can find a solution.

Greetings!

2018-07-19 1:07 GMT+02:00 Daniel Carrasco:

> Thanks again,
>
> I was trying to use the FUSE client instead of the Ubuntu 16.04 kernel
> module to see if maybe it's a client-side problem, but the CPU usage of
> the FUSE client is very high (100% and even more on a two-core machine),
> so I had to revert to the kernel client, which uses much less CPU.
>
> It's a web server, so maybe that's the problem. PHP and Nginx open a lot
> of files, and maybe that uses a lot of RAM.
>
> For now I've rebooted the machine, because it's the only way to free the
> memory, but I cannot restart the machine every few hours...
>
> Greetings!!
>
> 2018-07-19 1:00 GMT+02:00 Gregory Farnum:
>
>> Wow, yep, apparently the MDS has another 9GB of allocated RAM outside of
>> the cache! Hopefully one of the current FS users or devs has some idea.
>> All I can suggest is looking to see if there are a bunch of stuck
>> requests or something that are taking up memory which isn't properly
>> counted.
>>
>> On Wed, Jul 18, 2018 at 3:48 PM Daniel Carrasco wrote:
>>
>>> Hello, thanks for your response.
>>>
>>> This is what I get:
>>>
>>> # ceph tell mds.kavehome-mgto-pro-fs01 heap stats
>>> 2018-07-19 00:43:46.142560 7f5a7a7fc700 0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
>>> 2018-07-19 00:43:46.181133 7f5a7b7fe700 0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
>>> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>>>
>>> MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
>>> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
>>> MALLOC: + 172148208 ( 164.2 MiB) Bytes in central cache freelist
>>> MALLOC: + 19031168 ( 18.1 MiB) Bytes in transfer cache freelist
>>> MALLOC: + 23987552 ( 22.9 MiB) Bytes in thread cache freelists
>>> MALLOC: + 20869280 ( 19.9 MiB) Bytes in malloc metadata
>>> MALLOC:
>>> MALLOC: = 10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
>>> MALLOC: + 3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
>>> MALLOC:
>>> MALLOC: = 14132703392 (13478.0 MiB) Virtual address space used
>>> MALLOC:
>>> MALLOC: 63875 Spans in use
>>> MALLOC: 16 Thread heaps in use
>>> MALLOC: 8192 Tcmalloc page size
>>>
>>> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>> Bytes released to the OS take up virtual address space but no physical memory.
>>>
>>> I've tried the release command but it keeps using the same memory.
>>>
>>> Greetings!
>>>
>>> 2018-07-19 0:25 GMT+02:00 Gregory Farnum:
>>>
>>>> The MDS thinks it's using 486MB of cache right now, and while that's
>>>> not a complete accounting (I believe you should generally multiply the
>>>> configured cache limit by 1.5 to get a realistic memory consumption
>>>> model), it's obviously a long way from 12.5GB. You might try going in
>>>> with the "ceph daemon" command and looking at the heap stats (I forget
>>>> the exact command, but it will tell you if you run "help" against it)
>>>> and seeing what those say: you may have one of the slightly-broken
>>>> base systems and find that running the "heap release" (or similar
>>>> wording) command will free up a lot of RAM back to the OS!
>>>> -Greg
>>>>
>>>> On Wed, Jul 18, 2018 at 1:53 PM, Daniel Carrasco wrote:
>>>> > Hello,
>>>> >
>>>> > I've created a 3-node cluster with MON, MGR, OSD and MDS on all (2 MDS
>>>> > active), and I've noticed that the MDS is using a lot of memory (just now is
Re: [ceph-users] Fwd: MDS memory usage is very high
Thanks again,

I was trying to use the FUSE client instead of the Ubuntu 16.04 kernel module to see if maybe it's a client-side problem, but the CPU usage of the FUSE client is very high (100% and even more on a two-core machine), so I had to revert to the kernel client, which uses much less CPU.

It's a web server, so maybe that's the problem. PHP and Nginx open a lot of files, and maybe that uses a lot of RAM.

For now I've rebooted the machine, because it's the only way to free the memory, but I cannot restart the machine every few hours...

Greetings!!

2018-07-19 1:00 GMT+02:00 Gregory Farnum:

> Wow, yep, apparently the MDS has another 9GB of allocated RAM outside of
> the cache! Hopefully one of the current FS users or devs has some idea.
> All I can suggest is looking to see if there are a bunch of stuck
> requests or something that are taking up memory which isn't properly
> counted.
>
> On Wed, Jul 18, 2018 at 3:48 PM Daniel Carrasco wrote:
>
>> Hello, thanks for your response.
>>
>> This is what I get:
>>
>> # ceph tell mds.kavehome-mgto-pro-fs01 heap stats
>> 2018-07-19 00:43:46.142560 7f5a7a7fc700 0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
>> 2018-07-19 00:43:46.181133 7f5a7b7fe700 0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
>> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>>
>> MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
>> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
>> MALLOC: + 172148208 ( 164.2 MiB) Bytes in central cache freelist
>> MALLOC: + 19031168 ( 18.1 MiB) Bytes in transfer cache freelist
>> MALLOC: + 23987552 ( 22.9 MiB) Bytes in thread cache freelists
>> MALLOC: + 20869280 ( 19.9 MiB) Bytes in malloc metadata
>> MALLOC:
>> MALLOC: = 10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
>> MALLOC: + 3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
>> MALLOC:
>> MALLOC: = 14132703392 (13478.0 MiB) Virtual address space used
>> MALLOC:
>> MALLOC: 63875 Spans in use
>> MALLOC: 16 Thread heaps in use
>> MALLOC: 8192 Tcmalloc page size
>>
>> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>> Bytes released to the OS take up virtual address space but no physical memory.
>>
>> I've tried the release command but it keeps using the same memory.
>>
>> Greetings!
>>
>> 2018-07-19 0:25 GMT+02:00 Gregory Farnum:
>>
>>> The MDS thinks it's using 486MB of cache right now, and while that's
>>> not a complete accounting (I believe you should generally multiply the
>>> configured cache limit by 1.5 to get a realistic memory consumption
>>> model), it's obviously a long way from 12.5GB. You might try going in
>>> with the "ceph daemon" command and looking at the heap stats (I forget
>>> the exact command, but it will tell you if you run "help" against it)
>>> and seeing what those say: you may have one of the slightly-broken
>>> base systems and find that running the "heap release" (or similar
>>> wording) command will free up a lot of RAM back to the OS!
>>> -Greg
>>>
>>> On Wed, Jul 18, 2018 at 1:53 PM, Daniel Carrasco wrote:
>>> > Hello,
>>> >
>>> > I've created a 3-node cluster with MON, MGR, OSD and MDS on all (2 MDS
>>> > active), and I've noticed that the MDS is using a lot of memory (just
>>> > now it's using 12.5GB of RAM):
>>> > # ceph daemon mds.kavehome-mgto-pro-fs01 dump_mempools | jq -c '.mds_co'; ceph daemon mds.kavehome-mgto-pro-fs01 perf dump | jq '.mds_mem.rss'
>>> > {"items":9272259,"bytes":510032260}
>>> > 12466648
>>> >
>>> > I've configured the limit:
>>> > mds_cache_memory_limit = 536870912
>>> >
>>> > But it looks like it's ignored, because that's about 512MB and it's
>>> > using a lot more.
>>> >
>>> > Is there any way to limit the memory usage of the MDS? It's causing a
>>> > lot of trouble because it starts to swap.
>>> > Maybe I have to limit the cached inodes?
>>> >
>>> > The other active MDS is using a lot less memory (2.5GB), but it's also using
Re: [ceph-users] Fwd: MDS memory usage is very high
Hello, thanks for your response.

This is what I get:

# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
2018-07-19 00:43:46.142560 7f5a7a7fc700 0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
2018-07-19 00:43:46.181133 7f5a7b7fe700 0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:

MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 172148208 ( 164.2 MiB) Bytes in central cache freelist
MALLOC: + 19031168 ( 18.1 MiB) Bytes in transfer cache freelist
MALLOC: + 23987552 ( 22.9 MiB) Bytes in thread cache freelists
MALLOC: + 20869280 ( 19.9 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: = 10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
MALLOC: + 3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: = 14132703392 (13478.0 MiB) Virtual address space used
MALLOC:
MALLOC: 63875 Spans in use
MALLOC: 16 Thread heaps in use
MALLOC: 8192 Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

I've tried the release command but it keeps using the same memory.

Greetings!

2018-07-19 0:25 GMT+02:00 Gregory Farnum:

> The MDS thinks it's using 486MB of cache right now, and while that's
> not a complete accounting (I believe you should generally multiply the
> configured cache limit by 1.5 to get a realistic memory consumption
> model), it's obviously a long way from 12.5GB. You might try going in
> with the "ceph daemon" command and looking at the heap stats (I forget
> the exact command, but it will tell you if you run "help" against it)
> and seeing what those say: you may have one of the slightly-broken
> base systems and find that running the "heap release" (or similar
> wording) command will free up a lot of RAM back to the OS!
> -Greg
>
> On Wed, Jul 18, 2018 at 1:53 PM, Daniel Carrasco wrote:
> > Hello,
> >
> > I've created a 3-node cluster with MON, MGR, OSD and MDS on all (2 MDS
> > active), and I've noticed that the MDS is using a lot of memory (just
> > now it's using 12.5GB of RAM):
> > # ceph daemon mds.kavehome-mgto-pro-fs01 dump_mempools | jq -c '.mds_co'; ceph daemon mds.kavehome-mgto-pro-fs01 perf dump | jq '.mds_mem.rss'
> > {"items":9272259,"bytes":510032260}
> > 12466648
> >
> > I've configured the limit:
> > mds_cache_memory_limit = 536870912
> >
> > But it looks like it's ignored, because that's about 512MB and it's
> > using a lot more.
> >
> > Is there any way to limit the memory usage of the MDS? It's causing a
> > lot of trouble because it starts to swap.
> > Maybe I have to limit the cached inodes?
> >
> > The other active MDS is using a lot less memory (2.5GB), but it's also
> > using more than 512MB. The standby MDS is not using memory at all.
> >
> > I'm using: ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable).
> >
> > Thanks!!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
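The "heap release" command Greg refers to does exist as a tell command; with this thread's daemon name it would be:

    ceph tell mds.kavehome-mgto-pro-fs01 heap release

It asks tcmalloc to return freelist memory to the OS. As the stats above show, it won't always help: the ~9.5GB here is reported as "Bytes in use by application", not as freeable freelist memory.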
[ceph-users] Fwd: MDS memory usage is very high
Hello,

I've created a 3-node cluster with MON, MGR, OSD and MDS on every node (2 MDS active), and I've noticed that the MDS is using a lot of memory (just now it's using 12.5GB of RAM):

# ceph daemon mds.kavehome-mgto-pro-fs01 dump_mempools | jq -c '.mds_co'; ceph daemon mds.kavehome-mgto-pro-fs01 perf dump | jq '.mds_mem.rss'
{"items":9272259,"bytes":510032260}
12466648

I've configured the limit:
mds_cache_memory_limit = 536870912

But it looks like it's ignored, because that's about 512MB and it's using a lot more.

Is there any way to limit the memory usage of the MDS? It's causing a lot of trouble because it starts to swap. Maybe I have to limit the cached inodes?

The other active MDS is using a lot less memory (2.5GB), but it's also using more than 512MB. The standby MDS is not using memory at all.

I'm using: ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable).

Thanks!!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
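A quick way to check what limit the running daemon actually picked up, and how much memory the cache accounts for, is via the admin socket (a sketch using the same daemon name as above):

    ceph daemon mds.kavehome-mgto-pro-fs01 config get mds_cache_memory_limit
    ceph daemon mds.kavehome-mgto-pro-fs01 perf dump | jq '.mds_mem'

Note that in the mempool output above, mds_co holds about 510MB, right at the configured limit, so the cache itself is honoring the limit and the extra RSS comes from elsewhere (as discussed in the replies above).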
Re: [ceph-users] Slow clients after git pull
Hello, Some data is not in git repository and also needs to be updated on all servers at same time (uploads...), that's why I'm searching for a centralized solution. I think I've found a "patch" to do it... All our server are connected to a manager, so I've created a task in that managet to stop nginx, umount the FS, remount the FS and then start the Nginx when the git repository is deployed. Look like it works as expected and with the cache I'm planing to add to webpage the impact should be minimal. Thaks and greetings!!. 2018-03-01 16:28 GMT+01:00 David Turner <drakonst...@gmail.com>: > This removes ceph completely, or any other networked storage, but git has > triggers. If your website is stopped in git and you just need to make sure > that nginx always has access to the latest data, just configure git > triggers to auto-update the repository when there is a commit to the > repository from elsewhere. This would be on local storage and remove a lot > of complexity. All front-end servers would update automatically via git. > > If something like that doesn't work, it would seem you have a workaround > that works for you. > > > On Thu, Mar 1, 2018, 10:12 AM Daniel Carrasco <d.carra...@i2tic.com> > wrote: > >> Hello, >> >> Our problem is that the webpage is on a autoscaling group, so the created >> machine is not always updated and needs to have the latest data always. >> I've tried several ways to do it: >> >>- Local Storage synced: Sometimes the sync fails and data is not >>updated >>- NFS: If NFS server goes down, all clients die >>- Two NFS Server synced+Monit: when a NFS server is down umount >>freezes and is not able to change to the other NFS server >>- GlusterFS: Too slow for webpages >> >> CephFS is near to NFS on speed and have auto recovery if one node goes >> down (clients connects to other MDS automatically). >> >> About to use RBD, my problem is that I need a FS, because Nginx is not >> able to read directly from Ceph in other ways. >> About S3 and similar, I've also tried AWS NFS method but is much slower >> (even more than GlusterFS). >> >> My problem is that CephFS fits what I need. >> >> Doing tests I've noticed that maybe the file is updated on ceph node >> while client has file sessions open, so until I remount the FS that >> sessions continue opened. When I open the files with vim I notice that is a >> bit slower while is updating the repository, but after the update it works >> as fast as before. >> >> It fails even on Jewel so I think that maybe the only way to do it is to >> create a task to remount the FS when I deploy. >> >> Greetings and thanks!! >> >> >> 2018-03-01 15:29 GMT+01:00 David Turner <drakonst...@gmail.com>: >> >>> Using CephFS for something like this is about the last thing I would >>> do. Does it need to be on a networked posix filesystem that can be mounted >>> on multiple machines at the same time? If so, then you're kinda stuck and >>> we can start looking at your MDS hardware and see if there are any MDS >>> settings that need to be configured differently for this to work. >>> >>> If you don't NEED CephFS, then I would recommend utilizing an RBD for >>> something like this. Its limitation is only being able to be mapped to 1 >>> server at a time, but that's decent enough for most failover scenarios for >>> build setups. If you need to failover, unmap it from the primary and map >>> it to another server to resume workloads. >>> >>> Hosting websites out of CephFS also seems counter-intuitive. Have you >>> looked at S3 websites? 
RGW supports configuring websites out of a bucket >>> that might be of interest. Your RGW daemon configuration could easily >>> become an HA website with an LB in front of them. >>> >>> I'm biased here a bit, but I don't like to use networked filesystems >>> unless nothing else can be worked out or the software using it is 3rd party >>> and just doesn't support anything else. >>> >>> On Thu, Mar 1, 2018 at 9:05 AM Daniel Carrasco <d.carra...@i2tic.com> >>> wrote: >>> >>>> Hello, >>>> >>>> I've tried to change a lot of things on configuration and use ceph-fuse >>>> but nothing makes it work better... When I deploy the git repository it >>>> becomes much slower until I remount the FS (just executing systemctl stop >>>> nginx && umount /mnt/ceph && mount -a && systemctl start nginx). It happen >>>>
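For anyone wanting to script the same workaround, here is a minimal sketch of the deploy hook described above. The mount point /mnt/ceph and the nginx unit name come from the thread itself; the fstab entry and error handling are assumptions, so adjust them to your setup.

#!/bin/sh
# Hypothetical deploy hook: remount CephFS after a git deploy so clients
# drop their stale sessions. Assumes /mnt/ceph has an /etc/fstab entry.
set -e
systemctl stop nginx     # stop the web server so nothing holds the mount busy
umount /mnt/ceph         # close the old client session and release its caps
mount -a                 # remount from the fstab entry
systemctl start nginx    # bring the web server back up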
Re: [ceph-users] Slow clients after git pull
Hello,

Our problem is that the webpage is in an autoscaling group, so a freshly created machine is not always updated and it always needs the latest data. I've tried several ways to do it:

- Local storage synced: sometimes the sync fails and data is not updated
- NFS: if the NFS server goes down, all clients die
- Two NFS servers synced + Monit: when an NFS server is down, umount freezes and the client is not able to switch to the other NFS server
- GlusterFS: too slow for webpages

CephFS is close to NFS in speed and recovers automatically if one node goes down (clients connect to another MDS automatically).

About using RBD, my problem is that I need a FS, because Nginx is not able to read directly from Ceph any other way. About S3 and similar, I've also tried the AWS NFS method, but it is much slower (even more than GlusterFS).

My problem is that CephFS fits what I need.

Doing tests I've noticed that the files may be updated on the Ceph node while a client still has file sessions open, so until I remount the FS those sessions stay open. When I open the files with vim I notice it is a bit slower while the repository is updating, but after the update it works as fast as before.

It fails even on Jewel, so I think maybe the only way to fix it is to create a task that remounts the FS when I deploy.

Greetings and thanks!!

2018-03-01 15:29 GMT+01:00 David Turner <drakonst...@gmail.com>: > Using CephFS for something like this is about the last thing I would do. > Does it need to be on a networked posix filesystem that can be mounted on > multiple machines at the same time? If so, then you're kinda stuck and we > can start looking at your MDS hardware and see if there are any MDS > settings that need to be configured differently for this to work. > > If you don't NEED CephFS, then I would recommend utilizing an RBD for > something like this. Its limitation is only being able to be mapped to 1 > server at a time, but that's decent enough for most failover scenarios for > build setups. If you need to failover, unmap it from the primary and map > it to another server to resume workloads. > > Hosting websites out of CephFS also seems counter-intuitive. Have you > looked at S3 websites? RGW supports configuring websites out of a bucket > that might be of interest. Your RGW daemon configuration could easily > become an HA website with an LB in front of them. > > I'm biased here a bit, but I don't like to use networked filesystems > unless nothing else can be worked out or the software using it is 3rd party > and just doesn't support anything else. > > On Thu, Mar 1, 2018 at 9:05 AM Daniel Carrasco <d.carra...@i2tic.com> > wrote: > >> Hello, >> >> I've tried to change a lot of things on configuration and use ceph-fuse >> but nothing makes it work better... When I deploy the git repository it >> becomes much slower until I remount the FS (just executing systemctl stop >> nginx && umount /mnt/ceph && mount -a && systemctl start nginx). It happen >>
>>> My problem comes when I deploy a git repository on that FS. The server >>> makes a lot of IOPS to check the files that have to update and then all >>> clients starts to have problems to use the FS (it becomes much slower). >>> In a normal usage the web takes about 400ms to load, and when the >>> problem start it takes more than 3s. To fix the problem I just have to >>> remount the FS on clients, but I can't remount the FS on every deploy... >>> >>> While is deploying I see how the CPU on MDS is a bit higher, but when it >>> ends the CPU usage goes down again, so look like is not a problem of CPU. >>> >>> My config file is: >>> [global] >>> fsid = bf56854..e611c08 >>> mon_initial_members = fs-01, fs-02, fs-03 >>> mon_host = 10.50.0.94,10.50.1.216,10.50.2.52 >>> auth_cluster_required = cephx >>> auth_service_required = cephx >>> auth_client_required = cephx >>> >>> public network = 10.50.0.0/22 >>> osd pool default size = 3 >>> >>> ## >>> ### OSD
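As a rough sketch of the RBD failover David describes above: the image name (webdata), pool and mount point below are hypothetical, only the map/unmap flow is the point.

# On the old primary (if it is still reachable):
umount /mnt/web
rbd unmap /dev/rbd0

# On the server taking over:
rbd map webdata --pool rbd    # prints the device it mapped, e.g. /dev/rbd0
mount /dev/rbd0 /mnt/web      # single-writer filesystem: one mapper at a time
systemctl start nginx

RBD bypasses the MDS entirely, which is why it sidesteps the session problem, at the cost of the one-mapper-at-a-time limitation.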
Re: [ceph-users] Slow clients after git pull
Hello, I've tried to change a lot of things on configuration and use ceph-fuse but nothing makes it work better... When I deploy the git repository it becomes much slower until I remount the FS (just executing systemctl stop nginx && umount /mnt/ceph && mount -a && systemctl start nginx). It happen when the FS gets a lot of IO because when I execute Rsync I got the same problem. I'm thinking about to downgrade to a lower version of ceph like for example jewel to see if works better. I know that will be deprecated soon, but I don't know what other tests I can do... Greetings!! 2018-02-28 17:11 GMT+01:00 Daniel Carrasco <d.carra...@i2tic.com>: > Hello, > > I've created a Ceph cluster with 3 nodes and a FS to serve a webpage. The > webpage speed is good enough (near to NFS speed), and have HA if one FS die. > My problem comes when I deploy a git repository on that FS. The server > makes a lot of IOPS to check the files that have to update and then all > clients starts to have problems to use the FS (it becomes much slower). > In a normal usage the web takes about 400ms to load, and when the problem > start it takes more than 3s. To fix the problem I just have to remount the > FS on clients, but I can't remount the FS on every deploy... > > While is deploying I see how the CPU on MDS is a bit higher, but when it > ends the CPU usage goes down again, so look like is not a problem of CPU. > > My config file is: > [global] > fsid = bf56854..e611c08 > mon_initial_members = fs-01, fs-02, fs-03 > mon_host = 10.50.0.94,10.50.1.216,10.50.2.52 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > > public network = 10.50.0.0/22 > osd pool default size = 3 > > ## > ### OSD > ## > [osd] > osd_mon_heartbeat_interval = 5 > osd_mon_report_interval_max = 10 > osd_heartbeat_grace = 15 > osd_fast_fail_on_connection_refused = True > osd_pool_default_pg_num = 128 > osd_pool_default_pgp_num = 128 > osd_pool_default_size = 2 > osd_pool_default_min_size = 2 > > ## > ### Monitores > ## > [mon] > mon_osd_min_down_reporters = 1 > > ## > ### MDS > ## > [mds] > mds_cache_memory_limit = 792723456 > mds_bal_mode = 1 > > ## > ### Client > ## > [client] > client_cache_size = 32768 > client_mount_timeout = 30 > client_oc_max_objects = 2000 > client_oc_size = 629145600 > client_permissions = false > rbd_cache = true > rbd_cache_size = 671088640 > > My cluster and clients uses Debian 9 with latest ceph version (12.2.4). > The clients uses kernel modules to mount the share, because are a bit > faster than fuse modules. The deploy is done on one of the Ceph nodes, that > have the FS mounted by kernel module too. > My cluster is not a high usage cluster, so have all daemons on one machine > (3 machines with OSD, MON, MGR and MDS). All OSD has a copy of the data, > only one MGR is active and two of the MDS are active with one on standby. > The clients mount the FS using the three MDS IP addresses and just now > don't have any request because is not published. > > Someone knows what can be happening?, because all works fine (even on > other cluster I did with an high load), but just deploy the git repository > and all start to work very slow. > > Thanks!! > > > -- > _____ > > Daniel Carrasco Marín > Ingeniería para la Innovación i2TIC, S.L. > Tlf: +34 911 12 32 84 Ext: 223 > www.i2tic.com > _ > -- _ Daniel Carrasco Marín Ingeniería para la Innovación i2TIC, S.L. 
Tlf: +34 911 12 32 84 Ext: 223 www.i2tic.com _ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
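For reference, a kernel-client mount against the three monitors from the config in this thread would look something like the following; the secret file path is an assumption.

mount -t ceph 10.50.0.94,10.50.1.216,10.50.2.52:/ /mnt/ceph \
      -o name=admin,secretfile=/etc/ceph/admin.secret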
[ceph-users] Slow clients after git pull
Hello,

I've created a Ceph cluster with 3 nodes and a FS to serve a webpage. The webpage speed is good enough (close to NFS speed), and it has HA if one FS server dies.

My problem comes when I deploy a git repository on that FS. The server does a lot of IOPS to check which files have to be updated, and then all clients start having problems using the FS (it becomes much slower). Under normal usage the web takes about 400 ms to load, and when the problem starts it takes more than 3 s. To fix the problem I just have to remount the FS on the clients, but I can't remount the FS on every deploy...

While it is deploying I see the CPU usage on the MDS go a bit higher, but when it ends the CPU usage goes down again, so it doesn't look like a CPU problem.

My config file is:

[global]
fsid = bf56854..e611c08
mon_initial_members = fs-01, fs-02, fs-03
mon_host = 10.50.0.94,10.50.1.216,10.50.2.52
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public network = 10.50.0.0/22
osd pool default size = 3

##
### OSD
##
[osd]
osd_mon_heartbeat_interval = 5
osd_mon_report_interval_max = 10
osd_heartbeat_grace = 15
osd_fast_fail_on_connection_refused = True
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 2
osd_pool_default_min_size = 2

##
### Monitors
##
[mon]
mon_osd_min_down_reporters = 1

##
### MDS
##
[mds]
mds_cache_memory_limit = 792723456
mds_bal_mode = 1

##
### Client
##
[client]
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2000
client_oc_size = 629145600
client_permissions = false
rbd_cache = true
rbd_cache_size = 671088640

My cluster and clients use Debian 9 with the latest Ceph version (12.2.4). The clients use the kernel module to mount the share, because it is a bit faster than the fuse module. The deploy is done on one of the Ceph nodes, which has the FS mounted through the kernel module too.

My cluster is not a high-usage cluster, so all daemons share the machines (3 machines, each with OSD, MON, MGR and MDS). Every OSD has a copy of the data, only one MGR is active, and two of the MDS are active with one on standby. The clients mount the FS using the three nodes' IP addresses, and right now they receive no requests because the site is not published.

Does someone know what could be happening? Everything works fine (even on another cluster I built with a high load), but as soon as I deploy the git repository everything starts to work very slowly.

Thanks!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
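When the slowdown appears, the MDS admin socket can show which client sessions hold capabilities and which requests are stuck. A sketch, assuming the default admin socket location and the fs-01 naming from the config above:

# Run on the node hosting the active MDS:
ceph daemon mds.fs-01 session ls           # client sessions and caps held
ceph daemon mds.fs-01 dump_ops_in_flight   # requests currently in flight/blocked
ceph daemonperf mds.fs-01                  # live per-second MDS counters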
Re: [ceph-users] Balanced MDS, all as active and recommended client settings.
Finally, I've changed the configuration to the following:

##
### MDS
##
[mds]
mds_cache_memory_limit = 792723456
mds_bal_mode = 1

##
### Client
##
[client]
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2
client_oc_size = 1048576000
client_permissions = false
client_quota = false
rbd_cache = true
rbd_cache_size = 671088640

I've disabled client_permissions and client_quota because the cluster is only for the webpage and the network is isolated, so it doesn't need to check permissions every time; and I've disabled the quota check because there are no quotas on this cluster. This should lower the requests to the MDS and the CPU usage, right?

Greetings!!

2018-02-22 19:34 GMT+01:00 Patrick Donnelly <pdonn...@redhat.com>: > On Wed, Feb 21, 2018 at 11:17 PM, Daniel Carrasco <d.carra...@i2tic.com> > wrote: > > I want to search also if there is any way to cache file metadata on > client, > > to lower the MDS load. I suppose that files are cached but the client > check > > with MDS if there are changes on files. On my server files are the most > of > > time read-only so MDS data can be also cached for a while. > > The MDS issues capabilities that allow clients to coherently cache > metadata. > > -- > Patrick Donnelly >

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
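One caveat worth noting: these client_* options only apply to ceph-fuse/libcephfs clients; the kernel client takes its options from the mount command line instead. To confirm a running ceph-fuse client actually picked them up, its admin socket can be queried. A sketch (the .asok name embeds the client pid, which is a placeholder here):

# 12345 is a placeholder pid; list /var/run/ceph to find the real socket
ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok \
     config get client_permissions
ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok \
     config get client_quota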
Re: [ceph-users] Balanced MDS, all as active and recommended client settings.
Thanks, I'll check it.

I also want to find out whether there is any way to cache file metadata on the client, to lower the MDS load. I suppose the files themselves are cached, but the client still checks with the MDS whether files have changed. On my server the files are read-only most of the time, so MDS data could also be cached for a while.

Greetings!!

On Feb 22, 2018 at 3:59, "Patrick Donnelly" <pdonn...@redhat.com> wrote: > Hello Daniel, > > On Wed, Feb 21, 2018 at 10:26 AM, Daniel Carrasco <d.carra...@i2tic.com> > wrote: > > Is possible to make a better distribution on the MDS load of both nodes?. > > We are aware of bugs with the balancer which are being worked on. You > can also manually create a partition if the workload can benefit: > > https://ceph.com/community/new-luminous-cephfs-subtree-pinning/ > > > Is posible to set all nodes as Active without problems? > > No. I recommend you read the docs carefully: > > http://docs.ceph.com/docs/master/cephfs/multimds/ > > > My last question is if someone can recomend me a good client > configuration > > like cache size, and maybe something to lower the metadata servers load. > > >> > >> ## > >> [mds] > >> mds_cache_size = 25 > >> mds_cache_memory_limit = 792723456 > > You should only specify one of those. See also: > > http://docs.ceph.com/docs/master/cephfs/cache-size-limits/ > > -- > Patrick Donnelly > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
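The subtree pinning from the link Patrick posted is driven by an extended attribute on directories; a small example with hypothetical paths:

# Pin /mnt/ceph/site-a to MDS rank 0 and /mnt/ceph/site-b to rank 1:
setfattr -n ceph.dir.pin -v 0 /mnt/ceph/site-a
setfattr -n ceph.dir.pin -v 1 /mnt/ceph/site-b
# A value of -1 returns the directory to the normal balancer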
Re: [ceph-users] Balanced MDS, all as active and recommended client settings.
2018-02-21 19:26 GMT+01:00 Daniel Carrasco <d.carra...@i2tic.com>: > Hello, > > I've created a Ceph cluster with 3 nodes to serve files to an high traffic > webpage. I've configured two MDS as active and one as standby, but after > add the new system to production I've noticed that MDS are not balanced and > one server get the most of clients petitions (One MDS about 700 or less vs > 4.000 or more the other). > > Is possible to make a better distribution on the MDS load of both nodes?. > Is posible to set all nodes as Active without problems? > > I know that is possible to set max_mds to 3 and all will be active, but I > want to know what happen if one node goes down for example, or if there are > another side effects. > > > My last question is if someone can recomend me a good client configuration > like cache size, and maybe something to lower the metadata servers load. > > > Thanks!! > I forgot to say my configuration xD. I've a three nodes cluster with AIO: - 3 Monitors - 3 OSD - 3 MDS (2 actives and one standby) - 3 MGR (1 active) The data has 3 copies, so is in every node. My configuration file is: [global] fsid = BlahBlahBlah mon_initial_members = fs-01, fs-02, fs-03 mon_host = 192.168.4.199,192.168.4.200,192.168.4.201 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx public network = 192.168.4.0/24 osd pool default size = 3 ## ### OSD ## [osd] osd_pool_default_pg_num = 128 osd_pool_default_pgp_num = 128 osd_pool_default_size = 3 osd_pool_default_min_size = 2 osd_mon_heartbeat_interval = 5 osd_mon_report_interval_max = 10 osd_heartbeat_grace = 15 osd_fast_fail_on_connection_refused = True ## ### MON ## [mon] mon_osd_min_down_reporters = 2 ## ### MDS ## [mds] mds_cache_size = 25 mds_cache_memory_limit = 792723456 ## ### Client ## [client] client_cache_size = 32768 client_mount_timeout = 30 client_oc_max_objects = 2000 client_oc_size = 629145600 rbd_cache = true rbd_cache_size = 671088640 Thanks!!! -- _ Daniel Carrasco Marín Ingeniería para la Innovación i2TIC, S.L. Tlf: +34 911 12 32 84 Ext: 223 www.i2tic.com _ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Balanced MDS, all as active and recommended client settings.
Hello,

I've created a Ceph cluster with 3 nodes to serve files for a high-traffic webpage. I've configured two MDS as active and one as standby, but after adding the new system to production I've noticed that the MDS are not balanced and one server gets most of the client requests (about 700 or fewer on one MDS vs 4,000 or more on the other).

Is it possible to distribute the MDS load better across both nodes? Is it possible to set all nodes as active without problems?

I know it is possible to set max_mds to 3 so all of them become active, but I want to know what happens if one node goes down, for example, or whether there are other side effects.

My last question is whether someone can recommend a good client configuration, like cache sizes, and maybe something to lower the metadata servers' load.

Thanks!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
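For the record, on Luminous the number of active MDS daemons is set per filesystem; assuming the filesystem is named cephfs, something like:

ceph fs set cephfs allow_multimds true   # needed once on Luminous
ceph fs set cephfs max_mds 2             # two actives; the third MDS stays standby

If one active rank fails, a standby takes over that rank; the tradeoff is that every extra active rank is one more daemon whose failure triggers a takeover.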
Re: [ceph-users] OSD are marked as down after jewel -> luminous upgrade
Finally i've disabled the mon_osd_report_timeout option and seems to works fine. Greetings!. 2017-10-17 19:02 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: > Thanks!! > > I'll take a look later. > > Anyway, all my Ceph daemons are in same version on all nodes (I've > upgraded the whole cluster). > > Cheers!! > > > El 17 oct. 2017 6:39 p. m., "Marc Roos" <m.r...@f1-outsourcing.eu> > escribió: > > Did you check this? > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39886.html > > > > > > > > > -Original Message- > From: Daniel Carrasco [mailto:d.carra...@i2tic.com] > Sent: dinsdag 17 oktober 2017 17:49 > To: ceph-us...@ceph.com > Subject: [ceph-users] OSD are marked as down after jewel -> luminous > upgrade > > Hello, > > Today I've decided to upgrade my Ceph cluster to latest LTS version. To > do it I've used the steps posted on release notes: > http://ceph.com/releases/v12-2-0-luminous-released/ > > After upgrade all the daemons I've noticed that all OSD daemons are > marked as down even when all are working, so the cluster becomes down. > > Maybe the problem is the command "ceph osd require-osd-release > luminous", but all OSD are on Luminous version. > > > - > > > - > > # ceph versions > { > "mon": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 3 > }, > "mgr": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 3 > }, > "osd": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 2 > }, > "mds": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 2 > }, > "overall": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 10 > } > } > > > - > > > - > > # ceph osd versions > { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 2 } > > # ceph osd tree > > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF > -1 0.08780 root default > -2 0.04390 host alantra_fs-01 > 0 ssd 0.04390 osd.0 up 1.0 1.0 > -3 0.04390 host alantra_fs-02 > 1 ssd 0.04390 osd.1 up 1.0 1.0 > -4 0 host alantra_fs-03 > > > - > > > - > > # ceph -s > cluster: > id: 5f8e66b5-1adc-4930-b5d8-c0f44dc2037e > health: HEALTH_WARN > nodown flag(s) set > > services: > mon: 3 daemons, quorum alantra_fs-02,alantra_fs-01,alantra_fs-03 > mgr: alantra_fs-03(active), standbys: alantra_fs-01, alantra_fs-02 > mds: cephfs-1/1/1 up {0=alantra_fs-01=up:active}, 1 up:standby > osd: 2 osds: 2 up, 2 in > flags nodown > > data: > pools: 3 pools, 192 pgs > objects: 40177 objects, 3510 MB > usage: 7486 MB used, 84626 MB / 92112 MB avail > pgs: 192 active+clean > > io: > client: 564 kB/s rd, 767 B/s wr, 33 op/s rd, 0 op/s wr > > > - > > > - > Log: > 2017-10-17 16:15:25.466807 mon.alantra_fs-02 [INF] osd.0 marked down > after no beacon for 29.864632 seconds > 2017-10-17 16:15:25.467557 mon.alantra_fs-02 [WRN] Health check failed: > 1 osds down (OSD_DOWN) > 2017-10-17 16:15:25.467587 mon.alantra_fs-02 [WRN] Health check failed: > 1 host (1 osds) down (OSD_HOST_DOWN) > 2017-10-17 16:15:27.494526 mon.alantra_fs-02 [WRN] Health check failed: > Degraded data redundancy: 63 pgs unclean (PG_DEGRADED) > 2017-10-17 16:15:27.5
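The likely explanation for why removing the option helps: since Luminous, OSDs report liveness to the monitors with beacons sent every osd_beacon_report_interval (300 s by default), so the mon_osd_report_timeout = 25 from the original config (quoted below in this thread) marks perfectly healthy OSDs down between beacons. A sketch of a safe [mon] section, assuming otherwise default settings:

[mon]
mon_allow_pool_delete = false
mon_osd_min_down_reporters = 1
# mon_osd_report_timeout removed: the default (900 s) stays safely above
# the OSD beacon interval (osd_beacon_report_interval, default 300 s)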
Re: [ceph-users] OSD are marked as down after jewel -> luminous upgrade
Thanks!! I'll take a look later. Anyway, all my Ceph daemons are in same version on all nodes (I've upgraded the whole cluster). Cheers!! El 17 oct. 2017 6:39 p. m., "Marc Roos" <m.r...@f1-outsourcing.eu> escribió: Did you check this? https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39886.html -Original Message----- From: Daniel Carrasco [mailto:d.carra...@i2tic.com] Sent: dinsdag 17 oktober 2017 17:49 To: ceph-us...@ceph.com Subject: [ceph-users] OSD are marked as down after jewel -> luminous upgrade Hello, Today I've decided to upgrade my Ceph cluster to latest LTS version. To do it I've used the steps posted on release notes: http://ceph.com/releases/v12-2-0-luminous-released/ After upgrade all the daemons I've noticed that all OSD daemons are marked as down even when all are working, so the cluster becomes down. Maybe the problem is the command "ceph osd require-osd-release luminous", but all OSD are on Luminous version. - - # ceph versions { "mon": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3 }, "mgr": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3 }, "osd": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2 }, "mds": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2 }, "overall": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 10 } } - - # ceph osd versions { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2 } # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.08780 root default -2 0.04390 host alantra_fs-01 0 ssd 0.04390 osd.0 up 1.0 1.0 -3 0.04390 host alantra_fs-02 1 ssd 0.04390 osd.1 up 1.0 1.0 -4 0 host alantra_fs-03 - - # ceph -s cluster: id: 5f8e66b5-1adc-4930-b5d8-c0f44dc2037e health: HEALTH_WARN nodown flag(s) set services: mon: 3 daemons, quorum alantra_fs-02,alantra_fs-01,alantra_fs-03 mgr: alantra_fs-03(active), standbys: alantra_fs-01, alantra_fs-02 mds: cephfs-1/1/1 up {0=alantra_fs-01=up:active}, 1 up:standby osd: 2 osds: 2 up, 2 in flags nodown data: pools: 3 pools, 192 pgs objects: 40177 objects, 3510 MB usage: 7486 MB used, 84626 MB / 92112 MB avail pgs: 192 active+clean io: client: 564 kB/s rd, 767 B/s wr, 33 op/s rd, 0 op/s wr - - Log: 2017-10-17 16:15:25.466807 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 29.864632 seconds 2017-10-17 16:15:25.467557 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN) 2017-10-17 16:15:25.467587 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN) 2017-10-17 16:15:27.494526 mon.alantra_fs-02 [WRN] Health check failed: Degraded data redundancy: 63 pgs unclean (PG_DEGRADED) 2017-10-17 16:15:27.501956 mon.alantra_fs-02 [INF] Health check cleared: OSD_DOWN (was: 1 osds down) 2017-10-17 16:15:27.501997 mon.alantra_fs-02 [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down) 2017-10-17 16:15:27.502012 mon.alantra_fs-02 [INF] Cluster is now healthy 2017-10-17 16:15:27.518798 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot 2017-10-17 16:15:26.414023 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running 2017-10-17 16:15:30.470477 mon.alantra_fs-02 [INF] osd.1 marked down after no beacon for 25.007336 seconds 2017-10-17 16:15:30.471014 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN) 2017-10-17 16:15:30.471047 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 
osds) down (OSD_HOST_DOWN) 2017-10-17 16:15:30.532427 mon.alantra_fs-02 [WRN] overall HEALTH
[ceph-users] OSD are marked as down after jewel -> luminous upgrade
-02 [INF] mon.2 10.20.1.216:6789/0 2017-10-17 16:17:16.885662 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 96 pgs unclean, 192 pgs degraded (PG_DEGRADED) 2017-10-17 16:17:25.528348 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.004060 seconds 2017-10-17 16:17:25.528960 mon.alantra_fs-02 [WRN] Health check update: 2 osds down (OSD_DOWN) 2017-10-17 16:17:25.528991 mon.alantra_fs-02 [WRN] Health check update: 3 hosts (2 osds) down (OSD_HOST_DOWN) 2017-10-17 16:17:25.529011 mon.alantra_fs-02 [WRN] Health check failed: 1 root (2 osds) down (OSD_ROOT_DOWN) 2017-10-17 16:17:26.544228 mon.alantra_fs-02 [INF] Health check cleared: OSD_ROOT_DOWN (was: 1 root (2 osds) down) 2017-10-17 16:17:26.568819 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot 2017-10-17 16:17:25.557037 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running 2017-10-17 16:17:30.532840 mon.alantra_fs-02 [WRN] overall HEALTH_WARN 1 osds down; 1 host (1 osds) down; Degraded data redundancy: 40177/80354 objects degraded (50.000%), 96 pgs unclean, 192 pgs degraded 2017-10-17 16:17:30.538294 mon.alantra_fs-02 [WRN] Health check update: 1 osds down (OSD_DOWN) 2017-10-17 16:17:30.538333 mon.alantra_fs-02 [WRN] Health check update: 1 host (1 osds) down (OSD_HOST_DOWN) 2017-10-17 16:17:31.602434 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded (PG_DEGRADED) 2017-10-17 16:17:55.540005 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.001599 seconds 2017-10-17 16:17:55.540538 mon.alantra_fs-02 [WRN] Health check update: 2 osds down (OSD_DOWN) 2017-10-17 16:17:55.540562 mon.alantra_fs-02 [WRN] Health check update: 3 hosts (2 osds) down (OSD_HOST_DOWN) 2017-10-17 16:17:55.540585 mon.alantra_fs-02 [WRN] Health check failed: 1 root (2 osds) down (OSD_ROOT_DOWN) 2017-10-17 16:18:28.916734 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded, 192 pgs undersized (PG_DEGRADED) 2017-10-17 16:18:30.533096 mon.alantra_fs-02 [WRN] overall HEALTH_WARN 2 osds down; 3 hosts (2 osds) down; 1 root (2 osds) down; Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded, 192 pgs undersized 2017-10-17 16:18:56.929295 mon.alantra_fs-02 [WRN] Health check failed: Reduced data availability: 192 pgs stale (PG_AVAILABILITY) - - ceph.conf [global] fsid = 5f8e66b5-1adc-4930-b5d8-c0f44dc2037e mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 public_network = 10.20.1.0/24 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx ## ### OSD ## [osd] osd_mon_heartbeat_interval = 5 osd_mon_report_interval_max = 10 osd_heartbeat_grace = 10 osd_fast_fail_on_connection_refused = True osd_pool_default_pg_num = 128 osd_pool_default_pgp_num = 128 osd_pool_default_size = 2 osd_pool_default_min_size = 2 ## ### Monitores ## [mon] mon_allow_pool_delete = false mon_osd_report_timeout = 25 mon_osd_min_down_reporters = 1 [mon.alantra_fs-01] host = alantra_fs-01 mon_addr = 10.20.1.109:6789 [mon.alantra_fs-02] host = alantra_fs-02 mon_addr = 10.20.1.97:6789 [mon.alantra_fs-03] host = alantra_fs-03 mon_addr = 10.20.1.216:6789 ## ### MDS ## [mds] mds_cache_size = 25 ## ### Client ## [client] client_cache_size = 32768 client_mount_timeout = 30 
client_oc_max_objects = 2000
client_oc_size = 629145600
rbd_cache = true
rbd_cache_size = 671088640

For now I've added the nodown flag to keep all OSDs online, and everything is working fine, but this is not the best way to do it. Does someone know how to fix this problem? Maybe this release needs new ports opened on the firewall?

Thanks!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
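Once the timeout problem is fixed (see the reply earlier in this thread), the workaround flag should be removed so genuinely dead OSDs can be marked down again:

ceph osd unset nodown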
Re: [ceph-users] Connections between services secure?
Mainly the fuse clients talking to the rest (MDS, OSD and MON will be on a private network), and maybe one day I'll try to create a multi-site cluster.

Greetings!!

On Jun 30, 2017 at 8:33 p.m., "David Turner" <drakonst...@gmail.com> wrote: Which part of ceph are you looking at using through the Internet? RGW, multi-site, multi-datacenter crush maps, etc? On Fri, Jun 30, 2017 at 2:28 PM Daniel Carrasco <d.carra...@i2tic.com> wrote: > Hello, > > My question is about steam security of connections between ceph services. > I've read that connection is verified by private keys and signed packets, > but my question is if that packets are ciphered in any way to avoid packets > sniffers, because I want to know if can be used through internet without > problem or I need an VPN. > > Thanks!! > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Connections between services secure?
Hello,

My question is about the stream security of the connections between Ceph services. I've read that the connection is verified with private keys and signed packets, but my question is whether those packets are encrypted in any way against packet sniffers, because I want to know if Ceph can be used over the internet without problems or whether I need a VPN.

Thanks!!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
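Short answer for anyone searching the archives: in these releases cephx authenticates peers and can sign messages, but the messenger does not encrypt payloads, so a VPN or tunnel is needed across untrusted networks. Signing can be enforced with options like these (a sketch; check the defaults for your release):

[global]
cephx_require_signatures = true          # sign client <-> cluster traffic
cephx_cluster_require_signatures = true  # sign traffic between daemons
cephx_sign_messages = true               # allow message signing at all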
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Hello, I just write to say that after more than a week the server still working without problem and the OSD are not marked as down erroneously. On my tests the webpage stop working for less than a minute when i stop an OSD, so the failover is working fine. Greetings and thanks for all your help!! 2017-06-15 19:04 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: > Hello, thanks for the info. > > I'll give a try tomorrow. On one of my test I got the messages that yo say > (wrongfully marked), but i've lowered other options and now is fine. For > now the OSD are not reporting down messages even with an high load test, > but I'll see the logs tomorrow to confirm. > > The most of time the server is used as RO and the load is not high, so if > an OSD is marked as down for some seconds is not a big problem (at least I > think that recovery traffic is low because it only has to check that pgs > are in both OSD). > > Greetings and thanks again! > > 2017-06-15 18:13 GMT+02:00 David Turner <drakonst...@gmail.com>: > >> osd_heartbeat_grace is a setting for how many seconds since the last time >> an osd received a successful response from another osd before telling the >> mons that it's down. This is one you may want to lower from its default >> value of 20 seconds. >> >> mon_osd_min_down_reporters is a setting for how many osds need to report >> an osd as down before the mons will mark it as down. I recommend setting >> this to N+1 where N is how many osds you have in a node or failure domain. >> If you end up with a network problem and you have 1 osd node that can talk >> to the mons, but not the other osd nodes, then you will end up with that >> one node marking the entire cluster down while the rest of the cluster >> marks that node down. If your min_down_reporters is N+1, then 1 node cannot >> mark down the rest of the cluster. The default setting is 1 so that small >> test clusters can mark down osds, but if you have 3+ nodes, you should set >> it to N+1 if you can. Setting it to more than 2 nodes is equally as >> problematic. However, if you just want things to report as fast as >> possible, leaving this at 1 still might be optimal to getting it marked >> down sooner. >> >> The downside to lowering these settings is if OSDs are getting marked >> down for running slower, then they will re-assert themselves to the mons >> and end up causing backfilling and peering for no really good reason. >> You'll want to monitor your cluster for OSDs being marked down for a few >> seconds before marking themselves back up. You can see this in the OSD >> logs where the OSD says it was wrongfully marked down in one line and then >> the next is where it tells the mons it is actually up. >> >> On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco <d.carra...@i2tic.com> >> wrote: >> >>> I forgot to say that after upgrade the machine RAM to 4Gb, the OSD >>> daemons has started to use only a 5% (about 200MB). Is like magic, and now >>> I've about 3.2Gb of free RAM. >>> >>> Greetings!! >>> >>> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: >>> >>>> Finally, the problem was W3Total Cache, that seems to be unable to >>>> manage HA and when the master redis host is down, it stop working without >>>> try the slave. >>>> >>>> I've added some options to make it faster to detect a down OSD and the >>>> page is online again in about 40s. 
>>>> >>>> [global] >>>> fsid = Hidden >>>> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 >>>> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 >>>> auth_cluster_required = cephx >>>> auth_service_required = cephx >>>> auth_client_required = cephx >>>> osd mon heartbeat interval = 5 >>>> osd mon report interval max = 10 >>>> mon osd report timeout = 15 >>>> osd fast fail on connection refused = True >>>> >>>> public network = 10.20.1.0/24 >>>> osd pool default size = 2 >>>> >>>> >>>> Greetings and thanks for all your help. >>>> >>>> 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>: >>>> >>>>> I've used the kernel client and the ceph-fuse driver for mapping the >>>>> cephfs volume. I didn't notice any network hiccups while failing over, >>>>> but >>>>> I was reading large files during my tests (and live) and some caching may >>>>> h
Re: [ceph-users] What is "up:standby"? in ceph mds stat => e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby
Among the MDS daemons, by default only the first one you add is active: the others join the cluster as standby MDS daemons. When the active one fails, a standby MDS becomes active and continues the work. You can change this behaviour by setting the max_mds option, but multiple active MDS is still in testing and is not recommended for production environments.

Greetings!!

2017-06-16 12:40 GMT+02:00 Stéphane Klein <cont...@stephane-klein.info>: > Hi, > > I have installed mdss role with Ansible. > > Now, I have this: > > root@ceph-test-1:/home/vagrant# ceph fs ls > name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ] > root@ceph-test-1:/home/vagrant# ceph mds stat > e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby > root@ceph-test-1:/home/vagrant# ceph status > cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac > health HEALTH_OK > monmap e1: 3 mons at {ceph-test-1=172.28.128.3: > 6789/0,ceph-test-2=172.28.128.4:6789/0,ceph-test-3=172.28.128.5:6789/0} > election epoch 10, quorum 0,1,2 ceph-test-1,ceph-test-2,ceph- > test-3 > fsmap e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby > osdmap e14: 3 osds: 3 up, 3 in > flags sortbitwise,require_jewel_osds > pgmap v36: 164 pgs, 3 pools, 2068 bytes data, 20 objects > 102 MB used, 10652 MB / 10754 MB avail > 164 active+clean > > What is "up:standby"? in > > # ceph mds stat > e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby > > Best regards, > Stéphane > -- > Stéphane Klein <cont...@stephane-klein.info> > blog: http://stephane-klein.info > cv : http://cv.stephane-klein.info > Twitter: http://twitter.com/klein_stephane > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > -- _ Daniel Carrasco Marín Ingeniería para la Innovación i2TIC, S.L. Tlf: +34 911 12 32 84 Ext: 223 www.i2tic.com _ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
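A standby can also be tied to a specific rank and kept warm with standby-replay for a faster takeover; a jewel-style snippet (the daemon name ceph-test-2 is just an example taken from Stéphane's cluster):

[mds.ceph-test-2]
mds_standby_for_rank = 0    # follow the MDS holding rank 0
mds_standby_replay = true   # replay its journal continuously for a warm takeover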
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Hello, thanks for the info. I'll give a try tomorrow. On one of my test I got the messages that yo say (wrongfully marked), but i've lowered other options and now is fine. For now the OSD are not reporting down messages even with an high load test, but I'll see the logs tomorrow to confirm. The most of time the server is used as RO and the load is not high, so if an OSD is marked as down for some seconds is not a big problem (at least I think that recovery traffic is low because it only has to check that pgs are in both OSD). Greetings and thanks again! 2017-06-15 18:13 GMT+02:00 David Turner <drakonst...@gmail.com>: > osd_heartbeat_grace is a setting for how many seconds since the last time > an osd received a successful response from another osd before telling the > mons that it's down. This is one you may want to lower from its default > value of 20 seconds. > > mon_osd_min_down_reporters is a setting for how many osds need to report > an osd as down before the mons will mark it as down. I recommend setting > this to N+1 where N is how many osds you have in a node or failure domain. > If you end up with a network problem and you have 1 osd node that can talk > to the mons, but not the other osd nodes, then you will end up with that > one node marking the entire cluster down while the rest of the cluster > marks that node down. If your min_down_reporters is N+1, then 1 node cannot > mark down the rest of the cluster. The default setting is 1 so that small > test clusters can mark down osds, but if you have 3+ nodes, you should set > it to N+1 if you can. Setting it to more than 2 nodes is equally as > problematic. However, if you just want things to report as fast as > possible, leaving this at 1 still might be optimal to getting it marked > down sooner. > > The downside to lowering these settings is if OSDs are getting marked down > for running slower, then they will re-assert themselves to the mons and end > up causing backfilling and peering for no really good reason. You'll want > to monitor your cluster for OSDs being marked down for a few seconds before > marking themselves back up. You can see this in the OSD logs where the OSD > says it was wrongfully marked down in one line and then the next is where > it tells the mons it is actually up. > > On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco <d.carra...@i2tic.com> > wrote: > >> I forgot to say that after upgrade the machine RAM to 4Gb, the OSD >> daemons has started to use only a 5% (about 200MB). Is like magic, and now >> I've about 3.2Gb of free RAM. >> >> Greetings!! >> >> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: >> >>> Finally, the problem was W3Total Cache, that seems to be unable to >>> manage HA and when the master redis host is down, it stop working without >>> try the slave. >>> >>> I've added some options to make it faster to detect a down OSD and the >>> page is online again in about 40s. >>> >>> [global] >>> fsid = Hidden >>> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 >>> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 >>> auth_cluster_required = cephx >>> auth_service_required = cephx >>> auth_client_required = cephx >>> osd mon heartbeat interval = 5 >>> osd mon report interval max = 10 >>> mon osd report timeout = 15 >>> osd fast fail on connection refused = True >>> >>> public network = 10.20.1.0/24 >>> osd pool default size = 2 >>> >>> >>> Greetings and thanks for all your help. 
>>> >>> 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>: >>> >>>> I've used the kernel client and the ceph-fuse driver for mapping the >>>> cephfs volume. I didn't notice any network hiccups while failing over, but >>>> I was reading large files during my tests (and live) and some caching may >>>> have hidden hidden network hiccups for my use case. >>>> >>>> Going back to the memory potentially being a problem. Ceph has a >>>> tendency to start using 2-3x more memory while it's in a degraded state as >>>> opposed to when everything is health_ok. Always plan for over-provisioning >>>> your memory to account for a minimum of 2x. I've seen clusters stuck in an >>>> OOM killer death spiral because it kept killing OSDs for running out of >>>> memory, that caused more peering and backfilling, ... which caused more >>>> OSDs to be killed by OOM killer. >>>> >>>> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco <d.carra
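To watch for the flapping David warns about, the OSD log carries a tell-tale line whenever a healthy OSD is marked down and re-asserts itself; a sketch, assuming default log locations:

# A healthy OSD that was marked down logs something like
# "map eNNN wrongly marked me down" before re-asserting itself:
grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log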
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
I forgot to say that after upgrade the machine RAM to 4Gb, the OSD daemons has started to use only a 5% (about 200MB). Is like magic, and now I've about 3.2Gb of free RAM. Greetings!! 2017-06-15 15:08 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: > Finally, the problem was W3Total Cache, that seems to be unable to manage > HA and when the master redis host is down, it stop working without try the > slave. > > I've added some options to make it faster to detect a down OSD and the > page is online again in about 40s. > > [global] > fsid = Hidden > mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 > mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > osd mon heartbeat interval = 5 > osd mon report interval max = 10 > mon osd report timeout = 15 > osd fast fail on connection refused = True > > public network = 10.20.1.0/24 > osd pool default size = 2 > > > Greetings and thanks for all your help. > > 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>: > >> I've used the kernel client and the ceph-fuse driver for mapping the >> cephfs volume. I didn't notice any network hiccups while failing over, but >> I was reading large files during my tests (and live) and some caching may >> have hidden hidden network hiccups for my use case. >> >> Going back to the memory potentially being a problem. Ceph has a >> tendency to start using 2-3x more memory while it's in a degraded state as >> opposed to when everything is health_ok. Always plan for over-provisioning >> your memory to account for a minimum of 2x. I've seen clusters stuck in an >> OOM killer death spiral because it kept killing OSDs for running out of >> memory, that caused more peering and backfilling, ... which caused more >> OSDs to be killed by OOM killer. >> >> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco <d.carra...@i2tic.com> >> wrote: >> >>> Is strange because on my test cluster (three nodes) with two nodes with >>> OSD, and all with MON and MDS, I've configured the size to 2 and min_size >>> to 1, I've restarted all nodes one by one and the client loose the >>> connection for about 5 seconds until connect to other MDS. >>> >>> Are you using ceph client or kernel client? >>> I forgot to say that I'm using Debian 8. >>> >>> Anyway, maybe the problem was what I've said before, the clients >>> connection with that node started to fail, but the node was not officially >>> down. And it wasn't a client problem, because it happened on both clients >>> and on my monitoring service at same time. >>> >>> Just now I'm not on the office, so I can't post the config file. >>> Tomorrow I'll send it. >>> Anyway, is the basic file generated by ceph-deploy with client network >>> and min_size configurations. Just like my test config. >>> >>> Thanks!!, and greetings!! >>> >>> El 14 jun. 2017 10:38 p. m., "David Turner" <drakonst...@gmail.com> >>> escribió: >>> >>> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at >>> a time to do ceph and kernel upgrades. The VM's running out of ceph, the >>> clients accessing MDS, etc all keep working fine without any problem during >>> these restarts. What is your full ceph configuration? There must be >>> something not quite right in there. >>> >>> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco <d.carra...@i2tic.com> >>> wrote: >>> >>>> >>>> >>>> El 14 jun. 2017 10:08 p. 
m., "David Turner" <drakonst...@gmail.com> >>>> escribió: >>>> >>>> Not just the min_size of your cephfs data pool, but also your >>>> cephfs_metadata pool. >>>> >>>> >>>> Both were at 1. I don't know why because I don't remember to have >>>> changed the min_size and the cluster has 3 odd from beginning (I did >>>> it on another cluster for testing purposes, but I don't remember to have >>>> changed on this). I've changed both to two, but after the fail. >>>> >>>> About the size, I use 50Gb because it's for a single webpage and I >>>> don't need more space. >>>> >>>> I'll try to increase the memory to 3Gb. >>>> >>>> Greetings!! >>>> >>>> >>>> On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com>
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Finally, the problem was W3Total Cache, that seems to be unable to manage HA and when the master redis host is down, it stop working without try the slave. I've added some options to make it faster to detect a down OSD and the page is online again in about 40s. [global] fsid = Hidden mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx osd mon heartbeat interval = 5 osd mon report interval max = 10 mon osd report timeout = 15 osd fast fail on connection refused = True public network = 10.20.1.0/24 osd pool default size = 2 Greetings and thanks for all your help. 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>: > I've used the kernel client and the ceph-fuse driver for mapping the > cephfs volume. I didn't notice any network hiccups while failing over, but > I was reading large files during my tests (and live) and some caching may > have hidden hidden network hiccups for my use case. > > Going back to the memory potentially being a problem. Ceph has a tendency > to start using 2-3x more memory while it's in a degraded state as opposed > to when everything is health_ok. Always plan for over-provisioning your > memory to account for a minimum of 2x. I've seen clusters stuck in an OOM > killer death spiral because it kept killing OSDs for running out of memory, > that caused more peering and backfilling, ... which caused more OSDs to be > killed by OOM killer. > > On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco <d.carra...@i2tic.com> > wrote: > >> Is strange because on my test cluster (three nodes) with two nodes with >> OSD, and all with MON and MDS, I've configured the size to 2 and min_size >> to 1, I've restarted all nodes one by one and the client loose the >> connection for about 5 seconds until connect to other MDS. >> >> Are you using ceph client or kernel client? >> I forgot to say that I'm using Debian 8. >> >> Anyway, maybe the problem was what I've said before, the clients >> connection with that node started to fail, but the node was not officially >> down. And it wasn't a client problem, because it happened on both clients >> and on my monitoring service at same time. >> >> Just now I'm not on the office, so I can't post the config file. Tomorrow >> I'll send it. >> Anyway, is the basic file generated by ceph-deploy with client network >> and min_size configurations. Just like my test config. >> >> Thanks!!, and greetings!! >> >> El 14 jun. 2017 10:38 p. m., "David Turner" <drakonst...@gmail.com> >> escribió: >> >> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at >> a time to do ceph and kernel upgrades. The VM's running out of ceph, the >> clients accessing MDS, etc all keep working fine without any problem during >> these restarts. What is your full ceph configuration? There must be >> something not quite right in there. >> >> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco <d.carra...@i2tic.com> >> wrote: >> >>> >>> >>> El 14 jun. 2017 10:08 p. m., "David Turner" <drakonst...@gmail.com> >>> escribió: >>> >>> Not just the min_size of your cephfs data pool, but also your >>> cephfs_metadata pool. >>> >>> >>> Both were at 1. I don't know why because I don't remember to have >>> changed the min_size and the cluster has 3 odd from beginning (I did it >>> on another cluster for testing purposes, but I don't remember to have >>> changed on this). I've changed both to two, but after the fail. 
>>> >>> About the size, I use 50Gb because it's for a single webpage and I don't >>> need more space. >>> >>> I'll try to increase the memory to 3Gb. >>> >>> Greetings!! >>> >>> >>> On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com> >>> wrote: >>> >>>> Ceph recommends 1GB of RAM for ever 1TB of OSD space. Your 2GB nodes >>>> are definitely on the low end. 50GB OSDs... I don't know what that will >>>> require, but where you're running the mon and mds on the same node, I'd >>>> still say that 2GB is low. The Ceph OSD daemon using 1GB of RAM is not >>>> surprising, even at that size. >>>> >>>> When you say you increased the size of the pools to 3, what did you do >>>> to the min_size? Is that still set to 2? >>>> >>>> On Wed, Jun 14, 2017 at 3:17 PM Daniel
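A simple way to measure the ~40 s detection window mentioned above is to kill one OSD by hand and watch the cluster log; a sketch, assuming systemd-managed daemons:

# On one storage node:
systemctl stop ceph-osd@0   # simulate a clean OSD failure
# From any admin node:
ceph -w                     # time how long until osd.0 is reported down
                            # and until the PGs go active+clean again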
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Is strange because on my test cluster (three nodes) with two nodes with OSD, and all with MON and MDS, I've configured the size to 2 and min_size to 1, I've restarted all nodes one by one and the client loose the connection for about 5 seconds until connect to other MDS. Are you using ceph client or kernel client? I forgot to say that I'm using Debian 8. Anyway, maybe the problem was what I've said before, the clients connection with that node started to fail, but the node was not officially down. And it wasn't a client problem, because it happened on both clients and on my monitoring service at same time. Just now I'm not on the office, so I can't post the config file. Tomorrow I'll send it. Anyway, is the basic file generated by ceph-deploy with client network and min_size configurations. Just like my test config. Thanks!!, and greetings!! El 14 jun. 2017 10:38 p. m., "David Turner" <drakonst...@gmail.com> escribió: I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at a time to do ceph and kernel upgrades. The VM's running out of ceph, the clients accessing MDS, etc all keep working fine without any problem during these restarts. What is your full ceph configuration? There must be something not quite right in there. On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco <d.carra...@i2tic.com> wrote: > > > El 14 jun. 2017 10:08 p. m., "David Turner" <drakonst...@gmail.com> > escribió: > > Not just the min_size of your cephfs data pool, but also your > cephfs_metadata pool. > > > Both were at 1. I don't know why because I don't remember to have changed > the min_size and the cluster has 3 odd from beginning (I did it on > another cluster for testing purposes, but I don't remember to have changed > on this). I've changed both to two, but after the fail. > > About the size, I use 50Gb because it's for a single webpage and I don't > need more space. > > I'll try to increase the memory to 3Gb. > > Greetings!! > > > On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com> > wrote: > >> Ceph recommends 1GB of RAM for ever 1TB of OSD space. Your 2GB nodes are >> definitely on the low end. 50GB OSDs... I don't know what that will >> require, but where you're running the mon and mds on the same node, I'd >> still say that 2GB is low. The Ceph OSD daemon using 1GB of RAM is not >> surprising, even at that size. >> >> When you say you increased the size of the pools to 3, what did you do to >> the min_size? Is that still set to 2? >> >> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco <d.carra...@i2tic.com> >> wrote: >> >>> Finally I've created three nodes, I've increased the size of pools to 3 >>> and I've created 3 MDS (active, standby, standby). >>> >>> Today the server has decided to fail and I've noticed that failover is >>> not working... The ceph -s command shows like everything was OK but the >>> clients weren't able to connect and I had to restart the failing node and >>> reconect the clients manually to make it work again (even I think that the >>> active MDS was in another node). >>> >>> I don't know if maybe is because the server was not fully down, and only >>> some connections were failing. I'll do some tests too see. >>> >>> Another question: How many memory needs a node to work?, because I've >>> nodes with 2GB of RAM (one MDS, one MON and one OSD), and they have an high >>> memory usage (more than 1GB on the OSD). >>> The OSD size is 50GB and the data that contains is less than 3GB. >>> >>> Thanks, and Greetings!! 
>>> >>> 2017-06-12 23:33 GMT+02:00 Mazzystr <mazzy...@gmail.com>: >>> >>>> Since your app is an Apache / php app is it possible for you to >>>> reconfigure the app to use S3 module rather than a posix open file()? Then >>>> with Ceph drop CephFS and configure Civetweb S3 gateway? You can have >>>> "active-active" endpoints with round robin dns or F5 or something. You >>>> would also have to repopulate objects into the rados pools. >>>> >>>> Also increase that size parameter to 3. ;-) >>>> >>>> Lots of work for active-active but the whole stack will be much more >>>> resilient coming from some with a ClearCase / NFS / stale file handles up >>>> the wazoo background >>>> >>>> >>>> >>>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <d.carra...@i2tic.com >>>> > wrote: >>>> >>>>> 2017-06-12 16:10 GMT+02:00 David Turner <drako
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
On Jun 14, 2017, 10:08 PM, "David Turner" <drakonst...@gmail.com> wrote:

> Not just the min_size of your cephfs data pool, but also your cephfs_metadata pool.

Both were at 1. I don't know why, because I don't remember having changed the min_size, and the cluster has had 3 OSDs from the beginning (I did change it on another cluster for testing purposes, but I don't remember changing it on this one). I've changed both to two, but only after the failure.

About the size, I use 50GB because it's for a single webpage and I don't need more space. I'll try to increase the memory to 3GB.

Greetings!!

On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com> wrote:

> Ceph recommends 1GB of RAM for every 1TB of OSD space. Your 2GB nodes are definitely on the low end. 50GB OSDs... I don't know what that will require, but where you're running the mon and mds on the same node, I'd still say that 2GB is low. The Ceph OSD daemon using 1GB of RAM is not surprising, even at that size.
>
> When you say you increased the size of the pools to 3, what did you do to the min_size? Is that still set to 2?
>
> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>
>> Finally I've created three nodes, I've increased the size of the pools to 3, and I've created 3 MDS (active, standby, standby).
>>
>> Today the server decided to fail and I noticed that failover was not working... The ceph -s command showed everything as OK, but the clients weren't able to connect, and I had to restart the failing node and reconnect the clients manually to make it work again (even though I think the active MDS was on another node).
>>
>> I don't know if maybe it's because the server was not fully down and only some connections were failing. I'll do some tests to see.
>>
>> Another question: how much memory does a node need to work? I have nodes with 2GB of RAM (one MDS, one MON and one OSD), and they show high memory usage (more than 1GB on the OSD). The OSD size is 50GB and the data it contains is less than 3GB.
>>
>> Thanks, and Greetings!!
>>
>> 2017-06-12 23:33 GMT+02:00 Mazzystr <mazzy...@gmail.com>:
>>
>>> Since your app is an Apache / php app, is it possible for you to reconfigure the app to use the S3 module rather than a posix open file()? Then with Ceph drop CephFS and configure the Civetweb S3 gateway. You can have "active-active" endpoints with round-robin DNS or an F5 or something. You would also have to repopulate objects into the rados pools.
>>>
>>> Also increase that size parameter to 3. ;-)
>>>
>>> Lots of work for active-active, but the whole stack will be much more resilient; that's coming from someone with a ClearCase / NFS / stale-file-handles-up-the-wazoo background.
>>>
>>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>>
>>>> 2017-06-12 16:10 GMT+02:00 David Turner <drakonst...@gmail.com>:
>>>>
>>>>> I have an incredibly lightweight cephfs configuration. I set up an MDS on each mon (3 total), and have 9TB of data in cephfs. This data only has 1 client that reads a few files at a time. I haven't noticed any downtime when it fails over to a standby MDS. So it definitely depends on your workload as to how a failover will affect your environment.
>>>>>
>>>>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini <jpetr...@coredial.com> wrote:
>>>>>
>>>>>> We use the following in our ceph.conf for MDS failover. We're running one active and one standby. Last time it failed over there was about 2 minutes of downtime before the mounts started responding again, but it did recover gracefully.
>>>>>>
>>>>>> [mds]
>>>>>> max_mds = 1
>>>>>> mds_standby_for_rank = 0
>>>>>> mds_standby_replay = true
>>>>>>
>>>>>> ___
>>>>>>
>>>>>> John Petrini
>>>>>> ___
>>>>>> ceph-users mailing list
>>>>>> ceph-users@lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> Thanks to both.
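For reference, a minimal sketch of checking and raising min_size on both CephFS pools with the ceph CLI (the pool names cephfs_data and cephfs_metadata are assumed from the thread; adjust them to your cluster):

# Inspect the current replication settings on both pools
ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data min_size
ceph osd pool get cephfs_metadata min_size

# With size=3 across three nodes, min_size=2 lets a PG keep serving
# I/O with one replica down while still refusing single-copy writes
ceph osd pool set cephfs_data min_size 2
ceph osd pool set cephfs_metadata min_size 2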
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Finally I've created three nodes, I've increased the size of the pools to 3, and I've created 3 MDS (active, standby, standby).

Today the server decided to fail and I noticed that failover was not working... The ceph -s command showed everything as OK, but the clients weren't able to connect, and I had to restart the failing node and reconnect the clients manually to make it work again (even though I think the active MDS was on another node).

I don't know if maybe it's because the server was not fully down and only some connections were failing. I'll do some tests to see.

Another question: how much memory does a node need to work? I have nodes with 2GB of RAM (one MDS, one MON and one OSD), and they show high memory usage (more than 1GB on the OSD). The OSD size is 50GB and the data it contains is less than 3GB.

Thanks, and Greetings!!

2017-06-12 23:33 GMT+02:00 Mazzystr <mazzy...@gmail.com>:

> Since your app is an Apache / php app, is it possible for you to reconfigure the app to use the S3 module rather than a posix open file()? Then with Ceph drop CephFS and configure the Civetweb S3 gateway. You can have "active-active" endpoints with round-robin DNS or an F5 or something. You would also have to repopulate objects into the rados pools.
>
> Also increase that size parameter to 3. ;-)
>
> Lots of work for active-active, but the whole stack will be much more resilient; that's coming from someone with a ClearCase / NFS / stale-file-handles-up-the-wazoo background.
>
> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <d.carra...@i2tic.com> wrote:
>
>> 2017-06-12 16:10 GMT+02:00 David Turner <drakonst...@gmail.com>:
>>
>>> I have an incredibly lightweight cephfs configuration. I set up an MDS on each mon (3 total), and have 9TB of data in cephfs. This data only has 1 client that reads a few files at a time. I haven't noticed any downtime when it fails over to a standby MDS. So it definitely depends on your workload as to how a failover will affect your environment.
>>>
>>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini <jpetr...@coredial.com> wrote:
>>>
>>>> We use the following in our ceph.conf for MDS failover. We're running one active and one standby. Last time it failed over there was about 2 minutes of downtime before the mounts started responding again, but it did recover gracefully.
>>>>
>>>> [mds]
>>>> max_mds = 1
>>>> mds_standby_for_rank = 0
>>>> mds_standby_replay = true
>>>>
>>>> ___
>>>>
>>>> John Petrini
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> Thanks to both.
>> Just now I'm working on that because I need a very fast failover. For now the tests give a very fast response when an OSD fails (about 5 seconds), but a very slow response when the main MDS fails (I haven't measured the actual time, but it was not working for quite a while). Maybe that was because I created the other MDS after mounting; I've done some tests just before sending this email and now it looks very fast (I haven't noticed any downtime).
>>
>> Greetings!!
>>
>> --
>> _
>> Daniel Carrasco Marín
>> Ingeniería para la Innovación i2TIC, S.L.
>> Tlf: +34 911 12 32 84 Ext: 223
>> www.i2tic.com
>> _
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
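On the S3 suggestion quoted above: a rough sketch of what a Civetweb-fronted RADOS Gateway section can look like in ceph.conf for that era of Ceph (the instance name client.rgw.gw1, the host, and the port are illustrative assumptions, not from the thread):

[client.rgw.gw1]
host = gw1
rgw_frontends = "civetweb port=7480"

Run one such gateway per node and put round-robin DNS or a load balancer in front of them; since the S3 endpoints are stateless, active-active needs no failover logic, unlike an active/standby MDS.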
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
2017-06-12 16:10 GMT+02:00 David Turner <drakonst...@gmail.com>:

> I have an incredibly lightweight cephfs configuration. I set up an MDS on each mon (3 total), and have 9TB of data in cephfs. This data only has 1 client that reads a few files at a time. I haven't noticed any downtime when it fails over to a standby MDS. So it definitely depends on your workload as to how a failover will affect your environment.
>
> On Mon, Jun 12, 2017 at 9:59 AM John Petrini <jpetr...@coredial.com> wrote:
>
>> We use the following in our ceph.conf for MDS failover. We're running one active and one standby. Last time it failed over there was about 2 minutes of downtime before the mounts started responding again, but it did recover gracefully.
>>
>> [mds]
>> max_mds = 1
>> mds_standby_for_rank = 0
>> mds_standby_replay = true
>>
>> ___
>>
>> John Petrini
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Thanks to both.
Just now I'm working on that because I need a very fast failover. For now the tests give a very fast response when an OSD fails (about 5 seconds), but a very slow response when the main MDS fails (I haven't measured the actual time, but it was not working for quite a while). Maybe that was because I created the other MDS after mounting; I've done some tests just before sending this email and now it looks very fast (I haven't noticed any downtime).

Greetings!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
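A quick sketch of how one might observe an MDS failover like the one described above. ceph mds stat is a standard CLI command; mds_beacon_grace is the option that controls how long the monitors wait before marking an unresponsive MDS as laggy, and the 15-second value shown is assumed to be the default for this era (check your version's docs):

# In one terminal, watch which MDS is active while you stop the current one
watch -n 1 ceph mds stat

# Observed failover time is roughly the beacon grace plus journal replay;
# mds_standby_replay = true keeps the standby's journal warm, so the
# replay term stays small. In ceph.conf (value shown is the assumed default):
[mds]
mds_beacon_grace = 15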
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
2017-06-12 10:49 GMT+02:00 Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>:

> Hi,
>
> On 06/12/2017 10:31 AM, Daniel Carrasco wrote:
>
>> Hello,
>>
>> I'm very new to Ceph, so maybe this is a noob question.
>>
>> We have an architecture that has some web servers (nginx, php...) with a common file server through NFS. Of course that is a SPOF, so we want to create a multi-FS setup to avoid future problems.
>>
>> We've already tested GlusterFS, but it is very slow reading small files with the official client (from 600ms to 1700ms to read the Doc page), and through NFS Ganesha it fails a lot (permissions errors, 404 when the file exists...).
>> The next thing we're trying is Ceph, which looks very good and has good performance even with small files (near NFS performance: 90-100ms vs 100-120ms), but in some tests that I've done, it stops working when an OSD is down.
>>
>> My test architecture is two servers with one OSD and one MON each, and a third with a MON and an MDS. I've configured the cluster to keep two copies of every PG (just like a RAID1) and all looks fine (health OK, three monitors...).
>> My test client also works fine: it connects to the cluster and is able to serve the webpage without problems, but my problems come when an OSD goes down. The cluster detects that it is down, shows that it needs more OSDs to keep the two copies, designates a new MON, and looks like it is working, but the client is unable to receive new files until I power the OSD on again (it happens with both OSDs).
>>
>> My question is: is there any way to tell Ceph to keep serving files even when an OSD is down?
>
> I assume the data pool is configured with size=2 and min_size=2. This means that you need two active replicas to allow I/O to a PG. With one OSD down this requirement cannot be met.
>
> You can either:
> - add a third OSD
> - set min_size=1
> The latter might be fine for a test setup, but do not run this configuration in production. NEVER. EVER. Search the mailing list for more details.

Thanks!! Just what I thought, a noob question hehe. Now it's working. I'll search the list later, but it looks like this is to avoid split brain or something similar.

>> My other question is about MDS:
>> Is a multi-MDS environment stable? Because if I have multiple FS nodes to avoid a SPOF but can only deploy one MDS, then we have a new SPOF...
>> This is to know whether I need to use block device pools instead of file server pools.
>
> AFAIK active/active MDS setups are still considered experimental; active/standby(-replay) is a supported setup. We currently use one active and one standby-replay MDS for our CephFS instance serving several million files.
>
> Failover between the MDS works, but might run into problems with a large number of open files (each requiring a stat operation). Depending on the number of open files, failover takes from some seconds up to 5-10 minutes in our setup.

Thanks again for your response. It's not for performance purposes, so an active/standby setup will be enough; I'll read up on this configuration. About the time: it's always better to have the page down for some seconds than to wait for an admin to fix it.

> Regards,
> Burkhard Linke
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Greetings!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
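Burkhard's two options from the message above, expressed as commands (a sketch; the pool names are assumptions, and option 2 carries his warning):

# Option 1 (the safe one): add a third OSD so that size=2
# can still be satisfied when one OSD is down.

# Option 2: TEST CLUSTERS ONLY, never in production:
ceph osd pool set cephfs_data min_size 1
ceph osd pool set cephfs_metadata min_size 1

# With min_size=1 a PG accepts writes with a single live replica,
# which is exactly how data ends up lost or inconsistent after a
# second failure or a flapping OSD.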
[ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Hello,

I'm very new to Ceph, so maybe this is a noob question.

We have an architecture that has some web servers (nginx, php...) with a common file server through NFS. Of course that is a SPOF, so we want to create a multi-FS setup to avoid future problems.

We've already tested GlusterFS, but it is very slow reading small files with the official client (from 600ms to 1700ms to read the Doc page), and through NFS Ganesha it fails a lot (permissions errors, 404 when the file exists...).
The next thing we're trying is Ceph, which looks very good and has good performance even with small files (near NFS performance: 90-100ms vs 100-120ms), but in some tests that I've done, it stops working when an OSD is down.

My test architecture is two servers with one OSD and one MON each, and a third with a MON and an MDS. I've configured the cluster to keep two copies of every PG (just like a RAID1) and all looks fine (health OK, three monitors...).
My test client also works fine: it connects to the cluster and is able to serve the webpage without problems, but my problems come when an OSD goes down. The cluster detects that it is down, shows that it needs more OSDs to keep the two copies, designates a new MON, and looks like it is working, but the client is unable to receive new files until I power the OSD on again (it happens with both OSDs).

My question is: is there any way to tell Ceph to keep serving files even when an OSD is down?

My other question is about MDS:
Is a multi-MDS environment stable? Because if I have multiple FS nodes to avoid a SPOF but can only deploy one MDS, then we have a new SPOF...
This is to know whether I need to use block device pools instead of file server pools.

Thanks!!! and greetings!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
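For reference, a minimal sketch of the pool and filesystem setup this post describes, using standard ceph CLI commands (pool names, PG counts, and addresses are illustrative assumptions):

# Create the data and metadata pools and the filesystem
ceph osd pool create cephfs_data 64
ceph osd pool create cephfs_metadata 64
ceph fs new cephfs cephfs_metadata cephfs_data

# Two copies of every PG, as in the test setup above
ceph osd pool set cephfs_data size 2
ceph osd pool set cephfs_metadata size 2

# Kernel-driver mount from a web server; listing all three monitors
# lets the client keep working when one of them is down
mount -t ceph 10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret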