Re: [ceph-users] Best version and OS for CephFS
Ok, thanks. Then I'll use standby-replay mode (typo in my other mail). Greetings!!

On Wed, Oct 10, 2018 at 13:06, Sergey Malinin () wrote:

> A standby MDS is required for HA. It can be configured in standby-replay
> mode for faster failover. Otherwise a journal replay is incurred, which
> can take somewhat longer.
>
> On 10.10.2018, at 13:57, Daniel Carrasco wrote:
>
> Thanks for your response. I'll point in that direction.
> I also need fast recovery in case the MDS dies, so: is a standby MDS
> recommended, or is recovery fast enough on its own to be useful?
>
> Greetings!
>
> On Wed, Oct 10, 2018 at 12:26, Sergey Malinin () wrote:
>
>> On 10.10.2018, at 10:49, Daniel Carrasco wrote:
>>
>> - Which is the best configuration to avoid those MDS problems?
>>
>> Single active MDS with lots of RAM.

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
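For reference, enabling standby-replay is a small configuration change. A minimal sketch, assuming Luminous-era (12.x) option names and a placeholder daemon name (mds-b); newer releases expose it as a per-filesystem flag instead:

    # Luminous: in ceph.conf on the standby daemon's host
    [mds.mds-b]
    mds_standby_replay = true
    mds_standby_for_rank = 0

    # Newer releases: a per-filesystem setting
    # ceph fs set <fs_name> allow_standby_replay true

A standby-replay daemon follows the active MDS's journal continuously, so on failover it skips most of the journal replay a cold standby would have to do.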
Re: [ceph-users] Best version and OS for CephFS
Thanks for your response. I'll point in that direction.

I also need fast recovery in case the MDS dies, so: is a standby MDS recommended, or is recovery fast enough on its own to be useful?

Greetings!

On Wed, Oct 10, 2018 at 12:26, Sergey Malinin () wrote:

> On 10.10.2018, at 10:49, Daniel Carrasco wrote:
>
> - Which is the best configuration to avoid those MDS problems?
>
> Single active MDS with lots of RAM.

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
[ceph-users] Best version and OS for CephFS
Hello,

I'm trying to create a simple cluster to achieve HA for a webpage:

- Three nodes, each with MDS, OSD, MON and MGR
- Replication factor of three (one copy on every node)
- Two active MDS and one standby, to tolerate the failure of one server
- CephFS mounted using the kernel driver
- One disk per node, smaller than 500GB

I've already tested other solutions like EFS, GlusterFS and NFS master-slave, and all of them are either slower than CephFS or lack HA (NFS). So far I've had a lot of trouble with CephFS (all of it MDS related), like high memory consumption caused by memory leaks (12.2.4), and slow MDS requests appearing after an update and after a few hours (12.2.8, resolved by restarting the MDS)... so for now I've had to remove the entire cluster and set up a DRBD cluster instead (which is not as good as Ceph and doesn't have HA).

With all these problems, my questions are:

- Which is the best free OS for creating a small cluster? So far I've only used Debian-based systems (Debian 9 and Ubuntu 16.04).
- Which is the best Ceph version? Maybe 10.2.10 is more stable?
- Which is the best configuration to avoid those MDS problems? Our servers have 8GB of RAM, but it's shared with other daemons that use about 4GB.

Thanks!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
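On the configuration question, the memory knobs that matter most on nodes this small are the MDS cache limit and the BlueStore cache sizes. A minimal sketch with illustrative values (assuming Luminous-era option names; values are in bytes, and real MDS memory use tends to run well above the cache limit, so leave headroom):

    [global]
    # cap the MDS cache at ~1 GiB
    mds_cache_memory_limit = 1073741824
    # cap the BlueStore cache at ~512 MiB per OSD
    bluestore_cache_size_hdd = 536870912
    bluestore_cache_size_ssd = 536870912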
Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs
On Mon, Oct 8, 2018 at 5:44, Yan, Zheng wrote:

> On Mon, Oct 8, 2018 at 11:34 AM Daniel Carrasco wrote:
> >
> > I've got several problems on 12.2.8 too. All my standby MDSes use a lot
> > of memory (while the active one uses a normal amount), and I'm receiving
> > a lot of slow MDS messages (causing the webpage to freeze and fail until
> > the MDSes are restarted)... Finally I had to copy the entire site to DRBD
> > and use NFS to solve all the problems...
>
> was standby-replay enabled?

I've tried both, and I've seen more or less the same behavior, maybe a bit less when it's not in replay mode. Anyway, we've deactivated CephFS there for now. I'll try older versions in a test environment.

> > On Mon, Oct 8, 2018 at 5:21, Alex Litvak (alexander.v.lit...@gmail.com) wrote:
> >>
> >> How is this not an emergency announcement? Also, I wonder if I can
> >> downgrade at all? I am using Ceph with Docker, deployed with
> >> ceph-ansible. I wonder if I should push a downgrade or basically wait for
> >> the fix. I believe a fix needs to be provided.
> >>
> >> Thank you,
> >>
> >> On 10/7/2018 9:30 PM, Yan, Zheng wrote:
> >> > There is a bug in the v13.2.2 MDS which causes decoding of the purge
> >> > queue to fail. If an MDS is already in the damaged state, please
> >> > downgrade it to 13.2.1, then run 'ceph mds repaired fs_name:damaged_rank'.
> >> >
> >> > Sorry for all the trouble I caused.
> >> > Yan, Zheng

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
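For anyone hit by the 13.2.2 issue, the recovery Yan describes amounts to something like the following sketch (fs_name and damaged_rank are placeholders, and the downgrade step depends entirely on how you deploy Ceph, so the package command is only an example, not a recipe):

    # 1. Downgrade the MDS daemons to 13.2.1 (package name/version string varies by distro and repo)
    apt install ceph-mds=13.2.1-1bionic

    # 2. Find the damaged rank
    ceph health detail
    ceph fs status

    # 3. Mark the rank repaired so an MDS can take it over again
    ceph mds repaired fs_name:damaged_rank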
Re: [ceph-users] Don't upgrade to 13.2.2 if you use cephfs
I've got several problems on 12.2.8 too. All my standby MDSes use a lot of memory (while the active one uses a normal amount), and I'm receiving a lot of slow MDS messages (causing the webpage to freeze and fail until the MDSes are restarted)... Finally I had to copy the entire site to DRBD and use NFS to solve all the problems...

On Mon, Oct 8, 2018 at 5:21, Alex Litvak () wrote:

> How is this not an emergency announcement? Also, I wonder if I can
> downgrade at all? I am using Ceph with Docker, deployed with
> ceph-ansible. I wonder if I should push a downgrade or basically wait for
> the fix. I believe a fix needs to be provided.
>
> Thank you,
>
> On 10/7/2018 9:30 PM, Yan, Zheng wrote:
> > There is a bug in the v13.2.2 MDS which causes decoding of the purge
> > queue to fail. If an MDS is already in the damaged state, please
> > downgrade it to 13.2.1, then run 'ceph mds repaired fs_name:damaged_rank'.
> >
> > Sorry for all the trouble I caused.
> > Yan, Zheng

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
[ceph-users] Connect client to cluster on other subnet
Hello,

I have a Ceph cluster working on a subnet, where clients on the same subnet can connect without problems, but now I need to connect some clients that are on another subnet, and I'm getting a connection timeout error. Both subnets are routed to each other, and I've disabled the firewall to test whether it was the blocker, but it still fails. I'm able to connect to the Ceph ports using telnet, and I even see the client connection logged on the server:

2018-08-23 14:41:09.937766 7ff80700 0 -- 10.22.0.168:6789/0 >> 10.20.0.185:0/1905845915 pipe(0x55a332b9d400 sd=10 :6789 s=0 pgs=0 cs=0 l=0 c=0x55a332b0bc20).accept peer addr is really 10.20.0.185:0/1905845915 (socket is -)

But I still get the timeout. The Ceph cluster is on the 10.20.0.0/24 network and the client is on the 10.22.0.0/24 network. My public network configuration is "public network = 10.20.0.0/24". Maybe it's just a matter of adding the other subnet to the public network ("public network = 10.20.0.0/24,10.22.0.0/24") and adding "cluster network = 10.20.0.0/24" to the config file, but this is a production cluster and I need to be sure. Has someone tried it?

Thanks!!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
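In ceph.conf terms, the change being proposed above is this (a sketch of the poster's own suggestion, not a verified fix; note that clients on the new subnet must still be able to route to the monitor addresses stored in the monmap, and daemons need a restart to pick the setting up):

    [global]
    # accept clients from both subnets
    public network = 10.20.0.0/24,10.22.0.0/24
    # keep OSD replication/heartbeat traffic on the original subnet
    cluster network = 10.20.0.0/24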
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Hello, just to report: it looks like changing the messenger type to simple helped avoid the memory leak. About a day later the memory is still OK:

1264 ceph 20 0 12,547g 1,247g 16652 S 3,3 8,2 110:16.93 ceph-mds

The memory usage is more than 2x the MDS limit (512MB), but maybe that's daemon overhead plus memory fragmentation. At least it's not 13-15GB like before.

Greetings!!

2018-07-25 23:16 GMT+02:00 Daniel Carrasco:

> I've changed the configuration, adding your line and changing the MDS
> memory limit to 512MB, and for now it looks stable (it's at about 3-6% and
> sometimes even below 3%). I got very high usage on boot:
> 1264 ceph 20 0 12,543g 6,251g 16184 S 2,0 41,1% 0:19.34 ceph-mds
>
> but now it looks acceptable:
> 1264 ceph 20 0 12,543g 737952 16188 S 1,0 4,6% 0:41.05 ceph-mds
>
> Anyway, I need time to test it, because 15 minutes is too little.
>
> Greetings!!
>
> 2018-07-25 17:16 GMT+02:00 Daniel Carrasco:
>
>> Hello,
>>
>> Thanks for all your help.
>>
>> Is the dd an option of some command? Because at least on Debian/Ubuntu it
>> is an application to copy blocks, and then it fails.
>> For now I cannot change the configuration, but I'll try later.
>> About the logs, I've seen nothing about "warning", "error", "failed",
>> "message" or anything similar, so it looks like there are no messages of
>> that kind.
>>
>> Greetings!!
>>
>> 2018-07-25 14:48 GMT+02:00 Yan, Zheng:
>>
>>> On Wed, Jul 25, 2018 at 8:12 PM Yan, Zheng wrote:
>>> >
>>> > On Wed, Jul 25, 2018 at 5:04 PM Daniel Carrasco wrote:
>>> > >
>>> > > Hello,
>>> > >
>>> > > I've attached the PDF.
>>> > >
>>> > > I don't know if it's important, but I made changes to the
>>> > > configuration and restarted the servers after dumping that heap file.
>>> > > I've changed the memory_limit to 25MB to test whether it still has
>>> > > acceptable RAM values.
>>> >
>>> > Looks like there is a memory leak in the async messenger. What's the
>>> > output of "dd /usr/bin/ceph-mds"? Could you try the simple messenger
>>> > (add "ms type = simple" to the 'global' section of ceph.conf)?
>>>
>>> Besides, are there any suspicious messages in the mds log? Such as
>>> "failed to decode message of type"
>>>
>>> > Regards
>>> > Yan, Zheng
>>>
>>> > > Greetings!
>>> > >
>>> > > 2018-07-25 2:53 GMT+02:00 Yan, Zheng:
>>> > >>
>>> > >> On Wed, Jul 25, 2018 at 4:52 AM Daniel Carrasco (d.carra...@i2tic.com) wrote:
>>> > >> >
>>> > >> > Hello,
>>> > >> >
>>> > >> > I've run the profiler for about 5-6 minutes and this is what I've got:
>>> > >>
>>> > >> please run pprof --pdf /usr/bin/ceph-mds
>>> > >> /var/log/ceph/ceph-mds.x.profile..heap > /tmp/profile.pdf
>>> > >> and send me the pdf
>>> > >>
>>> > >> > Using local file /usr/bin/ceph-mds.
>>> > >> > Using local file /var/log/ceph/mds.kavehome-mgto-pro-fs01.profile.0009.heap.
>>> > >> > Total: 400.0 MB
>>> > >> > 362.5 90.6% 90.6% 362.5 90.6% ceph::buffer::create_aligned_in_mempool
>>> > >> > 20.4 5.1% 95.7% 29.8 7.5% CDir::_load_dentry
>>> > >> > 5.9 1.5% 97.2% 6.9 1.7% CDir::add_primary_dentry
>>> > >> > 4.7 1.2% 98.4% 4.7 1.2% ceph::logging::Log::create_entry
>>> > >> > 1.8 0.5% 98.8% 1.8 0.5% std::_Rb_tree::_M_emplace_hint_unique
>>> > >> > 1.8 0.5% 99.3% 2.2 0.5% compact_map_base::decode
>>> > >> > 0.6 0.1% 99.4% 0
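The messenger change being tested here is a one-line ceph.conf edit. A sketch (the simple messenger is the older, slower implementation, so this is a diagnostic workaround for the suspected async-messenger leak rather than a recommended permanent setting):

    [global]
    ms type = simple
    # async is the default in Luminous; revert once the leak is fixed:
    # ms type = async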
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Hello,

How much time is necessary? Because it's a production environment, and the memory profiler plus the low cache size (set because of the problem) cause so much CPU usage on the OSDs and MDS that it fails while the profiler is running.

Is there any problem if it's done at a low-traffic time? (Less usage, so maybe it won't fail, but maybe also less info about the usage.)

Greetings!

2018-07-24 10:21 GMT+02:00 Yan, Zheng:

> I mean:
>
> ceph tell mds.x heap start_profiler
>
> ... wait for some time
>
> ceph tell mds.x heap stop_profiler
>
> pprof --text /usr/bin/ceph-mds /var/log/ceph/ceph-mds.x.profile..heap
>
> On Tue, Jul 24, 2018 at 3:18 PM Daniel Carrasco wrote:
> >
> > This is what I get:
> >
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap dump
> > 2018-07-24 09:05:19.350720 7fc562ffd700 0 client.1452545 ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:05:29.103903 7fc563fff700 0 client.1452548 ms_handle_reset on 10.22.0.168:6800/1685786126
> > mds.kavehome-mgto-pro-fs01 dumping heap profile now.
> >
> > MALLOC: 760199640 ( 725.0 MiB) Bytes in use by application
> > MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> > MALLOC: + 246962320 ( 235.5 MiB) Bytes in central cache freelist
> > MALLOC: + 43933664 ( 41.9 MiB) Bytes in transfer cache freelist
> > MALLOC: + 41012664 ( 39.1 MiB) Bytes in thread cache freelists
> > MALLOC: + 10186912 ( 9.7 MiB) Bytes in malloc metadata
> > MALLOC:
> > MALLOC: = 1102295200 ( 1051.2 MiB) Actual memory used (physical + swap)
> > MALLOC: + 4268335104 ( 4070.6 MiB) Bytes released to OS (aka unmapped)
> > MALLOC:
> > MALLOC: = 5370630304 ( 5121.8 MiB) Virtual address space used
> > MALLOC:
> > MALLOC: 33027 Spans in use
> > MALLOC: 19 Thread heaps in use
> > MALLOC: 8192 Tcmalloc page size
> >
> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> > Bytes released to the OS take up virtual address space but no physical memory.
> >
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> > 2018-07-24 09:14:25.747706 7f94f700 0 client.1452578 ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:14:25.754034 7f95057fa700 0 client.1452581 ms_handle_reset on 10.22.0.168:6800/1685786126
> > mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
> >
> > MALLOC: 960649328 ( 916.1 MiB) Bytes in use by application
> > MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> > MALLOC: + 108867288 ( 103.8 MiB) Bytes in central cache freelist
> > MALLOC: + 37179424 ( 35.5 MiB) Bytes in transfer cache freelist
> > MALLOC: + 40143000 ( 38.3 MiB) Bytes in thread cache freelists
> > MALLOC: + 10186912 ( 9.7 MiB) Bytes in malloc metadata
> > MALLOC:
> > MALLOC: = 1157025952 ( 1103.4 MiB) Actual memory used (physical + swap)
> > MALLOC: + 4213604352 ( 4018.4 MiB) Bytes released to OS (aka unmapped)
> > MALLOC:
> > MALLOC: = 5370630304 ( 5121.8 MiB) Virtual address space used
> > MALLOC:
> > MALLOC: 33028 Spans in use
> > MALLOC: 19 Thread heaps in use
> > MALLOC: 8192 Tcmalloc page size
> >
> > Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> > Bytes released to the OS take up virtual address space but no physical memory.
> >
> > After heap release:
> > :/# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> > 2018-07-24 09:15:28.540203 7f2f7affd700 0 client.1443339 ms_handle_reset on 10.22.0.168:6800/1685786126
> > 2018-07-24 09:15:28.547153 7f2f7bfff700 0 client.1443342 ms_handle_reset on 10.22.0.168:6
Re: [ceph-users] Insane CPU utilization in ceph.fuse
> http://docs.ceph.com/docs/mimic/rados/troubleshooting/memory-profiling/
>
> On Tue, Jul 24, 2018 at 7:54 AM Daniel Carrasco wrote:
> >
> > Yeah, it's also my thread. That thread was created before lowering the
> > cache size from 512MB to 8MB. I thought that maybe it was my fault and I
> > had made a misconfiguration, so I ignored the problem until now.
> >
> > Greetings!
> >
> > On Tue, Jul 24, 2018, 1:00, Gregory Farnum wrote:
> >>
> >> On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly wrote:
> >>>
> >>> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco wrote:
> >>> > Hi, thanks for your response.
> >>> >
> >>> > Clients are about 6, and 4 of them are on standby most of the time.
> >>> > Only two are active servers that are serving the webpage. Also we've
> >>> > got a Varnish in front, so they are not getting all the load (below
> >>> > 30% in PHP is not much). About the MDS cache, now I've got the
> >>> > mds_cache_memory_limit at 8MB.
> >>>
> >>> What! Please post `ceph daemon mds. config diff`, `... perf
> >>> dump`, and `... dump_mempools` from the server the active MDS is on.
> >>>
> >>> > I've also tested 512MB, but the CPU usage is the same and the MDS RAM
> >>> > usage grows up to 15GB (on a 16GB server it starts to swap and
> >>> > everything fails). With 8MB, at least the memory usage is stable at
> >>> > less than 6GB (now it's using about 1GB of RAM).
> >>>
> >>> We've seen reports of possible memory leaks before and the potential
> >>> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
> >>> Your MDS cache size should be configured to 1-8GB (depending on your
> >>> preference) so it's disturbing to see you set it so low.
> >>
> >> See also the thread "[ceph-users] Fwd: MDS memory usage is very high",
> >> which had more discussion of that. The MDS daemon seemingly had 9.5GB of
> >> allocated RSS but only believed 489MB was in use for the cache...
> >> -Greg

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Yeah, it's also my thread. That thread was created before lowering the cache size from 512MB to 8MB. I thought that maybe it was my fault and I had made a misconfiguration, so I ignored the problem until now.

Greetings!

On Tue, Jul 24, 2018, 1:00, Gregory Farnum wrote:

> On Mon, Jul 23, 2018 at 11:08 AM Patrick Donnelly wrote:
>
>> On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco wrote:
>> > Hi, thanks for your response.
>> >
>> > Clients are about 6, and 4 of them are on standby most of the time.
>> > Only two are active servers that are serving the webpage. Also we've
>> > got a Varnish in front, so they are not getting all the load (below
>> > 30% in PHP is not much). About the MDS cache, now I've got the
>> > mds_cache_memory_limit at 8MB.
>>
>> What! Please post `ceph daemon mds. config diff`, `... perf
>> dump`, and `... dump_mempools` from the server the active MDS is on.
>>
>> > I've also tested 512MB, but the CPU usage is the same and the MDS RAM
>> > usage grows up to 15GB (on a 16GB server it starts to swap and
>> > everything fails). With 8MB, at least the memory usage is stable at
>> > less than 6GB (now it's using about 1GB of RAM).
>>
>> We've seen reports of possible memory leaks before and the potential
>> fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
>> Your MDS cache size should be configured to 1-8GB (depending on your
>> preference) so it's disturbing to see you set it so low.
>
> See also the thread "[ceph-users] Fwd: MDS memory usage is very high",
> which had more discussion of that. The MDS daemon seemingly had 9.5GB of
> allocated RSS but only believed 489MB was in use for the cache...
> -Greg
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Hi,

I forgot to say that maybe the diff is lower than the real value (8MB), because the memory usage was still high and I had prepared a new configuration with a lower limit (5MB). I haven't reloaded the daemons for now, but maybe the configuration was loaded again today, and that's the reason why it's using less than 1GB of RAM just now. Of course I haven't rebooted the machine, but maybe the daemon was killed for high memory usage and the new configuration was loaded then.

Greetings!

2018-07-23 21:07 GMT+02:00 Daniel Carrasco:

> Thanks!
>
> It's true that I've seen continuous memory growth, but I hadn't thought of
> a memory leak. I don't remember exactly how many hours were necessary to
> fill the memory, but I calculate it was about 14h.
>
> With the new configuration it looks like memory grows slowly and stops
> when it reaches 5-6GB. Sometimes the daemon seems to flush the memory,
> dropping again to less than 1GB and growing slowly back to 5-6GB.
>
> Just today, I don't know why or how (I've not changed anything on the Ceph
> cluster), the memory dropped to less than 1GB and is still there 8 hours
> later. I've only deployed a git repository with some changes.
>
> I have some nodes on version 12.2.5, because I detected this problem and
> didn't know whether it was caused by the latest version, so I stopped the
> update. The one that is the active MDS is on the latest version (12.2.7),
> and I've scheduled an update of the rest of the nodes for Thursday.
>
> A graph of the memory usage over the latest days with that configuration:
> https://imgur.com/a/uSsvBi4
>
> I have no info about when the problem was at its worst (512MB of MDS
> memory limit and 15-16GB of usage), because memory usage was not logged.
> I only have heap stats that were dumped while the daemon was in the
> process of filling the memory:
>
> # ceph tell mds.kavehome-mgto-pro-fs01 heap stats
> 2018-07-19 00:43:46.142560 7f5a7a7fc700 0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
> 2018-07-19 00:43:46.181133 7f5a7b7fe700 0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>
> MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> MALLOC: + 172148208 ( 164.2 MiB) Bytes in central cache freelist
> MALLOC: + 19031168 ( 18.1 MiB) Bytes in transfer cache freelist
> MALLOC: + 23987552 ( 22.9 MiB) Bytes in thread cache freelists
> MALLOC: + 20869280 ( 19.9 MiB) Bytes in malloc metadata
> MALLOC:
> MALLOC: = 10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
> MALLOC: + 3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
> MALLOC:
> MALLOC: = 14132703392 (13478.0 MiB) Virtual address space used
> MALLOC:
> MALLOC: 63875 Spans in use
> MALLOC: 16 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
>
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
> Here's the diff:
>
> {
>     "diff": {
>         "current": {
>             "admin_socket": "/var/run/ceph/ceph-mds.kavehome-mgto-pro-fs01.asok",
>             "auth_client_required": "cephx",
>             "bluestore_cache_size_hdd": "80530636",
>             "bluestore_cache_size_ssd": "80530636",
>             "err_to_stderr": "true",
>             "fsid": "f015f888-6e0c-4203-aea8-ef0f69ef7bd8",
>             "internal_safe_to_start_threads": "true",
>             "keyring": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01/keyring",
>             "log_file": "/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.log",
>             "log_max_recent": "1",
>             "log_to_stderr": "false",
>             "mds_cache_memory_limit": "53687091",
>             "mds_data": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01",
>             "mgr_data": "/var/lib/ceph/mgr/ceph-kavehome-mgto-pro-fs01",
>             "mon_cluster_log_file": "default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log"
Re: [ceph-users] Fwd: MDS memory usage is very high
Hi,

I forgot to say that maybe the diff is lower than the real value (8MB), because the memory usage was still high and I had prepared a new configuration with a lower limit (5MB). I haven't reloaded the daemons for now, but maybe the configuration was loaded again today, and that's the reason why it's using less than 1GB of RAM just now. Of course I haven't rebooted the machine, but maybe the daemon was killed for high memory usage and the new configuration was loaded then.

Greetings!

2018-07-19 11:35 GMT+02:00 Daniel Carrasco:

> Hello again,
>
> It is still early to say that it's working fine now, but it looks like the
> MDS memory is now under 20% of RAM, and most of the time between 6-9%.
> Maybe it was a mistake in the configuration.
>
> As a note, I've changed this client config:
>
> [global]
> ...
> bluestore_cache_size_ssd = 805306360
> bluestore_cache_size_hdd = 805306360
> mds_cache_memory_limit = 536870910
>
> [client]
> client_reconnect_stale = true
> client_cache_size = 32768
> client_mount_timeout = 30
> client_oc_max_objects = 2000
> client_oc_size = 629145600
> rbd_cache = true
> rbd_cache_size = 671088640
>
> to this (just the cache sizes divided by 10):
>
> [global]
> ...
> bluestore_cache_size_ssd = 80530636
> bluestore_cache_size_hdd = 80530636
> mds_cache_memory_limit = 53687091
>
> [client]
> client_cache_size = 32768
> client_mount_timeout = 30
> client_oc_max_objects = 2000
> client_oc_size = 62914560
> rbd_cache = true
> rbd_cache_size = 67108864
>
> Now the heap stats are:
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>
> MALLOC: 714063568 ( 681.0 MiB) Bytes in use by application
> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> MALLOC: + 132992224 ( 126.8 MiB) Bytes in central cache freelist
> MALLOC: + 21929920 ( 20.9 MiB) Bytes in transfer cache freelist
> MALLOC: + 31806608 ( 30.3 MiB) Bytes in thread cache freelists
> MALLOC: + 30666912 ( 29.2 MiB) Bytes in malloc metadata
> MALLOC:
> MALLOC: = 931459232 ( 888.3 MiB) Actual memory used (physical + swap)
> MALLOC: + 21886803968 (20872.9 MiB) Bytes released to OS (aka unmapped)
> MALLOC:
> MALLOC: = 22818263200 (21761.2 MiB) Virtual address space used
> MALLOC:
> MALLOC: 21311 Spans in use
> MALLOC: 18 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
>
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
> And sometimes even better (taken later than the above):
> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>
> MALLOC: 516434072 ( 492.5 MiB) Bytes in use by application
> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
> MALLOC: + 7564936 ( 7.2 MiB) Bytes in central cache freelist
> MALLOC: + 2751072 ( 2.6 MiB) Bytes in transfer cache freelist
> MALLOC: + 2707072 ( 2.6 MiB) Bytes in thread cache freelists
> MALLOC: + 2715808 ( 2.6 MiB) Bytes in malloc metadata
> MALLOC:
> MALLOC: = 532172960 ( 507.5 MiB) Actual memory used (physical + swap)
> MALLOC: + 573440 ( 0.5 MiB) Bytes released to OS (aka unmapped)
> MALLOC:
> MALLOC: = 532746400 ( 508.1 MiB) Virtual address space used
> MALLOC:
> MALLOC: 21990 Spans in use
> MALLOC: 16 Thread heaps in use
> MALLOC: 8192 Tcmalloc page size
>
> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
> Bytes released to the OS take up virtual address space but no physical memory.
>
> Greetings!!
>
> 2018-07-19 10:24 GMT+02:00 Daniel Carrasco:
>
>> Hello,
>>
>> Finally I've had to remove CephFS and use a simple NFS, because the MDS
>> daemon starts to use a lot of memory and is unstable. After rebooting one
>> node because it started to swap (the cluster should be able to survive
>> without one node), the cluster went down because one of the other MDSes
>> started to use about 15GB of RAM and crashed all the time, so the cluster
>> was unable to come back. The only solution was to reboot all nodes, and
>> that's not good for HA.
>>
>> If somebody knows something about this, I'll be pleased to test it on a
>> test environment to see if we can find a solution.
>>
>> Greetings!
>>
>> 2018-
Re: [ceph-users] Insane CPU utilization in ceph.fuse
, "osdop_sparse_read": 0, "osdop_clonerange": 0, "osdop_getxattr": 100784, "osdop_setxattr": 0, "osdop_cmpxattr": 0, "osdop_rmxattr": 0, "osdop_resetxattrs": 0, "osdop_tmap_up": 0, "osdop_tmap_put": 0, "osdop_tmap_get": 0, "osdop_call": 0, "osdop_watch": 0, "osdop_notify": 0, "osdop_src_cmpxattr": 0, "osdop_pgls": 0, "osdop_pgls_filter": 0, "osdop_other": 3, "linger_active": 0, "linger_send": 0, "linger_resend": 0, "linger_ping": 0, "poolop_active": 0, "poolop_send": 0, "poolop_resend": 0, "poolstat_active": 0, "poolstat_send": 0, "poolstat_resend": 0, "statfs_active": 0, "statfs_send": 0, "statfs_resend": 0, "command_active": 0, "command_send": 0, "command_resend": 0, "map_epoch": 468, "map_full": 0, "map_inc": 39, "osd_sessions": 3, "osd_session_open": 479, "osd_session_close": 476, "osd_laggy": 0, "omap_wr": 7, "omap_rd": 202074, "omap_del": 1 }, "purge_queue": { "pq_executing_ops": 0, "pq_executing": 0, "pq_executed": 124 }, "throttle-msgr_dispatch_throttler-mds": { "val": 0, "max": 104857600, "get_started": 0, "get": 6140428, "get_sum": 2077944682, "get_or_fail_fail": 0, "get_or_fail_success": 6140428, "take": 0, "take_sum": 0, "put": 6140428, "put_sum": 2077944682, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } }, "throttle-objecter_bytes": { "val": 0, "max": 104857600, "get_started": 0, "get": 0, "get_sum": 0, "get_or_fail_fail": 0, "get_or_fail_success": 0, "take": 136767, "take_sum": 339484250, "put": 136523, "put_sum": 339484250, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } }, "throttle-objecter_ops": { "val": 0, "max": 1024, "get_started": 0, "get": 0, "get_sum": 0, "get_or_fail_fail": 0, "get_or_fail_success": 0, "take": 136767, "take_sum": 136767, "put": 136767, "put_sum": 136767, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } }, "throttle-write_buf_throttle": { "val": 0, "max": 3758096384, "get_started": 0, "get": 124, "get_sum": 11532, "get_or_fail_fail": 0, "get_or_fail_success": 124, "take": 0, "take_sum": 0, "put": 109, "put_sum": 11532, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } }, "throttle-write_buf_throttle-0x55faf5ba4220": { "val": 0, "max": 3758096384, "get_started": 0, "get": 125666, "get_sum": 198900816, "get_or_fail_fail": 0, "get_or_fail_success": 125666, "take": 0, "take_sum": 0, "put": 23473, "put_sum": 198900816, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } } } -- dump_mempools -- { "bloom_filter": { "items": 120, "bytes": 120 }, "bluestore_alloc": { "items": 0, "bytes": 0 }, "bluestore_cache_data": { "items": 0, "bytes": 0 }, "bluestore_cache_onode": { "items": 0, "bytes": 0 }, "bluestore_cache_other": { "items": 0, "bytes": 0 }, "bluestore_fsck": { "items": 0, "bytes": 0 }, "bluestore_txc": { "items": 0, "bytes": 0 }, "bluestore_writing_deferred": { "items": 0, "bytes": 0 }, "bluestore_writing": { "items": 0, "bytes": 0 }, "bluefs": { "items": 0, "bytes": 0 }, "buffer_anon": { "items": 96401, "bytes": 16010198 }, "buffer_meta": { "items": 1, "bytes": 88 }, "osd": { "items": 0, "bytes": 0 }, "osd_mapbl": { "items": 0, "bytes": 0 }, "osd_pglog": { "items": 0, "bytes": 0 }, "osdmap": { "items": 80, "bytes": 3296 }, "osdmap_mapping": { "items": 0, "bytes": 0 }, "pgmap": { "items": 0, "bytes": 0 }, "mds_co": { "items": 17604, "bytes": 2330840 }, "unittest_1": { "items": 0, "bytes": 0 }, "unittest_2": { "items": 0, "bytes": 0 }, "total": { "items": 114206, "bytes": 18344542 } } --- Sorry for my english!. Greetings!! El 23 jul. 
2018 20:08, "Patrick Donnelly" escribió: On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco wrote: > Hi, thanks for your response. > > Clients are about 6, and 4 of them are the most of time on standby. Only two > are active servers that are serving the webpage. Also we've a varnish on > front, so are not getting all the load (below 30% in PHP is not much). > About the MDS cache, now I've the mds_cache_memory_limit at 8Mb. What! Please post `ceph daemon mds. config diff`, `... perf dump`, and `... dump_mempools ` from the server the active MDS is on. > I've tested > also 512Mb, but the CPU usage is the same and the MDS RAM usage grows up to > 15GB (on a 16Gb server it starts to swap and all fails). With 8Mb, at least > the memory usage is stable on less than 6Gb (now is using about 1GB of RAM). We've seen reports of possible memory leaks before and the potential fixes for those were in 12.2.6. How fast does your MDS reach 15GB? Your MDS cache size should be configured to 1-8GB (depending on your preference) so it's disturbing to see you set it so low. -- Patrick Donnelly ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
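Written out with a placeholder daemon name, the diagnostics requested here are admin-socket commands, run on the host of the active MDS:

    ceph daemon mds.<name> config diff
    ceph daemon mds.<name> perf dump
    ceph daemon mds.<name> dump_mempools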
Re: [ceph-users] Insane CPU utilization in ceph.fuse
Hi, thanks for your response.

Clients are about 6, and 4 of them are on standby most of the time. Only two are active servers that are serving the webpage. Also we've got a Varnish in front, so they are not getting all the load (below 30% in PHP is not much). About the MDS cache, now I've got the mds_cache_memory_limit at 8MB. I've also tested 512MB, but the CPU usage is the same and the MDS RAM usage grows up to 15GB (on a 16GB server it starts to swap and everything fails). With 8MB, at least the memory usage is stable at less than 6GB (now it's using about 1GB of RAM).

What catches my attention is the huge difference between kernel and FUSE. Why is the kernel client barely noticeable while the FUSE client uses most of the CPU power...

Greetings.

2018-07-23 14:01 GMT+02:00 Paul Emmerich:

> Hi,
>
> do you happen to have a relatively large number of clients and a
> relatively small cache size on the MDS?
>
> Paul
>
> 2018-07-23 13:16 GMT+02:00 Daniel Carrasco:
>
>> Hello,
>>
>> I've created a Ceph cluster of 3 nodes (3 MONs, 3 OSDs, 3 MGRs and 3
>> MDSes, with two active). This cluster is mainly for serving a webpage
>> (small files) and is configured to keep three copies of every file (a
>> copy on every OSD).
>> My question is about ceph-fuse clients: I've noticed insane CPU usage
>> when the FUSE client is used, while the kernel client's usage is
>> unnoticeable.
>>
>> For example, right now those machines are working with the kernel client
>> and the CPU usage is less than 30% (all used by PHP processes). When I
>> change to ceph-fuse, the CPU usage rises to more than 130% and sometimes
>> even up to 190-200% (on a two-core machine that means burning the CPU).
>>
>> Now I've seen two warnings on the cluster:
>> 1 MDSs report oversized cache
>> 4 clients failing to respond to cache pressure
>>
>> and I think that maybe it's a lack of capabilities in the Ceph kernel
>> module, so I want to give the FUSE module a try, but I have the above
>> problem.
>>
>> My OS is Ubuntu 16.04 x64 with kernel version 4.13.0-45-generic, and the
>> Ceph server/client version is 12.2.7.
>>
>> How can I debug that CPU usage?
>>
>> Thanks!
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
[ceph-users] Insane CPU utilization in ceph.fuse
Hello,

I've created a Ceph cluster of 3 nodes (3 MONs, 3 OSDs, 3 MGRs and 3 MDSes, with two active). This cluster is mainly for serving a webpage (small files) and is configured to keep three copies of every file (a copy on every OSD).

My question is about ceph-fuse clients: I've noticed insane CPU usage when the FUSE client is used, while the kernel client's usage is unnoticeable.

For example, right now those machines are working with the kernel client and the CPU usage is less than 30% (all used by PHP processes). When I change to ceph-fuse, the CPU usage rises to more than 130% and sometimes even up to 190-200% (on a two-core machine that means burning the CPU).

Now I've seen two warnings on the cluster:
1 MDSs report oversized cache
4 clients failing to respond to cache pressure

and I think that maybe it's a lack of capabilities in the Ceph kernel module, so I want to give the FUSE module a try, but I have the above problem.

My OS is Ubuntu 16.04 x64 with kernel version 4.13.0-45-generic, and the Ceph server/client version is 12.2.7.

How can I debug that CPU usage?

Thanks!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
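A way to start narrowing down this kind of client CPU burn (a sketch, not from the thread: standard Linux profiling tools plus Ceph's client debug options, which are very verbose and should only be enabled briefly):

    # see which threads inside ceph-fuse are burning CPU
    top -H -p $(pidof ceph-fuse)
    perf top -p $(pidof ceph-fuse)

    # temporarily raise client logging in ceph.conf on the client
    [client]
    debug client = 20
    debug ms = 1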
Re: [ceph-users] Fwd: MDS memory usage is very high
Hello again,

It is still early to say that it's working fine now, but it looks like the MDS memory is now under 20% of RAM, and most of the time between 6-9%. Maybe it was a mistake in the configuration.

As a note, I've changed this client config:

[global]
...
bluestore_cache_size_ssd = 805306360
bluestore_cache_size_hdd = 805306360
mds_cache_memory_limit = 536870910

[client]
client_reconnect_stale = true
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2000
client_oc_size = 629145600
rbd_cache = true
rbd_cache_size = 671088640

to this (just the cache sizes divided by 10):

[global]
...
bluestore_cache_size_ssd = 80530636
bluestore_cache_size_hdd = 80530636
mds_cache_memory_limit = 53687091

[client]
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2000
client_oc_size = 62914560
rbd_cache = true
rbd_cache_size = 67108864

Now the heap stats are:
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:

MALLOC: 714063568 ( 681.0 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 132992224 ( 126.8 MiB) Bytes in central cache freelist
MALLOC: + 21929920 ( 20.9 MiB) Bytes in transfer cache freelist
MALLOC: + 31806608 ( 30.3 MiB) Bytes in thread cache freelists
MALLOC: + 30666912 ( 29.2 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: = 931459232 ( 888.3 MiB) Actual memory used (physical + swap)
MALLOC: + 21886803968 (20872.9 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: = 22818263200 (21761.2 MiB) Virtual address space used
MALLOC:
MALLOC: 21311 Spans in use
MALLOC: 18 Thread heaps in use
MALLOC: 8192 Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

And sometimes even better (taken later than the above):
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:

MALLOC: 516434072 ( 492.5 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 7564936 ( 7.2 MiB) Bytes in central cache freelist
MALLOC: + 2751072 ( 2.6 MiB) Bytes in transfer cache freelist
MALLOC: + 2707072 ( 2.6 MiB) Bytes in thread cache freelists
MALLOC: + 2715808 ( 2.6 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: = 532172960 ( 507.5 MiB) Actual memory used (physical + swap)
MALLOC: + 573440 ( 0.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: = 532746400 ( 508.1 MiB) Virtual address space used
MALLOC:
MALLOC: 21990 Spans in use
MALLOC: 16 Thread heaps in use
MALLOC: 8192 Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

Greetings!!

2018-07-19 10:24 GMT+02:00 Daniel Carrasco:

> Hello,
>
> Finally I've had to remove CephFS and use a simple NFS, because the MDS
> daemon starts to use a lot of memory and is unstable. After rebooting one
> node because it started to swap (the cluster should be able to survive
> without one node), the cluster went down because one of the other MDSes
> started to use about 15GB of RAM and crashed all the time, so the cluster
> was unable to come back. The only solution was to reboot all nodes, and
> that's not good for HA.
>
> If somebody knows something about this, I'll be pleased to test it on a
> test environment to see if we can find a solution.
>
> Greetings!
>
> 2018-
Re: [ceph-users] Fwd: MDS memory usage is very high
Hello,

Finally I've had to remove CephFS and use a simple NFS, because the MDS daemon starts to use a lot of memory and is unstable. After rebooting one node because it started to swap (the cluster should be able to survive without one node), the cluster went down because one of the other MDSes started to use about 15GB of RAM and crashed all the time, so the cluster was unable to come back. The only solution was to reboot all nodes, and that's not good for HA.

If somebody knows something about this, I'll be pleased to test it on a test environment to see if we can find a solution.

Greetings!

2018-07-19 1:07 GMT+02:00 Daniel Carrasco:

> Thanks again,
>
> I was trying to use the FUSE client instead of the Ubuntu 16.04 kernel
> module to see if maybe it's a client-side problem, but the CPU usage of
> the FUSE client is very high (100% and even more on a two-core machine),
> so I had to revert to the kernel client, which uses much less CPU.
>
> It's a web server, so maybe that's the problem. PHP and Nginx open a lot
> of files, and maybe that uses a lot of RAM.
>
> For now I've rebooted the machine, because it's the only way to free the
> memory, but I cannot restart the machine every few hours...
>
> Greetings!!
>
> 2018-07-19 1:00 GMT+02:00 Gregory Farnum:
>
>> Wow, yep, apparently the MDS has another 9GB of allocated RAM outside of
>> the cache! Hopefully one of the current FS users or devs has some idea.
>> All I can suggest is looking to see if there are a bunch of stuck
>> requests or something that are taking up memory which isn't properly
>> counted.
>>
>> On Wed, Jul 18, 2018 at 3:48 PM Daniel Carrasco wrote:
>>
>>> Hello, thanks for your response.
>>>
>>> This is what I get:
>>>
>>> # ceph tell mds.kavehome-mgto-pro-fs01 heap stats
>>> 2018-07-19 00:43:46.142560 7f5a7a7fc700 0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
>>> 2018-07-19 00:43:46.181133 7f5a7b7fe700 0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
>>> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>>>
>>> MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
>>> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
>>> MALLOC: + 172148208 ( 164.2 MiB) Bytes in central cache freelist
>>> MALLOC: + 19031168 ( 18.1 MiB) Bytes in transfer cache freelist
>>> MALLOC: + 23987552 ( 22.9 MiB) Bytes in thread cache freelists
>>> MALLOC: + 20869280 ( 19.9 MiB) Bytes in malloc metadata
>>> MALLOC:
>>> MALLOC: = 10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
>>> MALLOC: + 3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
>>> MALLOC:
>>> MALLOC: = 14132703392 (13478.0 MiB) Virtual address space used
>>> MALLOC:
>>> MALLOC: 63875 Spans in use
>>> MALLOC: 16 Thread heaps in use
>>> MALLOC: 8192 Tcmalloc page size
>>>
>>> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>>> Bytes released to the OS take up virtual address space but no physical memory.
>>>
>>> I've tried the release command but it keeps using the same memory.
>>>
>>> Greetings!
>>>
>>> 2018-07-19 0:25 GMT+02:00 Gregory Farnum:
>>>
>>>> The MDS thinks it's using 486MB of cache right now, and while that's
>>>> not a complete accounting (I believe you should generally multiply the
>>>> configured cache limit by 1.5 to get a realistic memory consumption
>>>> model), it's obviously a long way from 12.5GB. You might try going in
>>>> with the "ceph daemon" command and looking at the heap stats (I forget
>>>> the exact command, but it will tell you if you run "help" against it)
>>>> and seeing what those say: you may have one of the slightly-broken
>>>> base systems and find that running the "heap release" (or similar
>>>> wording) command will free up a lot of RAM back to the OS!
>>>> -Greg
>>>>
>>>> On Wed, Jul 18, 2018 at 1:53 PM, Daniel Carrasco wrote:
>>>> > Hello,
>>>> >
>>>> > I've created a 3-node cluster with MON, MGR, OSD and MDS on all (2 MDS
>>>> > active), and I've noticed that the MDS is using a lot of memory (just now is
Re: [ceph-users] Fwd: MDS memory usage is very high
Thanks again,

I was trying to use the FUSE client instead of the Ubuntu 16.04 kernel module to see if maybe it's a client-side problem, but the CPU usage of the FUSE client is very high (100% and even more on a two-core machine), so I had to revert to the kernel client, which uses much less CPU.

It's a web server, so maybe that's the problem. PHP and Nginx open a lot of files, and maybe that uses a lot of RAM.

For now I've rebooted the machine, because it's the only way to free the memory, but I cannot restart the machine every few hours...

Greetings!!

2018-07-19 1:00 GMT+02:00 Gregory Farnum:

> Wow, yep, apparently the MDS has another 9GB of allocated RAM outside of
> the cache! Hopefully one of the current FS users or devs has some idea.
> All I can suggest is looking to see if there are a bunch of stuck
> requests or something that are taking up memory which isn't properly
> counted.
>
> On Wed, Jul 18, 2018 at 3:48 PM Daniel Carrasco wrote:
>
>> Hello, thanks for your response.
>>
>> This is what I get:
>>
>> # ceph tell mds.kavehome-mgto-pro-fs01 heap stats
>> 2018-07-19 00:43:46.142560 7f5a7a7fc700 0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
>> 2018-07-19 00:43:46.181133 7f5a7b7fe700 0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
>> mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:
>>
>> MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
>> MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
>> MALLOC: + 172148208 ( 164.2 MiB) Bytes in central cache freelist
>> MALLOC: + 19031168 ( 18.1 MiB) Bytes in transfer cache freelist
>> MALLOC: + 23987552 ( 22.9 MiB) Bytes in thread cache freelists
>> MALLOC: + 20869280 ( 19.9 MiB) Bytes in malloc metadata
>> MALLOC:
>> MALLOC: = 10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
>> MALLOC: + 3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
>> MALLOC:
>> MALLOC: = 14132703392 (13478.0 MiB) Virtual address space used
>> MALLOC:
>> MALLOC: 63875 Spans in use
>> MALLOC: 16 Thread heaps in use
>> MALLOC: 8192 Tcmalloc page size
>>
>> Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
>> Bytes released to the OS take up virtual address space but no physical memory.
>>
>> I've tried the release command but it keeps using the same memory.
>>
>> Greetings!
>>
>> 2018-07-19 0:25 GMT+02:00 Gregory Farnum:
>>
>>> The MDS thinks it's using 486MB of cache right now, and while that's
>>> not a complete accounting (I believe you should generally multiply the
>>> configured cache limit by 1.5 to get a realistic memory consumption
>>> model), it's obviously a long way from 12.5GB. You might try going in
>>> with the "ceph daemon" command and looking at the heap stats (I forget
>>> the exact command, but it will tell you if you run "help" against it)
>>> and seeing what those say: you may have one of the slightly-broken
>>> base systems and find that running the "heap release" (or similar
>>> wording) command will free up a lot of RAM back to the OS!
>>> -Greg
>>>
>>> On Wed, Jul 18, 2018 at 1:53 PM, Daniel Carrasco wrote:
>>> > Hello,
>>> >
>>> > I've created a 3-node cluster with MON, MGR, OSD and MDS on all (2 MDS
>>> > active), and I've noticed that the MDS is using a lot of memory (just
>>> > now it's using 12.5GB of RAM):
>>> > # ceph daemon mds.kavehome-mgto-pro-fs01 dump_mempools | jq -c '.mds_co'; ceph daemon mds.kavehome-mgto-pro-fs01 perf dump | jq '.mds_mem.rss'
>>> > {"items":9272259,"bytes":510032260}
>>> > 12466648
>>> >
>>> > I've configured the limit:
>>> > mds_cache_memory_limit = 536870912
>>> >
>>> > But it looks like it's ignored, because that's about 512MB and it's
>>> > using a lot more.
>>> >
>>> > Is there any way to limit the memory usage of the MDS? It's causing a
>>> > lot of trouble because it starts to swap.
>>> > Maybe I have to limit the cached inodes?
>>> >
>>> > The other active MDS is using a lot less memory (2.5GB), but it's also using
Re: [ceph-users] Fwd: MDS memory usage is very high
Hello, thanks for your response.

This is what I get:

# ceph tell mds.kavehome-mgto-pro-fs01 heap stats
2018-07-19 00:43:46.142560 7f5a7a7fc700 0 client.1318388 ms_handle_reset on 10.22.0.168:6800/1129848128
2018-07-19 00:43:46.181133 7f5a7b7fe700 0 client.1318391 ms_handle_reset on 10.22.0.168:6800/1129848128
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:

MALLOC: 9982980144 ( 9520.5 MiB) Bytes in use by application
MALLOC: + 0 ( 0.0 MiB) Bytes in page heap freelist
MALLOC: + 172148208 ( 164.2 MiB) Bytes in central cache freelist
MALLOC: + 19031168 ( 18.1 MiB) Bytes in transfer cache freelist
MALLOC: + 23987552 ( 22.9 MiB) Bytes in thread cache freelists
MALLOC: + 20869280 ( 19.9 MiB) Bytes in malloc metadata
MALLOC:
MALLOC: = 10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
MALLOC: + 3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:
MALLOC: = 14132703392 (13478.0 MiB) Virtual address space used
MALLOC:
MALLOC: 63875 Spans in use
MALLOC: 16 Thread heaps in use
MALLOC: 8192 Tcmalloc page size

Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.

I've tried the release command but it keeps using the same memory.

Greetings!

2018-07-19 0:25 GMT+02:00 Gregory Farnum:

> The MDS thinks it's using 486MB of cache right now, and while that's
> not a complete accounting (I believe you should generally multiply the
> configured cache limit by 1.5 to get a realistic memory consumption
> model), it's obviously a long way from 12.5GB. You might try going in
> with the "ceph daemon" command and looking at the heap stats (I forget
> the exact command, but it will tell you if you run "help" against it)
> and seeing what those say: you may have one of the slightly-broken
> base systems and find that running the "heap release" (or similar
> wording) command will free up a lot of RAM back to the OS!
> -Greg
>
> On Wed, Jul 18, 2018 at 1:53 PM, Daniel Carrasco wrote:
> > Hello,
> >
> > I've created a 3-node cluster with MON, MGR, OSD and MDS on all (2 MDS
> > active), and I've noticed that the MDS is using a lot of memory (just
> > now it's using 12.5GB of RAM):
> > # ceph daemon mds.kavehome-mgto-pro-fs01 dump_mempools | jq -c '.mds_co'; ceph daemon mds.kavehome-mgto-pro-fs01 perf dump | jq '.mds_mem.rss'
> > {"items":9272259,"bytes":510032260}
> > 12466648
> >
> > I've configured the limit:
> > mds_cache_memory_limit = 536870912
> >
> > But it looks like it's ignored, because that's about 512MB and it's
> > using a lot more.
> >
> > Is there any way to limit the memory usage of the MDS? It's causing a
> > lot of trouble because it starts to swap.
> > Maybe I have to limit the cached inodes?
> >
> > The other active MDS is using a lot less memory (2.5GB), but it's also
> > using more than 512MB. The standby MDS is not using memory at all.
> >
> > I'm using: ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable).
> >
> > Thanks!!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
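The "heap release" command Greg refers to does exist as a tell command; with this thread's daemon name it would be:

    ceph tell mds.kavehome-mgto-pro-fs01 heap release

It asks tcmalloc to return freelist memory to the OS. As the stats above show, it won't always help: the ~9.5GB here is reported as "Bytes in use by application", not as freeable freelist memory.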
[ceph-users] Fwd: MDS memory usage is very high
Hello,

I've created a 3-node cluster with MON, MGR, OSD and MDS on every node (2 MDS active), and I've noticed that the MDS is using a lot of memory (just now it's using 12.5GB of RAM):

# ceph daemon mds.kavehome-mgto-pro-fs01 dump_mempools | jq -c '.mds_co'; ceph daemon mds.kavehome-mgto-pro-fs01 perf dump | jq '.mds_mem.rss'
{"items":9272259,"bytes":510032260}
12466648

I've configured the limit:
mds_cache_memory_limit = 536870912

But it looks like it's ignored, because that's about 512MB and it's using a lot more.

Is there any way to limit the memory usage of the MDS? It's causing a lot of trouble because it starts to swap. Maybe I have to limit the cached inodes?

The other active MDS is using a lot less memory (2.5GB), but it's also using more than 512MB. The standby MDS is not using memory at all.

I'm using: ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable).

Thanks!!

--
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
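A quick way to check what limit the running daemon actually picked up, and how much memory the cache accounts for, is via the admin socket (a sketch using the same daemon name as above):

    ceph daemon mds.kavehome-mgto-pro-fs01 config get mds_cache_memory_limit
    ceph daemon mds.kavehome-mgto-pro-fs01 perf dump | jq '.mds_mem'

Note that in the mempool output above, mds_co holds about 510MB, right at the configured limit, so the cache itself is honoring the limit and the extra RSS comes from elsewhere (as discussed in the replies above).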
Re: [ceph-users] Slow clients after git pull
Hello, Some data is not in git repository and also needs to be updated on all servers at same time (uploads...), that's why I'm searching for a centralized solution. I think I've found a "patch" to do it... All our server are connected to a manager, so I've created a task in that managet to stop nginx, umount the FS, remount the FS and then start the Nginx when the git repository is deployed. Look like it works as expected and with the cache I'm planing to add to webpage the impact should be minimal. Thaks and greetings!!. 2018-03-01 16:28 GMT+01:00 David Turner <drakonst...@gmail.com>: > This removes ceph completely, or any other networked storage, but git has > triggers. If your website is stopped in git and you just need to make sure > that nginx always has access to the latest data, just configure git > triggers to auto-update the repository when there is a commit to the > repository from elsewhere. This would be on local storage and remove a lot > of complexity. All front-end servers would update automatically via git. > > If something like that doesn't work, it would seem you have a workaround > that works for you. > > > On Thu, Mar 1, 2018, 10:12 AM Daniel Carrasco <d.carra...@i2tic.com> > wrote: > >> Hello, >> >> Our problem is that the webpage is on a autoscaling group, so the created >> machine is not always updated and needs to have the latest data always. >> I've tried several ways to do it: >> >>- Local Storage synced: Sometimes the sync fails and data is not >>updated >>- NFS: If NFS server goes down, all clients die >>- Two NFS Server synced+Monit: when a NFS server is down umount >>freezes and is not able to change to the other NFS server >>- GlusterFS: Too slow for webpages >> >> CephFS is near to NFS on speed and have auto recovery if one node goes >> down (clients connects to other MDS automatically). >> >> About to use RBD, my problem is that I need a FS, because Nginx is not >> able to read directly from Ceph in other ways. >> About S3 and similar, I've also tried AWS NFS method but is much slower >> (even more than GlusterFS). >> >> My problem is that CephFS fits what I need. >> >> Doing tests I've noticed that maybe the file is updated on ceph node >> while client has file sessions open, so until I remount the FS that >> sessions continue opened. When I open the files with vim I notice that is a >> bit slower while is updating the repository, but after the update it works >> as fast as before. >> >> It fails even on Jewel so I think that maybe the only way to do it is to >> create a task to remount the FS when I deploy. >> >> Greetings and thanks!! >> >> >> 2018-03-01 15:29 GMT+01:00 David Turner <drakonst...@gmail.com>: >> >>> Using CephFS for something like this is about the last thing I would >>> do. Does it need to be on a networked posix filesystem that can be mounted >>> on multiple machines at the same time? If so, then you're kinda stuck and >>> we can start looking at your MDS hardware and see if there are any MDS >>> settings that need to be configured differently for this to work. >>> >>> If you don't NEED CephFS, then I would recommend utilizing an RBD for >>> something like this. Its limitation is only being able to be mapped to 1 >>> server at a time, but that's decent enough for most failover scenarios for >>> build setups. If you need to failover, unmap it from the primary and map >>> it to another server to resume workloads. >>> >>> Hosting websites out of CephFS also seems counter-intuitive. Have you >>> looked at S3 websites? 
RGW supports configuring websites out of a bucket >>> that might be of interest. Your RGW daemon configuration could easily >>> become an HA website with an LB in front of them. >>> >>> I'm biased here a bit, but I don't like to use networked filesystems >>> unless nothing else can be worked out or the software using it is 3rd party >>> and just doesn't support anything else. >>> >>> On Thu, Mar 1, 2018 at 9:05 AM Daniel Carrasco <d.carra...@i2tic.com> >>> wrote: >>> >>>> Hello, >>>> >>>> I've tried to change a lot of things on configuration and use ceph-fuse >>>> but nothing makes it work better... When I deploy the git repository it >>>> becomes much slower until I remount the FS (just executing systemctl stop >>>> nginx && umount /mnt/ceph && mount -a && systemctl start nginx). It happen >>>>
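For anyone wanting to script the same workaround, here is a minimal sketch of the deploy hook described above. The mount point /mnt/ceph and the nginx unit name come from the thread itself; the fstab entry and error handling are assumptions, so adjust them to your setup.

#!/bin/sh
# Hypothetical deploy hook: remount CephFS after a git deploy so clients
# drop their stale sessions. Assumes /mnt/ceph has an /etc/fstab entry.
set -e
systemctl stop nginx     # stop the web server so nothing holds the mount busy
umount /mnt/ceph         # close the old client session and release its caps
mount -a                 # remount from the fstab entry
systemctl start nginx    # bring the web server back up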
Re: [ceph-users] Slow clients after git pull
Hello,

Our problem is that the webpage is in an autoscaling group, so a freshly created machine is not always updated and it always needs the latest data. I've tried several ways to do it:

- Local storage synced: sometimes the sync fails and data is not updated
- NFS: if the NFS server goes down, all clients die
- Two NFS servers synced + Monit: when an NFS server is down, umount freezes and the client is not able to switch to the other NFS server
- GlusterFS: too slow for webpages

CephFS is close to NFS in speed and recovers automatically if one node goes down (clients connect to another MDS automatically).

About using RBD, my problem is that I need a FS, because Nginx is not able to read directly from Ceph any other way. About S3 and similar, I've also tried the AWS NFS method, but it is much slower (even more than GlusterFS).

My problem is that CephFS fits what I need.

Doing tests I've noticed that the files may be updated on the Ceph node while a client still has file sessions open, so until I remount the FS those sessions stay open. When I open the files with vim I notice it is a bit slower while the repository is updating, but after the update it works as fast as before.

It fails even on Jewel, so I think maybe the only way to fix it is to create a task that remounts the FS when I deploy.

Greetings and thanks!!

2018-03-01 15:29 GMT+01:00 David Turner <drakonst...@gmail.com>: > Using CephFS for something like this is about the last thing I would do. > Does it need to be on a networked posix filesystem that can be mounted on > multiple machines at the same time? If so, then you're kinda stuck and we > can start looking at your MDS hardware and see if there are any MDS > settings that need to be configured differently for this to work. > > If you don't NEED CephFS, then I would recommend utilizing an RBD for > something like this. Its limitation is only being able to be mapped to 1 > server at a time, but that's decent enough for most failover scenarios for > build setups. If you need to failover, unmap it from the primary and map > it to another server to resume workloads. > > Hosting websites out of CephFS also seems counter-intuitive. Have you > looked at S3 websites? RGW supports configuring websites out of a bucket > that might be of interest. Your RGW daemon configuration could easily > become an HA website with an LB in front of them. > > I'm biased here a bit, but I don't like to use networked filesystems > unless nothing else can be worked out or the software using it is 3rd party > and just doesn't support anything else. > > On Thu, Mar 1, 2018 at 9:05 AM Daniel Carrasco <d.carra...@i2tic.com> > wrote: > >> Hello, >> >> I've tried to change a lot of things on configuration and use ceph-fuse >> but nothing makes it work better... When I deploy the git repository it >> becomes much slower until I remount the FS (just executing systemctl stop >> nginx && umount /mnt/ceph && mount -a && systemctl start nginx). It happen >>
>>> My problem comes when I deploy a git repository on that FS. The server >>> makes a lot of IOPS to check the files that have to update and then all >>> clients starts to have problems to use the FS (it becomes much slower). >>> In a normal usage the web takes about 400ms to load, and when the >>> problem start it takes more than 3s. To fix the problem I just have to >>> remount the FS on clients, but I can't remount the FS on every deploy... >>> >>> While is deploying I see how the CPU on MDS is a bit higher, but when it >>> ends the CPU usage goes down again, so look like is not a problem of CPU. >>> >>> My config file is: >>> [global] >>> fsid = bf56854..e611c08 >>> mon_initial_members = fs-01, fs-02, fs-03 >>> mon_host = 10.50.0.94,10.50.1.216,10.50.2.52 >>> auth_cluster_required = cephx >>> auth_service_required = cephx >>> auth_client_required = cephx >>> >>> public network = 10.50.0.0/22 >>> osd pool default size = 3 >>> >>> ## >>> ### OSD
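As a rough sketch of the RBD failover David describes above: the image name (webdata), pool and mount point below are hypothetical, only the map/unmap flow is the point.

# On the old primary (if it is still reachable):
umount /mnt/web
rbd unmap /dev/rbd0

# On the server taking over:
rbd map webdata --pool rbd    # prints the device it mapped, e.g. /dev/rbd0
mount /dev/rbd0 /mnt/web      # single-writer filesystem: one mapper at a time
systemctl start nginx

RBD bypasses the MDS entirely, which is why it sidesteps the session problem, at the cost of the one-mapper-at-a-time limitation.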
Re: [ceph-users] Slow clients after git pull
Hello, I've tried to change a lot of things on configuration and use ceph-fuse but nothing makes it work better... When I deploy the git repository it becomes much slower until I remount the FS (just executing systemctl stop nginx && umount /mnt/ceph && mount -a && systemctl start nginx). It happen when the FS gets a lot of IO because when I execute Rsync I got the same problem. I'm thinking about to downgrade to a lower version of ceph like for example jewel to see if works better. I know that will be deprecated soon, but I don't know what other tests I can do... Greetings!! 2018-02-28 17:11 GMT+01:00 Daniel Carrasco <d.carra...@i2tic.com>: > Hello, > > I've created a Ceph cluster with 3 nodes and a FS to serve a webpage. The > webpage speed is good enough (near to NFS speed), and have HA if one FS die. > My problem comes when I deploy a git repository on that FS. The server > makes a lot of IOPS to check the files that have to update and then all > clients starts to have problems to use the FS (it becomes much slower). > In a normal usage the web takes about 400ms to load, and when the problem > start it takes more than 3s. To fix the problem I just have to remount the > FS on clients, but I can't remount the FS on every deploy... > > While is deploying I see how the CPU on MDS is a bit higher, but when it > ends the CPU usage goes down again, so look like is not a problem of CPU. > > My config file is: > [global] > fsid = bf56854..e611c08 > mon_initial_members = fs-01, fs-02, fs-03 > mon_host = 10.50.0.94,10.50.1.216,10.50.2.52 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > > public network = 10.50.0.0/22 > osd pool default size = 3 > > ## > ### OSD > ## > [osd] > osd_mon_heartbeat_interval = 5 > osd_mon_report_interval_max = 10 > osd_heartbeat_grace = 15 > osd_fast_fail_on_connection_refused = True > osd_pool_default_pg_num = 128 > osd_pool_default_pgp_num = 128 > osd_pool_default_size = 2 > osd_pool_default_min_size = 2 > > ## > ### Monitores > ## > [mon] > mon_osd_min_down_reporters = 1 > > ## > ### MDS > ## > [mds] > mds_cache_memory_limit = 792723456 > mds_bal_mode = 1 > > ## > ### Client > ## > [client] > client_cache_size = 32768 > client_mount_timeout = 30 > client_oc_max_objects = 2000 > client_oc_size = 629145600 > client_permissions = false > rbd_cache = true > rbd_cache_size = 671088640 > > My cluster and clients uses Debian 9 with latest ceph version (12.2.4). > The clients uses kernel modules to mount the share, because are a bit > faster than fuse modules. The deploy is done on one of the Ceph nodes, that > have the FS mounted by kernel module too. > My cluster is not a high usage cluster, so have all daemons on one machine > (3 machines with OSD, MON, MGR and MDS). All OSD has a copy of the data, > only one MGR is active and two of the MDS are active with one on standby. > The clients mount the FS using the three MDS IP addresses and just now > don't have any request because is not published. > > Someone knows what can be happening?, because all works fine (even on > other cluster I did with an high load), but just deploy the git repository > and all start to work very slow. > > Thanks!! > > > -- > _____ > > Daniel Carrasco Marín > Ingeniería para la Innovación i2TIC, S.L. > Tlf: +34 911 12 32 84 Ext: 223 > www.i2tic.com > _ > -- _ Daniel Carrasco Marín Ingeniería para la Innovación i2TIC, S.L. 
Tlf: +34 911 12 32 84 Ext: 223 www.i2tic.com _ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
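For reference, a kernel-client mount against the three monitors from the config in this thread would look something like the following; the secret file path is an assumption.

mount -t ceph 10.50.0.94,10.50.1.216,10.50.2.52:/ /mnt/ceph \
      -o name=admin,secretfile=/etc/ceph/admin.secret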
[ceph-users] Slow clients after git pull
Hello,

I've created a Ceph cluster with 3 nodes and a FS to serve a webpage. The webpage speed is good enough (close to NFS speed), and it has HA if one FS server dies.

My problem comes when I deploy a git repository on that FS. The server does a lot of IOPS to check which files have to be updated, and then all clients start having problems using the FS (it becomes much slower). Under normal usage the web takes about 400 ms to load, and when the problem starts it takes more than 3 s. To fix the problem I just have to remount the FS on the clients, but I can't remount the FS on every deploy...

While it is deploying I see the CPU usage on the MDS go a bit higher, but when it ends the CPU usage goes down again, so it doesn't look like a CPU problem.

My config file is:

[global]
fsid = bf56854..e611c08
mon_initial_members = fs-01, fs-02, fs-03
mon_host = 10.50.0.94,10.50.1.216,10.50.2.52
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public network = 10.50.0.0/22
osd pool default size = 3

##
### OSD
##
[osd]
osd_mon_heartbeat_interval = 5
osd_mon_report_interval_max = 10
osd_heartbeat_grace = 15
osd_fast_fail_on_connection_refused = True
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 2
osd_pool_default_min_size = 2

##
### Monitors
##
[mon]
mon_osd_min_down_reporters = 1

##
### MDS
##
[mds]
mds_cache_memory_limit = 792723456
mds_bal_mode = 1

##
### Client
##
[client]
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2000
client_oc_size = 629145600
client_permissions = false
rbd_cache = true
rbd_cache_size = 671088640

My cluster and clients use Debian 9 with the latest Ceph version (12.2.4). The clients use the kernel module to mount the share, because it is a bit faster than the fuse module. The deploy is done on one of the Ceph nodes, which has the FS mounted through the kernel module too.

My cluster is not a high-usage cluster, so all daemons share the machines (3 machines, each with OSD, MON, MGR and MDS). Every OSD has a copy of the data, only one MGR is active, and two of the MDS are active with one on standby. The clients mount the FS using the three nodes' IP addresses, and right now they receive no requests because the site is not published.

Does someone know what could be happening? Everything works fine (even on another cluster I built with a high load), but as soon as I deploy the git repository everything starts to work very slowly.

Thanks!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
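When the slowdown appears, the MDS admin socket can show which client sessions hold capabilities and which requests are stuck. A sketch, assuming the default admin socket location and the fs-01 naming from the config above:

# Run on the node hosting the active MDS:
ceph daemon mds.fs-01 session ls           # client sessions and caps held
ceph daemon mds.fs-01 dump_ops_in_flight   # requests currently in flight/blocked
ceph daemonperf mds.fs-01                  # live per-second MDS counters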
Re: [ceph-users] Balanced MDS, all as active and recommended client settings.
Finally, I've changed the configuration to the following:

##
### MDS
##
[mds]
mds_cache_memory_limit = 792723456
mds_bal_mode = 1

##
### Client
##
[client]
client_cache_size = 32768
client_mount_timeout = 30
client_oc_max_objects = 2
client_oc_size = 1048576000
client_permissions = false
client_quota = false
rbd_cache = true
rbd_cache_size = 671088640

I've disabled client_permissions and client_quota because the cluster is only for the webpage and the network is isolated, so it doesn't need to check permissions every time; and I've disabled the quota check because there are no quotas on this cluster. This should lower the requests to the MDS and the CPU usage, right?

Greetings!!

2018-02-22 19:34 GMT+01:00 Patrick Donnelly <pdonn...@redhat.com>: > On Wed, Feb 21, 2018 at 11:17 PM, Daniel Carrasco <d.carra...@i2tic.com> > wrote: > > I want to search also if there is any way to cache file metadata on > client, > > to lower the MDS load. I suppose that files are cached but the client > check > > with MDS if there are changes on files. On my server files are the most > of > > time read-only so MDS data can be also cached for a while. > > The MDS issues capabilities that allow clients to coherently cache > metadata. > > -- > Patrick Donnelly >

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
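One caveat worth noting: these client_* options only apply to ceph-fuse/libcephfs clients; the kernel client takes its options from the mount command line instead. To confirm a running ceph-fuse client actually picked them up, its admin socket can be queried. A sketch (the .asok name embeds the client pid, which is a placeholder here):

# 12345 is a placeholder pid; list /var/run/ceph to find the real socket
ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok \
     config get client_permissions
ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok \
     config get client_quota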
Re: [ceph-users] Balanced MDS, all as active and recommended client settings.
Thanks, I'll check it.

I also want to find out whether there is any way to cache file metadata on the client, to lower the MDS load. I suppose the files themselves are cached, but the client still checks with the MDS whether files have changed. On my server the files are read-only most of the time, so MDS data could also be cached for a while.

Greetings!!

On Feb 22, 2018 at 3:59, "Patrick Donnelly" <pdonn...@redhat.com> wrote: > Hello Daniel, > > On Wed, Feb 21, 2018 at 10:26 AM, Daniel Carrasco <d.carra...@i2tic.com> > wrote: > > Is possible to make a better distribution on the MDS load of both nodes?. > > We are aware of bugs with the balancer which are being worked on. You > can also manually create a partition if the workload can benefit: > > https://ceph.com/community/new-luminous-cephfs-subtree-pinning/ > > > Is posible to set all nodes as Active without problems? > > No. I recommend you read the docs carefully: > > http://docs.ceph.com/docs/master/cephfs/multimds/ > > > My last question is if someone can recomend me a good client > configuration > > like cache size, and maybe something to lower the metadata servers load. > > >> > >> ## > >> [mds] > >> mds_cache_size = 25 > >> mds_cache_memory_limit = 792723456 > > You should only specify one of those. See also: > > http://docs.ceph.com/docs/master/cephfs/cache-size-limits/ > > -- > Patrick Donnelly > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
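The subtree pinning from the link Patrick posted is driven by an extended attribute on directories; a small example with hypothetical paths:

# Pin /mnt/ceph/site-a to MDS rank 0 and /mnt/ceph/site-b to rank 1:
setfattr -n ceph.dir.pin -v 0 /mnt/ceph/site-a
setfattr -n ceph.dir.pin -v 1 /mnt/ceph/site-b
# A value of -1 returns the directory to the normal balancer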
Re: [ceph-users] Balanced MDS, all as active and recommended client settings.
2018-02-21 19:26 GMT+01:00 Daniel Carrasco <d.carra...@i2tic.com>: > Hello, > > I've created a Ceph cluster with 3 nodes to serve files to an high traffic > webpage. I've configured two MDS as active and one as standby, but after > add the new system to production I've noticed that MDS are not balanced and > one server get the most of clients petitions (One MDS about 700 or less vs > 4.000 or more the other). > > Is possible to make a better distribution on the MDS load of both nodes?. > Is posible to set all nodes as Active without problems? > > I know that is possible to set max_mds to 3 and all will be active, but I > want to know what happen if one node goes down for example, or if there are > another side effects. > > > My last question is if someone can recomend me a good client configuration > like cache size, and maybe something to lower the metadata servers load. > > > Thanks!! > I forgot to say my configuration xD. I've a three nodes cluster with AIO: - 3 Monitors - 3 OSD - 3 MDS (2 actives and one standby) - 3 MGR (1 active) The data has 3 copies, so is in every node. My configuration file is: [global] fsid = BlahBlahBlah mon_initial_members = fs-01, fs-02, fs-03 mon_host = 192.168.4.199,192.168.4.200,192.168.4.201 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx public network = 192.168.4.0/24 osd pool default size = 3 ## ### OSD ## [osd] osd_pool_default_pg_num = 128 osd_pool_default_pgp_num = 128 osd_pool_default_size = 3 osd_pool_default_min_size = 2 osd_mon_heartbeat_interval = 5 osd_mon_report_interval_max = 10 osd_heartbeat_grace = 15 osd_fast_fail_on_connection_refused = True ## ### MON ## [mon] mon_osd_min_down_reporters = 2 ## ### MDS ## [mds] mds_cache_size = 25 mds_cache_memory_limit = 792723456 ## ### Client ## [client] client_cache_size = 32768 client_mount_timeout = 30 client_oc_max_objects = 2000 client_oc_size = 629145600 rbd_cache = true rbd_cache_size = 671088640 Thanks!!! -- _ Daniel Carrasco Marín Ingeniería para la Innovación i2TIC, S.L. Tlf: +34 911 12 32 84 Ext: 223 www.i2tic.com _ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Balanced MDS, all as active and recommended client settings.
Hello,

I've created a Ceph cluster with 3 nodes to serve files for a high-traffic webpage. I've configured two MDS as active and one as standby, but after adding the new system to production I've noticed that the MDS are not balanced and one server gets most of the client requests (about 700 or fewer on one MDS vs 4,000 or more on the other).

Is it possible to distribute the MDS load better across both nodes? Is it possible to set all nodes as active without problems?

I know it is possible to set max_mds to 3 so all of them become active, but I want to know what happens if one node goes down, for example, or whether there are other side effects.

My last question is whether someone can recommend a good client configuration, like cache sizes, and maybe something to lower the metadata servers' load.

Thanks!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
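For the record, on Luminous the number of active MDS daemons is set per filesystem; assuming the filesystem is named cephfs, something like:

ceph fs set cephfs allow_multimds true   # needed once on Luminous
ceph fs set cephfs max_mds 2             # two actives; the third MDS stays standby

If one active rank fails, a standby takes over that rank; the tradeoff is that every extra active rank is one more daemon whose failure triggers a takeover.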
Re: [ceph-users] OSD are marked as down after jewel -> luminous upgrade
Finally i've disabled the mon_osd_report_timeout option and seems to works fine. Greetings!. 2017-10-17 19:02 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: > Thanks!! > > I'll take a look later. > > Anyway, all my Ceph daemons are in same version on all nodes (I've > upgraded the whole cluster). > > Cheers!! > > > El 17 oct. 2017 6:39 p. m., "Marc Roos" <m.r...@f1-outsourcing.eu> > escribió: > > Did you check this? > > https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39886.html > > > > > > > > > -Original Message- > From: Daniel Carrasco [mailto:d.carra...@i2tic.com] > Sent: dinsdag 17 oktober 2017 17:49 > To: ceph-us...@ceph.com > Subject: [ceph-users] OSD are marked as down after jewel -> luminous > upgrade > > Hello, > > Today I've decided to upgrade my Ceph cluster to latest LTS version. To > do it I've used the steps posted on release notes: > http://ceph.com/releases/v12-2-0-luminous-released/ > > After upgrade all the daemons I've noticed that all OSD daemons are > marked as down even when all are working, so the cluster becomes down. > > Maybe the problem is the command "ceph osd require-osd-release > luminous", but all OSD are on Luminous version. > > > - > > > - > > # ceph versions > { > "mon": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 3 > }, > "mgr": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 3 > }, > "osd": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 2 > }, > "mds": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 2 > }, > "overall": { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 10 > } > } > > > - > > > - > > # ceph osd versions > { > "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) > luminous (stable)": 2 } > > # ceph osd tree > > ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF > -1 0.08780 root default > -2 0.04390 host alantra_fs-01 > 0 ssd 0.04390 osd.0 up 1.0 1.0 > -3 0.04390 host alantra_fs-02 > 1 ssd 0.04390 osd.1 up 1.0 1.0 > -4 0 host alantra_fs-03 > > > - > > > - > > # ceph -s > cluster: > id: 5f8e66b5-1adc-4930-b5d8-c0f44dc2037e > health: HEALTH_WARN > nodown flag(s) set > > services: > mon: 3 daemons, quorum alantra_fs-02,alantra_fs-01,alantra_fs-03 > mgr: alantra_fs-03(active), standbys: alantra_fs-01, alantra_fs-02 > mds: cephfs-1/1/1 up {0=alantra_fs-01=up:active}, 1 up:standby > osd: 2 osds: 2 up, 2 in > flags nodown > > data: > pools: 3 pools, 192 pgs > objects: 40177 objects, 3510 MB > usage: 7486 MB used, 84626 MB / 92112 MB avail > pgs: 192 active+clean > > io: > client: 564 kB/s rd, 767 B/s wr, 33 op/s rd, 0 op/s wr > > > - > > > - > Log: > 2017-10-17 16:15:25.466807 mon.alantra_fs-02 [INF] osd.0 marked down > after no beacon for 29.864632 seconds > 2017-10-17 16:15:25.467557 mon.alantra_fs-02 [WRN] Health check failed: > 1 osds down (OSD_DOWN) > 2017-10-17 16:15:25.467587 mon.alantra_fs-02 [WRN] Health check failed: > 1 host (1 osds) down (OSD_HOST_DOWN) > 2017-10-17 16:15:27.494526 mon.alantra_fs-02 [WRN] Health check failed: > Degraded data redundancy: 63 pgs unclean (PG_DEGRADED) > 2017-10-17 16:15:27.5
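The likely explanation for why removing the option helps: since Luminous, OSDs report liveness to the monitors with beacons sent every osd_beacon_report_interval (300 s by default), so the mon_osd_report_timeout = 25 from the original config (quoted below in this thread) marks perfectly healthy OSDs down between beacons. A sketch of a safe [mon] section, assuming otherwise default settings:

[mon]
mon_allow_pool_delete = false
mon_osd_min_down_reporters = 1
# mon_osd_report_timeout removed: the default (900 s) stays safely above
# the OSD beacon interval (osd_beacon_report_interval, default 300 s)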
Re: [ceph-users] OSD are marked as down after jewel -> luminous upgrade
Thanks!! I'll take a look later. Anyway, all my Ceph daemons are in same version on all nodes (I've upgraded the whole cluster). Cheers!! El 17 oct. 2017 6:39 p. m., "Marc Roos" <m.r...@f1-outsourcing.eu> escribió: Did you check this? https://www.mail-archive.com/ceph-users@lists.ceph.com/msg39886.html -Original Message----- From: Daniel Carrasco [mailto:d.carra...@i2tic.com] Sent: dinsdag 17 oktober 2017 17:49 To: ceph-us...@ceph.com Subject: [ceph-users] OSD are marked as down after jewel -> luminous upgrade Hello, Today I've decided to upgrade my Ceph cluster to latest LTS version. To do it I've used the steps posted on release notes: http://ceph.com/releases/v12-2-0-luminous-released/ After upgrade all the daemons I've noticed that all OSD daemons are marked as down even when all are working, so the cluster becomes down. Maybe the problem is the command "ceph osd require-osd-release luminous", but all OSD are on Luminous version. - - # ceph versions { "mon": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3 }, "mgr": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3 }, "osd": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2 }, "mds": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2 }, "overall": { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 10 } } - - # ceph osd versions { "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 2 } # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0.08780 root default -2 0.04390 host alantra_fs-01 0 ssd 0.04390 osd.0 up 1.0 1.0 -3 0.04390 host alantra_fs-02 1 ssd 0.04390 osd.1 up 1.0 1.0 -4 0 host alantra_fs-03 - - # ceph -s cluster: id: 5f8e66b5-1adc-4930-b5d8-c0f44dc2037e health: HEALTH_WARN nodown flag(s) set services: mon: 3 daemons, quorum alantra_fs-02,alantra_fs-01,alantra_fs-03 mgr: alantra_fs-03(active), standbys: alantra_fs-01, alantra_fs-02 mds: cephfs-1/1/1 up {0=alantra_fs-01=up:active}, 1 up:standby osd: 2 osds: 2 up, 2 in flags nodown data: pools: 3 pools, 192 pgs objects: 40177 objects, 3510 MB usage: 7486 MB used, 84626 MB / 92112 MB avail pgs: 192 active+clean io: client: 564 kB/s rd, 767 B/s wr, 33 op/s rd, 0 op/s wr - - Log: 2017-10-17 16:15:25.466807 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 29.864632 seconds 2017-10-17 16:15:25.467557 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN) 2017-10-17 16:15:25.467587 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN) 2017-10-17 16:15:27.494526 mon.alantra_fs-02 [WRN] Health check failed: Degraded data redundancy: 63 pgs unclean (PG_DEGRADED) 2017-10-17 16:15:27.501956 mon.alantra_fs-02 [INF] Health check cleared: OSD_DOWN (was: 1 osds down) 2017-10-17 16:15:27.501997 mon.alantra_fs-02 [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down) 2017-10-17 16:15:27.502012 mon.alantra_fs-02 [INF] Cluster is now healthy 2017-10-17 16:15:27.518798 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot 2017-10-17 16:15:26.414023 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running 2017-10-17 16:15:30.470477 mon.alantra_fs-02 [INF] osd.1 marked down after no beacon for 25.007336 seconds 2017-10-17 16:15:30.471014 mon.alantra_fs-02 [WRN] Health check failed: 1 osds down (OSD_DOWN) 2017-10-17 16:15:30.471047 mon.alantra_fs-02 [WRN] Health check failed: 1 host (1 
osds) down (OSD_HOST_DOWN) 2017-10-17 16:15:30.532427 mon.alantra_fs-02 [WRN] overall HEALTH
[ceph-users] OSD are marked as down after jewel -> luminous upgrade
-02 [INF] mon.2 10.20.1.216:6789/0 2017-10-17 16:17:16.885662 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 96 pgs unclean, 192 pgs degraded (PG_DEGRADED) 2017-10-17 16:17:25.528348 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.004060 seconds 2017-10-17 16:17:25.528960 mon.alantra_fs-02 [WRN] Health check update: 2 osds down (OSD_DOWN) 2017-10-17 16:17:25.528991 mon.alantra_fs-02 [WRN] Health check update: 3 hosts (2 osds) down (OSD_HOST_DOWN) 2017-10-17 16:17:25.529011 mon.alantra_fs-02 [WRN] Health check failed: 1 root (2 osds) down (OSD_ROOT_DOWN) 2017-10-17 16:17:26.544228 mon.alantra_fs-02 [INF] Health check cleared: OSD_ROOT_DOWN (was: 1 root (2 osds) down) 2017-10-17 16:17:26.568819 mon.alantra_fs-02 [INF] osd.0 10.20.1.109:6801/3319 boot 2017-10-17 16:17:25.557037 osd.0 [WRN] Monitor daemon marked osd.0 down, but it is still running 2017-10-17 16:17:30.532840 mon.alantra_fs-02 [WRN] overall HEALTH_WARN 1 osds down; 1 host (1 osds) down; Degraded data redundancy: 40177/80354 objects degraded (50.000%), 96 pgs unclean, 192 pgs degraded 2017-10-17 16:17:30.538294 mon.alantra_fs-02 [WRN] Health check update: 1 osds down (OSD_DOWN) 2017-10-17 16:17:30.538333 mon.alantra_fs-02 [WRN] Health check update: 1 host (1 osds) down (OSD_HOST_DOWN) 2017-10-17 16:17:31.602434 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded (PG_DEGRADED) 2017-10-17 16:17:55.540005 mon.alantra_fs-02 [INF] osd.0 marked down after no beacon for 25.001599 seconds 2017-10-17 16:17:55.540538 mon.alantra_fs-02 [WRN] Health check update: 2 osds down (OSD_DOWN) 2017-10-17 16:17:55.540562 mon.alantra_fs-02 [WRN] Health check update: 3 hosts (2 osds) down (OSD_HOST_DOWN) 2017-10-17 16:17:55.540585 mon.alantra_fs-02 [WRN] Health check failed: 1 root (2 osds) down (OSD_ROOT_DOWN) 2017-10-17 16:18:28.916734 mon.alantra_fs-02 [WRN] Health check update: Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded, 192 pgs undersized (PG_DEGRADED) 2017-10-17 16:18:30.533096 mon.alantra_fs-02 [WRN] overall HEALTH_WARN 2 osds down; 3 hosts (2 osds) down; 1 root (2 osds) down; Degraded data redundancy: 40177/80354 objects degraded (50.000%), 192 pgs unclean, 192 pgs degraded, 192 pgs undersized 2017-10-17 16:18:56.929295 mon.alantra_fs-02 [WRN] Health check failed: Reduced data availability: 192 pgs stale (PG_AVAILABILITY) - - ceph.conf [global] fsid = 5f8e66b5-1adc-4930-b5d8-c0f44dc2037e mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 public_network = 10.20.1.0/24 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx ## ### OSD ## [osd] osd_mon_heartbeat_interval = 5 osd_mon_report_interval_max = 10 osd_heartbeat_grace = 10 osd_fast_fail_on_connection_refused = True osd_pool_default_pg_num = 128 osd_pool_default_pgp_num = 128 osd_pool_default_size = 2 osd_pool_default_min_size = 2 ## ### Monitores ## [mon] mon_allow_pool_delete = false mon_osd_report_timeout = 25 mon_osd_min_down_reporters = 1 [mon.alantra_fs-01] host = alantra_fs-01 mon_addr = 10.20.1.109:6789 [mon.alantra_fs-02] host = alantra_fs-02 mon_addr = 10.20.1.97:6789 [mon.alantra_fs-03] host = alantra_fs-03 mon_addr = 10.20.1.216:6789 ## ### MDS ## [mds] mds_cache_size = 25 ## ### Client ## [client] client_cache_size = 32768 client_mount_timeout = 30 
client_oc_max_objects = 2000
client_oc_size = 629145600
rbd_cache = true
rbd_cache_size = 671088640

For now I've added the nodown flag to keep all OSDs online, and everything is working fine, but this is not the best way to do it. Does someone know how to fix this problem? Maybe this release needs new ports opened on the firewall?

Thanks!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
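Once the timeout problem is fixed (see the reply earlier in this thread), the workaround flag should be removed so genuinely dead OSDs can be marked down again:

ceph osd unset nodown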
Re: [ceph-users] Connections between services secure?
Mainly the fuse clients talking to the rest (MDS, OSD and MON will be on a private network), and maybe one day I'll try to create a multi-site cluster.

Greetings!!

On Jun 30, 2017 at 8:33 p.m., "David Turner" <drakonst...@gmail.com> wrote: Which part of ceph are you looking at using through the Internet? RGW, multi-site, multi-datacenter crush maps, etc? On Fri, Jun 30, 2017 at 2:28 PM Daniel Carrasco <d.carra...@i2tic.com> wrote: > Hello, > > My question is about steam security of connections between ceph services. > I've read that connection is verified by private keys and signed packets, > but my question is if that packets are ciphered in any way to avoid packets > sniffers, because I want to know if can be used through internet without > problem or I need an VPN. > > Thanks!! > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Connections between services secure?
Hello,

My question is about the stream security of the connections between Ceph services. I've read that the connection is verified with private keys and signed packets, but my question is whether those packets are encrypted in any way against packet sniffers, because I want to know if Ceph can be used over the internet without problems or whether I need a VPN.

Thanks!!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
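Short answer for anyone searching the archives: in these releases cephx authenticates peers and can sign messages, but the messenger does not encrypt payloads, so a VPN or tunnel is needed across untrusted networks. Signing can be enforced with options like these (a sketch; check the defaults for your release):

[global]
cephx_require_signatures = true          # sign client <-> cluster traffic
cephx_cluster_require_signatures = true  # sign traffic between daemons
cephx_sign_messages = true               # allow message signing at all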
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Hello, I just write to say that after more than a week the server still working without problem and the OSD are not marked as down erroneously. On my tests the webpage stop working for less than a minute when i stop an OSD, so the failover is working fine. Greetings and thanks for all your help!! 2017-06-15 19:04 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: > Hello, thanks for the info. > > I'll give a try tomorrow. On one of my test I got the messages that yo say > (wrongfully marked), but i've lowered other options and now is fine. For > now the OSD are not reporting down messages even with an high load test, > but I'll see the logs tomorrow to confirm. > > The most of time the server is used as RO and the load is not high, so if > an OSD is marked as down for some seconds is not a big problem (at least I > think that recovery traffic is low because it only has to check that pgs > are in both OSD). > > Greetings and thanks again! > > 2017-06-15 18:13 GMT+02:00 David Turner <drakonst...@gmail.com>: > >> osd_heartbeat_grace is a setting for how many seconds since the last time >> an osd received a successful response from another osd before telling the >> mons that it's down. This is one you may want to lower from its default >> value of 20 seconds. >> >> mon_osd_min_down_reporters is a setting for how many osds need to report >> an osd as down before the mons will mark it as down. I recommend setting >> this to N+1 where N is how many osds you have in a node or failure domain. >> If you end up with a network problem and you have 1 osd node that can talk >> to the mons, but not the other osd nodes, then you will end up with that >> one node marking the entire cluster down while the rest of the cluster >> marks that node down. If your min_down_reporters is N+1, then 1 node cannot >> mark down the rest of the cluster. The default setting is 1 so that small >> test clusters can mark down osds, but if you have 3+ nodes, you should set >> it to N+1 if you can. Setting it to more than 2 nodes is equally as >> problematic. However, if you just want things to report as fast as >> possible, leaving this at 1 still might be optimal to getting it marked >> down sooner. >> >> The downside to lowering these settings is if OSDs are getting marked >> down for running slower, then they will re-assert themselves to the mons >> and end up causing backfilling and peering for no really good reason. >> You'll want to monitor your cluster for OSDs being marked down for a few >> seconds before marking themselves back up. You can see this in the OSD >> logs where the OSD says it was wrongfully marked down in one line and then >> the next is where it tells the mons it is actually up. >> >> On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco <d.carra...@i2tic.com> >> wrote: >> >>> I forgot to say that after upgrade the machine RAM to 4Gb, the OSD >>> daemons has started to use only a 5% (about 200MB). Is like magic, and now >>> I've about 3.2Gb of free RAM. >>> >>> Greetings!! >>> >>> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: >>> >>>> Finally, the problem was W3Total Cache, that seems to be unable to >>>> manage HA and when the master redis host is down, it stop working without >>>> try the slave. >>>> >>>> I've added some options to make it faster to detect a down OSD and the >>>> page is online again in about 40s. 
>>>> >>>> [global] >>>> fsid = Hidden >>>> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 >>>> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 >>>> auth_cluster_required = cephx >>>> auth_service_required = cephx >>>> auth_client_required = cephx >>>> osd mon heartbeat interval = 5 >>>> osd mon report interval max = 10 >>>> mon osd report timeout = 15 >>>> osd fast fail on connection refused = True >>>> >>>> public network = 10.20.1.0/24 >>>> osd pool default size = 2 >>>> >>>> >>>> Greetings and thanks for all your help. >>>> >>>> 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>: >>>> >>>>> I've used the kernel client and the ceph-fuse driver for mapping the >>>>> cephfs volume. I didn't notice any network hiccups while failing over, >>>>> but >>>>> I was reading large files during my tests (and live) and some caching may >>>>> h
Re: [ceph-users] What is "up:standby"? in ceph mds stat => e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby
Among the MDS daemons, by default only the first one you add is active: the others join the cluster as standby MDS daemons. When the active one fails, a standby MDS becomes active and continues the work. You can change this behaviour by setting the max_mds option, but multiple active MDS is still in testing and is not recommended for production environments.

Greetings!!

2017-06-16 12:40 GMT+02:00 Stéphane Klein <cont...@stephane-klein.info>: > Hi, > > I have installed mdss role with Ansible. > > Now, I have this: > > root@ceph-test-1:/home/vagrant# ceph fs ls > name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ] > root@ceph-test-1:/home/vagrant# ceph mds stat > e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby > root@ceph-test-1:/home/vagrant# ceph status > cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac > health HEALTH_OK > monmap e1: 3 mons at {ceph-test-1=172.28.128.3: > 6789/0,ceph-test-2=172.28.128.4:6789/0,ceph-test-3=172.28.128.5:6789/0} > election epoch 10, quorum 0,1,2 ceph-test-1,ceph-test-2,ceph- > test-3 > fsmap e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby > osdmap e14: 3 osds: 3 up, 3 in > flags sortbitwise,require_jewel_osds > pgmap v36: 164 pgs, 3 pools, 2068 bytes data, 20 objects > 102 MB used, 10652 MB / 10754 MB avail > 164 active+clean > > What is "up:standby"? in > > # ceph mds stat > e5: 1/1/1 up {0=ceph-test-3=up:active}, 2 up:standby > > Best regards, > Stéphane > -- > Stéphane Klein <cont...@stephane-klein.info> > blog: http://stephane-klein.info > cv : http://cv.stephane-klein.info > Twitter: http://twitter.com/klein_stephane > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > -- _ Daniel Carrasco Marín Ingeniería para la Innovación i2TIC, S.L. Tlf: +34 911 12 32 84 Ext: 223 www.i2tic.com _ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
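A standby can also be tied to a specific rank and kept warm with standby-replay for a faster takeover; a jewel-style snippet (the daemon name ceph-test-2 is just an example taken from Stéphane's cluster):

[mds.ceph-test-2]
mds_standby_for_rank = 0    # follow the MDS holding rank 0
mds_standby_replay = true   # replay its journal continuously for a warm takeover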
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Hello, thanks for the info. I'll give a try tomorrow. On one of my test I got the messages that yo say (wrongfully marked), but i've lowered other options and now is fine. For now the OSD are not reporting down messages even with an high load test, but I'll see the logs tomorrow to confirm. The most of time the server is used as RO and the load is not high, so if an OSD is marked as down for some seconds is not a big problem (at least I think that recovery traffic is low because it only has to check that pgs are in both OSD). Greetings and thanks again! 2017-06-15 18:13 GMT+02:00 David Turner <drakonst...@gmail.com>: > osd_heartbeat_grace is a setting for how many seconds since the last time > an osd received a successful response from another osd before telling the > mons that it's down. This is one you may want to lower from its default > value of 20 seconds. > > mon_osd_min_down_reporters is a setting for how many osds need to report > an osd as down before the mons will mark it as down. I recommend setting > this to N+1 where N is how many osds you have in a node or failure domain. > If you end up with a network problem and you have 1 osd node that can talk > to the mons, but not the other osd nodes, then you will end up with that > one node marking the entire cluster down while the rest of the cluster > marks that node down. If your min_down_reporters is N+1, then 1 node cannot > mark down the rest of the cluster. The default setting is 1 so that small > test clusters can mark down osds, but if you have 3+ nodes, you should set > it to N+1 if you can. Setting it to more than 2 nodes is equally as > problematic. However, if you just want things to report as fast as > possible, leaving this at 1 still might be optimal to getting it marked > down sooner. > > The downside to lowering these settings is if OSDs are getting marked down > for running slower, then they will re-assert themselves to the mons and end > up causing backfilling and peering for no really good reason. You'll want > to monitor your cluster for OSDs being marked down for a few seconds before > marking themselves back up. You can see this in the OSD logs where the OSD > says it was wrongfully marked down in one line and then the next is where > it tells the mons it is actually up. > > On Thu, Jun 15, 2017 at 10:44 AM Daniel Carrasco <d.carra...@i2tic.com> > wrote: > >> I forgot to say that after upgrade the machine RAM to 4Gb, the OSD >> daemons has started to use only a 5% (about 200MB). Is like magic, and now >> I've about 3.2Gb of free RAM. >> >> Greetings!! >> >> 2017-06-15 15:08 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: >> >>> Finally, the problem was W3Total Cache, that seems to be unable to >>> manage HA and when the master redis host is down, it stop working without >>> try the slave. >>> >>> I've added some options to make it faster to detect a down OSD and the >>> page is online again in about 40s. >>> >>> [global] >>> fsid = Hidden >>> mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 >>> mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 >>> auth_cluster_required = cephx >>> auth_service_required = cephx >>> auth_client_required = cephx >>> osd mon heartbeat interval = 5 >>> osd mon report interval max = 10 >>> mon osd report timeout = 15 >>> osd fast fail on connection refused = True >>> >>> public network = 10.20.1.0/24 >>> osd pool default size = 2 >>> >>> >>> Greetings and thanks for all your help. 
>>> >>> 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>: >>> >>>> I've used the kernel client and the ceph-fuse driver for mapping the >>>> cephfs volume. I didn't notice any network hiccups while failing over, but >>>> I was reading large files during my tests (and live) and some caching may >>>> have hidden hidden network hiccups for my use case. >>>> >>>> Going back to the memory potentially being a problem. Ceph has a >>>> tendency to start using 2-3x more memory while it's in a degraded state as >>>> opposed to when everything is health_ok. Always plan for over-provisioning >>>> your memory to account for a minimum of 2x. I've seen clusters stuck in an >>>> OOM killer death spiral because it kept killing OSDs for running out of >>>> memory, that caused more peering and backfilling, ... which caused more >>>> OSDs to be killed by OOM killer. >>>> >>>> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco <d.carra
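To watch for the flapping David warns about, the OSD log carries a tell-tale line whenever a healthy OSD is marked down and re-asserts itself; a sketch, assuming default log locations:

# A healthy OSD that was marked down logs something like
# "map eNNN wrongly marked me down" before re-asserting itself:
grep "wrongly marked me down" /var/log/ceph/ceph-osd.*.log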
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
I forgot to say that after upgrade the machine RAM to 4Gb, the OSD daemons has started to use only a 5% (about 200MB). Is like magic, and now I've about 3.2Gb of free RAM. Greetings!! 2017-06-15 15:08 GMT+02:00 Daniel Carrasco <d.carra...@i2tic.com>: > Finally, the problem was W3Total Cache, that seems to be unable to manage > HA and when the master redis host is down, it stop working without try the > slave. > > I've added some options to make it faster to detect a down OSD and the > page is online again in about 40s. > > [global] > fsid = Hidden > mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 > mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > osd mon heartbeat interval = 5 > osd mon report interval max = 10 > mon osd report timeout = 15 > osd fast fail on connection refused = True > > public network = 10.20.1.0/24 > osd pool default size = 2 > > > Greetings and thanks for all your help. > > 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>: > >> I've used the kernel client and the ceph-fuse driver for mapping the >> cephfs volume. I didn't notice any network hiccups while failing over, but >> I was reading large files during my tests (and live) and some caching may >> have hidden hidden network hiccups for my use case. >> >> Going back to the memory potentially being a problem. Ceph has a >> tendency to start using 2-3x more memory while it's in a degraded state as >> opposed to when everything is health_ok. Always plan for over-provisioning >> your memory to account for a minimum of 2x. I've seen clusters stuck in an >> OOM killer death spiral because it kept killing OSDs for running out of >> memory, that caused more peering and backfilling, ... which caused more >> OSDs to be killed by OOM killer. >> >> On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco <d.carra...@i2tic.com> >> wrote: >> >>> Is strange because on my test cluster (three nodes) with two nodes with >>> OSD, and all with MON and MDS, I've configured the size to 2 and min_size >>> to 1, I've restarted all nodes one by one and the client loose the >>> connection for about 5 seconds until connect to other MDS. >>> >>> Are you using ceph client or kernel client? >>> I forgot to say that I'm using Debian 8. >>> >>> Anyway, maybe the problem was what I've said before, the clients >>> connection with that node started to fail, but the node was not officially >>> down. And it wasn't a client problem, because it happened on both clients >>> and on my monitoring service at same time. >>> >>> Just now I'm not on the office, so I can't post the config file. >>> Tomorrow I'll send it. >>> Anyway, is the basic file generated by ceph-deploy with client network >>> and min_size configurations. Just like my test config. >>> >>> Thanks!!, and greetings!! >>> >>> El 14 jun. 2017 10:38 p. m., "David Turner" <drakonst...@gmail.com> >>> escribió: >>> >>> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at >>> a time to do ceph and kernel upgrades. The VM's running out of ceph, the >>> clients accessing MDS, etc all keep working fine without any problem during >>> these restarts. What is your full ceph configuration? There must be >>> something not quite right in there. >>> >>> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco <d.carra...@i2tic.com> >>> wrote: >>> >>>> >>>> >>>> El 14 jun. 2017 10:08 p. 
m., "David Turner" <drakonst...@gmail.com> >>>> escribió: >>>> >>>> Not just the min_size of your cephfs data pool, but also your >>>> cephfs_metadata pool. >>>> >>>> >>>> Both were at 1. I don't know why because I don't remember to have >>>> changed the min_size and the cluster has 3 odd from beginning (I did >>>> it on another cluster for testing purposes, but I don't remember to have >>>> changed on this). I've changed both to two, but after the fail. >>>> >>>> About the size, I use 50Gb because it's for a single webpage and I >>>> don't need more space. >>>> >>>> I'll try to increase the memory to 3Gb. >>>> >>>> Greetings!! >>>> >>>> >>>> On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com>
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Finally, the problem was W3Total Cache, that seems to be unable to manage HA and when the master redis host is down, it stop working without try the slave. I've added some options to make it faster to detect a down OSD and the page is online again in about 40s. [global] fsid = Hidden mon_initial_members = alantra_fs-01, alantra_fs-02, alantra_fs-03 mon_host = 10.20.1.109,10.20.1.97,10.20.1.216 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx osd mon heartbeat interval = 5 osd mon report interval max = 10 mon osd report timeout = 15 osd fast fail on connection refused = True public network = 10.20.1.0/24 osd pool default size = 2 Greetings and thanks for all your help. 2017-06-14 23:09 GMT+02:00 David Turner <drakonst...@gmail.com>: > I've used the kernel client and the ceph-fuse driver for mapping the > cephfs volume. I didn't notice any network hiccups while failing over, but > I was reading large files during my tests (and live) and some caching may > have hidden hidden network hiccups for my use case. > > Going back to the memory potentially being a problem. Ceph has a tendency > to start using 2-3x more memory while it's in a degraded state as opposed > to when everything is health_ok. Always plan for over-provisioning your > memory to account for a minimum of 2x. I've seen clusters stuck in an OOM > killer death spiral because it kept killing OSDs for running out of memory, > that caused more peering and backfilling, ... which caused more OSDs to be > killed by OOM killer. > > On Wed, Jun 14, 2017 at 5:01 PM Daniel Carrasco <d.carra...@i2tic.com> > wrote: > >> Is strange because on my test cluster (three nodes) with two nodes with >> OSD, and all with MON and MDS, I've configured the size to 2 and min_size >> to 1, I've restarted all nodes one by one and the client loose the >> connection for about 5 seconds until connect to other MDS. >> >> Are you using ceph client or kernel client? >> I forgot to say that I'm using Debian 8. >> >> Anyway, maybe the problem was what I've said before, the clients >> connection with that node started to fail, but the node was not officially >> down. And it wasn't a client problem, because it happened on both clients >> and on my monitoring service at same time. >> >> Just now I'm not on the office, so I can't post the config file. Tomorrow >> I'll send it. >> Anyway, is the basic file generated by ceph-deploy with client network >> and min_size configurations. Just like my test config. >> >> Thanks!!, and greetings!! >> >> El 14 jun. 2017 10:38 p. m., "David Turner" <drakonst...@gmail.com> >> escribió: >> >> I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at >> a time to do ceph and kernel upgrades. The VM's running out of ceph, the >> clients accessing MDS, etc all keep working fine without any problem during >> these restarts. What is your full ceph configuration? There must be >> something not quite right in there. >> >> On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco <d.carra...@i2tic.com> >> wrote: >> >>> >>> >>> El 14 jun. 2017 10:08 p. m., "David Turner" <drakonst...@gmail.com> >>> escribió: >>> >>> Not just the min_size of your cephfs data pool, but also your >>> cephfs_metadata pool. >>> >>> >>> Both were at 1. I don't know why because I don't remember to have >>> changed the min_size and the cluster has 3 odd from beginning (I did it >>> on another cluster for testing purposes, but I don't remember to have >>> changed on this). I've changed both to two, but after the fail. 
>>> >>> About the size, I use 50Gb because it's for a single webpage and I don't >>> need more space. >>> >>> I'll try to increase the memory to 3Gb. >>> >>> Greetings!! >>> >>> >>> On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com> >>> wrote: >>> >>>> Ceph recommends 1GB of RAM for ever 1TB of OSD space. Your 2GB nodes >>>> are definitely on the low end. 50GB OSDs... I don't know what that will >>>> require, but where you're running the mon and mds on the same node, I'd >>>> still say that 2GB is low. The Ceph OSD daemon using 1GB of RAM is not >>>> surprising, even at that size. >>>> >>>> When you say you increased the size of the pools to 3, what did you do >>>> to the min_size? Is that still set to 2? >>>> >>>> On Wed, Jun 14, 2017 at 3:17 PM Daniel
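A simple way to measure the ~40 s detection window mentioned above is to kill one OSD by hand and watch the cluster log; a sketch, assuming systemd-managed daemons:

# On one storage node:
systemctl stop ceph-osd@0   # simulate a clean OSD failure
# From any admin node:
ceph -w                     # time how long until osd.0 is reported down
                            # and until the PGs go active+clean again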
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Is strange because on my test cluster (three nodes) with two nodes with OSD, and all with MON and MDS, I've configured the size to 2 and min_size to 1, I've restarted all nodes one by one and the client loose the connection for about 5 seconds until connect to other MDS. Are you using ceph client or kernel client? I forgot to say that I'm using Debian 8. Anyway, maybe the problem was what I've said before, the clients connection with that node started to fail, but the node was not officially down. And it wasn't a client problem, because it happened on both clients and on my monitoring service at same time. Just now I'm not on the office, so I can't post the config file. Tomorrow I'll send it. Anyway, is the basic file generated by ceph-deploy with client network and min_size configurations. Just like my test config. Thanks!!, and greetings!! El 14 jun. 2017 10:38 p. m., "David Turner" <drakonst...@gmail.com> escribió: I have 3 ceph nodes, size 3, min_size 2, and I can restart them all 1 at a time to do ceph and kernel upgrades. The VM's running out of ceph, the clients accessing MDS, etc all keep working fine without any problem during these restarts. What is your full ceph configuration? There must be something not quite right in there. On Wed, Jun 14, 2017 at 4:26 PM Daniel Carrasco <d.carra...@i2tic.com> wrote: > > > El 14 jun. 2017 10:08 p. m., "David Turner" <drakonst...@gmail.com> > escribió: > > Not just the min_size of your cephfs data pool, but also your > cephfs_metadata pool. > > > Both were at 1. I don't know why because I don't remember to have changed > the min_size and the cluster has 3 odd from beginning (I did it on > another cluster for testing purposes, but I don't remember to have changed > on this). I've changed both to two, but after the fail. > > About the size, I use 50Gb because it's for a single webpage and I don't > need more space. > > I'll try to increase the memory to 3Gb. > > Greetings!! > > > On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com> > wrote: > >> Ceph recommends 1GB of RAM for ever 1TB of OSD space. Your 2GB nodes are >> definitely on the low end. 50GB OSDs... I don't know what that will >> require, but where you're running the mon and mds on the same node, I'd >> still say that 2GB is low. The Ceph OSD daemon using 1GB of RAM is not >> surprising, even at that size. >> >> When you say you increased the size of the pools to 3, what did you do to >> the min_size? Is that still set to 2? >> >> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco <d.carra...@i2tic.com> >> wrote: >> >>> Finally I've created three nodes, I've increased the size of pools to 3 >>> and I've created 3 MDS (active, standby, standby). >>> >>> Today the server has decided to fail and I've noticed that failover is >>> not working... The ceph -s command shows like everything was OK but the >>> clients weren't able to connect and I had to restart the failing node and >>> reconect the clients manually to make it work again (even I think that the >>> active MDS was in another node). >>> >>> I don't know if maybe is because the server was not fully down, and only >>> some connections were failing. I'll do some tests too see. >>> >>> Another question: How many memory needs a node to work?, because I've >>> nodes with 2GB of RAM (one MDS, one MON and one OSD), and they have an high >>> memory usage (more than 1GB on the OSD). >>> The OSD size is 50GB and the data that contains is less than 3GB. >>> >>> Thanks, and Greetings!! 
>>> >>> 2017-06-12 23:33 GMT+02:00 Mazzystr <mazzy...@gmail.com>: >>> >>>> Since your app is an Apache / php app is it possible for you to >>>> reconfigure the app to use S3 module rather than a posix open file()? Then >>>> with Ceph drop CephFS and configure Civetweb S3 gateway? You can have >>>> "active-active" endpoints with round robin dns or F5 or something. You >>>> would also have to repopulate objects into the rados pools. >>>> >>>> Also increase that size parameter to 3. ;-) >>>> >>>> Lots of work for active-active but the whole stack will be much more >>>> resilient coming from some with a ClearCase / NFS / stale file handles up >>>> the wazoo background >>>> >>>> >>>> >>>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <d.carra...@i2tic.com >>>> > wrote: >>>> >>>>> 2017-06-12 16:10 GMT+02:00 David Turner <drako
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
On Jun 14, 2017, 10:08 PM, "David Turner" <drakonst...@gmail.com> wrote:

> Not just the min_size of your cephfs data pool, but also your cephfs_metadata pool.

Both were at 1. I don't know why, because I don't remember having changed the min_size, and the cluster has had 3 OSDs from the beginning (I did change it on another cluster for testing purposes, but I don't remember changing it on this one). I've changed both to two, but only after the failure.

About the size, I use 50GB because it's for a single webpage and I don't need more space. I'll try to increase the memory to 3GB.

Greetings!!

On Wed, Jun 14, 2017 at 4:07 PM David Turner <drakonst...@gmail.com> wrote:

> Ceph recommends 1GB of RAM for every 1TB of OSD space. Your 2GB nodes are definitely on the low end. 50GB OSDs... I don't know what that will require, but where you're running the mon and mds on the same node, I'd still say that 2GB is low. The Ceph OSD daemon using 1GB of RAM is not surprising, even at that size.
>
> When you say you increased the size of the pools to 3, what did you do to the min_size? Is that still set to 2?
>
> On Wed, Jun 14, 2017 at 3:17 PM Daniel Carrasco <d.carra...@i2tic.com> wrote:
>
>> Finally I've created three nodes, I've increased the size of the pools to 3, and I've created 3 MDS (active, standby, standby).
>>
>> Today the server decided to fail and I noticed that failover was not working... The ceph -s command showed everything as OK, but the clients weren't able to connect, and I had to restart the failing node and reconnect the clients manually to make it work again (even though I think the active MDS was on another node).
>>
>> I don't know if maybe it's because the server was not fully down and only some connections were failing. I'll do some tests to see.
>>
>> Another question: how much memory does a node need to work? I have nodes with 2GB of RAM (one MDS, one MON and one OSD), and they show high memory usage (more than 1GB on the OSD). The OSD size is 50GB and the data it contains is less than 3GB.
>>
>> Thanks, and Greetings!!
>>
>> 2017-06-12 23:33 GMT+02:00 Mazzystr <mazzy...@gmail.com>:
>>
>>> Since your app is an Apache / php app, is it possible for you to reconfigure the app to use the S3 module rather than a posix open file()? Then with Ceph drop CephFS and configure the Civetweb S3 gateway. You can have "active-active" endpoints with round-robin DNS or an F5 or something. You would also have to repopulate objects into the rados pools.
>>>
>>> Also increase that size parameter to 3. ;-)
>>>
>>> Lots of work for active-active, but the whole stack will be much more resilient; that's coming from someone with a ClearCase / NFS / stale-file-handles-up-the-wazoo background.
>>>
>>> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <d.carra...@i2tic.com> wrote:
>>>
>>>> 2017-06-12 16:10 GMT+02:00 David Turner <drakonst...@gmail.com>:
>>>>
>>>>> I have an incredibly lightweight cephfs configuration. I set up an MDS on each mon (3 total), and have 9TB of data in cephfs. This data only has 1 client that reads a few files at a time. I haven't noticed any downtime when it fails over to a standby MDS. So it definitely depends on your workload as to how a failover will affect your environment.
>>>>>
>>>>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini <jpetr...@coredial.com> wrote:
>>>>>
>>>>>> We use the following in our ceph.conf for MDS failover. We're running one active and one standby. Last time it failed over there was about 2 minutes of downtime before the mounts started responding again, but it did recover gracefully.
>>>>>>
>>>>>> [mds]
>>>>>> max_mds = 1
>>>>>> mds_standby_for_rank = 0
>>>>>> mds_standby_replay = true
>>>>>>
>>>>>> ___
>>>>>>
>>>>>> John Petrini
>>>>>> ___
>>>>>> ceph-users mailing list
>>>>>> ceph-users@lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>> Thanks to both.
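For reference, a minimal sketch of checking and raising min_size on both CephFS pools with the ceph CLI (the pool names cephfs_data and cephfs_metadata are assumed from the thread; adjust them to your cluster):

# Inspect the current replication settings on both pools
ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data min_size
ceph osd pool get cephfs_metadata min_size

# With size=3 across three nodes, min_size=2 lets a PG keep serving
# I/O with one replica down while still refusing single-copy writes
ceph osd pool set cephfs_data min_size 2
ceph osd pool set cephfs_metadata min_size 2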
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Finally I've created three nodes, I've increased the size of the pools to 3, and I've created 3 MDS (active, standby, standby).

Today the server decided to fail and I noticed that failover was not working... The ceph -s command showed everything as OK, but the clients weren't able to connect, and I had to restart the failing node and reconnect the clients manually to make it work again (even though I think the active MDS was on another node).

I don't know if maybe it's because the server was not fully down and only some connections were failing. I'll do some tests to see.

Another question: how much memory does a node need to work? I have nodes with 2GB of RAM (one MDS, one MON and one OSD), and they show high memory usage (more than 1GB on the OSD). The OSD size is 50GB and the data it contains is less than 3GB.

Thanks, and Greetings!!

2017-06-12 23:33 GMT+02:00 Mazzystr <mazzy...@gmail.com>:

> Since your app is an Apache / php app, is it possible for you to reconfigure the app to use the S3 module rather than a posix open file()? Then with Ceph drop CephFS and configure the Civetweb S3 gateway. You can have "active-active" endpoints with round-robin DNS or an F5 or something. You would also have to repopulate objects into the rados pools.
>
> Also increase that size parameter to 3. ;-)
>
> Lots of work for active-active, but the whole stack will be much more resilient; that's coming from someone with a ClearCase / NFS / stale-file-handles-up-the-wazoo background.
>
> On Mon, Jun 12, 2017 at 10:41 AM, Daniel Carrasco <d.carra...@i2tic.com> wrote:
>
>> 2017-06-12 16:10 GMT+02:00 David Turner <drakonst...@gmail.com>:
>>
>>> I have an incredibly lightweight cephfs configuration. I set up an MDS on each mon (3 total), and have 9TB of data in cephfs. This data only has 1 client that reads a few files at a time. I haven't noticed any downtime when it fails over to a standby MDS. So it definitely depends on your workload as to how a failover will affect your environment.
>>>
>>> On Mon, Jun 12, 2017 at 9:59 AM John Petrini <jpetr...@coredial.com> wrote:
>>>
>>>> We use the following in our ceph.conf for MDS failover. We're running one active and one standby. Last time it failed over there was about 2 minutes of downtime before the mounts started responding again, but it did recover gracefully.
>>>>
>>>> [mds]
>>>> max_mds = 1
>>>> mds_standby_for_rank = 0
>>>> mds_standby_replay = true
>>>>
>>>> ___
>>>>
>>>> John Petrini
>>>> ___
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> Thanks to both.
>> Just now I'm working on that because I need a very fast failover. For now the tests give a very fast response when an OSD fails (about 5 seconds), but a very slow response when the main MDS fails (I haven't measured the actual time, but it was not working for quite a while). Maybe that was because I created the other MDS after mounting; I've done some tests just before sending this email and now it looks very fast (I haven't noticed any downtime).
>>
>> Greetings!!
>>
>> --
>> _
>> Daniel Carrasco Marín
>> Ingeniería para la Innovación i2TIC, S.L.
>> Tlf: +34 911 12 32 84 Ext: 223
>> www.i2tic.com
>> _
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
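On the S3 suggestion quoted above: a rough sketch of what a Civetweb-fronted RADOS Gateway section can look like in ceph.conf for that era of Ceph (the instance name client.rgw.gw1, the host, and the port are illustrative assumptions, not from the thread):

[client.rgw.gw1]
host = gw1
rgw_frontends = "civetweb port=7480"

Run one such gateway per node and put round-robin DNS or a load balancer in front of them; since the S3 endpoints are stateless, active-active needs no failover logic, unlike an active/standby MDS.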
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
2017-06-12 16:10 GMT+02:00 David Turner <drakonst...@gmail.com>:

> I have an incredibly lightweight cephfs configuration. I set up an MDS on each mon (3 total), and have 9TB of data in cephfs. This data only has 1 client that reads a few files at a time. I haven't noticed any downtime when it fails over to a standby MDS. So it definitely depends on your workload as to how a failover will affect your environment.
>
> On Mon, Jun 12, 2017 at 9:59 AM John Petrini <jpetr...@coredial.com> wrote:
>
>> We use the following in our ceph.conf for MDS failover. We're running one active and one standby. Last time it failed over there was about 2 minutes of downtime before the mounts started responding again, but it did recover gracefully.
>>
>> [mds]
>> max_mds = 1
>> mds_standby_for_rank = 0
>> mds_standby_replay = true
>>
>> ___
>>
>> John Petrini
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Thanks to both.
Just now I'm working on that because I need a very fast failover. For now the tests give a very fast response when an OSD fails (about 5 seconds), but a very slow response when the main MDS fails (I haven't measured the actual time, but it was not working for quite a while). Maybe that was because I created the other MDS after mounting; I've done some tests just before sending this email and now it looks very fast (I haven't noticed any downtime).

Greetings!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
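A quick sketch of how one might observe an MDS failover like the one described above. ceph mds stat is a standard CLI command; mds_beacon_grace is the option that controls how long the monitors wait before marking an unresponsive MDS as laggy, and the 15-second value shown is assumed to be the default for this era (check your version's docs):

# In one terminal, watch which MDS is active while you stop the current one
watch -n 1 ceph mds stat

# Observed failover time is roughly the beacon grace plus journal replay;
# mds_standby_replay = true keeps the standby's journal warm, so the
# replay term stays small. In ceph.conf (value shown is the assumed default):
[mds]
mds_beacon_grace = 15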
Re: [ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
2017-06-12 10:49 GMT+02:00 Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>:

> Hi,
>
> On 06/12/2017 10:31 AM, Daniel Carrasco wrote:
>
>> Hello,
>>
>> I'm very new to Ceph, so maybe this is a noob question.
>>
>> We have an architecture that has some web servers (nginx, php...) with a common file server through NFS. Of course that is a SPOF, so we want to create a multi-FS setup to avoid future problems.
>>
>> We've already tested GlusterFS, but it is very slow reading small files with the official client (from 600ms to 1700ms to read the Doc page), and through NFS Ganesha it fails a lot (permissions errors, 404 when the file exists...).
>> The next thing we're trying is Ceph, which looks very good and has good performance even with small files (near NFS performance: 90-100ms vs 100-120ms), but in some tests that I've done, it stops working when an OSD is down.
>>
>> My test architecture is two servers with one OSD and one MON each, and a third with a MON and an MDS. I've configured the cluster to keep two copies of every PG (just like a RAID1) and all looks fine (health OK, three monitors...).
>> My test client also works fine: it connects to the cluster and is able to serve the webpage without problems, but my problems come when an OSD goes down. The cluster detects that it is down, shows that it needs more OSDs to keep the two copies, designates a new MON, and looks like it is working, but the client is unable to receive new files until I power the OSD on again (it happens with both OSDs).
>>
>> My question is: is there any way to tell Ceph to keep serving files even when an OSD is down?
>
> I assume the data pool is configured with size=2 and min_size=2. This means that you need two active replicas to allow I/O to a PG. With one OSD down this requirement cannot be met.
>
> You can either:
> - add a third OSD
> - set min_size=1
> The latter might be fine for a test setup, but do not run this configuration in production. NEVER. EVER. Search the mailing list for more details.

Thanks!! Just what I thought, a noob question hehe. Now it's working. I'll search the list later, but it looks like this is to avoid split brain or something similar.

>> My other question is about MDS:
>> Is a multi-MDS environment stable? Because if I have multiple FS nodes to avoid a SPOF but can only deploy one MDS, then we have a new SPOF...
>> This is to know whether I need to use block device pools instead of file server pools.
>
> AFAIK active/active MDS setups are still considered experimental; active/standby(-replay) is a supported setup. We currently use one active and one standby-replay MDS for our CephFS instance serving several million files.
>
> Failover between the MDS works, but might run into problems with a large number of open files (each requiring a stat operation). Depending on the number of open files, failover takes from some seconds up to 5-10 minutes in our setup.

Thanks again for your response. It's not for performance purposes, so an active/standby setup will be enough; I'll read up on this configuration. About the time: it's always better to have the page down for some seconds than to wait for an admin to fix it.

> Regards,
> Burkhard Linke
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Greetings!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
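Burkhard's two options from the message above, expressed as commands (a sketch; the pool names are assumptions, and option 2 carries his warning):

# Option 1 (the safe one): add a third OSD so that size=2
# can still be satisfied when one OSD is down.

# Option 2: TEST CLUSTERS ONLY, never in production:
ceph osd pool set cephfs_data min_size 1
ceph osd pool set cephfs_metadata min_size 1

# With min_size=1 a PG accepts writes with a single live replica,
# which is exactly how data ends up lost or inconsistent after a
# second failure or a flapping OSD.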
[ceph-users] HA Filesystem mode (MON, OSD, MDS) with Ceph and HA of MDS daemon.
Hello,

I'm very new to Ceph, so maybe this is a noob question.

We have an architecture that has some web servers (nginx, php...) with a common file server through NFS. Of course that is a SPOF, so we want to create a multi-FS setup to avoid future problems.

We've already tested GlusterFS, but it is very slow reading small files with the official client (from 600ms to 1700ms to read the Doc page), and through NFS Ganesha it fails a lot (permissions errors, 404 when the file exists...).
The next thing we're trying is Ceph, which looks very good and has good performance even with small files (near NFS performance: 90-100ms vs 100-120ms), but in some tests that I've done, it stops working when an OSD is down.

My test architecture is two servers with one OSD and one MON each, and a third with a MON and an MDS. I've configured the cluster to keep two copies of every PG (just like a RAID1) and all looks fine (health OK, three monitors...).
My test client also works fine: it connects to the cluster and is able to serve the webpage without problems, but my problems come when an OSD goes down. The cluster detects that it is down, shows that it needs more OSDs to keep the two copies, designates a new MON, and looks like it is working, but the client is unable to receive new files until I power the OSD on again (it happens with both OSDs).

My question is: is there any way to tell Ceph to keep serving files even when an OSD is down?

My other question is about MDS:
Is a multi-MDS environment stable? Because if I have multiple FS nodes to avoid a SPOF but can only deploy one MDS, then we have a new SPOF...
This is to know whether I need to use block device pools instead of file server pools.

Thanks!!! and greetings!!

--
_
Daniel Carrasco Marín
Ingeniería para la Innovación i2TIC, S.L.
Tlf: +34 911 12 32 84 Ext: 223
www.i2tic.com
_
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
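For reference, a minimal sketch of the pool and filesystem setup this post describes, using standard ceph CLI commands (pool names, PG counts, and addresses are illustrative assumptions):

# Create the data and metadata pools and the filesystem
ceph osd pool create cephfs_data 64
ceph osd pool create cephfs_metadata 64
ceph fs new cephfs cephfs_metadata cephfs_data

# Two copies of every PG, as in the test setup above
ceph osd pool set cephfs_data size 2
ceph osd pool set cephfs_metadata size 2

# Kernel-driver mount from a web server; listing all three monitors
# lets the client keep working when one of them is down
mount -t ceph 10.0.0.1:6789,10.0.0.2:6789,10.0.0.3:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret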