[ceph-users] Re: Dual network board setup info

2019-11-29 Thread Rodrigo Severo - Fábrica
On Thu, Nov 28, 2019 at 18:32, Rodrigo Severo - Fábrica
 wrote:
>
> On Thu, Nov 28, 2019 at 13:39, Wido den Hollander
>  wrote:
> >
> > On 11/28/19 5:23 PM, Rodrigo Severo - Fábrica wrote:
> > > On Thu, Nov 28, 2019 at 00:34, Konstantin Shalygin
> > >  wrote:
> > >>
> > >>> My servers have 2 network boards each. I would like to use the current
> > >>> local one to talk to Ceph's clients (both CephFS and Object Storage)
> > >>> and use the second one for all Ceph processes to talk to one
> > >>> another.
> > >>>
> > >>
> > >> Ceph supports the `cluster network` and `public network` options. Only OSDs
> > >> use the cluster network; everything else is an OSD client and uses the
> > >> public network.
> > >
> > > Great. How do I migrate from my current single network board to a dual 
> > > one?
> > >
> > > Can I migrate servers one by one to the dual network setup, or do I
> > > have to stop the whole ceph cluster and restart it all with the dual
> > > setup already in place?
> >
> > Set the cluster_network in the ceph.conf and restart the OSDs one by one.
>
> Just tried that. The first OSD that I'm trying to restart won't come
> up again.
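
For reference, a minimal sketch of the kind of ceph.conf fragment the advice above implies, using the two subnets that appear in the log messages quoted further down; this is an illustration only, not the poster's actual config:

[global]
    public network  = 192.168.109.0/24
    cluster network = 192.168.111.0/24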

Does anybody have any suggestions on how to get my ceph fs back to a healthy status?

I'm even considering stopping it all and restarting, but I'm afraid it
won't come back up with the new config.

Ideas? Suggestions?


Rodrigo


> It presents the following messages, which aren't that useful
> to me:
>
> Nov 28 18:26:33 a2-df systemd[1]: ceph-osd@1.service: Start request
> repeated too quickly.
> Nov 28 18:26:33 a2-df systemd[1]: ceph-osd@1.service: Failed with
> result 'exit-code'.
> Nov 28 18:26:33 a2-df systemd[1]: Failed to start Ceph object storage
> daemon osd.1.
> -- Subject: Unit ceph-osd@1.service has failed
> -- Defined-By: systemd
> -- Support: http://www.ubuntu.com/support
> --
> -- Unit ceph-osd@1.service has failed.
> --
> -- The result is RESULT.
>
> I also see the following error messages:
> Nov 28 18:26:46 a2-df ceph-mon[2526]: 2019-11-28 18:26:46.230
> 7f487dee1700 -1 set_mon_vals failed to set cluster_network =
> 192.168.111.0/24: Configuration option 'cluster_network' may not be
> modified at runtime
> Nov 28 18:26:46 a2-df ceph-mon[2526]: 2019-11-28 18:26:46.230
> 7f487dee1700 -1 set_mon_vals failed to set public_network =
> 192.168.109.0/24: Configuration option 'public_network' may not be
> modified at runtime
>
> There are 2 things that I don't understand in these messages:
>
> 1. Why is it mentioning the configuration option 'public_network' in these
> error messages? I didn't change the public_network config, I only
> added a cluster_network one.
>
> 2. Why are there messages from ceph-mon when I'm trying to restart ceph-osd?
>
> And the most important issue: how can I get my osd back online?
>
>
> Regards,
>
> Rodrigo
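
Not from the original thread, but a few standard commands that may help with the questions above: the 'may not be modified at runtime' warnings come from options being pushed from the monitors' central config database (set_mon_vals), so listing what is stored there should show why both options are mentioned, and the OSD's own log usually contains the real start-up error hidden behind systemd's 'start request repeated too quickly' message.

# options stored in the monitors' central config database
ceph config dump | grep -E 'cluster_network|public_network'

# on the OSD host: clear the systemd rate limit, retry, then read the daemon log
systemctl reset-failed ceph-osd@1
systemctl start ceph-osd@1
tail -n 100 /var/log/ceph/ceph-osd.1.log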
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph User Survey 2019 [EXT]

2019-11-29 Thread Janne Johansson
On Thu, Nov 28, 2019 at 16:15, Matthew Vernon  wrote:

> Hi,
>
> > I'm pleased to announce after much discussion on the Ceph dev mailing
> > list [0] that the community has formed the Ceph Survey for 2019.
>
> The RGW questions include:
>
> "The largest object stored in gigabytes"
>
> Is there a tool that would answer this question for me? I can tell you
> how many GB in total we have, but short of iterating through all the
> objects in our RGWs (which would take ages), I don't know how to answer
> this one...
>

Especially since many of the ceph clients take large objects and split them
into 4M pieces in various ways (which is a good thing), so it's hard to
tell what the largest one actually is.
If I didn't know better, I'd think it would be the metadata on my 0-object
rgw index pools or something that would be the largest. ;)
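
For what it's worth, I don't know of a ready-made tool either; the only approach that comes to mind is iterating per bucket, roughly like the untested sketch below (it relies on radosgw-admin's per-bucket listing reporting meta.size in bytes for each key, and it will be slow on big buckets, i.e. exactly the "takes ages" approach):

# largest object per bucket, then the overall top 5 (sizes in bytes)
# note: very large buckets may need --max-entries to list everything
for b in $(radosgw-admin bucket list | jq -r '.[]'); do
  radosgw-admin bucket list --bucket="$b" \
    | jq -r --arg b "$b" '[.[].meta.size] | max // 0 | "\(.) \($b)"'
done | sort -n | tail -5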


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Questions about the EC pool

2019-11-29 Thread Paul Emmerich
It should take ~25 seconds by default to detect a network failure; the
config option that controls this is "osd heartbeat grace" (default 20
seconds, but it takes a little longer for the failure to really be
detected).
Check ceph -w while performing the test.
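
A hedged sketch of how one might inspect and temporarily lower that grace on a Luminous cluster (osd.0 and the value 10 are just examples; injectargs changes are not persistent, so put the final value in ceph.conf if it helps):

# current value, via the admin socket on an OSD host
ceph daemon osd.0 config get osd_heartbeat_grace

# lower it at runtime on the OSDs and the mons (the mons apply the same
# grace when deciding to mark OSDs down from failure reports)
ceph tell 'osd.*' injectargs '--osd_heartbeat_grace 10'
ceph tell 'mon.*' injectargs '--osd_heartbeat_grace 10'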


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, Nov 29, 2019 at 8:14 AM majia xiao  wrote:
>
> Hello,
>
>
> We have a Ceph cluster (version 12.2.4) with 10 hosts, and there are 21 OSDs 
> on each host.
>
>
>  An EC pool is created with the following commands:
>
>
> ceph osd erasure-code-profile set profile_jerasure_4_3_reed_sol_van \
>   plugin=jerasure \
>   k=4 \
>   m=3 \
>   technique=reed_sol_van \
>   packetsize=2048 \
>   crush-device-class=hdd \
>   crush-failure-domain=host
>
> ceph osd pool create pool_jerasure_4_3_reed_sol_van 2048 2048 erasure \
>   profile_jerasure_4_3_reed_sol_van
>
> Here are my questions:
>
> 1. The EC pool is created using k=4, m=3, and crush-device-class=hdd, so we
> just disable the network interfaces of some hosts (using the "ifdown"
> command) to verify the functionality of the EC pool while running the
> 'rados bench' command. However, the IO rate drops immediately to 0 when a
> single host goes offline, and it takes a long time (~100 seconds) for the
> IO rate to return to normal. As far as I know, the default value of
> min_size is k+1, i.e. 5, which means that the EC pool should still work
> even if two hosts are offline. Is there something wrong with my
> understanding?
>
> 2. According to our observations, it seems that the IO rate returns to
> normal once Ceph has detected all the OSDs corresponding to the failed
> host. Is there any way to reduce the time needed for Ceph to detect all
> failed OSDs?
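
For the min_size part of question 1, the actual values on the pool can be checked directly (standard ceph CLI; the profile and pool names are the ones from the commands quoted above):

ceph osd erasure-code-profile get profile_jerasure_4_3_reed_sol_van
ceph osd pool get pool_jerasure_4_3_reed_sol_van size
ceph osd pool get pool_jerasure_4_3_reed_sol_van min_size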
>
>
>
> Thanks for any help.
>
>
> Best regards,
>
> Majia Xiao
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mimic 13.2.6 too much broken connexions

2019-11-29 Thread Vincent Godin
Hello Frank,
Thank you for your help.
Ceph is our OpenStack main storage. We have 64 computes (ceph
clients), 36 Ceph hosts (client and cluster networks) and 3 mons, so
roughly 140 ARP entries.
Our ARP cache size is based on the defaults, so 128/512/1024. As 140 < 512,
the defaults should work (I will check the ARP cache size over time, however).
We tried the settings below 2 weeks ago (we thought they would
improve our network) but it was worse!
net.core.rmem_max = 134217728 (for a 10Gbps with low latency)
net.core.wmem_max = 134217728 (for a 10Gbps with low latency)
net.core.netdev_max_backlog = 30
net.core.somaxconn = 2000
net.ipv4.ip_local_port_range = '1 65000'
net.ipv4.tcp_rmem = 4096 87380 134217728 (for a 10Gbps with low latency)
net.ipv4.tcp_wmem = 4096 87380 134217728 (for a 10Gbps with low latency)
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_fack = 0
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_max_syn_backlog = 3

The client and cluster networks have a 9000 MTU. Each OSD host has two
LACP bonds: 2x10Gbps for client and 2x10Gbps for cluster. The client
network is a single level-2 LAN, and likewise for the cluster network.
As I said, we didn't see significant error counters on the switches or servers.

Vincent
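
For reference, a quick way to compare the live neighbour table against those thresholds (plain iproute2/sysctl, nothing Ceph-specific and not from the original mail):

# number of IPv4 neighbour entries currently known on this host
ip -4 neigh show | wc -l

# the garbage-collection thresholds currently in effect
sysctl net.ipv4.neigh.default.gc_thresh1 \
       net.ipv4.neigh.default.gc_thresh2 \
       net.ipv4.neigh.default.gc_thresh3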


On Fri, Nov 29, 2019 at 09:30, Frank Schilder  wrote:
>
> How large is your arp cache? We have seen ceph dropping connections as soon 
> as the level-2 network (direct neighbours) is larger than the arp cache. We 
> adjusted the following settings:
>
> # Increase ARP cache size to accommodate large level-2 client network.
> net.ipv4.neigh.default.gc_thresh1 = 1024
> net.ipv4.neigh.default.gc_thresh2 = 2048
> net.ipv4.neigh.default.gc_thresh3 = 4096
>
> Another important group of parameters for TCP connections seems to be these, 
> with our values:
>
> ## Increase the number of incoming connections. The value can be raised to
> ## handle bursts of requests; the default is 128.
> net.core.somaxconn = 2048
> ## Increase the incoming connections backlog; the default is 1000.
> net.core.netdev_max_backlog = 5
> ## Maximum number of remembered connection requests; the default is 128.
> net.ipv4.tcp_max_syn_backlog = 3
>
> With this, we got rid of dropped connections in a cluster of 20 ceph nodes 
> and ca. 550 client nodes, accounting for about 1500 active ceph clients, 1400 
> cephfs and 170 RBD images.
>
> Best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Vincent Godin 
> Sent: 27 November 2019 20:11:23
> To: Anthony D'Atri; ceph-users@ceph.io; Ceph Development
> Subject: [ceph-users] Re: mimic 13.2.6 too much broken connexions
>
> If it was a network issue, the counters should explode (as I said,
> with a log level of 5 on the messenger, we observed more than 80 000
> lossy channels per minute), but nothing abnormal shows up in the
> counters (on switches and servers).
> On the switches: no drops, no CRC errors, no packet loss, only some
> output discards, but not enough to be significant. On the server NICs,
> via ethtool -S, nothing is relevant.
> And as I said, another mimic cluster with different hardware shows the
> same behavior.
> Ceph uses connection pools from host to host, but how does it check the
> availability of these connections over time?
> And as the network doesn't seem to be guilty, what can explain these
> broken channels?
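
For anyone who wants to reproduce the observation referenced here (the messenger log level being raised to 5), that can be done at runtime with standard injectargs; remember to turn it back down afterwards, since these logs grow very quickly:

ceph tell 'osd.*' injectargs '--debug_ms 5/5'
# ...watch /var/log/ceph/ceph-osd.*.log for "replacing existing (lossy) channel"...
ceph tell 'osd.*' injectargs '--debug_ms 0/5'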
>
> On Wed, Nov 27, 2019 at 19:05, Anthony D'Atri  wrote:
> >
> > Are you bonding NIC ports? If so, do you have the correct hash policy
> > defined? Have you looked at the *switch* side for packet loss, CRC errors,
> > etc.? What you report could be consistent with this. Since the host
> > interface for a given connection will vary by the bond hash, some OSD
> > connections will use one port and some the other. So if one port has
> > switch-side errors, or is blackholed on the switch, you could see some
> > heartbeating impacted but not others.
> >
> > Also make sure you have the optimal reporters value.
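
Two generic checks related to the advice above, not from the original mail (bond0 is a placeholder for whatever the bond interfaces are called, and "reporters" is assumed to refer to the mon_osd_min_down_reporters option):

# active transmit hash policy of a bond, read from the kernel
grep -i 'hash policy' /proc/net/bonding/bond0

# current reporters setting, via a monitor's admin socket
# (adjust the mon id if it is not the short hostname)
ceph daemon mon.$(hostname -s) config get mon_osd_min_down_reporters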
> >
> > > On Nov 27, 2019, at 7:31 AM, Vincent Godin  wrote:
> > >
> > > Since I submitted the mail below a few days ago, we have found some clues.
> > > We observed a lot of lossy connections like:
> > > ceph-osd.9.log:2019-11-27 11:03:49.369 7f6bb77d0700  0 --
> > > 192.168.4.181:6818/2281415 >> 192.168.4.41:0/1962809518
> > > conn(0x563979a9f600 :6818   s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH
> > > pgs=0 cs=0 l=1).handle_connect_msg accept replacing existing (lossy)
> > > channel (new one lossy=1)
> > > We raised the messenger log level to 5/5 and observed, for the whole
> > > cluster, more than 80 000 lossy connections per minute!!!
> > > We adjusted "ms_tcp_read_timeout" from 900 to 60 seconds, and since then
> > > there have been no more lossy connections in the logs and no more failed
> > > health checks.
> > > It's just a workaround, but there is a real problem with these broken
> > > sessions, and it leads to two