I use Ubiquiti equipment, mainly because I'm not a network admin... I
rebooted the 10G switches and now everything is working and recovering. I
hate when there's not a definitive answer but that's kind of the deal when
you use Ubiquiti stuff. Thank you Sean and Frank. Frank, you were right.
It
Yeah, assuming you can ping with a lower MTU, check the MTU on your
switches.
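The payload size in the pings quoted in this thread follows from the MTU arithmetic; a small sketch (the 9000-byte MTU and the 192.168.30.14 peer are from the thread):

```shell
# With a 9000-byte MTU, the largest ICMP payload that fits without
# fragmentation is 9000 - 20 (IP header) - 8 (ICMP header) = 8972,
# which is the -s value used in the pings below.
mtu=9000
payload=$((mtu - 20 - 8))
echo "$payload"    # prints 8972

# Probe with DF (don't-fragment) set; tracepath additionally reports
# the discovered path MTU hop by hop:
#   ping -M do -c 3 -s "$payload" 192.168.30.14
#   tracepath 192.168.30.14
```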
On Mon, 25 Jul 2022, 23:05 Jeremy Hansen wrote:
> That results in packet loss:
>
> [root@cn01 ~]# ping -M do -s 8972 192.168.30.14
> PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
> ^C
> ---
That results in packet loss:
[root@cn01 ~]# ping -M do -s 8972 192.168.30.14
PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
^C
--- 192.168.30.14 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2062ms
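When a full-size probe is dropped like this, bisecting the payload size can show where the usable MTU actually sits. A hypothetical helper, not from the thread (`bisect_max` and `probe` are made-up names; the peer address is the one being pinged above):

```shell
# Bisect the largest probe size that succeeds. Maintains the invariant
# "lo passes, hi fails" and narrows until the two are adjacent.
bisect_max() {    # bisect_max <lo> <hi> <probe-cmd>
    lo=$1 hi=$2; shift 2
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if "$@" "$mid"; then lo=$mid; else hi=$mid; fi
    done
    echo "$lo"
}

# Probe: one ping with DF set and the given ICMP payload size.
probe() { ping -M do -c 1 -W 1 -s "$1" 192.168.30.14 >/dev/null 2>&1; }

# Usage: bisect between a 1500-byte-MTU payload and a 9000-byte one.
#   bisect_max 1472 8972 probe
```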
That's very weird... but this gives me something to go on.
Does Ceph do any kind of I/O fencing if it notices an anomaly? Do I need to
do something to re-enable these hosts if they get marked as bad?
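For context: rather than fencing hosts outright, Ceph marks unreachable OSDs down, and after a timeout marks them out. A sketch of the usual checks and re-enable steps, assuming a cephadm deployment as in this thread (OSD IDs are placeholders):

```shell
# List OSDs currently marked down, and any cluster flags that could
# block recovery (e.g. noout/norecover set earlier):
ceph osd tree down
ceph osd dump | grep flags

# If OSDs were auto-marked "out" after being down for a while, bring
# them back in once the hosts are reachable (IDs are placeholders):
ceph osd in 3 7 11

# Under cephadm, restart a daemon that did not recover on its own:
ceph orch daemon restart osd.3
```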
On Mon, Jul 25, 2022 at 2:56 PM Jeremy Hansen wrote:
> MTU is the same across all hosts:
>
> - cn01.ceph.la1.clx.corp-
> enp2s0:
MTU is the same across all hosts:
- cn01.ceph.la1.clx.corp-
enp2s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
inet 192.168.30.11 netmask 255.255.255.0 broadcast 192.168.30.255
inet6 fe80::3e8c:f8ff:feed:728d prefixlen 64 scopeid 0x20<link>
ether 3c:8c:f8:ed:72:8d txqueuelen 1000
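One quick way to verify that claim on every node at once is to pull just the MTU field from each host's interface line. A sketch (the cn01–cn03 hostnames and the enp2s0 interface are from the thread; `mtu_of` is a made-up helper):

```shell
# Extract the MTU figure from an "ip -o link" (or ifconfig) line.
mtu_of() { echo "$1" | grep -o 'mtu [0-9]*' | awk '{print $2}'; }

# Check every node's storage interface in one pass:
#   for h in cn01 cn02 cn03; do
#       printf '%s: %s\n' "$h" "$(mtu_of "$(ssh "$h" ip -o link show enp2s0)")"
#   done
```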
Here's some more info:
HEALTH_WARN 2 failed cephadm daemon(s); 3 hosts fail cephadm check; 2
filesystems are degraded; 1 MDSs report slow metadata IOs; 2/5 mons down,
quorum cn02,cn03,cn01; 10 osds down; 3 hosts (17 osds) down; Reduced data
availability: 13 pgs inactive, 9 pgs down; Degraded data
Pretty desperate here. Can someone suggest what I might be able to do to
get these OSDs back up? It looks like my recovery has stalled.
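For a stalled recovery, the usual first step is to ask the cluster which PGs are stuck and where their OSDs live; a sketch using standard ceph CLI queries (no cluster-specific names assumed):

```shell
# Expand the HEALTH_WARN summary into per-PG / per-OSD detail:
ceph health detail

# List PGs stuck inactive, and which OSDs are down:
ceph pg dump_stuck inactive
ceph osd tree down

# Per-OSD utilisation and placement, to spot the affected hosts:
ceph osd df tree
```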
On Mon, Jul 25, 2022 at 7:26 AM Anthony D'Atri wrote:
> Do your values for public and cluster network include the new addresses on
> all nodes?
>
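Those values can be read back from the monitors and compared against what each node actually has configured; a sketch (the 192.168.30.x storage subnet is from the thread):

```shell
# What the cluster believes the networks are:
ceph config get mon public_network
ceph config get mon cluster_network

# What each node actually has on its interfaces (compare with above):
ip -br addr show | grep 192.168.30.
```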
This