[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
I use Ubiquiti equipment, mainly because I'm not a network admin... I rebooted the 10G switches and now everything is working and recovering. I hate when there's not a definitive answer but that's kind of the deal when you use Ubiquiti stuff. Thank you Sean and Frank. Frank, you were right. It

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Sean Redmond
Yea, assuming you can ping with a lower MTU, check the MTU on your switching.

On Mon, 25 Jul 2022, 23:05 Jeremy Hansen, wrote:
> That results in packet loss:
>
> [root@cn01 ~]# ping -M do -s 8972 192.168.30.14
> PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
> ^C
> ---
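
A quick way to act on that advice is to compare a standard-size probe against a jumbo-size probe with the don't-fragment flag set; if the small one gets through and the large one does not, something in the path (here, the switch ports) is not passing jumbo frames. A minimal sketch, reusing the 192.168.30.14 target from the thread:

# 1472 = 1500-byte MTU minus 28 bytes of IP + ICMP headers; should succeed on any standard path
ping -M do -s 1472 -c 3 192.168.30.14

# 8972 = 9000-byte MTU minus 28 bytes; fails when a hop drops jumbo frames
ping -M do -s 8972 -c 3 192.168.30.14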

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
That results in packet loss:

[root@cn01 ~]# ping -M do -s 8972 192.168.30.14
PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
^C
--- 192.168.30.14 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2062ms

That's very weird... but this gives me something to
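
Once the large ping fails, tracepath can report the largest MTU the path actually supports, which narrows the problem down to the network rather than the hosts. A small sketch, assuming the target address and interface name from the thread:

# tracepath performs path MTU discovery and prints the pmtu it settles on
tracepath 192.168.30.14

# double-check the local interface is really configured for 9000
cat /sys/class/net/enp2s0/mtu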

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
Does Ceph do any kind of I/O fencing if it notices an anomaly? Do I need to do something to re-enable these hosts if they get marked as bad?

On Mon, Jul 25, 2022 at 2:56 PM Jeremy Hansen wrote:
> MTU is the same across all hosts:
>
> - cn01.ceph.la1.clx.corp-
> enp2s0:
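
On the fencing question: Ceph does not fence hosts in the STONITH sense, but OSDs that miss heartbeats are marked down, and after mon_osd_down_out_interval (600 seconds by default) they are also marked out; once they can reach their peers again they normally rejoin on their own, with no manual re-enabling needed unless cluster flags are blocking it. A hedged sketch of the usual checks:

ceph osd tree down               # which OSDs are down, grouped by host
ceph osd dump | grep flags       # look for noout/noup/nodown flags that would block recovery
ceph orch ps --daemon-type osd   # cephadm's view of each OSD daemon
ceph orch host ls                # hosts cephadm currently considers offline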

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
MTU is the same across all hosts:

- cn01.ceph.la1.clx.corp-
enp2s0: flags=4163  mtu 9000
        inet 192.168.30.11  netmask 255.255.255.0  broadcast 192.168.30.255
        inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20
        ether 3c:8c:f8:ed:72:8d  txqueuelen 1000
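
Matching interface MTUs on the hosts still says nothing about the switch ports between them, so it is worth pairing the per-host check with a jumbo-frame probe across the wire. A sketch, with the interface name and target address taken from the thread; the host list is illustrative and would need to match the real cluster:

for h in cn01 cn02 cn03; do
    echo -n "$h: mtu="
    ssh "$h" cat /sys/class/net/enp2s0/mtu
    if ssh "$h" ping -M do -s 8972 -c 1 -W 2 192.168.30.14 > /dev/null 2>&1; then
        echo "$h: jumbo ping OK"
    else
        echo "$h: jumbo ping FAILED"
    fi
done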

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
Here's some more info:

HEALTH_WARN 2 failed cephadm daemon(s); 3 hosts fail cephadm check; 2 filesystems are degraded; 1 MDSs report slow metadata IOs; 2/5 mons down, quorum cn02,cn03,cn01; 10 osds down; 3 hosts (17 osds) down; Reduced data availability: 13 pgs inactive, 9 pgs down; Degraded data
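
Each item in that health summary can be expanded individually before deciding what to restart; a short sketch of the standard drill-downs (all stock ceph CLI):

ceph health detail            # expands each warning with the affected daemons and PGs
ceph osd tree down            # the 10 down OSDs and the 3 down hosts
ceph pg dump_stuck inactive   # the 13 inactive PGs
ceph mon stat                 # which 2 of the 5 mons are out of quorum
ceph fs status                # the two degraded filesystems and their MDS ranks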

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
Pretty desperate here. Can someone suggest what I might be able to do to get these OSDs back up? It looks like my recovery had stalled.

On Mon, Jul 25, 2022 at 7:26 AM Anthony D'Atri wrote:
> Do your values for public and cluster network include the new addresses on
> all nodes?
> This
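
Anthony's question can be answered directly from the monitors' configuration: if public_network/cluster_network no longer cover the addresses the OSD hosts actually have, the OSDs end up binding and heartbeating on the wrong network and recovery stalls much like this. A minimal sketch (the interface name is the one from the thread):

# what the cluster thinks the networks are
ceph config dump | grep -E 'public_network|cluster_network'
ceph config get mon public_network
ceph config get mon cluster_network

# compare against the addresses actually present on each OSD host
ip -4 addr show enp2s0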