I have recently had three probes «die» on me in ways where the hardware
actually seems fine and the problem instead lies in connectivity with the
Atlas infrastructure. I am therefore starting to wonder if my probes were
all OK after all, and whether the problems are with the Atlas registration
servers or infrastructure in general.

Here's the timeline:

First problem was with V1 probe #138 back in January. It reported
through SOS messages that it could not connect on HTTPS, error code
«firewall issues suspected» or something like that. However, observing
the network traffic in/out of the probe revealed that the probe did
successfully establish a TCP three-way handshake with
reg01/reg02.atlas.ripe.net, only to immediately close the connection
with a FIN packet, without actually sending any payload. There was no
firewall blocking any traffic. RIPE NCC support essentially shrugged and
said V1 probes aren't really supported nowadays, so I gave up and
recycled the probe.
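
For anyone who wants to repeat the reachability part of that check from the same network segment as a probe, here is a minimal sketch. It only tests whether a TCP connection can be opened and whether the server sends anything unprompted; it sends no payload and does not imitate whatever the probe firmware does after connecting. The function name `check_tcp` is made up for illustration, and port 443 is an assumption based on the «could not connect on HTTPS» message.

```python
import socket


def check_tcp(host, port=443, timeout=5.0):
    """Try a TCP connect and report what happened, without sending any payload."""
    result = {
        "host": host,
        "port": port,          # 443 is an assumption, not confirmed by the probe logs
        "connected": False,
        "first_bytes": None,
        "error": None,
    }
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Reaching this point means the three-way handshake completed.
            result["connected"] = True
            try:
                # Some services send a banner unprompted, others wait for the client.
                # An empty bytes object means the peer closed the connection.
                result["first_bytes"] = sock.recv(64)
            except socket.timeout:
                pass  # connected, but nothing arrived within the timeout
    except OSError as exc:
        # Covers refused connections, timeouts, resets and DNS failures.
        result["error"] = repr(exc)
    return result
```

Calling `check_tcp("reg01.atlas.ripe.net")` and `check_tcp("reg02.atlas.ripe.net")` and getting `connected` set to `True` would only confirm what the packet capture already showed, namely that no firewall is dropping the handshake. It says nothing about why the probe itself closes the connection.
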

Second problem was with another V1 probe (#433), in May. That one was
left disconnected for an extended period of time, transitioning to
"abandoned" status. As it did not reconnect after I fixed the network, I
wrote it off, reconnected it, and then re-registered it with
my personal account. It too connected fine with the network, and the
event log showed a «Registration» event every time I power-cycled the
probe, but it never went further and the status was stuck at «No
Controller Connection». The written-off status never cleared. RIPE NCC
support confirmed that the written-off status should have cleared
after reconnection, but otherwise wasn't able to help, so this probe
went in the bin as well.

Third problem is with a V3 probe (#10667), ongoing as I write this. It
was working perfectly fine up until last week, when I moved it from a
data centre to my home network. The probe had a static IP configuration in
the data centre, and with no way of configuring it to go back to using
DHCP, I thought I'd just re-install the firmware by booting it up without
the USB stick in and plugging it back in after a while. It connected fine and
reported «NO-USB» SOS messages, and after plugging in the USB stick
«Registration» events with the hardware details of the USB stick
appeared (curiously enough, most of these events report an IP of
172.17.0.1). At that point the process gets stuck, the «Registration»
events with the USB stick details keep repeating approx. every 65
minutes, and there's a warning banner at the top of the page «Flash
Drive Filesystem Corrupted». So I have re-tried this with several
different USB sticks, all from reputable brands, and all passing
whole-device read/write tests to identify bad blocks when connected to
my laptop. I have also tried replacing the power supply – nothing helps,
it always ends up in the same «Registration» loop. During the process, a
partition table with three 1 GiB partitions is created on the USB
stick, containing data/filesystems that my Linux laptop does not
recognise as anything familiar.
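
To document what ends up on the stick in a form that can be shared, the partition table can be dumped from the first sector. The sketch below assumes a classic MBR table and 512-byte sectors. Neither is confirmed for the probe firmware, and `parse_mbr` is a hypothetical helper name.

```python
import struct

SECTOR_SIZE = 512  # assumed logical sector size


def parse_mbr(sector0):
    """Return the non-empty primary partition entries of a classic MBR."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA at offset 510)")
    entries = []
    for index in range(4):
        # The four 16-byte entries start at offset 446.
        raw = sector0[446 + 16 * index : 446 + 16 * (index + 1)]
        # Fields: status, CHS start, type, CHS end, first LBA, sector count
        _status, _chs_first, ptype, _chs_last, lba_start, sectors = struct.unpack(
            "<B3sB3sII", raw
        )
        if ptype == 0:
            continue  # unused slot
        entries.append(
            {
                "index": index + 1,
                "type": ptype,
                "lba_start": lba_start,
                "size_bytes": sectors * SECTOR_SIZE,
            }
        )
    return entries
```

Feeding it the first 512 bytes of the stick (or of an image of it) lists the type byte, start and size of each partition. A single entry of type 0xEE would mean the stick carries a GPT rather than an MBR table, in which case this sketch does not apply.
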

I could be persuaded that the two V1 probes did indeed have hardware
problems. However, the V3 probe clearly does not, at least not with its
USB stick, and since its failure mode is rather similar to how #433
failed (process getting stuck after «Registration» events), I am
starting to wonder if at least #433 was fine all along.

Anyway, I am curious to hear if it is just me being very unlucky with my
probes this year, or if anyone else has had similar problems recently?

Another issue, probably unrelated but I thought I'd mention it anyway,
is that when I try to change the ownership of the V3 probe to my
personal account, the web page just displays a red pop-up with the
unhelpful message «An error occurred», and the ownership change does not
happen.
Tore