Re: [Ubnt_users] WS-8-250-DC rebooting

2016-06-28 Thread James Wilson
Got to get syslog set up...

Haven't had any more of these problems, still don't know what it was.
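
For the remote syslog piece, here's a minimal sketch of the receiving end, assuming any always-on Linux box at the office (the address, port, and log directory below are placeholders); the switch side is just pointing its remote syslog target at that host:

#!/usr/bin/env python3
# Tiny remote syslog sink: listen on UDP 514 and append each message to a
# per-sender file. Sketch only -- a real deployment would normally use
# rsyslog or syslog-ng instead of a hand-rolled listener.
import socket
from datetime import datetime

LISTEN_ADDR = ("0.0.0.0", 514)   # standard syslog/UDP; binding below port 1024 needs root
LOG_DIR = "/var/log/remote"      # hypothetical destination directory (must already exist)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(LISTEN_ADDR)

while True:
    data, (src_ip, _port) = sock.recvfrom(4096)
    line = data.decode("utf-8", errors="replace").rstrip()
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open("%s/%s.log" % (LOG_DIR, src_ip), "a") as logfile:
        logfile.write("%s %s\n" % (stamp, line))
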
On Jun 28, 2016 1:40 PM, "Chris Ruschmann"  wrote:

> I have almost the same setup as you in several locations. I'm gonna lean
> on a bad ground somewhere. Check all your cables to your radios, solar and
> everything. Maybe a squirrel thought one of them tasted nice.
>
> Also set up syslog to a remote server.
> On May 30, 2016 1:32 PM, "James Wilson"  wrote:
>
>> I replaced the switch with a cold backup off the shelf with 1.4 firmware
>> and the problems stopped.  But it turned out to have a dead fan.
>>
>> So I put the original switch back in after loading 1.4 firmware and
>> manually rebuilding the configuration.  Seems to be all back to normal
>> now...
>>
>> On Fri, May 20, 2016 at 11:54 PM, RickG  wrote:
>>
>>> I saw this a couple times: Once with a bad POE surge protector and
>>> another time it had a bad cable.
>>> BTW: Firmware v1.4.0 - FINAL is out!!!
>>>
>>> On Fri, May 20, 2016 at 8:37 PM, James Wilson 
>>> wrote:
>>>
 This is sort of UBNT related - all of the radios are UBNT with a
 Netonix switch.  I posted this on the Netonix forums, but they aren't as
 widely traveled.

 So I'm kind of crossposting here to see if anyone has some ideas.
 Thanks!

 We're having a problem that we're having a lot of trouble
 troubleshooting. I brought this up with Chris today, but no answers yet.

 So we have a solar powered relay site that takes about an hour to get
 to. In the last few days the site has been going down intermittently. It
 may have been going on longer than that, but the outages may not have been
 long enough for AirControl 1 to send us an alarm.

 Yesterday afternoon the old switch (running 1.3.2) went down and didn't
 come back up until about 1:30 am this morning. We could log into the
 backhaul and the backhaul showed link at 100 Mbps Full duplex, but we
 couldn't log into the switch or anything past it. The backhaul is a
 NanoBridge and we have three Rockets hooked to it as APs.

 Some ants had gotten into the enclosure and we thought that was the
 problem with the switch. So we swapped in a new switch (running 1.4.0rc25),
 stayed around for about an hour to watch it, and saw no problems.

 This afternoon the new switch started rebooting. And the backhaul radio
 would lose power when the switch rebooted.

 I updated the firmware on the old switch and let it run on the bench
 for several hours with no problem.

 There are no AirFibers connected to this switch, but we do have an AF
 connected to a different WS-8-250-DC at a different site.

 I copied the log from the new switch below and maybe I can figure out
 how to insert a screen shot if that will help.



 Help! :)

 Dec 31 19:00:04 sysinit: killall: udhcpc: no process killed
 Dec 31 19:00:05 netonix: 1.4.0rc25 on WS-8-250-DC
 Dec 31 19:00:07 kernel: vtss_core: module license '(c) Vitesse
 Semiconductor Inc.' taints kernel.
 Dec 31 19:00:07 kernel: switch: 'Luton26' board detected
 Dec 31 19:00:09 kernel: vtss_port: Loaded port module on board Luton26,
 type 5
 Dec 31 19:00:09 kernel: nf_conntrack version 0.5.0 (2048 buckets, 8192
 max)
 Dec 31 19:00:10 kernel: i2c /dev entries driver
 Dec 31 19:00:10 system: Setting MAC address from flash configuration:
 EC:13:B3:51:xx:xx
 Dec 31 19:00:10 kernel: i2c_vcoreiii i2c_vcoreiii: i2c bus driver on
 IRQ 19
 Dec 31 19:00:11 sysinit: Loading defaults
 Dec 31 19:00:11 sysinit: Adding custom chains
 Dec 31 19:00:11 system: starting ntpclient
 Dec 31 19:00:12 sysinit: Loading zones
 Dec 31 19:00:12 sysinit: Loading forwarding
 Dec 31 19:00:12 sysinit: Loading redirects
 Dec 31 19:00:12 sysinit: Loading rules
 Dec 31 19:00:12 sysinit: Loading includes
 Dec 31 19:00:12 admin: adding lan (eth0) to firewall zone lan
 Dec 31 19:00:17 Port: link state changed to 'up' (100M-F) on port 3
 Dec 31 19:00:17 Port: link state changed to 'up' (10M-F) on port 1
 Dec 31 19:00:18 sysinit: killall: telnetd: no process killed
 Dec 31 19:00:18 Port: link state changed to 'up' (10M-F) on port 5
 Dec 31 19:00:18 Port: link state changed to 'up' (10M-F) on port 7
 Dec 31 19:00:19 sysinit: 1969-12-31 19:00:18: (log.c.97) server started
 Dec 31 19:00:20 dropbear[741]: Running in background
 Dec 31 19:00:21 kernel: eth0: no IPv6 routers present
 Dec 31 19:00:22 Port: link state changed to 'down' on port 5
 Dec 31 19:00:22 Port: link state changed to 'down' on port 1
 Dec 31 19:00:23 Port: link state changed to 'down' on port 7
 Dec 31 19:00:23 switch[769]: Detected cold boot
 Dec 31 19:00:24 Port: link state changed to 'down' on port 3
 Dec 31 19:00:25 switch[769]: PoE enabled on port 1, PoE Smart is
 starting cable check

Re: [Ubnt_users] WS-8-250-DC rebooting

2016-06-28 Thread Chris Ruschmann
I have almost the same setup as you in several locations. I'm gonna lean on
a bad ground somewhere. Check all your cables to your radios, solar and
everything. Maybe a squirrel thought one of them tasted nice.

Also set up syslog to a remote server.
On May 30, 2016 1:32 PM, "James Wilson"  wrote:

> I replaced the switch with a cold backup off the shelf with 1.4 firmware
> and the problems stopped.  But it turned out to have a dead fan.
>
> So I put the original switch back in after loading 1.4 firmware and
> manually rebuilding the configuration.  Seems to be all back to normal
> now...
>
> On Fri, May 20, 2016 at 11:54 PM, RickG  wrote:
>
>> I saw this a couple times: Once with a bad POE surge protector and
>> another time it had a bad cable.
>> BTW: Firmware v1.4.0 - FINAL is out!!!
>>
>> On Fri, May 20, 2016 at 8:37 PM, James Wilson 
>> wrote:
>>
>>> This is sort of UBNT related - all of the radios are UBNT with a Netonix
>>> switch.  I posted this on the Netonix forums, but they aren't as widely
>>> traveled.
>>>
>>> So I'm kind of crossposting here to see if anyone has some ideas.
>>> Thanks!
>>>
>>> We're having a problem that we're having a lot of trouble
>>> troubleshooting. I brought this up with Chris today, but no answers yet.
>>>
>>> So we have a solar powered relay site that takes about an hour to get
>>> to. In the last few days the site has been going down intermittently. It
>>> may have been going on longer than that, but the outages may not have been
>>> long enough for AirControl 1 to send us an alarm.
>>>
>>> Yesterday afternoon the old switch (running 1.3.2) went down and didn't
>>> come back up until about 1:30 am this morning. We could log into the
>>> backhaul and the backhaul showed link at 100 Mbps Full duplex, but we
>>> couldn't log into the switch or anything past it. The backhaul is a
>>> NanoBridge and we have three Rockets hooked to it as APs.
>>>
>>> Some ants had gotten into the enclosure and we thought that was the
>>> problem with the switch. So we swapped in a new switch (running 1.4.0rc25),
>>> stayed around for about an hour to watch it, and saw no problems.
>>>
>>> This afternoon the new switch started rebooting. And the backhaul radio
>>> would lose power when the switch rebooted.
>>>
>>> I updated the firmware on the old switch and let it run on the bench for
>>> several hours with no problem.
>>>
>>> There are no AirFibers connected to this switch, but we do have an AF
>>> connected to a different WS-8-250-DC at a different site.
>>>
>>> I copied the log from the new switch below and maybe I can figure out
>>> how to insert a screen shot if that will help.
>>>
>>>
>>>
>>> Help! :)
>>>
>>> Dec 31 19:00:04 sysinit: killall: udhcpc: no process killed
>>> Dec 31 19:00:05 netonix: 1.4.0rc25 on WS-8-250-DC
>>> Dec 31 19:00:07 kernel: vtss_core: module license '(c) Vitesse
>>> Semiconductor Inc.' taints kernel.
>>> Dec 31 19:00:07 kernel: switch: 'Luton26' board detected
>>> Dec 31 19:00:09 kernel: vtss_port: Loaded port module on board Luton26,
>>> type 5
>>> Dec 31 19:00:09 kernel: nf_conntrack version 0.5.0 (2048 buckets, 8192
>>> max)
>>> Dec 31 19:00:10 kernel: i2c /dev entries driver
>>> Dec 31 19:00:10 system: Setting MAC address from flash configuration:
>>> EC:13:B3:51:xx:xx
>>> Dec 31 19:00:10 kernel: i2c_vcoreiii i2c_vcoreiii: i2c bus driver on IRQ
>>> 19
>>> Dec 31 19:00:11 sysinit: Loading defaults
>>> Dec 31 19:00:11 sysinit: Adding custom chains
>>> Dec 31 19:00:11 system: starting ntpclient
>>> Dec 31 19:00:12 sysinit: Loading zones
>>> Dec 31 19:00:12 sysinit: Loading forwarding
>>> Dec 31 19:00:12 sysinit: Loading redirects
>>> Dec 31 19:00:12 sysinit: Loading rules
>>> Dec 31 19:00:12 sysinit: Loading includes
>>> Dec 31 19:00:12 admin: adding lan (eth0) to firewall zone lan
>>> Dec 31 19:00:17 Port: link state changed to 'up' (100M-F) on port 3
>>> Dec 31 19:00:17 Port: link state changed to 'up' (10M-F) on port 1
>>> Dec 31 19:00:18 sysinit: killall: telnetd: no process killed
>>> Dec 31 19:00:18 Port: link state changed to 'up' (10M-F) on port 5
>>> Dec 31 19:00:18 Port: link state changed to 'up' (10M-F) on port 7
>>> Dec 31 19:00:19 sysinit: 1969-12-31 19:00:18: (log.c.97) server started
>>> Dec 31 19:00:20 dropbear[741]: Running in background
>>> Dec 31 19:00:21 kernel: eth0: no IPv6 routers present
>>> Dec 31 19:00:22 Port: link state changed to 'down' on port 5
>>> Dec 31 19:00:22 Port: link state changed to 'down' on port 1
>>> Dec 31 19:00:23 Port: link state changed to 'down' on port 7
>>> Dec 31 19:00:23 switch[769]: Detected cold boot
>>> Dec 31 19:00:24 Port: link state changed to 'down' on port 3
>>> Dec 31 19:00:25 switch[769]: PoE enabled on port 1, PoE Smart is
>>> starting cable check
>>> Dec 31 19:00:25 switch[769]: PoE enabled on port 3, PoE Smart is
>>> starting cable check
>>> Dec 31 19:00:25 switch[769]: PoE enabled on port 5, PoE Smart is
>>> starting cable check
>>> Dec 31 19:00:25 switch[769]: PoE enabled on port 7, PoE Smart is
>>> starting cable check

Re: [Ubnt_users] Dying Gasp

2016-06-28 Thread alex phillips
I think you need to increase the number of monitoring points to help with
that.  For example:

We monitor everything at the tower to start with, and I am sure you are doing
that.

We also have a device that is plugged into unconditioned power.  Risky, as it
may get nailed, but if we lose contact with it, it generally means there was a
loss of power and we are on backup.
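
A bare-bones sketch of that loss-of-contact check, assuming the canary device answers ping (the address and thresholds below are made up):

import subprocess
import time

CANARY = "10.10.10.50"        # hypothetical address of the unconditioned-power device
FAILS_BEFORE_ALARM = 3        # consecutive misses before we call it a power loss

misses = 0
while True:
    up = subprocess.run(["ping", "-c", "1", "-W", "2", CANARY],
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL).returncode == 0
    misses = 0 if up else misses + 1
    if misses == FAILS_BEFORE_ALARM:
        # swap the print for whatever alerting you already use (email, SMS, etc.)
        print("ALERT: lost contact with %s -- site is likely on backup power" % CANARY)
    time.sleep(30)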

We also monitor CPU uptime on our radios.  This tells me if an AP has been
rebooting a lot, which could mean it's going to die or that there is an
Ethernet issue.
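
The reboot-spotting part boils down to watching the uptime counter go backwards between polls.  A small sketch, assuming you are already collecting sysUpTime (or the uptime off the radio's status page) on some interval:

def count_reboots(uptime_samples):
    # Count reboots in a series of successive uptime readings (in seconds).
    # Any sample lower than the one before it means the device restarted.
    return sum(1 for prev, cur in zip(uptime_samples, uptime_samples[1:])
               if cur < prev)

# Hourly samples: two drops back toward zero means two reboots in the window.
assert count_reboots([3600, 7200, 120, 3720, 90]) == 2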

Information overload is the next best thing to being psychic I guess.

You bring up a good point on monitoring.  We have been trying to figure
out how we can automatically monitor things on our network in a way that lets
us start determining where and why issues happen.  I feel, with all the
monitoring we do, I am still blind to issues.  Ping monitoring does not
tell me when a customer is having poor speed issues, so we are now looking
into how we can track link rates, CCQ, AMQ and many other factors that can
indicate when a customer is starting to degrade.

Let me warn you all: with Title II rules and other BS the FCC is cooking up,
we are all going to need more eyes on our networks telling us things many of
us don't know about them.  I have had to stare at screens manually for hours
to figure out where and why a problem happens, and given the volume of
customers we have now, I am either going to have to start hiring more
eyeballs or get some automation going.  I don't see any help from the
manufacturers on this front right now.
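
One small piece of that automation, sketched with made-up numbers: keep a rolling baseline per customer for each metric you poll (CCQ, link rate, etc.) and flag readings that fall well below it.

from collections import deque

def degraded(history, current, window=96, drop=0.7):
    # Flag a reading that falls below `drop` times the recent average.
    # `history` is a deque of this customer's past samples for one metric.
    flagged = False
    if len(history) >= window // 2:        # wait until there is some baseline
        baseline = sum(history) / len(history)
        flagged = current < drop * baseline
    history.append(current)
    if len(history) > window:
        history.popleft()
    return flagged

ccq_history = deque()
for sample in [95, 94, 96, 93, 95] * 12 + [60]:    # healthy for a while, then a dip
    if degraded(ccq_history, sample):
        print("flag: CCQ %d is well below this customer's recent baseline" % sample)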

This would be a good discussion at the CTO round table at WISPAPalooza.

Or I can keep hoping that I crash into a truck with radioactive waste or
maybe get bitten by a radioactive spider and develop those super powers I
have always wanted.



*Alex Phillips*
CEO and General Manager
RBNS.net
HighSpeedLink.net
*WISPA.org Board of Directors ** (2011-2016)*
*WISPA President (2015-2016)*
*540-908-3993*

On Tue, Jun 28, 2016 at 8:01 AM, Matt Hoppes <
mattli...@rivervalleyinternet.net> wrote:

> I would love to see dying gasp functionality added to Ubiquiti gear.
>
> It would greatly aid the troubleshooting process as we could quickly know
> if the radio died or was simply unplugged.
>
> Even so, asking the end user whether the PoE brick has power and whether
> they can power cycle it only results in success part of the time :/
>
> Same for APs. Did the tower just have a power issue, or did a lightning
> strike just take things out?
___
Ubnt_users mailing list
Ubnt_users@wispa.org
http://lists.wispa.org/mailman/listinfo/ubnt_users


[Ubnt_users] Dying Gasp

2016-06-28 Thread Matt Hoppes
I would love to see dying gasp functionality added to Ubiquiti gear.

It would greatly aid the troubleshooting process as we could quickly know if 
the radio died or was simply unplugged. 

Even so, asking the end user whether the PoE brick has power and whether they
can power cycle it only results in success part of the time :/

Same for APs. Did the tower just have a power issue, or did a lightning strike 
just take things out?
___
Ubnt_users mailing list
Ubnt_users@wispa.org
http://lists.wispa.org/mailman/listinfo/ubnt_users