Re: [pfSense] 3 hard locks this week... any ideas?

2016-10-16 Thread WebDawg
On Thu, Sep 8, 2016 at 2:29 PM, Todd Russell  wrote:
> Final update on this issue. When I took it down, I pulled the drive and
> started a Level 2 SpinRite on it while I took out and reseated the RAM then
> ran memtest. I found no errors in either test, so I also took out the Intel
> 4 port gigabit card and reseated that, then put everything back together.
> It has been running for a week straight now with no hiccups of any kind, so
> either the SpinRite forced the drive to correct some read errors or
> removing and reseating the RAM got around some dust or oxidation on the
> contacts. It wouldn't be the first time reseating the RAM cleared otherwise
> unexplainable issues with a machine for me, so I will assume that was the
> case. I wish I'd had time to run the memtest before and after reseating the
> RAM but... AIN'T NOBODY GOT TIME FOR THAT!
>
> Thanks to all for the feedback last week.
>
>
> Peace,
> Todd Russell
> Director of IT and Webmaster
> Saint Joseph Abbey and Seminary College
> 985-867-2266
> 985-789-4319
>


https://en.wikipedia.org/wiki/SpinRite#Solid_state_drives

I mean, if that card had not been inserted properly, you would have
had an issue.  You should have tested the RAM before reseating it,
because the same applies there.  So many people's comments here are
just hearsay.  Hard locks are usually caused by bad hardware or an
incompatibility, and in that case you are usually glad to get some
kernel messages/dumps while it is happening, because they can help you
out.  I am glad that you solved your problem, but it is bad to draw
conclusions that are not based on the scientific method.
___
pfSense mailing list
https://lists.pfsense.org/mailman/listinfo/list
Support the project with Gold! https://pfsense.org/gold


Re: [pfSense] 3 hard locks this week... any ideas?

2016-10-16 Thread Volker Kuhlmann
On Fri 02 Sep 2016 13:33:35 NZST +1200, compdoc wrote:

> As for me, these days I install only SSDs in desktop systems that run
> 24/7, and also use them as boot drives for servers. Over the years I
> have had only one SSD fail, and it did show pending sectors in SMART.

That's not my observation with SSDs. Which SSD models do you use?
Or better, how do you select your SSDs? That'd be really good to know
from those doing well there.

Thanks,

Volker

-- 
Volker Kuhlmann is list0570 with the domain in header.
http://volker.top.geek.nz/  Please do not CC list postings to me.


Re: [pfSense] 3 hard locks this week... any ideas?

2016-10-16 Thread Volker Kuhlmann
On Fri 02 Sep 2016 10:14:59 NZST +1200, Todd Russell wrote:

> I will just run level 2 SpinRite on the SSD to force the drive to read
> every spot, which should trigger the error correction if that is happening.

Ehh, you use what for that? Toss SpinRite into the bit bucket as
suggested. Log into your pfSense (or any Unix!), obtain root
privileges, and run
  dd bs=16k if=/dev/yourdisk of=/dev/null

Use what you have!! Why install extra cr^H^H^Hstuff? dd *always* works
as expected. Change the buffer size as you see fit, and add an option to
prevent block buffering (if supported by BSD and if it works like
Linux).
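
For anyone who wants the offsets of unreadable blocks rather than just
an exit status, here is a minimal Python sketch of the same full read
pass. It is an illustration, not a replacement for dd: the function
name read_pass and the max_errors cut-off are my own assumptions, and
16k is simply the block size from the command above.

```python
import os

def read_pass(path, block_size=16 * 1024, max_errors=100):
    """Read `path` from start to end and discard the data, like dd to /dev/null.

    Returns (bytes_read, bad_offsets). bad_offsets holds the starting offset
    of every block whose read raised an OS-level error. The pass gives up
    after max_errors failures (hypothetical safety limit for a dead disk).
    """
    bytes_read = 0
    bad_offsets = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while len(bad_offsets) < max_errors:
            try:
                chunk = os.pread(fd, block_size, offset)
            except OSError:
                bad_offsets.append(offset)
                offset += block_size  # skip the unreadable block
                continue
            if not chunk:
                break  # end of device or file
            bytes_read += len(chunk)
            offset += len(chunk)
    finally:
        os.close(fd)
    return bytes_read, bad_offsets
```

Run it as root against the disk device, e.g. read_pass("/dev/yourdisk"),
and look at the second element of the result.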

> plenty experience with that scourge.  :/  I did use the diagnostics in the
> web gui to check the SMART info and it didn't say anything out of the
> ordinary, but I have seen at least 2 Samsung SSDs over the years lose data
> with no warning and no errors in SMART.

The SMART info is effectively a status collected over time. Sectors going
bad without detectable warning by necessity don't give SMART a chance.
Ditto disks that fail suddenly and catastrophically. SMART is not a
fix-all, but it is very, very useful in many cases.

Volker

-- 
Volker Kuhlmann is list0570 with the domain in header.
http://volker.top.geek.nz/  Please do not CC list postings to me.


Re: [pfSense] Lightning strike

2016-10-16 Thread Volker Kuhlmann
On Fri 14 Oct 2016 16:41:22 NZDT +1300, Jim Thompson wrote:

> > Does a disappearing reX driver interface renumber the ueX interfaces?
> 
> On FreeBSD?  No.  On a Linux system?  Likely.

I am unsure whether that is still so for Linux; there seem to have been
changes there, but I haven't looked at it as it's been inconsequential to
me. But pfSense runs on FreeBSD, so Linux behaviour has no relevance
here.

> Let's say you had one re(4) and two em(4) devices.   Let's assume for now
> you have:
> 
> em0: WAN
> em1: LAN
> re0:  OPT1
> 
> Case 0:
> em0 gets fried in such a way that it doesn't enumerate on the bus.  We are
> left with:
> em1: LAN
> re0: OPT1
> What should pfSense do in this instance?

Run! No change of interface assignments to ports. Ignore missing
interfaces. The way you are presenting this anyway.

> Case 1:
> em1 gets fried in such a way that it doesn't enumerate on the bus.  We are
> left with:
> em0: WAN
> re0: OPT1
> What should pfSense do in this instance?

Run with re0:OPT1 only. Ignore missing interfaces.

> Case 2:
> re0 gets fried in such a way that it doesn't enumerate on the bus.  We are
> left with:
> em0: WAN
> em1: LAN
> What should pfSense do in this instance?

Run. No change of interface assignments to ports. Ignore missing
interfaces.

> Case 3:
> pfSense is operating in a dual-WAN mode
> em0: WAN0
> em1: WAN1
> re0:  LAN
> 
> em0 gets fried in such a way that it doesn't enumerate on the bus.  We are
> left with:
> em1: WAN1
> re0:  LAN
> What should pfSense do in this instance?

Run with re0:LAN only. Ignore missing interfaces.

> Case 4:
> pfSense is operating in a dual-WAN mode
> em0: WAN0
> em1: WAN1
> re0:  LAN
> 
> em1 gets fried in such a way that it doesn't enumerate on the bus.  We are
> left with:
> em0: WAN0
> re0:  LAN
> What should pfSense do in this instance?

Run with re0:LAN only. Ignore missing interfaces.

> Case 5:
> pfSense is operating in a dual-WAN mode
> em0: WAN0
> em1: WAN1
> re0:  LAN
> 
> re0 gets fried in such a way that it doesn't enumerate on the bus.  We are
> left with:
> em0: WAN0
> em1: WAN1

Run with em0: WAN0, em1: WAN1 only. Ignore missing interfaces.

> Now let's say you have a 2440, with 4 igb(4) interfaces
> 
> igb0: WAN0
> igb1: WAN1
> igb2: LAN
> igb3: OPT1

All interfaces are igbX. No interfaces left that don't get shuffled
around. Stop.

All your remaining cases are the same.

> Now, having described the desired behavior for pfSense in each case,
> generalize an algorithm for up to 8 interfaces of
> the same device type, 8 different device types, or a mix of device types, that
> behaves correctly in each case.
> 
> Pseudo-code will do for now.

I had already given it in my previous email. It doesn't give improvement
in all cases, but in those which are safe. You'll need to store
user-chosen mappings of interfaces to ports. That's already done.
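
Since the previous email is not quoted here, the following Python
sketch is only one possible reading of the answers given above for
cases 1, 3, 4, 5 and the all-igb box: an assignment is kept only if the
number of interfaces of its driver type is unchanged. Case 0 as
presented is answered differently above, so treat this as the
conservative variant. The function names and the {os_name: role}
layout are assumptions made for illustration.

```python
from collections import Counter

def driver_of(ifname):
    """Return the driver prefix of an interface name, e.g. 'em' for 'em1'."""
    return ifname.rstrip("0123456789")

def safe_assignments(saved, detected):
    """Keep only the assignments whose driver family is still complete.

    saved:    {os_name: role} as stored in the configuration
    detected: iterable of OS interface names present at boot
    """
    expected = Counter(driver_of(name) for name in saved)
    present = Counter(driver_of(name) for name in detected)
    # A family is trusted only if no member vanished (or appeared).
    intact = {drv for drv, count in expected.items() if present[drv] == count}
    return {name: role for name, role in saved.items()
            if driver_of(name) in intact}

# Case 3 above: em0 is gone, so only re0 keeps its assignment.
dual_wan = {"em0": "WAN0", "em1": "WAN1", "re0": "LAN"}
print(safe_assignments(dual_wan, ["em1", "re0"]))
```

An empty result corresponds to "Stop" in the all-igb example.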

The current situation sucks. A user of a router appliance is not
primarily interested in why it sucks.

But Espen Johansen gave the solution: Don't touch the primary OS port
names or their braindead implementation. Create aliases based on MAC
address. Access ports exclusively through the alias name. Fix
pfSense(!!) to keep rules assigned to no interface accessible from the
BUI, so the user can manually re-assign them in bulk, instead of
enforcing a click-me-stupid orgy or XML file hacking. Aliases to emX,
reX, igbX etc. names are a matter of today's intelligence in OS
implementation. No more excuses for decades-old decisions. :-)
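
To make the alias idea concrete, here is a minimal Python sketch. It
assumes the configuration stores a MAC address per role and that the OS
reports a MAC for every interface found at boot; the function name, the
data layout and the MAC addresses are made up for illustration and are
not how pfSense stores things today.

```python
def resolve_aliases(saved, detected):
    """Map each role to the OS device name that currently carries its MAC.

    saved:    {role: mac} as chosen by the user
    detected: {os_name: mac} as enumerated at boot
    Returns (active, missing). active is {role: os_name}; missing lists the
    roles whose MAC was not found, so their rules can be kept for manual
    reassignment instead of being attached to the wrong port.
    """
    by_mac = {mac.lower(): name for name, mac in detected.items()}
    active, missing = {}, []
    for role, mac in saved.items():
        name = by_mac.get(mac.lower())
        if name is None:
            missing.append(role)
        else:
            active[role] = name
    return active, missing

# Hypothetical box: the first port died and the remaining ones were renumbered.
saved = {"WAN0": "02:00:00:00:00:01",
         "WAN1": "02:00:00:00:00:02",
         "LAN": "02:00:00:00:00:03"}
detected = {"igb0": "02:00:00:00:00:02", "igb1": "02:00:00:00:00:03"}
active, missing = resolve_aliases(saved, detected)
print(active)
print(missing)
```

Because the lookup goes by MAC, the surviving roles follow their
hardware regardless of what the OS calls the port, and the role that
lost its hardware is reported rather than silently dropped.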

Volker

-- 
Volker Kuhlmann is list0570 with the domain in header.
http://volker.top.geek.nz/  Please do not CC list postings to me.