Re: kernel names

2017-12-13 Thread Chris H

On Thu, 14 Dec 2017 00:51:54 -0500 "Allan Jude"  said


On 12/14/2017 00:47, blubee blubeeme wrote:
> When you boot into FreeBSD and you can select kernels, there's only 2
> options:
> default and kernel.old
> 
> Is there a way to have better output and support multiple kernels without
> having to login to the system and running uname -v or something like that?
> 
> Would it be possible to add options for more kernels from that boot menu?

> ___
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
> 


The list is controlled by the /boot/loader.conf variable kernels=
which defaults to "kernel kernel.old"

I have a patch almost ready to land that will search all subdirectories
of /boot for a file named 'kernel' and add the names of those
directories to the list, such that the list will basically be autogenerated.

It currently contains too much copy/pasted code, and I just need to
clean it up a bit: https://reviews.freebsd.org/D11886

It was originally designed as part of my contributions towards packaged
base, where pkg will keep the last N (default to 5 I think) kernel
packages you have installed around, in case an upgrade goes bad.

This feature will work on any filesystem supported by the loader.


Outstanding, Allan! Well done. I eagerly await its arrival.
I've often thought about whipping something like this up. But something
else always seemed to get in the way.

Thanks, Allan.

--Chris

--
Allan Jude



___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: kernel names

2017-12-13 Thread Chris H

On Thu, 14 Dec 2017 13:47:13 +0800 "blubee blubeeme"  said


When you boot into FreeBSD and you can select kernels, there's only 2
options:
default and kernel.old

Is there a way to have better output and support multiple kernels without
having to login to the system and running uname -v or something like that?

Would it be possible to add options for more kernels from that boot menu?

There sure is! How's your Forth?
Honestly, it's an extremely simple, yet powerful language. All the old
Macintoshes used it (think BIOS). You could then simply add as many
kernel entries as you felt you needed.
OTOH, you could simply break to the boot loader and pick any kernel you
wanted. :-)
All of this also assumes you're manually making copies of the kernel(s)
you're interested in saving, as a default installkernel only backs up
the previous one -- but you already knew that. :-)

HTH

--Chris




Re: kernel names

2017-12-13 Thread blubee blubeeme
On Thu, Dec 14, 2017 at 1:51 PM, Allan Jude  wrote:

> On 12/14/2017 00:47, blubee blubeeme wrote:
> > When you boot into FreeBSD and you can select kernels, there's only 2
> > options:
> > default and kernel.old
> >
> > Is there a way to have better output and support multiple kernels without
> > having to login to the system and running uname -v or something like that?
> >
> > Would it be possible to add options for more kernels from that boot menu?
> >
>
> The list is controlled by the /boot/loader.conf variable kernels=
> which defaults to "kernel kernel.old"
>
> I have a patch almost ready to land that will search all subdirectories
> of /boot for a file named 'kernel' and add the names of those
> directories to the list, such that the list will basically be
> autogenerated.
>
> It currently contains too much copy/pasted code, and I just need to
> clean it up a bit: https://reviews.freebsd.org/D11886
>
> It was originally designed as part of my contributions towards packaged
> base, where pkg will keep the last N (default to 5 I think) kernel
> packages you have installed around, in case an upgrade goes bad.
>
> This feature will work on any filesystem supported by the loader.
>
> --
> Allan Jude
>

Allan, thanks for the great work. I'll test it out, and I can't wait to have
it merged in.


Re: get_swap_pager(x) failed

2017-12-13 Thread Mark Millard
On 2017-Dec-13, at 9:43 PM, Peter Jeremy  wrote:

> On 2017-Dec-13 11:23:46 +, Gary Palmer  wrote:
>> An open question would be why ARC is not reducing if the system is
>> under memory pressure.  It's meant to, but there have been various
>> bugs in that implementation.
> 
> The OP doesn't say what version of -current he is running but I would
> point the finger at r325851.  I have discovered that, in 11-stable,
> r326619 (which is the MFC of r325851) stops ARC responding to memory
> backpressure.

See also https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224080
(bugzilla 224080) about problems with ZFS from head -r326347.

Bugzilla 224330 is another -r326347 problem report (based on
some list reports), but it covers problems in some non-ZFS
contexts.

===
Mark Millard
markmi at dsl-only.net



Re: kernel names

2017-12-13 Thread Allan Jude
On 12/14/2017 00:47, blubee blubeeme wrote:
> When you boot into FreeBSD and you can select kernels, there's only 2
> options:
> default and kernel.old
> 
> Is there a way to have better output and support multiple kernels without
> having to login to the system and running uname -v or something like that?
> 
> Would it be possible to add options for more kernels from that boot menu?
> 

The list is controlled by the /boot/loader.conf variable kernels=
which defaults to "kernel kernel.old"

I have a patch almost ready to land that will search all subdirectories
of /boot for a file named 'kernel' and add the names of those
directories to the list, such that the list will basically be autogenerated.

It currently contains too much copy/pasted code, and I just need to
clean it up a bit: https://reviews.freebsd.org/D11886

It was originally designed as part of my contributions towards packaged
base, where pkg will keep the last N (default to 5 I think) kernel
packages you have installed around, in case an upgrade goes bad.

This feature will work on any filesystem supported by the loader.
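
For readers who want to picture the autogenerated list, here is a minimal Python sketch of the idea described above: look at each subdirectory of /boot, keep those that contain a file named 'kernel', and join the names the way kernels= is written in /boot/loader.conf. This is only an illustration, not the code in D11886 (the loader itself is not written in Python). The function names, the ordering rule and the directory name kernel.r326839 are assumptions made up for the example, and it runs against a temporary directory rather than a real /boot.

```python
import os
import tempfile


def find_kernel_dirs(boot_dir):
    """Return names of subdirectories of boot_dir that hold a file named 'kernel'."""
    names = []
    for entry in sorted(os.listdir(boot_dir)):
        if os.path.isfile(os.path.join(boot_dir, entry, "kernel")):
            names.append(entry)
    # Assumption: keep the default "kernel" directory first in the list.
    if "kernel" in names:
        names.remove("kernel")
        names.insert(0, "kernel")
    return names


def kernels_setting(names):
    """Format the names the way the kernels= variable is written in loader.conf."""
    return 'kernels="{}"'.format(" ".join(names))


def demo():
    # Build a throwaway stand-in for /boot so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as boot:
        # "kernel.r326839" is a hypothetical name; "modules" has no kernel file.
        for name in ("kernel", "kernel.old", "kernel.r326839", "modules"):
            os.mkdir(os.path.join(boot, name))
        for name in ("kernel", "kernel.old", "kernel.r326839"):
            with open(os.path.join(boot, name, "kernel"), "w") as fh:
                fh.write("")
        return kernels_setting(find_kernel_dirs(boot))


if __name__ == "__main__":
    print(demo())
```

Until something like this lands, the same effect can be had by hand: copy a kernel directory under /boot and add its name to kernels= in /boot/loader.conf.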

-- 
Allan Jude


Re: get_swap_pager(x) failed

2017-12-13 Thread blubee blubeeme
On Thu, Dec 14, 2017 at 1:43 PM, Peter Jeremy  wrote:

> On 2017-Dec-13 11:23:46 +, Gary Palmer  wrote:
> >An open question would be why ARC is not reducing if the system is
> >under memory pressure.  It's meant to, but there have been various
> >bugs in that implementation.
>
> The OP doesn't say what version of -current he is running but I would
> point the finger at r325851.  I have discovered that, in 11-stable,
> r326619 (which is the MFC of r325851) stops ARC responding to memory
> backpressure.
>
> --
> Peter Jeremy
>

I just did a new installworld and installkernel, so this is the current uname -v:
FreeBSD 12.0-CURRENT #0 r326839: Thu Dec 14 13:34:47 CST 2017
 root@blubee:/usr/obj/usr/src/amd64.amd64/sys/OSS_GENERIC

The previous run was with a GENERIC kernel and ports & src from maybe two weeks
ago.


kernel names

2017-12-13 Thread blubee blubeeme
When you boot into FreeBSD and you can select kernels, there's only 2
options:
default and kernel.old

Is there a way to have better output and support multiple kernels without
having to log in to the system and run uname -v or something like that?

Would it be possible to add options for more kernels from that boot menu?


Re: get_swap_pager(x) failed

2017-12-13 Thread Peter Jeremy
On 2017-Dec-13 11:23:46 +, Gary Palmer  wrote:
>An open question would be why ARC is not reducing if the system is
>under memory pressure.  It's meant to, but there have been various
>bugs in that implementation.

The OP doesn't say what version of -current he is running but I would
point the finger at r325851.  I have discovered that, in 11-stable,
r326619 (which is the MFC of r325851) stops ARC responding to memory
backpressure.

-- 
Peter Jeremy




Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error

2017-12-13 Thread Daniel Kalchev


> On 13 Dec 2017, at 21:39, O. Hartmann  wrote:
> 
> On Wed, 13 Dec 2017 08:47:53 -0800 (PST)
> "Rodney W. Grimes"  wrote:
> 
>>> On Tue, 12 Dec 2017 14:58:28 -0800
>>> Cy Schubert  wrote:
>>> 
>>>> There are a couple of ways you can address this. You'll need to
>>>> offline the vdev first. If you've done a smartctl -t long and if the
>>>> test failed, smartctl -a will tell you which block it had an issue
>>>> with. You can use dd, ddrescue or dd_rescue to dd the block over
>>>> itself. The drive may rewrite the (weak) block or if it fails to it
>>>> will remap it (subsequently showing as reallocated).
>>>>
>>>> Of course there is a risk. If the sector is any of the boot blocks
>>>> there is a good chance the server will hang.
>>> 
>>> The drive is part of a dedicated storage-only pool. The boot drive is a
>>> fast SSD. So I do not care about this - well, to say it more politely:
>>> I do not have to take care of that aspect.
>>> 
 
>>>> You have to be *absolutely* sure which the bad sector is. And, there
>>>> may be more. There is a risk of data loss.
>>>>
>>>> I've used this technique many times. Most times it works perfectly.
>>>> Other times the affected file is lost but the rest of the file system
>>>> is recovered. And again there is always the risk.
>>>>
>>>> Replace the disk immediately if you experience a growing succession
>>>> of pending sectors. Otherwise replace the disk at your earliest
>>>> convenience.
>>> 
>>> The ZFS scrubbing of the volume ended this morning, leaving the pool in
>>> a healthy state. After reboot, there was no sign of CAM errors again.
>>> 
>>> But there is something else I'm worried about. The mainboard I use is a 
>>> 
>>> ASRock Z77 Pro4-M.
>>> The board has a cripple Intel MCP with 6 SATA ports from the chipset,
>>> two of them SATA 6GB, 4 SATA II, and one additional chip with two SATA
>>> 6GB ports:
>>> 
>>> [...]
>>> ahci0@pci0:2:0:0:   class=0x010601 card=0x06121849 chip=0x06121b21
>>> rev=0x01 hdr=0x00 vendor = 'ASMedia Technology Inc.'
>>>device = 'ASM1062 Serial ATA Controller'
>>>class  = mass storage
>>>subclass   = SATA
>>>bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
>>>bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
>>>bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
>>>bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
>>>bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
>>>bar   [24] = type Memory, range 32, base 0xf7b0, size 512,
>>>enabled
>>> [...]
>>> 
>>> Attached to that ASM1062 SATA chip, is a backup drive via eSATA
>>> connector, a WD 4 TB RED drive. It seems, whenever I attach this drive
>>> and it is online, I experience problems on the ZFS pool, which is
>>> attached to the MCP SATA ports.  
>> 
>> How does this external drive get its power?  Are the earth grounds of
>> both the system and the external drive power supply closely tied
>> together?  A plug/unplug event with a slight ground creep can
>> wreak havoc with device operation.
> 
> The external drive is housed in an external casing. Its PSU de facto shares
> the same "grounding" (earth ground) as the server's PSU; they share the same
> power plug at the point where the plug comes out of the wall, so to speak.

Most external drive power supplies are not grounded. At least none I ever saw 
had grounded plugs for the mains cable. Might be, yours has it...

Worth checking anyway.

Daniel




Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error

2017-12-13 Thread O. Hartmann
On Wed, 13 Dec 2017 08:47:53 -0800 (PST)
"Rodney W. Grimes"  wrote:

> > On Tue, 12 Dec 2017 14:58:28 -0800
> > Cy Schubert  wrote:
> >   
> > > There are a couple of ways you can address this. You'll need to
> > > offline the vdev first. If you've done a smartctl -t long and if the
> > > test failed, smartctl -a will tell you which block it had an issue
> > > with. You can use dd, ddrescue or dd_rescue to dd the block over
> > > itself. The drive may rewrite the (weak) block or if it fails to it
> > > will remap it (subsequently showing as reallocated).
> > > 
> > > Of course there is a risk. If the sector is any of the boot blocks
> > > there is a good chance the server will hang.  
> > 
> > The drive is part of a dedicated storage-only pool. The boot drive is a
> > fast SSD. So I do not care about this - well, to say it more politely:
> > I do not have to take care of that aspect.
> >   
> > > 
> > > You have to be *absolutely* sure which the bad sector is. And, there
> > > may be more. There is a risk of data loss.
> > > 
> > > I've used this technique many times. Most times it works perfectly.
> > > Other times the affected file is lost but the rest of the file system
> > > is recovered. And again there is always the risk.
> > > 
> > > Replace the disk immediately if you experience a growing succession
> > > of pending sectors. Otherwise replace the disk at your earliest
> > > convenience.  
> > 
> > The ZFS scrubbing of the volume ended this morning, leaving the pool in
> > a healthy state. After reboot, there was no sign of CAM errors again.
> > 
> > But there is something else I'm worried about. The mainboard I use is a 
> > 
> > ASRock Z77 Pro4-M.
> > The board has a cripple Intel MCP with 6 SATA ports from the chipset,
> > two of them SATA 6GB, 4 SATA II, and one additional chip with two SATA
> > 6GB ports:
> > 
> > [...]
> > ahci0@pci0:2:0:0:   class=0x010601 card=0x06121849 chip=0x06121b21
> > rev=0x01 hdr=0x00 vendor = 'ASMedia Technology Inc.'
> > device = 'ASM1062 Serial ATA Controller'
> > class  = mass storage
> > subclass   = SATA
> > bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
> > bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
> > bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
> > bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
> > bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
> > bar   [24] = type Memory, range 32, base 0xf7b0, size 512,
> > enabled
> > [...]
> > 
> > Attached to that ASM1062 SATA chip, is a backup drive via eSATA
> > connector, a WD 4 TB RED drive. It seems, whenever I attach this drive
> > and it is online, I experience problems on the ZFS pool, which is
> > attached to the MCP SATA ports.  
> 
> How does this external drive get its power?  Are the earth grounds of
> both the system and the external drive power supply closely tied
> together?  A plug/unplug event with a slight ground creep can
> wreak havoc with device operation.

The external drive is housed in an external casing. Its PSU de facto shares the
same "grounding" (earth ground) as the server's PSU; they share the same power
plug at the point where the plug comes out of the wall, so to speak.

> 
> > Is this possible? I mean, as I asked before, a weird/defect cabling
> > would trigger different error schemes (CRC errors). Due to the fact
> > that the external drive is physically decoupled and is not capable of
> > coupling in vibrations, bad sector errors seem to me unlikely. But this
> > is simply a thought of someone without special knowledge about physics
> > of HDDs.  
> 
> Even if left cabled, does this drive get powered up/down?  

The drive is cabled (eSATA) all the time, but is switched off for long periods
(4 - 8 weeks or 2 months, it depends; I switch it on for scrubbing or performing
backups of important data).

> 
> > I think people responding to my thread made it clear that the WD Green
> > isn't the first-choice-solution for a 20/6 (not 24/7) duty drive and
> > the fact, that they have serviced now more than 25000 hours, it would
> > be wise to replace them with alternatives.   
> 
> I think someone had an apm command that turns off the head park,
> that would do wonders for drive life.   On the other hand, I think
> if it was my data and I saw that the drive had 2M head load cycles
> I would be looking to get out of that drive with any data I could
> not easily replace.  If it was well backed up or easily replaced
> my worries would be less.
> 
> ... 275 lines removed ...

I'm prepared already, as stated, to change the drive(s), one by one. 

Hopefully, ZFS is as reliable to me as it has been reliable for others ;-)

Kind regards,

Oliver


-- 
O. Hartmann

I object to the use or transfer of my data for
advertising purposes or for market or opinion research (§ 28 Abs. 4 BDSG).



Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error

2017-12-13 Thread Rodney W. Grimes
> On Tue, 12 Dec 2017 14:58:28 -0800
> Cy Schubert  wrote:
> 
> > There are a couple of ways you can address this. You'll need to
> > offline the vdev first. If you've done a smartctl -t long and if the
> > test failed, smartctl -a will tell you which block it had an issue
> > with. You can use dd, ddrescue or dd_rescue to dd the block over
> > itself. The drive may rewrite the (weak) block or if it fails to it
> > will remap it (subsequently showing as reallocated).
> > 
> > Of course there is a risk. If the sector is any of the boot blocks
> > there is a good chance the server will hang.
> 
> The drive is part of a dedicated storage-only pool. The boot drive is a
> fast SSD. So I do not care about this - well, to say it more politely:
> I do not have to take care of that aspect.
> 
> > 
> > You have to be *absolutely* sure which the bad sector is. And, there
> > may be more. There is a risk of data loss.
> > 
> > I've used this technique many times. Most times it works perfectly.
> > Other times the affected file is lost but the rest of the file system
> > is recovered. And again there is always the risk.
> > 
> > Replace the disk immediately if you experience a growing succession
> > of pending sectors. Otherwise replace the disk at your earliest
> > convenience.
> 
> The ZFS scrubbing of the volume ended this morning, leaving the pool in
> a healthy state. After reboot, there was no sign of CAM errors again.
> 
> But there is something else I'm worried about. The mainboard I use is a 
> 
> ASRock Z77 Pro4-M.
> The board has a cripple Intel MCP with 6 SATA ports from the chipset,
> two of them SATA 6GB, 4 SATA II, and one additional chip with two SATA
> 6GB ports:
> 
> [...]
> ahci0@pci0:2:0:0:   class=0x010601 card=0x06121849 chip=0x06121b21
> rev=0x01 hdr=0x00 vendor = 'ASMedia Technology Inc.'
> device = 'ASM1062 Serial ATA Controller'
> class  = mass storage
> subclass   = SATA
> bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
> bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
> bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
> bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
> bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
> bar   [24] = type Memory, range 32, base 0xf7b0, size 512,
> enabled
> [...]
> 
> Attached to that ASM1062 SATA chip, is a backup drive via eSATA
> connector, a WD 4 TB RED drive. It seems, whenever I attach this drive
> and it is online, I experience problems on the ZFS pool, which is
> attached to the MCP SATA ports.

How does this external drive get its power?  Are the earth grounds of
both the system and the external drive power supply closely tied
together?  A plug/unplug event with a slight ground creep can
wreak havoc with device operation.

> Is this possible? I mean, as I asked before, a weird/defect cabling
> would trigger different error schemes (CRC errors). Due to the fact
> that the external drive is physically decoupled and is not capable of
> coupling in vibrations, bad sector errors seem to me unlikely. But this
> is simply a thought of someone without special knowledge about physics
> of HDDs.

Even if left cabled, does this drive get powered up/down?  

> I think people responding to my thread made it clear that the WD Green
> isn't the first-choice-solution for a 20/6 (not 24/7) duty drive and
> the fact, that they have serviced now more than 25000 hours, it would
> be wise to replace them with alternatives. 

I think someone had an apm command that turns off the head park,
that would do wonders for drive life.   On the other hand, I think
if it was my data and I saw that the drive had 2M head load cycles
I would be looking to get out of that drive with any data I could
not easily replace.  If it was well backed up or easily replaced
my worries would be less.
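
To put the "2M head load cycles" figure in perspective, the following Python sketch works out the average parking rate from the two raw SMART values quoted elsewhere in this thread (Load_Cycle_Count 2055746 and Power_On_Hours 25339). It is plain arithmetic, not a smartctl feature. The function names are made up for the example, and it assumes the cycles were spread evenly over the powered-on time, which is unlikely to be exactly true.

```python
def load_cycles_per_hour(load_cycles, power_on_hours):
    """Average number of head load/unload cycles per powered-on hour."""
    return load_cycles / power_on_hours


def seconds_between_cycles(load_cycles, power_on_hours):
    """Average spacing between load cycles, in seconds of powered-on time."""
    return 3600.0 / load_cycles_per_hour(load_cycles, power_on_hours)


if __name__ == "__main__":
    # Raw values taken from the SMART attribute table quoted in this thread.
    cycles = 2055746
    hours = 25339
    print("cycles per hour: %.1f" % load_cycles_per_hour(cycles, hours))
    print("seconds between cycles: %.1f" % seconds_between_cycles(cycles, hours))
```

On these numbers the heads have parked roughly 81 times per powered-on hour, i.e. about once every 44 seconds on average, which is why turning off the aggressive head parking comes up at all.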

... 275 lines removed ...
-- 
Rod Grimes rgri...@freebsd.org


Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error

2017-12-13 Thread Hartmann, O.
On Tue, 12 Dec 2017 14:58:28 -0800
Cy Schubert  wrote:

> There are a couple of ways you can address this. You'll need to
> offline the vdev first. If you've done a smartctl -t long and if the
> test failed, smartctl -a will tell you which block it had an issue
> with. You can use dd, ddrescue or dd_rescue to dd the block over
> itself. The drive may rewrite the (weak) block or if it fails to it
> will remap it (subsequently showing as reallocated).
> 
> Of course there is a risk. If the sector is any of the boot blocks
> there is a good chance the server will hang.

The drive is part of a dedicated storage-only pool. The boot drive is a
fast SSD. So I do not care about this - well, to say it more politely:
I do not have to take care of that aspect.
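
As a side note on the block-rewrite approach quoted above: the first step is turning the LBA that smartctl reports into something dd can address. The Python sketch below only does that arithmetic for the LBA in the error log of this thread (0xc27a7298). The helper names are invented for the example, and the 512-byte logical and 4096-byte physical sector sizes are assumptions; check what the drive actually reports before trusting any offset. Nothing here touches a disk.

```python
def lba_to_byte_offset(lba, logical_sector_size=512):
    """Byte offset of a logical block address, assuming the given sector size."""
    return lba * logical_sector_size


def physical_block_start(lba, logical_sector_size=512, physical_sector_size=4096):
    """First LBA of the physical sector containing lba (sizes are assumptions)."""
    sectors_per_physical = physical_sector_size // logical_sector_size
    return lba - (lba % sectors_per_physical)


if __name__ == "__main__":
    # LBA as reported by smartctl in the error log quoted in this thread.
    lba = int("0xc27a7298", 16)
    print("decimal LBA:", lba)
    print("byte offset:", lba_to_byte_offset(lba))
    print("start of containing 4K block (LBA):", physical_block_start(lba))
```

The decimal value agrees with the 3262804632 that smartctl prints next to the hex LBA, and under the assumed sizes this LBA happens to sit at the start of its 4K block.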

> 
> You have to be *absolutely* sure which the bad sector is. And, there
> may be more. There is a risk of data loss.
> 
> I've used this technique many times. Most times it works perfectly.
> Other times the affected file is lost but the rest of the file system
> is recovered. And again there is always the risk.
> 
> Replace the disk immediately if you experience a growing succession
> of pending sectors. Otherwise replace the disk at your earliest
> convenience.

The ZFS scrubbing of the volume ended this morning, leaving the pool in
a healthy state. After reboot, there was no sign of CAM errors again.

But there is something else I'm worried about. The mainboard I use is a 

ASRock Z77 Pro4-M.
The board has a crippled Intel MCP with 6 SATA ports from the chipset,
two of them SATA 6 Gb/s, four SATA II, and one additional chip with two SATA
6 Gb/s ports:

[...]
ahci0@pci0:2:0:0:   class=0x010601 card=0x06121849 chip=0x06121b21 rev=0x01 hdr=0x00
    vendor     = 'ASMedia Technology Inc.'
    device     = 'ASM1062 Serial ATA Controller'
    class      = mass storage
    subclass   = SATA
    bar   [10] = type I/O Port, range 32, base 0xe050, size 8, enabled
    bar   [14] = type I/O Port, range 32, base 0xe040, size 4, enabled
    bar   [18] = type I/O Port, range 32, base 0xe030, size 8, enabled
    bar   [1c] = type I/O Port, range 32, base 0xe020, size 4, enabled
    bar   [20] = type I/O Port, range 32, base 0xe000, size 32, enabled
    bar   [24] = type Memory, range 32, base 0xf7b0, size 512, enabled
[...]

Attached to that ASM1062 SATA chip, is a backup drive via eSATA
connector, a WD 4 TB RED drive. It seems, whenever I attach this drive
and it is online, I experience problems on the ZFS pool, which is
attached to the MCP SATA ports.

Is this possible? I mean, as I asked before, weird/defective cabling
would trigger different error patterns (CRC errors). Due to the fact
that the external drive is physically decoupled and is not capable of
coupling in vibrations, bad sector errors seem to me unlikely. But this
is simply a thought of someone without special knowledge about the physics
of HDDs.

I think people responding to my thread made it clear that the WD Green
isn't the first-choice solution for a 20/6 (not 24/7) duty drive and,
given that they have now served more than 25000 hours, it would
be wise to replace them with alternatives.

> 
> If using a zfs mirror (not in your case) detach and attach will
> rewrite any weakly written sectors and reallocate pending sectors.
> 
> ---
> Sent using a tiny phone keyboard.
> Apologies for any typos and autocorrect.
> Also, this old phone only supports top post. Apologies.
> 
> Cy Schubert
>  or 
> The need of the many outweighs the greed of the few.
> ---
> 
> -Original Message-
> From: O. Hartmann
> Sent: 12/12/2017 14:19
> To: Rodney W. Grimes
> Cc: O. Hartmann; FreeBSD CURRENT; Freddie Cash; Alan Somers
> Subject: Re: SMART: disk problems on RAIDZ1 pool:
> (ada6:ahcich6:0:0:0): CAMstatus: ATA Status Error
> 
> On Tue, 12 Dec 2017 10:52:27 -0800 (PST)
> "Rodney W. Grimes"  wrote:
> 
> 
> Thank you for answering that fast!
> 
> > > Hello,
> > > 
> > > > running CURRENT (recent r326769), I realised that smartd sends
> > > out some console messages when booting the box:
> > > 
> > > [...]
> > > Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1
> > > Currently unreadable (pending) sectors Dec 12 14:14:33 <3.2> box1
> > > smartd[68426]: Device: /dev/ada6, 1 Offline uncorrectable sectors
> > > [...]
> > > 
> > > > Checking the drive's SMART log with smartctl (it is one of four
> > > > 3TB disk drives), I gather this information:
> > > 
> > > [... smartctl -x /dev/ada6 ...]
> > > Error 42 [17] occurred at disk power-on lifetime: 25335 hours
> > > (1055 days + 15 hours) When the command that caused the error
> > > occurred, the device was active or idle.
> > > 
> > >   After command completion occurred, registers were:
> > >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> > >   -- -- -- == -- == == == -- -- -- -- --
> > >   40 -- 51 00 00 00 00 c2 7a 72 98 40 00  Error: UNC at LBA =
> > > 0xc27a7298 = 3262804632
> > > 
> > >   Commands leading to the command that caused the error were:
> > >   CR FEA

Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error

2017-12-13 Thread Rodney W. Grimes
> On Tue, 12 Dec 2017 14:55:49 -0800 (PST)
> "Rodney W. Grimes"  wrote:
> > > On Tue, 12 Dec 2017 10:52:27 -0800 (PST)
> > > "Rodney W. Grimes"  wrote:
> > > 
> > > Thank you for answering that fast!

Not so fast this time, had to sleep :)

> > > > > Hello,
> > > > > 
> > > > > > running CURRENT (recent r326769), I realised that smartd sends out
> > > > > > some console messages when booting the box:
> > > > > 
> > > > > [...]
> > > > > Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 
> > > > > Currently
> > > > > unreadable (pending) sectors Dec 12 14:14:33 <3.2> box1 smartd[68426]:
> > > > > Device: /dev/ada6, 1 Offline uncorrectable sectors
> > > > > [...]
> > > > > 
> > > > > > Checking the drive's SMART log with smartctl (it is one of four 3TB
> > > > > > disk drives), I gather this information:
> > > > > 
> > > > > [... smartctl -x /dev/ada6 ...]
> > > > > Error 42 [17] occurred at disk power-on lifetime: 25335 hours (1055 
> > > > > days + 15
> > > > > hours) When the command that caused the error occurred, the device 
> > > > > was active or
> > > > > idle.
> > > > > 
> > > > >   After command completion occurred, registers were:
> > > > >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> > > > >   -- -- -- == -- == == == -- -- -- -- --
> > > > >   40 -- 51 00 00 00 00 c2 7a 72 98 40 00  Error: UNC at LBA = 
> > > > > 0xc27a7298 =
> > > > > 3262804632
> > > > > 
> > > > >   Commands leading to the command that caused the error were:
> > > > >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  
> > > > > Command/Feature_Name
> > > > >   -- == -- == -- == == == -- -- -- -- --  ---  
> > > > > 
> > > > >   60 00 b0 00 88 00 00 c2 7a 73 20 40 08 23:38:12.195  READ FPDMA 
> > > > > QUEUED
> > > > >   60 00 b0 00 80 00 00 c2 7a 72 70 40 08 23:38:12.195  READ FPDMA 
> > > > > QUEUED
> > > > >   2f 00 00 00 01 00 00 00 00 00 10 40 08 23:38:12.195  READ LOG 
> > > > > EXT
> > > > >   60 00 b0 00 70 00 00 c2 7a 73 20 40 08 23:38:09.343  READ FPDMA 
> > > > > QUEUED
> > > > >   60 00 b0 00 68 00 00 c2 7a 72 70 40 08 23:38:09.343  READ FPDMA 
> > > > > QUEUED
> > > > > [...]
> > > > > 
> > > > > and
> > > > > 
> > > > > [...]
> > > > > SMART Attributes Data Structure revision number: 16
> > > > > Vendor Specific SMART Attributes with Thresholds:
> > > > > > ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
> > > > > >   1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    64
> > > > > >   3 Spin_Up_Time            POS--K   178   170   021    -    6075
> > > > > >   4 Start_Stop_Count        -O--CK   098   098   000    -    2406
> > > > > >   5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
> > > > > >   7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
> > > > > >   9 Power_On_Hours          -O--CK   066   066   000    -    25339
> > > > > >  10 Spin_Retry_Count        -O--CK   100   100   000    -    0
> > > > > >  11 Calibration_Retry_Count -O--CK   100   100   000    -    0
> > > > > >  12 Power_Cycle_Count       -O--CK   098   098   000    -    2404
> > > > > > 192 Power-Off_Retract_Count -O--CK   200   200   000    -    154
> > > > > > 193 Load_Cycle_Count        -O--CK   001   001   000    -    2055746
> > > > > > 194 Temperature_Celsius     -O---K   122   109   000    -    28
> > > > > > 196 Reallocated_Event_Count -O--CK   200   200   000    -    0
> > > > > > 197 Current_Pending_Sector  -O--CK   200   200   000    -    1
> > > > > > 198 Offline_Uncorrectable   ----CK   200   200   000    -    1

Note here, we have a pending and we have an offline uncorrectable.
An offline uncorrectable needs to end up in the remap; it should
never end up cleared and back in the good blocks, IIRC. But then
again firmware gets changed, so maybe it is possible to return
this to a good sector. Either way, it looks as if at this point
in time we may in fact have 2 separate blocks that are bad.

I have some long-used, heavily worn drives that have tens of remapped
sectors and they are still running fine.  I would not use them for
mission-critical or heavy-use situations, but they are good
for cold storage and other non-critical use.  A total of 2 reallocations
I would not worry much about, unless I am seeing a growth rate.
Note that when these drives are shipped brand new, for the first N
Power On Hours they are in a special mode that is very quick to simply
remap a "weak" sector, i.e., any sector that requires some threshold
of M bits of error correction. The ECC already corrected the data, but the
vendor has decided that these are weak sectors and it should just remap them.
Some firmware does not even call them Reallocated sectors, and adds
them to the manufacturer's P list.

> > > > > 199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
> > > > > 200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
> > > > >                             ||||||_ K auto-keep
> > > > > |_

Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error

2017-12-13 Thread O. Hartmann
On Tue, 12 Dec 2017 14:55:49 -0800 (PST)
"Rodney W. Grimes" wrote:

> > On Tue, 12 Dec 2017 10:52:27 -0800 (PST)
> > "Rodney W. Grimes" wrote:
> > 
> > 
> > Thank you for answering that fast!
> >   
> > > > Hello,
> > > > 
> > > > running CURRENT (recent r326769), I realised that smartd sends out some
> > > > console messages when booting the box:
> > > > 
> > > > [...]
> > > > Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Currently unreadable (pending) sectors
> > > > Dec 12 14:14:33 <3.2> box1 smartd[68426]: Device: /dev/ada6, 1 Offline uncorrectable sectors
> > > > [...]
> > > > 
> > > > Checking the drive's SMART log with smartctl (it is one of four 3TB disk
> > > > drives), I gather this information:
> > > > 
> > > > [... smartctl -x /dev/ada6 ...]
> > > > Error 42 [17] occurred at disk power-on lifetime: 25335 hours (1055 days + 15 hours)
> > > >   When the command that caused the error occurred, the device was active or idle.
> > > > 
> > > >   After command completion occurred, registers were:
> > > >   ER -- ST COUNT  LBA_48  LH LM LL DV DC
> > > >   -- -- -- == -- == == == -- -- -- -- --
> > > >   40 -- 51 00 00 00 00 c2 7a 72 98 40 00  Error: UNC at LBA = 0xc27a7298 = 3262804632
> > > > 
> > > >   Commands leading to the command that caused the error were:
> > > >   CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
> > > >   -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
> > > >   60 00 b0 00 88 00 00 c2 7a 73 20 40 08 23:38:12.195  READ FPDMA QUEUED
> > > >   60 00 b0 00 80 00 00 c2 7a 72 70 40 08 23:38:12.195  READ FPDMA QUEUED
> > > >   2f 00 00 00 01 00 00 00 00 00 10 40 08 23:38:12.195  READ LOG EXT
> > > >   60 00 b0 00 70 00 00 c2 7a 73 20 40 08 23:38:09.343  READ FPDMA QUEUED
> > > >   60 00 b0 00 68 00 00 c2 7a 72 70 40 08 23:38:09.343  READ FPDMA QUEUED
> > > > [...]
> > > > 
> > > > and
> > > > 
> > > > [...]
> > > > SMART Attributes Data Structure revision number: 16
> > > > Vendor Specific SMART Attributes with Thresholds:
> > > > ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
> > > >   1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    64
> > > >   3 Spin_Up_Time            POS--K   178   170   021    -    6075
> > > >   4 Start_Stop_Count        -O--CK   098   098   000    -    2406
> > > >   5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
> > > >   7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
> > > >   9 Power_On_Hours          -O--CK   066   066   000    -    25339
> > > >  10 Spin_Retry_Count        -O--CK   100   100   000    -    0
> > > >  11 Calibration_Retry_Count -O--CK   100   100   000    -    0
> > > >  12 Power_Cycle_Count       -O--CK   098   098   000    -    2404
> > > > 192 Power-Off_Retract_Count -O--CK   200   200   000    -    154
> > > > 193 Load_Cycle_Count        -O--CK   001   001   000    -    2055746
> > > > 194 Temperature_Celsius     -O---K   122   109   000    -    28
> > > > 196 Reallocated_Event_Count -O--CK   200   200   000    -    0
> > > > 197 Current_Pending_Sector  -O--CK   200   200   000    -    1
> > > > 198 Offline_Uncorrectable   ----CK   200   200   000    -    1
> > > > 199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
> > > > 200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    5
> > > >                             ||||||_ K auto-keep
> > > >                             |||||__ C event count
> > > >                             ||||___ R error rate
> > > >                             |||____ S speed/performance
> > > >                             ||_____ O updated online
> > > >                             |______ P prefailure warning
> > > > 
> > > > [...]
> > > 
> > > The data up to this point tells us that you have 1 bad sector
> > > on a 3TB drive.  That is actually an expected event: given the data
> > > error rate on these drives, you are going to have these now
> > > and again.
> > > 
> > > Given you have 1 single event I would not suspect that this drive
> > > is dying, but it would be prudent to prepare for that possibility.
> > 
> > Hello.
> > 
> > Well, I simply copied "one single event" that has been logged so far.
> > 
> > As you (and I) can see, it is error #42. After I posted here, a reboot took
> > place because the estimated time of the "repair" process on the pool suddenly
> > increased, and now I am at error #47. Interestingly, it is a new block that
> > is damaged, but the SMART attribute fields show this for now:
> 
> Can you send the complete output of smartctl -a /dev/foo?  I somehow missed
> that 40+ other errors had occurred.


Yes, here it is, but please do not beat me due to its size ;-). It is
"smartctl -x" that shows me the errors. See the attached file named
"smart_ada.txt". It is everythin

Re: SMART: disk problems on RAIDZ1 pool: (ada6:ahcich6:0:0:0): CAM status: ATA Status Error

2017-12-13 Thread Daniel Kalchev


> On 13 Dec 2017, at 1:26, Freddie Cash  wrote:
> 
> On Tue, Dec 12, 2017 at 2:55 PM, Rodney W. Grimes <
> freebsd-...@pdx.rh.cn85.dnsmgr.net> wrote:
> 
>> Hmm, just noticed this.  25k hours power on, 2M load cycles, this is
>> very hard on a hard drive.  Your drive is going into power save mode
>> and unloading the heads.  In fact, at a rate of 81 times per hour?
>> Oh, I can not believe that.  Either way we need to get this stopped;
>> it will wear your drives out.
>> 
> 
> Believe it.  :)  The WD Green drives have a head parking timeout of 15
> seconds, and no way to disable that anymore.  You used to be able to boot
> into DOS and run the tler.exe program from WD to disable the auto-parking
> feature, but they removed that ability fairly quickly.
> 
> The Green drives are meant to be used in systems that spend most of their
> time idle.  Trying to use them in an always-on RAID array is just asking
> for trouble.  They are only warrantied for a couple hundred thousand head
> parkings or something ridiculous like that.  2 million puts it way out of
> the warranty coverage.  :(
> 
> We had 24 of them in a ZFS pool back when they were first released as they
> were very inexpensive.  They led to more downtime and replacement costs
> than any other drive we've used since (or even before).  Just don't use
> them in any kind of RAID array or always-on system.
> 

In order to handle drives like this, and in general to get rid of load cycles,
I use smartd on all my ZFS pools with this piece of config:

DEVICESCAN -a -o off -e apm,off 

It might not be the best solution, but as it is activated during boot, S.M.A.R.T.
attribute 193 Load_Cycle_Count does not increase anymore. I am not a fan of WD
drives, but I have a few tens of them… all of them “behave” in some way or another.
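
The load-cycle rate discussed above can be reproduced from the two raw SMART values quoted earlier in the thread. This is only a minimal Python sketch of that arithmetic; the variable names are illustrative and not part of any tool's output.

```python
# Raw values copied from the smartctl output quoted earlier in this thread.
power_on_hours = 25339   # attribute 9, Power_On_Hours
load_cycles = 2055746    # attribute 193, Load_Cycle_Count

# Average number of head unloads per powered-on hour.
cycles_per_hour = load_cycles / power_on_hours

# Average time between two head unloads, in seconds.
seconds_per_cycle = 3600 / cycles_per_hour

print(f"{cycles_per_hour:.1f} load cycles per hour")
print(f"one head unload roughly every {seconds_per_cycle:.0f} seconds")
```

This is an average over the whole life of the drive, so it says nothing about how the unloads were distributed over time.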

For the original question, if I do not have a spare disk to use as a replacement,
on a raidz1/raidz2 pool I would typically do:

zpool offline poolname disk
dd if=/dev/zero of=/dev/disk bs=1m
zpool replace poolname disk

This effectively fills the disk with zeros, forcing any suspected unreadable
blocks to be rewritten or remapped by the drive. After this operation there are
no more pending blocks etc. But on large drives/pools the last step (the zpool
replace) requires a few days to complete. Over the years, I have used this
procedure on many drives, sometimes more than once on the same drive, and that
postponed having to replace the drive and got rid of the annoying S.M.A.R.T.
message, which by itself might not be a major problem, but it is better not to
have the logs filled with warnings all the time.

I feel more confident doing this on raidz2 vdevs anyway.

If I had a spare disk and a spare port, I would just run:

zpool replace poolname disk

Daniel


Re: get_swap_pager(x) failed

2017-12-13 Thread Gary Palmer
On Wed, Dec 13, 2017 at 10:34:02AM +0800, blubee blubeeme wrote:
> On Wed, Dec 13, 2017 at 5:53 AM, Mark Millard  wrote:
> 
> > blubee blubeeme gurenchan at gmail.com wrote on
> > Tue Dec 12 15:58:19 UTC 2017 :
> >
> > > On Tue, Dec 12, 2017 at 3:34 PM, blubee blubeeme wrote:
> > > > I am seeing tons of these messages while running tail -f
> > > > /var/log/messages
> > > > 
> > > > Dec 12 15:11:41 blubee kernel: swap_pager_getswapspace(25): failed
> > > . . .
> > > >  1159 blubee        5  20    0   149M 56876K select  6   1:05   0.00% ibus-engine-chewing
> > > >
> > > > ===
> > > >
> > > > What's with all the swap errors? I am running ZFS and I have 16GB of
> > > > RAM, how could I be having swap space errors?
> > > >
> > >
> > > Well I added 4GB of extra swap in /var/tmp/swap0
> > > then added that to my /etc/fstab:
> > > md99  none  swap  sw,file=/var/tmp/swap0,late  0  0
> > >
> > > and those errors went away.
> >
> > I recommend reviewing bugzilla 206048 (title in part
> > "swapfile usage hangs; swap partition works"):
> >
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206048
> >
> > before using file-system based swap spaces. They have
> > lots of problems with deadlocks. See especially comments
> > #7 and #8 quoting Konstantin Belousov. #8 is just a
> > reference to:
> >
> > https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
> >
> > Comment #3 shows a way to test for the problematic
> > behavior.
> >
> > Using swap partitions avoids the issue.
> >
> > ===
> > Mark Millard
> > markmi at dsl-only.net
> >
> >
> Thanks for the info, why would I be getting swap errors like that when I
> have 32GB of ram?
> sysctl hw.physmem
> hw.physmem: 34253692928
> 
> That really doesn't make any sense to me... Is it Chromium eating up 32GB+
> of ram?

Your top output from the first message in the thread gives a hint:

Mem: 3442M Active, 293M Inact, 4901M Laundry, 22G Wired, 1057M Free 
ARC: 18G Total, 1488M MFU, 15G MRU, 4003K Anon, 178M Header, 1171M Other
 16G Compressed, 23G Uncompressed, 1.41:1 Ratio 
Swap: 2048M Total, 2028M Used, 20M Free, 99% Inuse, 36K In  

You have 22GB RAM used by the kernel, and 18GB of that is used by ZFS.
As you can see from the last line, 99% of your small swap space is used
and paging was happening at the time of the snapshot ("36K In").
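
As a rough cross-check of those figures, the arithmetic can be sketched in Python. The numbers are copied from the top output above and from the hw.physmem value quoted earlier; the variable names are purely illustrative.

```python
# Figures copied from the top(1) header and the sysctl output quoted above.
swap_total_mb = 2048
swap_used_mb = 2028
wired_gb = 22             # "22G Wired"
arc_total_gb = 18         # "ARC: 18G Total"
phys_bytes = 34253692928  # hw.physmem

# Fraction of swap in use (top reports this as "99% Inuse").
swap_in_use = swap_used_mb / swap_total_mb

# Physical memory in GiB, and the share of wired memory held by the ARC.
phys_gib = phys_bytes / 2**30
arc_share_of_wired = arc_total_gb / wired_gb

print(f"swap in use: {swap_in_use:.1%}")
print(f"physical memory: {phys_gib:.1f} GiB")
print(f"ARC share of wired memory: {arc_share_of_wired:.0%}")
```

The rounded figures from top are coarse, so the ARC share is only approximate.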

First I would suggest limiting ARC and seeing if that helps.  Unless you
are doing a lot of fileserver-type work or working with lots of files
that you want to keep cached, that ARC size is a bit big.  My desktop
with 32GB RAM has ARC limited to 4GB.
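
One common way to apply such a limit is a loader tunable in /boot/loader.conf. The Python sketch below only builds the line for a 4 GiB cap. It assumes the tunable is spelled vfs.zfs.arc_max on your release (check your system before relying on it), and the helper name arc_max_line is hypothetical.

```python
# Hypothetical helper: build a /boot/loader.conf line that caps the ZFS ARC.
# Assumption: the tunable is named vfs.zfs.arc_max on the release in use.
GIB = 2**30


def arc_max_line(limit_gib: int) -> str:
    """Return a loader.conf line limiting the ARC to limit_gib GiB (in bytes)."""
    return f'vfs.zfs.arc_max="{limit_gib * GIB}"'


line = arc_max_line(4)
print(line)
```

A loader tunable only takes effect at the next boot, so the change cannot be judged until after a restart.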

Definitely do NOT swap to a file on a filesystem, especially if the
filesystem is ZFS.  That will eventually lead to a deadlock.

An open question would be why ARC is not reducing if the system is
under memory pressure.  It's meant to, but there have been various
bugs in that implementation.

Regards,

Gary