Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)

2019-04-09 Thread Karl Denninger
On 4/9/2019 16:27, Zaphod Beeblebrox wrote:
> I have a "Ghetto" home RAID array.  It's built on compromises and makes use
> of RAID-Z2 to survive.  It consists of two plexes of 8x 4T units of
> "spinning rust".  It's been upgraded and upgraded.  It started as 8x 2T,
> then 8x 2T + 8x 4T then the current 16x 4T.  The first 8 disks are
> connected to motherboard SATA.  IIRC, there are 10.  Two ports are used for
> a mirror that it boots from.  There's also an SSD in there somehow, so it
> might be 12 ports on the motherboard.
>
> The other 8 disks started life in eSATA port multiplier boxes.  That was
> doubleplusungood, so I got a RAID card based on LSI pulled from a fujitsu
> server in Japan.  That's been upgraded a couple of times... not always a
> good experience.  One problem is that cheap or refurbished drives don't
> always "like" SAS controllers and FreeBSD.  YMMV.
>
> Anyways, this is all to introduce the fact that I've seen this behaviour
> multiple times. You have a drive that leaves the array for some amount of
> time, and after resilvering, a scrub will find a small amount of bad data.
> 32k or 40k or somesuch.  In my cranial schema of things, I've chalked it
> up to out-of-order writing by the drives ... or other such behavior such that
> ZFS doesn't know exactly what has been written.  I've often wondered if the
> fix would be to add an amount of fuzz to the transaction range that is
> resilvered.
>
>
> On Tue, Apr 9, 2019 at 4:32 PM Karl Denninger  wrote:
>
>> On 4/9/2019 15:04, Andriy Gapon wrote:
>>> On 09/04/2019 22:01, Karl Denninger wrote:
 the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S
 IN USE AREA was examined, compared, and blocks not on the "new member"
 or changed copied over.
>>> I think that that's not entirely correct.
>>> ZFS maintains something called DTL, a dirty-time log, for a missing /
>> offlined /
>>> removed device.  When the device re-appears and gets resilvered, ZFS
>> walks only
>>> those blocks that were born within the TXG range(s) when the device was
>> missing.
>>> In any case, I do not have an explanation for what you are seeing.
>> That implies something much more-serious could be wrong such as given
>> enough time -- a week, say -- that the DTL marker is incorrect and some
>> TXGs that were in fact changed since the OFFLINE are not walked through
>> and synchronized.  That would explain why it gets caught by a scrub --
>> the resilver is in fact not actually copying all the blocks that got
>> changed and so when you scrub the blocks are not identical.  Assuming
>> the detached disk is consistent that's not catastrophically bad IF
>> CAUGHT; where you'd get screwed HARD is in the situation where (for
>> example) you had a 2-unit mirror, detached one, re-attached it, resilver
>> says all is well, there is no scrub performed and then the
>> *non-detached* disk fails before there is a scrub.  In that case you
>> will have permanently destroyed or corrupted data since the other disk
>> is allegedly consistent but there are blocks *missing* that were never
>> copied over.
>>
>> Again this just showed up on 12.x; it definitely was *not* at issue in
>> 11.1 at all.  I never ran 11.2 in production for a material amount of
>> time (I went from 11.1 to 12.0 STABLE after the IPv6 fixes were posted
>> to 12.x) so I don't know if it is in play on 11.2 or not.
>>
>> I'll see if it shows up again with 20.00.07.00 card firmware.
>>
>> Of note I cannot reproduce this on my test box with EITHER 19.00.00.00
>> or 20.00.07.00 firmware when I set up a 3-unit mirror, offline one, make
>> a crap-ton of changes, offline the second and reattach the third (in
>> effect mirroring the "take one to the vault" thing) with a couple of
>> hours elapsed time and a synthetic (e.g. "dd if=/dev/random of=outfile
>> bs=1m" sort of thing) "make me some new data that has to be resilvered"
>> workload.  I don't know if that's because I need more entropy in the
>> filesystem than I can reasonably generate this way (e.g. more
>> fragmentation of files, etc) or whether it's a time-based issue (e.g.
>> something's wrong with the DTL/TXG thing as you note above in terms of
>> how it functions and it only happens if the time elapsed causes
>> something to be subject to a rollover or similar problem.)
>>
>> I spent quite a lot of time trying to reproduce the issue on my
>> "sandbox" machine and was unable -- and of note it is never a large
>> quantity of data that is impacted, it's usually only a couple of dozen
>> checksums that show as bad and fixed.  Of note it's also never just one;
>> if there was a single random hit on a data block due to ordinary bitrot
>> sort of issues I'd expect only one checksum to be bad.  But generating a
>> realistic synthetic workload over the amount of time involved on a
>> sandbox is not trivial at all; the system on which this is now happening
>> handles a lot of email and routine processing of various sorts including
>> a fair bit of 

RE: Mailx Question

2019-04-09 Thread Software Info
Fantastic. Works like a charm. Thank you very much.

Kind Regards
SI


Sent from Mail for Windows 10

From: Miroslav Lachman
Sent: Tuesday, April 9, 2019 4:40 PM
To: Software Info; freebsd-stable@freebsd.org
Subject: Re: Mailx Question

Software Info wrote on 2019/04/09 23:09:
> Hi All
> Since mailx is built into FreeBSD I decided to try asking this question here. 
> I have a text file with about 30 email addresses. The file will change every 
> day. I want an easy commandline way to read the file and blind copy send an 
> email to the addresses in the file. So far, I have this working with just a 
> plain send using the command below.
> mailx -s "Test Emails" -b `cat mylist.txt` < body.txt -r 
> "No-Reply"
> 
> Of course, when I use a plain send, everybody sees everybody’s email address 
> so I would love to be able to do a blind copy send. Would anyone be able to 
> assist me with this?

It may depend on your MTA (Sendmail, Postfix, Exim etc.)

"You must specify direct recipients with -s, -c, or -b."

   -b bcc-addr
Send blind carbon copies to bcc-addr list of users.  The bcc-addr
argument should be a comma-separated list of names.

You should replace newlines with commas:

cat mylist.txt | tr "\n" ","

Maybe something like this will work for you:

mail -s "Test E-mails" -b `cat mylist.txt | tr "\n" ","` 
my-gene...@example.com < body.txt

Miroslav Lachman



Re: Mailx Question

2019-04-09 Thread Miroslav Lachman

Software Info wrote on 2019/04/09 23:09:

Hi All
Since mailx is built into FreeBSD I decided to try asking this question here. I 
have a text file with about 30 email addresses. The file will change every day. 
I want an easy commandline way to read the file and blind copy send an email to 
the addresses in the file. So far, I have this working with just a plain send 
using the command below.
mailx -s "Test Emails" -b `cat mylist.txt` < body.txt -r 
"No-Reply"

Of course, when I use a plain send, everybody sees everybody’s email address so 
I would love to be able to do a blind copy send. Would anyone be able to assist 
me with this?


It may depend on your MTA (Sendmail, Postfix, Exim etc.)

"You must specify direct recipients with -s, -c, or -b."

  -b bcc-addr
   Send blind carbon copies to bcc-addr list of users.  The bcc-addr
   argument should be a comma-separated list of names.

You should replace newlines with commas:

cat mylist.txt | tr "\n" ","

Maybe something like this will work for you:

mail -s "Test E-mails" -b `cat mylist.txt | tr "\n" ","` 
my-gene...@example.com < body.txt
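
A slightly more defensive variant of the same idea, as a sketch only: it
assumes mylist.txt and body.txt are in the current directory, reuses the
my-gene...@example.com placeholder from the example above as the visible
To: address, and strips the trailing comma that tr leaves from the file's
final newline:

#!/bin/sh
# Build a comma-separated Bcc list from one-address-per-line mylist.txt,
# skipping blank lines and dropping the trailing comma left by tr.
BCC=$(grep -v '^$' mylist.txt | tr '\n' ',' | sed 's/,$//')

# Send body.txt to the whole list as blind carbon copies.
mail -s "Test E-mails" -b "$BCC" my-gene...@example.com < body.txt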


Miroslav Lachman


Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)

2019-04-09 Thread Zaphod Beeblebrox
I have a "Ghetto" home RAID array.  It's built on compromises and makes use
of RAID-Z2 to survive.  It consists of two plexes of 8x 4T units of
"spinning rust".  It's been upgraded and upgraded.  It started as 8x 2T,
then 8x 2T + 8x 4T then the current 16x 4T.  The first 8 disks are
connected to motherboard SATA.  IIRC, there are 10.  Two ports are used for
a mirror that it boots from.  There's also an SSD in there somehow, so it
might be 12 ports on the motherboard.

The other 8 disks started life in eSATA port multiplier boxes.  That was
doubleplusungood, so I got a RAID card based on LSI pulled from a fujitsu
server in Japan.  That's been upgraded a couple of times... not always a
good experience.  One problem is that cheap or refurbished drives don't
always "like" SAS controllers and FreeBSD.  YMMV.

Anyways, this is all to introduce the fact that I've seen this behaviour
multiple times. You have a drive that leaves the array for some amount of
time, and after resilvering, a scrub will find a small amount of bad data.
32k or 40k or somesuch.  In my cranial schema of things, I've chalked it
up to out-of-order writing by the drives ... or other such behavior such that
ZFS doesn't know exactly what has been written.  I've often wondered if the
fix would be to add an amount of fuzz to the transaction range that is
resilvered.


On Tue, Apr 9, 2019 at 4:32 PM Karl Denninger  wrote:

> On 4/9/2019 15:04, Andriy Gapon wrote:
> > On 09/04/2019 22:01, Karl Denninger wrote:
> >> the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S
> >> IN USE AREA was examined, compared, and blocks not on the "new member"
> >> or changed copied over.
> > I think that that's not entirely correct.
> > ZFS maintains something called DTL, a dirty-time log, for a missing /
> offlined /
> > removed device.  When the device re-appears and gets resilvered, ZFS
> walks only
> > those blocks that were born within the TXG range(s) when the device was
> missing.
> >
> > In any case, I do not have an explanation for what you are seeing.
>
> That implies something much more-serious could be wrong such as given
> enough time -- a week, say -- that the DTL marker is incorrect and some
> TXGs that were in fact changed since the OFFLINE are not walked through
> and synchronized.  That would explain why it gets caught by a scrub --
> the resilver is in fact not actually copying all the blocks that got
> changed and so when you scrub the blocks are not identical.  Assuming
> the detached disk is consistent that's not catastrophically bad IF
> CAUGHT; where you'd get screwed HARD is in the situation where (for
> example) you had a 2-unit mirror, detached one, re-attached it, resilver
> says all is well, there is no scrub performed and then the
> *non-detached* disk fails before there is a scrub.  In that case you
> will have permanently destroyed or corrupted data since the other disk
> is allegedly consistent but there are blocks *missing* that were never
> copied over.
>
> Again this just showed up on 12.x; it definitely was *not* at issue in
> 11.1 at all.  I never ran 11.2 in production for a material amount of
> time (I went from 11.1 to 12.0 STABLE after the IPv6 fixes were posted
> to 12.x) so I don't know if it is in play on 11.2 or not.
>
> I'll see if it shows up again with 20.00.07.00 card firmware.
>
> Of note I cannot reproduce this on my test box with EITHER 19.00.00.00
> or 20.00.07.00 firmware when I set up a 3-unit mirror, offline one, make
> a crap-ton of changes, offline the second and reattach the third (in
> effect mirroring the "take one to the vault" thing) with a couple of
> hours elapsed time and a synthetic (e.g. "dd if=/dev/random of=outfile
> bs=1m" sort of thing) "make me some new data that has to be resilvered"
> workload.  I don't know if that's because I need more entropy in the
> filesystem than I can reasonably generate this way (e.g. more
> fragmentation of files, etc) or whether it's a time-based issue (e.g.
> something's wrong with the DTL/TXG thing as you note above in terms of
> how it functions and it only happens if the time elapsed causes
> something to be subject to a rollover or similar problem.)
>
> I spent quite a lot of time trying to reproduce the issue on my
> "sandbox" machine and was unable -- and of note it is never a large
> quantity of data that is impacted, it's usually only a couple of dozen
> checksums that show as bad and fixed.  Of note it's also never just one;
> if there was a single random hit on a data block due to ordinary bitrot
> sort of issues I'd expect only one checksum to be bad.  But generating a
> realistic synthetic workload over the amount of time involved on a
> sandbox is not trivial at all; the system on which this is now happening
> handles a lot of email and routine processing of various sorts including
> a fair bit of database activity associated with network monitoring and
> statistical analysis.
>
> I'm assuming that using "offline" as a means to 

Mailx Question

2019-04-09 Thread Software Info
Hi All
Since mailx is built into FreeBSD I decided to try asking this question here. I 
have a text file with about 30 email addresses. The file will change every day. 
I want an easy commandline way to read the file and blind copy send an email to 
the addresses in the file. So far, I have this working with just a plain send 
using the command below.
mailx -s "Test Emails" -b `cat mylist.txt` < body.txt -r 
"No-Reply"

Of course, when I use a plain send, everybody sees everybody’s email address so 
I would love to be able to do a blind copy send. Would anyone be able to assist 
me with this?


Regards
SI


Sent from Mail for Windows 10



Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)

2019-04-09 Thread Karl Denninger
On 4/9/2019 15:04, Andriy Gapon wrote:
> On 09/04/2019 22:01, Karl Denninger wrote:
>> the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S
>> IN USE AREA was examined, compared, and blocks not on the "new member"
>> or changed copied over.
> I think that that's not entirely correct.
> ZFS maintains something called DTL, a dirty-time log, for a missing / 
> offlined /
> removed device.  When the device re-appears and gets resilvered, ZFS walks 
> only
> those blocks that were born within the TXG range(s) when the device was 
> missing.
>
> In any case, I do not have an explanation for what you are seeing.

That implies something much more-serious could be wrong such as given
enough time -- a week, say -- that the DTL marker is incorrect and some
TXGs that were in fact changed since the OFFLINE are not walked through
and synchronized.  That would explain why it gets caught by a scrub --
the resilver is in fact not actually copying all the blocks that got
changed and so when you scrub the blocks are not identical.  Assuming
the detached disk is consistent that's not catastrophically bad IF
CAUGHT; where you'd get screwed HARD is in the situation where (for
example) you had a 2-unit mirror, detached one, re-attached it, resilver
says all is well, there is no scrub performed and then the
*non-detached* disk fails before there is a scrub.  In that case you
will have permanently destroyed or corrupted data since the other disk
is allegedly consistent but there are blocks *missing* that were never
copied over.

Again this just showed up on 12.x; it definitely was *not* at issue in
11.1 at all.  I never ran 11.2 in production for a material amount of
time (I went from 11.1 to 12.0 STABLE after the IPv6 fixes were posted
to 12.x) so I don't know if it is in play on 11.2 or not.

I'll see if it shows up again with 20.00.07.00 card firmware.

Of note I cannot reproduce this on my test box with EITHER 19.00.00.00
or 20.00.07.00 firmware when I set up a 3-unit mirror, offline one, make
a crap-ton of changes, offline the second and reattach the third (in
effect mirroring the "take one to the vault" thing) with a couple of
hours elapsed time and a synthetic (e.g. "dd if=/dev/random of=outfile
bs=1m" sort of thing) "make me some new data that has to be resilvered"
workload.  I don't know if that's because I need more entropy in the
filesystem than I can reasonably generate this way (e.g. more
fragmentation of files, etc) or whether it's a time-based issue (e.g.
something's wrong with the DTL/TXG thing as you note above in terms of
how it functions and it only happens if the time elapsed causes
something to be subject to a rollover or similar problem.) 
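
A bare-bones sketch of that sort of test, using small file-backed md(4)
devices instead of real disks (the pool name, device names and sizes here
are made up for illustration, and a real run obviously needs far more
data and elapsed time than this):

# Three 2 GB file-backed providers and a 3-way test mirror.
truncate -s 2g /tmp/m0 /tmp/m1 /tmp/m2
md0=$(mdconfig -a -t vnode -f /tmp/m0)
md1=$(mdconfig -a -t vnode -f /tmp/m1)
md2=$(mdconfig -a -t vnode -f /tmp/m2)
zpool create testmir mirror $md0 $md1 $md2

# Take one member out, then churn data while it is missing.
zpool offline testmir $md2
dd if=/dev/random of=/testmir/junk bs=1m count=512

# Bring it back; ZFS resilvers only the DTL/TXG range.  Once "zpool status"
# shows the resilver finished, scrub and look for CKSUM errors on the
# re-attached member.
zpool online testmir $md2
zpool scrub testmir
zpool status -v testmir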

I spent quite a lot of time trying to reproduce the issue on my
"sandbox" machine and was unable -- and of note it is never a large
quantity of data that is impacted, it's usually only a couple of dozen
checksums that show as bad and fixed.  Of note it's also never just one;
if there was a single random hit on a data block due to ordinary bitrot
sort of issues I'd expect only one checksum to be bad.  But generating a
realistic synthetic workload over the amount of time involved on a
sandbox is not trivial at all; the system on which this is now happening
handles a lot of email and routine processing of various sorts including
a fair bit of database activity associated with network monitoring and
statistical analysis.

I'm assuming that using "offline" as a means to do this hasn't become
"invalid" as an accepted way of doing this sort of thing; it certainly
has worked perfectly well for a very long time!

-- 
Karl Denninger
k...@denninger.net 
/The Market Ticker/
/[S/MIME encrypted email preferred]/




Re: Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)

2019-04-09 Thread Andriy Gapon
On 09/04/2019 22:01, Karl Denninger wrote:
> the resilver JUST COMPLETED with no errors which means the ENTIRE DISK'S
> IN USE AREA was examined, compared, and blocks not on the "new member"
> or changed copied over.

I think that that's not entirely correct.
ZFS maintains something called DTL, a dirty-time log, for a missing / offlined /
removed device.  When the device re-appears and gets resilvered, ZFS walks only
those blocks that were born within the TXG range(s) when the device was missing.

In any case, I do not have an explanation for what you are seeing.

-- 
Andriy Gapon


Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)

2019-04-09 Thread Karl Denninger
I've run into something often -- and repeatably -- enough since updating
to 12-STABLE that I suspect there may be a code problem lurking in the
ZFS stack or in the driver and firmware compatibility with various HBAs
based on the LSI/Avago devices.

The scenario is this -- I have data sets that are RaidZ2 that are my
"normal" working set; one is comprised of SSD volumes and one of
spinning rust volumes.  These all are normal and scrubs never show
problems.  I've had physical failures with them over the years (although
none since moving to 12-STABLE as of yet) and have never had trouble
with resilvers or other misbehavior.

I also have a "backup" pool that is a 3-member mirror, to which the
volatile (that is, the zfs filesystems not set read-only) has zfs send's
done to.  Call them backup-i, backup-e1 and backup-e2.

All disks in these pools are geli-encrypted running on top of a
freebsd-zfs partition inside a GPT partition table using -s 4096 (4k)
geli "sectors".

Two of the backup mirror members are always in the machine; backup-i
(the base internal drive) is never removed.  The third is in a bank
vault.  Every week the vault drive is exchanged with the other, so that
the "first" member is never removed from the host, but the other two
(-e1 and -e2) alternate.  If the building burns I have a full copy of
all the volatile data in the vault.  (I also have mirrored copies, 2
each, of all the datasets that are operationally read-only in the vault
too; those get updated quarterly if there are changes to the
operationally read-only portion of the data store.)  The drive in the
vault is swapped weekly, so a problem should be detected almost
immediately before it can bugger me.

Before removing the disk intended to go to the vault I "offline" it,
then spin it down (camcontrol standby), which issues a STANDBY IMMEDIATE
to the drive, ensuring that its cache is flushed and the spindle is spun
down, and then pull it.  I go exchange them at the bank, insert the
other one, and "zpool online" it, which automatically resilvers it.

The disk resilvers and all is well -- no errors.

Or is it all ok?

If I run a scrub on the pool as soon as the resilver completes, the disk
I just inserted will *invariably* have a few checksum errors on it that
the scrub fixes.  It's not a large number, anywhere from a couple dozen
to a hundred or so, but it's not zero -- and it damn well should be, as
the resilver JUST COMPLETED with no errors, which means the ENTIRE DISK'S
IN-USE AREA was examined, compared, and blocks not on the "new member" or
changed were copied over.  The "-i" disk (the one that is never pulled) is
NEVER the one with the checksum errors on it -- it's ALWAYS the one I just
inserted and which was resilvered to.

If I zpool clear the errors and scrub again all is fine -- no errors. 
If I scrub again before pulling the disk the next time to do the swap
all is fine as well.  I swap the two, resilver, and I'll get a few more
errors on the next scrub, ALWAYS on the disk I just put in.

Smartctl shows NO errors on the disk.  No ECC, no reallocated sectors,
no interface errors, no resets, nothing.  Smartd is running and never
posts any real-time complaints, other than the expected one a minute or
two after I yank the drive to take it to the bank.  There are no
CAM-related errors printing on the console either.  So ZFS says there's
a *silent* data error (bad checksum; never a read or write error) in a
handful of blocks but the disk says there have been no errors, the
driver does not report any errors, there have been no power failures as
the disk was in a bank vault and thus it COULDN'T have had a write-back
cache corruption event or similar occur.

I never had trouble with this under 11.1 or before and have been using
this paradigm for something on the order of five years running on this
specific machine without incident.  Now I'm seeing it repeatedly and
*reliably* under 12.0-STABLE.  I swapped the first disk that did it,
thinking it was physically defective -- the replacement did it on the
next swap.  In fact I've yet to record a swap-out on 12-STABLE that
*hasn't* done this and yet it NEVER happened under 11.1.  At the same
time I can run scrubs until the cows come home on the multiple Raidz2
packs on the same controller and never get any checksum errors on any of
them.

The firmware in the card was 19.00.00.00 -- again, this firmware *has
been stable for years.* 

I have just rolled the firmware on the card forward to 20.00.07.00,
which is the "latest" available.  I had previously not moved to 20.x
because earlier versions had known issues (some severe and potentially
fatal to data integrity) and 19 had been working without problem -- I
thus had no reason to move to 20.00.07.00.

But there apparently are some fairly significant timing differences
between the driver code in 11.1 and 11.2/12.0, as I discovered when the
SAS expander I used to have in these boxes started returning timeout
errors that were false.  Again -- this same 

Re: em performs worse than igb (latency wise) in 12?

2019-04-09 Thread Nick Rogers
On Sat, Apr 6, 2019 at 10:24 PM Graham Menhennitt wrote:

> Not that it's at all relevant to the question here, but...
>
> It does mostly work without em in the 12 kernel - I'm not sure how, but
> it does.
>
> I upgraded to 12-stable via source but didn't add em to my custom
> kernel. Most things worked - basic network functionality. But I had
> problems with ipfw and igb. Adding em to the kernel fixed them.
>

FWIW the latest GENERIC kernel includes the iflib, em, etc devices as far
as I can tell. I found the new UPDATING entry about iflib "no longer
unconditionally compiled into the kernel" a bit confusing... So long as you
are including GENERIC it should be the same as 12-RELEASE.
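
One quick way to double-check what the running kernel was actually built
with (this relies on GENERIC's INCLUDE_CONFIG_FILE option, and the grep
pattern is only a sketch):

# Show the embedded kernel config and look for the relevant devices.
sysctl -n kern.conftxt | grep -E 'device[[:space:]]+(em|iflib|igb)'

# Or inspect the source tree's GENERIC directly.
grep -E 'device[[:space:]]+(em|iflib|igb)' /usr/src/sys/amd64/conf/GENERIC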


> Graham
>
> On 6/4/19 6:12 am, Kris von Mach wrote:
> > On 4/6/2019 2:56 AM, Pete French wrote:
> >> Something odd going on there there - I am using 12-STABLE and I have
> >> igb just fine, and it attaches to the same hardware that 11 did:
> >
> > It does work in 12, throughput is great, just that the latency is
> > higher than 11.
> >
> > igb0: flags=8843 metric 0 mtu
> > 1500
> >
> options=e527bb
>
> >
> > ether 38:ea:a7:8d:c1:6c
> > inet 208.72.56.19 netmask 0xfc00 broadcast 208.72.59.255
> > inet6 fe80::3aea:a7ff:fe8d:c16c%igb0 prefixlen 64 scopeid 0x1
> > inet6 2602:ffb8::208:72:56:9 prefixlen 64
> > media: Ethernet autoselect (1000baseT )
> > status: active
> > nd6 options=21
> >
> >> Do you have a custom kernel, and if so did you see this note in
> >> UPDATING?
> >
> > Yes I do, but it includes all of GENERIC which includes em drivers,
> > otherwise it wouldn't even work with the network card.
> >
> > my custom kernel:
> >
> > include GENERIC
> > ident   CUSTOM
> > makeoptions WITH_EXTRA_TCP_STACKS=1
> > options TCPHPTS
> > options SC_KERNEL_CONS_ATTR=(FG_GREEN|BG_BLACK)
> > options IPSTEALTH
> > options   AHC_REG_PRETTY_PRINT  # Print register bitfields in debug
> > options   AHD_REG_PRETTY_PRINT  # Print register bitfields in debug
> > device cryptodev
> > device aesni
> >
> > I did try without RACK just in case that was the culprit.
> >
> >


Re: about zfs and ashift and changing ashift on existing zpool

2019-04-09 Thread tech-lists

On Mon, Apr 08, 2019 at 09:25:43PM -0400, Michael Butler wrote:

On 2019-04-08 20:55, Alexander Motin wrote:

On 08.04.2019 20:21, Eugene Grosbein wrote:

09.04.2019 7:00, Kevin P. Neal wrote:


My guess (given that only ada1 is reporting a blocksize mismatch) is that
your disks reported a 512B native blocksize.  In the absence of any override,
ZFS will then build an ashift=9 pool.


[skip]


smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:   SEAGATE
Product:  ST2400MM0129
Revision: C003
Compliance:   SPC-4
User Capacity:2,400,476,553,216 bytes [2.40 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes


Maybe it's time to prefer "Physical block size" over "Logical block size"
in relevant GEOMs like GEOM_DISK, so upper levels such as ZFS would do the
right thing automatically.


No.  It is a bad idea.  Changing logical block size for existing disks
will most likely result in breaking compatibility and inability to read
previously written data.  ZFS already uses physical block size when
possible -- on pool creation or new vdev addition.  When not possible
(pool already created wrong) it just complains about it, so that user
would know that his configuration is imperfect and he should not expect
full performance.


And some drives just present 512 bytes for both .. no idea if this is
consistent with the underlying silicon :-( I built a ZFS pool on it
using 4k blocks anyway.
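
(For reference, a minimal way to do that on FreeBSD, with placeholder pool
and device names: bump the minimum auto-ashift before creating the pool.)

# Use at least ashift=12 (4k) for newly created vdevs, regardless of what
# the drive reports, then create the pool as usual.
sysctl vfs.zfs.min_auto_ashift=12
zpool create tank mirror ada1 ada2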

smartctl 7.0 2018-12-30 r4883 [FreeBSD 13.0-CURRENT amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: WDC WDS100T2B0A-00SM50
Serial Number:1837B0803409
LU WWN Device Id: 5 001b44 8b99f7560
Firmware Version: X61190WD
User Capacity:1,000,204,886,016 bytes [1.00 TB]
Sector Size:  512 bytes logical/physical
Rotation Rate:Solid State Device
Form Factor:  2.5 inches
Device is:Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:Mon Apr  8 21:22:15 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is: 128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable


Yeah, it's weird, isn't it. So it seems it's not an issue with zfs at all
as far as I can see. This is one of the drives that was replaced, and
it's identical to the other two making up the array. So not unreasonably
ashift was 9, as all three drives making up the array were
512 logical/physical.

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Black
Device Model: WDC WD4001FAEX-00MJRA0
Firmware Version: 01.01L01
User Capacity:4,000,787,030,016 bytes [4.00 TB]
Sector Size:  512 bytes logical/physical
Device is:In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:Tue Apr  9 12:47:01 2019 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

I replaced one of them with an 8tb drive:

=== START OF INFORMATION SECTION ===
Model Family: Seagate Archive HDD
Device Model: ST8000AS0002-1NA17Z
Firmware Version: AR13
User Capacity:8,001,563,222,016 bytes [8.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate:5980 rpm
Device is:In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:Tue Apr  9 12:55:55 2019 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

so the 2nd drive is emulating 512. But ZFS seems to see through that and
correctly determines it's a 4k drive.

In any case, the fix was to make a new pool (which automatically set
ashift to 12 when the 8TB disk was added), then zfs send from the old
pool to the new one, then destroy the old pool. Fortunately this was
easy because the system had zfs installed as an afterthought, so no
root-on-zfs; the OS is on an SSD.

All I can say is that zpool performance of a 4k drive in an ashift=9 zpool is
non-ideal. The new pool feels quicker (even though the disks aren't
built for speed), and I've learned something new :D
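
For anyone doing the same migration, the shape of it is roughly this
sketch (pool names, snapshot name and vdev layout are placeholders, and
the old pool should only be destroyed after the copy has been verified):

# Check what the existing pool and disks actually report.
zdb -C oldpool | grep ashift
diskinfo -v ada1 | grep -E 'sectorsize|stripesize'

# Build the replacement pool with 4k alignment, then replicate everything.
sysctl vfs.zfs.min_auto_ashift=12
zpool create newpool raidz ada4 ada5 ada6
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -duF newpool

# Only after verifying the copy:
zpool destroy oldpool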

--
J.




Re: em performs worse than igb (latency wise) in 12?

2019-04-09 Thread Kris von Mach

On 4/7/2019 6:49 AM, Matthew Macy wrote:

On Sat, Apr 6, 2019 at 1:23 PM Michael Butler wrote:
I'd be interested to see if substituting the port net/intel-em-kmod 
has any effect on the issue, 

I would as well. igb, em, and lem are all the same driver in 12. This
makes maintenance a lot easier. However, the older NICs have a lot of
errata workarounds that aren't explicitly commented as such. My first
guess is this card suffers from one such errata workaround that has
been dropped in the update.


I've tried net/intel-em-kmod; it actually became worse, going from 100
requests/sec to about 90.


That makes sense about maintenance. Though I believe the i350 is less
than 5 years old; HP's 366FLR version is 4. So it's not that old, and
among gigabit-level NICs it is one of the best, AFAIK.


Is there some other 1-gigabit NIC that is recommended for 12? Or is it
time to switch to 10 Gbit? I've heard good things about Chelsio for 10 Gbit.


I went back to 11-Stable for now.
