Re: svn commit: r355837 - head/sys/cam

2019-12-16 Thread Warner Losh
On Mon, Dec 16, 2019 at 5:54 PM Kevin Bowling wrote:

>
>
> On Mon, Dec 16, 2019 at 5:44 PM Steven Hartland <
> steven.hartl...@multiplay.co.uk> wrote:
>
>> Sticky keyboard there Warner?
>
>
> LOL
>

Yeah. I have a Mac with a keyboard with a stuck delete key. I tried to edit
the commit messages on a flight to California last week, and the stuck key
screwed up a few of them, so I quit for the night. I fixed all the others,
but missed this one :(.

>> On a more serious note: because the controllers lie about the
>> underlying location of data, skipping the TRIM requests can have a
>> much more serious impact than one might think, depending on the
>> drive, so this type of optimisation can significantly harm
>> performance instead of improving it.
>>
>> This was the main reason we sponsored the initial ZFS TRIM
>> implementation, as drive performance got so bad without TRIM that
>> SSDs performed worse than HDDs.
>
>
> Have you been able to test the new OpenZFS/ZoF TRIM?
>
> I notice the current FBSD one gets quite beleaguered with highly
> concurrent poudriere as the snapshots are being reaped, i.e., TRIMs
> totally swamp r/w ops on the Plextor PCIe SSD I have. I haven't tried
> ZoF on this machine yet since it is my main workstation, but I will do
> so once it is ready for mainline.
>

Trims totally swamping r/w ops is why I started this work in the first
place. I'd wager that the new ZoF TRIM code may not be as well tuned as
FreeBSD's and/or makes performance assumptions that are unwise in practice.
I've not looked at it closely enough to know, but I suspect it combines
adjacent trims less aggressively. If you are using nvd, it will shotgun all
requests into the device's queue, which on less-than-high-end enterprise
drives can lead to issues like the ones you describe... I'm willing to help
people characterize what's going on, but I won't have time to look into
this until sometime in January. In general, at least for the drives we use,
fewer, larger trims work a lot better. Also, biasing I/O selection towards
reads by some factor helps mitigate the trim load a bit, though trims can
still swamp the queue if there are no reads for a short while to keep them
at bay (which is why I wrote the pacing code).
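
As a rough illustration of that read bias, here's a minimal sketch in the
spirit of the scheduler: the read_bias/current_read_bias counters do exist
in the real code (they show up in the ddb output in the diff below), but
the struct and function here are invented purely for illustration. Serve up
to read_bias reads for every trim dispatched, and only let trims flow
freely when no reads are waiting.

/*
 * Minimal sketch, not the actual cam_iosched code: bias selection
 * toward reads so queued trims can't starve them, but still drain
 * trims when reads go idle.
 */
struct toy_sched {
	int	read_bias;		/* reads allowed per trim */
	int	current_read_bias;	/* reads left before the next trim */
	int	reads_queued;
	int	trims_queued;
};

/* Return 1 to dispatch a trim next, 0 to dispatch a read. */
static int
toy_sched_trim_next(struct toy_sched *s)
{

	if (s->trims_queued == 0)
		return (0);
	if (s->reads_queued == 0)
		return (1);		/* no reads waiting: let trims flow */
	if (s->current_read_bias > 0) {
		s->current_read_bias--;	/* keep favoring reads */
		return (0);
	}
	s->current_read_bias = s->read_bias;
	return (1);			/* allow one trim, then bias again */
}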

Warner


>
>> Now obviously this was some time ago, but I wouldn't be surprised if
>> bad hardware/firmware like this is still being produced.
>>
>> Given that, it might be a good idea to make this optional, possibly
>> even opt-in rather than opt-out?
>>
>>  Regards
>>  Steve
>>
Re: svn commit: r355837 - head/sys/cam

2019-12-16 Thread Warner Losh
On Mon, Dec 16, 2019 at 5:43 PM Steven Hartland <
steven.hartl...@multiplay.co.uk> wrote:

> Sticky keyboard there Warner?
>

Yeah. I thought I'd fixed all of those messages.


> On a more serious note: because the controllers lie about the
> underlying location of data, skipping the TRIM requests can have a
> much more serious impact than one might think, depending on the
> drive, so this type of optimisation can significantly harm
> performance instead of improving it.
>

The underlying location of an LBA is never fixed. There's always an
intermediate layer that does the mapping so you can get good performance
writing to the drive in chunks smaller than the erase block. A trim
followed quickly by a write is nearly the same as just doing the write. In
the cases where we have a severe shortage, we'll skip the trim to a block
that's likely to be written to shortly anyway. It's only done when there's
a shortage, and it only affects the trims that we've not yet scheduled.
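
To make that concrete, here's a toy sketch of the mapping layer (invented
for illustration, nothing like real firmware): whether or not the LBA was
trimmed first, a new write lands on a fresh physical page and the old page
becomes garbage either way, which is why trim-then-write buys so little.

/*
 * Toy logical-to-physical map; purely illustrative, not real firmware.
 */
#include <stdint.h>

#define	TOY_UNMAPPED	UINT32_MAX

static uint32_t	toy_l2p[1024];		/* LBA -> physical page */
static uint32_t	toy_next_page;

static void
toy_trim(uint32_t lba)
{

	toy_l2p[lba] = TOY_UNMAPPED;	/* old page becomes garbage now */
}

static void
toy_write(uint32_t lba)
{

	/* Old mapping (if any) becomes garbage whether or not it was trimmed. */
	toy_l2p[lba] = toy_next_page++;
}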


> This was the main reason we sponsored the initial ZFS TRIM
> implementation, as drive performance got so bad without TRIM that
> SSDs performed worse than HDDs.
>

Yes. You can't omit trims in general, but when the choice is to freeze all
the other I/O in the system to do every single trim, or to skip a few trims
so the I/O can flow more quickly, the tradeoff is obvious. The reason you
generally want to do trims is to keep the write amp on the drive down.
Skipping them occasionally, especially for blocks that will soon be
rewritten, is fine and won't materially affect the write amp. Turning trim
off completely will hurt the write amp terribly on some drives, as you
discovered, because the underlying NAND pool of blocks gets fragmented.
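
(For a back-of-the-envelope sense of what "write amp" means here; the
numbers below are made up for illustration, not measured from any drive:)

/* NAND bytes written per host byte written; illustrative only. */
static double
toy_write_amp(double host_bytes, double gc_relocated_bytes)
{

	return ((host_bytes + gc_relocated_bytes) / host_bytes);
}
/*
 * e.g. toy_write_amp(1.0, 2.0) == 3.0 when garbage collection must
 * relocate 2 GB of still-valid data to make room for 1 GB of new
 * writes; a well-trimmed drive stays much closer to 1.0.
 */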

> Now obviously this was some time ago, but I wouldn't be surprised if
> bad hardware/firmware like this is still being produced.
>

I've never encountered firmware so bad that this would cause it any
trouble, despite having encountered a lot of bad firmware out there... DSM
TRIMs are optional, and nothing requires that they be done. If you do the
overwhelming majority of them but skip a few, the drive will still behave
decently even with really bad firmware and garbage collection routines.


> Given that, it might be a good idea to make this optional, possibly
> even opt-in rather than opt-out?
>

It's 100% off if you don't build the kernel with the CAM I/O scheduler
option. Since the CAM I/O scheduler can delay I/O, this helps when there's
a severe resource shortage in the system (which happens somewhat
infrequently). As such, I viewed it as part of the I/O scheduler and not
something needing a separate knob to turn off. I had a hard time triggering
the shortages that would exercise the speedup path, and the code review
moved the speedup code to an even less frequently hit location.
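
For reference, a minimal kernel-config sketch (assuming the stock
CAM_IOSCHED_DYNAMIC option is what's meant by the I/O scheduler option
here; check your sys/conf NOTES if in doubt):

# Per the point above: without the dynamic CAM I/O scheduler, the trim
# delaying, and with it this speedup shortcut, is effectively off.
options 	CAM_IOSCHED_DYNAMIC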

Warner


>  Regards
>  Steve
>

Re: svn commit: r355837 - head/sys/cam

2019-12-16 Thread Kevin Bowling
On Mon, Dec 16, 2019 at 5:44 PM Steven Hartland <
steven.hartl...@multiplay.co.uk> wrote:

> Sticky keyboard there Warner?


LOL


> On a more serious note: because the controllers lie about the
> underlying location of data, skipping the TRIM requests can have a
> much more serious impact than one might think, depending on the
> drive, so this type of optimisation can significantly harm
> performance instead of improving it.
>
> This was the main reason we sponsored the initial ZFS TRIM
> implementation, as drive performance got so bad without TRIM that
> SSDs performed worse than HDDs.


Have you been able to test the new OpenZFS/ZoF TRIM?

I notice the current FBSD one gets quite beleaguered with highly concurrent
poudriere as the snapshots are being reaped, i.e., TRIMs totally swamp r/w
ops on the Plextor PCIe SSD I have. I haven't tried ZoF on this machine yet
since it is my main workstation, but I will do so once it is ready for
mainline.


> Now obviously this was some time ago, but I wouldn't be surprised if
> bad hardware/firmware like this is still being produced.
>
> Given that, it might be a good idea to make this optional, possibly
> even opt-in rather than opt-out?
>
>  Regards
>  Steve
>

Re: svn commit: r355837 - head/sys/cam

2019-12-16 Thread Steven Hartland

Sticky keyboard there Warner?

On a more serious note: because the controllers lie about the underlying
location of data, skipping the TRIM requests can have a much more serious
impact than one might think, depending on the drive, so this type of
optimisation can significantly harm performance instead of improving it.

This was the main reason we sponsored the initial ZFS TRIM implementation,
as drive performance got so bad without TRIM that SSDs performed worse
than HDDs.


Now obviously this was some time ago, but I wouldn't be surprised if bad
hardware/firmware like this is still being produced.

Given that, it might be a good idea to make this optional, possibly even
opt-in rather than opt-out?

    Regards
    Steve

On 17/12/2019 00:13, Warner Losh wrote:

Author: imp
Date: Tue Dec 17 00:13:45 2019
New Revision: 355837
URL: https://svnweb.freebsd.org/changeset/base/355837

Log:
   Implement bio_speedup
   
   React to the BIO_SPEED command in the cam io scheduler by completing

   as successful BIO_DELETE commands that are pending, up to the length
   passed down in the BIO_SPEEDUP cmomand. The length passed down is a
   hint for how much space on the drive needs to be recovered. By
   completing the BIO_DELETE comomands, this allows the upper layers to
   allocate and write to the blocks that were about to be trimmed. Since
   FreeBSD implements TRIMSs as advisory, we can eliminliminate them and
   go directly to writing.
   
   The biggest benefit from TRIMS coomes ffrom the drive being able t

   ooptimize its free block pool inthe log run. There's little nto no
   bene3efit in the shoort term. , sepeciall whn the trim is followed by
   a write. Speedup lets  us make this tradeoff.
   
   Reviewed by: kirk, kib

   Sponsored by: Netflix
   Differential Revision: https://reviews.freebsd.org/D18351

Modified:
   head/sys/cam/cam_iosched.c

Modified: head/sys/cam/cam_iosched.c
==
--- head/sys/cam/cam_iosched.c  Tue Dec 17 00:13:40 2019(r355836)
+++ head/sys/cam/cam_iosched.c  Tue Dec 17 00:13:45 2019(r355837)
@@ -1534,6 +1534,41 @@ cam_iosched_queue_work(struct cam_iosched_softc *isc,
  {
  
  	/*

+* A BIO_SPEEDUP from the uppper layers means that they have a block
+* shortage. At the present, this is only sent when we're trying to
+* allocate blocks, but have a shortage before giving up. bio_length is
+* the size of their shortage. We will complete just enough BIO_DELETEs
+* in the queue to satisfy the need. If bio_length is 0, we'll complete
+* them all. This allows the scheduler to delay BIO_DELETEs to improve
+* read/write performance without worrying about the upper layers. When
+* it's possibly a problem, we respond by pretending the BIO_DELETEs
+* just worked. We can't do anything about the BIO_DELETEs in the
+* hardware, though. We have to wait for them to complete.
+*/
+   if (bp->bio_cmd == BIO_SPEEDUP) {
+   off_t len;
+   struct bio *nbp;
+
+   len = 0;
+   while (bioq_first(&isc->trim_queue) &&
+   (bp->bio_length == 0 || len < bp->bio_length)) {
+   nbp = bioq_takefirst(&isc->trim_queue);
+   len += nbp->bio_length;
+   nbp->bio_error = 0;
+   biodone(nbp);
+   }
+   if (bp->bio_length > 0) {
+   if (bp->bio_length > len)
+   bp->bio_resid = bp->bio_length - len;
+   else
+   bp->bio_resid = 0;
+   }
+   bp->bio_error = 0;
+   biodone(bp);
+   return;
+   }
+
+   /*
 * If we get a BIO_FLUSH, and we're doing delayed BIO_DELETEs then we
 * set the last tick time to one less than the current ticks minus the
 * delay to force the BIO_DELETEs to be presented to the client driver.
@@ -1919,8 +1954,8 @@ DB_SHOW_COMMAND(iosched, cam_iosched_db_show)
db_printf("Trim Q len %d\n", biolen(&isc->trim_queue));
db_printf("read_bias: %d\n", isc->read_bias);
db_printf("current_read_bias: %d\n", isc->current_read_bias);
-   db_printf("Trims active   %d\n", isc->pend_trim);
-   db_printf("Max trims active   %d\n", isc->max_trim);
+   db_printf("Trims active   %d\n", isc->pend_trims);
+   db_printf("Max trims active   %d\n", isc->max_trims);
  }
  #endif
  #endif


___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"