re: scsipi: physio split the request

2018-12-27 Thread matthew green
> Of course larger transfers would also mitigate the overhead for each I/O
> operation, but we already do several Gigabyte/s with 64k transfers and
> filesystem I/O tends to be even smaller.

yes - the benefits will be in the 0-10% range for most things.  it
will help, but only a fairly small amount, most of us won't notice.

i've seen peaks of 1.4GB/s with an nvme(4) device with ffs on top.


.mrg.


Re: scsipi: physio split the request

2018-12-27 Thread Michael van Elst
jnem...@cue.bc.ca (John Nemeth) writes:

>On Dec 27,  6:49pm, Michael van Elst wrote:
>} So far that's mostly a problem with software raid and modern tape I/O.

> Wouldn't hardware RAID also benefit from bigger buffers?
>Although, I suppose a battery-backed cache could be used to work around
>small transfer sizes.

The transfer size currently limits I/O of stripes because it is split
over all stripe units (drives). A hardware controller does this internally
and isn't affected by MAXPHYS.

Of course larger transfers would also mitigate the overhead for each I/O
operation, but we already do several Gigabyte/s with 64k transfers and
filesystem I/O tends to be even smaller.
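
To put numbers on the striping point above, here is a minimal sketch. It assumes a plain stripe set where one request is divided evenly among the data drives; the helper name per_drive_bytes is made up for illustration and is not a RAIDframe function.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPHYS_BYTES (64 * 1024)	/* the 64k limit discussed in this thread */

/*
 * Hypothetical helper: bytes each drive receives when a single request
 * of `xfer` bytes is spread evenly over `ndrives` stripe units.
 */
static size_t
per_drive_bytes(size_t xfer, unsigned ndrives)
{
	assert(ndrives > 0);
	return xfer / ndrives;
}
```

With 8 drives, per_drive_bytes(MAXPHYS_BYTES, 8) is only 8192, so each disk sees 8k per request even though the caller issued the largest transfer allowed.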

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: scsipi: physio split the request

2018-12-27 Thread John Nemeth
On Dec 27,  6:49pm, Michael van Elst wrote:
} m...@netbsd.org (Emmanuel Dreyfus) writes:
} 
} >Is there a reason other than historical for NetBSD 64kB limit?
} 
} It's a compromise. Some buffers are statically sized for MAXPHYS
} and some ancient hardware cannot exceed 64k (or even less) DMA transfers.
} The buffer size is mostly a problem because we don't support
} scatter-gather transfers, so the buffers need to be contiguous in
} physical RAM (and some hardware doesn't support s-g either).
} 
} So far that's mostly a problem with software raid and modern tape I/O.

 Wouldn't hardware RAID also benefit from bigger buffers?
Although, I suppose a battery-backed cache could be used to work around
small transfer sizes.

}-- End of excerpt from Michael van Elst


Re: scsipi: physio split the request

2018-12-27 Thread Michael van Elst
thor...@me.com (Jason Thorpe) writes:

>> You need a really huge amount of RAM for that, and also a huge
>> KVA space.

>...but it doesn't have to be that way.

>The fundamental problem is that for physio, we currently have to map the 
>buffer into kernel space at all.

Mapping into KVA is another problem.

>  We really should have a more abstract way to describe memory that is passed 
> down to device drivers that currently take struct buf *s, call it an I/O 
> memory descriptor ("iomd"). This iomd would have, say, an array of vm_page 
> *'s, or perhaps an array of paddr_t's, but would also have a pointer to the 
> buffer as mapped into kernel address space.

The problem is that currently we and also some hardware cannot handle
such a construct.

>Then a new bus_dmamap_load_iomd() call could take an iomd as an argument, and 
>skip doing a bunch of work (calling into the pmap layer to get the physical 
>address), and just build the bus_dma_segment_t's directly.

There is hardware that can only handle a single bus_dma_segment.

So that's:

- support some more abstract MAXPHYS (i.e. not a global constant).
- make buffers based on scatter-gather lists instead of a single linear
  piece of memory.
- make drivers use these scatter-gather buffers
- try to emulate this behaviour when hardware is too limited.
- make other users of buffers compatible with scatter-gather lists

That's a long way to go and still not related to mapping buffers into KVA.
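
As a rough illustration of the scatter-gather items in that list, the sketch below merges physically adjacent pages into segments and gives up when the hardware's segment limit is exceeded. Everything here is an assumption made for the example: struct seg only stands in for bus_dma_segment_t, build_segs is a hypothetical name, and the 4k page size is arbitrary. It is not the bus_dma implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u		/* assumed page size for the example */

struct seg {			/* stand-in for bus_dma_segment_t */
	uint64_t addr;		/* physical start address */
	uint64_t len;		/* length in bytes */
};

/*
 * Merge physically adjacent pages into segments.
 * Returns the number of segments used, or -1 when more than
 * `maxsegs` segments would be needed.
 */
static int
build_segs(const uint64_t *pages, size_t npages, struct seg *segs, int maxsegs)
{
	int n = 0;

	assert(maxsegs >= 0);
	for (size_t i = 0; i < npages; i++) {
		/* Extend the previous segment if this page follows it directly. */
		if (n > 0 && segs[n - 1].addr + segs[n - 1].len == pages[i]) {
			segs[n - 1].len += PAGE_SZ;
			continue;
		}
		if (n == maxsegs)
			return -1;	/* hardware cannot take another segment */
		segs[n].addr = pages[i];
		segs[n].len = PAGE_SZ;
		n++;
	}
	return n;
}
```

A device limited to one segment (maxsegs = 1) succeeds only when all pages are contiguous; otherwise the caller would have to bounce or split the transfer, which is the "emulate this behaviour" item.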

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: scsipi: physio split the request

2018-12-27 Thread Christos Zoulas
On Dec 27, 12:29pm, buh...@nfbcal.org (Brian Buhrow) wrote:
-- Subject: Re: scsipi: physio split the request

|   hello.  Just out of curiosity, why did the tls-maxphys branch never
| get merged with head once the work was done or mostly done?

mostly done...

christos


Re: scsipi: physio split the request

2018-12-27 Thread Jason Thorpe



> On Dec 27, 2018, at 10:51 AM, Michael van Elst  wrote:
> 
> m...@netbsd.org (Emmanuel Dreyfus) writes:
> 
>> What happens if I just #define MAXPHYS (1024*1204*1024) ?
> 
> You need a really huge amount of RAM for that, and also a huge
> KVA space.

...but it doesn't have to be that way.

The fundamental problem is that for physio, we currently have to map the buffer 
into kernel space at all.  We really should have a more abstract way to 
describe memory that is passed down to device drivers that currently take 
struct buf *s, call it an I/O memory descriptor ("iomd").  This iomd would 
have, say, an array of vm_page *'s, or perhaps an array of paddr_t's, but would 
also have a pointer to the buffer as mapped into kernel address space.  The 
necessary part is having the page array filled in, along with an offset, and a 
length.  If not sufficient, then callers could map the buffer ONLY if needed, 
e.g. if you have to do PIO to your device.

Then a new bus_dmamap_load_iomd() call could take an iomd as an argument, and 
skip doing a bunch of work (calling into the pmap layer to get the physical 
address), and just build the bus_dma_segment_t's directly.  If it needs to 
bounce-buffer, then the back-end takes care of calling iomd_map() or whatever.
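
To make the idea concrete, here is one way such a descriptor could look. This is only a sketch under stated assumptions: no iomd exists in the tree, and the field names, the helper functions and the 4k page size are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u		/* assumed page size for the example */

/* Hypothetical I/O memory descriptor; all field names are made up. */
struct iomd {
	const uint64_t *pages;	/* physical page addresses (paddr_t stand-in) */
	size_t npages;		/* entries in pages[] */
	size_t offset;		/* byte offset into the first page */
	size_t len;		/* transfer length in bytes */
	void *kva;		/* NULL until somebody needs a kernel mapping */
};

/* Number of pages a transfer of `len` bytes starting at `offset` touches. */
static size_t
iomd_pages_needed(size_t offset, size_t len)
{
	return (offset + len + PAGE_SZ - 1) / PAGE_SZ;
}

/* A descriptor is usable for DMA without any KVA mapping if this holds. */
static int
iomd_is_consistent(const struct iomd *io)
{
	assert(io != NULL);
	return io->offset < PAGE_SZ &&
	    io->npages == iomd_pages_needed(io->offset, io->len);
}
```

The point of the kva field staying NULL is that only code doing PIO or bounce-buffering would ever ask for a mapping; a DMA-capable driver would work from pages, offset and len alone.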

This isn't a fully fleshed-out proposal, or anything, but I know it's been 
brought up off and on for years... we really ought to just get around to doing 
it.  Unfortunately, it's going to mean modifying a lot of drivers before the 
upper layers can assume "I can pass iomds down everywhere for buf I/O".

-- thorpej



Re: scsipi: physio split the request

2018-12-27 Thread Brian Buhrow
hello.  Just out of curiosity, why did the tls-maxphys branch never
get merged with head once the work was done or mostly done?
-thanks
-Brian



Re: scsipi: physio split the request

2018-12-27 Thread Christos Zoulas
In article <20181227153028.gr4...@homeworld.netbsd.org>,
Emmanuel Dreyfus   wrote:
>On Thu, Dec 27, 2018 at 09:47:03AM -0500, Christos Zoulas wrote:
>> | What happens if I just #define MAXPHYS (1024*1204*1024) ?
>> I don't think that's a good idea. My guess is that things are going to
>blow up.
>
>At least if I try to be on par with Linux limit and build with
>-DMAXPHYS=1048576 the system goes to multiuser without a hitch.
>
>Running mkltfs raises a few errors on the console, though:
>mpii0: error 27 loading dmamap
>st0(mpii0:0:2:0): passthrough: adapter inconsistency
>mpii0: error 27 loading dmamap
>st0(mpii0:0:2:0): passthrough: adapter inconsistency

Told you: EFBIG :-)
Why don't you try tls-maxphys?
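
For readers decoding the console output: error 27 is EFBIG in errno.h. A small sketch for turning such a number into its name and text follows; the helper names are made up for the example, and the output of strerror() depends on the C library.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: true when a driver error code is EFBIG. */
static int
is_efbig(int error)
{
	return error == EFBIG;
}

/* Hypothetical helper: human-readable text for an errno-style code. */
static const char *
describe_error(int error)
{
	assert(error >= 0);
	return strerror(error);
}
```

Here is_efbig(27) is true, which matches the "error 27 loading dmamap" lines above. In this context EFBIG most likely means the 1MB transfer did not fit within what the DMA map could describe.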

christos



Re: scsipi: physio split the request

2018-12-27 Thread Michael van Elst
m...@netbsd.org (Emmanuel Dreyfus) writes:

>What happens if I just #define MAXPHYS (1024*1204*1024) ?

You need a really huge amount of RAM for that, and also a huge
KVA space. Try MAXPHYS (1024*1024) for a start.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: scsipi: physio split the request

2018-12-27 Thread Michael van Elst
m...@netbsd.org (Emmanuel Dreyfus) writes:

>Is there a reason other than historical for NetBSD 64kB limit?

It's a compromise. Some buffers are statically sized for MAXPHYS
and some ancient hardware cannot exceed 64k (or even less) DMA transfers.
The buffer size is mostly a problem because we don't support
scatter-gather transfers, so the buffers need to be contiguous in
physical RAM (and some hardware doesn't support s-g either).

So far that's mostly a problem with software raid and modern tape I/O.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: scsipi: physio split the request

2018-12-27 Thread Michael van Elst
m...@netbsd.org (Emmanuel Dreyfus) writes:

>On Thu, Dec 27, 2018 at 10:44:46AM +0100, Manuel Bouyer wrote:
>> tape block sizes are usually larger than 512 (I use 64k here).
>> What block size did mkltfs use ? Actually we can't do larger than 64k.

>It seems to attempt transfers of 256kB

We are limited to MAXPHYS which is currently 64k.
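
The arithmetic behind this, as a sketch. It assumes, as the console message suggests, that the tape driver cannot proceed once physio has had to split a request into several pieces; the helper names are made up and this is not the st(4) code.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPHYS_BYTES (64 * 1024)	/* current limit */

/* Hypothetical helper: number of MAXPHYS-sized pieces a request needs. */
static size_t
chunks_needed(size_t len, size_t maxphys)
{
	assert(maxphys > 0);
	return (len + maxphys - 1) / maxphys;
}

/* Hypothetical helper: true when the request goes through unsplit. */
static int
fits_unsplit(size_t len, size_t maxphys)
{
	return chunks_needed(len, maxphys) <= 1;
}
```

A 256kB request gives chunks_needed(262144, MAXPHYS_BYTES) == 4, so it would be split, while a 64k request still fits in one piece.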

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: scsipi: physio split the request

2018-12-27 Thread Emmanuel Dreyfus
On Thu, Dec 27, 2018 at 09:47:03AM -0500, Christos Zoulas wrote:
> | What happens if I just #define MAXPHYS (1024*1204*1024) ?
> I don't think that's a good idea. My guess is that things are going to blow 
> up.

At least if I try to be on par with Linux limit and build with
-DMAXPHYS=1048576 the system goes to multiuser without a hitch.

Running mkltfs raises a few errors on the console, though:
mpii0: error 27 loading dmamap
st0(mpii0:0:2:0): passthrough: adapter inconsistency
mpii0: error 27 loading dmamap
st0(mpii0:0:2:0): passthrough: adapter inconsistency




-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: svr4, again

2018-12-27 Thread Maxime Villard

On 21/12/2018 at 12:19, Maxime Villard wrote:

On 21/12/2018 at 10:25, Anders Magnusson wrote:

On 2018-12-20 at 21:29, Maxime Villard wrote:

On 20/12/2018 at 18:11, Kamil Rytarowski wrote:

https://github.com/krytarowski/franz-lisp-netbsd-0.9-i386

On the other hand unless we need it for bootloaders, drivers or
something needed to run NetBSD, I'm for removal of srv3, sunos etc compat.


Yes.

So, first things first, and to come back to my email about ibcs2: what are
the reasons for keeping it? As I said previously, this is not for x86 but
for Vax. As was also said, FreeBSD removed it just a few days ago.

I'm bringing up compat_ibcs2 because I did start a thread on port-vax@ about
it last year (as quoted earlier), and back then it seemed that no one knew
what was the use case on Vax.


It was something that Matt Thomas used for a customer running some commercial
program, but it was a long time ago (15 years?).  I've never heard of any other
use, so from my perspective IBCS2 is not relevant (anymore).

-- ragge


Alright, so I propose that we retire it. After a quick scroll-reread of the
thread it seems to me we all agree on that. Anyone objecting etc?


So, no one? I will remove it soon...


Re: scsipi: physio split the request

2018-12-27 Thread Christos Zoulas
On Dec 27,  2:41pm, m...@netbsd.org (Emmanuel Dreyfus) wrote:
-- Subject: Re: scsipi: physio split the request

| On Thu, Dec 27, 2018 at 02:33:28PM +, Christos Zoulas wrote:
| > I think you need to resurrect the tls-maxphys branch... It was close to working
| > IIRC.
| 
| What happens if I just #define MAXPHYS (1024*1204*1024) ?

I don't think that's a good idea. My guess is that things are going to blow up.

christos


Re: scsipi: physio split the request

2018-12-27 Thread Emmanuel Dreyfus
On Thu, Dec 27, 2018 at 02:33:28PM +, Christos Zoulas wrote:
> I think you need to resurrect the tls-maxphys branch... It was close to working
> IIRC.

What happens if I just #define MAXPHYS (1024*1204*1024) ?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: scsipi: physio split the request

2018-12-27 Thread Christos Zoulas
In article <20181227123711.go4...@homeworld.netbsd.org>,
Emmanuel Dreyfus   wrote:
>On Thu, Dec 27, 2018 at 10:44:46AM +0100, Manuel Bouyer wrote:
>> tape block sizes are usually larger than 512 (I use 64k here).
>> What block size did mkltfs use ? Actually we can't do larger than 64k.
>
>It seems to attempt transfers of 256kB
>
>LTFS20010D SCSI request: [ A3 1F 08 00 00 00 04 00 00 00 00 00 ]
>Requested length=262144
>LTFS20089D Driver detail:errno = 0x5
>LTFS20089D Driver detail:  host_status = 0x0
>LTFS20089D Driver detail:driver_status = 0x0
>LTFS20089D Driver detail:   status = 0x0
>LTFS20011D SCSI outcome: Driver status=0xFF SCSI status=0xFF Actual length=0

I think you need to resurrect the tls-maxphys branch... It was close to working
IIRC.

christos



Re: scsipi: physio split the request

2018-12-27 Thread Emmanuel Dreyfus
On Thu, Dec 27, 2018 at 10:44:46AM +0100, Manuel Bouyer wrote:
> tape block sizes are usually larger than 512 (I use 64k here).

I patched ltfs so that all the max sizes (256kB and 512kB for Linux)
are set to 64kB for NetBSD. I can now format and mount the LTFS filesystem,
but I need to limit the block size to under 64kB.

This will work:
dump -0f - / | dd obs=63k of=/ltfs/dump20181227 

This hangs the filesystem:
dump -0f - / | dd obs=64k of=/ltfs/dump20181227 

I tested on glusterfs that our FUSE implementation does not limit writes
to 64k chunks, hence I assume I introduced a bug in ltfs with the 64kB 
limit everywhere.

Is there a reason other than historical for NetBSD 64kB limit?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: scsipi: physio split the request

2018-12-27 Thread Emmanuel Dreyfus
On Thu, Dec 27, 2018 at 10:44:46AM +0100, Manuel Bouyer wrote:
> tape block sizes are usually larger than 512 (I use 64k here).
> What block size did mkltfs use ? Actually we can't do larger than 64k.

It seems to attempt transfers of 256kB

LTFS20010D SCSI request: [ A3 1F 08 00 00 00 04 00 00 00 00 00 ]
Requested length=262144
LTFS20089D Driver detail:errno = 0x5
LTFS20089D Driver detail:  host_status = 0x0
LTFS20089D Driver detail:driver_status = 0x0
LTFS20089D Driver detail:   status = 0x0
LTFS20011D SCSI outcome: Driver status=0xFF SCSI status=0xFF Actual length=0


-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: scsipi: physio split the request

2018-12-27 Thread Manuel Bouyer
On Thu, Dec 27, 2018 at 09:07:41AM +, Emmanuel Dreyfus wrote:
> Hello
> 
> A few years ago I made a failed attempt at running LTFS on an LTO 6 drive. 
> I resumed the effort, and once I got the LTFS code ported, running 
> a command like mkltfs fails with kernel console saying:
> st0(mpii0:0:2:0): physio split the request.. cannot proceed
> 
> This is netbsd-current from yesterday.
> 
> I understand this is about tape block size larger than the usual 512. 

tape block sizes are usually larger than 512 (I use 64k here).
What block size did mkltfs use ? Actually we can't do larger than 64k.

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


scsipi: physio split the request

2018-12-27 Thread Emmanuel Dreyfus
Hello

A few years ago I made a failed attempt at running LTFS on an LTO 6 drive. 
I resumed the effort, and once I got the LTFS code ported, running 
a command like mkltfs fails with kernel console saying:
st0(mpii0:0:2:0): physio split the request.. cannot proceed

This is netbsd-current from yesterday.

I understand this is about tape block size larger than the usual 512. 
src/sys/dev/st.c seems to have provision for that, with drive specific
quirks like below. Do I read it correctly?
    {{T_SEQUENTIAL, T_REMOV,
     "TANDBERG", " TDC 3600   ", ""}, {0, 12, {
        {0, 0, 0},                              /* minor 0-3 */
        {ST_Q_FORCE_BLKSIZE, 0, QIC_525},       /* minor 4-7 */
        {0, 0, QIC_150},                        /* minor 8-11 */
        {0, 0, QIC_120}                         /* minor 12-15 */
    }}},
    {{T_SEQUENTIAL, T_REMOV,
     "TANDBERG", " TDC 3800   ", ""}, {0, 0, {
        {ST_Q_FORCE_BLKSIZE, 512, 0},           /* minor 0-3 */
        {0, 0, QIC_525},                        /* minor 4-7 */
        {0, 0, QIC_150},                        /* minor 8-11 */
        {0, 0, QIC_120}                         /* minor 12-15 */
    }}},

Mine is detected as:
  st0 at scsibus0 target 2 lun 0:  tape removable
  st0: density code 90, variable blocks, write-enabled
  st0: tagged queueing

Is it just that I need quirks for this specific drive? If so, where can
the appropriate information be found? Looking in Linux
kernel code, I can only find stuff about TANDBERG TDC 3600, with nothing
about block size.


-- 
Emmanuel Dreyfus
m...@netbsd.org