Re: minphys woes

2014-09-05 Thread Stefan Fritsch
On Wednesday 03 September 2014 15:23:13, Philip Guenther wrote:
 IMO, the problem that you're hitting with your vioblk device isn't a
 problem with MAXPHYS, physio(), or minphys, but rather with
 MAXBSIZE. Back in the 1990's, the fact that you want to be able to
 configure a device with minphys 20kB on your arch means that the MD
 MAXBSIZE define for your arch should be 20kB.
 
 If MAXBSIZE could be set/limited on a per-device basis (i.e., the
 code using it queried the device) then your problem would be
 resolved, no?

Thank you very much for this write up. It was very enlightening and I 
agree that the problem I encountered is that MAXBSIZE is not a per-
device value.

Another thing this has made clear to me is that I don't want to deal 
with openbsd's lower level block layer. I will just close the lid and 
hope the smell goes away.

Cheers,
Stefan



Re: minphys woes

2014-09-03 Thread Stefan Fritsch
On Tuesday 02 September 2014 15:22:16, Philip Guenther wrote:
  From physio(9):
   minphys
   A device specific routine called to determine the
 maximum transfer size that the device's strategy routine can
 handle.
 
  Since we have seen that the driver must be able to handle 64k
  blocks in any case, the fact that minphys is device specific is
  useless, isn't it?
 
 physio() is used by character device access.  Looks to me like
 sdminphys() will change the chunking behavior of this:
 dd if=/dev/zero of=/dev/rsd0a bs=100M count=1
 
 depending on whether sd0 is a SCSI-I device, no?

Yes, but that does not make it any less useless. File systems will 
call the very same strategy routine and expect it to deal with 64K 
blocks. The statement in the man page gives the misleading impression 
that the strategy routine could be used to avoid that the strategy 
routine has to handle 64K blocks, which is not true.

So, maybe the minphys mechanism is useful for block devices that are 
not used for file systems. Are there any such devices?

Maybe we could add a sentence to the man page like following?

Beware that a block device's strategy routine that is used for file 
systems must always be able to accept transfers sizes up to MAXPHYS.



Re: minphys woes

2014-09-03 Thread Claudio Jeker
On Wed, Sep 03, 2014 at 07:51:25PM +0200, Stefan Fritsch wrote:
 On Tuesday 02 September 2014 15:22:16, Philip Guenther wrote:
   From physio(9):
minphys
A device specific routine called to determine the
  maximum transfer size that the device's strategy routine can
  handle.
  
   Since we have seen that the driver must be able to handle 64k
   blocks in any case, the fact that minphys is device specific is
   useless, isn't it?
  
  physio() is used by character device access.  Looks to me like
  sdminphys() will change the chunking behavior of this:
  dd if=/dev/zero of=/dev/rsd0a bs=100M count=1
  
  depending on whether sd0 is a SCSI-I device, no?
 
 Yes, but that does not make it any less useless. File systems will 
 call the very same strategy routine and expect it to deal with 64K 
 blocks. The statement in the man page gives the misleading impression 
 that the strategy routine could be used to avoid that the strategy 
 routine has to handle 64K blocks, which is not true.
 
 So, maybe the minphys mechanism is useful for block devices that are 
 not used for file systems. Are there any such devices?
 
 Maybe we could add a sentence to the man page like following?
 
 Beware that a block device's strategy routine that is used for file 
 systems must always be able to accept transfers sizes up to MAXPHYS.
 

Maybe it is time to fix the block side and make sure that it respects
minphys as well by chopping up the IO. This would also allow to buffer
bigger blocks than MAXPHYS in the buffer cache (not sure if this is
something we want though).

-- 
:wq Claudio



Re: minphys woes

2014-09-03 Thread Philip Guenther
On Wed, Sep 3, 2014 at 10:51 AM, Stefan Fritsch s...@sfritsch.de wrote:
 On Tuesday 02 September 2014 15:22:16, Philip Guenther wrote:
  From physio(9):
   minphys
   A device specific routine called to determine the
 maximum transfer size that the device's strategy routine can
 handle.
 
  Since we have seen that the driver must be able to handle 64k
  blocks in any case, the fact that minphys is device specific is
  useless, isn't it?

 physio() is used by character device access.  Looks to me like
 sdminphys() will change the chunking behavior of this:
 dd if=/dev/zero of=/dev/rsd0a bs=100M count=1

 depending on whether sd0 is a SCSI-I device, no?

 Yes, but that does not make it any less useless. File systems will
 call the very same strategy routine and expect it to deal with 64K
 blocks. The statement in the man page gives the misleading impression
 that the [minphys] routine could be used to avoid that the strategy
 routine has to handle 64K blocks, which is not true.

We don't seem to have a manpage describing the routines that a block
device must implement and their requirements, nor the same for
character devices.  Those would be nice additions to our section 9
manpages.

This requirement on a block device's strategy routine comes from the
filesystem layer.  These *do not use physio()*, so how does it make
sense to document that requirement in physio(9)?  That manpage doesn't
even contain the word block!


 So, maybe the minphys mechanism is useful for block devices that are
 not used for file systems. Are there any such devices?

The current minphys mechanism *should* be useful for devices which do
not store file systems with a block size greater than the maximum I/O
size of the device.  IMO, if a filesystem is configured with block
size less than the device's minphys, then it should not be doing
breads of more than that block size.  If that's the case, then
manually setting the file system block size to less than the device's
minphys should let you create a filesystem on that device.

(Creating/using a filesystem with block size greater the device
minphys seems like a Bad Idea for performance and possibly correctness
reasons.  It's like the disk I/O version of IP fragmentation...)


To go back to a previous email, you wrote:
 But what is really strange is, why does openbsd then have
 an infrastructure to set different max transfer sizes for physio on a
 per-adapter basis? This makes no sense. Either the drivers have to support
 64k transfers, in which case most of the minphys infrastructure is
 useless, or they don't have to. In the latter case the minphys
 infrastructure would need to be used in all code paths.

The answer to this question can be found by reading cvs logs and
diffs: MAXBSIZE used to be strictly less than MAXPHYS and less than or
equal to the smallest minphys of a device supported on the given
architecture.  MAXBSIZE was MD then, and also apparently reflected
limitations in the pmap; i386 pmap bugs delayed it from going to 64k
MAXBSIZE until after all the others did.

Why bother with permitting character device access in chunks greater
than MAXBSIZE?  Probably because some tools (newfs, dump, fsck) had
significant performance gains by doing I/O in the largest chunk
supported by the device, irrespective of the limitations of the pmap,
buffer cache, or filesystems.  Oh, and tapes really wanted larger I/Os
so that they could stream, IIRC.


IMO, the problem that you're hitting with your vioblk device isn't a
problem with MAXPHYS, physio(), or minphys, but rather with MAXBSIZE.
Back in the 1990's, the fact that you want to be able to configure a
device with minphys 20kB on your arch means that the MD MAXBSIZE
define for your arch should be 20kB.

If MAXBSIZE could be set/limited on a per-device basis (i.e., the code
using it queried the device) then your problem would be resolved, no?


Philip Guenther



Re: minphys woes

2014-09-02 Thread Stefan Fritsch
On Monday 01 September 2014 22:24:16, David Gwynne wrote:
  i haven't found a controller that does less than MAXPHYS.
  perhaps they meant to improve the situation but stopped short.
 
 if we wanted to raise MAXPHYS, we'd have to support older
 controllers that cant do greater than 64k with some mechanism.

So the summary of this thread is that

- drivers have to support 64K transfers
- the minphys mechanism is useless at the moment

Not quite what I hoped, but ok. Do you know a good place where this 
should be documented?



Re: minphys woes

2014-09-02 Thread Philip Guenther
On Tue, Sep 2, 2014 at 12:51 PM, Stefan Fritsch s...@sfritsch.de wrote:

 On Monday 01 September 2014 22:24:16, David Gwynne wrote:
   i haven't found a controller that does less than MAXPHYS.
   perhaps they meant to improve the situation but stopped short.
 
  if we wanted to raise MAXPHYS, we'd have to support older
  controllers that cant do greater than 64k with some mechanism.

 So the summary of this thread is that

 - drivers have to support 64K transfers
 - the minphys mechanism is useless at the moment

Useless?  It works just fine for physio, which is what it was designed for, no?



 Not quite what I hoped, but ok. Do you know a good place where this
 should be documented?

minphys is currently documented on physio(9)


Philip Guenther



Re: minphys woes

2014-09-02 Thread Stefan Fritsch
On Tuesday 02 September 2014 13:15:19, Philip Guenther wrote:
 On Tue, Sep 2, 2014 at 12:51 PM, Stefan Fritsch s...@sfritsch.de 
wrote:
  On Monday 01 September 2014 22:24:16, David Gwynne wrote:
i haven't found a controller that does less than MAXPHYS.
perhaps they meant to improve the situation but stopped short.
   
   if we wanted to raise MAXPHYS, we'd have to support older
   controllers that cant do greater than 64k with some mechanism.
  
  So the summary of this thread is that
  
  - drivers have to support 64K transfers
  - the minphys mechanism is useless at the moment
 
 Useless?  It works just fine for physio, which is what it was
 designed for, no?
  Not quite what I hoped, but ok. Do you know a good place where
  this
  should be documented?
 
 minphys is currently documented on physio(9)


From physio(9):

 minphys
 A device specific routine called to determine the maximum
 transfer size that the device's strategy routine can
 handle.

Since we have seen that the driver must be able to handle 64k blocks 
in any case, the fact that minphys is device specific is useless, 
isn't it?



Re: minphys woes

2014-09-02 Thread Philip Guenther
On Tue, Sep 2, 2014 at 2:07 PM, Stefan Fritsch s...@sfritsch.de wrote:
 On Tuesday 02 September 2014 13:15:19, Philip Guenther wrote:
 On Tue, Sep 2, 2014 at 12:51 PM, Stefan Fritsch s...@sfritsch.de
 wrote:
  On Monday 01 September 2014 22:24:16, David Gwynne wrote:
i haven't found a controller that does less than MAXPHYS.
perhaps they meant to improve the situation but stopped short.
  
   if we wanted to raise MAXPHYS, we'd have to support older
   controllers that cant do greater than 64k with some mechanism.
 
  So the summary of this thread is that
 
  - drivers have to support 64K transfers
  - the minphys mechanism is useless at the moment

 Useless?  It works just fine for physio, which is what it was
 designed for, no?
  Not quite what I hoped, but ok. Do you know a good place where
  this
  should be documented?

 minphys is currently documented on physio(9)

 From physio(9):

  minphys
  A device specific routine called to determine the maximum
  transfer size that the device's strategy routine can
  handle.

 Since we have seen that the driver must be able to handle 64k blocks
 in any case, the fact that minphys is device specific is useless,
 isn't it?

physio() is used by character device access.  Looks to me like
sdminphys() will change the chunking behavior of this:
dd if=/dev/zero of=/dev/rsd0a bs=100M count=1

depending on whether sd0 is a SCSI-I device, no?


Philip Guenther



Re: minphys woes

2014-09-01 Thread Mike Belopuhov
On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote:
 On Fri, 29 Aug 2014, Mike Belopuhov wrote:
 correct me if i'm wrong, but what happens is that bread being a block
 read reads up to MAXBSIZE which is conveniently set to 64k and you can't
 create a filesystem with a larger block size.

 physio (raw device io) however doesn't go through bread and need to know
 how split the provided buffer in separate transactions hence minphys.

 Yes, that seems to be what happens. But if every adapter needs to support
 transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the
 adapter to be able to override the default minphys function with its own.
 And adapters that only support smaller transfers would need to have logic
 in their driver to be able to split the transfer into smaller chunks.


i believe that if you start digging you realise that (at least at some point)
the least common denominator is (or was) 64k meaning that even the shittiest
controller on vax can do 64k.  which means that we don't have code for a
filesystem or buffercache to probe the controller for a supported transfer
size.

 I think it makes more sense to have that logic in one place to be used by
 all drivers.

perhaps, but unless the filesystem will issue breads of larger blocks the
only benefit would be physio itself which doesn't really justify the change.



Re: minphys woes

2014-09-01 Thread Stefan Fritsch
On Mon, 1 Sep 2014, Mike Belopuhov wrote:

 On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote:
  Yes, that seems to be what happens. But if every adapter needs to support
  transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the
  adapter to be able to override the default minphys function with its own.
  And adapters that only support smaller transfers would need to have logic
  in their driver to be able to split the transfer into smaller chunks.
 
 
 i believe that if you start digging you realise that (at least at some point)
 the least common denominator is (or was) 64k meaning that even the shittiest
 controller on vax can do 64k.  which means that we don't have code for a
 filesystem or buffercache to probe the controller for a supported transfer
 size.

That's possible. But what is really strange is, why does openbsd then have 
an infrastructure to set different max transfer sizes for physio on a 
per-adapter basis? This makes no sense. Either the drivers have to support 
64k transfers, in which case most of the minphys infrastructure is 
useless, or they don't have to. In the latter case the minphys 
infrastructure would need to be used in all code paths.

  I think it makes more sense to have that logic in one place to be used by
  all drivers.
 
 perhaps, but unless the filesystem will issue breads of larger blocks the
 only benefit would be physio itself which doesn't really justify the change.

You lost me there. Why should this depend on how many requests some 
filesystem makes with larger blocks? If there is the possibility that 
filesystems do such requests, someone needs to do the splitting of the 
requests. The question is who. The driver or the block layer.


BTW, for my use case I don't actually want to limit the block size, but 
rather the number of DMA segments. But the most reasonable way to do that 
seemed to set minphys to max_segments * pagesize. If we change how these 
things work, we could take the number of DMA segments into account.



Re: minphys woes

2014-09-01 Thread Stefan Fritsch
On Mon, 1 Sep 2014, David Gwynne wrote:
  Yes, that seems to be what happens. But if every adapter needs to support 
  transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the 
  adapter to be able to override the default minphys function with its own. 
  And adapters that only support smaller transfers would need to have logic 
  in their driver to be able to split the transfer into smaller chunks.
  
  I think it makes more sense to have that logic in one place to be used by 
  all drivers.
 
 this has blown my mind. how do the stupid ata cf card things that only 
 support single sector IO work?

It seems _wdc_ata_bio_start in ata_wdc.c does the splitting in this case.



Re: minphys woes

2014-09-01 Thread Mike Belopuhov
On 1 September 2014 13:06, Stefan Fritsch s...@sfritsch.de wrote:
 On Mon, 1 Sep 2014, Mike Belopuhov wrote:

 On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote:
  Yes, that seems to be what happens. But if every adapter needs to support
  transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the
  adapter to be able to override the default minphys function with its own.
  And adapters that only support smaller transfers would need to have logic
  in their driver to be able to split the transfer into smaller chunks.
 

 i believe that if you start digging you realise that (at least at some point)
 the least common denominator is (or was) 64k meaning that even the shittiest
 controller on vax can do 64k.  which means that we don't have code for a
 filesystem or buffercache to probe the controller for a supported transfer
 size.

 That's possible. But what is really strange is, why does openbsd then have
 an infrastructure to set different max transfer sizes for physio on a
 per-adapter basis? This makes no sense. Either the drivers have to support
 64k transfers, in which case most of the minphys infrastructure is
 useless, or they don't have to. In the latter case the minphys
 infrastructure would need to be used in all code paths.


i haven't found a controller that does less than MAXPHYS.
perhaps they meant to improve the situation but stopped short.

  I think it makes more sense to have that logic in one place to be used by
  all drivers.

 perhaps, but unless the filesystem will issue breads of larger blocks the
 only benefit would be physio itself which doesn't really justify the change.

 You lost me there. Why should this depend on how many requests some
 filesystem makes with larger blocks? If there is the possibility that
 filesystems do such requests, someone needs to do the splitting of the
 requests. The question is who. The driver or the block layer.



well, the filesystem doesn't issue requests for more than 64k at a time.
newfs won't allow you to build a 128k block file filesystem.

 BTW, for my use case I don't actually want to limit the block size, but
 rather the number of DMA segments. But the most reasonable way to do that
 seemed to set minphys to max_segments * pagesize. If we change how these
 things work, we could take the number of DMA segments into account.

can't help you with this one.



Re: minphys woes

2014-09-01 Thread David Gwynne

On 1 Sep 2014, at 9:22 pm, Mike Belopuhov m...@belopuhov.com wrote:

 On 1 September 2014 13:06, Stefan Fritsch s...@sfritsch.de wrote:
 On Mon, 1 Sep 2014, Mike Belopuhov wrote:
 
 On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote:
 Yes, that seems to be what happens. But if every adapter needs to support
 transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the
 adapter to be able to override the default minphys function with its own.
 And adapters that only support smaller transfers would need to have logic
 in their driver to be able to split the transfer into smaller chunks.
 
 
 i believe that if you start digging you realise that (at least at some 
 point)
 the least common denominator is (or was) 64k meaning that even the shittiest
 controller on vax can do 64k.  which means that we don't have code for a
 filesystem or buffercache to probe the controller for a supported transfer
 size.
 
 That's possible. But what is really strange is, why does openbsd then have
 an infrastructure to set different max transfer sizes for physio on a
 per-adapter basis? This makes no sense. Either the drivers have to support
 64k transfers, in which case most of the minphys infrastructure is
 useless, or they don't have to. In the latter case the minphys
 infrastructure would need to be used in all code paths.
 
 
 i haven't found a controller that does less than MAXPHYS.
 perhaps they meant to improve the situation but stopped short.

if we wanted to raise MAXPHYS, we'd have to support older controllers that cant 
do greater than 64k with some mechanism.

 
 I think it makes more sense to have that logic in one place to be used by
 all drivers.
 
 perhaps, but unless the filesystem will issue breads of larger blocks the
 only benefit would be physio itself which doesn't really justify the change.
 
 You lost me there. Why should this depend on how many requests some
 filesystem makes with larger blocks? If there is the possibility that
 filesystems do such requests, someone needs to do the splitting of the
 requests. The question is who. The driver or the block layer.
 
 
 
 well, the filesystem doesn't issue requests for more than 64k at a time.
 newfs won't allow you to build a 128k block file filesystem.
 
 BTW, for my use case I don't actually want to limit the block size, but
 rather the number of DMA segments. But the most reasonable way to do that
 seemed to set minphys to max_segments * pagesize. If we change how these
 things work, we could take the number of DMA segments into account.
 
 can't help you with this one.




Re: minphys woes

2014-08-31 Thread David Gwynne

On 30 Aug 2014, at 6:39, Stefan Fritsch s...@sfritsch.de wrote:

 On Fri, 29 Aug 2014, Mike Belopuhov wrote:
 
 On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote:
 On Fri, 29 Aug 2014, Miod Vallat wrote:
 sc-sc_xfer_max is computed according to the host's capabilities. What I
 want to simulate with this diff is a host adapter that can only cope with
 transfers  64k == MAXPHYS.
 
 Back to your original problem, you might want to print the sc_link
 struct as well the scsi_adapter struct it points to, when you detect a
 transfer larger than MAXPHYS. It has likely been overriden or reset to
 NULL by mistake at some point.
 
 OK, I will try that. Will take a bit until I have time, though.
 
 That's not the problem. I have added an additional
 
 KASSERT(xs-sc_link-adapter-scsi_minphys == vioblk_minphys)
 
 before the other KASSERT and it did not trigger.
 
 But I have also read the code and I could not find a place in the path
 from bread() to the scsi adapter cmd function where the minphys function
 is called.
 
 
 correct me if i'm wrong, but what happens is that bread being a block
 read reads up to MAXBSIZE which is conveniently set to 64k and you can't
 create a filesystem with a larger block size.
 
 physio (raw device io) however doesn't go through bread and need to know
 how split the provided buffer in separate transactions hence minphys.
 
 Yes, that seems to be what happens. But if every adapter needs to support 
 transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the 
 adapter to be able to override the default minphys function with its own. 
 And adapters that only support smaller transfers would need to have logic 
 in their driver to be able to split the transfer into smaller chunks.
 
 I think it makes more sense to have that logic in one place to be used by 
 all drivers.

this has blown my mind. how do the stupid ata cf card things that only support 
single sector IO work?

dlg



Re: minphys woes

2014-08-29 Thread David Gwynne
is it something msdos isnt doing that ffs does?

On 29 Aug 2014, at 15:55, Stefan Fritsch s...@sfritsch.de wrote:

 On Friday 29 August 2014 10:34:20, David Gwynne wrote:
 are you expecting minphys to appear in that backtrace?
 
 
 No. In that trace minphys should have limited the transfer size and I 
 had put an KASSERT() into vioblk_scsi_cmd() which verified that xs-
 datalen is not larger than that limit. A call to bread() causes the 
 KASSERT to trigger.



Re: minphys woes

2014-08-29 Thread Stefan Fritsch
No, also happens on ffs. If the root FS is on an affected device, it looks 
like this:


root on sd0a (9e7002f8647a2403.a) swap on sd0b dump on sd0b
clock: unknown CMOS layout
panic: kernel diagnostic assertion xs-datalen = MINPHYS failed: file 
../../../../dev/pci/vioblk.c, line 405
Stopped at  Debugger+0x9:   leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS 
PANIC!
IF RUNNING SMP, USE 'mach ddbcpu #' AND 'trace' ON OTHER PROCESSORS, 
TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0} trace  
Debugger() at Debugger+0x9
panic() at panic+0xfe
__assert() at __assert+0x25
vioblk_scsi_cmd() at vioblk_scsi_cmd+0x565
scsi_xs_exec() at scsi_xs_exec+0x35
sdstart() at sdstart+0x16e
scsi_iopool_run() at scsi_iopool_run+0x5d
scsi_xsh_runqueue() at scsi_xsh_runqueue+0x13d
scsi_xsh_add() at scsi_xsh_add+0x98
sdstrategy() at sdstrategy+0x102
spec_strategy() at spec_strategy+0x53
ufs_strategy() at ufs_strategy+0x82
VOP_STRATEGY() at VOP_STRATEGY+0x3b
bread_cluster() at bread_cluster+0x2a4
ffs_read() at ffs_read+0x250
VOP_READ() at VOP_READ+0x3f
uvn_io() at uvn_io+0x2e2
uvn_get() at uvn_get+0x1f7
uvm_fault() at uvm_fault+0xdf5
trap() at trap+0x62f
--- trap (number 6) ---
end of kernel
end trace frame: 0x7f7dd0c0, count: -20
acpi_pdirpa+0x43ae00:




Reproducer patch is below. I am sure it can be reproduced with any 
other disk driver, but with ahci a reproducer would be a bit more 
complicated because of the atascsi layer.

Maybe VOP_STRATEGY() should do a similar minphys+loop thing as physio() 
does?



--- a/sys/dev/pci/vioblk.c
+++ b/sys/dev/pci/vioblk.c
@@ -63,6 +63,7 @@
 #include scsi/scsi_disk.h
 #include scsi/scsiconf.h
 
+#define MINPHYS(20*1024)
 #define VIOBLK_DONE-1
 
 struct virtio_feature_name vioblk_feature_names[] = {
@@ -157,6 +158,8 @@ vioblk_minphys(struct buf *bp, struct scsi_link *sl)
struct vioblk_softc *sc = sl-adapter_softc;
if (bp-b_bcount  sc-sc_xfer_max)
bp-b_bcount = sc-sc_xfer_max;
+   if (bp-b_bcount  MINPHYS)
+   bp-b_bcount = MINPHYS;
 }
 
 void
@@ -399,6 +402,8 @@ vioblk_scsi_cmd(struct scsi_xfer *xs)
int timeout;
int slot, ret, nsegs;
 
+   KASSERT(xs-datalen = MINPHYS);
+
s = splbio();
ret = virtio_enqueue_prep(vq, slot);
if (ret) {






On Fri, 29 Aug 2014, David Gwynne wrote:

 is it something msdos isnt doing that ffs does?
 
 On 29 Aug 2014, at 15:55, Stefan Fritsch s...@sfritsch.de wrote:
 
  On Friday 29 August 2014 10:34:20, David Gwynne wrote:
  are you expecting minphys to appear in that backtrace?
  
  
  No. In that trace minphys should have limited the transfer size and I 
  had put an KASSERT() into vioblk_scsi_cmd() which verified that xs-
  datalen is not larger than that limit. A call to bread() causes the 
  KASSERT to trigger.
 
 



Re: minphys woes

2014-08-29 Thread Miod Vallat
 +++ b/sys/dev/pci/vioblk.c
 @@ -63,6 +63,7 @@
  #include scsi/scsi_disk.h
  #include scsi/scsiconf.h
  
 +#define MINPHYS  (20*1024)

This makes no sense. If you want to override MINPHYS, you need to
override for the whole kernel, not only here.

 @@ -157,6 +158,8 @@ vioblk_minphys(struct buf *bp, struct scsi_link *sl)
   struct vioblk_softc *sc = sl-adapter_softc;
   if (bp-b_bcount  sc-sc_xfer_max)
   bp-b_bcount = sc-sc_xfer_max;
 + if (bp-b_bcount  MINPHYS)
 + bp-b_bcount = MINPHYS;

This is useless. sc-sc_xfer_max is computed so that it will always be
= MINPHYS.



Re: minphys woes

2014-08-29 Thread Stefan Fritsch
On Fri, 29 Aug 2014, Miod Vallat wrote:

  +++ b/sys/dev/pci/vioblk.c
  @@ -63,6 +63,7 @@
   #include scsi/scsi_disk.h
   #include scsi/scsiconf.h
   
  +#define MINPHYS(20*1024)
 
 This makes no sense. If you want to override MINPHYS, you need to
 override for the whole kernel, not only here.

MINPHYS is just a name I have chosen. It is not used anywhere else in the 
kernel. You are confusing this with MAXPHYS.

 
  @@ -157,6 +158,8 @@ vioblk_minphys(struct buf *bp, struct scsi_link *sl)
  struct vioblk_softc *sc = sl-adapter_softc;
  if (bp-b_bcount  sc-sc_xfer_max)
  bp-b_bcount = sc-sc_xfer_max;
  +   if (bp-b_bcount  MINPHYS)
  +   bp-b_bcount = MINPHYS;
 
 This is useless. sc-sc_xfer_max is computed so that it will always be
 = MINPHYS.

sc-sc_xfer_max is computed according to the host's capabilities. What I 
want to simulate with this diff is a host adapter that can only cope with 
transfers  64k == MAXPHYS.



Re: minphys woes

2014-08-29 Thread Miod Vallat
   +#define MINPHYS  (20*1024)
  
  This makes no sense. If you want to override MINPHYS, you need to
  override for the whole kernel, not only here.
 
 MINPHYS is just a name I have chosen. It is not used anywhere else in the 
 kernel. You are confusing this with MAXPHYS.

Oops, you're right. Sorry about the confusion.

 sc-sc_xfer_max is computed according to the host's capabilities. What I 
 want to simulate with this diff is a host adapter that can only cope with 
 transfers  64k == MAXPHYS.

Back to your original problem, you might want to print the sc_link
struct as well the scsi_adapter struct it points to, when you detect a
transfer larger than MAXPHYS. It has likely been overriden or reset to
NULL by mistake at some point.



Re: minphys woes

2014-08-29 Thread Stefan Fritsch
On Fri, 29 Aug 2014, Miod Vallat wrote:
  sc-sc_xfer_max is computed according to the host's capabilities. What I 
  want to simulate with this diff is a host adapter that can only cope with 
  transfers  64k == MAXPHYS.
 
 Back to your original problem, you might want to print the sc_link
 struct as well the scsi_adapter struct it points to, when you detect a
 transfer larger than MAXPHYS. It has likely been overriden or reset to
 NULL by mistake at some point.

OK, I will try that. Will take a bit until I have time, though.

But I have also read the code and I could not find a place in the path 
from bread() to the scsi adapter cmd function where the minphys function 
is called.



Re: minphys woes

2014-08-29 Thread Mike Belopuhov
On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote:
 On Fri, 29 Aug 2014, Miod Vallat wrote:
  sc-sc_xfer_max is computed according to the host's capabilities. What I
  want to simulate with this diff is a host adapter that can only cope with
  transfers  64k == MAXPHYS.

 Back to your original problem, you might want to print the sc_link
 struct as well the scsi_adapter struct it points to, when you detect a
 transfer larger than MAXPHYS. It has likely been overriden or reset to
 NULL by mistake at some point.

 OK, I will try that. Will take a bit until I have time, though.

 But I have also read the code and I could not find a place in the path
 from bread() to the scsi adapter cmd function where the minphys function
 is called.


correct me if i'm wrong, but what happens is that bread being a block
read reads up to MAXBSIZE which is conveniently set to 64k and you can't
create a filesystem with a larger block size.

physio (raw device io) however doesn't go through bread and need to know
how split the provided buffer in separate transactions hence minphys.



Re: minphys woes

2014-08-29 Thread Stefan Fritsch
On Fri, 29 Aug 2014, Mike Belopuhov wrote:

 On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote:
  On Fri, 29 Aug 2014, Miod Vallat wrote:
   sc-sc_xfer_max is computed according to the host's capabilities. What I
   want to simulate with this diff is a host adapter that can only cope with
   transfers  64k == MAXPHYS.
 
  Back to your original problem, you might want to print the sc_link
  struct as well the scsi_adapter struct it points to, when you detect a
  transfer larger than MAXPHYS. It has likely been overriden or reset to
  NULL by mistake at some point.
 
  OK, I will try that. Will take a bit until I have time, though.

That's not the problem. I have added an additional

 KASSERT(xs-sc_link-adapter-scsi_minphys == vioblk_minphys)

before the other KASSERT and it did not trigger.

  But I have also read the code and I could not find a place in the path
  from bread() to the scsi adapter cmd function where the minphys function
  is called.
 
 
 correct me if i'm wrong, but what happens is that bread being a block
 read reads up to MAXBSIZE which is conveniently set to 64k and you can't
 create a filesystem with a larger block size.
 
 physio (raw device io) however doesn't go through bread and need to know
 how split the provided buffer in separate transactions hence minphys.

Yes, that seems to be what happens. But if every adapter needs to support 
transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the 
adapter to be able to override the default minphys function with its own. 
And adapters that only support smaller transfers would need to have logic 
in their driver to be able to split the transfer into smaller chunks.

I think it makes more sense to have that logic in one place to be used by 
all drivers.



minphys woes

2014-08-28 Thread Stefan Fritsch
Hi,

while debugging some problems on a virtio host that supports only few dma 
segments per request, I noticed that the minphys function is not called in 
all code paths. It is only used for physio but not for other (cached) disk 
operations. An example trace is below.

Am I missing something or is this as broken as I think it is? I thought 
the minphys function is a way for a driver to say that it does not support 
larger transfers and that transfers are split accordingly. What's the 
purpose of the minphys mechanism if it does not affect all code paths?

Cheers,
Stefan


panic() at 0x1027bcf5 panic+0x1a5/0x1061006f0
vioblk_scsi_cmd() at 0x103ecccf vioblk_scsi_cmd+0x8ff/0x106100770
sdstart() at 0x103bc4d6 sdstart+0x176/0x1061007b0
scsi_ioh_runqueue() at 0x103b3c05 scsi_ioh_runqueue+0x55/0x1061007e0
scsi_xsh_runqueue() at 0x103b3d72 scsi_xsh_runqueue+0x132/0x106100800
sdstrategy() at 0x103bc6bf sdstrategy+0x10f/0x106100830
spec_strategy() at 0x102a8b3b spec_strategy+0x5b/0x106100850
VOP_STRATEGY() at 0x102a75be VOP_STRATEGY+0x2e/0x106100870
bio_doread() at 0x102991fc bio_doread+0xbc/0x1061008a0
bread() at 0x102992c3 bread+0x13/0x1061008c0
fillinusemap() at 0x102ad6d9 fillinusemap+0xb9/0x106100920
msdosfs_mountfs() at 0x102af78d msdosfs_mountfs+0x59d/0x106100990
msdosfs_mount() at 0x102afcf6 msdosfs_mount+0x3e6/0x106100b70
sys_mount() at 0x102a6a3c sys_mount+0x36c/0x106100d90



Re: minphys woes

2014-08-28 Thread David Gwynne
are you expecting minphys to appear in that backtrace?

On 29 Aug 2014, at 5:07, Stefan Fritsch s...@sfritsch.de wrote:

 Hi,
 
 while debugging some problems on a virtio host that supports only few dma 
 segments per request, I noticed that the minphys function is not called in 
 all code paths. It is only used for physio but not for other (cached) disk 
 operations. An example trace is below.
 
 Am I missing something or is this as broken as I think it is? I thought 
 the minphys function is a way for a driver to say that it does not support 
 larger transfers and that transfers are split accordingly. What's the 
 purpose of the minphys mechanism if it does not affect all code paths?
 
 Cheers,
 Stefan
 
 
 panic() at 0x1027bcf5 panic+0x1a5/0x1061006f0
 vioblk_scsi_cmd() at 0x103ecccf vioblk_scsi_cmd+0x8ff/0x106100770
 sdstart() at 0x103bc4d6 sdstart+0x176/0x1061007b0
 scsi_ioh_runqueue() at 0x103b3c05 scsi_ioh_runqueue+0x55/0x1061007e0
 scsi_xsh_runqueue() at 0x103b3d72 scsi_xsh_runqueue+0x132/0x106100800
 sdstrategy() at 0x103bc6bf sdstrategy+0x10f/0x106100830
 spec_strategy() at 0x102a8b3b spec_strategy+0x5b/0x106100850
 VOP_STRATEGY() at 0x102a75be VOP_STRATEGY+0x2e/0x106100870
 bio_doread() at 0x102991fc bio_doread+0xbc/0x1061008a0
 bread() at 0x102992c3 bread+0x13/0x1061008c0
 fillinusemap() at 0x102ad6d9 fillinusemap+0xb9/0x106100920
 msdosfs_mountfs() at 0x102af78d msdosfs_mountfs+0x59d/0x106100990
 msdosfs_mount() at 0x102afcf6 msdosfs_mount+0x3e6/0x106100b70
 sys_mount() at 0x102a6a3c sys_mount+0x36c/0x106100d90
 



Re: minphys woes

2014-08-28 Thread Stefan Fritsch
On Friday 29 August 2014 10:34:20, David Gwynne wrote:
 are you expecting minphys to appear in that backtrace?


No. In that trace minphys should have limited the transfer size and I 
had put an KASSERT() into vioblk_scsi_cmd() which verified that xs-
datalen is not larger than that limit. A call to bread() causes the 
KASSERT to trigger.