Re: minphys woes
On Wednesday 03 September 2014 15:23:13, Philip Guenther wrote: IMO, the problem that you're hitting with your vioblk device isn't a problem with MAXPHYS, physio(), or minphys, but rather with MAXBSIZE. Back in the 1990's, the fact that you want to be able to configure a device with minphys 20kB on your arch means that the MD MAXBSIZE define for your arch should be 20kB. If MAXBSIZE could be set/limited on a per-device basis (i.e., the code using it queried the device) then your problem would be resolved, no? Thank you very much for this write up. It was very enlightening and I agree that the problem I encountered is that MAXBSIZE is not a per- device value. Another thing this has made clear to me is that I don't want to deal with openbsd's lower level block layer. I will just close the lid and hope the smell goes away. Cheers, Stefan
Re: minphys woes
On Tuesday 02 September 2014 15:22:16, Philip Guenther wrote: From physio(9): minphys A device specific routine called to determine the maximum transfer size that the device's strategy routine can handle. Since we have seen that the driver must be able to handle 64k blocks in any case, the fact that minphys is device specific is useless, isn't it? physio() is used by character device access. Looks to me like sdminphys() will change the chunking behavior of this: dd if=/dev/zero of=/dev/rsd0a bs=100M count=1 depending on whether sd0 is a SCSI-I device, no? Yes, but that does not make it any less useless. File systems will call the very same strategy routine and expect it to deal with 64K blocks. The statement in the man page gives the misleading impression that the strategy routine could be used to avoid that the strategy routine has to handle 64K blocks, which is not true. So, maybe the minphys mechanism is useful for block devices that are not used for file systems. Are there any such devices? Maybe we could add a sentence to the man page like following? Beware that a block device's strategy routine that is used for file systems must always be able to accept transfers sizes up to MAXPHYS.
Re: minphys woes
On Wed, Sep 03, 2014 at 07:51:25PM +0200, Stefan Fritsch wrote: On Tuesday 02 September 2014 15:22:16, Philip Guenther wrote: From physio(9): minphys A device specific routine called to determine the maximum transfer size that the device's strategy routine can handle. Since we have seen that the driver must be able to handle 64k blocks in any case, the fact that minphys is device specific is useless, isn't it? physio() is used by character device access. Looks to me like sdminphys() will change the chunking behavior of this: dd if=/dev/zero of=/dev/rsd0a bs=100M count=1 depending on whether sd0 is a SCSI-I device, no? Yes, but that does not make it any less useless. File systems will call the very same strategy routine and expect it to deal with 64K blocks. The statement in the man page gives the misleading impression that the strategy routine could be used to avoid that the strategy routine has to handle 64K blocks, which is not true. So, maybe the minphys mechanism is useful for block devices that are not used for file systems. Are there any such devices? Maybe we could add a sentence to the man page like following? Beware that a block device's strategy routine that is used for file systems must always be able to accept transfers sizes up to MAXPHYS. Maybe it is time to fix the block side and make sure that it respects minphys as well by chopping up the IO. This would also allow to buffer bigger blocks than MAXPHYS in the buffer cache (not sure if this is something we want though). -- :wq Claudio
Re: minphys woes
On Wed, Sep 3, 2014 at 10:51 AM, Stefan Fritsch s...@sfritsch.de wrote: On Tuesday 02 September 2014 15:22:16, Philip Guenther wrote: From physio(9): minphys A device specific routine called to determine the maximum transfer size that the device's strategy routine can handle. Since we have seen that the driver must be able to handle 64k blocks in any case, the fact that minphys is device specific is useless, isn't it? physio() is used by character device access. Looks to me like sdminphys() will change the chunking behavior of this: dd if=/dev/zero of=/dev/rsd0a bs=100M count=1 depending on whether sd0 is a SCSI-I device, no? Yes, but that does not make it any less useless. File systems will call the very same strategy routine and expect it to deal with 64K blocks. The statement in the man page gives the misleading impression that the [minphys] routine could be used to avoid that the strategy routine has to handle 64K blocks, which is not true. We don't seem to have a manpage describing the routines that a block device must implement and their requirements, nor the same for character devices. Those would be nice additions to our section 9 manpages. This requirement on a block device's strategy routine comes from the filesystem layer. These *do not use physio()*, so how does it make sense to document that requirement in physio(9)? That manpage doesn't even contain the word block! So, maybe the minphys mechanism is useful for block devices that are not used for file systems. Are there any such devices? The current minphys mechanism *should* be useful for devices which do not store file systems with a block size greater than the maximum I/O size of the device. IMO, if a filesystem is configured with block size less than the device's minphys, then it should not be doing breads of more than that block size. If that's the case, then manually setting the file system block size to less than the device's minphys should let you create a filesystem on that device. (Creating/using a filesystem with block size greater the device minphys seems like a Bad Idea for performance and possibly correctness reasons. It's like the disk I/O version of IP fragmentation...) To go back to a previous email, you wrote: But what is really strange is, why does openbsd then have an infrastructure to set different max transfer sizes for physio on a per-adapter basis? This makes no sense. Either the drivers have to support 64k transfers, in which case most of the minphys infrastructure is useless, or they don't have to. In the latter case the minphys infrastructure would need to be used in all code paths. The answer to this question can be found by reading cvs logs and diffs: MAXBSIZE used to be strictly less than MAXPHYS and less than or equal to the smallest minphys of a device supported on the given architecture. MAXBSIZE was MD then, and also apparently reflected limitations in the pmap; i386 pmap bugs delayed it from going to 64k MAXBSIZE until after all the others did. Why bother with permitting character device access in chunks greater than MAXBSIZE? Probably because some tools (newfs, dump, fsck) had significant performance gains by doing I/O in the largest chunk supported by the device, irrespective of the limitations of the pmap, buffer cache, or filesystems. Oh, and tapes really wanted larger I/Os so that they could stream, IIRC. IMO, the problem that you're hitting with your vioblk device isn't a problem with MAXPHYS, physio(), or minphys, but rather with MAXBSIZE. Back in the 1990's, the fact that you want to be able to configure a device with minphys 20kB on your arch means that the MD MAXBSIZE define for your arch should be 20kB. If MAXBSIZE could be set/limited on a per-device basis (i.e., the code using it queried the device) then your problem would be resolved, no? Philip Guenther
Re: minphys woes
On Monday 01 September 2014 22:24:16, David Gwynne wrote: i haven't found a controller that does less than MAXPHYS. perhaps they meant to improve the situation but stopped short. if we wanted to raise MAXPHYS, we'd have to support older controllers that cant do greater than 64k with some mechanism. So the summary of this thread is that - drivers have to support 64K transfers - the minphys mechanism is useless at the moment Not quite what I hoped, but ok. Do you know a good place where this should be documented?
Re: minphys woes
On Tue, Sep 2, 2014 at 12:51 PM, Stefan Fritsch s...@sfritsch.de wrote: On Monday 01 September 2014 22:24:16, David Gwynne wrote: i haven't found a controller that does less than MAXPHYS. perhaps they meant to improve the situation but stopped short. if we wanted to raise MAXPHYS, we'd have to support older controllers that cant do greater than 64k with some mechanism. So the summary of this thread is that - drivers have to support 64K transfers - the minphys mechanism is useless at the moment Useless? It works just fine for physio, which is what it was designed for, no? Not quite what I hoped, but ok. Do you know a good place where this should be documented? minphys is currently documented on physio(9) Philip Guenther
Re: minphys woes
On Tuesday 02 September 2014 13:15:19, Philip Guenther wrote: On Tue, Sep 2, 2014 at 12:51 PM, Stefan Fritsch s...@sfritsch.de wrote: On Monday 01 September 2014 22:24:16, David Gwynne wrote: i haven't found a controller that does less than MAXPHYS. perhaps they meant to improve the situation but stopped short. if we wanted to raise MAXPHYS, we'd have to support older controllers that cant do greater than 64k with some mechanism. So the summary of this thread is that - drivers have to support 64K transfers - the minphys mechanism is useless at the moment Useless? It works just fine for physio, which is what it was designed for, no? Not quite what I hoped, but ok. Do you know a good place where this should be documented? minphys is currently documented on physio(9) From physio(9): minphys A device specific routine called to determine the maximum transfer size that the device's strategy routine can handle. Since we have seen that the driver must be able to handle 64k blocks in any case, the fact that minphys is device specific is useless, isn't it?
Re: minphys woes
On Tue, Sep 2, 2014 at 2:07 PM, Stefan Fritsch s...@sfritsch.de wrote: On Tuesday 02 September 2014 13:15:19, Philip Guenther wrote: On Tue, Sep 2, 2014 at 12:51 PM, Stefan Fritsch s...@sfritsch.de wrote: On Monday 01 September 2014 22:24:16, David Gwynne wrote: i haven't found a controller that does less than MAXPHYS. perhaps they meant to improve the situation but stopped short. if we wanted to raise MAXPHYS, we'd have to support older controllers that cant do greater than 64k with some mechanism. So the summary of this thread is that - drivers have to support 64K transfers - the minphys mechanism is useless at the moment Useless? It works just fine for physio, which is what it was designed for, no? Not quite what I hoped, but ok. Do you know a good place where this should be documented? minphys is currently documented on physio(9) From physio(9): minphys A device specific routine called to determine the maximum transfer size that the device's strategy routine can handle. Since we have seen that the driver must be able to handle 64k blocks in any case, the fact that minphys is device specific is useless, isn't it? physio() is used by character device access. Looks to me like sdminphys() will change the chunking behavior of this: dd if=/dev/zero of=/dev/rsd0a bs=100M count=1 depending on whether sd0 is a SCSI-I device, no? Philip Guenther
Re: minphys woes
On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Mike Belopuhov wrote: correct me if i'm wrong, but what happens is that bread being a block read reads up to MAXBSIZE which is conveniently set to 64k and you can't create a filesystem with a larger block size. physio (raw device io) however doesn't go through bread and need to know how split the provided buffer in separate transactions hence minphys. Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. i believe that if you start digging you realise that (at least at some point) the least common denominator is (or was) 64k meaning that even the shittiest controller on vax can do 64k. which means that we don't have code for a filesystem or buffercache to probe the controller for a supported transfer size. I think it makes more sense to have that logic in one place to be used by all drivers. perhaps, but unless the filesystem will issue breads of larger blocks the only benefit would be physio itself which doesn't really justify the change.
Re: minphys woes
On Mon, 1 Sep 2014, Mike Belopuhov wrote: On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote: Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. i believe that if you start digging you realise that (at least at some point) the least common denominator is (or was) 64k meaning that even the shittiest controller on vax can do 64k. which means that we don't have code for a filesystem or buffercache to probe the controller for a supported transfer size. That's possible. But what is really strange is, why does openbsd then have an infrastructure to set different max transfer sizes for physio on a per-adapter basis? This makes no sense. Either the drivers have to support 64k transfers, in which case most of the minphys infrastructure is useless, or they don't have to. In the latter case the minphys infrastructure would need to be used in all code paths. I think it makes more sense to have that logic in one place to be used by all drivers. perhaps, but unless the filesystem will issue breads of larger blocks the only benefit would be physio itself which doesn't really justify the change. You lost me there. Why should this depend on how many requests some filesystem makes with larger blocks? If there is the possibility that filesystems do such requests, someone needs to do the splitting of the requests. The question is who. The driver or the block layer. BTW, for my use case I don't actually want to limit the block size, but rather the number of DMA segments. But the most reasonable way to do that seemed to set minphys to max_segments * pagesize. If we change how these things work, we could take the number of DMA segments into account.
Re: minphys woes
On Mon, 1 Sep 2014, David Gwynne wrote: Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. I think it makes more sense to have that logic in one place to be used by all drivers. this has blown my mind. how do the stupid ata cf card things that only support single sector IO work? It seems _wdc_ata_bio_start in ata_wdc.c does the splitting in this case.
Re: minphys woes
On 1 September 2014 13:06, Stefan Fritsch s...@sfritsch.de wrote: On Mon, 1 Sep 2014, Mike Belopuhov wrote: On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote: Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. i believe that if you start digging you realise that (at least at some point) the least common denominator is (or was) 64k meaning that even the shittiest controller on vax can do 64k. which means that we don't have code for a filesystem or buffercache to probe the controller for a supported transfer size. That's possible. But what is really strange is, why does openbsd then have an infrastructure to set different max transfer sizes for physio on a per-adapter basis? This makes no sense. Either the drivers have to support 64k transfers, in which case most of the minphys infrastructure is useless, or they don't have to. In the latter case the minphys infrastructure would need to be used in all code paths. i haven't found a controller that does less than MAXPHYS. perhaps they meant to improve the situation but stopped short. I think it makes more sense to have that logic in one place to be used by all drivers. perhaps, but unless the filesystem will issue breads of larger blocks the only benefit would be physio itself which doesn't really justify the change. You lost me there. Why should this depend on how many requests some filesystem makes with larger blocks? If there is the possibility that filesystems do such requests, someone needs to do the splitting of the requests. The question is who. The driver or the block layer. well, the filesystem doesn't issue requests for more than 64k at a time. newfs won't allow you to build a 128k block file filesystem. BTW, for my use case I don't actually want to limit the block size, but rather the number of DMA segments. But the most reasonable way to do that seemed to set minphys to max_segments * pagesize. If we change how these things work, we could take the number of DMA segments into account. can't help you with this one.
Re: minphys woes
On 1 Sep 2014, at 9:22 pm, Mike Belopuhov m...@belopuhov.com wrote: On 1 September 2014 13:06, Stefan Fritsch s...@sfritsch.de wrote: On Mon, 1 Sep 2014, Mike Belopuhov wrote: On 29 August 2014 22:39, Stefan Fritsch s...@sfritsch.de wrote: Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. i believe that if you start digging you realise that (at least at some point) the least common denominator is (or was) 64k meaning that even the shittiest controller on vax can do 64k. which means that we don't have code for a filesystem or buffercache to probe the controller for a supported transfer size. That's possible. But what is really strange is, why does openbsd then have an infrastructure to set different max transfer sizes for physio on a per-adapter basis? This makes no sense. Either the drivers have to support 64k transfers, in which case most of the minphys infrastructure is useless, or they don't have to. In the latter case the minphys infrastructure would need to be used in all code paths. i haven't found a controller that does less than MAXPHYS. perhaps they meant to improve the situation but stopped short. if we wanted to raise MAXPHYS, we'd have to support older controllers that cant do greater than 64k with some mechanism. I think it makes more sense to have that logic in one place to be used by all drivers. perhaps, but unless the filesystem will issue breads of larger blocks the only benefit would be physio itself which doesn't really justify the change. You lost me there. Why should this depend on how many requests some filesystem makes with larger blocks? If there is the possibility that filesystems do such requests, someone needs to do the splitting of the requests. The question is who. The driver or the block layer. well, the filesystem doesn't issue requests for more than 64k at a time. newfs won't allow you to build a 128k block file filesystem. BTW, for my use case I don't actually want to limit the block size, but rather the number of DMA segments. But the most reasonable way to do that seemed to set minphys to max_segments * pagesize. If we change how these things work, we could take the number of DMA segments into account. can't help you with this one.
Re: minphys woes
On 30 Aug 2014, at 6:39, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Mike Belopuhov wrote: On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Miod Vallat wrote: sc-sc_xfer_max is computed according to the host's capabilities. What I want to simulate with this diff is a host adapter that can only cope with transfers 64k == MAXPHYS. Back to your original problem, you might want to print the sc_link struct as well the scsi_adapter struct it points to, when you detect a transfer larger than MAXPHYS. It has likely been overriden or reset to NULL by mistake at some point. OK, I will try that. Will take a bit until I have time, though. That's not the problem. I have added an additional KASSERT(xs-sc_link-adapter-scsi_minphys == vioblk_minphys) before the other KASSERT and it did not trigger. But I have also read the code and I could not find a place in the path from bread() to the scsi adapter cmd function where the minphys function is called. correct me if i'm wrong, but what happens is that bread being a block read reads up to MAXBSIZE which is conveniently set to 64k and you can't create a filesystem with a larger block size. physio (raw device io) however doesn't go through bread and need to know how split the provided buffer in separate transactions hence minphys. Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. I think it makes more sense to have that logic in one place to be used by all drivers. this has blown my mind. how do the stupid ata cf card things that only support single sector IO work? dlg
Re: minphys woes
is it something msdos isnt doing that ffs does? On 29 Aug 2014, at 15:55, Stefan Fritsch s...@sfritsch.de wrote: On Friday 29 August 2014 10:34:20, David Gwynne wrote: are you expecting minphys to appear in that backtrace? No. In that trace minphys should have limited the transfer size and I had put an KASSERT() into vioblk_scsi_cmd() which verified that xs- datalen is not larger than that limit. A call to bread() causes the KASSERT to trigger.
Re: minphys woes
No, also happens on ffs. If the root FS is on an affected device, it looks like this: root on sd0a (9e7002f8647a2403.a) swap on sd0b dump on sd0b clock: unknown CMOS layout panic: kernel diagnostic assertion xs-datalen = MINPHYS failed: file ../../../../dev/pci/vioblk.c, line 405 Stopped at Debugger+0x9: leave RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC! IF RUNNING SMP, USE 'mach ddbcpu #' AND 'trace' ON OTHER PROCESSORS, TOO. DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION! ddb{0} trace Debugger() at Debugger+0x9 panic() at panic+0xfe __assert() at __assert+0x25 vioblk_scsi_cmd() at vioblk_scsi_cmd+0x565 scsi_xs_exec() at scsi_xs_exec+0x35 sdstart() at sdstart+0x16e scsi_iopool_run() at scsi_iopool_run+0x5d scsi_xsh_runqueue() at scsi_xsh_runqueue+0x13d scsi_xsh_add() at scsi_xsh_add+0x98 sdstrategy() at sdstrategy+0x102 spec_strategy() at spec_strategy+0x53 ufs_strategy() at ufs_strategy+0x82 VOP_STRATEGY() at VOP_STRATEGY+0x3b bread_cluster() at bread_cluster+0x2a4 ffs_read() at ffs_read+0x250 VOP_READ() at VOP_READ+0x3f uvn_io() at uvn_io+0x2e2 uvn_get() at uvn_get+0x1f7 uvm_fault() at uvm_fault+0xdf5 trap() at trap+0x62f --- trap (number 6) --- end of kernel end trace frame: 0x7f7dd0c0, count: -20 acpi_pdirpa+0x43ae00: Reproducer patch is below. I am sure it can be reproduced with any other disk driver, but with ahci a reproducer would be a bit more complicated because of the atascsi layer. Maybe VOP_STRATEGY() should do a similar minphys+loop thing as physio() does? --- a/sys/dev/pci/vioblk.c +++ b/sys/dev/pci/vioblk.c @@ -63,6 +63,7 @@ #include scsi/scsi_disk.h #include scsi/scsiconf.h +#define MINPHYS(20*1024) #define VIOBLK_DONE-1 struct virtio_feature_name vioblk_feature_names[] = { @@ -157,6 +158,8 @@ vioblk_minphys(struct buf *bp, struct scsi_link *sl) struct vioblk_softc *sc = sl-adapter_softc; if (bp-b_bcount sc-sc_xfer_max) bp-b_bcount = sc-sc_xfer_max; + if (bp-b_bcount MINPHYS) + bp-b_bcount = MINPHYS; } void @@ -399,6 +402,8 @@ vioblk_scsi_cmd(struct scsi_xfer *xs) int timeout; int slot, ret, nsegs; + KASSERT(xs-datalen = MINPHYS); + s = splbio(); ret = virtio_enqueue_prep(vq, slot); if (ret) { On Fri, 29 Aug 2014, David Gwynne wrote: is it something msdos isnt doing that ffs does? On 29 Aug 2014, at 15:55, Stefan Fritsch s...@sfritsch.de wrote: On Friday 29 August 2014 10:34:20, David Gwynne wrote: are you expecting minphys to appear in that backtrace? No. In that trace minphys should have limited the transfer size and I had put an KASSERT() into vioblk_scsi_cmd() which verified that xs- datalen is not larger than that limit. A call to bread() causes the KASSERT to trigger.
Re: minphys woes
+++ b/sys/dev/pci/vioblk.c @@ -63,6 +63,7 @@ #include scsi/scsi_disk.h #include scsi/scsiconf.h +#define MINPHYS (20*1024) This makes no sense. If you want to override MINPHYS, you need to override for the whole kernel, not only here. @@ -157,6 +158,8 @@ vioblk_minphys(struct buf *bp, struct scsi_link *sl) struct vioblk_softc *sc = sl-adapter_softc; if (bp-b_bcount sc-sc_xfer_max) bp-b_bcount = sc-sc_xfer_max; + if (bp-b_bcount MINPHYS) + bp-b_bcount = MINPHYS; This is useless. sc-sc_xfer_max is computed so that it will always be = MINPHYS.
Re: minphys woes
On Fri, 29 Aug 2014, Miod Vallat wrote: +++ b/sys/dev/pci/vioblk.c @@ -63,6 +63,7 @@ #include scsi/scsi_disk.h #include scsi/scsiconf.h +#define MINPHYS(20*1024) This makes no sense. If you want to override MINPHYS, you need to override for the whole kernel, not only here. MINPHYS is just a name I have chosen. It is not used anywhere else in the kernel. You are confusing this with MAXPHYS. @@ -157,6 +158,8 @@ vioblk_minphys(struct buf *bp, struct scsi_link *sl) struct vioblk_softc *sc = sl-adapter_softc; if (bp-b_bcount sc-sc_xfer_max) bp-b_bcount = sc-sc_xfer_max; + if (bp-b_bcount MINPHYS) + bp-b_bcount = MINPHYS; This is useless. sc-sc_xfer_max is computed so that it will always be = MINPHYS. sc-sc_xfer_max is computed according to the host's capabilities. What I want to simulate with this diff is a host adapter that can only cope with transfers 64k == MAXPHYS.
Re: minphys woes
+#define MINPHYS (20*1024) This makes no sense. If you want to override MINPHYS, you need to override for the whole kernel, not only here. MINPHYS is just a name I have chosen. It is not used anywhere else in the kernel. You are confusing this with MAXPHYS. Oops, you're right. Sorry about the confusion. sc-sc_xfer_max is computed according to the host's capabilities. What I want to simulate with this diff is a host adapter that can only cope with transfers 64k == MAXPHYS. Back to your original problem, you might want to print the sc_link struct as well the scsi_adapter struct it points to, when you detect a transfer larger than MAXPHYS. It has likely been overriden or reset to NULL by mistake at some point.
Re: minphys woes
On Fri, 29 Aug 2014, Miod Vallat wrote: sc-sc_xfer_max is computed according to the host's capabilities. What I want to simulate with this diff is a host adapter that can only cope with transfers 64k == MAXPHYS. Back to your original problem, you might want to print the sc_link struct as well the scsi_adapter struct it points to, when you detect a transfer larger than MAXPHYS. It has likely been overriden or reset to NULL by mistake at some point. OK, I will try that. Will take a bit until I have time, though. But I have also read the code and I could not find a place in the path from bread() to the scsi adapter cmd function where the minphys function is called.
Re: minphys woes
On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Miod Vallat wrote: sc-sc_xfer_max is computed according to the host's capabilities. What I want to simulate with this diff is a host adapter that can only cope with transfers 64k == MAXPHYS. Back to your original problem, you might want to print the sc_link struct as well the scsi_adapter struct it points to, when you detect a transfer larger than MAXPHYS. It has likely been overriden or reset to NULL by mistake at some point. OK, I will try that. Will take a bit until I have time, though. But I have also read the code and I could not find a place in the path from bread() to the scsi adapter cmd function where the minphys function is called. correct me if i'm wrong, but what happens is that bread being a block read reads up to MAXBSIZE which is conveniently set to 64k and you can't create a filesystem with a larger block size. physio (raw device io) however doesn't go through bread and need to know how split the provided buffer in separate transactions hence minphys.
Re: minphys woes
On Fri, 29 Aug 2014, Mike Belopuhov wrote: On 29 August 2014 11:26, Stefan Fritsch s...@sfritsch.de wrote: On Fri, 29 Aug 2014, Miod Vallat wrote: sc-sc_xfer_max is computed according to the host's capabilities. What I want to simulate with this diff is a host adapter that can only cope with transfers 64k == MAXPHYS. Back to your original problem, you might want to print the sc_link struct as well the scsi_adapter struct it points to, when you detect a transfer larger than MAXPHYS. It has likely been overriden or reset to NULL by mistake at some point. OK, I will try that. Will take a bit until I have time, though. That's not the problem. I have added an additional KASSERT(xs-sc_link-adapter-scsi_minphys == vioblk_minphys) before the other KASSERT and it did not trigger. But I have also read the code and I could not find a place in the path from bread() to the scsi adapter cmd function where the minphys function is called. correct me if i'm wrong, but what happens is that bread being a block read reads up to MAXBSIZE which is conveniently set to 64k and you can't create a filesystem with a larger block size. physio (raw device io) however doesn't go through bread and need to know how split the provided buffer in separate transactions hence minphys. Yes, that seems to be what happens. But if every adapter needs to support transfers of MAXBSIZE == MAXPHYS anyway, there would be no need for the adapter to be able to override the default minphys function with its own. And adapters that only support smaller transfers would need to have logic in their driver to be able to split the transfer into smaller chunks. I think it makes more sense to have that logic in one place to be used by all drivers.
minphys woes
Hi, while debugging some problems on a virtio host that supports only few dma segments per request, I noticed that the minphys function is not called in all code paths. It is only used for physio but not for other (cached) disk operations. An example trace is below. Am I missing something or is this as broken as I think it is? I thought the minphys function is a way for a driver to say that it does not support larger transfers and that transfers are split accordingly. What's the purpose of the minphys mechanism if it does not affect all code paths? Cheers, Stefan panic() at 0x1027bcf5 panic+0x1a5/0x1061006f0 vioblk_scsi_cmd() at 0x103ecccf vioblk_scsi_cmd+0x8ff/0x106100770 sdstart() at 0x103bc4d6 sdstart+0x176/0x1061007b0 scsi_ioh_runqueue() at 0x103b3c05 scsi_ioh_runqueue+0x55/0x1061007e0 scsi_xsh_runqueue() at 0x103b3d72 scsi_xsh_runqueue+0x132/0x106100800 sdstrategy() at 0x103bc6bf sdstrategy+0x10f/0x106100830 spec_strategy() at 0x102a8b3b spec_strategy+0x5b/0x106100850 VOP_STRATEGY() at 0x102a75be VOP_STRATEGY+0x2e/0x106100870 bio_doread() at 0x102991fc bio_doread+0xbc/0x1061008a0 bread() at 0x102992c3 bread+0x13/0x1061008c0 fillinusemap() at 0x102ad6d9 fillinusemap+0xb9/0x106100920 msdosfs_mountfs() at 0x102af78d msdosfs_mountfs+0x59d/0x106100990 msdosfs_mount() at 0x102afcf6 msdosfs_mount+0x3e6/0x106100b70 sys_mount() at 0x102a6a3c sys_mount+0x36c/0x106100d90
Re: minphys woes
are you expecting minphys to appear in that backtrace? On 29 Aug 2014, at 5:07, Stefan Fritsch s...@sfritsch.de wrote: Hi, while debugging some problems on a virtio host that supports only few dma segments per request, I noticed that the minphys function is not called in all code paths. It is only used for physio but not for other (cached) disk operations. An example trace is below. Am I missing something or is this as broken as I think it is? I thought the minphys function is a way for a driver to say that it does not support larger transfers and that transfers are split accordingly. What's the purpose of the minphys mechanism if it does not affect all code paths? Cheers, Stefan panic() at 0x1027bcf5 panic+0x1a5/0x1061006f0 vioblk_scsi_cmd() at 0x103ecccf vioblk_scsi_cmd+0x8ff/0x106100770 sdstart() at 0x103bc4d6 sdstart+0x176/0x1061007b0 scsi_ioh_runqueue() at 0x103b3c05 scsi_ioh_runqueue+0x55/0x1061007e0 scsi_xsh_runqueue() at 0x103b3d72 scsi_xsh_runqueue+0x132/0x106100800 sdstrategy() at 0x103bc6bf sdstrategy+0x10f/0x106100830 spec_strategy() at 0x102a8b3b spec_strategy+0x5b/0x106100850 VOP_STRATEGY() at 0x102a75be VOP_STRATEGY+0x2e/0x106100870 bio_doread() at 0x102991fc bio_doread+0xbc/0x1061008a0 bread() at 0x102992c3 bread+0x13/0x1061008c0 fillinusemap() at 0x102ad6d9 fillinusemap+0xb9/0x106100920 msdosfs_mountfs() at 0x102af78d msdosfs_mountfs+0x59d/0x106100990 msdosfs_mount() at 0x102afcf6 msdosfs_mount+0x3e6/0x106100b70 sys_mount() at 0x102a6a3c sys_mount+0x36c/0x106100d90
Re: minphys woes
On Friday 29 August 2014 10:34:20, David Gwynne wrote: are you expecting minphys to appear in that backtrace? No. In that trace minphys should have limited the transfer size and I had put an KASSERT() into vioblk_scsi_cmd() which verified that xs- datalen is not larger than that limit. A call to bread() causes the KASSERT to trigger.