Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
On 20/12/18 09:26, Christoph Hellwig wrote: > On Wed, Dec 19, 2018 at 09:01:53PM -0500, Douglas Gilbert wrote: >>> 1) reduce the size of every kernel with block layer support, and >>> even more for every kernel with scsi support >> >> By proposing the removal of bidi support from the block layer, it isn't >> just the SCSI subsystem that will be impacted. Those NVMe documents >> that you referred me to earlier in the year, in the command tables >> in 1.3c and earlier you have noticed the 2 bit direction field and >> what 11b means? Even if there aren't any bidi NVMe commands *** yet, >> the fact that NVMe's 64 byte command format has provision for 4 >> (not 2) independent data transfers (data + meta, for each direction). >> Surely NVMe will sooner or later take advantage of those ... a >> command like READ GATHERED comes to mind. > > NVMe on the other hand does have support for separate read and write > buffers as in the current SCSI bidi support, as it encodes the data > transfers in that SQE. So IFF NVMe does bidi commands it would have > to use a single buffer for data in/out, There is no such thing as "buffer" there is at first a bio, and after virtual-to-iommu mapping a scatter-gather-list. All these are currently governed by a struct request. request, bio, and sgl, have a single direction, All API's expect a single direction. All BIDI did was to say. Lets not change any API or structure but just use two of them at the same time. All the wiser is the very high level user, and the very low HW driver like iscsi. All the middlewere was never touched. In the view of a bidi target like say an osd. It all stream looks like a single "Buffer" on the wire, were some of it is read and some of it is written to. > which can be easily done ?? Did you try. It will take much more than an additional pointer sir > in the block layer without the current bidi support that chains > two struct request instances for data in and data out. > That was the all trick of not changing a single API or structure Just have two of the same thing, we already know how to handle >>> 2) reduce the size of the critical struct request structure by >>> 128 bits, thus reducing the memory used by every blk-mq driver >>> significantly, never mind the cache effects >> >> Hmm, one pointer (that is null in the non-bidi case) should be enough, >> that's 64 or 32 bits. > > Due to the way we use request chaining we need two fields at the > moment. ->special and ->next_rq. No! ->special is nothing to do with bidi. ->special is a field to be used by LLD's only and are not to be touched by block layer or transports or high level users. Request has the single ->next_rq for bidi. And could be eliminated by sharing space with the elevator info. Do you want a patch? (So in effect it can be taking 0 bytes, and yes a little bit of code) > If we'd refactor the whole thing > for the basically non-existent user we could indeed probably get it > down to a single pointer. > >> While on the subject of bidi, the order of transfers: is the data-out >> (to the target) always before the data-in or is it the target device >> that decides (depending on the semantics of the command) who is first? > > The way I read SAM data needs to be transferred to the device for > processing first, then the processing occurs and then it is transferred > out, so the order seems fixed. > Not sure what is the "SAM" above. But most of the BIDI commands I know, osd and otherwise, the order is command specific, and many times it is done in parallel. Read some bits than write some bits, rinse and repeat ... (You see in scsi the all OUT buffer is part of the actual CDB, so in effect any READ is a BIDI. The novelty here is the variable sizes and the SW stack memory targets for the different operations) >> >> Doug Gilbert >> >> *** there could already be vendor specific bidi NVMe commands out >> there (ditto for SCSI) > > For NVMe they'd need to transfer data in and out in the same buffer > to sort work, and even then only if we don't happen to be bounce > buffering using swiotlb, or using a network transport. Similarly for > SCSI only iSCSI at the moment supports bidi CDBs, so we could have > applications using vendor specific bidi commands on iSCSI, which > is exactly what we're trying to find out, but it is a bit of a very > niche use case. > Again bidi works NOW. Did not yet see the big gain, of throwing it out. Jai Maa Boaz
Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
On 19/12/18 16:43, Christoph Hellwig wrote: > On Mon, Nov 26, 2018 at 07:11:10PM +0200, Boaz Harrosh wrote: >> On 11/11/18 15:32, Christoph Hellwig wrote: >>> The only real user of the T10 OSD protocol, the pNFS object layout >>> driver never went to the point of having shipping products, and we >>> removed it 1.5 years ago. Exofs is just a simple example without >>> real life users. >>> >> >> You have failed to say what is your motivation for this patchset? What >> is it you are trying to fix/improve. > > Drop basically unused support, which allows us to > > 1) reduce the size of every kernel with block layer support, and > even more for every kernel with scsi support Do you have numbers? its mainly code-segment so I don't think you will see any real life measurable difference. > 2) reduce the size of the critical struct request structure by > 128 bits, thus reducing the memory used by every blk-mq driver > significantly, never mind the cache effects 128 bits? I see the "struct request *next_rq;" is there another one? It could share space with elv; && flush; Do you want a patch? > 3) stop having the maintainance overhead for this code in the > block layer, which has been rather painful at times > I hear you man. Life is pain. But is it really such an overhead? I mean it is already implemented. What else is there to do? Please please show me? (Sorry for being slow) Jai Maa Boaz
RE: remove exofs, the T10 OSD code and block/scsi bidi support V3
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org ow...@vger.kernel.org> On Behalf Of Douglas Gilbert > Sent: Wednesday, December 19, 2018 9:02 PM > Subject: Re: remove exofs, the T10 OSD code and block/scsi bidi support V3 > ... > While on the subject of bidi, the order of transfers: is the data-out > (to the target) always before the data-in or is it the target device > that decides (depending on the semantics of the command) who is first? In SCSI, that was command-specific. Some necessitated intermixing the transfers, while others did all one direction before all the other direction. --- Robert Elliott, HPE Persistent Memory
Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
On Wed, Dec 19, 2018 at 09:01:53PM -0500, Douglas Gilbert wrote: >> 1) reduce the size of every kernel with block layer support, and >> even more for every kernel with scsi support > > By proposing the removal of bidi support from the block layer, it isn't > just the SCSI subsystem that will be impacted. Those NVMe documents > that you referred me to earlier in the year, in the command tables > in 1.3c and earlier you have noticed the 2 bit direction field and > what 11b means? Even if there aren't any bidi NVMe commands *** yet, > the fact that NVMe's 64 byte command format has provision for 4 > (not 2) independent data transfers (data + meta, for each direction). > Surely NVMe will sooner or later take advantage of those ... a > command like READ GATHERED comes to mind. NVMe on the other hand does have support for separate read and write buffers as in the current SCSI bidi support, as it encodes the data transfers in that SQE. So IFF NVMe does bidi commands it would have to use a single buffer for data in/out, which can be easily done in the block layer without the current bidi support that chains two struct request instances for data in and data out. >> 2) reduce the size of the critical struct request structure by >> 128 bits, thus reducing the memory used by every blk-mq driver >> significantly, never mind the cache effects > > Hmm, one pointer (that is null in the non-bidi case) should be enough, > that's 64 or 32 bits. Due to the way we use request chaining we need two fields at the moment. ->special and ->next_rq. If we'd refactor the whole thing for the basically non-existent user we could indeed probably get it down to a single pointer. > While on the subject of bidi, the order of transfers: is the data-out > (to the target) always before the data-in or is it the target device > that decides (depending on the semantics of the command) who is first? The way I read SAM data needs to be transferred to the device for processing first, then the processing occurs and then it is transferred out, so the order seems fixed. > > Doug Gilbert > > *** there could already be vendor specific bidi NVMe commands out > there (ditto for SCSI) For NVMe they'd need to transfer data in and out in the same buffer to sort work, and even then only if we don't happen to be bounce buffering using swiotlb, or using a network transport. Similarly for SCSI only iSCSI at the moment supports bidi CDBs, so we could have applications using vendor specific bidi commands on iSCSI, which is exactly what we're trying to find out, but it is a bit of a very niche use case.
Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
On 2018-12-19 9:43 a.m., Christoph Hellwig wrote: On Mon, Nov 26, 2018 at 07:11:10PM +0200, Boaz Harrosh wrote: On 11/11/18 15:32, Christoph Hellwig wrote: The only real user of the T10 OSD protocol, the pNFS object layout driver never went to the point of having shipping products, and we removed it 1.5 years ago. Exofs is just a simple example without real life users. You have failed to say what is your motivation for this patchset? What is it you are trying to fix/improve. Drop basically unused support, which allows us to 1) reduce the size of every kernel with block layer support, and even more for every kernel with scsi support By proposing the removal of bidi support from the block layer, it isn't just the SCSI subsystem that will be impacted. Those NVMe documents that you referred me to earlier in the year, in the command tables in 1.3c and earlier you have noticed the 2 bit direction field and what 11b means? Even if there aren't any bidi NVMe commands *** yet, the fact that NVMe's 64 byte command format has provision for 4 (not 2) independent data transfers (data + meta, for each direction). Surely NVMe will sooner or later take advantage of those ... a command like READ GATHERED comes to mind. 2) reduce the size of the critical struct request structure by 128 bits, thus reducing the memory used by every blk-mq driver significantly, never mind the cache effects Hmm, one pointer (that is null in the non-bidi case) should be enough, that's 64 or 32 bits. 3) stop having the maintainance overhead for this code in the block layer, which has been rather painful at times You won't get any sympathy from me :-) The sg driver is trying to inject _SCSI_ commands into the SCSI mid-level for onward processing by SCSI LLDs. So WTF does it have to deal with the block layer. While on the subject of bidi, the order of transfers: is the data-out (to the target) always before the data-in or is it the target device that decides (depending on the semantics of the command) who is first? Doug Gilbert *** there could already be vendor specific bidi NVMe commands out there (ditto for SCSI)
Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
On Mon, Nov 26, 2018 at 07:11:10PM +0200, Boaz Harrosh wrote: > On 11/11/18 15:32, Christoph Hellwig wrote: > > The only real user of the T10 OSD protocol, the pNFS object layout > > driver never went to the point of having shipping products, and we > > removed it 1.5 years ago. Exofs is just a simple example without > > real life users. > > > > You have failed to say what is your motivation for this patchset? What > is it you are trying to fix/improve. Drop basically unused support, which allows us to 1) reduce the size of every kernel with block layer support, and even more for every kernel with scsi support 2) reduce the size of the critical struct request structure by 128 bits, thus reducing the memory used by every blk-mq driver significantly, never mind the cache effects 3) stop having the maintainance overhead for this code in the block layer, which has been rather painful at times
Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
On 11/11/18 15:32, Christoph Hellwig wrote: > The only real user of the T10 OSD protocol, the pNFS object layout > driver never went to the point of having shipping products, and we > removed it 1.5 years ago. Exofs is just a simple example without > real life users. > You have failed to say what is your motivation for this patchset? What is it you are trying to fix/improve. For the sake of "not used much" I fail to see the risk taking of this removal. > The code has been mostly unmaintained for years and is getting in the > way of block / SCSI changes, so I think it's finally time to drop it. > > Quote from Boaz: > > "As I said then. It is used in Universities for studies and experiments. > Every once in a while. I get an email with questions and reports. > > But yes feel free to remove the all thing!! > > I guess I can put it up on github. In a public tree. > > Just that I will need to forward port it myself, til now you guys > been doing this for me ;-)" > Yes I wrote that for V1. But I wrote the *opposite* thing in a later mail. Which nullifies this statement above. So please remove this quote in future submits. Here is what I wrote later as response of V2 of this set: I think I'm changing my mind about this. Because of two reasons: One: I see 3 thousands bit-rots in the Kernel and particularly SCSI drivers that are much older and fat-and-ugliness consuming then the clean osd stack. For example the all ISA bus and ZONE_DMA stuff. Two: I have offered many times, every time this came up. That if anyone has a major (or even minor) change to the block and/or scsi layers that he/she has done. And that now breaks the compilation/run time of OSD or exofs. I'm personally willing to spend my weekends and fix it myself. Send me a URL of the tree with the work done, and I will send the patches needed to revitalize OSD/exofs as part of that change set. I have never received any such requests to date. So I would please like to protest on two of Christoph's statements above. "The code has been mostly unmaintained for years and is getting in the way of block / SCSI changes" 1. What does "maintained" means? I have for all these years been immediately responsive to any inquiries and patches to the code in question. And am listed as MAINTAINER of this code. 2. I have regularly, for ever, every kernel release around the RC3-RC4 time frame, compiled and ran my almost automatic setup and made sure the things still run as expected (No regressions). So Yes the code has not seen any new fixtures for years. But it is regularly tested and proven to work, on latest kernel. So it fails the definition of a "bit rot" Christoph you've been saying for so long "getting in the way of block/SCSI changes". And every time and again this time please tell me, you never answered before. What are these changes you want to make? can I please help? Send me any tree where exofs/osd compilation is broken and I will personally fix it in "ONE WEEK" time. (If compilation is fine but you know runtime will break, its nice to have an heads up, but if not my automatic system will detect it anyway) Lets say that if in the FUTURE a change-set is submitted that breaks OSD/EXOFS compilation, and I failed to respond with a fix within "ONE WEEK". Then this goes in as part of that change-set. And not with the argument of "Not used, not maintained" - But as "Breaks compilation of the following changes" I promise I will gladly ACK it then. So for now. A personal NACK from me on the grounds that. You never told me why / what this is breaking. Thanks Boaz > Now the last time this caused a bit of a stir, but still no actual users, > not even for SG_IO passthrough commands. So here we go again, this time > including removing everything in the scsi and block layer supporting it, > and thus shrinking struct request. > Again. T10-OSD or not. Bidi is currently actively used. By Linus rules You are not allowed to remove it. Two use paths: 1. Management CDBS of private vendors yes via iscsi. virt_io and usb-scsi 2. Target mode support of WRITE-RETURN-XOR, and COMPARE_AND_WRITE --- You guys should do what you feel best. Even not answering my questions and of course not agreeing with my advise, .i.e about breaking people's setups. But please remove the wrong quote from me. Please quote my objection of the matter. (pretty please because you may surly ignore that request as well) [I am not fighting about this at all. Please do what you need to do. Just want to set the record strait that's all] Cheers :-) Boaz
Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
On 11/11/18 15:32, Christoph Hellwig wrote: > The only real user of the T10 OSD protocol, the pNFS object layout > driver never went to the point of having shipping products, and we > removed it 1.5 years ago. Exofs is just a simple example without > real life users. > You have failed to say what is your motivation for this patchset? What is it you are trying to fix/improve. For the sake of "not used much" I fail to see the risk taking of this removal. > The code has been mostly unmaintained for years and is getting in the > way of block / SCSI changes, so I think it's finally time to drop it. > > Quote from Boaz: > > "As I said then. It is used in Universities for studies and experiments. > Every once in a while. I get an email with questions and reports. > > But yes feel free to remove the all thing!! > > I guess I can put it up on github. In a public tree. > > Just that I will need to forward port it myself, til now you guys > been doing this for me ;-)" > Yes I wrote that for V1. But I wrote the *opposite* thing in a later mail. Which nullifies this statement above. So please remove this quote in future submits. Here is what I wrote later as response of V2 of this set: I think I'm changing my mind about this. Because of two reasons: One: I see 3 thousands bit-rots in the Kernel and particularly SCSI drivers that are much older and fat-and-ugliness consuming then the clean osd stack. For example the all ISA bus and ZONE_DMA stuff. Two: I have offered many times, every time this came up. That if anyone has a major (or even minor) change to the block and/or scsi layers that he/she has done. And that now breaks the compilation/run time of OSD or exofs. I'm personally willing to spend my weekends and fix it myself. Send me a URL of the tree with the work done, and I will send the patches needed to revitalize OSD/exofs as part of that change set. I have never received any such requests to date. So I would please like to protest on two of Christoph's statements above. "The code has been mostly unmaintained for years and is getting in the way of block / SCSI changes" 1. What does "maintained" means? I have for all these years been immediately responsive to any inquiries and patches to the code in question. And am listed as MAINTAINER of this code. 2. I have regularly, for ever, every kernel release around the RC3-RC4 time frame, compiled and ran my almost automatic setup and made sure the things still run as expected (No regressions). So Yes the code has not seen any new fixtures for years. But it is regularly tested and proven to work, on latest kernel. So it fails the definition of a "bit rot" Christoph you've been saying for so long "getting in the way of block/SCSI changes". And every time and again this time please tell me, you never answered before. What are these changes you want to make? can I please help? Send me any tree where exofs/osd compilation is broken and I will personally fix it in "ONE WEEK" time. (If compilation is fine but you know runtime will break, its nice to have an heads up, but if not my automatic system will detect it anyway) Lets say that if in the FUTURE a change-set is submitted that breaks OSD/EXOFS compilation, and I failed to respond with a fix within "ONE WEEK". Then this goes in as part of that change-set. And not with the argument of "Not used, not maintained" - But as "Breaks compilation of the following changes" I promise I will gladly ACK it then. So for now. A personal NACK from me on the grounds that. You never told me why / what this is breaking. Thanks Boaz > Now the last time this caused a bit of a stir, but still no actual users, > not even for SG_IO passthrough commands. So here we go again, this time > including removing everything in the scsi and block layer supporting it, > and thus shrinking struct request. > Again. T10-OSD or not. Bidi is currently actively used. By Linus rules You are not allowed to remove it. Two use paths: 1. Management CDBS of private vendors yes via iscsi. virt_io and usb-scsi 2. Target mode support of WRITE-RETURN-XOR, and COMPARE_AND_WRITE --- You guys should do what you feel best. Even not answering my questions and of course not agreeing with my advise, .i.e about breaking people's setups. But please remove the wrong quote from me. Please quote my objection of the matter. (pretty please because you may surly ignore that request as well) [I am not fighting about this at all. Please do what you need to do. Just want to set the record strait that's all] Cheers :-) Boaz