Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-12-23 Thread Boaz Harrosh
On 20/12/18 09:26, Christoph Hellwig wrote:
> On Wed, Dec 19, 2018 at 09:01:53PM -0500, Douglas Gilbert wrote:
>>>   1) reduce the size of every kernel with block layer support, and
>>>  even more for every kernel with scsi support
>>
>> By proposing the removal of bidi support from the block layer, it isn't
>> just the SCSI subsystem that will be impacted. Those NVMe documents
>> that you referred me to earlier in the year, in the command tables
>> in 1.3c and earlier you have noticed the 2 bit direction field and
>> what 11b means? Even if there aren't any bidi NVMe commands *** yet,
>> the fact that NVMe's 64 byte command format has provision for 4
>> (not 2) independent data transfers (data + meta, for each direction).
>> Surely NVMe will sooner or later take advantage of those ... a
>> command like READ GATHERED comes to mind.
> 
> NVMe on the other hand does have support for separate read and write
> buffers as in the current SCSI bidi support, as it encodes the data
> transfers in that SQE.  So IFF NVMe does bidi commands it would have
> to use a single buffer for data in/out, 

There is no such thing as "buffer" there is at first a bio, and after
virtual-to-iommu mapping a scatter-gather-list. All these are currently
governed by a struct request.
request, bio, and sgl, have a single direction, All API's expect a single
direction.

All BIDI did was to say. Lets not change any API or structure but just
use two of them at the same time.

All the wiser is the very high level user, and the very low HW driver like
iscsi. All the middlewere was never touched.

In the view of a bidi target like say an osd. It all stream looks like a single
"Buffer" on the wire, were some of it is read and some of it is written
to.

> which can be easily done

?? Did you try. It will take much more than an additional pointer sir

> in the block layer without the current bidi support that chains
> two struct request instances for data in and data out.
> 

That was the all trick of not changing a single API or structure
Just have two of the same thing, we already know how to handle

>>>   2) reduce the size of the critical struct request structure by
>>>  128 bits, thus reducing the memory used by every blk-mq driver
>>>  significantly, never mind the cache effects
>>
>> Hmm, one pointer (that is null in the non-bidi case) should be enough,
>> that's 64 or 32 bits.
> 
> Due to the way we use request chaining we need two fields at the
> moment.  ->special and ->next_rq.  

No! ->special is nothing to do with bidi. ->special is a field to be
used by LLD's only and are not to be touched by block layer or transports
or high level users.
 
Request has the single ->next_rq for bidi. And could be eliminated by
sharing space with the elevator info. Do you want a patch?

(So in effect it can be taking 0 bytes, and yes a little bit of code)

> If we'd refactor the whole thing
> for the basically non-existent user we could indeed probably get it
> down to a single pointer. 
> 
>> While on the subject of bidi, the order of transfers: is the data-out
>> (to the target) always before the data-in or is it the target device
>> that decides (depending on the semantics of the command) who is first?
> 
> The way I read SAM data needs to be transferred to the device for
> processing first, then the processing occurs and then it is transferred
> out, so the order seems fixed.
> 

Not sure what is the "SAM" above. But most of the BIDI commands I know,
osd and otherwise, the order is command specific, and many times it is
done in parallel.
Read some bits than write some bits, rinse and repeat ...

(You see in scsi the all OUT buffer is part of the actual CDB, so in effect
 any READ is a BIDI. The novelty here is the variable sizes and the SW stack
 memory targets for the different operations)

>>
>> Doug Gilbert
>>
>>  *** there could already be vendor specific bidi NVMe commands out
>> there (ditto for SCSI)
> 
> For NVMe they'd need to transfer data in and out in the same buffer
> to sort work, and even then only if we don't happen to be bounce
> buffering using swiotlb, or using a network transport.  Similarly for
> SCSI only iSCSI at the moment supports bidi CDBs, so we could have
> applications using vendor specific bidi commands on iSCSI, which
> is exactly what we're trying to find out, but it is a bit of a very
> niche use case.
> 

Again bidi works NOW. Did not yet see the big gain, of throwing it
out.

Jai Maa
Boaz



Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-12-23 Thread Boaz Harrosh
On 19/12/18 16:43, Christoph Hellwig wrote:
> On Mon, Nov 26, 2018 at 07:11:10PM +0200, Boaz Harrosh wrote:
>> On 11/11/18 15:32, Christoph Hellwig wrote:
>>> The only real user of the T10 OSD protocol, the pNFS object layout
>>> driver never went to the point of having shipping products, and we
>>> removed it 1.5 years ago.  Exofs is just a simple example without
>>> real life users.
>>>
>>
>> You have failed to say what is your motivation for this patchset? What
>> is it you are trying to fix/improve.
> 
> Drop basically unused support, which allows us to
> 
>  1) reduce the size of every kernel with block layer support, and
> even more for every kernel with scsi support

Do you have numbers? its mainly code-segment so I don't think you will see
any real life measurable difference.

>  2) reduce the size of the critical struct request structure by
> 128 bits, thus reducing the memory used by every blk-mq driver
> significantly, never mind the cache effects

128 bits? I see the "struct request *next_rq;"
is there another one?

It could share space with elv; && flush;
Do you want a patch?

>  3) stop having the maintainance overhead for this code in the
> block layer, which has been rather painful at times
> 

I hear you man. Life is pain. But is it really such an overhead?
I mean it is already implemented. What else is there to do?
Please please show me? (Sorry for being slow)

Jai Maa
Boaz



RE: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-12-20 Thread Elliott, Robert (Persistent Memory)


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Douglas Gilbert
> Sent: Wednesday, December 19, 2018 9:02 PM
> Subject: Re: remove exofs, the T10 OSD code and block/scsi bidi support V3
> 
... 
> While on the subject of bidi, the order of transfers: is the data-out
> (to the target) always before the data-in or is it the target device
> that decides (depending on the semantics of the command) who is first?

In SCSI, that was command-specific. Some necessitated intermixing the
transfers, while others did all one direction before all the other
direction.

---
Robert Elliott, HPE Persistent Memory






Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-12-19 Thread Christoph Hellwig
On Wed, Dec 19, 2018 at 09:01:53PM -0500, Douglas Gilbert wrote:
>>   1) reduce the size of every kernel with block layer support, and
>>  even more for every kernel with scsi support
>
> By proposing the removal of bidi support from the block layer, it isn't
> just the SCSI subsystem that will be impacted. Those NVMe documents
> that you referred me to earlier in the year, in the command tables
> in 1.3c and earlier you have noticed the 2 bit direction field and
> what 11b means? Even if there aren't any bidi NVMe commands *** yet,
> the fact that NVMe's 64 byte command format has provision for 4
> (not 2) independent data transfers (data + meta, for each direction).
> Surely NVMe will sooner or later take advantage of those ... a
> command like READ GATHERED comes to mind.

NVMe on the other hand does have support for separate read and write
buffers as in the current SCSI bidi support, as it encodes the data
transfers in that SQE.  So IFF NVMe does bidi commands it would have
to use a single buffer for data in/out, which can be easily done
in the block layer without the current bidi support that chains
two struct request instances for data in and data out.

>>   2) reduce the size of the critical struct request structure by
>>  128 bits, thus reducing the memory used by every blk-mq driver
>>  significantly, never mind the cache effects
>
> Hmm, one pointer (that is null in the non-bidi case) should be enough,
> that's 64 or 32 bits.

Due to the way we use request chaining we need two fields at the
moment.  ->special and ->next_rq.  If we'd refactor the whole thing
for the basically non-existent user we could indeed probably get it
down to a single pointer. 

> While on the subject of bidi, the order of transfers: is the data-out
> (to the target) always before the data-in or is it the target device
> that decides (depending on the semantics of the command) who is first?

The way I read SAM data needs to be transferred to the device for
processing first, then the processing occurs and then it is transferred
out, so the order seems fixed.

>
> Doug Gilbert
>
>  *** there could already be vendor specific bidi NVMe commands out
> there (ditto for SCSI)

For NVMe they'd need to transfer data in and out in the same buffer
to sort work, and even then only if we don't happen to be bounce
buffering using swiotlb, or using a network transport.  Similarly for
SCSI only iSCSI at the moment supports bidi CDBs, so we could have
applications using vendor specific bidi commands on iSCSI, which
is exactly what we're trying to find out, but it is a bit of a very
niche use case.


Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-12-19 Thread Douglas Gilbert

On 2018-12-19 9:43 a.m., Christoph Hellwig wrote:

On Mon, Nov 26, 2018 at 07:11:10PM +0200, Boaz Harrosh wrote:

On 11/11/18 15:32, Christoph Hellwig wrote:

The only real user of the T10 OSD protocol, the pNFS object layout
driver never went to the point of having shipping products, and we
removed it 1.5 years ago.  Exofs is just a simple example without
real life users.



You have failed to say what is your motivation for this patchset? What
is it you are trying to fix/improve.


Drop basically unused support, which allows us to

  1) reduce the size of every kernel with block layer support, and
 even more for every kernel with scsi support


By proposing the removal of bidi support from the block layer, it isn't
just the SCSI subsystem that will be impacted. Those NVMe documents
that you referred me to earlier in the year, in the command tables
in 1.3c and earlier you have noticed the 2 bit direction field and
what 11b means? Even if there aren't any bidi NVMe commands *** yet,
the fact that NVMe's 64 byte command format has provision for 4
(not 2) independent data transfers (data + meta, for each direction).
Surely NVMe will sooner or later take advantage of those ... a
command like READ GATHERED comes to mind.


  2) reduce the size of the critical struct request structure by
 128 bits, thus reducing the memory used by every blk-mq driver
 significantly, never mind the cache effects


Hmm, one pointer (that is null in the non-bidi case) should be enough,
that's 64 or 32 bits.


  3) stop having the maintainance overhead for this code in the
 block layer, which has been rather painful at times


You won't get any sympathy from me :-) The sg driver is trying to
inject _SCSI_ commands into the SCSI mid-level for onward processing
by SCSI LLDs. So WTF does it have to deal with the block layer.


While on the subject of bidi, the order of transfers: is the data-out
(to the target) always before the data-in or is it the target device
that decides (depending on the semantics of the command) who is first?

Doug Gilbert

 *** there could already be vendor specific bidi NVMe commands out
there (ditto for SCSI)


Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-12-19 Thread Christoph Hellwig
On Mon, Nov 26, 2018 at 07:11:10PM +0200, Boaz Harrosh wrote:
> On 11/11/18 15:32, Christoph Hellwig wrote:
> > The only real user of the T10 OSD protocol, the pNFS object layout
> > driver never went to the point of having shipping products, and we
> > removed it 1.5 years ago.  Exofs is just a simple example without
> > real life users.
> > 
> 
> You have failed to say what is your motivation for this patchset? What
> is it you are trying to fix/improve.

Drop basically unused support, which allows us to

 1) reduce the size of every kernel with block layer support, and
even more for every kernel with scsi support
 2) reduce the size of the critical struct request structure by
128 bits, thus reducing the memory used by every blk-mq driver
significantly, never mind the cache effects
 3) stop having the maintainance overhead for this code in the
block layer, which has been rather painful at times


Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-11-26 Thread Boaz Harrosh
On 11/11/18 15:32, Christoph Hellwig wrote:
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and we
> removed it 1.5 years ago.  Exofs is just a simple example without
> real life users.
> 

You have failed to say what is your motivation for this patchset? What
is it you are trying to fix/improve.

For the sake of "not used much" I fail to see the risk taking of
this removal.

> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, so I think it's finally time to drop it.
> 
> Quote from Boaz:
> 
> "As I said then. It is used in Universities for studies and experiments.
> Every once in a while. I get an email with questions and reports.
> 
> But yes feel free to remove the all thing!!
> 
> I guess I can put it up on github. In a public tree.
> 
> Just that I will need to forward port it myself, til now you guys
> been doing this for me ;-)"
> 

Yes I wrote that for V1. But I wrote the *opposite* thing in a later mail.
Which nullifies this statement above. So please remove this quote in future
submits.

Here is what I wrote later as response of V2 of this set:



I think I'm changing my mind about this.

Because of two reasons:

One: I see 3 thousands bit-rots in the Kernel and particularly SCSI drivers
that are much older and fat-and-ugliness consuming then the clean osd
stack. For example the all ISA bus and ZONE_DMA stuff.

Two: I have offered many times, every time this came up. That if
anyone has a major (or even minor) change to the block and/or scsi layers
that he/she has done. And that now breaks the compilation/run time of
OSD or exofs.
  I'm personally willing to spend my weekends and fix it myself. Send me
  a URL of the tree with the work done, and I will send the patches needed
  to revitalize OSD/exofs as part of that change set.

I have never received any such requests to date.

So I would please like to protest on two of Christoph's statements above.

"The code has been mostly unmaintained for years and is getting in the
 way of block / SCSI changes"

1. What does "maintained" means? I have for all these years been immediately
   responsive to any inquiries and patches to the code in question.
   And am listed as MAINTAINER of this code.
2. I have regularly, for ever, every kernel release around the RC3-RC4
   time frame, compiled and ran my almost automatic setup and made sure
   the things still run as expected (No regressions).

So Yes the code has not seen any new fixtures for years. But it is regularly
tested and proven to work, on latest kernel. So it fails the definition 
of a "bit rot"

Christoph you've been saying for so long "getting in the way of block/SCSI
changes". And every time and again this time please tell me, you never answered
before. What are these changes you want to make? can I please help?

Send me any tree where exofs/osd compilation is broken and I will personally
fix it in "ONE WEEK" time.

(If compilation is fine but you know runtime will break, its nice to have an
 heads up, but if not my automatic system will detect it anyway)

Lets say that if in the FUTURE a change-set is submitted that breaks OSD/EXOFS
compilation, and I failed to respond with a fix within "ONE WEEK". Then
this goes in as part of that change-set. And not with the argument of
"Not used, not maintained" - But as "Breaks compilation of the following 
changes"
I promise I will gladly ACK it then.

So for now. A personal NACK from me on the grounds that. You never told me
why / what this is breaking.

Thanks
Boaz



> Now the last time this caused a bit of a stir, but still no actual users,
> not even for SG_IO passthrough commands.  So here we go again, this time
> including removing everything in the scsi and block layer supporting it,
> and thus shrinking struct request.
> 

Again. T10-OSD or not. Bidi is currently actively used. By Linus rules
You are not allowed to remove it.

Two use paths:
1. Management CDBS of private vendors yes via iscsi. virt_io and usb-scsi
2. Target mode support of WRITE-RETURN-XOR, and COMPARE_AND_WRITE

---
You guys should do what you feel best. Even not answering my questions and
of course not agreeing with my advise, .i.e about breaking people's setups.

But please remove the wrong quote from me. Please quote my objection
of the matter. (pretty please because you may surly ignore that request as
well)

[I am not fighting about this at all. Please do what you need to do.
 Just want to set the record strait that's all]

Cheers :-)
Boaz



Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-11-26 Thread Boaz Harrosh
On 11/11/18 15:32, Christoph Hellwig wrote:
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and we
> removed it 1.5 years ago.  Exofs is just a simple example without
> real life users.
> 

You have failed to say what is your motivation for this patchset? What
is it you are trying to fix/improve.

For the sake of "not used much" I fail to see the risk taking of
this removal.

> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, so I think it's finally time to drop it.
> 
> Quote from Boaz:
> 
> "As I said then. It is used in Universities for studies and experiments.
> Every once in a while. I get an email with questions and reports.
> 
> But yes feel free to remove the all thing!!
> 
> I guess I can put it up on github. In a public tree.
> 
> Just that I will need to forward port it myself, til now you guys
> been doing this for me ;-)"
> 

Yes I wrote that for V1. But I wrote the *opposite* thing in a later mail.
Which nullifies this statement above. So please remove this quote in future
submits.

Here is what I wrote later as response of V2 of this set:



I think I'm changing my mind about this.

Because of two reasons:

One: I see 3 thousands bit-rots in the Kernel and particularly SCSI drivers
that are much older and fat-and-ugliness consuming then the clean osd
stack. For example the all ISA bus and ZONE_DMA stuff.

Two: I have offered many times, every time this came up. That if
anyone has a major (or even minor) change to the block and/or scsi layers
that he/she has done. And that now breaks the compilation/run time of
OSD or exofs.
  I'm personally willing to spend my weekends and fix it myself. Send me
  a URL of the tree with the work done, and I will send the patches needed
  to revitalize OSD/exofs as part of that change set.

I have never received any such requests to date.

So I would please like to protest on two of Christoph's statements above.

"The code has been mostly unmaintained for years and is getting in the
 way of block / SCSI changes"

1. What does "maintained" means? I have for all these years been immediately
   responsive to any inquiries and patches to the code in question.
   And am listed as MAINTAINER of this code.
2. I have regularly, for ever, every kernel release around the RC3-RC4
   time frame, compiled and ran my almost automatic setup and made sure
   the things still run as expected (No regressions).

So Yes the code has not seen any new fixtures for years. But it is regularly
tested and proven to work, on latest kernel. So it fails the definition 
of a "bit rot"

Christoph you've been saying for so long "getting in the way of block/SCSI
changes". And every time and again this time please tell me, you never answered
before. What are these changes you want to make? can I please help?

Send me any tree where exofs/osd compilation is broken and I will personally
fix it in "ONE WEEK" time.

(If compilation is fine but you know runtime will break, its nice to have an
 heads up, but if not my automatic system will detect it anyway)

Lets say that if in the FUTURE a change-set is submitted that breaks OSD/EXOFS
compilation, and I failed to respond with a fix within "ONE WEEK". Then
this goes in as part of that change-set. And not with the argument of
"Not used, not maintained" - But as "Breaks compilation of the following 
changes"
I promise I will gladly ACK it then.

So for now. A personal NACK from me on the grounds that. You never told me
why / what this is breaking.

Thanks
Boaz



> Now the last time this caused a bit of a stir, but still no actual users,
> not even for SG_IO passthrough commands.  So here we go again, this time
> including removing everything in the scsi and block layer supporting it,
> and thus shrinking struct request.
> 

Again. T10-OSD or not. Bidi is currently actively used. By Linus rules
You are not allowed to remove it.

Two use paths:
1. Management CDBS of private vendors yes via iscsi. virt_io and usb-scsi
2. Target mode support of WRITE-RETURN-XOR, and COMPARE_AND_WRITE

---
You guys should do what you feel best. Even not answering my questions and
of course not agreeing with my advise, .i.e about breaking people's setups.

But please remove the wrong quote from me. Please quote my objection
of the matter. (pretty please because you may surly ignore that request as
well)

[I am not fighting about this at all. Please do what you need to do.
 Just want to set the record strait that's all]

Cheers :-)
Boaz