RE: Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread Daejun Park
> > On Mon, 2020-12-07 at 19:35 +0100, Greg KH wrote:
> > > On Mon, Dec 07, 2020 at 06:26:03PM +, Christoph Hellwig wrote:
> > > > On Mon, Dec 07, 2020 at 07:23:12PM +0100, Greg KH wrote:
> > > > > What "real workload" test can be run on this to help show if it
> > > > > is useful or not?  These vendors seem to think it helps for some
> > > > > reason, otherwise they wouldn't have added it to their silicon :)
> > > > > 
> > > > > Should they run fio?  If so, any hints on a config that would be
> > > > > good to show any performance increases?
> > > > 
> > > > A real actual workload that matters.  Then again that was Martins
> > > > request to even justify it.  I don't think the broken addressing
> > > > that breaks a whole in the SCSI addressing has absolutely not
> > > > business being supported in Linux ever.  The vendors should have
> > > > thought about the design before committing transistors to something
> > > > that fundamentally does not make sense.
> > 
> > Actually, that's not the way it works: vendors add commands because
> > standards mandate.  That's why people who want weird commands go and
> > join standard committees.  Unfortunately this means that a lot of the
> > commands the standard mandates end up not being very useful in
> > practice.  For instance in SCSI we really only implement a fraction of
> > the commands in the standard.
> > 
> > In this case, the industry already tried a very similar approach with
> > GEN 1 hybrid drives and it turned into a complete disaster, which is
> > why the mode became optional in shingle drives and much better modes,
> > which didn't have the huge shared state problem, superseded it.  Plus
> > truncating the LBA of a READ 16 to 4 bytes is asking for capacity
> > problems down the line, so even the actual implementation seems to be
> > problematic.
> > 
> > All in all, this looks like a short term fix which will go away when
> > the drive capacity improves and thus all the effort changing the driver
> > will eventually be wasted.
> 
> "short term" in the embedded world means "this device is stuck with this
> chip for the next 8 years", it's not like a storage device you can
> replace, so this might be different than the shingle drive mess.  Also,
> I see many old SoCs still showing up in brand new devices many many
> years after they were first introduced, on-chip storage controllers is
> something we need to support well if we don't want to see huge
> out-of-tree patchsets like UFS traditionally has been lugging around for
> many years.
> 
> > > So "time to boot an android system with this enabled and disabled"
> > > would be a valid workload, right?  I'm guessing that's what the
> > > vendors here actually care about, otherwise there is no real stress-
> > > test on a UFS system that I know of.
> > 
> > Um, does it?  I don't believe even the UFS people have claimed this. 
> > The problem is that HPB creates a shared state between the driver and
> > the device.  That shared state has to be populated, which has to happen
> > at start of day, so it's entirely unclear if this is a win or a slow
> > down for boot.
> 
> Ok, showing that this actually matters is a good rule, Daejun, can you
> provide that if you resubmit this patchset?
> 

Sure, I will find out the case which has performance benefit by HPB.

Thanks,
Daejun


Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread Greg KH
On Mon, Dec 07, 2020 at 10:54:58AM -0800, James Bottomley wrote:
> On Mon, 2020-12-07 at 19:35 +0100, Greg KH wrote:
> > On Mon, Dec 07, 2020 at 06:26:03PM +, Christoph Hellwig wrote:
> > > On Mon, Dec 07, 2020 at 07:23:12PM +0100, Greg KH wrote:
> > > > What "real workload" test can be run on this to help show if it
> > > > is useful or not?  These vendors seem to think it helps for some
> > > > reason, otherwise they wouldn't have added it to their silicon :)
> > > > 
> > > > Should they run fio?  If so, any hints on a config that would be
> > > > good to show any performance increases?
> > > 
> > > A real actual workload that matters.  Then again that was Martins
> > > request to even justify it.  I don't think the broken addressing
> > > that breaks a whole in the SCSI addressing has absolutely not
> > > business being supported in Linux ever.  The vendors should have
> > > thought about the design before committing transistors to something
> > > that fundamentally does not make sense.
> 
> Actually, that's not the way it works: vendors add commands because
> standards mandate.  That's why people who want weird commands go and
> join standard committees.  Unfortunately this means that a lot of the
> commands the standard mandates end up not being very useful in
> practice.  For instance in SCSI we really only implement a fraction of
> the commands in the standard.
> 
> In this case, the industry already tried a very similar approach with
> GEN 1 hybrid drives and it turned into a complete disaster, which is
> why the mode became optional in shingle drives and much better modes,
> which didn't have the huge shared state problem, superseded it.  Plus
> truncating the LBA of a READ 16 to 4 bytes is asking for capacity
> problems down the line, so even the actual implementation seems to be
> problematic.
> 
> All in all, this looks like a short term fix which will go away when
> the drive capacity improves and thus all the effort changing the driver
> will eventually be wasted.

"short term" in the embedded world means "this device is stuck with this
chip for the next 8 years", it's not like a storage device you can
replace, so this might be different than the shingle drive mess.  Also,
I see many old SoCs still showing up in brand new devices many many
years after they were first introduced, on-chip storage controllers is
something we need to support well if we don't want to see huge
out-of-tree patchsets like UFS traditionally has been lugging around for
many years.

> > So "time to boot an android system with this enabled and disabled"
> > would be a valid workload, right?  I'm guessing that's what the
> > vendors here actually care about, otherwise there is no real stress-
> > test on a UFS system that I know of.
> 
> Um, does it?  I don't believe even the UFS people have claimed this. 
> The problem is that HPB creates a shared state between the driver and
> the device.  That shared state has to be populated, which has to happen
> at start of day, so it's entirely unclear if this is a win or a slow
> down for boot.

Ok, showing that this actually matters is a good rule, Daejun, can you
provide that if you resubmit this patchset?

thanks,

greg k-h


Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread James Bottomley
On Mon, 2020-12-07 at 19:35 +0100, Greg KH wrote:
> On Mon, Dec 07, 2020 at 06:26:03PM +, Christoph Hellwig wrote:
> > On Mon, Dec 07, 2020 at 07:23:12PM +0100, Greg KH wrote:
> > > What "real workload" test can be run on this to help show if it
> > > is useful or not?  These vendors seem to think it helps for some
> > > reason, otherwise they wouldn't have added it to their silicon :)
> > > 
> > > Should they run fio?  If so, any hints on a config that would be
> > > good to show any performance increases?
> > 
> > A real actual workload that matters.  Then again that was Martins
> > request to even justify it.  I don't think the broken addressing
> > that breaks a whole in the SCSI addressing has absolutely not
> > business being supported in Linux ever.  The vendors should have
> > thought about the design before committing transistors to something
> > that fundamentally does not make sense.

Actually, that's not the way it works: vendors add commands because
standards mandate.  That's why people who want weird commands go and
join standard committees.  Unfortunately this means that a lot of the
commands the standard mandates end up not being very useful in
practice.  For instance in SCSI we really only implement a fraction of
the commands in the standard.

In this case, the industry already tried a very similar approach with
GEN 1 hybrid drives and it turned into a complete disaster, which is
why the mode became optional in shingle drives and much better modes,
which didn't have the huge shared state problem, superseded it.  Plus
truncating the LBA of a READ 16 to 4 bytes is asking for capacity
problems down the line, so even the actual implementation seems to be
problematic.

All in all, this looks like a short term fix which will go away when
the drive capacity improves and thus all the effort changing the driver
will eventually be wasted.

> So "time to boot an android system with this enabled and disabled"
> would be a valid workload, right?  I'm guessing that's what the
> vendors here actually care about, otherwise there is no real stress-
> test on a UFS system that I know of.

Um, does it?  I don't believe even the UFS people have claimed this. 
The problem is that HPB creates a shared state between the driver and
the device.  That shared state has to be populated, which has to happen
at start of day, so it's entirely unclear if this is a win or a slow
down for boot.

James




Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread Greg KH
On Mon, Dec 07, 2020 at 07:35:03PM +0100, Greg KH wrote:
> On Mon, Dec 07, 2020 at 06:26:03PM +, Christoph Hellwig wrote:
> > On Mon, Dec 07, 2020 at 07:23:12PM +0100, Greg KH wrote:
> > > What "real workload" test can be run on this to help show if it is
> > > useful or not?  These vendors seem to think it helps for some reason,
> > > otherwise they wouldn't have added it to their silicon :)
> > > 
> > > Should they run fio?  If so, any hints on a config that would be good to
> > > show any performance increases?
> > 
> > A real actual workload that matters.  Then again that was Martins
> > request to even justify it.  I don't think the broken addressing that
> > breaks a whole in the SCSI addressing has absolutely not business being
> > supported in Linux ever.  The vendors should have thought about the
> > design before committing transistors to something that fundamentally
> > does not make sense.
> 
> So "time to boot an android system with this enabled and disabled" would
> be a valid workload, right?  I'm guessing that's what the vendors here
> actually care about, otherwise there is no real stress-test on a UFS
> system that I know of.

Oh, and "supporting stupid hardware specs" is what we do here all the
time, you know that :)

If someone is foolish enough to build it, we usually have to support the
thing, especially if someone else here is willing to do that.  I don't
see where the addressing is "broken", which patch causes that to happen?

thanks,

greg k-h


Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread Greg KH
On Mon, Dec 07, 2020 at 06:26:03PM +, Christoph Hellwig wrote:
> On Mon, Dec 07, 2020 at 07:23:12PM +0100, Greg KH wrote:
> > What "real workload" test can be run on this to help show if it is
> > useful or not?  These vendors seem to think it helps for some reason,
> > otherwise they wouldn't have added it to their silicon :)
> > 
> > Should they run fio?  If so, any hints on a config that would be good to
> > show any performance increases?
> 
> A real actual workload that matters.  Then again that was Martins
> request to even justify it.  I don't think the broken addressing that
> breaks a whole in the SCSI addressing has absolutely not business being
> supported in Linux ever.  The vendors should have thought about the
> design before committing transistors to something that fundamentally
> does not make sense.

So "time to boot an android system with this enabled and disabled" would
be a valid workload, right?  I'm guessing that's what the vendors here
actually care about, otherwise there is no real stress-test on a UFS
system that I know of.

thanks,

greg k-h


Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread Christoph Hellwig
On Mon, Dec 07, 2020 at 07:23:12PM +0100, Greg KH wrote:
> What "real workload" test can be run on this to help show if it is
> useful or not?  These vendors seem to think it helps for some reason,
> otherwise they wouldn't have added it to their silicon :)
> 
> Should they run fio?  If so, any hints on a config that would be good to
> show any performance increases?

A real actual workload that matters.  Then again that was Martins
request to even justify it.  I don't think the broken addressing that
breaks a whole in the SCSI addressing has absolutely not business being
supported in Linux ever.  The vendors should have thought about the
design before committing transistors to something that fundamentally
does not make sense.


Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread Greg KH
On Mon, Dec 07, 2020 at 06:06:55PM +, Christoph Hellwig wrote:
> On Mon, Dec 07, 2020 at 06:56:23PM +0100, Greg KH wrote:
> > On Tue, Nov 03, 2020 at 01:40:21PM +0900, Daejun Park wrote:
> > > Changelog:
> > > 
> > > v12 -> v13
> > > 1. Cleanup codes by comments from Can Guo.
> > > 2. Add HPB related descriptor/flag/attributes in sysfs.
> > > 3. Change base commit from 5.10/scsi-queue to 5.11/scsi-queue.
> > 
> > What ever happened to this patchset?  Did it get merged into a scsi tree
> > for 5.11-rc1, or is there something still pending that needs to be done
> > on it to make it acceptable?
> 
> I think the problem here is not the code, but that the features is
> fundamentally a bad idea, and one that so far has not even shown
> to help real workloads vs the usual benchmarketing.

What "real workload" test can be run on this to help show if it is
useful or not?  These vendors seem to think it helps for some reason,
otherwise they wouldn't have added it to their silicon :)

Should they run fio?  If so, any hints on a config that would be good to
show any performance increases?

thanks,

greg k-h


Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread Christoph Hellwig
On Mon, Dec 07, 2020 at 06:56:23PM +0100, Greg KH wrote:
> On Tue, Nov 03, 2020 at 01:40:21PM +0900, Daejun Park wrote:
> > Changelog:
> > 
> > v12 -> v13
> > 1. Cleanup codes by comments from Can Guo.
> > 2. Add HPB related descriptor/flag/attributes in sysfs.
> > 3. Change base commit from 5.10/scsi-queue to 5.11/scsi-queue.
> 
> What ever happened to this patchset?  Did it get merged into a scsi tree
> for 5.11-rc1, or is there something still pending that needs to be done
> on it to make it acceptable?

I think the problem here is not the code, but that the features is
fundamentally a bad idea, and one that so far has not even shown
to help real workloads vs the usual benchmarketing.


Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-12-07 Thread Greg KH
On Tue, Nov 03, 2020 at 01:40:21PM +0900, Daejun Park wrote:
> Changelog:
> 
> v12 -> v13
> 1. Cleanup codes by comments from Can Guo.
> 2. Add HPB related descriptor/flag/attributes in sysfs.
> 3. Change base commit from 5.10/scsi-queue to 5.11/scsi-queue.

What ever happened to this patchset?  Did it get merged into a scsi tree
for 5.11-rc1, or is there something still pending that needs to be done
on it to make it acceptable?

thanks,

greg k-h


Re: [PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-11-05 Thread Can Guo

On 2020-11-03 12:40, Daejun Park wrote:

Changelog:

v12 -> v13
1. Cleanup codes by comments from Can Guo.
2. Add HPB related descriptor/flag/attributes in sysfs.
3. Change base commit from 5.10/scsi-queue to 5.11/scsi-queue.



If you have changed the code based by comments left on Google gerrit, 
here is


Reviewed-by: Can Guo 


v11 -> v12
1. Fixed to return error value when HPB fails to initialize pinned 
active

region.
2. Fixed to disable HPB feature if HPB fails to allocate essential 
memory

and workqueue.
3. Fixed to change proper sub-region state when region is already 
evicted.


v10 -> v11
Add a newline at end the last line on Kconfig file.

v9 -> v10
1. Fixed 64-bit division error
2. Fixed problems commentted in Bart's review.

v8 -> v9
1. Change sysfs initialization.
2. Change reading descriptor during HPB initialization
3. Fixed problems commentted in Bart's review.
4. Change base commit from 5.9/scsi-queue to 5.10/scsi-queue.

v7 -> v8
Remove wrongly added tags.

v6 -> v7
1. Remove UFS feature layer.
2. Cleanup for sparse error.

v5 -> v6
Change base commit to b53293fa662e28ae0cdd40828dc641c09f133405

v4 -> v5
Delete unused macro define.

v3 -> v4
1. Cleanup.

v2 -> v3
1. Add checking input module parameter value.
2. Change base commit from 5.8/scsi-queue to 5.9/scsi-queue.
3. Cleanup for unused variables and label.

v1 -> v2
1. Change the full boilerplate text to SPDX style.
2. Adopt dynamic allocation for sub-region data structure.
3. Cleanup.

NAND flash memory-based storage devices use Flash Translation Layer 
(FTL)

to translate logical addresses of I/O requests to corresponding flash
memory addresses. Mobile storage devices typically have RAM with
constrained size, thus lack in memory to keep the whole mapping table.
Therefore, mapping tables are partially retrieved from NAND flash on
demand, causing random-read performance degradation.

To improve random read performance, JESD220-3 (HPB v1.0) proposes HPB
(Host Performance Booster) which uses host system memory as a cache for 
the

FTL mapping table. By using HPB, FTL data can be read from host memory
faster than from NAND flash memory.

The current version only supports the DCM (device control mode).
This patch consists of 3 parts to support HPB feature.

1) HPB probe and initialization process
2) READ -> HPB READ using cached map information
3) L2P (logical to physical) map management

In the HPB probe and init process, the device information of the UFS is
queried. After checking supported features, the data structure for the 
HPB

is initialized according to the device information.

A read I/O in the active sub-region where the map is cached is changed 
to

HPB READ by the HPB.

The HPB manages the L2P map using information received from the
device. For active sub-region, the HPB caches through ufshpb_map
request. For the in-active region, the HPB discards the L2P map.
When a write I/O occurs in an active sub-region area, associated dirty
bitmap checked as dirty for preventing stale read.

HPB is shown to have a performance improvement of 58 - 67% for random 
read

workload. [1]

This series patches are based on the 5.11/scsi-queue branch.

[1]:
https://www.usenix.org/conference/hotstorage17/program/presentation/jeong

Daejun park (3):
 scsi: ufs: Introduce HPB feature
 scsi: ufs: L2P map management for HPB read
 scsi: ufs: Prepare HPB read for cached sub-region

 drivers/scsi/ufs/Kconfig |9 +
 drivers/scsi/ufs/Makefile|1 +
 drivers/scsi/ufs/ufs-sysfs.c |   18 +
 drivers/scsi/ufs/ufs.h   |   49 +
 drivers/scsi/ufs/ufshcd.c|   53 ++
 drivers/scsi/ufs/ufshcd.h|   23 +-
 drivers/scsi/ufs/ufshpb.c| 1784 
+

 drivers/scsi/ufs/ufshpb.h|  230 +
 8 files changed, 2166 insertions(+), 1 deletion(-)
 created mode 100644 drivers/scsi/ufs/ufshpb.c
 created mode 100644 drivers/scsi/ufs/ufshpb.h


[PATCH v13 0/3] scsi: ufs: Add Host Performance Booster Support

2020-11-02 Thread Daejun Park
Changelog:

v12 -> v13
1. Cleanup codes by comments from Can Guo.
2. Add HPB related descriptor/flag/attributes in sysfs.
3. Change base commit from 5.10/scsi-queue to 5.11/scsi-queue.

v11 -> v12
1. Fixed to return error value when HPB fails to initialize pinned active 
region.
2. Fixed to disable HPB feature if HPB fails to allocate essential memory
and workqueue.
3. Fixed to change proper sub-region state when region is already evicted.

v10 -> v11
Add a newline at end the last line on Kconfig file.

v9 -> v10
1. Fixed 64-bit division error
2. Fixed problems commentted in Bart's review.

v8 -> v9
1. Change sysfs initialization.
2. Change reading descriptor during HPB initialization
3. Fixed problems commentted in Bart's review.
4. Change base commit from 5.9/scsi-queue to 5.10/scsi-queue.

v7 -> v8
Remove wrongly added tags.

v6 -> v7
1. Remove UFS feature layer.
2. Cleanup for sparse error.

v5 -> v6
Change base commit to b53293fa662e28ae0cdd40828dc641c09f133405

v4 -> v5
Delete unused macro define.

v3 -> v4
1. Cleanup.

v2 -> v3
1. Add checking input module parameter value.
2. Change base commit from 5.8/scsi-queue to 5.9/scsi-queue.
3. Cleanup for unused variables and label.

v1 -> v2
1. Change the full boilerplate text to SPDX style.
2. Adopt dynamic allocation for sub-region data structure.
3. Cleanup.

NAND flash memory-based storage devices use Flash Translation Layer (FTL)
to translate logical addresses of I/O requests to corresponding flash
memory addresses. Mobile storage devices typically have RAM with
constrained size, thus lack in memory to keep the whole mapping table.
Therefore, mapping tables are partially retrieved from NAND flash on
demand, causing random-read performance degradation.

To improve random read performance, JESD220-3 (HPB v1.0) proposes HPB
(Host Performance Booster) which uses host system memory as a cache for the
FTL mapping table. By using HPB, FTL data can be read from host memory
faster than from NAND flash memory. 

The current version only supports the DCM (device control mode).
This patch consists of 3 parts to support HPB feature.

1) HPB probe and initialization process
2) READ -> HPB READ using cached map information
3) L2P (logical to physical) map management

In the HPB probe and init process, the device information of the UFS is
queried. After checking supported features, the data structure for the HPB
is initialized according to the device information.

A read I/O in the active sub-region where the map is cached is changed to
HPB READ by the HPB.

The HPB manages the L2P map using information received from the
device. For active sub-region, the HPB caches through ufshpb_map
request. For the in-active region, the HPB discards the L2P map.
When a write I/O occurs in an active sub-region area, associated dirty
bitmap checked as dirty for preventing stale read.

HPB is shown to have a performance improvement of 58 - 67% for random read
workload. [1]

This series patches are based on the 5.11/scsi-queue branch.

[1]:
https://www.usenix.org/conference/hotstorage17/program/presentation/jeong

Daejun park (3):
 scsi: ufs: Introduce HPB feature
 scsi: ufs: L2P map management for HPB read
 scsi: ufs: Prepare HPB read for cached sub-region
 
 drivers/scsi/ufs/Kconfig |9 +
 drivers/scsi/ufs/Makefile|1 +
 drivers/scsi/ufs/ufs-sysfs.c |   18 +
 drivers/scsi/ufs/ufs.h   |   49 +
 drivers/scsi/ufs/ufshcd.c|   53 ++
 drivers/scsi/ufs/ufshcd.h|   23 +-
 drivers/scsi/ufs/ufshpb.c| 1784 +
 drivers/scsi/ufs/ufshpb.h|  230 +
 8 files changed, 2166 insertions(+), 1 deletion(-)
 created mode 100644 drivers/scsi/ufs/ufshpb.c
 created mode 100644 drivers/scsi/ufs/ufshpb.h