Re: Another approach of UFSHPB

2020-05-27 Thread Bean Huo
On Wed, 2020-05-27 at 18:11 +0900, Daejun Park wrote:
> > > > I do not agree that the bus model is the best choice for
> > > > freeing cache
> > > > memory if it is no longer needed. A shrinker is probably a much
> > > > better
> > > > choice because the callback functions in a shrinker get invoked
> > > > when a
> > > > system is under memory pressure. See also register_shrinker(),
> > > > unregister_shrinker() and struct shrinker in
> > > > include/linux/shrinker.h.
> > > 
> > > Since this discussion is closely related to cache allocation,
> > > what is your opinion about allocating the pages dynamically as
> > > the regions are being activated/deactivated, as opposed to how
> > > it is done today - statically on init for the entire
> > > max-active-subregions?
> > Memory that is statically allocated cannot be used for any other
> > purpose
> > (e.g. page cache) without triggering the associated shrinker. As
> > far as
> > I know shrinkers are only triggered when (close to) out of memory.
> > So
> > dynamically allocating memory as needed is probably a better
> > strategy
> > than statically allocating the entire region at initialization
> > time.
> 
> To improve UFS device performance with the HPB driver, keeping the
> number of active subregions above a certain threshold is essential.
> If the number of active subregions falls below that threshold, the
> performance improvement from HPB is reduced.

Right, but an embedded system differs from the mobile use case: the
HPB driver should have a maximum limit, and exactly how many MB of
HPB cache to use should be left to the user to choose.
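
Just to illustrate the kind of knob I have in mind (a hypothetical
sketch, not a patch; the parameter name and default are made up):

#include <linux/module.h>

/* Illustrative only: let the user cap the HPB L2P cache size. */
static unsigned int hpb_max_cache_mb = 64;
module_param(hpb_max_cache_mb, uint, 0644);
MODULE_PARM_DESC(hpb_max_cache_mb,
		 "Upper bound on HPB L2P cache size, in MiB");

Whether such a cap ends up as a module parameter or as a per-device
attribute is a separate question.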


> Also, due to the protocol overhead of frequent activate/inactivate
> requests, performance may be worse than when the HPB feature is not
> used at all.
> 
> Therefore, it is better to unbind/unload the HPB driver than to
> reduce the number of active subregions below the threshold. We
> designed the HPB driver so that the UFS device keeps working even
> when the module is unloaded.
>

Actually, once HPB is running, there is no big difference in HPB
cache consumption between dynamic and static HPB page allocation. On
the contrary, dynamic HPB page allocation will increase the latency
on the HPB entry loading path.

//Bean



Re: Another approach of UFSHPB

2020-05-27 Thread Daejun Park
On 2020-05-26 Bart Van Assche wrote:
> On 2020-05-25 23:15, Avri Altman wrote:
>>> On 2020-05-24 22:40, Daejun Park wrote:
>>>> The HPB driver is close to the UFS core function, but it is not
>>>> essential for operating the UFS device. With reference to this
>>>> article (https://lwn.net/Articles/645810/), we implemented the
>>>> extended UFS features on the bus model. Because the HPB driver
>>>> consumes the user's main memory, it should support bind/unbind
>>>> functionality as needed. We implemented the HPB driver so that it
>>>> can be unbound/unloaded at runtime.
>>>
>>> I do not agree that the bus model is the best choice for freeing cache
>>> memory if it is no longer needed. A shrinker is probably a much better
>>> choice because the callback functions in a shrinker get invoked when a
>>> system is under memory pressure. See also register_shrinker(),
>>> unregister_shrinker() and struct shrinker in include/linux/shrinker.h.
>>
>> Since this discussion is closely related to cache allocation,
>> what is your opinion about allocating the pages dynamically as the
>> regions are being activated/deactivated, as opposed to how it is
>> done today - statically on init for the entire max-active-subregions?

> Memory that is statically allocated cannot be used for any other purpose
> (e.g. page cache) without triggering the associated shrinker. As far as
> I know shrinkers are only triggered when (close to) out of memory. So
> dynamically allocating memory as needed is probably a better strategy
> than statically allocating the entire region at initialization time.

To improve UFS device performance with the HPB driver, keeping the
number of active subregions above a certain threshold is essential.
If the number of active subregions falls below that threshold, the
performance improvement from HPB is reduced.
Also, due to the protocol overhead of frequent activate/inactivate
requests, performance may be worse than when the HPB feature is not
used at all.

Therefore, it is better to unbind/unload the HPB driver than to
reduce the number of active subregions below the threshold. We
designed the HPB driver so that the UFS device keeps working even
when the module is unloaded.

Thanks,

Daejun


Re: Another approach of UFSHPB

2020-05-26 Thread Bart Van Assche
On 2020-05-25 23:15, Avri Altman wrote:
>> On 2020-05-24 22:40, Daejun Park wrote:
>>> The HPB driver is close to the UFS core function, but it is not
>>> essential for operating the UFS device. With reference to this
>>> article (https://lwn.net/Articles/645810/), we implemented the
>>> extended UFS features on the bus model. Because the HPB driver
>>> consumes the user's main memory, it should support bind/unbind
>>> functionality as needed. We implemented the HPB driver so that it
>>> can be unbound/unloaded at runtime.
>>
>> I do not agree that the bus model is the best choice for freeing cache
>> memory if it is no longer needed. A shrinker is probably a much better
>> choice because the callback functions in a shrinker get invoked when a
>> system is under memory pressure. See also register_shrinker(),
>> unregister_shrinker() and struct shrinker in include/linux/shrinker.h.
>
> Since this discussion is closely related to cache allocation,
> what is your opinion about allocating the pages dynamically as the
> regions are being activated/deactivated, as opposed to how it is
> done today - statically on init for the entire max-active-subregions?

Memory that is statically allocated cannot be used for any other purpose
(e.g. page cache) without triggering the associated shrinker. As far as
I know shrinkers are only triggered when (close to) out of memory. So
dynamically allocating memory as needed is probably a better strategy
than statically allocating the entire region at initialization time.

Thanks,

Bart.





RE: Another approach of UFSHPB

2020-05-25 Thread Avri Altman
> On 2020-05-24 22:40, Daejun Park wrote:
> > The HPB driver is close to the UFS core function, but it is not
> > essential for operating the UFS device. With reference to this
> > article (https://lwn.net/Articles/645810/), we implemented the
> > extended UFS features on the bus model. Because the HPB driver
> > consumes the user's main memory, it should support bind/unbind
> > functionality as needed. We implemented the HPB driver so that it
> > can be unbound/unloaded at runtime.
> 
> I do not agree that the bus model is the best choice for freeing cache
> memory if it is no longer needed. A shrinker is probably a much better
> choice because the callback functions in a shrinker get invoked when a
> system is under memory pressure. See also register_shrinker(),
> unregister_shrinker() and struct shrinker in include/linux/shrinker.h.
Since this discussion is closely related to cache allocation,
what is your opinion about allocating the pages dynamically as the
regions are being activated/deactivated, as opposed to how it is
done today - statically on init for the entire max-active-subregions?
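
Roughly something like the following (a bare sketch with made-up
types; one page per subregion just for illustration):

#include <linux/gfp.h>
#include <linux/mm.h>

/* Hypothetical type: allocate a subregion's L2P mapping page only
 * when the device asks to activate that subregion. */
struct hpb_subregion {
	void *mctx;	/* L2P entries; a single page here for brevity */
};

static int hpb_activate_subregion(struct hpb_subregion *srgn)
{
	srgn->mctx = (void *)__get_free_page(GFP_KERNEL);
	return srgn->mctx ? 0 : -ENOMEM;
}

static void hpb_deactivate_subregion(struct hpb_subregion *srgn)
{
	free_page((unsigned long)srgn->mctx);
	srgn->mctx = NULL;
}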


Re: Another approach of UFSHPB

2020-05-25 Thread Bart Van Assche
On 2020-05-24 22:40, Daejun Park wrote:
> The HPB driver is close to the UFS core function, but it is not
> essential for operating the UFS device. With reference to this
> article (https://lwn.net/Articles/645810/), we implemented the
> extended UFS features on the bus model. Because the HPB driver
> consumes the user's main memory, it should support bind/unbind
> functionality as needed. We implemented the HPB driver so that it
> can be unbound/unloaded at runtime.

I do not agree that the bus model is the best choice for freeing cache
memory if it is no longer needed. A shrinker is probably a much better
choice because the callback functions in a shrinker get invoked when a
system is under memory pressure. See also register_shrinker(),
unregister_shrinker() and struct shrinker in include/linux/shrinker.h.
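
To make that concrete, a minimal sketch (the hpb_cached_pages
bookkeeping and the eviction stub are hypothetical, not taken from
the posted patches):

#include <linux/shrinker.h>
#include <linux/atomic.h>
#include <linux/kernel.h>

/* hpb_cached_pages stands in for the driver's real subregion
 * bookkeeping. */
static atomic_long_t hpb_cached_pages = ATOMIC_LONG_INIT(0);

static unsigned long hpb_shrink_count(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	/* Tell reclaim how many HPB cache pages could be freed. */
	return atomic_long_read(&hpb_cached_pages);
}

static unsigned long hpb_shrink_scan(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	/* A real driver would deactivate subregions and free their
	 * L2P pages here; this stub only adjusts the counter. */
	unsigned long freed = min_t(unsigned long, sc->nr_to_scan,
				    atomic_long_read(&hpb_cached_pages));

	atomic_long_sub(freed, &hpb_cached_pages);
	return freed;
}

static struct shrinker hpb_shrinker = {
	.count_objects	= hpb_shrink_count,
	.scan_objects	= hpb_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* Call register_shrinker(&hpb_shrinker) at probe time and
 * unregister_shrinker(&hpb_shrinker) on removal. */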

>> Should these parameters be per module or per UFS device?
> 
> I think it is necessary to take parameters per module, because each
> extended UFS-feature module has different functions and may require
> different parameters.

My question was rhetorical. Please choose per-device parameters when
appropriate instead of module parameters.
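
Something along these lines (an illustrative sketch only; the
structure and attribute name are made up):

#include <linux/device.h>
#include <linux/kernel.h>

/* Illustrative per-device sysfs knob instead of a module parameter;
 * "struct ufshpb_dev" and its drvdata wiring are hypothetical. */
struct ufshpb_dev {
	unsigned int max_cache_mb;
};

static ssize_t max_cache_mb_show(struct device *dev,
				 struct device_attribute *attr, char *buf)
{
	struct ufshpb_dev *hpb = dev_get_drvdata(dev);

	return snprintf(buf, PAGE_SIZE, "%u\n", hpb->max_cache_mb);
}

static ssize_t max_cache_mb_store(struct device *dev,
				  struct device_attribute *attr,
				  const char *buf, size_t count)
{
	struct ufshpb_dev *hpb = dev_get_drvdata(dev);
	unsigned int val;

	if (kstrtouint(buf, 0, &val))
		return -EINVAL;
	hpb->max_cache_mb = val;
	return count;
}
static DEVICE_ATTR_RW(max_cache_mb);

That way every UFS device can carry its own limit.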

Thanks,

Bart.


Re: Another approach of UFSHPB

2020-05-24 Thread Daejun Park
I am Daejun Park, and I am working on the HPB driver patches.
Thank you for your comments; our answers follow.

> Splitting the UFS driver into multiple modules would be great if the
> interface between these modules can be kept small and elegant. However,
> I'm not sure that this approach should be based on Linux device driver
> bus concept. Devices can be unbound at any time from their driver by
> writing into the "unbind" sysfs attribute. I don't think we want the UFS
> core functionality ever to be unbound while any other UFS functionality
> is still active. Has it been considered to implement each feature as a
> loadable module without relying on the bus model? The existing kernel
> module infrastructure already prevents unloading modules (e.g. the UFS
> core) that are in use by a kernel module that depends on them (e.g. UFS HPB).

The HPB driver is close to the UFS core function, but it is not
essential for operating the UFS device. With reference to this
article (https://lwn.net/Articles/645810/), we implemented the
extended UFS features on the bus model. Because the HPB driver
consumes the user's main memory, it should support bind/unbind
functionality as needed. We implemented the HPB driver so that it
can be unbound/unloaded at runtime.

> What will happen if a feature module is unloaded (e.g. HPB) while I/O is
> ongoing that relies on HPB?

The HPB driver checks, through a reference counter, whether HPB can
be unloaded/unbound. The driver therefore cannot be unloaded/unbound
while there is outstanding I/O related to HPB, so this is not a
problem.
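
Roughly like this (a simplified sketch; the helper names are
hypothetical, not from the patch set):

#include <linux/module.h>
#include <linux/errno.h>

/* Every in-flight HPB command pins the module so it cannot be
 * unloaded mid-I/O. */
static int hpb_prep_cmd(void)
{
	if (!try_module_get(THIS_MODULE))
		return -ENODEV;	/* module is being removed */
	/* ... set up HPB READ using the cached L2P entry ... */
	return 0;
}

static void hpb_cmd_done(void)
{
	/* ... complete the command ... */
	module_put(THIS_MODULE);
}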

> Should these parameters be per module or per UFS device?

I think it is necessary to take parameters per module, because each
extended UFS-feature module has different functions and may require
different parameters.

Thanks,

Daejun Park.


Re: Another approach of UFSHPB

2020-05-22 Thread Bart Van Assche
On 2020-05-19 15:31, yongmyung lee wrote:
> Currently, the UFS driver (mainly ufshcd.c) has become bulky and
> complex, so I would like to split the code into layers, like the
> works of Bean Huo and Avri Altman. In particular, I suggest a
> UFS-feature driver model based on the Linux bus-driver model, which
> is suitable for managing all extended UFS-feature drivers, as in
> the figure below:
> 
> UFS driver data structure (struct ufs_hba)
>    |
>    |                             -- ufshpb driver            <- attach ufshpb device driver (loadable)
>    |---| ufs-ext feature layer | -- ufs-wb driver            <- attach ufswb device driver
>    |                             -- ...                      <- ...
>    |                             -- next ufs feature driver  <- attach ufs-next feature driver
> 
> * wb : write-booster

Splitting the UFS driver into multiple modules would be great if the
interface between these modules can be kept small and elegant. However,
I'm not sure that this approach should be based on Linux device driver
bus concept. Devices can be unbound at any time from their driver by
writing into the "unbind" sysfs attribute. I don't think we want the UFS
core functionality ever to be unbound while any other UFS functionality
is still active. Has it been considered to implement each feature as a
loadable module without relying on the bus model? The existing kernel
module infrastructure already prevents unloading modules (e.g. the UFS
core) that are in use by a kernel module that depends on them (e.g. UFS HPB).

> Furthermore, each ufs-ext feature driver will be written as a
> loadable kernel module. Vendors (e.g., Android phone manufacturers)
> could optionally load and remove each module.

What will happen if a feature module is unloaded (e.g. HPB) while I/O is
ongoing that relies on HPB?

> They can also customize the parameters of the ufs-ext feature
> drivers while each module is being loaded. (For example, a vendor
> could set the maximum memory size that can be reclaimed in HPB host
> control mode.)

Should these parameters be per module or per UFS device?

> In addition, we plan to provide a UFS simulator for QEMU as a test
> environment for UFS driver development.

A UFS simulator for QEMU support would definitely be welcome.

Thanks,

Bart.


Re: Another approach of UFSHPB

2020-05-22 Thread Bart Van Assche
On 2020-05-20 14:19, Bart Van Assche wrote:
> On 2020-05-20 10:55, Christoph Hellwig wrote:
>> HPB is a completely fucked up concept and we should not merge it at all.
>> Especially not with a crazy bullshit vendor extension layer that makes
>> it even easier for vendors to implement even worse things than the
>> already horrible spec says.  Just stop this crap and implement sane
>> interfaces for the next generation hardware instead of wasting your
>> time on this idiotic idea.
> 
> What exactly is it that you are not happy about? Is it the concept of
> using host memory to store L2P translation information or how that
> concept has been translated into SCSI commands (HPB READ BUFFER, HPB
> READ and HPB WRITE BUFFER)?
> 
> In the former case: aren't Open-Channel SSDs another example of storage
> devices for which the L2P translation tables are maintained in host
> memory? Didn't the driver for Fusion-io SSDs also maintain the L2P
> mapping in host memory?
> 
> Do you agree that HPB UFS storage devices are already being used widely
> and hence that not accepting this functionality in the upstream kernel
> will force users of HPB devices to maintain HPB code outside the kernel
> tree? Isn't one of the goals of the Linux kernel project to increase its
> user base?

The following quote from
https://www.anandtech.com/show/13474/the-google-pixel-3-review/2 is
interesting: "Another big improvement for file I/O is the implementation
of “Host Performance Booster” in the kernel and UFS controller firmware
stack. HPB is essentially caching of the NAND chip’s FTL (flash
translation layer) L2P (logical to physical) mapping tables into the
hosts (SoCs) main memory. This allows the host driver to look up the
target L2P entry directly without betting on UFS’s limited SRAM to have
a cache-hit, reducing latency and greatly increasing random read
performance. The authors of the feature showcase an improvement of
59-67% in random I/O read performance due to the new feature. It's worth
to mention that traditional Android I/O benchmarks won't be able to show
this as those tend to test read speeds with the files they've just
created."

Given the cost of SRAM in embedded controllers I think there is a strong
incentive for manufacturers of flash storage devices to reduce the
amount of SRAM on the storage controller. I think this means that
proposals to use host memory for caching L2P mappings will keep popping
up, no matter what we tell the storage controller vendors about what we
think about such a design.

Bart.


Re: Another approach of UFSHPB

2020-05-20 Thread Bart Van Assche
On 2020-05-20 10:55, Christoph Hellwig wrote:
> HPB is a completely fucked up concept and we should not merge it at all.
> Especially not with a crazy bullshit vendor extension layer that makes
> it even easier for vendors to implement even worse things than the
> already horrible spec says.  Just stop this crap and implement sane
> interfaces for the next generation hardware instead of wasting your
> time on this idiotic idea.

Hi Christoph,

What exactly is it that you are not happy about? Is it the concept of
using host memory to store L2P translation information or how that
concept has been translated into SCSI commands (HPB READ BUFFER, HPB
READ and HPB WRITE BUFFER)?

In the former case: aren't Open-Channel SSDs another example of storage
devices for which the L2P translation tables are maintained in host
memory? Didn't the driver for Fusion-io SSDs also maintain the L2P
mapping in host memory?

Do you agree that HPB UFS storage devices are already being used widely
and hence that not accepting this functionality in the upstream kernel
will force users of HPB devices to maintain HPB code outside the kernel
tree? Isn't one of the goals of the Linux kernel project to increase its
user base?

Bart.




Re: Another approach of UFSHPB

2020-05-20 Thread Christoph Hellwig
Seriously,

HPB is a completely fucked up concept and we should not merge it at all.
Especially not with a crazy bullshit vendor extension layer that makes
it even easier for vendors to implement even worse things than the
already horrible spec says.  Just stop this crap and implement sane
interfaces for the next generation hardware instead of wasting your
time on this idiotic idea.


Another approach of UFSHPB

2020-05-19 Thread yongmyung lee
Hello, All!

I am writing this email to share my opinion regarding HPB.
This mail is very long, so here is a summary:

[Summary]
1. HPB, WriteBooster and other new UFS features would be managed by a
   ufs-extended feature layer (an additional LLD layer).
2. The HPB map table will be managed in the LLD (UFS driver).
3. Each UFS feature driver can be implemented as a loadable kernel module.
4. Samsung is also preparing a QEMU-based test environment.
5. We will push our patches soon! Please review them.


Here is the body:

I am Yongmyung Lee, one of the original authors of the HPB driver,
which has so far been an out-of-tree implementation outside the
Linux kernel mainline.

I have seen the work done by Bean Huo and Avri Altman, which is very
impressive. I also have some opinions on the driver, would like to
discuss them with you, and am preparing a set of patches.

The basic idea of my approach is similar to Bean's work of managing
the HPB map data in the LLD (UFS driver). As the HPB driver is
closely related to the behavior of the UFS device, I think it would
be nice if the HPB driver ran in an additional feature layer on top
of the LLD.

Of course, Avri's work makes sense. I believe HPB's scheme can be a
good alternative to NVMe's HMB (Host Memory Buffer) for storage
devices like eMMC or SATA SSDs (which may be DRAM-less). Once an
HPB-like scheme is applied, these storage devices could benefit from
improved read latency as well as throughput.

However, it would not be desirable to relocate the map table
management to an upper layer such as scsi-dh, as no such
specification is supported at the moment.

When specifications similar to HPB are proposed in JEDEC or T13,
that would be a great chance to have a deep discussion on which
layer is the best fit for map table management.

Before disclosing my opinion on the HPB driver development approach,
I would like to introduce the main characteristics of the extended
UFS features in progress:

  * They provide extended functionality.
    That is, a UFS device shall work well without the extended UFS
    features; by using them, however, it becomes more powerful in
    certain application scenarios.

  * They are designed to suit embedded systems (such as Android).
    Due to the characteristics of UFS devices, they are mainly used
    as mobile storage, and we are mainly trying to improve
    performance on Android systems.

  * They are closely connected to the UFS device.
    The extended UFS features use UFS query requests, which are
    UFS-private commands. In addition, other features are closely
    related to the electrical and physical characteristics of UFS
    (such as voltage, gear scaling and runtime suspend/resume) and
    should be taken into consideration.

Therefore, it is desirable to have those features managed separately
from the original UFS driver.

The current extended UFS features approved by JEDEC are:

  * UFS HPB
  * UFS WriteBooster

In addition, we are developing and testing various UFS features that
would be beneficial in embedded systems (at the moment, mostly
Android systems). As we have already successfully provided enhanced
features in cooperation with major Android phone vendors, we now
willingly want to contribute our work to the Linux kernel mainline.
 

Currently, the UFS driver (mainly ufshcd.c) has become bulky and
complex, so I would like to split the code into layers, like the
works of Bean Huo and Avri Altman. In particular, I suggest a
UFS-feature driver model based on the Linux bus-driver model, which
is suitable for managing all extended UFS-feature drivers, as in the
figure below:

UFS driver data structure (struct ufs_hba)
   |
   |                             -- ufshpb driver            <- attach ufshpb device driver (loadable)
   |---| ufs-ext feature layer | -- ufs-wb driver            <- attach ufswb device driver
   |                             -- ...                      <- ...
   |                             -- next ufs feature driver  <- attach ufs-next feature driver

* wb : write-booster
 

Each extended UFS-feature driver has a bus of the ufs-ext feature
type and is bound to the ufs-ext feature layer. The ufs-ext feature
layer manages the common APIs used by each extended UFS-feature
driver. The APIs consist of the UFS query requests and UFS vendor
commands related to each ufs-ext feature driver.
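
As a rough illustration of what I mean (a bare-bones sketch; all
names here are placeholders, not the eventual patch):

#include <linux/device.h>
#include <linux/module.h>
#include <linux/string.h>

/* Sketch of a ufs-ext feature bus. */
static int ufs_ext_match(struct device *dev, struct device_driver *drv)
{
	/* Match feature devices (e.g. "ufshpb") to feature drivers. */
	return !strcmp(dev_name(dev), drv->name);
}

static struct bus_type ufs_ext_bus = {
	.name	= "ufs-ext",
	.match	= ufs_ext_match,
};

static struct device_driver ufshpb_drv = {
	.name	= "ufshpb",
	.bus	= &ufs_ext_bus,
	.owner	= THIS_MODULE,
};

/* The core would call bus_register(&ufs_ext_bus); the loadable
 * ufshpb module would call driver_register(&ufshpb_drv). */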


Furthermore, each ufs-ext feature driver will be written as a
loadable kernel module. Vendors (e.g., Android phone manufacturers)
could optionally load and remove each module. They can also
customize the parameters of the ufs-ext feature drivers while each
module is being loaded. (For example, a vendor could set the maximum
memory size that can be reclaimed in HPB host control mode.)


We are expecting that this kind of standardized and abstracted model
will surely make it easier to maintain.